When fat fingers take down your business

Join 14,000 others and follow Sean Hull on Twitter @hullsean. GitHub goes nuclear. I was flipping through Reddit last night and hit this crazy story about strange pushes on GitHub. For those who don’t know, GitHub houses source code. It’s version control for the software world. Lots of projects use it to keep track of change …

Root Cause Analysis – What is it and why is it important?

Root Cause Analysis is the process of identifying the ultimate source and cause of an outage. When an outage causes serious downtime on a website, organizations are typically in crisis mode. The urgency of resolving it sometimes pushes aside due process, change management and general caution. Root Cause Analysis attempts to as much as possible …

Offsite Backups – What are they and why are they important?

Backups are obviously an important part of any managed infrastructure deployment. Computing systems are inherently fallible, whether through operator error or hardware failure. Existing systems must be backed up, from configurations, software and media files to the backend data store. In a managed hosting or cloud hosting environment, it is convenient to use various filesystem …
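An offsite copy is only useful if it is intact. Below is a minimal sketch of that step, assuming the offsite location is reachable as a mounted directory (for example a network share). The function names `sha256_of` and `copy_offsite` are hypothetical, and a real offsite target such as object storage would need its own client library. The sketch copies a backup file and compares SHA-256 checksums, so a silently corrupted copy is caught at backup time rather than during a restore.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_offsite(backup: Path, offsite_dir: Path) -> Path:
    """Copy a backup file into offsite_dir and verify the copy by checksum.

    offsite_dir is assumed (hypothetically) to be a mounted remote location.
    Raises RuntimeError if the copy does not match the original.
    """
    offsite_dir.mkdir(parents=True, exist_ok=True)
    target = offsite_dir / backup.name
    shutil.copy2(backup, target)  # copies data plus timestamps/permissions
    if sha256_of(backup) != sha256_of(target):
        raise RuntimeError(f"checksum mismatch for {target}")
    return target
```

Comparing checksums rather than file sizes catches corruption that leaves the length unchanged; storing the digest alongside the backup would also let a later restore verify the file.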

Infrastructure Provisioning – What is it and why is it important?

In the old days, you would have a closet in your startup company with a rack of computers. Provisioning involved:

- Deciding on your architectural direction: what, where & how
- Ordering the new hardware
- Waiting weeks for the packages to arrive
- Setting up the hardware, wiring things together and powering up
- Discovering some component is missing, or failed …

Business Continuity Planning – What is it and why is it important?

BCP (or BCRP, if you also want to include “resiliency” in the acronym) basically means planning for the worst. In the old days you had a filing cabinet with documents; for example, a central government office might house birth certificates or titles and deeds. Perhaps a copy of documents is regularly created, …

Zero Downtime – What is it and why is it important?

For most large web applications, uptime is of foremost importance. Customers can see any outage as a frustration, or as an opportunity to move to a competitor. What’s more, for a site that also includes e-commerce, it can mean real lost sales. Zero Downtime describes a site without service interruption. To achieve such lofty goals, …

Backups – What are they and why are they important?

Backups are obviously a crucial component of any enterprise application. Modern internet components are prone to failure, and backups keep your bases covered. Here’s what you should consider:

- Is your database backed up, including object structures, data, stored procedures, grants and logins?
- Is your webserver doc-root backed up?
- Is your application source code in version …
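One way to keep the checklist above honest is to track it as data and report the gaps. This is a minimal sketch: the `BackupItem` class and `uncovered` function are hypothetical, and the coverage flags are made-up example values. In practice they would come from inspecting your actual backup jobs.

```python
from dataclasses import dataclass


@dataclass
class BackupItem:
    name: str
    covered: bool  # True if some backup job is known to capture this item


# Items follow the checklist; the True/False flags are hypothetical examples.
CHECKLIST = [
    BackupItem("database object structures", True),
    BackupItem("database data", True),
    BackupItem("stored procedures", True),
    BackupItem("grants", False),
    BackupItem("logins", False),
    BackupItem("webserver doc-root", True),
    BackupItem("application source code", True),
]


def uncovered(items: list[BackupItem]) -> list[str]:
    """Return the names of checklist items that no backup job covers."""
    return [item.name for item in items if not item.covered]


if __name__ == "__main__":
    for name in uncovered(CHECKLIST):
        print(f"NOT BACKED UP: {name}")
```

Grants and logins are flagged here only to illustrate the kind of item that is easy to forget when a backup captures table data alone.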

What is disaster recovery and why is it important?

Disaster recovery involves anticipating a major business outage and planning contingencies to avoid losing revenue, customers or sales. Every technology component that makes up your enterprise applications should be carefully evaluated for the risk of loss. What happens if this database server disappears? Do we have all the data backed up somewhere? …
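The questions above boil down to an inventory check: for each component, when was its newest backup taken? Here is a minimal sketch, assuming you can collect a last-backup timestamp per component. The `stale_components` function, the component names and the 24-hour threshold are hypothetical. The threshold plays the role of what disaster recovery planning commonly calls a recovery point objective (RPO): the maximum amount of data loss you are willing to accept.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_components(
    last_backup: dict[str, Optional[datetime]],
    max_age: timedelta,
    now: datetime,
) -> list[str]:
    """Return components with no backup, or whose newest backup is older than max_age."""
    stale = []
    for component, taken_at in sorted(last_backup.items()):
        # None means no backup was ever recorded for this component.
        if taken_at is None or now - taken_at > max_age:
            stale.append(component)
    return stale


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    inventory = {  # hypothetical component names and timestamps
        "database server": datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
        "file server": datetime(2023, 12, 28, 3, 0, tzinfo=timezone.utc),
        "load balancer config": None,
    }
    print(stale_components(inventory, timedelta(hours=24), now))
    # → ['file server', 'load balancer config']
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.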

iHeavy Insights 79 – Plumbing the Interwebs

I meet new people all the time. It’s a way of life in New York. One of the first questions new people ask each other is “What do you do?” It begins to sound like a cliché after a while, but it can also spark endless fascinating discussions, as there are so many people with …

Amazon EC2 Outage – Failures, Lessons and Cloud Deployments

Now that we’ve had a chance to take a deep breath after last week’s AWS outage, I’ll offer some comments of my own. Hopefully just enough time has passed to take a broader view and put events in perspective. Despite what some reports may have claimed, Amazon wasn’t down, but rather a small …