With traditional managed hosting solutions, we have best practices, business continuity plans, and disaster recovery; we document our processes and all the moving parts in our infrastructure. At least we pay lip service to these goals, though from time to time we admit to getting sidetracked by bigger fish to fry, higher priorities, and the emergency of the day. We add "fire drill" to our to-do list, promising we'll test restoring our backups. But too often it is only in an actual emergency that we find out whether we really have all the pieces backed up and can reassemble them properly.
Cloud Computing is different. These goals are no longer lofty ideals; they must be put into practice. Here's why.
EC2 virtual servers can and will die. Your spinup scripts and infrastructure should treat this possibility not as some far-off anomalous event, but as a day-to-day concern. With proper scripts and testing of various scenarios, this becomes manageable. Use snapshots to back up EBS root volumes, and build spinup scripts with AMIs that have all the components your application requires. Then test, test, and test again.
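As a rough sketch of what such a spinup script might look like, here is a minimal example using Python and boto3; the choice of library, the volume and AMI IDs, and the instance type are all assumptions for illustration, not prescriptions:

```python
import boto3

# Assumes AWS credentials are already configured; all IDs below are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Snapshot the EBS root volume so a replacement server can be rebuilt from it.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly backup of web-01 root volume",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

# Spin up a replacement instance from a pre-baked AMI that already contains
# every component the application requires.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="m1.large",
    MinCount=1,
    MaxCount=1,
)
```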
Amazon EC2's SLA – Only 99.95%
The computing industry throws around the 99.999%, or five-nines, uptime SLA a lot. That amounts to less than six minutes of downtime per year. Amazon's 99.95% allows for 263 minutes of downtime per year, and greater downtime merely gets you a credit on your account. With that in mind, repeatable processes and scripts to bring your infrastructure back up in different availability zones, or even different datacenters, are a necessity. Along with your infrastructure scripts, offsite backups also become a wise choice. You should further take advantage of availability zones and regions to make your infrastructure more robust. By using private IP addresses and the private network, you can host a MySQL database slave in a separate zone, for instance. You can also do GDLB, or Geographically Distributed Load Balancing, to send customers on the west coast to that zone and those on the east coast to one closer to them. In the event that one region or availability zone goes out, your application is still responding, though perhaps with slightly degraded performance.
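For a sense of the arithmetic, and of how a script makes bringing a replica up in a separate availability zone repeatable, here is a minimal sketch in Python with boto3; the AMI ID, instance type, and zone name are placeholders, not recommendations:

```python
import boto3

# Back-of-the-envelope downtime budgets for each SLA level.
minutes_per_year = 365 * 24 * 60
for label, sla in (("99.999%", 0.99999), ("99.95%", 0.9995)):
    budget = (1 - sla) * minutes_per_year
    print(f"{label} uptime allows about {budget:.0f} minutes of downtime per year")

# Bring a database slave up in a different availability zone from the master.
ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # pre-baked MySQL slave image (placeholder)
    InstanceType="m1.large",
    MinCount=1,
    MaxCount=1,
    Placement={"AvailabilityZone": "us-east-1c"},  # a zone separate from the master
)
```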
Devops – Infrastructure as Code
With traditional hosting, you either physically manage all of the components in your infrastructure or have someone do it for you. Either way, a phone call is required to get things done. With EC2, every piece of your infrastructure can be managed from code, so your infrastructure itself can be managed as software. Whether you use the waterfall method or agile as your software development lifecycle, you now have the flexibility to place all of these scripts and configuration files in version control. This raises the manageability of your environment tremendously. It also provides a kind of ongoing documentation of all of the moving parts. In short, it forces you to deliver on all of those best practices you've been preaching over the years.
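As one illustration of what "infrastructure as code" can mean in practice, the sketch below keeps hypothetical server definitions in a Python structure that could sit in version control alongside your configuration files and be launched from a script; every name and ID here is made up for the example:

```python
import boto3

# A version-controlled description of the environment (hypothetical layout);
# in practice this might live in a YAML or JSON file next to your configs.
SERVERS = {
    "web-01": {"ami": "ami-0123456789abcdef0", "type": "m1.small", "zone": "us-east-1a"},
    "db-01":  {"ami": "ami-0fedcba9876543210", "type": "m1.large", "zone": "us-east-1b"},
}

def build(name):
    """Launch one server exactly as described in the checked-in definition."""
    spec = SERVERS[name]
    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.run_instances(
        ImageId=spec["ami"],
        InstanceType=spec["type"],
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": spec["zone"]},
    )

if __name__ == "__main__":
    for name in SERVERS:
        build(name)
```

Because the definitions live in version control, every change to the environment gets a commit history, just like application code.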
EC2 Environment Considerations
When servers get restarted they get new IP addresses, both private and public. This may affect configuration files for everything from webservers to mail servers, and database replication as well. Your new server may also mount an external EBS volume containing your database. If that's the case, your start scripts should check for that volume and not start MySQL until it appears. To further complicate things, you may choose to use software RAID across a handful of EBS volumes to get better performance.
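A start script for that scenario might look something like the following sketch; the device name, mount point, and service command are assumptions that depend entirely on how your volume is attached and which distribution you run:

```python
import os
import subprocess
import time

# Placeholder values; adjust to match how the EBS volume is actually attached.
DEVICE = "/dev/xvdf"
MOUNT_POINT = "/var/lib/mysql"

# Wait for the external EBS volume to appear before touching the database.
while not os.path.exists(DEVICE):
    time.sleep(5)

# Mount the data volume, then start MySQL only once its data is in place.
subprocess.check_call(["mount", DEVICE, MOUNT_POINT])
subprocess.check_call(["service", "mysql", "start"])
```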
The more special cases you have, the more quickly you realize how important it is to manage these things in software. And the more often the process needs to be repeated, the more time the scripts will save you.
New Flexibility in the Cloud
Ultimately, if you plan for less reliable virtual servers and mitigate that risk with availability zones, regions, and automated scripts, you can enjoy all the new benefits of the cloud.