AWS had a few outages in December. It’s interesting to note that when you’re as big as AWS you get a lot of media attention no matter what happens.
Join 35,000 others and follow Sean Hull on twitter @hullsean.
It’s worthwhile to consider how many eggs are in one basket. Is multi-cloud an option? If I stick with a single cloud, am I redundant enough to weather it when USE1-AZ4 goes down? Or 3 or 2? Have we tested that scenario with all hands on deck? Have we tested weird edge cases, and isolated our dependencies?
1. Regions & Availability Zones matter
Amazon’s cloud offering is a globe-spanning service. They call their data centers availability zones (AZs), and there are several of them in each region. In fact, Amazon’s documentation lists 26 regions worldwide and a total of 84 availability zones.
Where your data sits has never mattered more. You want it to be close to your customers, so response is fast. CloudFront can provide redundancy for assets, but your application still has to sit somewhere. EC2 servers, S3, Route53, EBS, RDS, File Gateway: each of these myriad services has different region and AZ patterns.
To make it all just a little more interesting, new services are typically deployed in a few popular regions first, so every region has a different mix of the total soup of AWS offerings.
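To make the "what if an AZ goes down?" question concrete, here is a minimal sketch that walks an inventory and reports what is left after losing each zone in turn. The inventory, the instance names, the zone IDs, and the `min_required` threshold are all made up for illustration; in practice you would feed it from whatever tooling already lists your instances.

```python
from collections import Counter


def az_failure_report(instances, min_required):
    """For each AZ in use, report how many instances survive if that AZ is lost."""
    per_az = Counter(inst["az"] for inst in instances)
    report = {}
    for az in sorted(per_az):
        remaining = len(instances) - per_az[az]
        report[az] = {"remaining": remaining, "ok": remaining >= min_required}
    return report


# Hypothetical inventory (names and zone IDs are assumptions, not real resources).
inventory = [
    {"id": "web-1", "az": "use1-az1"},
    {"id": "web-2", "az": "use1-az1"},
    {"id": "web-3", "az": "use1-az2"},
    {"id": "web-4", "az": "use1-az4"},
]

# Assume the site needs at least 3 instances to serve traffic.
report = az_failure_report(inventory, min_required=3)
for az, result in report.items():
    status = "OK" if result["ok"] else "AT RISK"
    print(f"lose {az}: {result['remaining']} instances left -> {status}")
```

With this sample data, losing use1-az1 takes out two of the four instances, so that zone is the one to worry about. A paper exercise like this is no substitute for actually testing the failure, but it is a cheap first pass.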
2. Even if you’re not *using* a region, you might be vulnerable
US-East-1 was the first region in the AWS service offering. So it has the oldest equipment and wiring. It’s also the default, so more customers are deployed there, which means more competition, slower API response etc.
Also, AWS still has strange dependencies built into it. Some resources are stored in regions, not globally. So even if you don’t have EC2 instances in US-East-1, you may have other resources there that you need access to.
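One rough way to spot those hidden ties is to group your resource ARNs by the region field they carry. The sketch below relies only on the standard ARN layout (`arn:partition:service:region:account-id:resource`); the ARNs themselves are hypothetical. Note that an empty region field does not prove a resource is truly global, it only means the ARN doesn't name a region.

```python
from collections import defaultdict

NO_REGION = "(none in ARN)"


def arn_region(arn):
    """Return the region field of an ARN, or a placeholder if the field is empty."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3] or NO_REGION


def group_by_region(arns):
    """Group ARNs by the region each one names."""
    groups = defaultdict(list)
    for arn in arns:
        groups[arn_region(arn)].append(arn)
    return dict(groups)


# Hypothetical ARNs for illustration only.
resources = [
    "arn:aws:ec2:eu-west-1:123456789012:instance/i-0abc1234def567890",
    "arn:aws:acm:us-east-1:123456789012:certificate/example-cert-id",
    "arn:aws:iam::123456789012:role/deploy",
    "arn:aws:s3:::example-backup-bucket",
]

grouped = group_by_region(resources)
for region, arns in sorted(grouped.items()):
    print(f"{region}: {len(arns)} resource(s)")
```

In this made-up account the compute lives in eu-west-1, yet one resource still points at us-east-1. That is exactly the kind of dependency worth knowing about before an outage, not during one.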
3. Have multiple types of backups and recovery paths
Even if you’ve double and triple checked your IaC rebuild scripts, and even if you believe you’ve got DR dialed in and locked down, test once, test twice, and triple check.
What’s more, have multiple different ways to rebuild. For example, on my iheavy.com site, I have an article-by-article dump of the database that WordPress can create. That’s XML, and it works across different versions of WP, so even if I lost everything, I still have the text! I have copies of that sent to a private S3 bucket, and to my local laptop.
Next I have database backups, sent to a private S3 bucket, and also copies to my laptop.
Then I have infrastructure-as-code scripts in Ansible and Terraform that will rebuild the components of my site quickly and easily. Those I test regularly.
Sure, your business is much more complicated than my little blog, but the concepts remain the same!
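Keeping copies in more than one place only helps if the copies are good. Here is a minimal sketch of the idea: copy a dump to several destinations and confirm each copy's checksum matches the original. The file name and directory names are hypothetical, and plain local directories stand in for the real destinations (a synced bucket folder and a laptop); this is not how my own backups are wired up, just an illustration of the check.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destinations):
    """Copy source into each destination directory and check the checksums match."""
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)
        results[str(target)] = sha256_of(target) == expected
    return results


# Demo: temporary directories stand in for the real backup destinations.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    dump = tmp / "site-export.xml"  # hypothetical export file
    dump.write_text("<rss><item>example post</item></rss>")
    outcome = copy_and_verify(dump, [tmp / "bucket-mirror", tmp / "laptop"])
    all_verified = all(outcome.values())
    print("all copies verified:", all_verified)
```

A matching checksum tells you the copy is intact. It does not tell you the backup restores cleanly, so this complements a real restore test rather than replacing it.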