We’ve all heard by now how awesome the cloud is. Spin up infrastructure instantly. Just add water! No upfront costs! Autoscale to meet seasonal application demands!
But less well known or even understood by most engineering teams are the seasonal weather patterns of the cloud environment itself!
Join 28,000 others and follow Sean Hull on twitter @hullsean.
Sure, there are firms like Netflix, who have turned the cloud’s fickleness into a virtue & built reliability on top of it. But most of the firms I work with every day have moved to Amazon as though it’s regular bare metal, and encountered some real problems in the process.
1. Everyday hardware outages
Many of the firms I’ve seen hosted on AWS don’t realize how often the servers fail. Amazon actually chooses cheap commodity components as a cost-saving measure. The assumption is that resilience should be built into your infrastructure using devops practices & automation tools like Chef & Puppet.
The sad reality is most firms provision the usual way, through the dashboard, with no safety net.
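To make the "safety net" idea concrete, here is a minimal sketch of the kind of check an automation layer runs continuously: compare the fleet you want with the fleet that is actually healthy, and work out what to replace. This is not how Chef or Puppet work internally; the `Instance` class, the `plan_replacements` function and the instance names are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical record of one server and its last health-check result.
    instance_id: str
    healthy: bool


def plan_replacements(instances, desired_count):
    """Return (ids_to_terminate, number_to_launch) to restore the desired fleet."""
    unhealthy = [i.instance_id for i in instances if not i.healthy]
    healthy_count = len(instances) - len(unhealthy)
    # Launch enough new servers to get back to the desired number of healthy ones.
    to_launch = max(desired_count - healthy_count, 0)
    return unhealthy, to_launch


fleet = [
    Instance("web-1", True),
    Instance("web-2", False),  # simulated hardware failure
    Instance("web-3", True),
]
terminate, launch = plan_replacements(fleet, desired_count=3)
print(terminate, launch)
```

The point is that the replacement decision lives in code that runs on its own, not in someone clicking through the dashboard after the pager goes off.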
2. Ongoing network problems
Network latency is a big problem on Amazon. And it will affect you more. One reason is you’re most likely sitting on EBS as your storage. EBS? That’s Elastic Block Store, Amazon’s network-attached storage solution. Your little cheapo instance has to cross the network to get to storage. That *WILL* affect your performance.
If you’re not already doing so, please start using their most important & easily missed performance feature – provisioned IOPS.
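As a rough sketch of what requesting provisioned IOPS looks like in code, the helper below builds the arguments for the `create_volume` call on boto3’s EC2 client. The helper name `piops_volume_request`, the zone, the size and the IOPS figure are all assumptions for illustration. AWS also enforces limits on the ratio of IOPS to volume size, so check the current EBS documentation before picking numbers.

```python
def piops_volume_request(availability_zone, size_gib, iops, volume_type="io1"):
    """Build keyword arguments for boto3's EC2 client create_volume call."""
    if size_gib <= 0 or iops <= 0:
        raise ValueError("size_gib and iops must be positive")
    return {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,            # volume size in GiB
        "VolumeType": volume_type,   # io1 is a provisioned IOPS volume type
        "Iops": iops,                # the IOPS you are asking Amazon to reserve
    }


# Example values only; pick a zone, size and IOPS that fit your workload.
request = piops_volume_request("us-east-1a", size_gib=200, iops=2000)
print(request)

# To actually create the volume (requires boto3 and AWS credentials):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# ec2.create_volume(**request)
```

The actual AWS call is left commented out so the sketch runs without credentials and without creating billable resources.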
3. Hard to be as resilient as Netflix
From what I’m seeing at startups, most have a bit of devops in place, a bit of automation, such as autoscaling around the webservers, but little in terms of cross-region deployments. What’s more, their database tier is protected only by Multi-AZ or just a read replica or two. These are fine for what they are, but will require real intervention when (not if) the server fails.
I recommend building a browse-only mode for your application, to eliminate downtime in these cases.
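One way to picture a browse-only mode is the sketch below: when the primary database is unavailable, reads are served from a replica and writes are refused with a clear error the application can turn into a friendly message. Everything here is hypothetical (the `DataStore` class, the `ReadOnlyModeError` exception, the in-memory dictionaries standing in for real databases); a real application would put this switch in its data-access layer.

```python
class ReadOnlyModeError(Exception):
    """Raised when a write is attempted while the site is browse-only."""


class DataStore:
    def __init__(self):
        self.primary_available = True
        self._primary_rows = {}  # stands in for the primary database
        self._replica_rows = {}  # stands in for a read replica

    def write(self, key, value):
        if not self.primary_available:
            raise ReadOnlyModeError("site is in browse-only mode")
        self._primary_rows[key] = value
        self._replica_rows[key] = value  # pretend replication is instant

    def read(self, key):
        # Fall back to the replica when the primary is down.
        source = self._primary_rows if self.primary_available else self._replica_rows
        return source.get(key)


store = DataStore()
store.write("article:1", "Hello")
store.primary_available = False  # simulate the primary failing

print(store.read("article:1"))   # browsing still works
try:
    store.write("article:2", "New post")
except ReadOnlyModeError as err:
    print("write rejected:", err)
```

Visitors can keep reading while you repair or promote a database, which is a much smaller event than the whole site going dark.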
4. Provisioning isn’t your only problem
But the cloud gives me instant infrastructure. I can spin up servers & configure components through an API! Yes, this is a major benefit of the cloud, compared to 1-2 hours in traditional environments like Softlayer or Rackspace. But those environments may see an outage only every couple of years! Amazon’s hardware may fail a couple of times a year, more if you’re unlucky.
Meanwhile you’re going to deal with seasonal weather problems *INSIDE* your datacenter. Think of these as swarms of customers invading your servers, like a DDoS attack, but self-inflicted.
Amazon is like a weak immune system attacking itself all the time, requiring constant medication to keep the host alive!
5. RDS is going to bite you
Besides all these other problems, I’m seeing more customers build their applications on the managed database solution MySQL RDS. I’ve found RDS terribly hard to manage. It introduces downtime at every turn, where standard MySQL would incur none.
In my experience, upgrading RDS is like a shit-storm that will not end!