Why Your Cloud Is Speeding Toward a Scalability Cliff
Don’t believe you’re headed for the cliff?
Here is why your cloud is speeding toward one.
A Startup Scales Up to No Avail
Towards the end of 2012, I worked with an internet startup in the online education space. Their web application was not unusual: built in PHP on Linux, Apache, and MySQL, all running on Amazon Web Services. They had three web servers in the mix and were seeing 1,000 simultaneous users during peak traffic.
All this sounds normal, except they were hitting major stalls and app slowdowns. Before I was brought in, they had scaled their MySQL server from a large to an extra large instance but were still seeing slowdowns. “What can we do?” they asked.
I dug in and took a look at the server variables. They seemed to have substantial memory allocated to the server and to InnoDB. I then dug into the slow query log. This is a great facility in MySQL that watches the activity happening against your database and logs the queries that take a long time. In this case, we had the threshold set to ½ second, and it captured tons of activity.
What was happening? It turns out there were lots of missing indexes and badly written SQL queries.
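The slow query log idea described above can be sketched in a few lines. This is only an illustration of the concept, not how MySQL implements it: in MySQL itself the feature is controlled by the `slow_query_log` and `long_query_time` server variables. The `SlowQueryLogger` class, its `clock` parameter, and the use of SQLite as a stand-in database are all assumptions made for the example.

```python
import sqlite3
import time

# Mirrors the half-second threshold from the story (an assumption for this sketch).
SLOW_THRESHOLD_SECONDS = 0.5


class SlowQueryLogger:
    """Hypothetical wrapper that records queries slower than a threshold."""

    def __init__(self, conn, threshold=SLOW_THRESHOLD_SECONDS,
                 clock=time.perf_counter):
        self.conn = conn
        self.threshold = threshold
        self.clock = clock          # injectable so the timing can be tested
        self.slow_queries = []      # list of (elapsed_seconds, sql) tuples

    def execute(self, sql, params=()):
        start = self.clock()
        rows = self.conn.execute(sql, params).fetchall()
        elapsed = self.clock() - start
        # Only queries that run longer than the threshold are logged.
        if elapsed > self.threshold:
            self.slow_queries.append((elapsed, sql))
        return rows
```

Running application queries through a wrapper like this and then reviewing `slow_queries` gives roughly the same kind of list we worked from: the statements worth examining first for missing indexes.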
Related: AirBNB didn’t have to fail despite Amazon’s outage.
How Can We Resolve These Problems?
The customer asked me to explain the situation. I asked them to imagine finding a friend’s apartment in NYC without an address. Not easy, right? You would have to visit its 8 million residents one by one until you located your friend’s home.
Also, check out: Real Disaster Recovery Lessons from Sandy.
This is what you’re asking the database to do without indexes. It’s very serious, and it compounds when hundreds or thousands of other users are hitting different pages, all with the same problem. Your whole dataset fits in memory, you tell me? So-called logical I/Os still cost, and can cost dearly. What’s more, sorting, joining, and grouping all add to the amount of memory your dataset can require.
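To make the apartment analogy concrete, here is a minimal sketch that asks a database how it plans to run the same lookup before and after an index exists. It uses Python's built-in `sqlite3` module as a stand-in for MySQL (where the equivalent tool is `EXPLAIN`); the `residents` table, the `idx_residents_name` index, and the helper names are hypothetical, invented for this illustration.

```python
import sqlite3

# Hypothetical lookup: find one resident by name.
LOOKUP = "SELECT address FROM residents WHERE name = ?"


def build_demo_db(rows=10000):
    """Create an in-memory table of residents with no index on name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE residents (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
    )
    conn.executemany(
        "INSERT INTO residents (name, address) VALUES (?, ?)",
        ((f"resident{i}", f"{i} Broadway") for i in range(rows)),
    )
    return conn


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description; the detail text is the last column."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in plan_rows)


def add_name_index(conn):
    """Give the database the 'address book' it was missing."""
    conn.execute("CREATE INDEX idx_residents_name ON residents (name)")


if __name__ == "__main__":
    conn = build_demo_db()
    # Without an index, the plan is typically a SCAN of the whole table.
    print(query_plan(conn, LOOKUP, ("resident42",)))
    add_name_index(conn)
    # With the index, the plan typically becomes a SEARCH using it.
    print(query_plan(conn, LOOKUP, ("resident42",)))
```

The exact plan wording varies by SQLite version, but the shift from scanning every row to searching an index is the point: it is the difference between knocking on every door and looking up the address.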
Related: Why you can’t find a MySQL DBA
Why Didn’t A Bigger Server Help?
Modern computers are fast and EC2 extra large instances have a lot of memory. But with thousands or tens of thousands of users hitting pages simultaneously, you can take down even the largest servers.
Throwing hardware at the problem is like kicking the can down the road. Ultimately you have to pay your debt and optimize your code.
Read: Why Twitter made a shocking admission about their data centers in the IPO
High-Performance Code Isn’t Automatic
We have automation, we have agile processes, and we can scale web, cache, and search servers with ease. The danger is in thinking that deploying in the cloud will magically deliver scalability.
Another danger is thinking that ORMs like ActiveRecord in Ruby or Hibernate in Java will solve these problems. Yes, they are great tools to speed up prototyping, but we become dependent on them, and they are difficult to rip out later.
Want more? Check out our 5 Things Toxic To Scalability.
Also: Why startups are trying to do without techops and failing
Fred Wilson Says Speed Is an Essential Feature
Fred Wilson recently gave a talk on his top 10 golden principles of successful web applications. He says speed is the most important feature. Enough said!
So that’s why your cloud is speeding toward a scalability cliff, and which mistakes to avoid along the way. If you have questions about any of this, feel free to ask in the comments below. Thanks for reading!
Hiring a MySQL DBA? Check out our DBA Hiring Guide with advice and hints for candidates and CTOs as well!