Why your cloud is speeding for a scalability cliff

Join 14,000 and follow Sean Hull on twitter.

Don’t believe me that you’re headed for the cliff?

A startup scales up to no avail

Towards the end of 2012 I worked with an internet startup in the online education space. Their web application was not unusual: built in PHP on Linux, Apache and MySQL, all running on Amazon Web Services. They had three web servers in the mix and were seeing 1,000 simultaneous users during peak traffic.

All this sounds normal, except they were hitting major stalls and app slowdowns. Before I was brought in they had scaled their MySQL server from a large to an extra large instance, but were still seeing slowdowns. “What can we do?” they asked.

I dug in and took a look at the server variables. They seemed to have substantial memory allocated to the server and InnoDB. I then dug into the slow query log. This is a great facility in MySQL which watches the activity happening against your database and logs the queries that take a long time. In this case we had the threshold set to ½ second and found tons of activity.
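To make the idea of a slow query log concrete, here is a minimal sketch of the same principle on the application side: time each query and record the ones that cross a threshold. It is not MySQL's own mechanism (there the behaviour is controlled by the `slow_query_log` and `long_query_time` system variables); the `timed_query` helper, the `lessons` table, and the use of an in-memory SQLite database are all assumptions made for illustration.

```python
import sqlite3
import time

SLOW_THRESHOLD_SECONDS = 0.5  # mirrors the half-second threshold mentioned above


def timed_query(conn, sql, params=(), threshold=SLOW_THRESHOLD_SECONDS, slow_log=None):
    """Run a query and record it in slow_log if it takes at least `threshold` seconds.

    Hypothetical helper for illustration; not part of any library.
    """
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if slow_log is not None and elapsed >= threshold:
        slow_log.append((elapsed, sql))
    return rows


# Hypothetical table, stored in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lessons (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO lessons (title) VALUES (?)",
    [(f"lesson {i}",) for i in range(1000)],
)

slow_log = []
# threshold=0.0 is used here only so the demo reliably records an entry.
timed_query(
    conn,
    "SELECT * FROM lessons WHERE title = ?",
    ("lesson 500",),
    threshold=0.0,
    slow_log=slow_log,
)
for elapsed, sql in slow_log:
    print(f"{elapsed:.4f}s  {sql}")
```

In practice the database's own log is the better source, since it sees every client and not just one code path. The sketch only shows why a threshold turns a flood of activity into a short list of suspects.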

What was happening? It turns out there were lots of missing indexes and badly written SQL queries.

How can we resolve these problems?

The customer asked me to explain the situation. I asked them to imagine finding a friend’s apartment in NYC without an address. Not easy, right? You have to visit its 8 million residents one by one until you locate your friend’s home.

This is what you’re asking the database to do without indexes. It’s very serious. It’s compounded further when you have hundreds or thousands of other users hitting different pages, all with the same problem. Your whole dataset can fit in memory, you tell me? So-called logical I/Os still cost, and can indeed cost dearly. What’s more, sorting, joining, and grouping all increase the amount of memory your dataset can require.
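The apartment-hunting analogy can be checked directly by asking a database how it plans to run a lookup, before and after an index exists. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` because it runs without a server; MySQL has its own `EXPLAIN` statement for the same purpose, with different output. The `residents` table, the `idx_residents_name` index, and the `plan` helper are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical table standing in for "everyone in the city".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE residents (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO residents (name, address) VALUES (?, ?)",
    [(f"resident-{i}", f"{i} Example St") for i in range(10_000)],
)

QUERY = "SELECT address FROM residents WHERE name = ?"


def plan(conn, sql, params):
    """Return SQLite's description of how it intends to run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the human-readable detail.
    return " | ".join(row[3] for row in rows)


before = plan(conn, QUERY, ("resident-9999",))
conn.execute("CREATE INDEX idx_residents_name ON residents (name)")
after = plan(conn, QUERY, ("resident-9999",))

print("without index:", before)  # reports a SCAN: every row is visited
print("with index:   ", after)   # reports a SEARCH using idx_residents_name
```

The exact wording of the plan varies between SQLite versions, but the shift from a scan of the whole table to a search through the index is the point: the first is knocking on every door, the second is looking up the address.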

Why didn’t a bigger server help?

Modern computers are fast and EC2 extra large instances have a lot of memory. But with thousands or tens of thousands of users hitting pages simultaneously, you can take down even the largest servers.

[quote]Throwing hardware at the problem is like kicking the can down the road. Ultimately you have to pay your debt and optimize your code.[/quote]

High performance code isn’t automatic

We have automation, we have agile processes, and we can scale web, cache and search servers with ease. The danger is in thinking that deploying in the cloud will magically deliver scalability. Another danger is thinking that ORMs like ActiveRecord in Ruby or Hibernate in Java will solve these problems. Yes, they are great tools for speeding up prototyping, but we become dependent on them, and they are difficult to rip out later.

Fred Wilson says speed is an essential feature

Fred Wilson recently gave a talk on his 10 golden principles of successful web apps. He says speed is the most important feature. Enough said!

The 10 Golden Principles of Successful Web Apps from Carsonified on Vimeo.

  • MH

    A query system based on general web searches is not going to
    be sustainable in the long run.

    One of the problems is that a lot of the content is a
    bit difficult to label clearly in a defined manner.

    One possible solution to this could be to use faceted
    hierarchies as are used in the AAT (Art and Architecture Thesaurus) at
    the Getty Research Institute:

    http://www.getty.edu/research/tools/vocabularies/aat/about.html…

    People think art and architecture are overly esoteric, and that is a fair
    argument, but I could see these types of vocabularies being used effectively
    for something like Airbnb, maybe in conjunction with existing indexing systems.…

    The other issue is that as such an indexing system evolves, it needs to stay fairly
    accessible to people across the board, yet at the same time standardized.
    It might need its own association or something, for standardization but also
    because things work better if there is some sort of general agreement.