The internet stack is a complex infrastructure of interlocking components. An scalability engineer must be adept at Linux, plus webservers, caching servers, search servers, automation services, and relational databases on the backend. We think a generalist with a broad base of experience is most suited to the job of scalability engineer.
The truth that is half right. True there are a lot of moving parts, and a lot to setup. The internet stack made up of Linux, Apache, MySQL & PHP. LAMP as it’s called, was built to be resilient, dynamic, and scalable. It’s essentially why Amazon works. Why what they’re doing is possible. Windows & .NET for example don’t scale well. Strange to see Oracle mating with them, but I digress…
Linux and LAMP that is built on top of it, are highly scalable and dynamic to begin with.
The answer is possibly, but 9 times out of 10 what really needs to happen is queries must be tuned.
In 17 years of consulting that is the single largest cause of scalability problems. Fix those queries and your problems are over.
The Elephant in the Room – Query Tuning
I was talking with a colleague today at AppNexus. He said, so should we do some of that work inside the application, instead of doing a huge UNION or a large JOIN? I said yes you can move work onto the application, but it makes the application more complex. On the flip side the webserver tier is easier to scale. So there are tradeoffs.
I said this:
By and large, if scalability is our goal, we should work to quiet the activity in the slow query log. This is an active project for developers & DBAs. Keep it quiet and your server will run well.
Yet I still talk to teams where this is mysterious. It’s unclear. There’s no conviction there. And that’s where I think DBAs are failing. Because this is our subject matter expertise, and if we haven’t convinced developer teams of this, we’re not working together enough. API teams aren’t separate from DBA and operations. Siloing technology departments is a killer…
As you roll out new code, if some queries show up, then those need attention. Tweak the code until the queries drop out. This is the primary project of scalability.
When should I think about upgrading hardware?
If your code is stable, but you’re seeing a steady line rising on load average of the server, *THEN* go up in hardware. Load average means cpu & disk are being taxed. The server can’t keep up.
I close with a final point. Devops means bring dev & ops together! Don’t silo them off in different wings. Communicate. DBAs it’s your job to educate Developers about scalability and help with query tuning. Devs, profile new SQL code, test with large datasets & for god sakes don’t use an ORM – it’s one of 5 things toxic to scalability. Run explain and be sure to index all the right columns.
Why does it do this? UNION is defined that way in SQL. Duplicates must be removed and this is an efficient way for the MySQL engine to remove them. Combine results, sort, remove duplicates and return the set.
Queries with UNION can be accelerated in two ways. Switch to UNION ALL or try to push ORDER BY, LIMIT and WHERE conditions inside each subquery. You’ll be glad you did!
What if we did UNION ALL? The result would look like this:
Here the WHERE clause works on this 11 record temp table:
But it would be much faster to move the WHERE inside each subquery like this:
(SELECT type, release FROM short_sleeve WHERE release >=2013)
(SELECT type, release FROM long_sleeve WHERE release >=2013);
That would be operating on a combined 3 record table. Faster to sort & remove duplicates. Smaller result sets cache better too, providing a pay forward dividend. That’s what performance optimization is all about!
Remember multi-million row sets in each part of this query will quickly illustrate the optimization. We’re using very small results to make visualizing easier.
You can also use this optimization for ORDER BY and for LIMIT conditions. By reducing the number of records returned by EACH PART of the UNION, you reduce the work that happens at the stage where they are all combined.
If you’re seeing some UNION queries in your slow query log, I suggest you try this optimization out and see if you can tweak
Queries are so-named because of the lovely language we call SQL – simplified query language. That’s a bit of sarcasm on my part, I’ve never found it particularly simple or elegant. Profiling them involves finding how they are spending their time, and what they are asking the server to do. In this way you can make them faster, and improve performance of the whole server, and thus your website.
At any rate queries ask the database for information. If they are on the simpler side, something like give me all the users whose name starts with “hu” for example, and last name is indexed, that will run very fast. The database will lookup in the index the last name field, and find the subset of ones starting with those letters, then go lookup the records by id and return them to you. Cache the index blocks, cache the data blocks. Great! However, say those are customers, and you want their calling cellphone calling history. Ok, now you have to join on another table, matching by some key, hold all those records in memory, shuffle them around, and so on.
So queries are effectively little jobs or bits of work you ask your database server to perform on your behalf. With websites you typically have hundreds of concurrently running sessions all sending their own little SQL jobs to the server to get processed, and records returned. And blip in the radar slows everyone down, so you want them to all run quickly.
That’s where profiling comes in. MySQL, Oracle, and SQL Server alike all have EXPLAIN type facilities for showing the plan with which the database will use to fetch your data. It shows indexes, sorting, joins, order, and so forth. All of this contributes to the overall execution time and resources used on the database server.
SQL is a difficult acronym for a difficult language, but what it does is shuttle information into and out of your database in an organized manner. Your web applications and developers have to speak it, and your database – whether Oracle, MySQL, Postgres or some other will return information back using this computing dialect.
Since every movement on your website, from page to page (sessions) and purchase to purchase all involve interaction using these queries, writing them well can have a huge impact on your website performance. How big? We’ve fixed queries by adding indexes or rewriting them and seen improvements by as much as 100x. That’s converting pages that take ten seconds to ones that take 1/10 of a second. Be especially vigilant about those queries generated by Object Relational Mappers like Active Record, Ruby’s ORM layer.