Tag Archives: query tuning

Scalability Tips & Greatest Hits

Join 8000 others and follow Sean Hull on twitter @hullsean.

In the past two years we’ve written a ton of material on scalability. Here’s the greatest hits…

Why Generalists Are Better at Scaling the Web

The internet stack is a complex infrastructure of interlocking components. A scalability engineer must be adept at Linux, plus webservers, caching servers, search servers, automation services, and relational databases on the backend. We think a generalist with a broad base of experience is most suited to the job of scalability engineer.

5 Things Toxic to Scalability

ORMs should keep you up at night, but so should coupled and locking processes, a single copy of your database, missing metrics and no deployment feature flags.

5 More Things Deadly to Scalability

In this followup to the original, we touch on disk I/O, RAID, queuing in the database (a no-no), full-text searching, insufficient or missing caching and lastly the dreaded technical debt.

Scalability Happiness

A Zen monk might ask, what is the sound of one hand clapping? That’s the sound your servers will be making when you apply this one simple principle.

5 Ways to Boost MySQL Scalability

Deploying MySQL as your web-facing database? Here are a few key tips to boost speed & performance.

3 Ways To Boost Cloud Scalability

Building your startup in the Amazon Web Services cloud? There are 3 things you absolutely must do.

Why Your Cloud Is Speeding for a Scalability Cliff

The cloud may seem like the obvious place to build new applications & infrastructure, but there is a precipice hidden from sight…

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

Scalability Happiness – A Quiet Query Log

Peter Van Allen - Pin Drop

There’s a lot of talk on the web about scalability. Making web applications scale is not easy. The modern web architecture has so many moving parts. How can we grapple with the underlying problem?

Also: Why Are MySQL DBAs So Hard to Find?

The LAMP stack scales well

That truth is only half right. True, there are a lot of moving parts, and a lot to set up. But the internet stack made up of Linux, Apache, MySQL & PHP, or LAMP as it’s called, was built to be resilient, dynamic, and scalable. It’s essentially why Amazon works, and why what they’re doing is possible. Windows & .NET, for example, don’t scale well. Strange to see Oracle mating with them, but I digress…

[quote]
Linux and LAMP that is built on top of it, are highly scalable and dynamic to begin with.
[/quote]

Also: AirBNB Didn’t Have to Fail During an AWS Outage

Ok, so what’s this got to do with MySQL? Well a LOT.

The webserver tier, the caching layers like memcache & varnish, and the search tier (Solr) all scale fairly easily because their assets are fixed, or almost so.

The database tier is different. So what affects performance of a database server? Server size? Main memory? Disk speed? The truth is all of those.

Also check out: The Sexiest New Feature of AWS Speeds Up EBS

But after you set up the server, with memory settings and so forth, it’s a fairly fixed object. True, there are parameters to tweak, but on the whole there isn’t a ton of day-to-day tuning to do.

Well if that’s true, why does performance take a hit?! As applications grow, the db server slows down, don’t we need to tweak server settings? Do we need new hardware?

Read this: A CTO Must Never Do This

The answer is possibly, but 9 times out of 10 what really needs to happen is queries must be tuned.

[quote]
In 17 years of consulting that is the single largest cause of scalability problems. Fix those queries and your problems are over.
[/quote]

The Elephant in the Room – Query Tuning

I was talking with a colleague today at AppNexus. He said, so should we do some of that work inside the application, instead of doing a huge UNION or a large JOIN? I said yes you can move work onto the application, but it makes the application more complex. On the flip side the webserver tier is easier to scale. So there are tradeoffs.

I said this:

[quote]
By and large, if scalability is our goal, we should work to quiet the activity in the slow query log. This is an active project for developers & DBAs. Keep it quiet and your server will run well.
[/quote]

Also: Top MySQL DBA Interview Questions for Candidates, Hiring Managers & Recruiters

Yet I still talk to teams where this is mysterious. It’s unclear. There’s no conviction there. And that’s where I think DBAs are failing. Because this is our subject matter expertise, and if we haven’t convinced developer teams of this, we’re not working together enough. API teams aren’t separate from DBA and operations. Siloing technology departments is a killer…

As you roll out new code, if some queries show up in the slow query log, then those need attention. Tweak the code until the queries drop out. This is the primary project of scalability.
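
To watch for those queries, the slow query log has to be switched on first. Here is a minimal sketch, assuming MySQL 5.1 or later and an account allowed to set global variables. The one-second threshold is an illustrative assumption, not a recommendation.

```sql
-- Turn on the slow query log at runtime.
-- This resets on restart unless the same setting is also placed in my.cnf.
SET GLOBAL slow_query_log = 'ON';

-- Log any statement that runs longer than 1 second (threshold is an assumption).
-- The new global value applies to connections opened after this point.
SET GLOBAL long_query_time = 1;

-- Find out where the log file is being written.
SHOW VARIABLES LIKE 'slow_query_log_file';
```

The mysqldumpslow utility that ships with MySQL can summarize the resulting file, which makes it easier to see which statements appear most often.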

When should I think about upgrading hardware?

If your code is stable, but you’re seeing a steady line rising on load average of the server, *THEN* go up in hardware. Load average means cpu & disk are being taxed. The server can’t keep up.

Related: Should I use RDS or build a MySQL server on AWS?

Devops means work together!

I close with a final point. Devops means bring dev & ops together! Don’t silo them off in different wings. Communicate. DBAs, it’s your job to educate developers about scalability and help with query tuning. Devs, profile new SQL code, test with large datasets & for God’s sake don’t use an ORM. It’s one of 5 things toxic to scalability. Run EXPLAIN and be sure to index all the right columns.
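
As a rough illustration of that last step, here is a sketch using a hypothetical orders table. The table, its columns, and the index name are all assumptions made up for the example.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE orders (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  total DECIMAL(10,2) NOT NULL,
  created_at DATETIME NOT NULL
);

-- Before indexing: on a table with realistic data the plan
-- typically shows type = ALL, meaning a full table scan.
EXPLAIN SELECT id, total FROM orders WHERE customer_id = 42;

-- Index the column used in the WHERE clause.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- After indexing: the plan typically shows type = ref
-- with key = idx_orders_customer_id.
EXPLAIN SELECT id, total FROM orders WHERE customer_id = 42;
```

Exact EXPLAIN output depends on the MySQL version, storage engine, and data volume, so treat the expected values in the comments as typical rather than guaranteed.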

Together we can tackle this scalability thing!

How to Optimize MySQL UNION For High Speed

There are two ways to speed up UNIONs in a MySQL database. First, use UNION ALL if at all possible, and second, try to push down your conditions.

1. UNION ALL is much faster than UNION

How does a UNION work? Imagine you have two tables for shirts. The short_sleeve table looks like this:

[code]
blue
green
gray
black
[/code]

And another, long_sleeve, that looks like this:

[code]
red
green
yellow
blue
[/code]

Related: Why Generalists are Better at Scaling the Web

If you UNION those two tables, first MySQL will sort the combined set into a temp table like this:

[code]
black
blue
blue
gray
green
green
red
yellow
[/code]

Once it’s done this sort, it can easily remove the duplicate blue & duplicate green for this resulting set:

[code]
black
blue
gray
green
red
yellow
[/code]

See also: Mythical MySQL DBA – the talent drought.

Why does it do this? UNION is defined that way in SQL. Duplicates must be removed and this is an efficient way for the MySQL engine to remove them. Combine results, sort, remove duplicates and return the set.

[quote]
Queries with UNION can be accelerated in two ways. Switch to UNION ALL or try to push ORDER BY, LIMIT and WHERE conditions inside each subquery. You’ll be glad you did!
[/quote]

What if we did UNION ALL? The result would look like this:

[code]
blue
green
gray
black
red
green
yellow
blue
[/code]

Read this: MySQL DBA Interview & Hiring Guide.

It doesn’t have to sort, and doesn’t have to remove duplicates. If you imagine combining two 10 million row tables, and don’t have to sort, this speedup can be HUGE.
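
To try the comparison yourself, here is a minimal sketch. The table names follow the example above; the column name is borrowed from the queries later in this post, and the column type is an assumption.

```sql
-- Table names follow the post; the VARCHAR(20) column type is an assumption.
CREATE TABLE short_sleeve (`type` VARCHAR(20) NOT NULL);
CREATE TABLE long_sleeve  (`type` VARCHAR(20) NOT NULL);

INSERT INTO short_sleeve (`type`) VALUES ('blue'), ('green'), ('gray'), ('black');
INSERT INTO long_sleeve  (`type`) VALUES ('red'), ('green'), ('yellow'), ('blue');

-- UNION removes duplicates: 6 distinct rows come back.
SELECT `type` FROM short_sleeve
UNION
SELECT `type` FROM long_sleeve;

-- UNION ALL keeps every row: 8 rows come back, with blue and green twice.
SELECT `type` FROM short_sleeve
UNION ALL
SELECT `type` FROM long_sleeve;
```

UNION ALL is only a drop-in replacement when duplicates cannot occur across the two sets, or when the application does not mind seeing them.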

2. Use Push-down Conditions to speed up UNION in MySQL

Imagine with our example above the shirts have a design date, the year they were released. Yes we’re keeping this example very simple to illustrate the concept.

Here is the short_sleeve table:
[code]
blue 2013
green 2013
green 2012
gray 2011
black 2009
black 2011
[/code]

And long_sleeve table looks like this:

[code]
red 2012
red 2013
green 2011
yellow 2010
blue 2011
[/code]

For 2013 designs we could combine them like this:

[code]
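-- The filter runs only after the whole UNION has been built.
-- RELEASE is a reserved word in MySQL, so the column is backtick-quoted.
SELECT `type`, `release` FROM (
  SELECT `type`, `release` FROM short_sleeve
  UNION
  SELECT `type`, `release` FROM long_sleeve
) AS all_shirts
WHERE `release` >= 2013;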
[/code]

See also: 5 More Things Deadly to Scalability and the original 5 Things Toxic to Scalability.

Here the WHERE clause works on this 11 record temp table:

[code]
black 2009
black 2011
blue 2011
blue 2013
gray 2011
green 2013
green 2012
green 2011
red 2012
red 2013
yellow 2010
[/code]

But it would be much faster to move the WHERE inside each subquery like this:

[code]
(SELECT `type`, `release` FROM short_sleeve WHERE `release` >= 2013)
UNION
(SELECT `type`, `release` FROM long_sleeve WHERE `release` >= 2013);
[/code]

That would be operating on a combined 3 record table, which is faster to sort & remove duplicates from. Smaller result sets cache better too, providing a pay-it-forward dividend. That’s what performance optimization is all about!

Read this: RDS or MySQL – 10 Use Cases.

Remember multi-million row sets in each part of this query will quickly illustrate the optimization. We’re using very small results to make visualizing easier.

You can also use this optimization for ORDER BY and for LIMIT conditions. By reducing the number of records returned by EACH PART of the UNION, you reduce the work that happens at the stage where they are all combined.
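
Here is a sketch of the same idea for ORDER BY and LIMIT, reusing the tables from the example above. The “newest 10 designs” requirement is an assumption made up for illustration.

```sql
-- Goal (hypothetical): the 10 newest designs across both tables.
-- Each branch keeps only its own newest 10 rows, so the final
-- sort at the combine stage handles at most 20 rows.
(SELECT `type`, `release` FROM short_sleeve ORDER BY `release` DESC LIMIT 10)
UNION ALL
(SELECT `type`, `release` FROM long_sleeve ORDER BY `release` DESC LIMIT 10)
ORDER BY `release` DESC
LIMIT 10;
```

The outer ORDER BY and LIMIT are still needed to pick the final 10 from the combined rows. This sketch uses UNION ALL on purpose: with plain UNION, duplicate removal happens after the inner LIMIT, so a branch containing repeated rows could leave you with fewer rows, or different ones, than the unoptimized query would return.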

If you’re seeing some UNION queries in your slow query log, I suggest you try this optimization out and see if you can tweak

Query Profiling – What is it and why is it important?

Queries are so named because of the lovely language we call SQL, or Structured Query Language. That’s a bit of sarcasm on my part; I’ve never found it particularly simple or elegant. Profiling queries involves finding how they are spending their time, and what they are asking the server to do. In this way you can make them faster, and improve performance of the whole server, and thus your website.

At any rate, queries ask the database for information. If they are on the simpler side, something like give me all the users whose last name starts with “hu”, and last name is indexed, they will run very fast. The database will look up the last name field in the index, find the subset starting with those letters, then go look up the records by id and return them to you. Cache the index blocks, cache the data blocks. Great! However, say those are customers, and you want their cellphone calling history. Ok, now you have to join on another table, matching by some key, hold all those records in memory, shuffle them around, and so on.

So queries are effectively little jobs or bits of work you ask your database server to perform on your behalf. With websites you typically have hundreds of concurrently running sessions all sending their own little SQL jobs to the server to get processed, and records returned. Any blip on the radar slows everyone down, so you want them all to run quickly.

That’s where profiling comes in. MySQL, Oracle, and SQL Server all have EXPLAIN-type facilities for showing the plan the database will use to fetch your data. The plan shows indexes, sorting, joins, order, and so forth. All of this contributes to the overall execution time and resources used on the database server.
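
To make the two cases above concrete, here is a sketch in MySQL syntax. The customers and calls tables, their columns, and the index names are hypothetical, invented for this example.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE customers (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  KEY idx_customers_last_name (last_name)
);

CREATE TABLE calls (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  dialed_number VARCHAR(20) NOT NULL,
  started_at DATETIME NOT NULL,
  KEY idx_calls_customer_id (customer_id)
);

-- Simple case: a prefix match can use the index on last_name.
EXPLAIN
SELECT id, first_name, last_name
FROM customers
WHERE last_name LIKE 'hu%';

-- Heavier case: join each matching customer to their call history,
-- then sort the combined rows.
EXPLAIN
SELECT c.last_name, k.dialed_number, k.started_at
FROM customers AS c
JOIN calls AS k ON k.customer_id = c.id
WHERE c.last_name LIKE 'hu%'
ORDER BY k.started_at;
```

In the output, the type, key, rows, and Extra columns are the ones to read first. Values such as “Using temporary” or “Using filesort” in Extra point at the extra shuffling described above.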

Quora discussion by Sean Hull – What is query profiling and why is it important?

SQL – What is it and why is it important?

The What:

SQL is a difficult acronym for a difficult language, but what it does is shuttle information into and out of your database in an organized manner. Your web applications and developers have to speak it, and your database, whether Oracle, MySQL, Postgres or some other, will return information using this computing dialect.

The Why:

Since every movement on your website, from page to page (sessions) and purchase to purchase, involves interaction using these queries, writing them well can have a huge impact on your website performance. How big? We’ve fixed queries by adding indexes or rewriting them and seen improvements of as much as 100x. That’s converting pages that take ten seconds into ones that take 1/10 of a second. Be especially vigilant about queries generated by Object Relational Mappers like Active Record, the ORM layer in Ruby on Rails.

What is SQL on Quora