How to Optimize MySQL UNION For High Speed

obama innauguration big data sets

Join 6100 others and follow Sean Hull on twitter @hullsean.

There are two ways to speedup UNIONs in a MySQL database. First use UNION ALL if at all possible, and second try to push down your conditions.

[mytweetlinks]

1. UNION ALL is much faster than UNION

How does a UNION work? Imagine you have two tables for shirts. The short_sleeve table looks like this:

[code]
blue
green
gray
black
[/code]

And long_sleeve another that looks like this:

[code]
red
green
yellow
blue
[/code]

Related: Why Generalists are Better at Scaling the Web

If you UNION those two tables, first MySQL will sort the combined set into a temp table like this:

[code]
black
blue
blue
gray
green
green
red
yellow
[/code]

Once it’s done this sort, it can easily remove the duplicate blue & duplicate green for this resulting set:

[code]
black
blue
gray
green
red
yellow
[/code]

See also: Mythical MySQL DBA – the talent drought.

Why does it do this? UNION is defined that way in SQL. Duplicates must be removed and this is an efficient way for the MySQL engine to remove them. Combine results, sort, remove duplicates and return the set.

[quote]
Queries with UNION can be accelerated in two ways. Switch to UNION ALL or try to push ORDER BY, LIMIT and WHERE conditions inside each subquery. You’ll be glad you did!
[/quote]

What if we did UNION ALL? The result would look like this:

[code]
blue
green
gray
black
red
green
yellow
blue
[/code]

Read this: MySQL DBA Interview & Hiring Guide.

It doesn’t have to sort, and doesn’t have to remove duplicates. If you imagine combining two 10 million row tables, and don’t have to sort, this speedup can be HUGE.

2. Use Push-down Conditions to speedup UNION in MySQL

Imagine with our example above the shirts have a design date, the year they were released. Yes we’re keeping this example very simple to illustrate the concept.

Here is the short_sleeve table:
[code]
blue 2013
green 2013
green 2012
gray 2011
black 2009
black 2011
[/code]

And long_sleeve table looks like this:

[code]
red 2012
red 2013
green 2011
yellow 2010
blue 2011
[/code]

For 2013 designs could combine them like this:

[code]
(SELECT type, release FROM short_sleeve)
UNION
(SELECT type, release FROM long_sleeve);
WHERE release >=2013;
[/code]

See also: 5 More Things Deadly to Scalability and the original 5 Things Toxic to Scalability..

Here the WHERE clause works on this 11 record temp table:

[code]
black 2009
black 2011
blue 2011
blue 2013
gray 2011
green 2013
green 2012
green 2011
red 2012
red 2013
yellow 2010
[/code]

But it would be much faster to move the WHERE inside each subquery like this:

[code]
(SELECT type, release FROM short_sleeve WHERE release >=2013)
UNION
(SELECT type, release FROM long_sleeve WHERE release >=2013);
[/code]

That would be operating on a combined 3 record table. Faster to sort & remove duplicates. Smaller result sets cache better too, providing a pay forward dividend. That’s what performance optimization is all about!

Read this: RDS or MySQL – 10 Use Cases.

Remember multi-million row sets in each part of this query will quickly illustrate the optimization. We’re using very small results to make visualizing easier.

You can also use this optimization for ORDER BY and for LIMIT conditions. By reducing the number of records returned by EACH PART of the UNION, you reduce the work that happens at the stage where they are all combined.

If you’re seeing some UNION queries in your slow query log, I suggest you try this optimization out and see if you can tweak

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

  • http://www.iheavy.com/blog/ Sean Hull

    A slight clarification. Although UNION ALL is faster than UNION, it may not be *HUGELY* faster. That’s because in MySQL the temporary table is still built. That said MySQL won’t have to do the work to eliminate duplicate rows and so won’t need the full row to determine uniqueness.

  • emcomments

    Nice post.

    Is the semi-colon a mistake, at the end of line 3 in:
    (SELECT type, release FROM short_sleeve)
    UNION
    (SELECT type, release FROM long_sleeve);
    WHERE release >=2013;

    • http://www.iheavy.com/blog/ Sean Hull

      typo. oops. good catch!

  • http://www.iheavy.com/blog/ Sean Hull

    For those interested in speedups to UNION, the above optimization I mentioned hit on a bug in MySQL making UNION ALL often as slow as UNION. It shouldn’t be because the sort isn’t necessary, but somewhere along the way the code got mixed up.

    Turns out a bug fix is coming in MySQL 5.7:

    http://bugs.mysql.com/bug.php?id=50674

    Thx to Morgan Tocker for the tip!