5 Ways to Boost MySQL Scalability

There are a lot of scalability challenges we see with clients over and over. The list could easily include 20, 50 or even 100 items, but we shortened it down to the biggest five issues we see.

1. Tune those queries

By far the biggest bang for your buck is query optimization. Queries can be functionally correct and meet business requirements without being stress tested for high traffic and high load. This is why we often see clients with growing pains, and scalability challenges as their site becomes more popular. This also makes sense. It wouldn’t necessarily be a good use of time to tune a query for some page off in a remote corner of your site, that didn’t receive real-world traffic. So some amount of reactive tuning is common and appropriate.

Enable the slow query log and watch it. Use mk-query-digest, the great tool from Maatkit to analyze the log. Also make sure the log_queries_not_using_indexes flag is set.  Once you’ve found a heavy resource intensive query, optimize it!  Use the EXPLAIN facility, use a profiler, look at index usage and create missing indexes, and understand how it is joining and/or sorting.

Also: Why generalists are better at scaling the web

2. Employ Master-Master Replication

Master-master active-passive replication, otherwise known as circular replication, can be a boon for high availability, but also for scalability.  That’s because you immediately have a read-only slave for your application to hit as well.  Many web applications exhibit an 80/20 split, where 80% of activity is read or SELECT and the remainder is INSERT and UPDATE.  Configure your application to send read traffic to the slave or rearchitect so this is possible.  This type of horizontal scalability can then be extended further, adding additional read-only slaves to the infrastructure as necessary.

If you’re setting up replication for the first time, we recommend you do it using hotbackups. Here’s how.

Keep in mind MySQL’s replication has a tendency to drift, often silently from the master. Data can really get out of sync without throwing errors! Be sure to bulletproof your setup with checksums.

Related: Why you can’t find a MySQL DBA

3. Use Your Memory

It sounds very basic and straightforward, yet there are often details overlooked.  At minimum be sure to set these:

  • innodb_buffer_pool_size
  • key_buffer_size (MyISAM index caching)
  • query_cache_size – though beware of issues on large SMP boxes
  • thread_cache & table_cache
  • innodb_log_file_size & innodb_log_buffer_size
  • sort_buffer_size, join_buffer_size, read_buffer_size, read_rnd_buffer_size
  • tmp_table_size & max_heap_table_size

Read: Why Twitter made a shocking admission about their data centers in the IPO

4. RAID Your Disk I/O

What is underneath your database?  You don’t know?  Well please find out!  Are you using RAID 5?  This is a big performance hit.  RAID5 is slow for inserts and updates.  It is also almost non-functional during a rebuild if you lose a disk.  Very very slow performance.  What should I use instead?  RAID 10 mirroring and striping, with as many disks as you can fit in your server or raid cabinet.  A database does a lot of disk I/O even if you have enough memory to hold the entire database.  Why?  Sorting requires rearranging rows, as does group by, joins, and so forth.  Plus the transaction log is disk I/O as well!

Are you running on EC2?  In that case EBS is already fault tolerant and redundant.  So give your performance a boost by striping-only across a number of EBS volumes using the Linux md software raid.

Also checkout our Intro to EC2 Cloud Deployments.

Also of interest autoscaling MySQL on EC2.

Also: Why startups are trying to do without techops and failing

5. Tune Key Parameters

These additional parameters can also help a lot with performance.


This speeds up inserts & updates dramatically by being a little bit lazy about flushing the innodb log buffer.  You can do more research yourself but for most environments this setting is recommended.


Innodb was developed like Oracle with the tablespace model for storage.  Apparently the kernel developers didn’t do a very good job.  That’s because the default setting to use a single tablespace turns out to be a performance bottleneck.  Contention for file descriptors and so forth.  This setting makes innodb create tablespace and underlying datafile for each table, just like MyISAM does.

Read this: Why a four letter word still divides dev and ops

Made it to the end eh?!?! Grab our newsletter.

  • Pingback: Kenneth Ham | 5 Ways to Boost MySQL Scalability

  • Pingback: Best of Guide - Highlights of Our Popular Content

  • Pingback: 5 things toxic to scalability

  • Pingback: Atom Wire » Blog Archive » Thoughts on Upcoming MySQL 5.6 Defaults

  • jametong

    innodb_flush_log_at_trx_commit=2. about this,you better tell your reader, they may be loss data because of it.

    • http://www.iheavy.com/blog/ Sean Hull

      Thx for the comment Jame. My feeling is for more web-applications this parameter is important to set. The performance gain is huge and real for customers everyday. The data loss is an extremely outlier case, like the butterfly’s wings in china.

  • jonaldomo

    Good post Sean, I just recently setup multi master replication with MariaDB and documented my instructions here if anybody would find it useful http://jmoses.co/2014/03/18/setting-up-a-mysql-cluster-with-mariadb-galera.html

    • http://www.iheavy.com/blog/ Sean Hull

      nice demo jonal. Thx for posting. I noticed you didn’t use “commit” statements. Are they set to autocommit?

      • jonaldomo

        Yeah, do you think it’s best to disable the default autocommit?

        • http://www.iheavy.com/blog/ Sean Hull

          For illustration purposes, it may help. Since transactions are isolated, other nodes won’t see results until a commit. So when they appear, it may be confusing to people.

    • http://www.iheavy.com/blog/ Sean Hull

      Oh one more thing. With haproxy, how what is the best way to load balance? I ask because round-robin, may cause problems when one node has cached results that another node has not (inside innodb buffer cache). So with that it makes sense to send queries to the nodes where the data is cached. How can haproxy get this information?

      I recall when I worked on Oracle, with parallel server and then RAC later, the sqlnet listener had this issue as well.

      • jonaldomo

        I just recently set it up and am putting it through testing right now. I am not familiar with innodb buffer cache and synchronous replication. I have not read anything about that on the Galera documentation http://codership.com/products/galera_replication. They claim the synchronous replication prevents this. Would be interesting to hear/see otherwise.

        • http://www.iheavy.com/blog/ Sean Hull

          Oh i’m sure they are right about that. You see it happens before replication sees it. If a transaction has not yet committed, then other sessions *on that server* won’t see the change, nor will it have been replicated yet. It’s at the time of commit (or rollback) where the change must be propagated or not.

          In other words at the time of commit, all nodes should see the new image of all data.

    • http://www.iheavy.com/blog/ Sean Hull

      btw Jonal, how did you create that command line session in the link above? Is it an animated gif? Did you start with an movie file, and then convert? Love to know that process as I think its a great way to present information.

      PS: Love you blog.

      • jonaldomo

        Thanks. I created the gif with licecap http://www.cockos.com/licecap/. It creates the gif for you, no movie just a gif.

        • http://www.iheavy.com/blog/ Sean Hull

          Cool tool. Thx!