Tag Archives: innodb_file_per_table

Thoughts on Upcoming MySQL 5.6 Defaults

During Oracle Open World 2012 and the parallel MySQL Connect conference, the new 5.6 version was announced. It’s only release candidate right now, but that means the GA release is just around the corner.

With that James Day has posted changes to various of the new parameter defaults. Many of them you may not run into on day-to-day production systems. However there are a few which I do see a lot.

Welcome changes to defaults

innodb_file_per_table = 1 (formerly 0 or off)

This is a parameter which really needs to be set on most systems. It tells InnoDB not to use just one large tablespace for all tables, but rather to create an individual tablespace for each InnoDB table. If you come from the Oracle world you might wonder why this is necessary, but due to the evolution of InnoDB itself, this having lots of individual tablespaces helps a lot with concurrency. And that’s something you want for scalability of those web applications.

What made the default really troublesome before is that if you forgot to set it, all your existing tables would be created in the single large tablespace. That would leave you only one option. Set the parameter, and then REBUILD all those previously created tables. The syntax to do rebuild wasn’t difficult, but the load it would generate on your systems surely was.

innodb_log_file_size = 48m (formerly 5m)

Here’s another parameter that didn’t have a great default previously. The more transactional activity you have in your database, the quicker these files fill up. As one fills up the database must switch to the next. You want to keep this switching to a minimum just like switching redologs in Oracle. By resizing these files larger, you make your database faster.

With the previous default of 5m, one had to jump through a number of hoops to remedy the default. First you would stop the database, then delete or move the files, then change the parameter in the my.cnf file and start MySQL again. If you started just by changing the value in my.cnf, you’d get various errors and the database wouldn’t start. So all and all a larger default avoids all these problems. Bravo!

Possibly unwelcome changes to defaults

sync_master_info=10000 (formerly 0)
sync_relay_log=10000 (formerly 0)
sync_relay_log_info=10000 (formerly 0)

I first discovered these three parameters reading the most recent printing of High Performance MySQL by Baron Schwartz, Peter Zaitsev & Vadim Tkachenko. I highly recommend the book if you haven’t picked up a copy. It is an in-depth technical tomb of MySQL information, a real tour-de-force that every database administrator should own.

I mention these parameters elsewhere in my article on bulletproofing MySQL replication with integrity checking.

The master.info and relay-log.info files on MySQL slaves are not by themselves crash safe. So if your slave crashes, it can fail to update these files with the servers current position. So upon restart the slave can fail.

I decided to put those parameters into use on a production slave server. I set each of them to a value of 1. This tells the server to sync after ever 1 event or 1 transaction respectively. It didn’t occur to me to do some performance testing with the parameters on. Mistake, big mistake.

Although the change made the slave crash safe, it also made the slave much slower. It began to lag behind the master significantly until it was showing old data. Since our Drupal setup used served most of the site off of the read-only slaves, we began to see old pages.

Although these new defaults set the value to 10,000 I would not recommend setting these parameters on by default. If you do use them on production boxes I would recommend doing performance tests to make sure they don’t cause unnecessary lag to your replication setup. Perhaps the new multi-threaded replication will help somewhat with this in 5.6.

Read this far? Grab our newsletter scalable startups!

5 Ways to Boost MySQL Scalability

There are a lot of scalability challenges we see with clients over and over. The list could easily include 20, 50 or even 100 items, but we shortened it down to the biggest five issues we see.

1. Tune those queries

By far the biggest bang for your buck is query optimization. Queries can be functionally correct and meet business requirements without being stress tested for high traffic and high load. This is why we often see clients with growing pains, and scalability challenges as their site becomes more popular. This also makes sense. It wouldn’t necessarily be a good use of time to tune a query for some page off in a remote corner of your site, that didn’t receive real-world traffic. So some amount of reactive tuning is common and appropriate.

Enable the slow query log and watch it. Use mk-query-digest, the great tool from Maatkit to analyze the log. Also make sure the log_queries_not_using_indexes flag is set.  Once you’ve found a heavy resource intensive query, optimize it!  Use the EXPLAIN facility, use a profiler, look at index usage and create missing indexes, and understand how it is joining and/or sorting.

Also: Why generalists are better at scaling the web

2. Employ Master-Master Replication

Master-master active-passive replication, otherwise known as circular replication, can be a boon for high availability, but also for scalability.  That’s because you immediately have a read-only slave for your application to hit as well.  Many web applications exhibit an 80/20 split, where 80% of activity is read or SELECT and the remainder is INSERT and UPDATE.  Configure your application to send read traffic to the slave or rearchitect so this is possible.  This type of horizontal scalability can then be extended further, adding additional read-only slaves to the infrastructure as necessary.

If you’re setting up replication for the first time, we recommend you do it using hotbackups. Here’s how.

Keep in mind MySQL’s replication has a tendency to drift, often silently from the master. Data can really get out of sync without throwing errors! Be sure to bulletproof your setup with checksums.

Related: Why you can’t find a MySQL DBA

3. Use Your Memory

It sounds very basic and straightforward, yet there are often details overlooked.  At minimum be sure to set these:

  • innodb_buffer_pool_size
  • key_buffer_size (MyISAM index caching)
  • query_cache_size – though beware of issues on large SMP boxes
  • thread_cache & table_cache
  • innodb_log_file_size & innodb_log_buffer_size
  • sort_buffer_size, join_buffer_size, read_buffer_size, read_rnd_buffer_size
  • tmp_table_size & max_heap_table_size

Read: Why Twitter made a shocking admission about their data centers in the IPO

4. RAID Your Disk I/O

What is underneath your database?  You don’t know?  Well please find out!  Are you using RAID 5?  This is a big performance hit.  RAID5 is slow for inserts and updates.  It is also almost non-functional during a rebuild if you lose a disk.  Very very slow performance.  What should I use instead?  RAID 10 mirroring and striping, with as many disks as you can fit in your server or raid cabinet.  A database does a lot of disk I/O even if you have enough memory to hold the entire database.  Why?  Sorting requires rearranging rows, as does group by, joins, and so forth.  Plus the transaction log is disk I/O as well!

Are you running on EC2?  In that case EBS is already fault tolerant and redundant.  So give your performance a boost by striping-only across a number of EBS volumes using the Linux md software raid.

Also checkout our Intro to EC2 Cloud Deployments.

Also of interest autoscaling MySQL on EC2.

Also: Why startups are trying to do without techops and failing

5. Tune Key Parameters

These additional parameters can also help a lot with performance.

innodb_flush_log_at_trx_commit=2

This speeds up inserts & updates dramatically by being a little bit lazy about flushing the innodb log buffer.  You can do more research yourself but for most environments this setting is recommended.

innodb_file_per_table

Innodb was developed like Oracle with the tablespace model for storage.  Apparently the kernel developers didn’t do a very good job.  That’s because the default setting to use a single tablespace turns out to be a performance bottleneck.  Contention for file descriptors and so forth.  This setting makes innodb create tablespace and underlying datafile for each table, just like MyISAM does.

Read this: Why a four letter word still divides dev and ops

Made it to the end eh?!?! Grab our newsletter.