Tag Archives: performance tuning

5 Ways to Boost MySQL Scalability

There are a lot of scalability challenges we see with clients over and over. The list could easily include 20, 50 or even 100 items, but we shortened it down to the biggest five issues we see.

1. Tune those queries

By far the biggest bang for your buck is query optimization. Queries can be functionally correct and meet business requirements without being stress tested for high traffic and high load. This is why we often see clients with growing pains, and scalability challenges as their site becomes more popular. This also makes sense. It wouldn’t necessarily be a good use of time to tune a query for some page off in a remote corner of your site, that didn’t receive real-world traffic. So some amount of reactive tuning is common and appropriate.

Enable the slow query log and watch it. Use mk-query-digest, the great tool from Maatkit to analyze the log. Also make sure the log_queries_not_using_indexes flag is set.  Once you’ve found a heavy resource intensive query, optimize it!  Use the EXPLAIN facility, use a profiler, look at index usage and create missing indexes, and understand how it is joining and/or sorting.

Also: Why generalists are better at scaling the web

2. Employ Master-Master Replication

Master-master active-passive replication, otherwise known as circular replication, can be a boon for high availability, but also for scalability.  That’s because you immediately have a read-only slave for your application to hit as well.  Many web applications exhibit an 80/20 split, where 80% of activity is read or SELECT and the remainder is INSERT and UPDATE.  Configure your application to send read traffic to the slave or rearchitect so this is possible.  This type of horizontal scalability can then be extended further, adding additional read-only slaves to the infrastructure as necessary.

If you’re setting up replication for the first time, we recommend you do it using hotbackups. Here’s how.

Keep in mind MySQL’s replication has a tendency to drift, often silently from the master. Data can really get out of sync without throwing errors! Be sure to bulletproof your setup with checksums.

Related: Why you can’t find a MySQL DBA

3. Use Your Memory

It sounds very basic and straightforward, yet there are often details overlooked.  At minimum be sure to set these:

  • innodb_buffer_pool_size
  • key_buffer_size (MyISAM index caching)
  • query_cache_size – though beware of issues on large SMP boxes
  • thread_cache & table_cache
  • innodb_log_file_size & innodb_log_buffer_size
  • sort_buffer_size, join_buffer_size, read_buffer_size, read_rnd_buffer_size
  • tmp_table_size & max_heap_table_size

Read: Why Twitter made a shocking admission about their data centers in the IPO

4. RAID Your Disk I/O

What is underneath your database?  You don’t know?  Well please find out!  Are you using RAID 5?  This is a big performance hit.  RAID5 is slow for inserts and updates.  It is also almost non-functional during a rebuild if you lose a disk.  Very very slow performance.  What should I use instead?  RAID 10 mirroring and striping, with as many disks as you can fit in your server or raid cabinet.  A database does a lot of disk I/O even if you have enough memory to hold the entire database.  Why?  Sorting requires rearranging rows, as does group by, joins, and so forth.  Plus the transaction log is disk I/O as well!

Are you running on EC2?  In that case EBS is already fault tolerant and redundant.  So give your performance a boost by striping-only across a number of EBS volumes using the Linux md software raid.

Also checkout our Intro to EC2 Cloud Deployments.

Also of interest autoscaling MySQL on EC2.

Also: Why startups are trying to do without techops and failing

5. Tune Key Parameters

These additional parameters can also help a lot with performance.

innodb_flush_log_at_trx_commit=2

This speeds up inserts & updates dramatically by being a little bit lazy about flushing the innodb log buffer.  You can do more research yourself but for most environments this setting is recommended.

innodb_file_per_table

Innodb was developed like Oracle with the tablespace model for storage.  Apparently the kernel developers didn’t do a very good job.  That’s because the default setting to use a single tablespace turns out to be a performance bottleneck.  Contention for file descriptors and so forth.  This setting makes innodb create tablespace and underlying datafile for each table, just like MyISAM does.

Read this: Why a four letter word still divides dev and ops

Made it to the end eh?!?! Grab our newsletter.

Data warehousing – What is it and why is it important?

A data warehouse is a special type of database.  It is used to store large amounts of data, such as analytics, historical, or customer data, and then build large reports and data mining against it.  It is markedly different from a web-facing or high-transaction database, which typically has many many small transactions or pieces of data that are constantly changing, through many 100’s or 1000’s or small user sessions.  These typically execute in speeds on the order of 1/100th of a second, while in data warehouse you have fewer large queries which can take minutes to execute.

Data warehouses are tuned for updates happening in bulk via batch jobs, and for large queries which need big chunks of memory to sort and cross-tabulate data from different tables.  Often full table scans are required because of the specialized one-off nature of these reports.  The same queries are not executed over and over.

It’s important not to mix data warehousing databases with transactional databases in the same instance, whether you are dealing with MySQL or Oracle.  That’s because they are tuned totally differently.  It would be like trying to use the same engine for commuting to work, and a container ship traveling around the world.  Different jobs require different databases or databases that with their dials set for different uses.

Quora discussion of data warehousing – Sean Hull

Query Profiling – What is it and why is it important?

Queries are so-named because of the lovely language we call SQL – simplified query language.  That’s a bit of sarcasm on my part, I’ve never found it particularly simple or elegant.  Profiling them involves finding how they are spending their time, and what they are asking the server to do.   In this way you can make them faster, and improve performance of the whole server, and thus your website.

At any rate queries ask the database for information.  If they are on the simpler side, something like give me all the users whose name starts with “hu” for example, and last name is indexed, that will run very fast.  The database will lookup in the index the last name field, and find the subset of ones starting with those letters, then go lookup the records by id and return them to you.  Cache the index blocks, cache the data blocks.  Great!  However, say those are customers, and you want their calling cellphone calling history.  Ok, now you have to join on another table, matching by some key, hold all those records in memory, shuffle them around, and so on.

So queries are effectively little jobs or bits of work you ask your database server to perform on your behalf.  With websites you typically have hundreds of concurrently running sessions all sending their own little SQL jobs to the server to get processed, and records returned.  And blip in the radar slows everyone down, so you want them to all run quickly.

That’s where profiling comes in.  MySQL, Oracle, and SQL Server alike all have EXPLAIN type facilities for showing the plan with which the database will use to fetch your data.  It shows indexes, sorting, joins, order, and so forth.  All of this contributes to the overall execution time and resources used on the database server.

Quora discussion by Sean Hull – What is query profiling and why is it important?

Website Optimization – What is it and why is it important?

When you enter a website name in your browser or click on a google result, you start a cascade of events to unfold.  Your request various pieces and components that make up the webpage from a remote server which hosts that website.  Those pieces are sent back to you, and your browser assembles them.

There are many moving parts in that process.  Anywhere along the way you can hit a snag, slowing down the overall process of that page displaying.  Website Optimization attempts to identify all of those processes and components, and organize them by slowest to fastest.  This allows us to focus our attention on the slowest part of the process.  Like a physician looking at your vascular system, it allows the performance expert to identify and then fix those pipes that are slowing you down.

Since website performance has been shown to directly influence customer retention, conversion, and user experience, overall website performance and optimization are key to your business success.

Sean Hull asks on Quora: What is website optimization and why is it important?

MySQL Optimization

Experiencing performance problems with your MySQL 4.x or 5.x database? We can help. We have over a decade of experience tuning sophisticated multi-tier web applications. This allows us to identify and pinpoint problems quickly, drilldown to the relevant application and SQL, and sift out the problem code.

All this makes your applications run faster, and your customers happy!

call_quote