How to automate MySQL slow query analysis with Amazon RDS

If you’ve used relational databases for more than ten minutes, I hope you’ve heard of slow queries. They’re those pesky little gremlins that slow down your startup and prevent the scalability you so desperately need.

Luckily there’s a solution. I’ve found that sending developers a report every week keeps these issues front and center for folks who are very busy indeed.

The script below is for RDS, but you can surely modify it if you have a physical server or roll-your-own MySQL box on Amazon. Take a look & enjoy!

Join 26,000 others and follow Sean Hull on twitter @hullsean.

1. Install Percona Toolkit

Percona, as many probably already know, is a wildly successful services firm that supports MySQL and related technologies. They also have a very popular & scalable MySQL distribution by the same name.

Even if you’re not using Percona’s MySQL, you definitely want to get hold of Percona Toolkit. It provides all sorts of useful tools, including the one this article is based on, pt-query-digest.

This tool takes your stock MySQL slow query log file as input and summarizes it into a very useful and readable report. It was formerly called mk-query-digest; it’s now called pt-query-digest. See below.

You can install Percona Toolkit easily by grabbing Percona’s repository package and installing it with rpm (or dpkg on Debian/Ubuntu). From there you can just use yum or apt-get, depending on your distribution.
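
A rough sketch of that repository route follows. The package URLs are the ones Percona publishes at the time of writing and may change, so double-check their install docs; run and install_percona_toolkit are hypothetical helper names, and setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/bash
# Sketch: install percona-toolkit via Percona's repository package.
# Hypothetical helpers; set DRY_RUN=1 to print commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = package manager: "yum" (RHEL/CentOS/Amazon Linux) or "apt" (Debian/Ubuntu)
install_percona_toolkit() {
  case "$1" in
    yum)
      run sudo yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
      run sudo yum install -y percona-toolkit
      ;;
    apt)
      # CODENAME may be set by hand; otherwise ask lsb_release (e.g. "jammy")
      local codename="${CODENAME:-$(lsb_release -sc)}"
      local deb="percona-release_latest.${codename}_all.deb"
      run wget "https://repo.percona.com/apt/${deb}"
      run sudo dpkg -i "$deb"
      run sudo apt-get update
      run sudo apt-get install -y percona-toolkit
      ;;
    *)
      echo "usage: install_percona_toolkit yum|apt" >&2
      return 1
      ;;
  esac
}
```

For example, DRY_RUN=1 install_percona_toolkit yum prints the two yum commands without touching the system; drop DRY_RUN to actually run them.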

2. Install the aws command line tool

Amazon has consolidated all its command line tools into a single one called just “aws”. The options can be a little arcane, and the error messages misleading besides. The good news is that it’s slightly easier to install & configure.

Do you already use Python? Install it this way:


$ pip install awscli

If not, you’ll need to dig further into the AWS CLI installation instructions.

3. Edit .aws/config

After you get the tool installed, you need to set up your environment. I edited a file named /home/shull/.aws/config as follows:


[default]
region = us-east-1
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

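Replace those with your own access_key_id and secret_access_key, which you can find on your Amazon dashboard: click your name in the upper right-hand corner and select the menu item “Security Credentials”. (Running aws configure will also prompt you for these values and save them under ~/.aws for you.)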
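
Before wiring up the full script, it can help to confirm that the credentials work and that the instance is actually writing a slow log file. A minimal sketch, assuming the instance’s parameter group has slow_query_log set to 1 and log_output set to FILE (otherwise RDS writes slow queries to a table rather than a downloadable file); check_rds_slowlog is a hypothetical name.

```shell
#!/bin/bash
# Sketch: verify AWS credentials and list slow query log files on an RDS instance.
# Usage: check_rds_slowlog <db-instance-identifier>

check_rds_slowlog() {
  local instance="$1"
  if [ -z "$instance" ]; then
    echo "usage: check_rds_slowlog <db-instance-identifier>" >&2
    return 2
  fi
  # Fails fast if the configured keys are wrong or missing
  aws sts get-caller-identity --output text >/dev/null || return 1
  # Print the name of every log file whose name contains "slowquery"
  aws rds describe-db-log-files \
    --db-instance-identifier "$instance" \
    --filename-contains slowquery \
    --query 'DescribeDBLogFiles[].LogFileName' \
    --output text
}
```

If this prints nothing, the slow log probably isn’t enabled (or isn’t going to a file) yet, and the report script will have nothing to digest.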

4. Edit send_query_report.sh

I wrote the script below so that it’s fairly easy to adapt to your own setup.


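#!/bin/bash
#
# send_query_report.sh: fetch the RDS slow query log, summarize it with
# pt-query-digest, and email the report.
# Usage: send_query_report.sh <rds-db-instance-id>
# Absolute paths are used throughout because cron runs with a minimal PATH.

# get the rds db instanceID from command line (or crontab) entry
#
AWS_INSTANCE=$1
if [ -z "$AWS_INSTANCE" ]; then
  echo "usage: $0 <rds-db-instance-id>" >&2
  exit 1
fi

# here's where we'll store the latest slowquery.log
#
SLOWLOG=/tmp/rds_slow.log
#SLOWLOG=$(/bin/ls -tr /home/shull/*.log | /usr/bin/tail -1)

# fetch slow query log from rds box
# here I always grab the current mysql-slowquery.log;
# --starting-token 0 downloads the whole file rather than a single portion
#
/usr/local/bin/aws rds download-db-log-file-portion \
  --db-instance-identifier "$AWS_INSTANCE" \
  --log-file-name slowquery/mysql-slowquery.log \
  --starting-token 0 \
  --output text > "$SLOWLOG" || exit 1

# query report output
SLOWREPORT=/tmp/reportoutput.txt

# pt-query-digest location (package installs usually put it in /usr/bin)
MKQD=/usr/local/bin/pt-query-digest

# run the tool to get analysis report
"$MKQD" "$SLOWLOG" > "$SLOWREPORT" || exit 1

# today's date and time (hours:minutes) in a variable
TODAY=$(/bin/date +%m/%d/%Y-%H:%M)
#YESTERDAY=$(/bin/date -d "1 day ago" +%m/%d/%Y-%H:%M)

# report subject
SUBJECT="Sean Query Report -- $TODAY"

# recipient
EMAIL="hullsean@gmail.com"

# send an email using mailx
/usr/bin/mailx -s "$SUBJECT" "$EMAIL" < "$SLOWREPORT"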

Note: if you don't have mailx installed, it should be available in your distribution's repository. Use apt-get or yum as necessary to install it.
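
One caveat: RDS rotates its log files and keeps only a limited window of them, so the single current mysql-slowquery.log may cover far less than a week. Below is a sketch of a variation that downloads every slow log file RDS still retains and concatenates them for pt-query-digest. fetch_all_slow_logs is a hypothetical helper; the --starting-token 0 option is what the AWS CLI documentation gives for downloading an entire log file.

```shell
#!/bin/bash
# Sketch: download every slow query log file RDS still retains for an
# instance and concatenate them into one file for pt-query-digest.
# Usage: fetch_all_slow_logs <db-instance-identifier> <output-file>

fetch_all_slow_logs() {
  local instance="$1" out="$2" f
  : > "$out"   # start with an empty output file
  for f in $(aws rds describe-db-log-files \
               --db-instance-identifier "$instance" \
               --filename-contains slowquery \
               --query 'DescribeDBLogFiles[].LogFileName' \
               --output text); do
    # --starting-token 0 asks the CLI for the whole file, not a single portion
    aws rds download-db-log-file-portion \
      --db-instance-identifier "$instance" \
      --log-file-name "$f" \
      --starting-token 0 \
      --output text >> "$out" || return 1
  done
}
```

In the script above, this could stand in for the single download command, e.g. fetch_all_slow_logs "$AWS_INSTANCE" "$SLOWLOG". If you want a true week of data, you may also need to collect more often than weekly.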

5. Add to crontab

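After you've tested the script from the command line, add it as a weekly cron job with crontab -e. The entry below runs it every Friday at 09:00, passing the RDS instance identifier (seandb) as its argument. Voila, automation! Don't forget to chmod +x the script to make it executable. 🙂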


00 09 * * 5 /home/shull/send_query_report.sh seandb
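
If you'd rather script the cron setup than open an editor, the common idiom is to pipe the existing crontab plus the new line back into crontab -. A sketch; add_cron_line is a hypothetical helper that skips the entry when it's already present, so running it twice is harmless.

```shell
#!/bin/bash
# Sketch: append a line to the current user's crontab unless it's already there.
# Usage: add_cron_line "00 09 * * 5 /home/shull/send_query_report.sh seandb"

add_cron_line() {
  local line="$1" current
  current="$(crontab -l 2>/dev/null)"   # empty if no crontab exists yet
  if printf '%s\n' "$current" | grep -qxF -- "$line"; then
    return 0   # exact line already installed
  fi
  # Re-emit the existing entries (if any), then the new line, and install the result
  { [ -n "$current" ] && printf '%s\n' "$current"; printf '%s\n' "$line"; } | crontab -
}
```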

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don't work with recruiters

Scalability Tips & Greatest Hits

In the past two years we’ve written a ton of material on scalability. Here are the greatest hits…

Why Generalists Are Better at Scaling the Web

The internet stack is a complex infrastructure of interlocking components. A scalability engineer must be adept at Linux, plus webservers, caching servers, search servers, automation services, and relational databases on the backend. We think a generalist with a broad base of experience is most suited to the job of scalability engineer.

5 Things Toxic to Scalability

ORMs should keep you up at night, but so should coupled and locking processes, a single copy of your database, missing metrics and no deployment feature flags.

5 More Things Deadly to Scalability

In this follow-up to the original, we touch on disk I/O, RAID, queuing in the database (a no-no), full-text searching, insufficient or missing caching, and lastly the dreaded technical debt.

Scalability Happiness

A Zen monk might ask what is the sound of one hand clapping? That’s the sound your servers will be making when you apply this one simple principle.

5 Ways to Boost MySQL Scalability

Deploying MySQL as your web-facing database? Here are a few key tips to boost speed & performance.

3 Ways To Boost Cloud Scalability

Building your startup in the Amazon Web Services cloud? There are 3 things you absolutely must do.

Why Your Cloud Is Speeding for a Scalability Cliff

The cloud may seem like the obvious place to build new applications & infrastructure, but there is a precipice hidden from sight…

Scalability Happiness – A Quiet Query Log

There’s a lot of talk on the web about scalability. Making web applications scale is not easy. The modern web architecture has so many moving parts. How can we grapple with the underlying problem?

The LAMP stack scales well

That’s a truth that is half right. True, there are a lot of moving parts, and a lot to set up. But the internet stack made up of Linux, Apache, MySQL & PHP, LAMP as it’s called, was built to be resilient, dynamic, and scalable. It’s essentially why Amazon works, and why what they’re doing is possible. Windows & .NET, for example, don’t scale well. Strange to see Oracle mating with them, but I digress…

[quote]
Linux, and the LAMP stack built on top of it, are highly scalable and dynamic to begin with.
[/quote]

OK, so what’s this got to do with MySQL? Well, a LOT.

The webserver tier, the caching layers like memcache & varnish, and the Solr search tier all scale fairly easily because their assets are fixed, or almost so.

The database tier is different. So what affects the performance of a database server? Server size? Main memory? Disk speed? The truth is all of those. But after you set up the server, configure memory settings and so forth, it’s a fairly fixed object. True, there are parameters to tweak, but on the whole there isn’t a ton of day-to-day tuning to do.

Well, if that’s true, why does performance take a hit?! As applications grow and the db server slows down, don’t we need to tweak server settings? Do we need new hardware?

The answer is possibly, but 9 times out of 10 what really needs to happen is query tuning.

[quote]
In 17 years of consulting, untuned queries have been the single largest cause of scalability problems I’ve seen. Fix those queries and your problems are over.
[/quote]

The Elephant in the Room – Query Tuning

I was talking with a colleague at AppNexus today. He asked: should we do some of that work inside the application, instead of running a huge UNION or a large JOIN? I said yes, you can move work into the application, but it makes the application more complex. On the flip side, the webserver tier is easier to scale. So there are tradeoffs.

I said this:

[quote]
By and large, if scalability is our goal, we should work to quiet the activity in the slow query log. This is an active project for developers & DBAs. Keep it quiet and your server will run well.
[/quote]

Yet I still talk to teams for whom this is a mystery. It’s unclear, and there’s no conviction there. That’s where I think DBAs are failing, because this is our subject matter expertise; if we haven’t convinced developer teams of it, we’re not working together enough. API teams aren’t separate from DBA and operations. Siloing technology departments is a killer…

As you roll out new code, any queries that show up in the slow query log need attention. Tweak the code until those queries drop out. This is the primary project of scalability.
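
One rough way to tell whether the log is getting quieter from release to release is to count its entries and add up their query time. A minimal sketch, assuming the standard MySQL slow log format, where each entry carries a “# Query_time: …” header line; slow_log_stats is a hypothetical helper, and pt-query-digest’s report gives a far richer breakdown.

```shell
#!/bin/bash
# Sketch: summarize a MySQL slow query log as "<count> queries, <seconds> s total".
# Each entry in the standard format has a header line such as:
#   # Query_time: 2.000123  Lock_time: 0.000050 Rows_sent: 1  Rows_examined: 100000
slow_log_stats() {
  awk '/^# Query_time:/ { n++; total += $3 }
       END { printf "%d queries, %.1f s total\n", n, total }' "$1"
}
```

Running it against each week’s downloaded log and keeping the one-line results gives a simple trend to watch as new code ships.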

When should I think about upgrading hardware?

If your code is stable but you’re seeing the server’s load average rise steadily, *THEN* go up in hardware. A rising load average means CPU & disk are being taxed; the server can’t keep up.
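
To see that trend rather than guess at it, you need load average recorded over time. On a self-managed Linux box, something like the sketch below could be run periodically with its output appended to a log, alongside a timestamp. Dividing by the core count is a common rule of thumb (sustained values near or above 1.0 per core suggest saturation), not a hard threshold. On RDS there’s no shell on the database host, so you’d watch CloudWatch metrics such as CPUUtilization instead. load_per_core is a hypothetical name.

```shell
#!/bin/bash
# Sketch: print the 1-minute load average divided by the number of CPU cores.
# Usage: load_per_core                 # reads /proc/loadavg and nproc (Linux)
#        load_per_core <load> <cores>  # explicit values, handy for testing
load_per_core() {
  local load="${1:-$(cut -d ' ' -f 1 /proc/loadavg)}"
  local cores="${2:-$(nproc)}"
  awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}
```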

Devops means work together!

I’ll close with a final point. Devops means bringing dev & ops together! Don’t silo them off in different wings. Communicate. DBAs, it’s your job to educate developers about scalability and help with query tuning. Devs, profile new SQL code, test with large datasets, and for god’s sake don’t use an ORM; it’s one of the 5 things toxic to scalability. Run EXPLAIN and be sure to index all the right columns.
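
Running EXPLAIN on a suspect query from the shell can be as simple as the sketch below. It assumes the mysql command-line client is installed and that connection details (host, user, password) come from ~/.my.cnf; explain_query is a hypothetical helper, and you’d typically paste in a query taken from the pt-query-digest report. In the output, a type of ALL (a full table scan) and a NULL key column are both signs an index may be missing.

```shell
#!/bin/bash
# Sketch: run EXPLAIN on a query against a given database using the mysql client.
# Usage: explain_query <database> "<SELECT statement>"
explain_query() {
  local db="$1" query="$2"
  if [ -z "$db" ] || [ -z "$query" ]; then
    echo "usage: explain_query <database> \"<query>\"" >&2
    return 2
  fi
  # --table prints the familiar boxed output even when not on a terminal
  mysql --table -e "EXPLAIN ${query}" "$db"
}
```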

Together we can tackle this scalability thing!

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample