Are startup CEOs hiding their scalability problems?


Join 27,000 others and follow Sean Hull on twitter @hullsean.

Your site is running fine, right? You have 1000 customers, and it usually runs smoothly. There’s just this one lingering question: why does it take five high performance EC2 instances to run the database, all on flash drives? Good question!

The truth is, one of the highest trafficked sites I managed pulled in 100 million uniques a month and used only three backend databases. That site was one of those wildly popular celebrity gossip sites, the ultimate guilty pleasure when you’re at the office and can’t watch reality TV!

Snickers aside, this is huge traffic. And all of the above was built on Drupal, with no ORM in the mix. It could even run, albeit noticeably slower, while memcache was disabled.

1. Servers with solid state drives

I’m very excited to see Amazon introduce servers with SSDs. They can bring you a 100x improvement in disk I/O, and that, my friends, is the be-all and end-all for databases. So why complain?

If you deploy on these boxes right out of the gate, it may be like using a crutch. You become dependent on it, and ignore real performance tuning. Solid state drives still won’t make up for the inefficient queries coming out of that ORM middleware you’re using.

Also: Do managers & CEOs underestimate operational costs?

2. Memcache saving your bad queries

Memcache is also a powerful tool. It sits between the database and your webservers, reducing load on the database by as much as 10x. That’s a great way to get better response time, and reduce drag on your db tier. But it’s still worth doing your performance tuning without it.

Why? If you can get your site to run without caching, it will run blazingly fast *with* it. Don’t use it as a crutch. Use it as rocket fuel for your well tuned site.

Read this: Do startups need techops?

3. A legion of read slaves

I’ve seen smaller sites using a ton of read slaves, all of it deployed to cover up slow & redundant queries pouring out of an ORM middleware layer, in this case CakePHP.

Again, read slaves are great, but tune & test with less hardware, and get the performance up the hard way. With elbow grease!
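To make "redundant queries" concrete, here is a rough sketch that collapses literal values so repeated query shapes group together. It assumes a hypothetical input of one SQL statement per line, and the function name `fingerprint_queries` is made up for illustration. It is deliberately crude (it will also mangle digits inside table names), and it is no substitute for a real digest tool.

```shell
#!/bin/bash
# Crude query fingerprinting (illustrative only).
# Input: one SQL statement per line on stdin (an assumed, simplified format).
# Output: query shapes with a count, most frequent first.
fingerprint_queries() {
  sed -E \
    -e "s/'[^']*'/?/g" \
    -e 's/[0-9]+/?/g' \
    -e 's/[[:space:]]+/ /g' |
  sort | uniq -c | sort -rn
}
```

Run it as `fingerprint_queries < queries.log | head`. If one shape shows up thousands of times per page view, more read slaves are probably hiding a problem rather than solving it.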

Related: Howto automate MySQL query analysis with Amazon RDS

4. Really really big memory

64G, 128G, 256G of main memory? If I wax on about the days when you’d get excited by 64k, I’ll sound like an old timer. But with those extreme limitations, you had to write tight code. Otherwise it just wouldn’t do anything.

The really, really big memory of today’s servers allows us to get lazy. I hear developers say “Hey, the database is 10G of data, and we have 64G main memory, so the whole thing will fit in memory. Problem solved!”

Duhhh… No. Why not? Because you still have to slice and dice that data. You still have to scan through for bits & pieces that aren’t indexed, then sort, and organize that into temporary memory space. In DBA speak, you’re still doing a ton of logical IOs.
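One way to see this on a MySQL box is to compare logical reads with physical reads. The sketch below assumes InnoDB and parses the output of `SHOW GLOBAL STATUS`, where `Innodb_buffer_pool_read_requests` counts logical reads and `Innodb_buffer_pool_reads` counts reads that had to go to disk. The function name `buffer_pool_summary` is hypothetical.

```shell
#!/bin/bash
# Summarize logical vs physical InnoDB reads.
# Expects whitespace-separated "Variable_name value" lines on stdin.
buffer_pool_summary() {
  awk '
    $1 == "Innodb_buffer_pool_read_requests" { logical = $2 }
    $1 == "Innodb_buffer_pool_reads"         { physical = $2 }
    END { printf "logical_reads=%s physical_reads=%s\n", logical, physical }'
}

# Typical use (assumes the mysql client can already connect):
#   mysql -N -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'" | buffer_pool_summary
```

Even when the whole dataset sits in memory and physical reads are close to zero, the logical read counter keeps climbing with every unindexed scan and sort.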

Picture it another way. Imagine the days when you’re on horseback, riding across the west. You travel light because, frankly, your horse can carry only so much. Then along come cars, and you start loading up the trunk. You add the kitchen sink, and the rear tires are sagging under the load. All seems fine until you hit a steep mountain, and your car is almost stalling at 20mph. If you had only carried the same load as you did on horseback, you’d be speeding across the country at lightning pace.

Read: Is Amazon RDS hard to manage?

5. Deploying poor code

Deadlines are looming, and new features must be deployed. So performance testing can wait until later. The code works after all.

Been there, done that. Code gets deployed and all of a sudden there are spikes in server load in the evening. Ops & DBA teams are screaming, “Who wrote this code?”

Load testing should be a part of everyday QA & test. It’s the only way to avoid growing scalability problems.
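As a sketch of what an everyday check could look like, the functions below compute a 95th percentile from latency samples and fail when it exceeds a budget. Everything here is an assumption for illustration: the names `p95` and `check_latency_budget`, the input format (one latency in milliseconds per line), and the idea that your load tool of choice can produce such a file.

```shell
#!/bin/bash
# Print the 95th percentile (nearest-rank method) of numeric samples on stdin.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 95 + 99) / 100)   # integer ceiling of 0.95 * NR
      print v[rank]
    }'
}

# Exit non-zero when the p95 of stdin exceeds the given budget (milliseconds).
check_latency_budget() {
  local budget_ms=$1 observed
  observed=$(p95) || return 2
  echo "p95=${observed}ms budget=${budget_ms}ms"
  awk -v o="$observed" -v b="$budget_ms" 'BEGIN { exit !(o <= b) }'
}
```

Something like `check_latency_budget 250 < latencies.txt` can then run in the same job as the rest of your tests, so a slow release fails before the ops team finds out in the evening.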

Check this: Are SQL databases dead?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest: Why I don’t work with recruiters

Howto automate MySQL slow query analysis with Amazon RDS


If you’ve used relational databases for more than ten minutes, I hope you’ve heard of slow queries. These are the pesky little gremlins that are slowing down your startup, and preventing the scalability you so desperately need.

Luckily there’s a solution. I’ve found that sending a report to developers every week keeps these issues front and center for folks who are very busy indeed.

The script below is for RDS, but you can surely modify it if you have a physical server or roll-your-own MySQL box on Amazon. Take a look & enjoy!


1. Install Percona tools

Percona, as many probably already know, is a wildly successful services firm that supports MySQL and related technologies. They also have a very popular & scalable MySQL distribution by the same name.

Even if you’re not using Percona MySQL, you definitely want to get ahold of the Percona Toolkit. It provides all sorts of useful tools, including the one this article is based on, pt-query-digest.

This tool takes your stock MySQL slow query logfile as input, and summarizes it into a very useful and readable report. Formerly mk-query-digest, it’s now called pt-query-digest. See below.
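Before handing a log to pt-query-digest, it can help to confirm the file actually contains slow query entries. The sketch below is not part of the toolkit. It is a hypothetical helper, `slowlog_quick_stats`, that relies only on the standard `# Query_time:` header line that MySQL writes for each slow query.

```shell
#!/bin/bash
# Count slow query entries and report the largest Query_time seen.
# Input: a MySQL slow query log on stdin.
slowlog_quick_stats() {
  awk '
    /^# Query_time:/ {
      n++
      t = $3 + 0            # third field is the query time in seconds
      if (t > max) max = t
    }
    END { printf "entries=%d max_query_time=%.2f\n", n, max }'
}
```

For example, `slowlog_quick_stats < /tmp/rds_slow.log` printing `entries=0` usually means the slow query log is not enabled or the wrong file was fetched, and the digest report will be empty too.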

You can install the percona tools easily by grabbing the repository file and installing that with rpm. From there you can just use yum or apt-get depending on your distribution.
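As a rough sketch, once the Percona repository is configured the install itself comes down to one command per distribution. The helper names below are made up, and the functions only print the command rather than run it. The package name `percona-toolkit` is the one Percona publishes; the rest is an assumption about your setup.

```shell
#!/bin/bash
# Print (not run) the install command for a given package manager.
toolkit_install_cmd() {
  case "$1" in
    yum)     echo "sudo yum install -y percona-toolkit" ;;
    apt-get) echo "sudo apt-get install -y percona-toolkit" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Report which of the two package managers is present on this machine.
detect_pkg_manager() {
  if command -v yum >/dev/null 2>&1; then
    echo yum
  elif command -v apt-get >/dev/null 2>&1; then
    echo apt-get
  else
    return 1
  fi
}
```

Calling `toolkit_install_cmd "$(detect_pkg_manager)"` prints the line to run. Afterwards, `pt-query-digest --version` should confirm the tool is on your path.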

Related: Why a killer title can make or break your content efforts

2. Install the AWS command line tool

Amazon has consolidated all its command line tools into a single one called just “aws”. The options can be a little arcane, and the error messages misleading besides. What’s good, though, is that it is slightly easier to install & configure.

Do you already use Python? Install it this way:


$ pip install awscli

If not, you’ll need to dig into the aws cli installation instructions further.

Also: Do managers underestimate operational costs?

3. Edit .aws/config

After you get the tool installed, you need to set up your environment. I edited a file named /home/shull/.aws/config as follows:


[default]
region = us-east-1
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

You can find the access_key_id and secret_access_key values on your Amazon dashboard. Click your name in the upper right-hand corner, then select the menu item “Security Credentials”.
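Since this file holds credentials, it is worth making sure only your own user can read it. Here is a minimal sketch, assuming you are creating the file from scratch. The function name `write_aws_config` and the placeholder key values are hypothetical, and it refuses to overwrite an existing file.

```shell
#!/bin/bash
# Create a skeleton AWS CLI config that only its owner can read.
# Usage: write_aws_config <directory> <region>
write_aws_config() {
  local dir=$1 region=$2
  if [ -e "$dir/config" ]; then
    echo "refusing to overwrite $dir/config" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  chmod 700 "$dir"
  (
    umask 077   # the new file is created with owner-only permissions
    cat > "$dir/config" <<EOF
[default]
region = $region
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
EOF
  )
}
```

For example, `write_aws_config "$HOME/.aws" us-east-1`, then paste in your real keys. Running `aws sts get-caller-identity` afterwards is a quick way to confirm the CLI can authenticate.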

Check out: Are SQL Databases Dead?

4. Edit send_query_report.sh

I wrote the script below so you can fairly easily edit it.


#!/bin/bash
#

# get the RDS DB instance ID from the command line (or crontab) entry
#
AWS_INSTANCE=$1
if [ -z "$AWS_INSTANCE" ]; then
  echo "usage: $0 <rds-instance-id>" >&2
  exit 1
fi

# here's where we'll store the latest slowquery.log
#
SLOWLOG=/tmp/rds_slow.log
#SLOWLOG=`/bin/ls -tr /home/shull/*.log | /usr/bin/tail -1`

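# fetch the slow query log from the RDS instance
# here I always grab the latest one.
# note: for a large log this may return only a portion of the file; the AWS CLI
# docs describe adding --starting-token 0 to download the whole thing.
#
/usr/local/bin/aws rds download-db-log-file-portion --db-instance-identifier "$AWS_INSTANCE" --output text --log-file-name slowquery/mysql-slowquery.log > "$SLOWLOG"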

# query report output
SLOWREPORT=/tmp/reportoutput.txt

# pt-query-digest location
MKQD=/usr/local/bin/pt-query-digest

# run the tool to get the analysis report
"$MKQD" "$SLOWLOG" > "$SLOWREPORT"

# today's date in a variable (month/day/year-hour:minute)
# the \% escaping is only needed inside a crontab entry, not in a script
TODAY=$(/bin/date +%m/%d/%Y-%H:%M)
#YESTERDAY=$(/bin/date -d "1 day ago" +%m/%d/%Y-%H:%M)

# report subject
SUBJECT="Sean Query Report -- $TODAY "

# recipient
EMAIL="hullsean@gmail.com"

# send an email using mailx
/usr/bin/mailx -s "$SUBJECT" "$EMAIL" < "$SLOWREPORT"

Note, if you don't have mailx installed, it should be available in your repository. Use apt-get or yum as necessary to get it installed.

Also: Is high availability overrated & near impossible to deliver?

5. Add to crontab

Don't forget to chmod +x the script to make it executable. :) After you've tested it from the command line, you will want to add it to a weekly cron job. The entry below runs it every Friday at 9:00 am. Voila, automation!


00 09 * * 5 /home/shull/send_query_report.sh seandb
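The five schedule fields are minute, hour, day of month, month and day of week, so `00 09 * * 5` means 09:00 every Friday. If you would rather not hand-edit the crontab, here is a sketch that appends the entry only when it is missing. The function name `with_cron_line` is hypothetical, and the line that actually installs the crontab is left commented out on purpose.

```shell
#!/bin/bash
CRON_LINE='00 09 * * 5 /home/shull/send_query_report.sh seandb'

# Read an existing crontab on stdin and print it back,
# with the given line appended unless it is already there.
with_cron_line() {
  local line=$1 current
  current=$(cat)
  if printf '%s\n' "$current" | grep -Fxq -- "$line"; then
    printf '%s\n' "$current"
  elif [ -z "$current" ]; then
    printf '%s\n' "$line"
  else
    printf '%s\n%s\n' "$current" "$line"
  fi
}

# To apply it (commented out so pasting this does not touch your crontab):
#   crontab -l 2>/dev/null | with_cron_line "$CRON_LINE" | crontab -
```

Because the function is idempotent, running it twice leaves a single entry. One thing to keep in mind: a literal `%` inside a crontab command has to be written as `\%`, which matters if you ever inline a `date` format there.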

Read: Are MySQL DBAs impossible to find?
