Scalability Happiness – A Quiet Query Log

Peter Van Allen - Pin Drop

Join 7500 others and follow Sean Hull on twitter @hullsean.

There’s a lot of talk on the web about scalability. Making web applications scale is not easy. The modern web architecture has so many moving parts. How can we grapple with the underlying problem?

Also: Why Are MySQL DBAs So Hard to Find?

The LAMP stack scales well

The truth that is half right. True there are a lot of moving parts, and a lot to setup. The internet stack made up of Linux, Apache, MySQL & PHP. LAMP as it’s called, was built to be resilient, dynamic, and scalable. It’s essentially why Amazon works. Why what they’re doing is possible. Windows & .NET for example don’t scale well. Strange to see Oracle mating with them, but I digress…

Linux and LAMP that is built on top of it, are highly scalable and dynamic to begin with.

Also: AirBNB Didn’t Have to Fail During an AWS Outage

Ok, so what’s this got to do with MySQL? Well a LOT.

The webserver tier, the caching layers like memcache & varnish, as well as the search tier solr. These all scale fairly easily because their assets are fixed. Or almost so.

The database tier is different. So what affects performance of a database server? Server size? Main memory? Disk speed? The truth is all of those. But

Also check out: The Sexiest New Feature of AWS Speeds Up EBS

After you setup the server – set memory settings and so forth, it’s a fairly fixed object. True there are parameters to tweak but on the whole there isn’t a ton of day-to-day tuning to do.

Well if that’s true, why does performance take a hit?! As applications grow, the db server slows down, don’t we need to tweak server settings? Do we need new hardware?

Read this: A CTO Must Never Do This

The answer is possibly, but 9 times out of 10 what really needs to happen is queries must be tuned.

In 17 years of consulting that is the single largest cause of scalability problems. Fix those queries and your problems are over.

The Elephant in the Room – Query Tuning

I was talking with a colleague today at AppNexus. He said, so should we do some of that work inside the application, instead of doing a huge UNION or a large JOIN? I said yes you can move work onto the application, but it makes the application more complex. On the flip side the webserver tier is easier to scale. So there are tradeoffs.

I said this:

By and large, if scalability is our goal, we should work to quiet the activity in the slow query log. This is an active project for developers & DBAs. Keep it quiet and your server will run well.

Also: Top MySQL DBA Interview Questions for Candidates, Hiring Managers & Recruiters

Yet I still talk to teams where this is mysterious. It’s unclear. There’s no conviction there. And that’s where I think DBAs are failing. Because this is our subject matter expertise, and if we haven’t convinced developer teams of this, we’re not working together enough. API teams aren’t separate from DBA and operations. Siloing technology departments is a killer…


As you roll out new code, if some queries show up, then those need attention. Tweak the code until the queries drop out. This is the primary project of scalability.

When should I think about upgrading hardware?

If your code is stable, but you’re seeing a steady line rising on load average of the server, *THEN* go up in hardware. Load average means cpu & disk are being taxed. The server can’t keep up.

Related: Should I use RDS or build a MySQL server on AWS?

Devops means work together!

I close with a final point. Devops means bring dev & ops together! Don’t silo them off in different wings. Communicate. DBAs it’s your job to educate Developers about scalability and help with query tuning. Devs, profile new SQL code, test with large datasets & for god sakes don’t use an ORM – it’s one of 5 things toxic to scalability. Run explain and be sure to index all the right columns.

Together we can tackle this scalability thing!

Get some in your inbox: Exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

Relational Database – What is it and why is it important?

A relational database is the warehouse of your data.  Your crown jewels.  It’s your excel spreadsheet or filing cabinet writ large.  You use them everyday and may not know it.  Your smartphone stores it’s contacts in a relational database, most likely sqlite – the ever present but ever invisible embedded platform.  Your online bank at Citibank or Chase stores all your financial history, statements, contact info, personal data and so forth, all in a relational database.

Relational databases are:

  • organized around records
  • data points are columns in a table
  • relationships are enforced with constraints
  • indexing data brings hi-speed access
  • SQL is used to get data in and out of the db
  • triggers, views, stored procs & materialized views may also be supported

Need help? Check out my pricing page and also How daily notes are helping me work better with clients

Like excel, relational databases are organized around records.  A record is like a 3×5 card with a number of different data points on it.  Say you have 3×5 cards for your addressbook.  Each card holds one address, phone number, email, picture, notes and so forth.  By organizing things nicely on cards, and for each card predictable fields such as first name, last name, birthday etc, you can then search on those data points.  Want all the people in your addressbook with birthday of July 5th, no problem.

Related: What is a database migration & why is it important?

While relational databases have great advantages, they require a lot of work to get all of your information into neatly organized files.  What’s more the method for getting things into and out of them – SQL is a quirky and not very friendly language.  What’s more relational databases have trouble clustering, and scaling horizontally.  NOSql database have made some headway in these departments, but at costs to consistency and reliability of data.

Read: 5 Ways to get your data to Amazon Redshift warehouse database

As servers continue to get larger, it becomes rarer that a single web-facing database really needs more than one single server.  If it’s tuned right, that is.  Going forward and looking to the future, the landscape will probably continue to be populated by a mix of traditional relational databases, new nosql type databases, key-value stores, and other new technologies yet to be dreamed up.

Also: Are SQL databases dead?

Sean Hull asks on Quora – What is an rdbms and why are they important?

NOSQL Database – What is it and why is it important?

NOSQL is a sort of all-encompassing term which includes very simple key/value databases like Memcache, along with more sophisticated non-relational databases such as Mongodb and Cassandra.

Relational databases have been around since the 70’s so they’re a very mature technology.  In general they support transactions allowing you to make changes to your data in discrete, controlled manner, they support constraints such as uniqueness, primary and foreign keys, and check constraints.  And furthermore they use SQL or so-called Simplified Query Language to access ie fetch data, and also modify data by inserting, updating or deleting records.

SQL though is by no means simple, and developers over the years have taken a disliking to it like the plague.  For good reason.  Furthermore RDBMS’ aka relational database management systems, don’t horizontally scale well at all.  To some degree you can get read-only scalability with replication, but with a lot of challenges.  But write-based scaling has been much tougher a problem to solve.  Even Oracle’s RAC (formerly Parallel Server) also known as Real Application Clusters, faces a lot of challenges keeping it’s internal caches in sync over special data interconnects.  The fact is changes to your data – whether it’s on your iphone, desktop addressbook or office directory, those changes take time to propagate to various systems.  Until that data is propagated, you’re looking at stale data.

Enter NOSQL databases like MongoDB which attempt to address some of these concerns.  For starters data is not read/written to the database using the old SQL language, but rather using an object-oriented method which is developers find very convenient and intuitive.  What’s more it supports a lot of different type of indexing for fast lookups of specific data later.

But NOSQL databases don’t just win fans among the development side of the house, but with Operations too, as it scales very well.  MongoDB for instance has clustering built-in, and promises an “eventually consistent” model to work against.

To be sure a lot of high-profile companies are using NOSQL databases, but in general they are in use for very specific needs.  What’s more it remains to be seen whether or not many of those databases as they grow in size, and the needs for which they are put stretch across more general applications, if they won’t need to be migrated to more traditional relational datastores later.

Sean Hull asks on Quora – What is NOSQL and why is it important?