Tag Archives: rdbms

Scalability Happiness – A Quiet Query Log

Peter Van Allen - Pin Drop

Join 7500 others and follow Sean Hull on twitter @hullsean.

There’s a lot of talk on the web about scalability. Making web applications scale is not easy. The modern web architecture has so many moving parts. How can we grapple with the underlying problem?

The LAMP stack scales well

That is half right. True, there are a lot of moving parts and a lot to set up. But the internet stack made up of Linux, Apache, MySQL & PHP, or LAMP as it’s called, was built to be resilient, dynamic, and scalable. It’s essentially why Amazon works, and why what they’re doing is possible. Windows & .NET, for example, don’t scale well. Strange to see Oracle mating with them, but I digress…

[quote]
Linux, and the LAMP stack built on top of it, are highly scalable and dynamic to begin with.
[/quote]

Ok, so what’s this got to do with MySQL? Well, a LOT.

The webserver tier, the caching layers like memcache & varnish, and the search tier (Solr) all scale fairly easily because their assets are fixed. Or almost so.

The database tier is different. So what affects performance of a database server? Server size? Main memory? Disk speed? The truth is all of those. But after you set up the server and configure memory settings and so forth, it’s a fairly fixed object. True, there are parameters to tweak, but on the whole there isn’t a ton of day-to-day tuning to do.

Well, if that’s true, why does performance take a hit?! As applications grow, the db server slows down. Don’t we need to tweak server settings? Do we need new hardware?

The answer is possibly, but 9 times out of 10 what really needs to happen is that queries get tuned.

[quote]
In 17 years of consulting, untuned queries have been the single largest cause of scalability problems. Fix those queries and your problems are over.
[/quote]

The Elephant in the Room – Query Tuning

I was talking with a colleague today at AppNexus. He asked: should we do some of that work inside the application, instead of doing a huge UNION or a large JOIN? I said yes, you can move work into the application, but it makes the application more complex. On the flip side, the webserver tier is easier to scale. So there are tradeoffs.

I said this:

[quote]
By and large, if scalability is our goal, we should work to quiet the activity in the slow query log. This is an active project for developers & DBAs. Keep it quiet and your server will run well.
[/quote]
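
To make "quiet the slow query log" concrete, here is a minimal Python sketch that groups log entries by query shape and totals the time spent in each. It assumes the usual MySQL slow query log layout (a `# Query_time:` header line followed by a single-line statement). The sample log text, the `fingerprint` helper and its normalization rules are made up for illustration; in practice a tool such as mysqldumpslow or pt-query-digest does this job.

```python
import re
from collections import defaultdict

# Hypothetical excerpt in the usual slow query log layout.
SAMPLE_LOG = """\
# Time: 2013-04-02T10:15:01
# User@Host: app[app] @ web1 [10.0.0.5]
# Query_time: 2.500000  Lock_time: 0.000100 Rows_sent: 1  Rows_examined: 900000
SET timestamp=1364897701;
SELECT * FROM orders WHERE customer_id = 42;
# Time: 2013-04-02T10:15:09
# User@Host: app[app] @ web2 [10.0.0.6]
# Query_time: 3.500000  Lock_time: 0.000200 Rows_sent: 1  Rows_examined: 900000
SET timestamp=1364897709;
SELECT * FROM orders WHERE customer_id = 77;
# Time: 2013-04-02T10:16:30
# User@Host: app[app] @ web1 [10.0.0.5]
# Query_time: 1.200000  Lock_time: 0.000000 Rows_sent: 20  Rows_examined: 50000
SET timestamp=1364897790;
SELECT name FROM users WHERE email = 'a@example.com';
"""

QUERY_TIME = re.compile(r"^# Query_time: (\d+(?:\.\d+)?)")


def fingerprint(sql):
    """Collapse literals so queries that differ only in values group together."""
    sql = re.sub(r"'[^']*'", "?", sql)   # quoted strings -> ?
    sql = re.sub(r"\b\d+\b", "?", sql)   # bare numbers -> ?
    return re.sub(r"\s+", " ", sql).strip().rstrip(";")


def summarize(log_text):
    """Return {query shape: (count, total seconds)}; single-line statements only."""
    totals = defaultdict(lambda: [0, 0.0])
    current_time = None
    for line in log_text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            current_time = float(match.group(1))
        elif line.startswith("#") or line.upper().startswith("SET TIMESTAMP"):
            continue  # other header lines carry nothing we need here
        elif line.strip() and current_time is not None:
            entry = totals[fingerprint(line)]
            entry[0] += 1
            entry[1] += current_time
            current_time = None
    return {shape: (count, seconds) for shape, (count, seconds) in totals.items()}


summary = summarize(SAMPLE_LOG)
# Worst offenders first: these are the queries to tune.
for shape, (count, seconds) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{seconds:6.2f}s  x{count}  {shape}")
```

The shapes at the top of that list are the ones to tune first. When the list is empty, the log is quiet.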

Yet I still talk to teams where this is mysterious. It’s unclear. There’s no conviction there. And that’s where I think DBAs are failing. Because this is our subject matter expertise, and if we haven’t convinced developer teams of this, we’re not working together enough. API teams aren’t separate from DBA and operations. Siloing technology departments is a killer…

As you roll out new code, if new queries show up in the slow query log, they need attention. Tweak the code until those queries drop out. This is the primary project of scalability.

When should I think about upgrading hardware?

If your code is stable, but you’re seeing the server’s load average rise steadily, *THEN* go up in hardware. A rising load average means CPU & disk are being taxed and the server can’t keep up.
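
As a rough illustration of "a steady line rising", the Python sketch below fits a least-squares slope to a series of load average samples and flags the series when the slope passes a threshold. The sample numbers and the `min_slope` threshold are assumptions for illustration only. In real life you would feed it history from your monitoring system.

```python
import os


def slope(samples):
    """Least-squares slope of evenly spaced samples (load units per sample)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def is_steadily_rising(samples, min_slope=0.05):
    """True when the fitted trend climbs by at least min_slope per sample."""
    return slope(samples) >= min_slope


# Hypothetical daily peak load averages for two servers.
flat = [2.1, 1.9, 2.0, 2.2, 2.0, 1.9, 2.1]
rising = [2.0, 2.3, 2.5, 2.9, 3.1, 3.4, 3.8]

print("flat server rising?  ", is_steadily_rising(flat))
print("busy server rising?  ", is_steadily_rising(rising))

# A single live reading, where the platform provides one (Unix-like systems).
if hasattr(os, "getloadavg"):
    try:
        one, five, fifteen = os.getloadavg()
        print(f"current load: {one:.2f} {five:.2f} {fifteen:.2f}")
    except OSError:
        print("load average not available")
```

A trend like the second one, with no new code shipping, is the signal to look at bigger hardware.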

Devops means work together!

I close with a final point. Devops means bringing dev & ops together! Don’t silo them off in different wings. Communicate. DBAs, it’s your job to educate developers about scalability and help with query tuning. Devs, profile new SQL code, test with large datasets & for God’s sake don’t use an ORM. It’s one of 5 things toxic to scalability. Run EXPLAIN and be sure to index all the right columns.
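
Here is a small sketch of what "run explain and index the right columns" looks like in practice. It uses Python’s built-in sqlite3 module and SQLite’s `EXPLAIN QUERY PLAN` as a stand-in for MySQL’s `EXPLAIN`, so the output format differs from what MySQL shows. The `users` table, the `idx_users_email` index and the `plan` helper are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"


def plan(connection, sql, params):
    """Return the planner's description of how it will run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# Without an index the planner has to scan the whole table.
before = plan(conn, QUERY, ("user500@example.com",))

# Index the column used in the WHERE clause, then ask again.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, QUERY, ("user500@example.com",))

print("before:", before)
print("after: ", after)
```

The habit is the same on any database: check the plan for a full scan, add the index, then check the plan again.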

Together we can tackle this scalability thing!

Relational Database – What is it and why is it important?

A relational database is the warehouse of your data. Your crown jewels. It’s your Excel spreadsheet or filing cabinet writ large. You use them every day and may not know it. Your smartphone stores its contact database in a relational database, most likely SQLite, the ever-present but ever-invisible embedded database platform. Your online bank at Citibank or Chase stores all your financial history, statements, contact info, personal data and so forth, all in a relational database.

  • organized around records
  • data points are columns in a table
  • relationships are enforced with constraints
  • indexing data brings high-speed access
  • SQL is used to get data in and out of the db
  • triggers, views, stored procs & materialized views may also be supported

Like Excel, relational databases are organized around records. A record is like a 3×5 card with a number of different data points on it. Say you have 3×5 cards for your address book. Each card holds one address, phone number, email, picture, notes and so forth. By organizing things nicely on cards, with predictable fields on each card such as first name, last name, birthday etc., you can then search on those data points. Want all the people in your address book with a birthday of July 5th? No problem.
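
The July 5th search might look like the following in practice. This is a minimal sketch using Python’s built-in sqlite3 module. The `contacts` table, its columns and the sample people are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per index card, one column per data point on the card.
conn.execute("""
    CREATE TABLE contacts (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        birthday   TEXT            -- ISO date, e.g. 1980-07-05
    )
""")
conn.executemany(
    "INSERT INTO contacts (first_name, last_name, birthday) VALUES (?, ?, ?)",
    [
        ("Ada", "Lopez", "1980-07-05"),
        ("Ben", "Okafor", "1975-12-01"),
        ("Cleo", "Marsh", "1992-07-05"),
    ],
)

# Everyone with a birthday on July 5th, whatever the year.
july_fifth = conn.execute(
    """
    SELECT first_name, last_name
    FROM contacts
    WHERE strftime('%m-%d', birthday) = '07-05'
    ORDER BY last_name
    """
).fetchall()

for first, last in july_fifth:
    print(first, last)
```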

While relational databases have great advantages, they require a lot of work to get all of your information into neatly organized files. What’s more, SQL, the method for getting things into and out of them, is a quirky and not very friendly language. Relational databases also have trouble clustering and scaling horizontally. NoSQL databases have made some headway in these departments, but at a cost to consistency and reliability of data.

As servers continue to get larger, it becomes rarer that a single web-facing database really needs more than one server. If it’s tuned right, that is. Looking to the future, the landscape will probably continue to be populated by a mix of traditional relational databases, newer NoSQL-type databases, key-value stores, and other technologies yet to be dreamed up.

Sean Hull asks on Quora – What is an rdbms and why are they important?

Database Change Management – What is it and why is it important?

During the software development process, whether you’re cutting-edge Agile developers or traditional waterfall-method folks, your code changes are periodically accompanied by database changes. For instance, tables have particular rows and columns. When developers add new columns (i.e. fields on an index card), or create new tables, relationships, indexes or other database objects, all of these are lumped together as database changes.

Version control systems have brought great manageability gains to software projects, even ones involving only a single developer. That’s because they allow you to rewind to any savepoint, just like you can undo and redo changes in a Word document. With database changes, however, the picture becomes more muddied.

Database Change Management Best Practices

  1. Require developers to include a roll forward and a roll back script with each set of schema changes.
  2. Check those scripts into the version control system just like other software code.
  3. Be sure developers test the roll forward and roll back operations on dev.
  4. Ensure that changes are documented, as well as possible side effects.
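
As a sketch of what a paired roll forward and roll back script can look like, and how a developer might test the pair on dev, here is a minimal Python example using the built-in sqlite3 module. The `user_preferences` table, the script contents and the `schema_snapshot` helper are hypothetical. A real project would keep the two scripts as files in version control and run them against its own database.

```python
import sqlite3

# The pair of scripts that would be checked in alongside the code change.
ROLL_FORWARD = """
CREATE TABLE user_preferences (
    user_id    INTEGER NOT NULL,
    pref_key   TEXT NOT NULL,
    pref_value TEXT,
    PRIMARY KEY (user_id, pref_key)
);
CREATE INDEX idx_prefs_key ON user_preferences (pref_key);
"""

ROLL_BACK = """
DROP INDEX idx_prefs_key;
DROP TABLE user_preferences;
"""


def schema_snapshot(conn):
    """List every schema object so two states of the database can be compared."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"
    ).fetchall()


# A stand-in for the dev database, with one pre-existing table.
dev = sqlite3.connect(":memory:")
dev.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

before = schema_snapshot(dev)

dev.executescript(ROLL_FORWARD)
after_forward = schema_snapshot(dev)

dev.executescript(ROLL_BACK)
after_rollback = schema_snapshot(dev)

# The test that matters: rolling back returns the schema to where it started.
print("forward added objects:", len(after_forward) - len(before))
print("rollback restored schema:", after_rollback == before)
```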

During deployment, operations folks and/or a DBA must still have their fingers on the trigger. Some frameworks, such as Ruby on Rails, include migration scripts. Even so, do not allow rollback scripts, for example, to run automatically. That is a recipe for disaster.

  • Be sure to take a backup of the database before running any schema change scripts.
  • Consider application downtime if ALTERs or other operations may lock large tables.
  • Perform another backup following the schema changes.
  • If the database is particularly large, you may isolate your backup to the schema or tables being altered.
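
The backup, change, backup sequence in the list above can be sketched as follows. This example uses the `Connection.backup` method from Python’s built-in sqlite3 module (Python 3.7 or later) purely as a stand-in. On MySQL you would typically reach for mysqldump or your usual backup tooling instead. The `accounts` table and the `columns` helper are hypothetical.

```python
import sqlite3

# A stand-in for the live database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
live.executemany("INSERT INTO accounts (owner) VALUES (?)", [("alice",), ("bob",)])
live.commit()


def columns(conn, table):
    """Column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# 1. Back up BEFORE touching the schema.
pre_change = sqlite3.connect(":memory:")
live.backup(pre_change)

# 2. Run the schema change.
live.execute("ALTER TABLE accounts ADD COLUMN balance INTEGER DEFAULT 0")
live.commit()

# 3. Back up again AFTER the change.
post_change = sqlite3.connect(":memory:")
live.backup(post_change)

print("pre-change backup columns: ", columns(pre_change, "accounts"))
print("post-change backup columns:", columns(post_change, "accounts"))
```

If the change goes wrong, the first copy is what you restore from. The second gives you a clean starting point under the new schema.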

Above all use common sense, and always second and/or third guess yourself.  Better to be safe than sorry when juggling your crown jewels.

Quora discussion by Sean Hull – What is database change management and why is it important?

NOSQL Database – What is it and why is it important?

NOSQL is a sort of all-encompassing term which includes very simple key/value databases like Memcache, along with more sophisticated non-relational databases such as MongoDB and Cassandra.

Relational databases have been around since the 1970s, so they’re a very mature technology. In general they support transactions, allowing you to make changes to your data in a discrete, controlled manner, and they support constraints such as uniqueness, primary and foreign keys, and check constraints. Furthermore they use SQL, or Structured Query Language, to fetch data and to modify it by inserting, updating or deleting records.
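
To see transactions and constraints working together, here is a minimal sketch with Python’s built-in sqlite3 module. The `customers` and `orders` tables and the `try_transaction` helper are invented for illustration. Each batch of statements either commits as a whole or is rolled back when a constraint is violated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0)
);
""")


def try_transaction(statements):
    """Apply all statements or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in statements:
                conn.execute(sql, params)
        return "committed"
    except sqlite3.IntegrityError as exc:
        return f"rolled back: {exc}"


ok = try_transaction([
    ("INSERT INTO customers (id, email) VALUES (?, ?)", (1, "a@example.com")),
    ("INSERT INTO orders (customer_id, quantity) VALUES (?, ?)", (1, 3)),
])

# The second insert breaks the uniqueness constraint, so the first is undone too.
duplicate = try_transaction([
    ("INSERT INTO customers (id, email) VALUES (?, ?)", (2, "b@example.com")),
    ("INSERT INTO customers (id, email) VALUES (?, ?)", (3, "a@example.com")),
])

orphan = try_transaction([
    ("INSERT INTO orders (customer_id, quantity) VALUES (?, ?)", (99, 1)),
])

bad_quantity = try_transaction([
    ("INSERT INTO orders (customer_id, quantity) VALUES (?, ?)", (1, 0)),
])

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

for label, result in [("ok", ok), ("duplicate", duplicate),
                      ("orphan", orphan), ("bad_quantity", bad_quantity)]:
    print(f"{label:13} {result}")
print("customers:", customer_count, "orders:", order_count)
```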

SQL, though, is by no means simple, and developers over the years have come to avoid it like the plague. For good reason. Furthermore RDBMSs, aka relational database management systems, don’t scale horizontally well at all. To some degree you can get read-only scalability with replication, but with a lot of challenges. Write-based scaling has been a much tougher problem to solve. Even Oracle’s RAC, or Real Application Clusters (formerly Parallel Server), faces a lot of challenges keeping its internal caches in sync over special data interconnects. The fact is that changes to your data, whether it’s on your iPhone, desktop address book or office directory, take time to propagate to various systems. Until that data is propagated, you’re looking at stale data.

Enter NOSQL databases like MongoDB, which attempt to address some of these concerns. For starters, data is not read from or written to the database using the old SQL language, but rather using an object-oriented method which developers find very convenient and intuitive. What’s more, MongoDB supports a lot of different types of indexing for fast lookups of specific data later.

NOSQL databases don’t just win fans on the development side of the house, but in Operations too, as they scale very well. MongoDB for instance has clustering built in, and promises an “eventually consistent” model to work against.

To be sure, a lot of high-profile companies are using NOSQL databases, but in general they are in use for very specific needs. What’s more, it remains to be seen whether many of those databases, as they grow in size and are stretched across more general applications, will need to be migrated to more traditional relational datastores later.

Sean Hull asks on Quora – What is NOSQL and why is it important?

Sharding – What is it and why is it important?

Sharding is a way of partitioning your datastore to benefit from the computing power of more than one server.  For instance many web-facing databases get sharded on user_id, the unique serial number your application assigns to each user on the website.

Sharding can bring you the advantages of horizontal scalability by dividing up data into multiple backend databases.  This can bring tremendous speedups and performance improvements.

Sharding, however, has a number of important costs.

  • reduced availability
  • higher administrative complexity
  • greater application complexity

High availability is a goal of most web applications as they aim for always-on, 24×7×365 availability. By introducing more servers, you have more components that have to work flawlessly. If the expected downtime of any one backend database is 1/2 hour per month and you shard across five servers, your expected downtime has now increased by roughly a factor of five, to about 2.5 hours per month, assuming an outage on any one shard counts as downtime for the application.
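
The arithmetic behind that figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes a 30-day month, independent failures, and that the application counts as down whenever any one shard is down. The function name is made up for illustration.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month


def expected_downtime_hours(per_server_downtime, servers, hours=HOURS_PER_MONTH):
    """Expected hours per month with at least one of `servers` backends down.

    Assumes failures are independent and each server is down for
    `per_server_downtime` hours per month on average.
    """
    availability = 1 - per_server_downtime / hours   # one server
    all_up = availability ** servers                 # every server up at once
    return hours * (1 - all_up)


single = expected_downtime_hours(0.5, 1)
sharded = expected_downtime_hours(0.5, 5)

print(f"one server:   {single:.3f} hours/month")
print(f"five shards:  {sharded:.3f} hours/month")
print(f"ratio:        {sharded / single:.2f}x")
```

The result lands a hair under five times the single-server figure, because now and then two outages overlap. For downtime this small the "multiply by the number of shards" rule of thumb is close enough.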

Administrative complexity is an important consideration as well. More databases means more servers to back up, more complex recovery, more complex testing, more complex replication and more complex data integrity checking.

Because sharding keeps chunks of your data on different servers, your application must accept the burden of deciding where the data is and fetching it from there. In some cases the application must make alternate decisions if it cannot find the data where it expects. All of this increases application complexity and is important to keep in mind.
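
To give a feel for that application-side burden, here is a minimal Python sketch of a router that shards on user_id. In-memory SQLite databases stand in for the backend servers. The `ShardRouter` class, the modulo placement rule and the fallback scan are all assumptions made for illustration, not a recommended production design.

```python
import sqlite3


class ShardRouter:
    """Route each user_id to one of N backends (in-memory SQLite stand-ins)."""

    def __init__(self, shard_count):
        self.shards = []
        for _ in range(shard_count):
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)"
            )
            self.shards.append(conn)

    def shard_for(self, user_id):
        """Decide where a user's data lives: simple modulo placement."""
        return user_id % len(self.shards)

    def _lookup(self, index, user_id):
        return self.shards[index].execute(
            "SELECT name FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()

    def save(self, user_id, name):
        conn = self.shards[self.shard_for(user_id)]
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO users (user_id, name) VALUES (?, ?)",
                (user_id, name),
            )

    def load(self, user_id):
        """Fetch from the expected shard, then fall back to checking the others."""
        home = self.shard_for(user_id)
        row = self._lookup(home, user_id)
        if row is not None:
            return row[0]
        # The "alternate decision": the row is not where we expected it,
        # for example part-way through moving data between shards.
        for index in range(len(self.shards)):
            if index != home:
                row = self._lookup(index, user_id)
                if row is not None:
                    return row[0]
        return None


router = ShardRouter(shard_count=5)
router.save(12, "dana")
print("user 12 lives on shard", router.shard_for(12), "->", router.load(12))

# Simulate a row sitting on the wrong shard; the fallback scan still finds it.
router.shards[0].execute("INSERT INTO users (user_id, name) VALUES (13, 'eli')")
print("user 13 found via fallback ->", router.load(13))
print("unknown user ->", router.load(99))
```

Even this toy version has to know the shard count, handle misplaced rows and query several servers on a miss. With modulo placement, changing the number of shards also moves most users to a different server, which is one more thing the application and the DBA have to plan for.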

Sean Hull asks on Quora – What is Sharding and why is it important?