If you’re not convinced yet that writing to dual masters is a bad idea, we have a couple more reasons.
9. Temp tables break replication after restart
MySQL’s replication is sensitive to temporary tables. You shouldn’t use them. If your queries create them to work against, and a node crashes, the temp tables will be missing upon restart. You can imagine what this will do to all those queries which expect to see a temp table. That’s right, they’ll fail.
Theoretically you could have a start script which creates and populates such temp tables. However this may add operational complexity if not carefully coordinated with developers & change management.
Writing to two masters is like walking around with a loaded shotgun. Eventually one of your instances will fail and when it does, replications position & synchronization information could easily become corrupt!
Reason 8 – Crashed nodes cause big problems
MySQL instances, unfortunately can crash. When that happens, they don’t always sync the replication position properly. That’s a big risk. It’s one of the reasons why operational DBAs know that replication slaves need to be rebuilt from time to time.
If the replication position doesn’t sync properly when MySQL restarts it may do so at the wrong position and replay or miss some transactions. What this would do to an active-active setup is unpredictable. Again no single authoritative master means trouble!
MySQL 5.5 has introduced some parameters to address this. During my tests I’ve seen a performance hit, so use them with caution.
Multi-master replication provides redundant copies of your most important business assets. What’s more it allows applications to scale out, which is perfect for cloud hosting solutions like Amazon Web Services.
But when you decide you need to scale your write capacity, you may be considering active-active setup. This is dangerous, messy and prone to failure. We’ve outlined how.
We’ve all seen cloud computing discussed ad nauseam on blogs, on Twitter, Quora, Stack Exchange, your mom’s Facebook page… you get the idea. The tech bloggers and performance experts often pipe in with their graphs and statistics showing clearly that dollar-for-dollar, cloud hosted virtual servers can’t compete with physical servers in performance, so why is everyone pushing them? It’s just foolhardy, they say.
On the other end, management and their bean counters would simply roll their eyes saying this is why the tech guys aren’t running the business.
MySQL is a very scalable platform which has proven robust even in the most dense and complex data environments. MySQL’s indispensable replication function is ‘sold’ as being fail-safe so you have little to sweat about as long as your backups are running regularly. But what the ops guys aren’t telling you is MySQL performs replication with tiny margins of error that could cause big problems in times of disaster.
Imagine the scene, you use replication to backup your data. Your secondary database is your peace of mind. It’s the always-on clone of your crown jewels. You even perform backups off of it so you don’t impact your live website. Your backups run without errors. Your slave database runs without errors. Then the dreaded day comes when your primary database fails. You instruct your team to switchover your application to point to your live backup database. The site comes online again. But all is not right. You notice subtle differences and your team begins to question how deep the data divide could be.
The Problem with MySQL replication
Although MySQL replication is fairly easy to setup, and even to keep running without error, you may have unseen problems. MySQL’s core technology to replicate data between master and slave is primarily statement based. Various scenarios can cause what in other database platforms you might call database corruption, that is silent drifting of data from what tables and rows contain on the master. It is no fault of your own, or perhaps one might argue even of your operations team. It is a fundamental flaw in how MySQL performs replication.
Fortunately there is a solution. Checksums, the wonderful computational tool for comparing things can be put to work nicely to compare database. The Percona Toolkit (formerly maatkit) includes just such a utility for use with MySQL. It can be used to check the integrity of your slave databases.
If you’ve never performed such a check, you should do so ASAP. If your database has been running for months at a stretch, chances are there could be differences lying undiscovered between the two systems.
Depending on the volume changing in your database, you can continue to use this tool periodically to confirm that all is consistent. If integrity checks fail, there is another tool in Maatkit to syncronize differences, and bring everything back to order.
I like that this book is small; 150 pages means you can carry it easily. It’s also very no nonsense. It does not dig too deeply into theory unless it directly relates to your day-to-day needs. And those needs probably cluster heavily around optimizing SQL queries, as those pesky developers are always breaking things 😉
Jokes aside, this new book out on Oracle Press is a very readable volume. Bradford has drawn directly from real-world experience to give you the right bite size morsels you need in your day-to-day MySQL activities. Continue reading Book Review – Effective MySQL→
To be sure, this has raised waves of concern among the community, but on the whole I suspect it will be a good thing for MySQL. This brings more commercial addons to the table, which only increases the options for customers. Many will continue to use the core database product only, and avoid license hassles while others will surely embark on a hybrid approach if it solves their everyday business problems. Continue reading Oracle Announces Paid MySQL Add-ons→
3 ways your MySQL migration project can shake you up
Once a development or operations team gets over the hurdle of open-source, and start to feel comfortable with the way software works outside of the enterprise world, they will likely start to settle in and feel comfortable. Best not to get too cushy though for there are more surprises hiding around the corner. Here are a few of the biggest ones. Continue reading 3 Biggest MySQL Migration Surprises→
ORMs are popular among developers but not among performance experts. Why is that? Primarily these two engineers experience a web application from entirely different perspectives. One is building functionality, delivering features, and results are measured on fitting business requirements. Performance and scalability are often low priorities at this stage. ORMs allow developers to be much more productive, abstracting away the SQL difficulties of interacting with the backend datastore, and allowing them to concentrate on building the features and functionality.
Scalability is about application, architecture and infrastructure design, and careful management of server components.
On the performance side the picture is a bit different. By leaving SQL query writing to an ORM, you are faced with complex queries that the database cannot optimize well. What’s more ORMs don’t allow easy tweaking of queries, slowing down the tuning process further.
2. Synchronous, Serial, Coupled or Locking Processes
Locking in a web application operates something like traffic lights in the real world. Replacing a traffic light with a traffic circle often speeds up traffic dramatically. That’s because when you’re out somewhere in the country where there’s very little traffic, no one is waiting idly at a traffic light for no reason. What’s more even when there’s a lot of traffic, a traffic circle keeps things flowing. If you need locking, better to use InnoDB tables as they offer granular row level locking than table level locking like MyISAM tables.
Avoid things like semi-synchronous replication that will wait for a message from another node before allowing the code to continue. Such waits can add up in a highly transactional web application with many thousands of concurrent sessions.
Avoid any type of two-phase commit mechanism that we see in clustered databases quite often. Multi-phase commit provides a serialization point so that multiple nodes can agree on what data looks like, but they are toxic to scalability. Better to use technologies that employ an eventually consistent algorithm.
Without replication, you rely on only one copy of your database. In this configuration, you limit all of your webservers to using a single backend datastore, which becomes a funnel or bottleneck. It’s like a highway that is under construction, forcing all the cars to squeeze into one lane. It’s sure to slow things down. Better to build parallel roads to start with, and allow the application aka the drivers to choose alternate routes as their schedule and itinerary dictate.
Having no metrics in place is toxic to scalability because you can’t visualize what is happening on your systems. Without this visual cue, it is hard to get business units, developers and operations teams all on the same bandwagon about scalability issues. If teams are having trouble groking this, realize that these tools simple provide analytics for infrastructure.
There are tons of solutions too, that use SNMP and are non-invasive. Consider Cacti, Munin, OpenNMS, Ganglia and Zabbix to name a few. Metrics collections can involve business metrics like user registrations, accounts or widgets sold. And of course they should also include low level system cpu, memory, disk & network usage as well as database level activity like buffer pool, transaction log, locking sorting, temp table and queries per second activity.
Applications built without feature flags make it much more difficult to degrade gracefully. If your site gets bombarded by a spike in web traffic and you aren’t magically able to scale and expand capacity, having inbuilt feature flags gives the operations team a way to dial down the load on the servers without the site going down. This can buy you time while you scale your webservers and/or database tier or even retrofit your application to allow multiple read and write databases.
Without these switches in place, you limit scalability and availability.