When considering active-active multi-master, you must consider it’s foundation technology. Although MySQL replication is straightforward to setup, it can fail in a myriad of ways. Most of those are known and well understood. We can solve them only if we use the technology in the standard way.
Click through to the end for multi-master solutions that work with MySQL.
Reason 2 – Replication is brittle to start with
– statement based – throw it over the fence architecture
– use non-deterministic functions causes unpredictable behavior
– no globally unique transaction id (like Oracle SCN – system change number)
– no built in checksums – pt-table-checksum is a must
– replication position syncing to index files prone to breaking
– temp tables disappear after restart
– row-based still a new code path – doesn’t support zero downtime
– even with row-based mysql can fall back to statement
– row-based does not include SQL in binlogs
– MySQL replica slaves die frequently, require re-clone from authoritative master
NEXT: Reason 3 – MySQL row-based replication has limitations
PREV: Reason 1 – auto increment settings create new problems
Database replication is a service for shipping changes to your database, off to a copy housed on another server, potentially even in another data center.
Each change to the records of information in your database or groups of them are organized into transactions. These each get unique identifiers or change numbers. Those groupings of changes are logged to transaction logs which are then moved across to the sibling database server, and applied there.
Oracle offers this facility in the form of Standby Database aka Dataguard. Although it can be messy to setup, it is fairly bulletproof. If you don’t receive any errors in your logfiles, you can rest assured that the data on your main or master database server looks exactly the same as what’s on your secondary server. With MySQL the situation can be a bit more complicated. MySQL replication was built based on statements, not changed data. So those same statements or instructions (SQL statements of DML & DDL) get rerun on the secondary database server. Some of the problems associated with this are:
- MySQL standby transactions may execute at different times than master
- Serialization is important to maintain order. Transactions need to get applied in the same order as on the primary. However all tables & storage engines are not necessarily transactional in MySQL. This can cause troubles, and potentially break replication on the slave side.
- MySQL replication can fail silently, that is data can drift out of sync with the master without your knowing.
Caveats aside, MySQL replication is powerful, and quite fast. With the right caution, and correct tools, it can be as robust and reliable as Oracle’s own standby database technology.
- Use checksum tools like Maatkit to verify & compare tables on primary & secondary servers
- Use rigorous monitoring to watch error logs, and checksum logs hourly if necessary.
- Use Maatkit sync tool to resync tables if they get out of whack.
- Be especially vigilant if you’re taking backups off the secondary server
Replication can also be a powerful high availability tool in your arsenal. Setup as a master-master cluster, you can keep your application pointed to the “active” side or role while performing blocking operations on the inactive database server. Since master-master cluster also known as circular replication applies changes to both databases, you’ll need to stop replication during the blocking operation. Once it is completed on the inactive side, redirect the webservers & application to the database where you completed the operation, then reenable replication and watch the databases catch up with each other!
Quora discussions – What is database replication?