The term clustering is often used loosely in the context of enterprise databases. In relation to MySQL in the cloud you can configure:
- Master-master active/passive
- Sharded MySQL Database
- NDB Cluster
Master-Master active/passive replication
This is also sometimes known as circular replication, and it is used for high availability. You can perform operations on the inactive node (backups, ALTER TABLE statements, or other slow operations), then switch roles so that the inactive node becomes active. You then perform the same operations on the formerly active master. Applications see “zero downtime” because they always point at the active master database. In addition, the inactive master can be used as a read-only slave to run SELECT queries and large reporting queries. This is quite powerful, as typical web applications tend to have 80% or more of their work performed with read-only queries such as browsing, viewing, and verifying data and information.
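The maintenance sequence described above can be sketched in a few lines of Python. This is only an illustration of the ordering of steps, not a failover tool: the host names `db-a` and `db-b` and the helper names are hypothetical, and a real switch would also involve repointing applications and checking that replication has caught up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MasterPair:
    """Tracks which of two masters is active. Host names are hypothetical."""
    active: str
    passive: str

    def switch_roles(self) -> None:
        # Swap roles so the former passive node starts taking application traffic.
        self.active, self.passive = self.passive, self.active


def rolling_maintenance(pair: MasterPair, operation: str) -> List[str]:
    """Return the ordered steps for running `operation` on both nodes in turn."""
    steps = [f"run {operation} on {pair.passive}"]
    pair.switch_roles()
    steps.append(f"switch: {pair.active} is now active")
    # The node that was active at the start is now passive and safe to work on.
    steps.append(f"run {operation} on {pair.passive}")
    return steps


pair = MasterPair(active="db-a", passive="db-b")
for step in rolling_maintenance(pair, "ALTER TABLE"):
    print(step)
```

The point of the sketch is that the slow operation only ever runs on whichever node is not serving the application at that moment.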
Sharded MySQL Database
This is similar to what in the Oracle world is called “application partitioning”. In fact, before Oracle 10, most Parallel Server and RAC installations required you to do this. For example, a user table might be sharded by putting names A-F on node A, G-L on node B, and so forth.
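As a rough sketch of how an application might route by name range, the Python below maps the first letter of a name to a node. The A-F and G-L ranges come from the example above; the remaining ranges, the node names, and the function name `shard_for_name` are assumptions made purely for illustration.

```python
# Hypothetical shard map: inclusive letter ranges mapped to node names.
# Only the first two ranges come from the text; the rest are assumed.
SHARD_RANGES = [
    ("A", "F", "node_a"),
    ("G", "L", "node_b"),
    ("M", "R", "node_c"),
    ("S", "Z", "node_d"),
]


def shard_for_name(name: str) -> str:
    """Return the node that holds the row for the given user name."""
    if not name:
        raise ValueError("name must not be empty")
    first = name[0].upper()
    for low, high, node in SHARD_RANGES:
        if low <= first <= high:
            return node
    raise ValueError(f"no shard configured for names starting with {first!r}")


print(shard_for_name("alice"))
print(shard_for_name("Henry"))
```

Names that do not start with a letter in one of the ranges raise an error here; a real application would need a deliberate policy for those.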
You can also achieve this somewhat transparently with user_ids. MySQL has an AUTO_INCREMENT column attribute to handle serving up unique IDs. It also has a cluster-friendly setting called auto_increment_increment, which is used together with auto_increment_offset. So in an example where you had *TWO* nodes, all EVEN numbered IDs would be generated on node A and all ODD numbered IDs would be generated on node B. They would also be replicating changes to each other, yet avoid collisions.
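The even/odd split can be modeled with a short Python simulation. It assumes both nodes use auto_increment_increment = 2, with the companion MySQL setting auto_increment_offset set to 2 on node A and 1 on node B. This is a simplified model of the arithmetic only; the server's actual behavior also depends on the values already present in the table.

```python
from itertools import count, islice


def id_sequence(increment: int, offset: int):
    """Yield the IDs one node would hand out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


# Assumed settings: increment 2 on both nodes, offsets 2 and 1.
node_a = id_sequence(increment=2, offset=2)  # even IDs
node_b = id_sequence(increment=2, offset=1)  # odd IDs

a_ids = list(islice(node_a, 5))
b_ids = list(islice(node_b, 5))

print(a_ids)  # [2, 4, 6, 8, 10]
print(b_ids)  # [1, 3, 5, 7, 9]
```

Because the two sequences never overlap, rows inserted on either node can replicate to the other without primary key collisions.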
Obviously all this has to be done with care, as the database is not otherwise preventing you from doing things that would break replication and your data integrity.
One further caution with sharding your database is that although it increases write throughput by horizontally scaling the master, it ultimately reduces availability. An outage of any server in the cluster means at least a partial outage of the cluster itself.
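To put a rough number on that caution, the Python below computes the chance that every shard is up at the same time. It assumes each shard lives on a single server, that servers fail independently, and that all have the same availability. The 99% figure and the function name are illustrative assumptions, not measurements.

```python
def full_cluster_availability(per_server: float, shards: int) -> float:
    """Probability that all shards are up, assuming independent failures."""
    return per_server ** shards


# Assumed: four shards, each server available 99% of the time.
print(round(full_cluster_availability(0.99, 4), 4))  # 0.9606
```

Under those assumptions, four 99% servers give a fully working cluster only about 96% of the time, which is why each shard is often paired with its own replica.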
NDB Cluster
This is actually a storage engine, and can be used in conjunction with InnoDB and MyISAM tables. Normally you would use it sparingly for a few special tables, providing availability and read/write access to multiple masters. This is decidedly *NOT* like Oracle RAC, though many mistake it for that technology.
MySQL Clustering In The Cloud
The most common MySQL cluster configuration we see in the Amazon EC2 environment is by far the Master-Master configuration described above. By itself it provides higher availability of the master node, and a single read-only node against which you can horizontally scale your application queries. What’s more, you can add more read-only slaves to this setup, allowing you to scale out tremendously.
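A minimal sketch of how an application might spread queries across such a setup follows. The class name `QueryRouter` and the host names are hypothetical, and the read detection is deliberately naive: it treats anything starting with SELECT as a read, which would be wrong for statements such as SELECT ... FOR UPDATE, and it ignores replication lag.

```python
from itertools import cycle


class QueryRouter:
    """Send writes to the active master and rotate reads across replicas."""

    def __init__(self, active_master, read_replicas):
        if not read_replicas:
            raise ValueError("at least one read replica is required")
        self.active_master = active_master
        self._replicas = cycle(read_replicas)

    def route(self, sql: str) -> str:
        # Naive check: any statement beginning with SELECT counts as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.active_master


# Hypothetical hosts: one active master, the passive master, one extra slave.
router = QueryRouter("master-a", ["master-b", "slave-1"])
for sql in ["SELECT * FROM users", "UPDATE users SET name = 'x'", "SELECT 1"]:
    print(router.route(sql), "<-", sql)
```

Adding another read-only slave then amounts to appending one more host to the replica list.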