
Backup and Recovery in EC2 – 5 Point Checklist

Best practices for backups and disaster recovery aren’t tremendously different in the cloud than in a managed hosting environment. But they are more crucial, since cloud servers are less reliable than physical servers. Security may also play a heightened role in the cloud. Here are some points to keep in mind.

Read the original article: Intro to EC2 Cloud Deployments.

1. Perform multiple types of backups
2. Keep non-proprietary backups offsite
3. Test your backups – perform fire drills
4. Encrypt backups in S3
5. Perform Replication Integrity Checks

Perform Multiple Types of Backups

Your database tier is typically your primary datastore, so its backups are often the most crucial. Snapshots of EBS volumes are a powerful, fast way to take full database backups in the AWS environment. The process involves briefly locking all tables, running the snapshot command, and then releasing the table locks. Be sure to test this process to ensure that the temporary locks on the database don’t cause requests to pile up on your webservers.
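Here’s a minimal sketch of the lock, snapshot, unlock sequence, assuming MySQL’s FLUSH TABLES WITH READ LOCK as the locking step. The callables run_sql and create_snapshot are hypothetical hooks: run_sql must execute statements on one persistent MySQL session (the global read lock is dropped if that session disconnects), and create_snapshot could, for example, wrap boto3’s ec2.create_snapshot(VolumeId=...) and return its "SnapshotId". The function also reports how long the lock was held, which is the number to watch when checking for a pileup on your webservers.

```python
import time
from typing import Callable


def snapshot_with_read_lock(
    run_sql: Callable[[str], None],
    create_snapshot: Callable[[], str],
) -> tuple[str, float]:
    """Take an EBS snapshot while MySQL tables are read-locked.

    run_sql (hypothetical hook) must use ONE persistent MySQL session.
    create_snapshot (hypothetical hook) starts the snapshot and returns its id.
    Returns (snapshot_id, seconds the lock was held).
    """
    run_sql("FLUSH TABLES WITH READ LOCK")
    locked_at = time.monotonic()
    try:
        snapshot_id = create_snapshot()
    finally:
        # Release the lock even if the snapshot call fails, so writes
        # from the webservers are not blocked indefinitely.
        run_sql("UNLOCK TABLES")
        held = time.monotonic() - locked_at
    return snapshot_id, held
```

Keeping the unlock in a finally clause is the important design choice: a failed API call should never leave the database locked.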

Check out our howto on using xtrabackup for hot backups.

Keep Non-proprietary Backups Offsite

EC2 snapshots are great, but they only work in EC2. So you’ll also want other types of backups that can be restored anywhere, with copies kept offsite. Personally, I like having a few different options in the event I need to restore. Logical backups are great for restoring a single table, but slow for restoring the entire database. Hot backups restore the whole database quickly, but they take a lot of space, so they are less efficient if you just need to restore one table. So I like to have both. Percona’s xtrabackup and the associated innobackupex script provide an open-source hot backup solution for MySQL. Get it! Then intersperse those backups with mysqldump runs, on alternating days, for example.
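A rough sketch of the alternating schedule might look like this. It assumes the even/odd day split, the /backups directory, and the file naming, all of which are hypothetical. It uses the xtrabackup --backup --target-dir form found in newer releases (older setups call innobackupex instead) and mysqldump with --single-transaction for a consistent InnoDB dump. It only builds the command; you would pass the result to subprocess.run(cmd, check=True) from a nightly cron job.

```python
import datetime


def backup_command(day: datetime.date, backup_root: str = "/backups") -> list[str]:
    """Pick tonight's backup type: hot backup and logical dump alternate daily.

    backup_root and the file names are hypothetical placeholders.
    """
    stamp = day.isoformat()
    if day.toordinal() % 2 == 0:
        # Physical hot backup: fast full restore, but large on disk.
        return ["xtrabackup", "--backup", f"--target-dir={backup_root}/hot-{stamp}"]
    # Logical dump: slow full restore, but easy to pull out a single table.
    return [
        "mysqldump",
        "--single-transaction",
        "--all-databases",
        f"--result-file={backup_root}/dump-{stamp}.sql",
    ]
```

Note that an xtrabackup directory needs a --prepare step before it can be restored, which is one more thing worth exercising in a fire drill.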

Test Your Backups – Perform Fire Drills

Any good disaster recovery plan must be thoroughly tested. Set aside the time to actually run through it from start to finish. This is where the cloud really works to your advantage. Spin up all the servers that make up your entire environment (load balancer, webservers, database servers), and check out all the source code and configuration files. You put your configuration files in version control, right? Then restore the database. This fire drill tests your server spinup scripts, your version control of source code and configuration files, and your database backups. All of these pieces must be in place for the fire drill to succeed. Lastly, running through the whole process forces you to document the details, and you find out how long your disaster recovery would actually take.
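To find out how long recovery actually takes, it helps to time each stage of the drill. Here’s a small sketch of a timing harness. The step functions in the usage block (launch_servers, check_out_code_and_config, restore_database) are hypothetical stand-ins for your own spinup, checkout, and restore scripts. The harness stops at the first failure and names the step, since later steps depend on earlier ones.

```python
import time
from typing import Callable


def run_fire_drill(steps: list[tuple[str, Callable[[], None]]]) -> dict[str, float]:
    """Run recovery steps in order; return seconds per step plus a "total"."""
    timings: dict[str, float] = {}
    drill_start = time.monotonic()
    for name, step in steps:
        start = time.monotonic()
        try:
            step()
        except Exception as exc:
            # Report which piece of the recovery plan broke.
            raise RuntimeError(f"fire drill failed at step {name!r}") from exc
        timings[name] = time.monotonic() - start
    timings["total"] = time.monotonic() - drill_start
    return timings


if __name__ == "__main__":
    # Hypothetical placeholders; replace with your real scripts.
    def launch_servers() -> None: ...
    def check_out_code_and_config() -> None: ...
    def restore_database() -> None: ...

    results = run_fire_drill([
        ("launch servers", launch_servers),
        ("check out code and config", check_out_code_and_config),
        ("restore database", restore_database),
    ])
    for name, seconds in results.items():
        print(f"{name}: {seconds:.1f}s")
```

The printed per-step durations are a natural starting point for the written recovery runbook.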

Encrypt Backups in S3

S3 stores objects as private by default, but for particularly sensitive data it makes sense to also encrypt those backups. Remember that you control access to your encryption keys, but not where the data is stored or where it might move around. So it can’t hurt to be extra cautious. (Related to the next point, we also wrote a howto on bulletproofing MySQL replication with checksums.)
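One way to encrypt before anything leaves your server is symmetric GnuPG encryption. The sketch below only builds the command line. The passphrase file location is a hypothetical choice, and keeping that file (and a copy of it) safe is what "controlling your keys" means in practice. With GnuPG 2.1 and later, --pinentry-mode loopback is needed for --passphrase-file to work in batch mode.

```python
def gpg_encrypt_command(backup_file: str, passphrase_file: str) -> list[str]:
    """Build a gpg command that encrypts backup_file with AES-256.

    The encrypted copy is written to backup_file + ".gpg"; upload only that.
    passphrase_file is a hypothetical path you must protect yourself.
    """
    return [
        "gpg", "--batch", "--yes",
        "--pinentry-mode", "loopback",
        "--passphrase-file", passphrase_file,
        "--symmetric", "--cipher-algo", "AES256",
        "--output", backup_file + ".gpg",
        backup_file,
    ]
```

You would run it with subprocess.run(cmd, check=True) and then upload the .gpg file, for example with boto3’s s3.upload_file(filename, bucket, key). Because the data is encrypted client side, it stays protected wherever S3 ends up storing it.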

Perform Replication Integrity Checks

A MySQL slave or passive master database can be a great way to offload backups from the primary database server. This reduces the impact on your customers while backups are running. But MySQL replication is not bulletproof: slaves can silently drift out of sync with the master without throwing any errors. That’s why it’s important to use an integrity checking tool like Maatkit’s mk-table-checksum. You can run this tool from cron to checksum a slice of your database periodically.
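To illustrate the idea behind chunked integrity checks, here’s a toy version in plain Python that compares master and slave rows one primary-key range at a time. It is a conceptual sketch, not how mk-table-checksum works internally. The real tool computes checksums inside MySQL, and with its --replicate option it lets the checksum queries flow through replication so each slave computes its own. (Maatkit has since been folded into Percona Toolkit as pt-table-checksum.) The integer primary key in the first column and the chunk size are assumptions.

```python
import hashlib
from typing import Iterable


def chunk_checksums(rows: Iterable[tuple], chunk_size: int) -> dict[int, str]:
    """MD5 per primary-key range [i*chunk_size, (i+1)*chunk_size).

    Assumes each row is (int_primary_key, *other_columns).
    """
    hashes = {}
    # Sort so both servers feed rows into each hash in the same order.
    for row in sorted(rows):
        chunk = row[0] // chunk_size
        if chunk not in hashes:
            hashes[chunk] = hashlib.md5()
        hashes[chunk].update(repr(row).encode() + b"\n")
    return {chunk: h.hexdigest() for chunk, h in hashes.items()}


def drifted_ranges(master_rows, slave_rows, chunk_size: int = 1000) -> list[tuple[int, int]]:
    """Return inclusive primary-key ranges whose checksums differ."""
    master = chunk_checksums(master_rows, chunk_size)
    slave = chunk_checksums(slave_rows, chunk_size)
    # A chunk that exists on only one side also counts as drift.
    bad = sorted(c for c in master.keys() | slave.keys() if master.get(c) != slave.get(c))
    return [(c * chunk_size, (c + 1) * chunk_size - 1) for c in bad]
```

Chunking by key range rather than by row position matters: a single missing row on the slave then flags only its own range instead of shifting every chunk after it. Checking only some ranges per cron run is what keeps the load to "a slice of your database" at a time.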

Here’s an excellent article on using the tool: Ongoing MySQL Integrity Checking with mk-table-checksum.
