When migrating to the cloud consider security and resource variability, the cultural shift for operations and the new cost model.
There are a lot of components that make up modern internet websites, and a lot of places to get stuck in the mud. Website performance starts with the browser, what caching it is doing, their bandwidth to your server, what the webserver is doing (caching or not and how), if the webserver has sufficient memory, and then what the application code is doing and lastly how it is interacting with the backend database.
Best Practices, the term we hear thrown around a lot. But like going on that new years diet, too often ends up more talk than action.
Operator error ie typing the wrong command is always a risk. Logging into the wrong server to drop a database or typing the dump command such that you dump data into the database, these are risks that operations folks face everyday.
Accountability is important, be sure all of your systems folks login to their own accounts. Apply the least privileges model, give permissions on an as needed basis.
Set prompts with big bold names that indicate production servers and their purpose. Automate repetitive commands that are prone to typos.
Don’t be afraid to give developers read-only accounts on production servers.
Regular team meetings, a la the Agile stand ups are a great way to encourage folks to communicate. Bring the developers and operations folks together. Ask everyone in turn to voice their current todos, their concerns and risks they see. Encourage everyone to listen with an open mind. Consider different perspectives.
Communication is a cultural attribute. So it comes from the top. Encourage this as a CTO or CIO by asking questions, communicating your concerns, repeat your own requests in different words and paraphrase. Listen to what your team is saying, repeat and rephrase those concerns, and how and when they will be addressed.
A culture of documenting services, and processes is healthy. It provides a central location and knowledge base for the team. It also prevents sliding into the situation where only one team member understands how to administer critical business components. Were that person to be unavailable or to leave the company, you’re stuck reverse engineering your infrastructure and guessing at architectural decisions.
Rather than think of best practices as something you need to achieve today, think of it as an ongoing day-to-day quest for improvement.
- repetitive manual processes – employ automation & script those processes where possible.
- where steps require investigation and research – document it
- where production changes are involved – communicate with business units, qa & operations
- always be improving – striving for better practices
Software development is composed of individual components. As developers are building these units, they build tests to verify them for correctness. These tests can verify the environment, they can verify data, they can verify edge cases and include test harnesses. In essence they verify that the code meets the design specification.
There are a few key advantages to the unit testing approach:
- Self-Documenting – The tests themselves provide a type of documentation for the system as a whole.
- Advances Refactoring – At a later date you may need to repair, rewrite or refactor portions of code. Previously built unit tests provide a tremendous help to make sure your changes still meet the previous design specification.
- Simplifies Functional Testing – With unit testing as an ongoing concern, the final components will likely perform more reliably, and if not the tests & self-documentation may point to how or why they fail to meet some specification.
Backups are obviously a crucial component in any enterprise application. Modern internet components are prone to failure, and backups keep your bases covered. Here’s what you should consider:
- Is your database backed up, including object structures, data, stored procedures, grants, and logins?
- Is your webserver doc-root backed up?
- Is your application source code in version control and backed up?
- Are your server configurations backed up? Relevant config files might include those for apache, mysql, memcache, php, email (postfix or qmail), tomcat, Java solr or any other software your application requires.
- Are your cron or supporting scripts and jobs backed up?
- Have you tested all of these components and your overall documentation with a fire drill? This is the proof that you’ve really covered all the angles.
If you do your backups right, you should be able to restore without a problem.
Also find Sean Hull’s ramblings on twitter @hullsean.
There are a lot of considerations for deploying MySQL in the Cloud. Some concepts and details won’t be obvious to DBAs used to deploying on traditional servers. Here are eight best practices which will certainly set you off on the right foot.
This article is part of a multi-part series Intro to EC2 Cloud Deployments.
Master-Slave replication is easy to setup, and provides a hot online copy of your data. One or more slaves can also be used for scaling your database tier horizontally.
Master-Master active/passive replication can also be used to bring higher uptime, and allow some operations such as ALTER statements and database upgrades to be done online with no downtime. The secondary master can be used for offloading read queries, and additional slaves can also be added as in the master-slave configuration.
Caution: MySQL’s replication can drift silently out of sync with the master. If you’re using statement based replication with MySQL, be sure to perform integrity checking to make your setup run smoothly. Here’s our guide to bulletproofing MySQL replication.
You’ll want to create an AWS security group for databases which opens port 3306, but don’t allow access to the internet at large. Only to your AWS defined webserver security group. You may also decide to use a single box and security group which allows port 22 (ssh) from the internet at large. All ssh connections will then come in through that box, and internal security groups (database & webserver groups) should only allow port 22 connections from that security group.
When you setup replication, you’ll be creating users and granting privileges. You’ll need to grant to the wildcard ‘%’ hostname designation as your internal and external IPs will change each time a server dies. This is safe since you expose your database server port 3306 only to other AWS security groups, and no internet hosts.
You may also decide to use an encrypted filesystem for your database mount point, your database backups, and/or your entire filesystem. Be particularly careful of your most sensitive data. If compliance requirements dictate, choose to store very sensitive data outside of the cloud and secure network connections to incorporate it into application pages.
Be particularly careful of your AWS logins. The password recovery mechanism in Amazon Web Services is all that prevents an attacker from controlling your entire infrastructure, after all.
There are a few ways to backup a MySQL database. By far the easiest way in EC2 is using the AWS snapshot mechanism for EBS volumes. Keep in mind you’ll want to encrypt these snapshots as S3 may not be as secure as you might like. Although you’ll need to lock your MySQL tables during the snapshot, it will typically only take a few seconds before you can release the database locks.
Now snapshots are great, but they can only be used within the AWS environment, so it also behooves you to be performing additional backups, and moving them offsite either to another cloud provider or to your own internal servers. For this your choices are logical backups or hotbackups.
mysqldump can perform logical backups for you. These backups perform SELECT * on every table in your database, so they can take quite some time, and really destroy the warm blocks in your InnoDB buffer cache. What’s more rebuilding a database from a dump can take quite some time. All these factors should be considered before deciding a dump is the best option for you.
xtrabackup is a great open source tool available from Percona. It can perform hotbackups of all MySQL tables including MyISAM, InnoDB and XtraDB if you use them. This means the database will be online, not locking tables, with smarter less destructive hits to your buffer cache and database server as a whole. The hotbackup will build a complete copy of your datadir, so bringing up the server from a backup involves setting the datadir in your my.cnf file and starting.
4. Disk I/O
Obviously Disk I/O is of paramount performance for any database server including MySQL. In AWS you do not want to use instance store storage at all. Be sure your AMI is built on EBS, and further, use a separate EBS mount point for the database datadir.
An even better configuration than the above, but slightly more complex to configure is a software raid stripe of a number of EBS volumes. Linux’s software raid will create an md0 device file which you will then create a filesystem on top of – use xfs. Keep in mind that this arrangement will require some care during snapshotting, but can still work well. The performance gains are well worth it!
5. Network & IPs
When configuring Master & Slave replication, be sure to use the internal or private IPs and internal domain names so as not to incur additional network charges. The same goes for your webservers which will point to your master database, and one or more slaves for read queries.
6. Availability Zones
Amazon Web Services provides a tremendous leap in options for high availability. Take advantage of availability zones by putting one or more of your slaves in a separate zone where possible. Interestingly if you ensure the use of internal or private IP addresses and names, you will not incur additional network charges to servers in other availability zones.
7. Disaster Recovery
EC2 servers are out of the gates *NOT* as reliable as traditional servers. This should send shivers down your spine if you’re trying to treat AWS like a traditional hosted environment. You shouldn’t. It should force you to get serious about disaster recovery. Build bulletproof scripts to spinup your servers from custom built AMIs and test them. Finally you’re taking disaster recovery as seriously as you always wanted to. Take advantage of Availability Zones as well, and various different scenarios.
8. Vertical and Horizontal Scaling
Interestingly vertical scaling can be done quite easily in EC2. If you start with a 64bit AMI, you can stop such a server, without losing the root EBS mount. From there you can then start a new larger instance in EC2 and use that existing EBS root volume and voila you’ve VERTICALLY scaled your server in place. This is quite a powerful feature at the system administrators disposal. Devops has never been smarter! You can do the same to scale *DOWN* if you are no longer using all the power you thought you’d need. Combine this phenomenal AWS feature with MySQL master-master active/passive configuration, and you can scale vertically with ZERO downtime. Powerful indeed.
We wrote an EC2 Autoscaling Guide for MySQL that you should review.
Along with vertical scaling, you’ll also want the ability to scale out, that is add more servers to the mix as required, and scale back when your needs reduce. Build in smarts in your application so you can point SELECT queries to read-only slaves. Many web applications exhibit the bulk of there work in SELECTs so being able to scale those horizontally is very powerful and compelling. By baking this logic into the application you also allow the application to check for slave lag. If your slave is lagging slightly behind the master you can see stale data, or missing data. In those cases your application can choose to go to the master to get the freshest data.
What about RDS?
Wondering whether RDS is right for you? It may be. We wrote a comprehensive guide to evaluating RDS over MySQL.
If you read this far, you should grab our newsletter!