MySQL databases are the great workhorses of the internet. They back a huge share of modern websites, from blogs and checkout carts to giants like Facebook. But these technologies don’t run themselves. When you’re faced with a system that is slowing down, you’ll need the right tools to diagnose and troubleshoot the problem. MySQL has a huge community following, and that means scores of great tools for your toolbox. Here are 7 ways to troubleshoot MySQL.
Some of the high-profile companies affected by Amazon’s April 2011 outage could have recovered had they kept a backup of their entire site outside of the cloud. With any hosting provider, whether a managed traditional data center or a cloud provider, alternate backups are always a good idea. A MySQL logical backup and/or incremental backup can be copied regularly offsite or to an alternate cloud provider. That’s real insurance!
Deploying in the Amazon cloud is touted as a great way to achieve high scalability while paying only for the computing power you use. How do you get the best scalability from the technology?
In search of a good book on Chef itself, I picked up this new title from O’Reilly. It’s one of their new-format books, small in size, only 75 pages.
There was some very good material in this book. Mr. Nelson-Smith’s writing style is good, readable, and informative. The discussion of risks of infrastructure as code was instructive. With the advent of APIs to build out virtual data centers, the idea of automating every aspect of systems administration, and building infrastructure itself as code is a new one. So an honest discussion of the risks of such an approach is bold and much needed. I also liked the introduction to Chef itself, and the discussion of installation.
Chef isn’t really the main focus of this book, unfortunately. The book spends a lot of time introducing us to agile development, and specifically test-driven development. While these are lofty goals, and this is the first treatment I’ve seen of the topic in relation to provisioning cloud infrastructure, I did feel too much time was spent on it.
Amazon Web Services is a division of Amazon the bookseller, but this part of the business is devoted solely to infrastructure and internet servers. These are the building blocks of data centers, the workhorses of the internet. AWS’s cloud computing offerings allow a business to set up, or “spin up” in the jargon of cloud computing, new compute resources at will. Need a small single-CPU 32-bit Ubuntu server with two 20G disks attached? You’re one command and 30 seconds away!
As we discussed previously, infrastructure provisioning has evolved dramatically over the past fifteen years, from something that took time and cost a lot to the fast, automatic process it is today with cloud computing. This has brought with it a dramatic culture shift in the way systems administration is done: from a fairly manual process of physical machines and software configuration, where new services took weeks to set up, to a scriptable and automatable process that takes seconds.
This new realm of cloud computing infrastructure and provisioning is called Infrastructure as a Service, or IaaS, and Amazon Web Services is one of the largest providers of such compute resources. They’re not the only ones, of course. Others include:
- Rackspace Cloud
Cloud computing is still in its infancy, but it is growing quickly. Amazon itself had a major data center outage in April, which we discussed in detail. It sent some hot internet startups into a tailspin!
IOPS, or I/O operations per second, are an attempt to standardize comparison of disk speeds across different environments. When you turn on a computer, everything must be read from disk, but thereafter much is kept in memory. Applications, however, typically read from and write to disk frequently. When you move to enterprise-class applications, especially relational databases, a lot of disk I/O is happening, so the performance of your disk resources is crucial.
For the basic single SATA drive you might have in a server or laptop, you can typically get 30-40 IOPS. These numbers vary depending on whether you are doing random versus sequential reads or writes. Picture the needle on a vinyl record: the surface moves faster under the needle near the outer edge and slower near the center. The same thing is happening with the magnetic head inside your hard drive.
In the Amazon EC2 environment there is a lot of variability in EBS performance. You can stripe across four separate EBS volumes, which will sit in four different locations on the underlying storage, and you’ll get a big boost in disk I/O. Disk performance will also vary across the m1.small, m1.large and m1.xlarge instance types, with the larger instances getting the lion’s share of network bandwidth, and so better disk I/O performance. But in the end your best EBS performance will be in the range of 500-1000 IOPS. That’s not huge by physical hardware standards, so an extremely disk-intensive application will probably not perform well in the Amazon cloud.
Still, the economic pressures and the infrastructure and business flexibility continue to push cloud computing adoption, so expect the trend to continue.
A lot of technical forums and discussions have highlighted the limitations of EC2, showing how it loses on performance when compared to physical servers of equal cost. They argue that you can get much more hardware and bigger iron for the same money, so it seems foolhardy to turn to the cloud. Why the mad rush to the cloud, then? If all you’re looking at is performance, it might seem odd indeed. But look at it another way: if performance is not as good, then performance is clearly not the driving factor in cloud adoption.
CIOs and CTOs are often asking questions more along the lines of, “Can we deploy in the cloud and settle with the performance limitations, and if so how do we get there?”
Another question, “Is it a good idea to deploy your database in the cloud?” It depends! Let’s take a look at some of the strengths and weaknesses, then you decide.
8 big strengths of the cloud
- Flexibility in disaster recovery – it becomes a script, no need to buy additional hardware
- Easier roll out of patches and upgrades
- Reduced operational headache – scripting and automation becomes central
- Uniquely suited to seasonal traffic patterns – keep online only the capacity you’re using
- Low initial investment
- Auto-scaling – set thresholds and deploy new capacity automatically
- Easy compromise response – take server offline and spinup a new one
- Easy setup of dev, qa & test environments
Some challenges with deploying in the cloud
- Big cultural shift in how operations is done
- Lower SLAs and less reliable virtual servers – mitigate with automation
- No perimeter security – new model for managing & locking down servers
- Where is my data? — concerns over compliance and privacy
- Variable disk performance – can be problematic for MySQL databases
- New procurement process can be a hurdle
Many of these challenges can be mitigated. The promise of infrastructure deployed in the cloud is huge, so gradual adoption is perhaps the best option for many firms. Mitigate the weaknesses of the cloud as follows:
- Use encrypted filesystems and backups where necessary
- Keep offsite backups in-house or at an alternate cloud provider
- Mitigate variable EBS performance by caching at every layer of your application stack
- Employ configuration management & automation tools such as Puppet & Chef
Now that we’ve had a chance to take a deep breath after last week’s AWS outage, I’ll offer some comments of my own. Hopefully just enough time has passed to begin to have a broader view, and put events in perspective.
Despite what some reports may have announced, Amazon wasn’t down; rather, a small part of Amazon Web Services went down. A failure? Yes. Beyond their service level agreement of 99.95%? Yes as well. Survivable? Yes to this last question too.
Learning From Failure
The business management conversation du jour is all about learning from failure, rather than trying to avoid it. Harvard Business Review’s April issue headlined with “The Failure Issue – How to Understand It, Learn From It, and Recover From It”. The Economist’s April 16th issue had some similarly interesting pieces, one by Schumpeter, “Fail often, fail well”, and another in the April 23rd issue, “Lessons from Deepwater Horizon and Fukushima”.
With all this talk of failure there is surely one takeaway: complex systems will fail, and it is in anticipating that failure that we gain the most. Let’s stop howling and look at how to handle these situations intelligently.
How Do You Rebuild A Website?
In the cloud you will likely need two things: (a) scripts to rebuild all the components in your architecture – spin up servers, fetch source code, fetch software and configuration files, configure load balancers and mount your database – and, more importantly, (b) a database backup from which you can rebuild your current dataset.
Want to stick with EC2? Build out your infrastructure in an alternate availability zone or region and you’re back up and running in hours. Better yet, have an alternate cloud provider on hand to handle these rare outages. The choice is yours.
Mitigate risk? Yes indeed: failure is more common in the cloud, but recovery is also easier. Failure should push you toward best practices and force discipline in deployments, not make you more of a gunslinger!
Want to see an extreme example of how this can play in your favor? Read Jeff Atwood’s discussion of the so-called Chaos Monkey, a component whose sole job is to randomly kill off servers in the Netflix environment. That type of gunslinging will surely keep everyone on their toes! Here’s a Wired article that discusses Chaos Monkey.
George Reese of enStratus discusses the recent failure at length. Though I would argue with calling Amazon’s outage the Cloud’s Shining Moment, all of his points are wise ones, and this is the direction we should all be moving.
Going The Way of Commodity Hardware
Though it is still not obvious to everyone, I’ll spell it out loud and clear. Like it or not, the cloud is coming. Look at these numbers.
Furthermore, the recent outage also highlights how much, and how many, internet sites rely on cloud computing and Amazon EC2.
Way back in 2001 I authored an O’Reilly book called “Oracle and Open Source”. In it I discussed the technologies I was seeing in the real world: Oracle on the backend and Linux, Apache, and PHP, Perl or some other language on the frontend. These were the technologies that startups were using. They were fast, cheap and, with the right smarts, reliable too.
Around that time Oracle smelled the coffee and ported its enterprise database to Linux. The equation for them was simple: customers that were previously paying tons of money to their good friend and confidant Sun for hardware could now spend a tenth as much on hardware and shift a lot of that leftover cash to – you guessed it – Oracle! The hardware wasn’t as good, but who cares, when you can get a lot more of it?
Despite a long-entrenched and trusted brand like Sun being better and more reliable, guess what? Folks still switched to commodity hardware. Now this is so obvious that no one questions it. The same trend is happening with cloud computing.
Performance is variable, disk I/O can be iffy, and, as the recent outage illustrates front and center, the servers and network can crash at any moment. Who in their right mind would want to move to this platform?
If that’s the question you’re stuck on, you’re still stuck on the old model. You have not truly comprehended the power of building infrastructure with code, of provisioning through automation, and of really managing those components as software. Just as the internet itself can route around political strife and network outages, so too does cloud computing bring that power to mom & pop web shops.
- Have existing investments in hardware? Slow and cautious adoption makes most sense for you.
- Have seasonal traffic variations? An application like this is uniquely suited to the cloud. In fact some gaming applications, which must scale to 10x or 100x the servers under load, are newly solvable with the advent of cloud computing.
- Are you currently paying a lot for disaster recovery systems that primarily lie idle? Script your infrastructure for rebuilding from bare metal, and save that part of the budget for more useful projects.
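The arithmetic behind that seasonal autoscaling scenario is simple capacity math. Here’s a minimal sketch; the function name and the per-server capacity figure are illustrative assumptions, not part of any AWS API:

```python
import math

def servers_needed(requests_per_sec, per_server_capacity, minimum=1):
    """Return how many app servers to keep online for the current load.

    per_server_capacity is the requests/sec one instance handles
    comfortably -- a number you'd derive from your own load tests.
    """
    needed = math.ceil(requests_per_sec / per_server_capacity)
    return max(minimum, needed)

# quiet season: one server suffices; peak season: capacity grows 100x
print(servers_needed(50, 100))      # -> 1
print(servers_needed(10000, 100))   # -> 100
```

In practice you’d wire thresholds like this into AWS Auto Scaling policies rather than compute them by hand, but the principle is the same: capacity tracks load, and idle servers go away.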
Also find Sean Hull’s ramblings on twitter @hullsean.
There are a lot of considerations for deploying MySQL in the Cloud. Some concepts and details won’t be obvious to DBAs used to deploying on traditional servers. Here are eight best practices which will certainly set you off on the right foot.
This article is part of a multi-part series Intro to EC2 Cloud Deployments.
Master-Slave replication is easy to set up, and provides a hot online copy of your data. One or more slaves can also be used for scaling your database tier horizontally.
Master-Master active/passive replication can also be used to bring higher uptime, and allow some operations such as ALTER statements and database upgrades to be done online with no downtime. The secondary master can be used for offloading read queries, and additional slaves can also be added as in the master-slave configuration.
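A minimal my.cnf sketch for a master and one slave might look like the following; the server IDs and log file names are illustrative, so adjust them to your environment:

```ini
# master my.cnf
[mysqld]
server-id = 1
log-bin   = mysql-bin

# slave my.cnf (on a separate server)
[mysqld]
server-id = 2
relay-log = relay-bin
read_only = 1    # protect the slave from stray application writes
```

With binary logging enabled on the master, the slave is pointed at it with CHANGE MASTER TO and started with START SLAVE.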
Caution: MySQL’s replication can drift silently out of sync with the master. If you’re using statement based replication with MySQL, be sure to perform integrity checking to make your setup run smoothly. Here’s our guide to bulletproofing MySQL replication.
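As a quick spot check you can compare table checksums yourself; Percona’s pt-table-checksum automates this far more thoroughly. The table names below are placeholders:

```sql
-- run the same statement on the master and on each slave,
-- then compare the results by hand
CHECKSUM TABLE mydb.orders, mydb.customers;
```

Matching checksums on master and slave suggest the tables are in sync; a mismatch means it’s time to rebuild or resync that slave.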
You’ll want to create an AWS security group for databases which opens port 3306 – not to the internet at large, but only to your AWS-defined webserver security group. You may also decide to use a single bastion box and security group which allows port 22 (ssh) from the internet at large. All ssh connections then come in through that box, and the internal security groups (database & webserver groups) should only allow port 22 connections from that bastion security group.
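With the classic EC2 command line tools, such a rule might be sketched as below; the group names and the twelve-digit account ID are placeholders:

```shell
# allow MySQL traffic only from the webserver group, never from 0.0.0.0/0
ec2-authorize db-group -P tcp -p 3306 -o web-group -u 123456789012
```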
When you setup replication, you’ll be creating users and granting privileges. You’ll need to grant to the wildcard ‘%’ hostname designation as your internal and external IPs will change each time a server dies. This is safe since you expose your database server port 3306 only to other AWS security groups, and no internet hosts.
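Such a replication grant might be sketched as follows; the user name and password are placeholders:

```sql
-- wildcard '%' host is acceptable here because security groups
-- restrict who can reach port 3306 in the first place
CREATE USER 'repl'@'%' IDENTIFIED BY 'choose_a_strong_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
```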
You may also decide to use an encrypted filesystem for your database mount point, your database backups, and/or your entire filesystem. Be particularly careful of your most sensitive data. If compliance requirements dictate, choose to store very sensitive data outside of the cloud and secure network connections to incorporate it into application pages.
Be particularly careful of your AWS logins. The password recovery mechanism in Amazon Web Services is all that prevents an attacker from controlling your entire infrastructure, after all.
There are a few ways to back up a MySQL database. By far the easiest way in EC2 is the AWS snapshot mechanism for EBS volumes. Keep in mind you’ll want to encrypt these snapshots, as S3 may not be as secure as you might like. Although you’ll need to lock your MySQL tables during the snapshot, it typically takes only a few seconds before you can release the database locks.
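The lock-snapshot-unlock dance can be sketched like this; the volume ID is a placeholder, and in a real script you would drive both halves from one session so the lock is held throughout:

```sql
FLUSH TABLES WITH READ LOCK;   -- quiesce writes for a consistent snapshot
SHOW MASTER STATUS;            -- note the binlog position for recovery
-- from another shell, trigger the snapshot,
-- e.g. ec2-create-snapshot vol-xxxxxxxx
UNLOCK TABLES;                 -- locks held only for those few seconds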
Now snapshots are great, but they can only be used within the AWS environment, so it also behooves you to be performing additional backups, and moving them offsite either to another cloud provider or to your own internal servers. For this your choices are logical backups or hotbackups.
mysqldump can perform logical backups for you. These backups perform a SELECT * on every table in your database, so they can take quite some time, and they push the warm blocks out of your InnoDB buffer cache. What’s more, rebuilding a database from a dump can take quite some time. All these factors should be considered before deciding a dump is the best option for you.
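If you do go the dump route, a typical invocation looks like this; the backup path is an assumption, and the compressed file can then be copied offsite or to another cloud provider:

```shell
# --single-transaction gives a consistent InnoDB dump
# without holding long table locks
mysqldump --single-transaction --routines --triggers --all-databases \
  | gzip > /backups/mysql-$(date +%F).sql.gz
```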
xtrabackup is a great open source tool available from Percona. It can perform hot backups of all MySQL tables, including MyISAM, InnoDB and XtraDB if you use them. This means the database stays online, without long table locks, and with a smaller, less destructive hit to your buffer cache and database server as a whole. The hot backup builds a complete copy of your datadir, so bringing up a server from a backup involves setting the datadir in your my.cnf file and starting MySQL.
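Using the innobackupex wrapper that ships with xtrabackup, a hot backup might be sketched in two steps; the backup directory is illustrative, and the tool creates a timestamped subdirectory under it:

```shell
# take the hot backup while MySQL stays online
innobackupex /backups
# apply the InnoDB log so the copy is consistent and usable as a datadir
innobackupex --apply-log /backups/<timestamped-directory>
```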
4. Disk I/O
Obviously disk I/O is of paramount importance for any database server, including MySQL. In AWS you do not want to use ephemeral instance-store storage at all. Be sure your AMI is built on EBS, and further, use a separate EBS mount point for the database datadir.
An even better configuration than the above, though slightly more complex to set up, is a software RAID stripe across a number of EBS volumes. Linux’s software RAID will create an md0 device file on which you then create a filesystem – use xfs. Keep in mind that this arrangement requires some care during snapshotting, but it can still work well. The performance gains are well worth it!
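A sketch of that stripe with Linux’s mdadm follows; the device names are examples and will differ depending on how you attached the EBS volumes:

```shell
# stripe four attached EBS volumes into one RAID 0 array
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
  /dev/sdh /dev/sdi /dev/sdj /dev/sdk
mkfs.xfs /dev/md0
mount -t xfs /dev/md0 /var/lib/mysql
```

Note that with a stripe like this you must snapshot all four volumes together at a consistent moment, which is the snapshotting care mentioned above.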
5. Network & IPs
When configuring Master & Slave replication, be sure to use the internal or private IPs and internal domain names so as not to incur additional network charges. The same goes for your webservers which will point to your master database, and one or more slaves for read queries.
6. Availability Zones
Amazon Web Services provides a tremendous leap in options for high availability. Take advantage of availability zones by putting one or more of your slaves in a separate zone where possible. Interestingly if you ensure the use of internal or private IP addresses and names, you will not incur additional network charges to servers in other availability zones.
7. Disaster Recovery
EC2 servers are out of the gate *NOT* as reliable as traditional servers. This should send shivers down your spine if you’re trying to treat AWS like a traditional hosted environment. You shouldn’t. It should force you to get serious about disaster recovery. Build bulletproof scripts to spin up your servers from custom-built AMIs, and test them. Finally you’re taking disaster recovery as seriously as you always wanted to. Take advantage of availability zones as well, and test various failure scenarios.
8. Vertical and Horizontal Scaling
Interestingly, vertical scaling can be done quite easily in EC2. If you start with a 64-bit AMI, you can stop such a server without losing the root EBS mount. From there you can start a new, larger instance in EC2 and use that existing EBS root volume, and voila – you’ve VERTICALLY scaled your server in place. This is quite a powerful feature at the system administrator’s disposal. Devops has never been smarter! You can do the same to scale *DOWN* if you are no longer using all the power you thought you’d need. Combine this phenomenal AWS feature with a MySQL master-master active/passive configuration, and you can scale vertically with ZERO downtime. Powerful indeed.
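With the classic EC2 command line tools the resize might be sketched as below; the instance ID is a placeholder, and the EBS root volume stays attached throughout:

```shell
ec2-stop-instances i-abcd1234
ec2-modify-instance-attribute i-abcd1234 --instance-type m1.xlarge
ec2-start-instances i-abcd1234
```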
We wrote an EC2 Autoscaling Guide for MySQL that you should review.
Along with vertical scaling, you’ll also want the ability to scale out – that is, add more servers to the mix as required – and scale back when your needs reduce. Build smarts into your application so you can point SELECT queries to read-only slaves. Many web applications do the bulk of their work in SELECTs, so being able to scale those horizontally is very powerful and compelling. By baking this logic into the application you also allow the application to check for slave lag: if your slave is lagging behind the master you can get stale or missing data, and in those cases your application can choose to go to the master for the freshest data.
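The routing decision described above can be sketched in a few lines. This is a hypothetical helper, not code from any framework; the lag value would come from the Seconds_Behind_Master field of SHOW SLAVE STATUS, and the tolerance threshold is an assumption you’d tune:

```python
MAX_SLAVE_LAG = 5  # seconds of replication delay we tolerate (tunable)

def choose_server(query, slave_lag_seconds, max_lag=MAX_SLAVE_LAG):
    """Route SELECTs to a read-only slave unless it has fallen too far behind.

    slave_lag_seconds comes from monitoring the slave (SHOW SLAVE STATUS,
    Seconds_Behind_Master); None means replication is broken or unknown.
    """
    is_read = query.lstrip().lower().startswith("select")
    if not is_read:
        return "master"   # all writes go to the master
    if slave_lag_seconds is None or slave_lag_seconds > max_lag:
        return "master"   # slave is stale or broken: fall back to the master
    return "slave"

print(choose_server("SELECT * FROM orders", 2))   # -> slave
print(choose_server("UPDATE orders SET x=1", 0))  # -> master
```

The design choice worth noting is the fallback: when lag is unknown or too high, reads degrade gracefully to the master rather than serving stale data.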
What about RDS?
Wondering whether RDS is right for you? It may be. We wrote a comprehensive guide to evaluating RDS over MySQL.
If you read this far, you should grab our newsletter!