Also find Sean Hull’s ramblings on twitter @hullsean.
Autoscaling your webserver tier is typically straightforward. Image your apache server with source code or without, then sync down files from S3 upon spinup. Roll that image into the autoscale configuration and you’re all set.
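The sync-on-spinup step can be sketched roughly as follows, assuming the AWS command line tool is installed on the image. The bucket path and docroot are placeholders, and AWS_CLI is a variable introduced here only so the command can be dry-run; none of these names come from our actual setup.

```shell
#!/bin/sh
# Sketch of a boot-time sync for an autoscaled webserver.
# AWS_CLI is a hypothetical override so the command can be dry-run with "echo".
sync_docroot() {
  bucket_path=$1   # placeholder, e.g. s3://example-bucket/site
  docroot=$2       # placeholder, e.g. /var/www/html
  ${AWS_CLI:-aws} s3 sync "$bucket_path" "$docroot"
}
```

Calling `sync_docroot s3://example-bucket/site /var/www/html` from a boot script such as /etc/rc.local would pull the latest files before the webserver starts taking traffic.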
With the database tier though, things can be a bit tricky. The typical configuration we see is to have a single master database where your application writes. But scaling out or horizontally on Amazon EC2 should be as easy as adding more slaves, right? Why not automate that process?
Below we’ve set out to answer some of the questions you’re likely to face when setting up slaves against your master. We’ve included instructions on building an AMI that automatically spins up as a slave. Fancy!
- How can I autoscale my database tier?
- Build an auto-starting MySQL slave against your master.
- Configure those to spinup. Amazon’s autoscaling loadbalancer is one option, another is to use a roll-your-own solution, monitoring thresholds on servers, and spinning up or dropping off slaves as necessary.
- Does an AWS snapshot capture subvolume data or just the SIZE of the attached volume?
- How do I freeze MySQL during AWS snapshot?
- How do I build an AMI mysql slave that autoconnects to master?
- Configure mysql to use your /data EBS mount.
- Set all your my.cnf settings including server_id
- Configure the instance as a slave in the normal way.
- When using GRANT to create the ‘rep’ user on master, specify the host with a subnet wildcard. For example ‘10.20.%’. That will subsequently allow any 10.20.x.y servers to connect and replicate.
- Point the slave at the master.
- When all is running properly, edit the my.cnf file and remove server_id. Don’t restart mysql.
- Freeze the filesystem as described above.
- Use the Amazon console, ylastic or API call to create your new image.
- Test it of course, to make sure it spins up, sets server_id and connects to master.
- Make a change in the test schema, and verify that it propagates to all slaves.
- How do I set server_id uniquely?
- Can you easily slave off of a slave? How?
- First enable slave updates. The setting is not dynamic, so if you don’t already have it set, you’ll have to restart your slave.
- Get an initial snapshot of your slave data. You can do that the locking way:
- On the slave, seed the database with your dump created above.
- Now point your slave to the original slave.
- Slave master is set as an IP address. Is there another way?
- Set this parameter in my.cnf
- Remove entries in mysql.user table where hostname is not an IP address. Those entries will be invalid for authentication after setting the above parameter.
- Doesn’t RDS take care of all of this for me?
- Simpler administration. Nuts and bolts are handled for you.
- Push-button replication. No more struggling with the nuances and issues of MySQL’s replication management.
- No access to the slow query log.
- Locked in downtime window
- Can’t use Percona Server to host your MySQL data.
- No access to filesystem, server metrics & command line.
- You are beholden to Amazon’s support services if things go awry.
- You can’t replicate to a non-RDS database.
If you have an attached EBS volume and you create a new AMI off of that, you will capture the entire root volume, plus your attached volume data. In fact, we find this a great way to create an auto-building slave in the cloud.
mysql> flush tables with read lock;
mysql> system xfs_freeze -f /data
At this point you can use the Amazon web console, ylastic, or the ec2-create-image API call from the command line to create your image. When the server you are imaging restarts, as it will by default, it will come back up with the /data partition unfrozen and mysql’s tables unlocked again. Voila!
If you’re not using xfs for your /data filesystem, you should be. It’s fast! The xfsprogs docs seem to indicate this may also work with foreign filesystems. Check the docs for details.
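If you take a snapshot that does not reboot the instance, nothing unfreezes the filesystem for you. The sketch below is one way to handle that, not something we have described above: it prints the statements for a single mysql session that locks, freezes, runs a snapshot command, and then releases everything. The function name and the snapshot command argument are hypothetical. The important detail is that the read lock is dropped as soon as the session ends, so every step has to stay in the same session.

```shell
#!/bin/sh
# Print the statements for ONE mysql session that locks, freezes,
# runs a snapshot command, then unfreezes and unlocks.
# The lock taken by FLUSH TABLES WITH READ LOCK is released when the
# session exits, so all steps must run inside that same session.
freeze_session_sql() {
  mount_point=$1    # e.g. /data
  snapshot_cmd=$2   # your own snapshot command (placeholder)
  cat <<EOF
FLUSH TABLES WITH READ LOCK;
system xfs_freeze -f $mount_point
system $snapshot_cmd
system xfs_freeze -u $mount_point
UNLOCK TABLES;
EOF
}
```

Review the output of `freeze_session_sql /data "your-snapshot-command"` first. Whether your mysql client honors `system` when the statements are piped in rather than typed is worth verifying on a test box before relying on it.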
Install mysql_serverid script below.
As you hopefully already know, in a MySQL replication environment each node requires a unique server_id setting. In my Amazon Machine Images, I want the server to start up and, if it doesn’t find server_id in the /etc/my.cnf file, add it there, correctly! Is that so much to ask?
Here’s what I did. Fire up your editor of choice and drop in this bit of code:
#!/bin/sh
if grep -q "server_id" /etc/my.cnf
then
  : # do nothing - it's already set
else
  # extract numeric component from hostname - should be the internal IP in Amazon environment
  export server_id=`echo $HOSTNAME | sed 's/[^0-9]*//g'`
  echo "server_id=$server_id" >> /etc/my.cnf
  # restart mysql (assumed init script path; adjust for your distribution)
  /etc/init.d/mysql restart
fi
Save that snippet at /root/mysql_serverid. Also be sure to make it executable:
$ chmod +x /root/mysql_serverid
Then just append it to your /etc/rc.local file with an editor or echo:
$ echo "/root/mysql_serverid" >> /etc/rc.local
Assuming your my.cnf file does *NOT* contain the server_id setting when you re-image, then it’ll set this automagically each time you spinup a new server off of that AMI. Nice!
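One caveat with stripping every non-digit from the hostname: two different addresses can collapse to the same number. The sketch below demonstrates that, and shows one possible alternative that packs the four IPv4 octets into a single integer. Both function names are made up for illustration, and the alternative is our suggestion here rather than part of the script above.

```shell
#!/bin/sh
# Digits-only approach, as in the script above: strip every non-digit.
hostname_digits() {
  echo "$1" | sed 's/[^0-9]*//g'
}

# Hypothetical alternative: pack the four IPv4 octets into one integer,
# so distinct addresses always give distinct values.
ip_to_server_id() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split "a.b.c.d" into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}
```

For example, the hostnames ip-10-2-30-4 and ip-10-23-0-4 both reduce to 102304 with the digits-only approach, while `ip_to_server_id 10.2.30.4` and `ip_to_server_id 10.23.0.4` give different values.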
It’s not terribly different from slaving off of a normal master.
mysql> flush tables with read lock;
mysql> show master status\G
mysql> system mysqldump -A > full_slave_dump.mysql
mysql> unlock tables;
You may also choose to use Percona’s excellent xtrabackup utility to create hotbackups without locking any tables. We are very lucky to have an open-source tool like this at our disposal. MySQL Enterprise Backup from Oracle Corp can also do this.
$ mysql < full_slave_dump.mysql
mysql> change master to master_user='rep', master_password='rep', master_host='192.168.0.1', master_log_file='server-bin-log.000004', master_log_pos=399;
mysql> start slave;
mysql> show slave status\G
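Before pointing a new slave at an existing slave, it helps to confirm the intermediate slave has both binary logging and slave updates enabled in its my.cnf, since it can only pass changes along if it writes them to its own binary log. The check below is a rough sketch with a hypothetical function name. It only looks for the option names and does not interpret their values.

```shell
#!/bin/sh
# Succeed only if the given my.cnf mentions both settings an intermediate
# slave needs. Option names may use dashes or underscores, so accept both.
# Note: this does not interpret values (e.g. "log_slave_updates = 0").
relay_ready() {
  cnf=$1
  grep -Eq '^[[:space:]]*log[-_]bin[[:space:]]*(=|$)' "$cnf" &&
  grep -Eq '^[[:space:]]*log[-_]slave[-_]updates[[:space:]]*(=|$)' "$cnf"
}
```

A call such as `relay_ready /etc/my.cnf || echo "enable log-bin and log-slave-updates first"` makes a handy pre-flight step.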
It’s possible to use hostnames in MySQL replication, however it’s not recommended. Why? Because of the wacky world of DNS. Suffice it to say MySQL has to do a lot of work to resolve those names into IP addresses. A hiccup in DNS can potentially interrupt all MySQL services, as sessions will fail to authenticate. To avoid this problem do two things:
skip_name_resolve = true
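To find the mysql.user entries that stop working once name resolution is off, you can run each Host value through a simple test like the sketch below. The function name is hypothetical, and it only understands plain IPv4 addresses with optional wildcards. It also treats localhost as acceptable, which matches our understanding of how MySQL handles that value, so check the documentation for your version.

```shell
#!/bin/sh
# Succeed if a mysql.user Host value does not depend on DNS:
# IPv4 digits and dots, optionally with % or _ wildcards, or "localhost".
# IPv6 and netmask notation are not handled by this sketch.
host_ok_without_dns() {
  case $1 in
    localhost)    return 0 ;;
    '')           return 1 ;;   # empty value
    *[!0-9.%_]*)  return 1 ;;   # any other character means a hostname
    *)            return 0 ;;
  esac
}
```

Anything that fails the check, such as db1.example.com, is a candidate for removal or for replacement with an IP-based entry like 10.20.%.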
RDS is Amazon’s Relational Database Service which is built on MySQL. Amazon’s RDS solution presents MySQL as a service which brings certain benefits to administrators and startups:
Simplicity of administration of course has its downsides. Depending on your environment, these may or may not be dealbreakers.
This is huge. The single best tool for troubleshooting slow database response is this log file. Queries are a large part of keeping a relational database server healthy and happy, and without this facility, you are severely limited.
When you sign up for RDS, you must define a thirty minute maintenance window. This is a weekly window during which your instance *COULD* be unavailable. When you host yourself, you may not require as much downtime at all, especially if you’re using master-master mysql and a zero-downtime configuration.
You won’t be able to do this in RDS. Percona Server is a high performance distribution of MySQL which typically rolls in serious performance tweaks and updates before they make it to the community edition. Well worth the effort to consider it.
Again for troubleshooting problems, these are crucial. Gathering data about what’s really happening on the server is how you begin to diagnose and troubleshoot a server stall or pileup.
That’s because you won’t have access to the raw iron to diagnose and troubleshoot things yourself. Want to call in an outside consultant to help you debug or troubleshoot? You’ll have your hands tied without access to the underlying server.
Have your own datacenter connected to Amazon via VPC? Want to replicate to a cloud server? RDS won’t fit the bill. You’ll have to roll your own – as we’ve described above. And if you want to replicate to an alternate cloud provider, again RDS won’t work for you.
One of the great things about the Internet is how it has made it easier to put great ideas into practice. Whether the ideas are about improving people’s lives or a new way to sell an old-fashioned product, there’s nothing like a good little startup tale of creative disruption to deliver us from something old and tired.
We work with a lot of startup firms and we love being part of the atmosphere of optimism and ingenuity, peppered with a bit of youthful zeal – something very indie-rock-and-roll about it. But whether they are just starting out or already picking up pace, every startup faces the same challenges to scale a business. Recently, we were reminded of this when we watched Inc’s video interview with Birchbox founders, Hayley Barna and Katia Beauchamp.
One very strong case for cloud computing is that it can satisfy applications with seasonal traffic patterns. One way to test the advantages of the cloud is through a hybrid approach.
Cloud infrastructure can be built completely through scripts. You can spinup specific AMIs or machine images, automatically install and update packages, install your credentials, startup services, and you’re running.
All of these steps can be performed in advance of your need at little cost. Simply build and test. When you’re finished, shutdown those instances. What you walk away with is scripts. What do we mean?
The power here is that you carry zero costs for that burst capacity until you need it. You’ve already built the automation scripts, and have them in place. When your capacity planning warrants it, spinup additional compute power, and watch your internet application scale horizontally. Once your busy season is over, scale back and disable your usage until you need it again.
Amazon Web Services is a division of Amazon the bookseller, but this part of the business is devoted solely to infrastructure and internet servers. These are the building blocks of data centers, the workhorses of the internet. AWS’s offering of Cloud Computing solutions allows a business to setup or “spinup” in the jargon of cloud computing, new compute resources at will. Need a small single cpu 32bit ubuntu server with two 20G disks attached? One command and 30 seconds away, and you can have that!
As we discussed previously, Infrastructure Provisioning has evolved dramatically over the past fifteen years, from something that took time and cost a lot to the fast, automatic process it is today with cloud computing. This has also brought with it a dramatic culture shift in the way that systems administration is done: from a fairly manual process of configuring physical machines and software, one that took weeks to set up new services, to a scriptable, automatable process that can take seconds.
This new realm of cloud computing infrastructure and provisioning is called Infrastructure as a Service or IaaS, and Amazon Web Services is one of the largest providers of such compute resources. They’re not the only ones of course. Others include:
- Rackspace Cloud
Cloud Computing is still in its infancy, but is growing quickly. Amazon themselves had a major data center outage in April that we discussed in detail. It sent some hot internet startups into a tailspin!
A lot of technical forums and discussions have highlighted the limitations of EC2 and how it loses on performance when compared to physical servers of equal cost. They argue that you can get much more hardware and bigger iron for the same money. So it then seems foolhardy to turn to the cloud. Why this mad rush to the cloud then? Of course if all you’re looking at is performance, it might seem odd indeed. But another way of looking at it is, if performance is not as good, it’s clearly not the driving factor to cloud adoption.
CIOs and CTOs are often asking questions more along the lines of, “Can we deploy in the cloud and settle with the performance limitations, and if so how do we get there?”
Another question, “Is it a good idea to deploy your database in the cloud?” It depends! Let’s take a look at some of the strengths and weaknesses, then you decide.
8 big strengths of the cloud
- Flexibility in disaster recovery – it becomes a script, no need to buy additional hardware
- Easier roll out of patches and upgrades
- Reduced operational headache – scripting and automation becomes central
- Uniquely suited to seasonal traffic patterns – keep online only the capacity you’re using
- Low initial investment
- Auto-scaling – set thresholds and deploy new capacity automatically
- Easy compromise response – take server offline and spinup a new one
- Easy setup of dev, qa & test environments
Some challenges with deploying in the cloud
- Big cultural shift in how operations is done
- Lower SLAs and less reliable virtual servers – mitigate with automation
- No perimeter security – new model for managing & locking down servers
- Where is my data? — concerns over compliance and privacy
- Variable disk performance – can be problematic for MySQL databases
- New procurement process can be a hurdle
Many of these challenges can be mitigated. The promise of infrastructure deployed in the cloud is huge, so moving ahead with gradual adoption is perhaps the best option for many firms. To mitigate the weaknesses of the cloud:
- Use encrypted filesystems and backups where necessary
- Also keep offsite backups inhouse or at an alternate cloud provider
- Mitigate against EBS performance – cache at every layer of your application stack
- Employ configuration management & automation tools such as Puppet & Chef
Look at your website’s current traffic patterns, pageviews or visits per day, and compare that to your server infrastructure. In a nutshell, your current capacity is the ceiling your traffic could grow to and still be supported by your current servers. Think of it as the horsepower of your application stack – load balancer, caching server, webserver and database.
Capacity planning seeks to estimate when you will reach capacity with your current infrastructure by doing load testing, and stress testing. With traditional servers, you estimate how many months you will be comfortable with currently provisioned servers, and plan to bring new ones online and into rotation before you reach that traffic ceiling.
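As a back-of-the-envelope illustration of estimating how many months you have left, the sketch below assumes traffic grows at a steady compound monthly rate, which is a simplification. Real traffic is rarely that tidy. The function name and the example figures are invented for illustration.

```shell
#!/bin/sh
# Estimate months until traffic reaches the capacity ceiling,
# assuming steady compound monthly growth (a simplifying assumption):
#   months = ln(ceiling / current) / ln(1 + growth)
months_to_ceiling() {
  awk -v cur="$1" -v ceil="$2" -v g="$3" \
    'BEGIN { printf "%.1f\n", log(ceil / cur) / log(1 + g) }'
}
```

With 100,000 pageviews a day today, a load-tested ceiling of 250,000 and 10% monthly growth, `months_to_ceiling 100000 250000 0.10` gives roughly nine and a half months of headroom.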
Your reaction to capacity and seasonal traffic variations becomes much more nimble with cloud computing solutions, as you can script server spinups to match capacity and growth needs. In fact you can implement auto-scaling as well, setting rules and thresholds to bring additional capacity online – or offline – automatically as traffic dictates.
In order to be able to do proper capacity planning, you need good data. Pageviews and visits per day can come from your analytics package, but you’ll also need more complex metrics on what your servers are doing over time. Packages like Cacti, Munin, Ganglia, OpenNMS or Zenoss can provide you with very useful data collection with very little overhead to the server. With these in place, you can view load average, memory & disk usage, database or webserver threads and correlate all that data back to your application. What’s more with time-based data and graphs, you can compare changes to application change management and deployment data, to determine how new code rollouts affect capacity requirements.
With cloud-based hosting solutions, new servers can be provisioned and “spun up” with a few options on the command line. This opens a whole new dimension for infrastructure, allowing software scripts to bring new computing power into your web infrastructure.
Internet based applications often exhibit seasonal traffic patterns where traffic stays steady or grows slowly over a period, but then experiences a sharp spike in demand requiring much higher computing resources to meet customer demand.
Enter auto-scaling, an even more powerful feature of cloud-based offerings. Define roles for your webservers and database servers, set capacity rules that control how much traffic will trigger new servers to be rolled out, and watch your infrastructure scale automatically to meet the needs of your internet application.
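A capacity rule boils down to comparing a metric against thresholds. The sketch below is a minimal roll-your-own version of that decision, with a hypothetical function name and illustrative thresholds. A real setup would also want cooldown periods so servers aren't flapping in and out of rotation.

```shell
#!/bin/sh
# Decide a scaling action from an integer metric (for example CPU percent)
# and two thresholds. Names and threshold values are illustrative only.
scale_decision() {
  metric=$1
  scale_in_below=$2
  scale_out_above=$3
  if [ "$metric" -gt "$scale_out_above" ]; then
    echo "scale-out"
  elif [ "$metric" -lt "$scale_in_below" ]; then
    echo "scale-in"
  else
    echo "hold"
  fi
}
```

For instance, `scale_decision 85 30 70` says to add capacity, while a reading of 50 against the same thresholds leaves things as they are.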
Cloud Computing has a few varied meanings from API services such as twitter to web-based (read cloud-based) email services such as gmail and yahoo.
An even bigger tectonic shift is happening though, in the area of infrastructure and hosting, to cloud based solutions. No longer is provisioning a slow ordering process, followed by a multi-year contract and commitment with an associated high price tag. Now computing resources can be provisioned and “spun up” in seconds, even allowing for auto-scaling, bringing new computing resources online dynamically as seasonal traffic patterns demand.
- uniquely suited to applications with seasonal traffic requirements
- supports disaster recovery effectively for free
- allows temporary provisioning of test environments
- facilitates auto-scaling of bare metal servers
- no huge budgetary outlay, pay for only what you use
- bring up resources in seconds – supports true agile development
What’s more since cloud resources are all provisioned in software through an API, it encourages the treatment of infrastructure as a whole as software. Now the scripts to completely rebuild all of your systems, from spin-up, to package configuration to application configuration can all be done in software, and managed in version control.
Scalability in the cloud depends a lot on application design. Keep these important points in mind when you are designing your web application and you will scale much more naturally and easily in the cloud.
1. Think twice before sharding
- It increases your infrastructure and application complexity
- it reduces availability – more servers mean more outages
- have to worry about globally unique primary keys
2. Bake read/write database access into the application
- allows you to check for stale data, fallback to write master
- creates higher availability for read-only data
- gracefully degrade to read-only website functionality if master goes down
- horizontal scalability melds nicely with cloud infrastructure and IAAS
3. Save application state in the database
- avoid in-memory locking structures that won’t scale with multiple web application servers
- consider a database field for managing application locks
- consider stored procedures for isolating and insulating developers from db particulars
- a last updated timestamp field can be your friend
4. Consider Dynamic or Auto-scaling
- great feature of cloud, spinup new servers to handle load on-demand
- lean towards being proactive rather than reactive and measure growth and trends
- watch the procurement process closely lest it come back to bite you
5. Setup Monitoring and Metrics
- see trends over time
- spot application trouble and bottlenecks
- determine if your tuning efforts are paying off
- review a traffic spike after the fact
The cloud is not a silver bullet that can automatically scale any web application. Software design is still a crucial factor. Bake in these features with the right flexibility and foresight, and you’ll manage your website’s growth patterns with ease.
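To make point 2 concrete, here is a small sketch of the stale-data check with fallback to the write master. It assumes your application can read the Seconds_Behind_Master value from the replica. The function name, host names and lag limit are all hypothetical.

```shell
#!/bin/sh
# Pick where to send a read, given the replica's Seconds_Behind_Master.
# NULL (replication not running), any non-numeric value, or a lag above
# max_lag falls back to the master.
pick_read_host() {
  lag=$1
  max_lag=$2
  replica=$3
  master=$4
  case $lag in
    ''|*[!0-9]*) echo "$master" ;;   # NULL or non-numeric: treat as stale
    *) if [ "$lag" -le "$max_lag" ]; then
         echo "$replica"
       else
         echo "$master"
       fi ;;
  esac
}
```

So `pick_read_host 2 10 replica1 master1` sends the read to the replica, while a lag of 45 seconds or a NULL sends it to the master.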
Jeff Barr’s book on AWS is a very readable howto and a quick way to get started with EC2, S3, CloudFront, CloudWatch and SimpleDB. It is short on theory, but long on all the details of really getting your hands dirty. Learn how to:
- get started using the APIs to spinup servers
- create a load balancer
- add and remove application servers
- build custom AMIs
- create EBS volumes, attach them to your instances & format them
- snapshot EBS volumes
- use RAID with EBS
- setup CloudWatch to monitor your instances
- setup triggers with CloudWatch to enable AutoScaling
I would have liked to see examples in Chef rather than PHP, but hey you can’t have everything!