Your recent social media campaign has gone viral. It’s what you’ve been dreaming about, pinning your hopes on, and all of your hard work is now coming to fruition. Tens of thousands of internet users, hordes of them in fact, are now descending on your website. Only one problem: your site just went down.
That’s a situation you want to avoid. Luckily there are best practices for avoiding scenarios like the one I described. In engineering it’s termed “graceful degradation”: the site continues functioning, but with its heaviest features disabled.
Browsing Only, But Still Functioning
One way to do this is for your site to have a browsing-only mode. On the database side you can still be functioning with a read-only database. With a switch like that, your site will continue to function while pointed at any of your read-only replication slaves. What’s more, you can load balance across those easily, and keep your site up and running.
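As a rough sketch of what such a switch might look like in application code (the flag and handler names here are hypothetical):

```python
# Minimal sketch of a site-wide browsing-only switch. Reads are
# served as usual; writes are politely refused while the flag is up.
def handle_request(method, path, read_only=False):
    """Return an (http_status, body) pair for an incoming request."""
    if read_only and method in ("POST", "PUT", "DELETE"):
        return 503, "Site is temporarily browsing-only; please try again soon."
    return 200, "served %s %s" % (method, path)
```

In practice the `read_only` flag would come from a config file or environment variable that operations can flip without a deploy.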
In software development, decoupling involves breaking apart components or pieces of an application that should not depend on one another. One way to do this is to use a queuing system such as Amazon’s SQS to allow pieces of the application to queue up work to be done. This makes those pieces asynchronous, i.e. they’ll return right away. Another way is to expose services internal to your site through web services. These individual components can then be scaled out as needed. This makes them more highly available, and reduces the need to scale your memcache, webservers or database servers – the hardest ones to scale.
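A minimal sketch of the SQS pattern with boto3 might look like this (the queue URL and task names are placeholders; credentials come from the usual AWS configuration chain):

```python
# Sketch of decoupling via Amazon SQS: the web tier serializes a
# unit of work and enqueues it, then returns immediately. A separate
# worker process polls the queue and does the heavy lifting.
import json

def build_task(kind, payload):
    """Serialize a unit of work so any worker can pick it up later."""
    return json.dumps({"kind": kind, "payload": payload})

def enqueue(sqs_client, queue_url, task_body):
    # send_message returns quickly; the caller never waits on the work.
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=task_body)

# Example usage (requires boto3 and valid AWS credentials):
# import boto3
# sqs = boto3.client("sqs")
# enqueue(sqs, "https://sqs.us-east-1.amazonaws.com/123456789012/resize",
#         build_task("resize_image", {"image_id": 42}))
```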
Identify Features You Can Disable
Typically your application will have features that are superfluous, or that are not part of the core functionality. Perhaps you have star ratings, or some other components that are heavy. Work with the development and operations teams to identify the areas of the application that are heaviest, and that would warrant disabling if the site runs into heavy weather.
Once you’ve done all that, document how to disable and re-enable those features, so other team members will be able to flip the switches if necessary.
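One simple way to make those switches flippable is a feature-flag registry (the flag names here are hypothetical; in production the flags might live in a config file or in memcache so operations can flip them without a deploy):

```python
# Hypothetical feature-flag registry for disabling heavy features.
FLAGS = {"star_ratings": True, "related_items": True}

def feature_enabled(name):
    return FLAGS.get(name, False)

def set_feature(name, enabled):
    """The 'switch' other team members flip in an emergency."""
    FLAGS[name] = enabled

def render_product_page():
    parts = ["product details"]
    if feature_enabled("star_ratings"):  # heavy widget: skip under load
        parts.append("star ratings widget")
    return parts
```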
With traditional managed hosting solutions, we have best practices, we have business continuity plans, we have disaster recovery, we document our processes and all the moving parts in our infrastructure. At least we pay lip service to these goals, though from time to time we admit to getting sidetracked by bigger fish to fry, higher priorities and the emergency of the day. We add “fire drill” to our todo list, promising we’ll test restoring our backups. But too often it is only in an actual emergency that we find out whether we really have all the pieces backed up and can reassemble them properly.
Cloud Computing is different. These goals are no longer lofty ideals; they must be put into practice. Here’s why.
- Virtual servers are not as reliable as physical servers
- Amazon EC2 has a lower SLA than many managed hosting providers
- Devops introduces a new paradigm – infrastructure scripts can be version controlled
- EC2 environment really demands scripting and repeatability
- New flexibility and peace of mind
EC2 virtual servers can and will die. Your spinup scripts and infrastructure should treat this possibility not as some far-off anomalous event, but as a day-to-day concern. With proper scripts and testing of various scenarios, this becomes manageable. Use snapshots to back up EBS root volumes, and build spinup scripts with AMIs that have all the components your application requires. Then test, test and test again.
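A backup script along those lines might be sketched with boto3 like this (the volume ids are placeholders; error handling and snapshot retention are left out for brevity):

```python
# Sketch of backing up EBS volumes via snapshots with boto3.
def snapshot_volumes(ec2_client, volume_ids, description="nightly backup"):
    """Create one snapshot per volume and return the new snapshot ids."""
    snapshot_ids = []
    for volume_id in volume_ids:
        resp = ec2_client.create_snapshot(VolumeId=volume_id,
                                          Description=description)
        snapshot_ids.append(resp["SnapshotId"])
    return snapshot_ids

# Example usage (requires boto3 and AWS credentials):
# import boto3
# ec2 = boto3.client("ec2")
# snapshot_volumes(ec2, ["vol-0abc123", "vol-0def456"])
```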
Amazon EC2’s SLA – Only 99.95%
The computing industry throws the 99.999% or five-nines uptime SLA standard around a lot. That amounts to less than six minutes of downtime per year. Amazon’s 99.95% allows for roughly 263 minutes of downtime per year, and greater downtime merely gets you a credit on your account. With that in mind, repeatable processes and scripts to bring your infrastructure back up in different availability zones or even different datacenters are a necessity. Along with your infrastructure scripts, offsite backups also become a wise choice. You should further take advantage of availability zones and regions to make your infrastructure more robust. By using private IP addresses and the private network, you can host a MySQL database slave in a separate zone, for instance. You can also use GDLB, or Geographically Distributed Load Balancing, to send customers on the west coast to that zone, and those on the east coast to one closer to them. In the event that one region or availability zone goes out, your application is still responding, though perhaps with slightly degraded performance.
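The arithmetic behind those uptime figures is simple enough to check:

```python
# Minutes of downtime per year permitted by a given uptime SLA.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(sla_percent):
    return MINUTES_PER_YEAR * (100.0 - sla_percent) / 100.0

# allowed_downtime_minutes(99.999) -> about 5.3 minutes per year
# allowed_downtime_minutes(99.95)  -> about 263 minutes per year
```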
Devops – Infrastructure as Code
With traditional hosting, you either physically manage all of the components in your infrastructure, or have someone do it for you. Either way a phone call is required to get things done. With EC2, every piece of your infrastructure can be managed from code, so your infrastructure itself can be managed as software. Whether you’re using the waterfall method or agile as your software development lifecycle, you have new flexibility to place all of these scripts and configuration files in version control. This raises the manageability of your environment tremendously. It also provides a kind of ongoing documentation of all of the moving parts. In short, it forces you to deliver on all of those best practices you’ve been preaching over the years.
EC2 Environment Considerations
When servers get restarted they get new IP addresses – both private and public. This may affect configuration files from webservers to mail servers, and database replication too. Your new server may mount an external EBS volume which contains your database. If that’s the case, your start scripts should check for that, and not start MySQL until they find that volume. To further complicate things, you may choose to use software RAID over a handful of EBS volumes to get better performance.
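A start script that waits for the volume might be sketched like this (the `/data` mount point and the `service mysql start` command are assumptions about the server’s layout):

```python
# Sketch of a start script that holds off on mysqld until the EBS
# data volume is actually mounted.
import os
import subprocess
import time

def volume_mounted(mount_point="/data"):
    """True if something is mounted at mount_point."""
    return os.path.ismount(mount_point)

def start_mysql_when_ready(mount_point="/data", retries=30, delay=10):
    for _ in range(retries):
        if volume_mounted(mount_point):
            subprocess.call(["service", "mysql", "start"])
            return True
        time.sleep(delay)
    return False  # never mounted: alert a human rather than start empty
```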
The more special cases you have, the more you quickly realize how important it is to manage these things in software. The more the process needs to be repeated, the more the scripts will save you time.
New Flexibility in the Cloud
Ultimately, if you take into consideration less reliable virtual servers, and mitigate that with zones, regions and automated scripts, you can then enjoy all the new benefits of the cloud.
- easy test & dev environment setup
- robust load & scalability testing
- vertically scaling servers in place – in minutes!
- pause a server – incurring only storage costs for days or months as you like
- cheaper costs for applications with seasonal traffic patterns
- no huge up-front costs
The term clustering is often used loosely in the context of enterprise databases. In relation to MySQL in the cloud you can configure:
- Master-master active/passive
- Sharded MySQL Database
- NDB Cluster
Master-Master active/passive replication
Also sometimes known as circular replication, this is used for high availability. You can perform operations on the inactive node (backups, ALTER TABLEs or other slow operations), then switch roles so the inactive node becomes active. You would then perform the same operations on the former master. Applications see “zero downtime” because they are always pointing at the active master database. In addition, the inactive master can be used as a read-only slave to run SELECT queries and large reporting queries. This is quite powerful, as typical web applications tend to have 80% or more of their work performed by read-only queries such as browsing, viewing, and verifying data and information.
Sharded MySQL Database
This is similar to what in the Oracle world is called “application partitioning”. In fact, before Oracle 10 most Parallel Server and RAC installations required you to do this. For example, a user table might be sharded by putting names A–F on node A, G–L on node B, and so forth.
You can also achieve this somewhat transparently with user_ids. MySQL has an auto_increment column attribute to handle serving up unique ids. It also has a cluster-friendly pair of settings, auto_increment_increment and auto_increment_offset. So in an example where you had *TWO* nodes, all EVEN numbered IDs would be generated on node A and all ODD numbered IDs would be generated on node B. They would also be replicating changes to each other, yet avoid collisions.
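The interleaving scheme is easy to see in a pure-Python simulation of those two settings:

```python
# Simulates MySQL's auto_increment_increment / auto_increment_offset
# scheme for a two-master setup: each node steps by 2 from its own
# offset, so the two id sequences never collide even while the nodes
# replicate changes to each other.
def id_stream(offset, increment=2):
    next_id = offset
    while True:
        yield next_id
        next_id += increment

node_a = id_stream(offset=2)  # 2, 4, 6, ... (even ids)
node_b = id_stream(offset=1)  # 1, 3, 5, ... (odd ids)
```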
Obviously all this has to be done with care, as the database is not otherwise preventing you from doing things that would break replication and your data integrity.
One further caution with sharding your database is that although it increases write throughput by horizontally scaling the master, it ultimately reduces availability. An outage of any server in the cluster means at least a partial outage of the cluster itself.
NDB Cluster
This is actually a storage engine, and can be used in conjunction with InnoDB and MyISAM tables. Normally you would use it sparingly for a few special tables, providing availability and read/write access to multiple masters. This is decidedly *NOT* like Oracle RAC, though many mistake it for that technology.
MySQL Clustering In The Cloud
The most common MySQL cluster configuration we see in the Amazon EC2 environment is by far the Master-Master configuration described above. By itself it provides higher availability of the master node, plus a single read-only node to which you can offload your application’s read queries. What’s more, you can add additional read-only slaves to this setup, allowing you to scale out tremendously.
In a recent trip to Southeast Asia, I visited the Malaysian city of Kuala Lumpur. It is a sprawling urban area, more like Los Angeles than New York. With all the congestion and constant traffic jams the question of city planning struck me. On a more abstract level this is the same challenge that faces web application and internet website designers. Architect and bake the quality into the framework, or hack it from start to finish?
Looking at cities like Los Angeles you can’t help but think that no one imagined there would ever be this many cars. You think the same thought when you are in Kuala Lumpur. The traffic reaches absurd levels at times. A local friend told me that when the delegates travel through the city, they have a cavalcade of cars, and a complement of traffic cops to literally move the traffic out of the way. It’s that bad!
Of course predicting how traffic will grow is no science. But still, cities can be planned. Take a mega-city like New York, for example. The grid helps with traffic. With a system of one-way streets and a few main arteries, travelers and taxis alike can make better-informed decisions about which way to travel. What’s more, the city core is confined to an island, so new space is built upward rather than outward. Suddenly the economics of closeness wins out. You can walk between many buildings in midtown, or at most take a quick taxi ride. Suddenly a car becomes a burden. And then there is the train system, a spider web of subways and regional transit branching north to upstate New York, northeast to Connecticut, east to Long Island, and west to New Jersey.
If you’ve lived in the New York metropolitan region and bought a home, or work in real estate you know that proximity to a major train station affects the prices of homes. This is the density of urban development working for us. It is tough to add this sauce to a city that has already sprawled. And so it is with architecting websites and applications.
Architecting for the Web
Traffic to a website can be as unpredictable as traffic within the confines of an urban landscape, and the spending that goes into such infrastructure is just as delicate. Spend too much and you risk building for people who will never arrive. What’s more, while site traffic remains moderate, it is difficult to predict the patterns of larger volumes of users. What areas of the site will they be most interested in? Have we done sufficient capacity planning around those functions? Do those particular functions cause bottlenecks around the basic functioning of the site, such as user logins and tracking?
Baking in the sauce for scalability will never be an exact science of course. In urban planning you try to learn from the mistakes of cities that did things wrong, and try to replicate some of the things that you see in cities doing it right. Much the same can be said for websites and scalability.
For instance, it may be difficult to do bulletproof stress testing and functional testing to cover every possible combination. But there are best practices for architecting an application that will scale. Start with basics such as using version control – obvious, of course, but I have seen clients who don’t use it. There are a few options to choose from, but they all provide versioning and self-document your development process. Next, build redundancy into the mix. Load balance your application servers, of course, and build in various levels of caching – reverse proxy caching such as Varnish, and a key-value caching system like memcache. Build redundancy into the database layer too, even if you aren’t adding all those servers just yet. Your application should be multi-database aware: either use an abstraction layer, or organize your code around write queries and read-only queries. If possible, build in checks for stale data.
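A minimal sketch of such an abstraction layer might look like this (the connection objects and the round-robin policy are illustrative; a real layer would also handle stale-data checks and failover):

```python
# Sketch of a multi-database-aware abstraction layer: writes go to
# the master, reads rotate across the read-only replicas.
import itertools

class DbRouter:
    def __init__(self, master, replicas=None):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def connection_for(self, sql):
        """Route SELECTs to a replica, everything else to the master."""
        if self._replicas and sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.master
```

With no replicas configured, everything falls back to the master, so the same code runs unchanged before you have added those extra servers.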
Also consider various cloud providers to host your application, such as Amazon’s Elastic Compute Cloud. These environments allow you to script your infrastructure, and build further redundancy into the mix. Not only can you take advantage of features like auto-scaling to support dynamic growth in traffic, but you can also scale servers in place, moving your server images from medium to large to x-large with minimal outage. In fact, with MySQL multi-master active/passive replication on the database tier, you could quite easily switch to larger instances, or from larger to smaller ones, dynamically, without *any* downtime to your application.
Just as no urban planner would claim they can predict the growth of a city, a devops engineer won’t claim they can predict how traffic to your website will grow. What we can do is prepare for that growth: build in quality, provide scaffolding so the site can grow organically, and then monitor, collect metrics and do basic capacity planning. A small amount of design up front will pay off over and over again.
Book Review: How To Disappear by Frank M Ahearn
With such an intimidating title, you might think at first glance that this is a book only for the paranoid or criminally minded. Now granted, Mr Ahearn is a Skip Tracer, and if you were one already you certainly wouldn’t need this book. Still, Skip Tracers have a talent for finding people, just as an investigator or detective has for catching the bad guys. And what a person like this can teach us about how they find people is definitely worth knowing.
If you’ve had your concerns about privacy, what companies have your personal information and how they use it, this is a very interesting real-world introduction to the topic. Of particular interest might be the chapter on identity thieves and another on social media. All-in-all a quick read and certainly one-of-a-kind advice!
Scalability in the cloud depends a lot on application design. Keep these important points in mind when you are designing your web application and you will scale much more naturally and easily in the cloud.
1. Think twice before sharding
- It increases your infrastructure and application complexity
- It reduces availability – more servers mean more outages
- You have to worry about globally unique primary keys
2. Bake read/write database access into the application
- allows you to check for stale data, fallback to write master
- creates higher availability for read-only data
- gracefully degrade to read-only website functionality if master goes down
- horizontal scalability melds nicely with cloud infrastructure and IAAS
3. Save application state in the database
- avoid in-memory locking structures that won’t scale with multiple web application servers
- consider a database field for managing application locks
- consider stored procedures for isolating and insulating developers from db particulars
- a last updated timestamp field can be your friend
4. Consider Dynamic or Auto-scaling
- great feature of cloud, spinup new servers to handle load on-demand
- lean towards being proactive rather than reactive and measure growth and trends
- watch the procurement process closely lest it come back to bite you
5. Setup Monitoring and Metrics
- see trends over time
- spot application trouble and bottlenecks
- determine if your tuning efforts are paying off
- review a traffic spike after the fact
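Point 3’s database-backed lock can be sketched as follows (the table and column names are hypothetical, and sqlite3 stands in for MySQL so the snippet is self-contained; the same single-statement UPDATE works on MySQL):

```python
# An application lock held in a database row instead of in process
# memory, so it works across many web application servers. A stale
# last_updated timestamp lets a crashed holder's lock expire.
import sqlite3
import time

def acquire_lock(conn, name, holder, ttl=60):
    """Claim the named lock if it is free or its timestamp is stale."""
    now = time.time()
    cur = conn.execute(
        "UPDATE app_locks SET holder = ?, last_updated = ? "
        "WHERE name = ? AND (holder IS NULL OR last_updated < ?)",
        (holder, now, name, now - ttl))
    conn.commit()
    return cur.rowcount == 1  # exactly one row updated means we won

def release_lock(conn, name, holder):
    conn.execute("UPDATE app_locks SET holder = NULL "
                 "WHERE name = ? AND holder = ?", (name, holder))
    conn.commit()
```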
The cloud is not a silver bullet that can automatically scale any web application. Software design is still a crucial factor. Bake in these features with the right flexibility and foresight, and you’ll manage your website’s growth patterns with ease.
Have questions or need help with scalability? Call us: +1-213-537-4465
Your website is slow, but you’re not sure why. You do know that it’s impacting your business. Are you losing customers to the competition? Here are five quick tips to achieve scalability.
1. Gather Intelligence
With any detective work you need information. That’s where intelligence comes in. If you don’t have the right data already, install monitoring and trending systems such as Cacti and Collectd. That way you can look at where your systems have been and where they’re going.
2. Identify Bottlenecks
Put all that information to use in your investigation. Use stress testing tools to hit areas of the application, and identify which ones are most troublesome. Some pages get hit A LOT, such as the login page, so slowness there is more serious than one small report that gets hit by only a few users. Work on the biggest culprits first to get the best bang for your buck.
3. Smooth Out the Wrinkles
Reconfigure your webservers to make more connections to your database, or spin up more servers. On the database tier, make sure you have fast RAIDed disks and lots of memory. Tune the queries coming from your application, and look at possible upgrades to your servers.
4. Be Agile But Plan for the Future
Can your webserver tier scale horizontally? It’s pretty easy to add more servers under a load balancer. How about your database? Chances are, with a little work and some HA magic, your database can scale out with more servers too, moving the bulk of SELECT operations to read-only copies of your primary server while letting it focus on transactions and data updates. Be ready and tested, so you know exactly how to add servers without impacting the customers or application. Don’t know how? Look at the big guys like Facebook, and investigate how they’re doing it.
5. A Going Concern
Most importantly, just like your business, your technology infrastructure is an ongoing work in progress. Stay proactive with monitoring, analysis, trending and vigilance. Watch application changes, and filter for slow queries. Have new or additional hardware dynamically at the ready for when you need it.
Heavyweight Internet Group provides Professional Services and Consulting around database technologies. Our value add is aggressive pricing and personal service. Call us at +1-213-537-4465.
- MySQL database setup and administration
- MySQL tuning and optimization of problem areas
- Correcting degraded MySQL application performance
- 24×7 remote support services
- Stress testing and speeding up web applications
20 years of professional experience, excellent client-facing skills, attention to detail, and a focus on your business needs.