There are many components that make up a modern website, and many places to get stuck in the mud. Website performance starts with the browser: what caching it is doing, its bandwidth to your server, what the webserver is doing (whether it caches and how), whether the webserver has sufficient memory, then what the application code is doing, and lastly how it interacts with the backend database.
Your recent social media campaign has gone viral. It’s what you’ve been dreaming about, pinning your hopes on, and all of your hard work is now coming to fruition. Tens of thousands of internet users, hordes of them in fact, are now descending on your website. Only one problem: it went down!
That’s a situation you want to avoid. Luckily there are some best practices for avoiding scenarios like the one I described. In engineering it’s termed “degrading gracefully”: the site continues functioning, but with its heaviest features disabled.
Browsing Only, But Still Functioning
One way to do this is to give your site a browsing-only mode. On the database side, you can keep functioning against a read-only database. With a switch like that, your site will continue to function while pointed at any of your read-only replication slaves. What’s more, you can easily load balance across those replicas and keep your site up and running.
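Here is a minimal sketch of such a switch in Python. All the names and hostnames are illustrative assumptions, not from a real deployment: when the flag is on, reads are spread across the replicas and writes are politely refused.

```python
# Browsing-only mode sketch (hostnames and flag names are hypothetical).
READ_ONLY_MODE = True  # flipped on when the primary is unavailable

PRIMARY_DSN = "mysql://primary.example.com/app"
REPLICA_DSNS = ["mysql://replica1.example.com/app",
                "mysql://replica2.example.com/app"]

def pick_dsn(is_write, replica_index=0):
    """Return the connection string a request should use."""
    if is_write:
        if READ_ONLY_MODE:
            # Degrade gracefully: the site stays up, writes are disabled.
            raise RuntimeError("browsing-only mode: writes are disabled")
        return PRIMARY_DSN
    # Reads can be load-balanced across the read-only replicas.
    return REPLICA_DSNS[replica_index % len(REPLICA_DSNS)]
```

In a real application the flag would live in configuration or an admin console, so operations can flip it without a deploy.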
In software development, decoupling means breaking apart components of an application that should not depend on one another. One way to do this is to use a queuing system such as Amazon’s SQS, allowing pieces of the application to queue up work to be done. That makes those pieces asynchronous, i.e. they return right away. Another way is to expose your site’s internal components through web services. These individual components can then be scaled out as needed. This makes them more highly available, and reduces the need to scale your memcache, webserver, or database servers – the hardest tiers to scale.
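The queue idea can be sketched with the standard library so it runs anywhere; the article mentions Amazon SQS, and here `queue.Queue` stands in for it. The caller enqueues work and returns immediately, while a background worker drains the queue asynchronously.

```python
# Decoupling via a work queue (queue.Queue standing in for SQS).
import queue
import threading

work_queue = queue.Queue()
results = []

def worker():
    while True:
        job = work_queue.get()
        if job is None:          # sentinel: shut the worker down
            break
        results.append(job * 2)  # stand-in for the heavy work
        work_queue.task_done()

def enqueue(job):
    """Returns right away; the work happens in the background."""
    work_queue.put(job)

t = threading.Thread(target=worker)
t.start()
for n in (1, 2, 3):
    enqueue(n)
work_queue.join()     # wait only so this demo is deterministic
work_queue.put(None)  # tell the worker to exit
t.join()
```

With SQS the producer and consumer would live on separate machines, which is exactly what lets each component scale out independently.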
Identify Features You Can Disable
Typically your application will have features that are superfluous, or not part of the core functionality. Perhaps you have star ratings, or some other components that are heavy. Work with the development and operations teams to identify the heaviest areas of the application – the ones that would warrant disabling if the site hits heavy storms.
Once you’ve done all that, document how to disable and re-enable those features, so other team members will be able to flip the switches if necessary.
On a recent trip to Southeast Asia, I visited the Malaysian city of Kuala Lumpur. It is a sprawling urban area, more like Los Angeles than New York. With all the congestion and constant traffic jams, the question of city planning struck me. On a more abstract level this is the same challenge that faces web application and internet website designers: architect and bake the quality into the framework, or hack it from start to finish?
Looking at cities like Los Angeles, you can’t help but think that no one imagined there would ever be this many cars. You think the same thing in Kuala Lumpur. The traffic reaches absurd levels at times. A local friend told me that when delegates travel through the city, they bring a cavalcade of cars and a complement of traffic cops to literally move the traffic out of the way. It’s that bad!
Of course predicting how traffic will grow is no exact science. But cities can still be planned. Take a mega-city like New York, for example. The grid helps with traffic: a system of one-way streets and a few main arteries lets travelers and taxis alike make better-informed decisions about which way to travel. What’s more, the city core is confined to an island, so new space is built upward rather than outward. Suddenly the economics of closeness wins out. You can walk between many buildings in midtown, or at most take a quick taxi ride; suddenly a car becomes a burden. And the train system is a spider web of subways and regional transit, branching north to upstate New York, northeast to Connecticut, east to Long Island, and west to New Jersey.
If you’ve lived in the New York metropolitan region and bought a home, or worked in real estate, you know that proximity to a major train station affects home prices. This is the density of urban development working for us. It is tough to add this sauce to a city that has already sprawled. And so it is with architecting websites and applications.
Architecting for the Web
Traffic to a website can be as unpredictable as traffic within the confines of an urban landscape, and spending on such infrastructure is just as delicate. Spend too much and you risk building for people who will never arrive. What’s more, while site traffic remains moderate it is difficult to predict the patterns of larger volumes of users. What areas of the site will they be most interested in? Have we done sufficient capacity planning around those functions? Do those particular functions cause bottlenecks around the basic functioning of the site, such as user logins and tracking?
Baking in the sauce for scalability will never be an exact science of course. In urban planning you try to learn from the mistakes of cities that did things wrong, and try to replicate some of the things that you see in cities doing it right. Much the same can be said for websites and scalability.
For instance, it may be difficult to do bulletproof stress testing and functional testing that covers every possible combination. But there are best practices for architecting an application that will scale. Start with basics such as using version control – obvious, of course, but I have seen clients who don’t. There are a few options to choose from, but they all provide versioning and self-document your development process. Next, build redundancy into the mix. Load balance your application servers, of course, and build in multiple levels of caching – reverse proxy caching such as Varnish, and a key-value caching system like memcache. Build redundancy into the database layer too, even if you aren’t adding all those servers just yet. Your application should be multi-database aware: either use an abstraction layer, or organize your code around write queries and read-only queries. If possible, build in checks for stale data.
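The multi-database-aware idea can be sketched as a small router; the class and server names below are illustrative, not a particular library’s API. Write statements go to the primary, while everything else is spread round-robin across read replicas, so replicas can be added without touching application code.

```python
# Sketch of an abstraction layer that separates write queries
# from read-only queries (names are hypothetical).
import itertools

WRITE_VERBS = ("insert", "update", "delete", "replace",
               "create", "alter", "drop")

class QueryRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin reads

    def route(self, sql):
        """Return the server a statement should run on."""
        verb = sql.lstrip().split(None, 1)[0].lower()
        if verb in WRITE_VERBS:
            return self.primary
        return next(self._replicas)

router = QueryRouter("db-primary", ["db-replica1", "db-replica2"])
```

A production layer would also handle the stale-data checks mentioned above, for example by pinning a user’s reads to the primary briefly after they write.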
Also consider various cloud providers to host your application, such as Amazon’s Elastic Compute Cloud (EC2). These environments allow you to script your infrastructure and build further redundancy into the mix. Not only can you take advantage of features like auto-scaling to support dynamic growth in traffic, but you can also scale servers in place, moving your server images from medium to large to x-large instances with minimal outage. In fact, with MySQL multi-master active/passive replication on the database tier, you could quite easily switch to larger instances, or from larger to smaller ones, without *any* downtime to your application.
Just as no urban planner would claim they can predict the growth of a city, a devops engineer won’t claim they can predict how traffic to your website will grow. What we can do is prepare for that growth: build quality scaffolding so the site can grow organically, then monitor, collect metrics, and do basic capacity planning. A small amount of design up front will pay off over and over again.
Book Review: How To Disappear by Frank M Ahearn
With such an intimidating title, you might think at first glance that this is a book only for the paranoid or criminally minded. Now granted, Mr. Ahearn is a skip tracer, and if you were one already you certainly wouldn’t need this book. Still, skip tracers have a talent for finding people, just as an investigator or a detective has for catching the bad guys. And what a person like this can teach us about how they find people is definitely worth knowing.
If you’ve had concerns about privacy – which companies have your personal information and how they use it – this is a very interesting real-world introduction to the topic. Of particular interest might be the chapter on identity thieves and another on social media. All in all, a quick read and certainly one-of-a-kind advice!
A website aggregating Twitter feeds for celebrities, with sophisticated search functionality.
The site had recently been acquired by a large media and entertainment conglomerate, and its traffic had already tripled. What’s more, they expected their unique pageviews to grow 20 to 30 times in the coming six months.
We worked closely with the lead architect and designer of the site to understand some of the technical difficulties they were encountering. We discussed key areas of the site, and where performance was most lacking.
Next we reviewed the underlying infrastructure with an eye for misconfigurations, misuse of or badly allocated resources, and general configuration best practices. They used Amazon EC2 cloud hosted servers for the database, webserver, and other components of the application.
Our first round of reviews spanned a couple of days. We found many configuration issues that could dramatically affect performance. We adjusted settings in both the webserver and the database to make the most of the platform they were hosted on. These initial changes reduced the load average on the server from a steady 10.0 to an average of 2.0.
Our second round of review involved a serious look at the application. We worked closely with the developer to understand what the application was doing. We identified the areas of the application with the heaviest footprint on the server, and worked with the developer to tune those specific areas. In addition, we examined the underlying database structures and tables, looked for relevant indexes, and added them as necessary to support the specific requirements of the application.
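To illustrate the kind of index work described above, here is a runnable sketch using the standard library’s sqlite3 (the client ran MySQL, but the principle is identical, and the table and column names are invented for the example): the same lookup goes from a full table scan to an index search once the supporting index exists.

```python
# Before/after demonstration of adding an index to support a query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, celebrity TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO tweets (celebrity, body) VALUES (?, ?)",
    [("celeb%d" % (n % 50), "post %d" % n) for n in range(1000)])

def lookup_plan(conn):
    """Ask the engine how it would execute the site's hot query."""
    row = conn.execute(
        "EXPLAIN QUERY PLAN SELECT body FROM tweets WHERE celebrity = ?",
        ("celeb7",)).fetchone()
    return row[-1]  # the plan description is the last column

before = lookup_plan(conn)   # a full table scan
conn.execute("CREATE INDEX idx_tweets_celebrity ON tweets (celebrity)")
after = lookup_plan(conn)    # now an index search
```

On MySQL the equivalent check is `EXPLAIN SELECT ...` before and after `CREATE INDEX`; the point is to let the query plan, not guesswork, confirm the index is actually used.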
After this second round of changes, tweaks, adjustments, and rearchitecting, the load average on the server dropped dramatically, to a mere 0.10. The overall effect was striking. With a 100-fold reduction in server load, the website’s performance was snappy and very responsive, and the end-user experience noticeably improved. A smile comes to your face when you visit your favorite site and find it working fast and furious!
The results for the business were dramatic. Not only were their short-term troubles addressed – the site was handling the new traffic without a hiccup – but they also had the confidence and peace of mind to go forward with new advertising campaigns, secure in the knowledge that the site really could perform, and handle a 20 to 30 times increase in traffic with ease.