Tag Archives: stress testing

Query Profiling – What is it and why is it important?

Queries are so-named because of the lovely language we call SQL – structured query language, or "simple query language" if you believe the marketing. That's a bit of sarcasm on my part; I've never found it particularly simple or elegant. Profiling them involves finding out how they spend their time, and what they ask the server to do. In this way you can make them faster, and improve the performance of the whole server, and thus your website.

At any rate, queries ask the database for information. If they are on the simpler side – say, give me all the customers whose last name starts with "hu", where last name is indexed – they will run very fast. The database will look up the last name field in the index, find the subset of entries starting with those letters, then fetch the records by id and return them to you. Cache the index blocks, cache the data blocks. Great! However, say those are customers and you also want their cellphone calling history. Now you have to join on another table, matching by some key, hold all those records in memory, shuffle them around, and so on.
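Here's a minimal, hypothetical version of that indexed lookup, using SQLite (bundled with Python) in place of a full database server – table and index names are made up for illustration:

```python
import sqlite3

# In-memory database with a hypothetical customers table.
# COLLATE NOCASE lets SQLite use the index for a LIKE prefix match.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, "
             "last_name TEXT COLLATE NOCASE)")
conn.executemany("INSERT INTO customers (last_name) VALUES (?)",
                 [("hull",), ("hunter",), ("smith",), ("jones",)])

query = "SELECT id, last_name FROM customers WHERE last_name LIKE 'hu%'"

# Without an index the database must scan every row.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# With an index it can seek straight to the 'hu' entries.
conn.execute("CREATE INDEX idx_last_name ON customers (last_name)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before)  # a table scan
print(plan_after)   # a search using idx_last_name
print(conn.execute(query).fetchall())
```

On a four-row table the difference is invisible, but on millions of customers the scan-versus-seek distinction is exactly what profiling surfaces.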

So queries are effectively little jobs, or bits of work, you ask your database server to perform on your behalf. With websites you typically have hundreds of concurrently running sessions, all sending their own little SQL jobs to the server to be processed and records returned. Any blip on the radar slows everyone down, so you want them all to run quickly.

That’s where profiling comes in. MySQL, Oracle, and SQL Server all have EXPLAIN-type facilities for showing the plan the database will use to fetch your data. It shows indexes, sorting, joins, ordering, and so forth. All of this contributes to the overall execution time and resources used on the database server.

Quora discussion by Sean Hull – What is query profiling and why is it important?

Capacity Planning – What is it and why is it important?

Look at your website’s current traffic patterns, pageviews or visits per day, and compare that to your server infrastructure. In a nutshell, your current capacity measures the ceiling your traffic could grow to while still being supported by your current servers. Think of it as the horsepower of your application stack – load balancer, caching server, webserver and database.

Capacity planning seeks to estimate when you will reach capacity with your current infrastructure by doing load testing and stress testing. With traditional servers, you estimate how many months you will be comfortable with currently provisioned servers, and plan to bring new ones online and into rotation before you reach that traffic ceiling.
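A back-of-the-envelope sketch of that "months until the ceiling" estimate, with entirely made-up numbers: if traffic compounds monthly, the headroom left is the log of the ratio of ceiling to current traffic, divided by the log of the growth factor.

```python
import math

def months_to_ceiling(current_daily_views, capacity_daily_views,
                      monthly_growth):
    """Months until compounding traffic growth hits the capacity ceiling
    your stress tests established."""
    return (math.log(capacity_daily_views / current_daily_views)
            / math.log(1 + monthly_growth))

# Hypothetical numbers: 1M pageviews/day today, stress testing says the
# stack tops out around 4M/day, traffic compounding at 10% per month.
months = months_to_ceiling(1_000_000, 4_000_000, 0.10)
print(f"~{months:.1f} months of headroom")
```

With those inputs you have roughly fourteen and a half months – which tells you how far in advance to order hardware or rework the architecture.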

Your reaction to capacity and seasonal traffic variations becomes much more nimble with cloud computing solutions, as you can script server spinups to match capacity and growth needs. In fact you can implement auto-scaling as well, setting rules and thresholds to bring additional capacity online – or offline – automatically as traffic dictates.
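An auto-scaling rule at its simplest is just a threshold function. This is a toy sketch – real cloud auto-scaling policies track many more signals – sizing a hypothetical fleet so that average CPU lands near a target:

```python
import math

def desired_servers(current, avg_cpu, target_cpu=0.50,
                    floor=2, ceiling=20):
    """Hypothetical scaling rule: size the fleet so average CPU
    utilization lands near the target, clamped to sane bounds."""
    needed = math.ceil(current * avg_cpu / target_cpu)
    return max(floor, min(ceiling, needed))

print(desired_servers(4, 0.75))   # overloaded: scale out to 6
print(desired_servers(4, 0.125))  # idle: scale in, but keep the floor of 2
```

The floor keeps you redundant during quiet hours; the ceiling keeps a runaway metric from bankrupting you.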

In order to do proper capacity planning, you need good data. Pageviews and visits per day can come from your analytics package, but you’ll also need more complex metrics on what your servers are doing over time. Packages like Cacti, Munin, Ganglia, OpenNMS or Zenoss can collect very useful data with very little overhead on the server. With these in place, you can view load average, memory & disk usage, database or webserver threads, and correlate all that data back to your application. What’s more, with time-based data and graphs, you can compare changes against application change management and deployment data, to determine how new code rollouts affect capacity requirements.

Sean Hull asks about Capacity Planning on Quora.

Stress Testing – What is it and why is it important?

Stress testing applications is like putting a car through crash tests, wear and tear tests, and performance tests.  It’s about finding the leaks and bottlenecks before they become a limitation to growth.  In fact, stress testing is a big part of capacity planning.

There are a few different ways to stress test a web application.  You can start at the database side of the house, and just stress test the queries your application uses.  There are benchmarking tools included with MySQL, such as mysqlslap, which allow you to run a query or set of queries repeatedly against the database.  You can also run them in parallel and in large batches together.  All of these methods are an effort to push the limit and find out at what point the server can handle no more.

There are tools that operate by firing off repeated URL requests at the webserver, such as httperf and jmeter. These are good for hammering away at the server, but if you want to run more complex and nuanced tests, a tool like Selenium will allow you to record a web browsing session and play it back against the server, many times over or in parallel, to simulate a greater load on the servers.
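The httperf style of test – concurrent workers firing plain GET requests and counting responses – looks roughly like this sketch. To keep it self-contained it targets a throwaway local server; in real testing you would aim the workers at a staging copy of your stack instead.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep per-request logging out of the benchmark output

# Stand-in target: a local server on a free port (port 0).
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

statuses = []

def hammer(requests_per_worker):
    for _ in range(requests_per_worker):
        with urlopen(url) as resp:
            statuses.append(resp.status)

workers = [threading.Thread(target=hammer, args=(25,)) for _ in range(4)]
start = time.perf_counter()
for w in workers:
    w.start()
for w in workers:
    w.join()
elapsed = time.perf_counter() - start
print(f"{len(statuses)} requests in {elapsed:.2f} seconds")
server.shutdown()
```

What Selenium adds on top of this is realism: recorded click-paths exercise your application logic and caches the way real visitors do, not just a single URL.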

Sean Hull asks on Quora – What is Stress Testing and why is it important?

Zero Downtime – What is it and why is it important?

For most large web applications, uptime is of foremost importance.  Any outage can be seen by customers as a frustration, or an opportunity to move to a competitor.  What’s more, for a site that also includes e-commerce, it can mean real lost sales.

Zero Downtime describes a site without service interruption.  To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure.  If you’re using cloud hosting, are you redundant across alternate availability zones and regions?  Are you using geographically distributed load balancing?  Do you have multiple clustered databases on the backend, and multiple webservers load balanced?

All of these requirements will increase uptime, but may not bring you close to zero downtime.  For that you’ll need thorough testing.  The solution is to pull the trigger on sections of your infrastructure, and prove that the site fails over quickly without a noticeable outage.  The ultimate test is the outage itself.
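At its core, the thing you're proving is simple: when one backend dies, a health check routes around it. A toy simulation of that "pull the trigger" exercise, with hypothetical server names:

```python
def pick_backend(backends, is_healthy):
    """Return the first healthy backend, the way a load balancer or
    failover script would after running its health checks."""
    for backend in backends:
        if is_healthy(backend):
            return backend
    raise RuntimeError("total outage: no healthy backends left")

# Hypothetical fleet: pull the trigger on the primary and verify
# traffic lands on a replica instead of erroring out.
backends = ["db-primary", "db-replica-1", "db-replica-2"]
down = {"db-primary"}
choice = pick_backend(backends, lambda b: b not in down)
print(choice)
```

The real exercise does this with production-grade health checks against live infrastructure, under observation, on a schedule – not just once in a lab.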

Sean Hull on Quora: What is zero downtime and why is it important?

Degrade Gracefully – What is it and why is it important?

Websites and web applications have traffic patterns that are often unpredictable.  After all, growth in traffic is really what we’re after.  However, even with the best stress testing, it’s sometimes difficult to predict which areas of the site will get inundated, or how the site will scale.

Degrade gracefully describes an architecture built specifically to unwind in a smooth manner without any real site-wide outage.  What do we mean by that?  We mean build in operational switches to turn off components of the site.  Have a star rating on pages?  Build an on/off switch for your operations team to disable it if necessary.  Have site-wide comments, or robust search?  Allow those features to be disabled.  If possible, architect in a read-only mode for your site that you can turn on in a really difficult situation.  By operationalizing these components, you give more flexibility to the operations team, and reduce the likelihood of a complete outage.
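Those operational switches are usually implemented as feature flags. A minimal sketch, assuming a hypothetical flag registry (in practice the flags would live in a config store or database the operations team can flip at runtime, not a dict):

```python
# Hypothetical feature-flag registry for the site's optional components.
FLAGS = {
    "star_ratings": True,
    "comments": True,
    "search": True,
    "read_only": False,
}

def render_page():
    parts = ["core content"]  # the part that must never go away
    if FLAGS["star_ratings"]:
        parts.append("star ratings")
    if FLAGS["comments"] and not FLAGS["read_only"]:
        parts.append("comments")
    if FLAGS["search"]:
        parts.append("search")
    return ", ".join(parts)

# Traffic spike: operations sheds the expensive component, and the
# page still renders everything else.
FLAGS["comments"] = False
page = render_page()
print(page)
```

The point is that turning off comments is a one-line operational action, not an emergency code deploy.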

Sean Hull asks on Quora: What does degrade gracefully mean, and why is it important?

Success Story–Media and Entertainment Conglomerate

The Business

A website aggregating twitter feeds for celebrities, with sophisticated search functionality.

The Problem

Having been recently acquired by a large media and entertainment conglomerate, their traffic had already tripled.  What’s more, they expected their unique pageviews to grow by 20 to 30 times in the coming six months.

Our Process

We worked closely with the lead architect and designer of the site to understand some of the technical difficulties they were encountering.  We discussed key areas of the site, and where performance was most lacking.

Next we reviewed the underlying infrastructure with an eye for misconfigurations, misuse of or badly allocated resources, and general configuration best practices.  They used Amazon EC2 cloud hosted servers for the database, webserver, and other components of the application.

The Solution

Our first round of reviews spanned a couple of days.  We found many issues with the configuration which could dramatically affect performance.  We adjusted settings in both the webserver and the database to make the most of the platform they were hosted on.  These initial changes reduced the load average on the server from a steady level of 10.0 to an average of 2.0.

Our second round of review involved a serious look at the application.  We worked closely with the developer to understand what the application was doing.  We identified the areas of the application causing the heaviest footprint on the server, and worked with the developer to tune those specific areas.  In addition we examined the underlying database structures and tables, looking for relevant indexes and adding them as necessary to support the specific requirements of the application.

After this second round of changes, tweaks, adjustments, and rearchitecting, the load average on the server dropped to a mere 0.10.  The overall effect was dramatic.  With a 100-fold reduction in the load on the server, the website’s performance was snappy and very responsive.  The end user experience was noticeably changed.  A smile comes to your face when you visit your favorite site and find it working fast and furious!


The results for the business were just as striking.  Not only were their short-term troubles addressed, with the site handling the new traffic without a hiccup; they also had the confidence and peace of mind to go forward with new advertising campaigns, secure in the knowledge that the site really could perform, and handle a 20 to 30 times increase in traffic with ease.