Capacity Planning – What is it and why is it important?

Look at your website’s current traffic patterns, pageviews or visits per day, and compare that to your server infrastructure. In a nutshell, your current capacity is the ceiling your traffic could grow to while still being supported by your current servers. Think of it as the horsepower of your application stack: load balancer, caching server, webserver and database.

Capacity planning uses load testing and stress testing to estimate when you will reach capacity with your current infrastructure. With traditional servers, you estimate how many months your currently provisioned servers will comfortably last, and plan to bring new ones online and into rotation before you reach that traffic ceiling.
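
As a rough illustration of that estimate, here is a minimal sketch that assumes traffic compounds at a steady monthly rate. The function name and all of the numbers are hypothetical; in practice the ceiling would come from your own load tests.

```python
import math

def months_until_capacity(current_daily_visits, capacity_daily_visits, monthly_growth_rate):
    """Return whole months until traffic first reaches the capacity ceiling.

    Assumes traffic compounds at a steady monthly rate (a simplification).
    """
    if current_daily_visits >= capacity_daily_visits:
        return 0  # already at or past the ceiling
    if monthly_growth_rate <= 0:
        return None  # flat or shrinking traffic never reaches the ceiling
    ratio = capacity_daily_visits / current_daily_visits
    return math.ceil(math.log(ratio) / math.log(1 + monthly_growth_rate))

# Hypothetical inputs: 40,000 visits/day today, a load-tested ceiling of
# 100,000 visits/day, and 8% month-over-month growth.
print(months_until_capacity(40_000, 100_000, 0.08))  # prints 12
```

The result is the lead time you have to order, rack and test new servers, so it is worth re-running whenever the growth rate changes.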

Your reaction to capacity and seasonal traffic variations becomes much more nimble with cloud computing solutions, as you can script server spinups to match capacity and growth needs. In fact you can implement auto-scaling as well, setting rules and thresholds to bring additional capacity online – or offline – automatically as traffic dictates.
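
The rules-and-thresholds idea can be sketched in a few lines. This is only an illustration of the decision logic, not any cloud provider's API; the thresholds, limits and function name are assumptions chosen for the example.

```python
def desired_server_count(current_servers, avg_cpu_percent,
                         scale_up_at=75.0, scale_down_at=30.0,
                         min_servers=2, max_servers=20):
    """Decide how many servers should be running, given average CPU usage.

    Adds one server above the upper threshold, removes one below the lower
    threshold, and always stays within [min_servers, max_servers].
    """
    if avg_cpu_percent > scale_up_at:
        target = current_servers + 1
    elif avg_cpu_percent < scale_down_at:
        target = current_servers - 1
    else:
        target = current_servers
    return max(min_servers, min(max_servers, target))

print(desired_server_count(4, 90.0))  # busy: prints 5
print(desired_server_count(4, 20.0))  # quiet: prints 3
print(desired_server_count(2, 10.0))  # quiet, but already at the floor: prints 2
```

Keeping a gap between the two thresholds avoids flapping, where servers are added and removed on every small change in load.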

In order to do proper capacity planning, you need good data. Pageviews and visits per day can come from your analytics package, but you’ll also need more detailed metrics on what your servers are doing over time. Packages like Cacti, Munin, Ganglia, OpenNMS or Zenoss can collect very useful data with very little overhead on the server. With these in place, you can view load average, memory and disk usage, and database or webserver threads, and correlate all that data back to your application. What’s more, with time-based data and graphs, you can compare changes against application change management and deployment data to determine how new code rollouts affect capacity requirements.
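
To make the deployment comparison concrete, here is a small sketch that splits a series of load-average samples at a deployment time and compares the averages. The sample data and function name are made up for illustration; real samples would be exported from whichever monitoring package you use.

```python
from statistics import mean

def load_change_around_deploy(samples, deploy_time):
    """Compare average load before and after a deployment.

    samples: list of (timestamp, load_average) tuples.
    deploy_time: timestamp in the same units as the samples.
    Returns (mean_before, mean_after, percent_change).
    """
    before = [load for ts, load in samples if ts < deploy_time]
    after = [load for ts, load in samples if ts >= deploy_time]
    if not before or not after:
        raise ValueError("need samples on both sides of the deployment")
    mean_before, mean_after = mean(before), mean(after)
    percent_change = (mean_after - mean_before) / mean_before * 100
    return mean_before, mean_after, percent_change

# Hypothetical one-minute samples (seconds, load average); deploy at t=180.
samples = [(0, 1.0), (60, 1.2), (120, 1.1), (180, 1.6), (240, 1.7), (300, 1.65)]
before, after, change = load_change_around_deploy(samples, deploy_time=180)
print(f"load went from {before:.2f} to {after:.2f} ({change:+.0f}%)")
```

A jump like this right after a rollout suggests the new code needs more capacity per request, which feeds straight back into the planning estimate.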

Sean Hull asks about Capacity Planning on Quora.

Stress Testing – What is it and why is it important?

Stress testing applications is like putting a car through crash tests, wear and tear tests, and performance tests.  It’s about finding the leaks, and bottlenecks before they become a limitation to growth.  In fact, stress testing is a big part of capacity planning.

There are a few different ways to stress test a web application.  You can start at the database side of the house, and just stress test the queries your application uses.  MySQL includes benchmarking tools such as mysqlslap, which let you run a query or set of queries repeatedly against the database.  You can also run them in parallel and in large batches.  All of these methods are an effort to push the limit and find the point where the server can handle no more.

There are also tools that operate by firing off repeated URL requests to the webserver, such as httperf and JMeter. These can be good for hammering away at the server, but if you want to do more complex and nuanced tests, a tool like Selenium will allow you to record a web browsing session and play it back to the server, many times or in parallel, to simulate a greater load on the servers.
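
The common idea behind these tools, repeating a request many times and in parallel while measuring how long each one takes, can be sketched with the Python standard library. This is not how mysqlslap, httperf or JMeter are implemented; the task below is a stand-in (a short sleep) where a real test would run a query or fetch a URL, and the function names are illustrative.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(task, total_requests, concurrency):
    """Call task() total_requests times using `concurrency` worker threads.

    Returns the per-call latencies in seconds, sorted ascending.
    """
    def timed_call(_):
        start = time.perf_counter()
        task()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    return sorted(latencies)

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    index = max(0, math.ceil(pct / 100 * len(sorted_values)) - 1)
    return sorted_values[index]

# Stand-in workload: replace the sleep with a real query or HTTP request.
latencies = run_load_test(lambda: time.sleep(0.001), total_requests=200, concurrency=10)
print(f"median {percentile(latencies, 50) * 1000:.2f} ms, "
      f"95th percentile {percentile(latencies, 95) * 1000:.2f} ms")
```

Raising the concurrency step by step and watching where the 95th percentile starts to climb is one simple way to locate the point where the server can handle no more.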

Sean Hull asks on Quora – What is Stress Testing and why is it important?

Caching – What is it and why is it important?

Caching keeps frequently accessed objects, images and data closer to where you need them, speeding up access to websites you hit often.

Your browser is the first layer of caching, keeping images and data from websites that you visit often.  Next the webserver itself has a caching layer, typically implemented by something like memcache, caching information that it would normally fetch from the database on the backend.  This avoids the network roundtrip, and also avoids the load and work of running the query to fetch that data again.
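
The pattern on the webserver side looks roughly like the sketch below: check the cache first, and only run the query on a miss. A plain dictionary stands in for memcache here, and the class, loader and key names are all hypothetical.

```python
import time

class CacheAside:
    """Check the cache first; fall back to the loader (the database) on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value); stand-in for memcache
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: do the expensive fetch and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = (now + self._ttl, value)
        return value

def slow_query(user_id):
    """Stand-in for a database round trip."""
    time.sleep(0.01)
    return {"id": user_id, "name": f"user-{user_id}"}

cache = CacheAside(slow_query, ttl_seconds=30)
cache.get(42)  # miss: runs the query
cache.get(42)  # hit: served from memory
print(cache.hits, cache.misses)  # prints 1 1
```

The time-to-live is the main trade-off: a longer value saves more database work but lets stale data linger for longer.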

Furthermore, you can put what’s called a reverse proxy, such as Varnish, in front of the webserver.  It serves cached responses without involving the application, which can bring further speedups and performance benefits to your overall architecture.

On the database server you also do a lot of caching.  With MySQL you may configure the query cache, which caches query result sets inside of MySQL, eliminating the need to rerun those queries on subsequent calls (note that the query cache was removed in MySQL 8.0).  Beyond that, the database server has various other caches, such as the InnoDB buffer pool, which keeps blocks of data in memory and reduces slower reads from disk.

On Quora, Sean Hull asks: What is caching and why is it important?

Zero Downtime – What is it and why is it important?

For most large web applications, uptime is of foremost importance.  Customers can see any outage as a frustration, or as an opportunity to move to a competitor.  What’s more, for a site that also includes e-commerce, it can mean real lost sales.

Zero Downtime describes a site without service interruption.  To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure.  If you’re using cloud hosting, are you redundant to alternate availability zones and regions?  Are you using geographically distributed load balancing?  Do you have multiple clustered databases on the backend, and multiple load-balanced webservers?

All of these measures will increase uptime, but may not bring you close to zero downtime.  For that you’ll need thorough testing.  The solution is to pull the plug on sections of your infrastructure, and prove that they fail over quickly without a noticeable outage.  The ultimate test is the outage itself.
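
The shape of such a test can be shown with a toy model: take down the primary on purpose and check that requests are still answered. Everything here is simulated in memory, and the names are hypothetical; a real drill would stop actual servers or services.

```python
class BackendDown(Exception):
    """Raised by a backend that is deliberately taken out of service."""

def call_with_failover(backends, request):
    """Try each (name, handler) pair in order; return the first success."""
    failed = []
    for name, handler in backends:
        try:
            return name, handler(request)
        except BackendDown:
            failed.append(name)
    raise RuntimeError(f"all backends failed: {failed}")

def healthy(request):
    return f"ok:{request}"

def killed(request):
    raise BackendDown("simulated outage")

# Normal operation: the primary answers.
print(call_with_failover([("primary", healthy), ("replica", healthy)], "ping"))
# Drill: the primary is down, and the replica should answer instead.
print(call_with_failover([("primary", killed), ("replica", healthy)], "ping"))
```

The second call is the drill. If it raised instead of returning a response from the replica, the redundancy you paid for would not actually protect you.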

Sean Hull on Quora: What is zero downtime and why is it important?

Feature Flags – What are they and why are they important?

Feature flags are switches that developers architect into their web applications to allow a feature to be turned on or off.  They sound simple in description, but are harder to implement or enable after the fact.

These switches allow the systems team to operationalize new application functionality.  They make it possible to turn hot-button features on or off as needed.  This can bring tremendous power and flexibility to the operations team for deployments where traffic patterns and site usage patterns cannot be known in advance.  It can also increase uptime and availability of the overall site, by minimizing the impact any new feature might have.

Feature flags can also be implemented as feature dials, allowing the feature to be exposed to a percentage of users, select users, or some other meaningful way to turn it up or down gradually.
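
One common way to build such a dial is to hash each user into a stable bucket and compare it to the rollout percentage. The sketch below assumes that approach; the feature name, user IDs and function names are made up for illustration.

```python
import hashlib

def bucket_for(feature, user_id):
    """Map a (feature, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def is_enabled(feature, user_id, rollout_percent, allow_list=()):
    """Return True if the feature should be shown to this user.

    rollout_percent: 0 turns the feature off, 100 turns it on for everyone.
    allow_list: select users who always get the feature (testers, staff).
    """
    if user_id in allow_list:
        return True
    return bucket_for(feature, user_id) < rollout_percent

# Dial a hypothetical "new_checkout" feature up to 25% of users,
# with one tester always included.
for user in ("alice", "bob", "carol"):
    print(user, is_enabled("new_checkout", user, 25, allow_list=("carol",)))
```

Because the bucket depends only on the feature name and user ID, a user who sees the feature at 25% keeps seeing it as the dial is turned up, and turning the dial to 0 switches it off for everyone outside the allow list.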

Sean Hull asks on Quora: What are feature flags and why are they important?

Venue Analytics – What is it and why is it important?

Analytics provides insight into what your web traffic represents.  It helps you answer questions like:

  • Who visits my website and what do they read?
  • What do those users click on?
  • How can I turn those users into customers?

Venue analytics is a growing area of tracking that provides this type of insight to venues, restaurants, and other bricks and mortar businesses.  If users are clicking around on Google, Yelp, Menupages, or New York Mag or finding a restaurant some other way, they are typically using their mobile phones to do so.  So venue analytics provides tools to businesses to answer questions like:

  • Who is searching for an Italian restaurant like mine?
  • What other restaurants did they browse before coming to my restaurant?
  • They browsed my restaurant, but went elsewhere, why?
  • What can I do to entice customers when they are browsing by mobile phone?

Sean Hull asks on Quora: What is venue analytics and why is it important?

Devops – What is it and why is it important?

Devops is one of those fancy contractions that tech folks just love.  One part development or developer, and another part operations.  It imagines a blissful marriage where the team that develops software and builds features that fit the business, works closely and in concert with an operations and datacenter team that thinks more like developers themselves.

In the long tradition of technology companies, these two roles have belonged to two separate cultures.  Developers, focused on development languages, libraries, and functionality that match the business requirements, keep their gaze firmly in that direction.  The servers, network and resources that those software components consume are left for the ops teams to think about.

So too, ops teams are squarely focused on uptime, resource consumption, performance, availability, and always-on.  They will be the ones woken up at 4am if something goes down, and are thus sensitive to version changes, unplanned or unmanaged deployments, and resource-heavy or resource-wasteful code and technologies.

Lastly there are the QA teams tasked with quality assurance, testing, and making sure the ongoing stream of features doesn’t break anything previously working or introduce new show stoppers.

Devops is a new and I think growing area where the three teams work more closely together.  But devops also speaks to the emerging area of cloud deployments, where servers can be provisioned with command line API calls, and completely scripted.  In this new world, infrastructure components all become components in software, and thus infrastructure itself, long the domain of manual processes and labor-intensive tasks, becomes repeatable, and amenable to the techniques of good software development.  Suddenly version control, configuration management, and agile development methodologies can be applied to operations, bringing a whole new level of professionalism to deployments.

Sean Hull asks on Quora – What is devops and why is it important?

Configuration Management – What is it and why is it important?

Every software service or component on a server requires configurations. In your desktop applications you set preferences for what your default page will be, how you’d like your margins set, or whether to save and restore cookies each time you restart.

Enterprise applications also require complex configuration settings.  Want to monitor a webserver and a database with Nagios?  That’s set in the config file.  Want to start MySQL with 8G of memory for InnoDB?  That’s also set in a config file.  What’s more, config files contain server-specific settings, based on IP address or the server’s role, webserver or database for example.  The webserver may also have memcache and outbound email services running.

With more traditional deployments, the systems administrator will set up each physical box, and configure those services based on the business needs.  As you bring tens or hundreds of servers online, however, you can quickly see how labor intensive this process would be, and also how much repetition there is.

Enter configuration management.  Previously I blogged about tools like Puppet that can bring great new best practices to the table.  There is also CFEngine, and the newer Chef, which brings cloud deployments into the mix as well.  Configuration management allows you to remotely administer servers, install packages, manage dependencies, install configurations based on a central copy, and even define roles and templates for new servers.  This brings a whole new level of professionalism to deployments, and also newfound power and flexibility.
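
The idea of roles and templates can be illustrated without any of those tools. The sketch below is plain Python, not Puppet, CFEngine or Chef syntax; it renders a MySQL config fragment from one central template plus per-role defaults and per-host settings. The role names, sizes and IP address are hypothetical.

```python
from string import Template

# One central template, shared by every database server.
MYSQL_TEMPLATE = Template(
    "[mysqld]\n"
    "bind-address = $ip\n"
    "innodb_buffer_pool_size = $buffer_pool\n"
)

# Per-role defaults (hypothetical roles and sizes).
ROLE_DEFAULTS = {
    "database": {"buffer_pool": "8G"},
    "staging_database": {"buffer_pool": "1G"},
}

def render_config(template, role, host_settings):
    """Fill the template from role defaults, overridden by host-specific values."""
    values = dict(ROLE_DEFAULTS[role])
    values.update(host_settings)
    return template.substitute(values)

print(render_config(MYSQL_TEMPLATE, "database", {"ip": "10.0.0.5"}))
```

Adding the hundredth server then means assigning it a role and an IP address, rather than editing another config file by hand.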

We’ll be writing more about configuration management, especially in the context of cloud deployments such as Amazon EC2 so please stay tuned.

Sean Hull asks on Quora – What is configuration management and why is it important?

Seasonal Traffic Variations – What is it and why is it important?

Web applications and websites get measurable traffic, recorded in metrics such as pageviews, unique visitors, and visits.  All of this activity translates to hits to a webserver, and work for a database to retrieve information for those pages.

During one month your application might get 150,000 visits, then during one week when a large ad campaign hits, or some marketing feature goes viral, you suddenly get 500,000 visits in a single week!  This is a “good problem to have” on the business side, but poses great challenges to an infrastructure, as it represents roughly a 14x increase over your normal weekly traffic of about 35,000 visits.  What’s more, if you do your capacity planning around that peak, you’ll have over 90% of your computing power and servers sitting around idle most of the year (assuming that’s just a blip).
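
When sizing a spike like this, it helps to convert both figures to the same period before comparing them. A minimal sketch, assuming a 30-day month; with the figures above it gives a multiple of roughly 14x. The function name is illustrative.

```python
def spike_summary(baseline_monthly_visits, peak_weekly_visits, days_per_month=30):
    """Compare a weekly peak with normal traffic expressed per week.

    Returns (baseline_weekly, multiple, idle_fraction), where idle_fraction is
    the share of peak-sized capacity left unused at normal traffic levels.
    """
    baseline_weekly = baseline_monthly_visits / days_per_month * 7
    multiple = peak_weekly_visits / baseline_weekly
    idle_fraction = 1 - baseline_weekly / peak_weekly_visits
    return baseline_weekly, multiple, idle_fraction

weekly, multiple, idle = spike_summary(150_000, 500_000)
print(f"baseline {weekly:,.0f} visits/week, spike {multiple:.1f}x, idle at baseline {idle:.0%}")
# prints: baseline 35,000 visits/week, spike 14.3x, idle at baseline 93%
```

The idle figure is the cost of provisioning for the peak all year round, which is the case for scaling on demand instead.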

Therein lies the challenge of seasonal traffic variations.  Capacity planning attempts to watch for trends in traffic, and growth over time of your user base.  But large spikes like the one described above can often be difficult to predict.  The whim of the masses.

Sean Hull asks on Quora – What are seasonal traffic variations and why are they important?

Decoupling – What is it and why is it important?

Processes are said to be coupled when they are tightly wound together, and dependent on one another.  Decoupling separates them, so that each piece can run, fail, and scale on its own.

A loose analogy might be replacing a traffic light with a traffic circle.  You keep the traffic moving, reducing the overall wait time for any car entering the intersection.

Decoupling web applications might involve replacing a makeshift queue your application currently implements in a table, with a message queuing service such as RabbitMQ or Amazon’s SQS.
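
The shape of that change can be sketched with Python's in-process queue standing in for RabbitMQ or SQS. The request handler only enqueues a job and returns, while a separate worker does the slow part. The job contents and names are hypothetical, and a real message queue would also give you persistence and delivery across machines.

```python
import queue
import threading

def start_worker(jobs, results):
    """Start a background thread that processes jobs until it sees None."""
    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: shut down cleanly
                jobs.task_done()
                break
            results.append(f"sent welcome email to {job}")  # the slow work
            jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

jobs = queue.Queue()
results = []
worker_thread = start_worker(jobs, results)

# The web request only enqueues and returns; it never waits on the email.
for user in ["alice", "bob"]:
    jobs.put(user)

jobs.put(None)        # tell the worker to stop
jobs.join()           # wait until every queued item has been processed
worker_thread.join()
print(results)  # prints ['sent welcome email to alice', 'sent welcome email to bob']
```

If the queue backs up, you can add more workers without touching the code that handles web requests.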

Ultimately decoupling promotes scalability, as you can independently scale the pieces of your infrastructure that your capacity planning identifies as bottlenecks.  What’s more, you can make those pieces redundant, improving availability at the same time.

Sean Hull discusses on Quora: What is decoupling and why is it important?