
Agile – What is it and why is it important?

Agile software development seeks a more lightweight methodology for making changes and releases to software.  In the traditional, waterfall-style approach, large pieces of software are written at once, and releases happen less frequently.  Once features are complete, the testing phase happens, and then deployment to production.  Each release cycle can take many weeks, so turnaround for new features tends to be slow.  Advocates of this approach would argue that it forces discipline in the process, and prevents haphazard releases and buggy software.

Agile methodologies seek to accelerate releases of much smaller pieces of code.  These releases can happen daily or even many times a day, as developers themselves are given the levers to push code.  Agile tends to be more reactive to business needs, with less planning and requirements gathering up front.

While Agile remains the buzzword of the day, it may not work for every software development project.  It makes the most sense for web development and applications where small failures can easily be tolerated and where small teams are at work on the effort.

Sean Hull asks on Quora – What is Agile software development and why is it important?

Sharding – What is it and why is it important?

Sharding is a way of partitioning your datastore to benefit from the computing power of more than one server.  For instance, many web-facing databases get sharded on user_id, the unique serial number your application assigns to each user on the website.

Sharding can bring you the advantages of horizontal scalability by dividing up data into multiple backend databases.  This can bring tremendous speedups and performance improvements.

Sharding, however, has a number of important costs.

  • reduced availability
  • higher administrative complexity
  • greater application complexity

High availability is a goal of most web applications as they aim for always-on or 24×7 by 365 availability.  By introducing more servers, you have more components that have to work flawlessly.  If the expected downtime of any one backend database is half an hour per month and you shard across five servers, the time during which at least one shard is down increases by roughly a factor of five, to about 2.5 hours per month, assuming the outages rarely overlap.
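The downtime arithmetic above can be checked with a short calculation. This is only a sketch: it assumes the servers fail independently and that a month has 720 hours, and the function name is made up for illustration.

```python
def expected_partial_downtime(hours_down_per_server, servers, hours_in_month=720.0):
    """Expected hours per month in which at least one server is down.

    Assumes each server fails independently of the others (an assumption,
    not something a real deployment guarantees).
    """
    p_down = hours_down_per_server / hours_in_month
    p_all_up = (1.0 - p_down) ** servers
    return (1.0 - p_all_up) * hours_in_month


# Simple estimate: multiply per-server downtime by the number of shards.
simple_estimate = 0.5 * 5

# Probability-based estimate, which accounts for overlapping outages.
exact_estimate = expected_partial_downtime(0.5, 5)

print(f"simple: {simple_estimate:.2f} h, probability-based: {exact_estimate:.4f} h")
```

For small per-server downtime the two estimates are nearly identical, which is why multiplying by the shard count is a reasonable rule of thumb.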

Administrative complexity is an important consideration as well.  More databases means more servers to back up, more complex recovery, more complex testing, more complex replication and more complex data integrity checking.

Since sharding keeps chunks of your data on different servers, your application must accept the burden of deciding where the data is, and fetching it from there.  In some cases the application must make alternate decisions if it cannot find the data where it expects.  All of this increases application complexity and is important to keep in mind.
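As a rough illustration of the routing burden described above, here is a minimal sketch of sharding on user_id with a modulo rule. The shard host names and function names are hypothetical, and modulo is just one possible routing scheme.

```python
# Hypothetical backend databases; real host names will differ.
SHARD_HOSTS = [
    "db-shard-0.example.internal",
    "db-shard-1.example.internal",
    "db-shard-2.example.internal",
    "db-shard-3.example.internal",
]


def shard_for_user(user_id, shard_count=len(SHARD_HOSTS)):
    """Map a numeric user_id to a shard index using a modulo rule."""
    return user_id % shard_count


def host_for_user(user_id):
    """Return the backend host that should hold this user's rows."""
    return SHARD_HOSTS[shard_for_user(user_id)]


print(host_for_user(7))
```

Modulo routing is simple, but changing the number of shards relocates most users. A lookup table that maps users to shards is a common alternative when shards need to be added over time.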

Sean Hull asks on Quora – What is Sharding and why is it important?

Object Relational Mapper – What is it and why is it important?

Object Relational Mappers, or ORMs, are a layer of software that sits between web developers and the database backend.  For instance, if you’re using Ruby on Rails as your web development framework, you’ll typically interact with MySQL through an ORM layer called ActiveRecord.  If you’re using Java, you may be fond of the ORM called Hibernate.

ORMs have been controversial because they expose two very different perspectives on software development.  On the one hand we have developers who are tasked with building applications, fulfilling business requirements, and satisfying functional requirements in a finite amount of time.  On the other hand we have operations teams which are tasked with managing resources, supporting applications, and maintaining uptime and availability.

Often these goals are opposing.  As many in the devops movement have pointed out, these teams don’t always work together keeping common goals in mind.  How does this play into the discussion of ORMs?

Relational databases are a technology developed in the 1970s that use an arcane language called SQL to move data in and out of them.  Advocates of ORMs would argue, rightly so, that SQL is cumbersome and difficult to write, and that having a layer of software which helps you in this task is a great benefit.  To be sure, it helps the development effort, as software designers, architects and coders can focus more of their efforts on functional requirements and less on the arcane minutiae of SQL.

Problems come when you bump up against scalability challenges.  The operations team is often tasked with supporting performance requirements.  Although this can often mean providing sufficient servers, disk, memory and CPU resources to support an application, it also means tuning the application.  Adding hardware can bring you 2x or 5x improvement.  Tuning an application can bring 10x or 100x improvement.  Inevitably this involves query tuning.

That’s where ORMs become problematic, as they don’t promote tweaking of queries.  They are a layer or buffer to keep query writing out of sight.

In our experience as performance and scalability experts for the past fifteen years, query tuning is the single biggest thing you can do to improve your web application.  Furthermore some of the most challenging and troublesome applications we’ve been asked to tune have been built on top of ORMs like Hibernate.

Sean Hull asks on Quora – What is an ORM and why is it important?

Big Data – What is it and why is it important?

There’s lots of debate about exactly what constitutes “big” when talking about big data.  Technical folks may be inclined to want a specific number.

But when most CTOs and operations managers are talking about big data, they mean data warehouse and analytics databases.  Data warehouses are unique in that they are tuned to run large reporting queries and churn through large multi-million row tables.  Here you load up on indexes to support those reports, because the data is not constantly changing as in a web-facing transaction oriented database.

More and more, databases such as MySQL, which were originally built as web-facing databases, are being used to support big data analytics.  MySQL does have some advanced features to support large databases, such as partitioned tables, but many operations, such as table alters and index creation, still cannot be done *online*.  In these cases configuring MySQL in a master-master active/passive cluster provides higher availability.  Perform blocking operations on the inactive side of the cluster, and then switch the active node.

We’ve worked with MySQL databases as large as 750 GB in size and single user tables as large as 40 million records without problems.  Table size, however, has to be taken into consideration for many operations and queries.  But as long as your tables are indexed to fit the query, and you minimize table scans, especially on joins, your MySQL database server will happily support these huge datasets.
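To make the point about indexing to fit the query concrete, here is a small sketch that compares the query plan before and after adding an index. It uses Python's built-in sqlite3 module as a stand-in, since it needs no server; MySQL has its own EXPLAIN statement with different output. The table, column, and index names are made up for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

# Without an index on email, the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, args)

# With an index that fits the WHERE clause, the planner can search the index.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reports a scan of the table and the second reports a search using the new index.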

Sean Hull discusses on Quora – What is Big Data and why is it important?

Capacity Planning – What is it and why is it important?

Look at your website’s current traffic patterns, pageviews or visits per day, and compare that to your server infrastructure. In a nutshell, your current capacity measures the ceiling your traffic could grow to and still be supported by your current servers. Think of it as the horsepower of your application stack – load balancer, caching server, webserver and database.

Capacity planning seeks to estimate when you will reach capacity with your current infrastructure by doing load testing, and stress testing. With traditional servers, you estimate how many months you will be comfortable with currently provisioned servers, and plan to bring new ones online and into rotation before you reach that traffic ceiling.
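The "how many months" estimate above can be sketched with a simple growth model. This assumes traffic grows by a steady percentage each month (compound growth) and that load testing has already given you a traffic ceiling; the function name and all numbers are hypothetical.

```python
import math


def months_until_capacity(current_daily_pageviews, capacity_daily_pageviews, monthly_growth_rate):
    """Estimate months until traffic reaches the measured capacity ceiling.

    Assumes compound growth at a fixed monthly rate, e.g. 0.10 for 10%.
    """
    if current_daily_pageviews >= capacity_daily_pageviews:
        return 0.0  # already at or over the ceiling
    if monthly_growth_rate <= 0:
        return math.inf  # flat or shrinking traffic never reaches the ceiling
    ratio = capacity_daily_pageviews / current_daily_pageviews
    return math.log(ratio) / math.log(1.0 + monthly_growth_rate)


# Hypothetical site: 200k pageviews/day today, load tests show a 500k ceiling,
# and traffic has been growing about 10% per month.
runway = months_until_capacity(200_000, 500_000, 0.10)
print(f"about {runway:.1f} months of headroom")
```

In practice you would plan to bring new servers online well before that estimate runs out, since real growth is rarely this smooth.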

Your reaction to capacity and seasonal traffic variations becomes much more nimble with cloud computing solutions, as you can script server spinups to match capacity and growth needs. In fact you can implement auto-scaling as well, setting rules and thresholds to bring additional capacity online – or offline – automatically as traffic dictates.
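The rules-and-thresholds idea behind auto-scaling can be sketched as a small decision function. This is a simplified illustration, not any cloud provider's API; the thresholds, limits, and function name are assumptions chosen for the example.

```python
def desired_server_count(
    current_servers,
    avg_cpu_percent,
    scale_up_at=75.0,     # hypothetical threshold: add a server above this load
    scale_down_at=30.0,   # hypothetical threshold: remove a server below this load
    min_servers=2,
    max_servers=20,
):
    """Return how many servers should be running given the average CPU load."""
    if avg_cpu_percent > scale_up_at:
        target = current_servers + 1
    elif avg_cpu_percent < scale_down_at:
        target = current_servers - 1
    else:
        target = current_servers
    # Never go below the redundancy floor or above the budget ceiling.
    return max(min_servers, min(max_servers, target))


print(desired_server_count(4, 82.0))  # busy: bring one more server online
print(desired_server_count(4, 12.0))  # quiet: take one offline
```

Keeping a gap between the two thresholds avoids flapping, where servers are added and removed repeatedly as load hovers around a single cutoff.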

To do proper capacity planning, you need good data. Pageviews and visits per day can come from your analytics package, but you’ll also need more complex metrics on what your servers are doing over time. Packages like Cacti, Munin, Ganglia, OpenNMS or Zenoss can provide you with very useful data collection with very little overhead to the server. With these in place, you can view load average, memory and disk usage, and database or webserver threads, and correlate all that data back to your application. What’s more, with time-based data and graphs, you can compare changes against application change management and deployment data to determine how new code rollouts affect capacity requirements.

Sean Hull asks about Capacity Planning on Quora.

Stress Testing – What is it and why is it important?

Stress testing applications is like putting a car through crash tests, wear and tear tests, and performance tests.  It’s about finding the leaks and bottlenecks before they become a limitation to growth.  In fact, stress testing is a big part of capacity planning.

There are a few different ways to stress test a web application.  You can start at the database side of the house itself, and just stress test the queries your application uses.  There are benchmarking tools included with MySQL, such as mysqlslap, which allow you to run a query or sets of queries repeatedly against the database.  You can also run them in parallel and in large batches together.  All of these methods are an effort to push the limit and find out when the server can handle no more.
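The idea of running a query repeatedly and in parallel can be sketched in a few lines. This is not a replacement for mysqlslap; it is a minimal harness using Python's standard library, with an in-memory SQLite query standing in for a real application query. The function names and run counts are made up for illustration.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(task):
    """Run task once and return how long it took, in seconds."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start


def run_load(task, total_runs, concurrency):
    """Run task total_runs times across concurrency worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed_call(task), range(total_runs)))


def sample_query():
    """Stand-in workload: each call opens its own in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "WITH RECURSIVE n(x) AS "
            "(SELECT 1 UNION ALL SELECT x + 1 FROM n WHERE x < 1000) "
            "SELECT sum(x) FROM n"
        ).fetchone()
    finally:
        conn.close()


latencies = run_load(sample_query, total_runs=40, concurrency=8)
print(f"runs: {len(latencies)}, slowest: {max(latencies) * 1000:.2f} ms")
```

Raising the concurrency and run count step by step, while watching the slowest latencies, is one simple way to look for the point where the server stops keeping up.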

There are tools that operate by firing off repeated URL requests to the webserver, like httperf and JMeter. These can be good for hammering away at the server, but if you want to do more complex and nuanced tests, a tool like Selenium will allow you to record a web browsing session and play it back to the server, many times or in parallel, again to simulate a greater load on the servers.

Sean Hull asks on Quora – What is Stress Testing and why is it important?

Caching – What is it and why is it important?

Caching keeps frequently accessed objects, images and data closer to where you need them, speeding up access to websites you hit often.

Your browser is the first layer of caching, keeping images and data from websites that you visit often.  Next, the webserver itself has a caching layer, typically implemented by something like memcached, caching information that it would normally fetch from the database on the backend.  This avoids the network roundtrip, and also avoids the load and work of running the query to fetch that data again.
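The webserver-side caching described above often follows a check-the-cache-first pattern. Here is a minimal sketch, with an in-process dictionary standing in for memcached and a fake function standing in for the database query. All names, and the 60-second expiry, are assumptions for illustration.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry (a stand-in for memcached)."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, time.monotonic() + self.ttl_seconds)


stats = {"db_calls": 0}
cache = TTLCache(ttl_seconds=60)


def fetch_user_from_db(user_id):
    """Hypothetical slow database lookup; counts how often it is called."""
    stats["db_calls"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(user_id):
    """Check the cache first and only query the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = fetch_user_from_db(user_id)
    cache.set(key, user)
    return user


first = get_user(42)   # miss: goes to the database
second = get_user(42)  # hit: served from the cache
print(stats["db_calls"])
```

The second lookup never reaches the database, which is where the savings in roundtrips and query load come from.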

Furthermore, you can install what’s called a reverse proxy, such as Varnish, in front of the webserver.  This can bring further speedups and performance benefits to your overall architecture.

On the database server you also do a lot of caching.  With MySQL you may configure the query cache, which caches query result sets inside of MySQL, eliminating the need to rerun those queries on subsequent calls.  Beyond that, the database server has various other caches, such as the InnoDB buffer pool, which keeps blocks of data in memory, reducing slower reads from disk.

On Quora, Sean Hull asks: What is caching and why is it important?

Zero Downtime – What is it and why is it important?

For most large web applications, uptime is of foremost importance.  Any outage can be seen by customers as a frustration, or an opportunity to move to a competitor.  What’s more, for a site that also includes e-commerce, it can mean real lost sales.

Zero Downtime describes a site without service interruption.  To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure.  If you’re using cloud hosting, are you redundant to alternate availability zones and regions?  Are you using geographically distributed load balancing?  Do you have multiple clustered databases on the backend, and multiple load-balanced webservers?

All of these requirements will increase uptime, but may not bring you close to zero downtime.  For that you’ll need thorough testing.  The solution is to pull the trigger on sections of your infrastructure, and prove that it fails over quickly without noticeable outage.  The ultimate test is the outage itself.

Sean Hull on Quora: What is zero downtime and why is it important?

Feature Flags – What are they and why are they important?

Feature flags are switches that developers architect into their web applications to allow a feature to be turned on or off.  It sounds simple, but it is harder to implement or enable after the fact.

These switches allow the systems team to operationalize new application functionality.  They make it possible to turn hot button features on or off as needed.  This can bring tremendous power and flexibility to the operations team for deployments where traffic patterns and site usage patterns cannot be known in advance.  It can increase uptime and availability of the overall site, by minimizing the impact any new feature might have.

Feature flags can also be implemented as feature dials, allowing the feature to be exposed to a percentage of users, select users, or some other meaningful way to turn it up or down gradually.
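Both the on/off switch and the dial can be sketched in one small function. This is an illustrative sketch only: the flag names, the in-memory flag table, and the hashing scheme are assumptions, and a real system would usually load flags from configuration that can be changed without a deploy.

```python
import hashlib

# Hypothetical flag table: an on/off switch, a percentage dial,
# and a list of select users who always see the feature.
FLAGS = {
    "new_checkout": {"enabled": True, "percent": 25, "allow_users": {"qa-team"}},
    "beta_search": {"enabled": False, "percent": 100, "allow_users": set()},
}


def bucket(flag_name, user_id):
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name, user_id, flags=FLAGS):
    """Decide whether this user should see the feature."""
    flag = flags.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False  # unknown or switched-off features stay hidden
    if user_id in flag["allow_users"]:
        return True   # select users always get the feature
    return bucket(flag_name, user_id) < flag["percent"]


print(is_enabled("new_checkout", "qa-team"))
print(is_enabled("beta_search", "alice"))
```

Hashing the user id, rather than picking randomly on each request, keeps each user's experience consistent as the percentage is turned up or down.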

Sean Hull asks on Quora: What are feature flags and why are they important?

Venue Analytics – What is it and why is it important?

Analytics provide insight into what your web traffic represents.  They help you answer questions like:

  • Who visits my website and what do they read?
  • What do those users click on?
  • How can I turn those users into customers?

Venue analytics is a growing area of tracking that provides this type of insight to venues, restaurants, and other brick-and-mortar businesses.  If users are clicking around on Google, Yelp, Menupages, or New York Mag, or finding a restaurant some other way, they are typically using their mobile phones to do so.  So venue analytics provides tools to businesses to answer questions like:

  • Who is searching for an Italian restaurant like mine?
  • What other restaurants did they browse before coming to my restaurant?
  • They browsed my restaurant, but went elsewhere, why?
  • What can I do to entice customers when they are browsing by mobile phone?

Sean Hull asks on Quora: What is venue analytics and why is it important?