
Object Relational Mapper – What is it and why is it important?

Object Relational Mappers, or ORMs, are a layer of software that sits between web developers and the database backend.  For instance, if you’re using Ruby on Rails as your web development framework, you’ll interact with MySQL through an ORM layer called ActiveRecord.  If you’re using Java, you may be fond of the ORM called Hibernate.

ORMs have been controversial because they expose two very different perspectives on software development.  On the one hand we have developers who are tasked with building applications, fulfilling business requirements, and satisfying functional requirements in a finite amount of time.  On the other hand we have operations teams, which are tasked with managing resources, supporting applications, and maintaining uptime and availability.

Often these goals are opposing.  As many in the devops movement have pointed out, these teams don’t always work together keeping common goals in mind.  How does this play into the discussion of ORMs?

Relational databases are a technology developed in the 1970s that use an arcane language called SQL to move data in and out of them.  Advocates of ORMs would argue, rightly so, that SQL is cumbersome and difficult to write, and that having a layer of software which helps you in this task is a great benefit.  It definitely helps the development effort, as software designers, architects and coders can focus more of their efforts on functional requirements and less on the arcane minutiae of SQL.

Problems come when you bump up against scalability challenges.  The operations team is often tasked with supporting performance requirements.  Although this can often mean providing sufficient servers, disk, memory and CPU resources to support an application, it also means tuning the application.  Adding hardware can bring you a 2x or 5x improvement.  Tuning an application can bring a 10x or 100x improvement.  Inevitably this involves query tuning.

That’s where ORMs become problematic, as they don’t promote tweaking of queries.  They are a layer or buffer to keep query writing out of sight.
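To make this concrete, here is a minimal sketch in Python of the kind of query pattern that can hide behind an ORM-style interface.  It uses the standard library's sqlite3 module as a stand-in for MySQL, and the `CountingDB` wrapper, the tables and the function names are all hypothetical; none of this is ActiveRecord or Hibernate code.  Loading each author's posts one at a time issues one query per author, while a hand-written join fetches the same data in a single query.

```python
import sqlite3


class CountingDB:
    """Hypothetical wrapper that counts every query sent through it."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db():
    """Create a small in-memory database: 5 authors with 3 posts each."""
    db = CountingDB()
    db.conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    db.conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
    )
    for author_id in range(1, 6):
        db.conn.execute(
            "INSERT INTO authors VALUES (?, ?)", (author_id, f"author{author_id}")
        )
        for n in range(3):
            db.conn.execute(
                "INSERT INTO posts (author_id, title) VALUES (?, ?)",
                (author_id, f"post {n} by author{author_id}"),
            )
    return db


def titles_lazy(db):
    """One query for the authors, then one more per author (the N+1 pattern)."""
    result = {}
    for author_id, name in db.query("SELECT id, name FROM authors"):
        rows = db.query("SELECT title FROM posts WHERE author_id = ?", (author_id,))
        result[name] = sorted(title for (title,) in rows)
    return result


def titles_joined(db):
    """The same data fetched with a single hand-written join."""
    result = {}
    rows = db.query(
        "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return {name: sorted(titles) for name, titles in result.items()}


lazy_db, joined_db = build_db(), build_db()
lazy_result = titles_lazy(lazy_db)
joined_result = titles_joined(joined_db)
print("lazy queries:", lazy_db.queries)      # 1 + number of authors
print("joined queries:", joined_db.queries)  # a single query
```

With five authors the difference is trivial, but the per-row pattern grows with the data, which is why it tends to surface only once an application is under real load.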

In our experience as performance and scalability experts for the past fifteen years, query tuning is the single biggest thing you can do to improve your web application.  Furthermore some of the most challenging and troublesome applications we’ve been asked to tune have been built on top of ORMs like Hibernate.

Sean Hull asks on Quora – What is an ORM and why is it important?

Big Data – What is it and why is it important?

There’s lots of debate about exactly what constitutes “big” when talking about big data.  Technical folks may be inclined to want a specific number.

But when most CTOs and operations managers are talking about big data, they mean data warehouse and analytics databases.  Data warehouses are unique in that they are tuned to run large reporting queries and churn through large multi-million row tables.  Here you load up on indexes to support those reports, because the data is not constantly changing as in a web-facing transaction oriented database.

More and more databases such as MySQL, which were originally built as web-facing databases, are being used to support big data analytics.  MySQL does have some advanced features to support large databases, such as partitioned tables, but many operations, such as table alters and index creation, still cannot be done *online*.  In these cases configuring MySQL in a master-master active/passive cluster provides higher availability.  Perform blocking operations on the inactive side of the cluster, and then switch the active node.

We’ve worked with MySQL databases as large as 750G in size and single user tables as large as 40 million records without problems.  Table size, however, has to be taken into consideration for many operations and queries.  But as long as your tables are indexed to fit the query, and you minimize table scans, especially on joins, your MySQL database server will happily support these huge datasets.
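As a small illustration of indexing to fit the query, the sketch below asks the database for its query plan before and after adding an index.  It uses Python's built-in sqlite3 module as a stand-in, so the plan text is SQLite's and not MySQL's (MySQL has its own EXPLAIN statement), and the `orders` table and index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, note TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total, note) VALUES (?, ?, ?)",
    [(i % 1000, i * 1.5, "n/a") for i in range(10000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def plan(sql):
    """Return SQLite's plan for the query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)  # no usable index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reports a scan of the table and the second names the new index.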

Sean Hull discusses on Quora – What is Big Data and why is it important?

Capacity Planning – What is it and why is it important?

Look at your website’s current traffic patterns, pageviews or visits per day, and compare that to your server infrastructure. In a nutshell, your current capacity is the ceiling your traffic could grow to and still be supported by your current servers. Think of it as the horsepower of your application stack – load balancer, caching server, webserver and database.

Capacity planning seeks to estimate when you will reach capacity with your current infrastructure by doing load testing, and stress testing. With traditional servers, you estimate how many months you will be comfortable with currently provisioned servers, and plan to bring new ones online and into rotation before you reach that traffic ceiling.
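The months-of-headroom estimate can be sketched as simple arithmetic.  The function below assumes traffic grows at a steady compound monthly rate, which is our simplifying assumption, and the figures used are invented for illustration; the ceiling itself would come from your load testing.

```python
import math


def months_until_capacity(current_daily_pageviews, ceiling_daily_pageviews,
                          monthly_growth_rate):
    """Months of compound growth before traffic reaches the measured ceiling."""
    if current_daily_pageviews >= ceiling_daily_pageviews:
        return 0.0  # already at or past capacity
    if monthly_growth_rate <= 0:
        return math.inf  # flat or shrinking traffic never reaches the ceiling
    return (math.log(ceiling_daily_pageviews / current_daily_pageviews)
            / math.log(1 + monthly_growth_rate))


# Hypothetical numbers: 200k pageviews/day today, load tests show a ceiling
# of 500k/day, and traffic has been growing about 8% per month.
months = months_until_capacity(200_000, 500_000, 0.08)
print(f"about {months:.1f} months of headroom")
```

In practice you would subtract the lead time needed to order and provision new servers from this figure.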

Your reaction to capacity and seasonal traffic variations becomes much more nimble with cloud computing solutions, as you can script server spinups to match capacity and growth needs. In fact you can implement auto-scaling as well, setting rules and thresholds to bring additional capacity online – or offline – automatically as traffic dictates.
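A threshold rule of this kind can be sketched in a few lines.  This is only an illustration of the idea, not any cloud provider's API: the function name, the CPU thresholds and the server limits are all assumptions chosen for the example.

```python
def desired_server_count(current_servers, avg_cpu_percent,
                         scale_up_at=75.0, scale_down_at=30.0,
                         min_servers=2, max_servers=20):
    """Return how many servers the rule wants, given average CPU across the pool."""
    if avg_cpu_percent > scale_up_at:
        target = current_servers + 1   # traffic is high: bring one more online
    elif avg_cpu_percent < scale_down_at:
        target = current_servers - 1   # traffic is low: take one offline
    else:
        target = current_servers       # inside the comfortable band
    # Never drop below the redundancy floor or exceed the budget ceiling.
    return max(min_servers, min(max_servers, target))


for cpu in (85.0, 50.0, 10.0):
    print(cpu, "->", desired_server_count(4, cpu))
```

Keeping a gap between the scale-up and scale-down thresholds helps avoid servers being added and removed repeatedly when load hovers around a single threshold.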

In order to do proper capacity planning, you need good data. Pageviews and visits per day can come from your analytics package, but you’ll also need more complex metrics on what your servers are doing over time. Packages like Cacti, Munin, Ganglia, OpenNMS or Zenoss can provide you with very useful data collection with very little overhead to the server. With these in place, you can view load average, memory and disk usage, database or webserver threads and correlate all that data back to your application. What’s more, with time-based data and graphs, you can compare changes to application change management and deployment data, to determine how new code rollouts affect capacity requirements.

Sean Hull asks about Capacity Planning on Quora.

Stress Testing – What is it and why is it important?

Stress testing applications is like putting a car through crash tests, wear and tear tests, and performance tests.  It’s about finding the leaks, and bottlenecks before they become a limitation to growth.  In fact, stress testing is a big part of capacity planning.

There are a few different ways to stress test a web application.  You can start at the database side of the house itself, and just stress test the queries your application uses.  There are benchmarking tools included with MySQL, such as mysqlslap, which allow you to run a query or sets of queries repeatedly against the database.  You can also run them in parallel and in large batches together.  All of these methods are an effort to push the limit and find out when the server can handle no more.

There are tools that operate by firing off repeated URL requests to the webserver, like httperf and also JMeter. These can be good for hammering away at the server, but if you want to do more complex and nuanced tests, a tool like Selenium will allow you to record a web browsing session and play it back to the server, many times or in parallel, again to simulate a greater load on the servers.
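The repeated-request approach can be sketched with nothing but the Python standard library.  This is a toy, not a replacement for httperf or JMeter: it starts a tiny local webserver as a stand-in for your application, fires requests at it in parallel and reports timings.  The handler, the function names and the request counts are all assumptions for the example.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for the application under test."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet during the run


# Ignore any system proxy settings, since the target is on localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def timed_get(url):
    """Fetch the URL once and return (status code, seconds taken)."""
    start = time.perf_counter()
    with opener.open(url, timeout=10) as response:
        response.read()
        status = response.status
    return status, time.perf_counter() - start


def run_load(url, total_requests=40, concurrency=5):
    """Fire total_requests GETs using up to `concurrency` parallel workers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_get, [url] * total_requests))


server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

results = run_load(url)
server.shutdown()
server.server_close()

latencies = sorted(seconds for _, seconds in results)
ok = sum(1 for status, _ in results if status == 200)
print(f"{ok}/{len(results)} succeeded, slowest {latencies[-1] * 1000:.1f} ms")
```

Raising `total_requests` and `concurrency` step by step, and watching where latency climbs or requests start failing, is the basic shape of a stress test.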

Sean Hull asks on Quora – What is Stress Testing and why is it important?

Caching – What is it and why is it important?

Caching keeps frequently accessed objects, images and data closer to where you need them, speeding up access to websites you hit often.

Your browser is the first layer of caching, keeping images and data from websites that you visit often.  Next the webserver itself has a caching layer, typically implemented by something like memcache, caching information that it would normally fetch from the database on the backend.  This avoids the network roundtrip, and also avoids the load and work of running the query to fetch that data again.

Furthermore you can install what’s called a reverse-proxy on the webserver, such as Varnish.  This can bring further speedups and performance benefits to your overall architecture.

On the database server you also do a lot of caching.  With MySQL you may configure the query cache, which caches query result sets inside of MySQL, eliminating the need to rerun those queries on subsequent calls.  The database server also has various other caches, such as the InnoDB buffer pool, which keeps blocks of data in memory and reduces slower reads from disk.

On Quora, Sean Hull asks: What is caching and why is it important?

Zero Downtime – What is it and why is it important?

For most large web applications, uptime is of foremost importance.  Any outage can be seen by customers as a frustration, or an opportunity to move to a competitor.  What’s more, for a site that also includes e-commerce, it can mean real lost sales.

Zero Downtime describes a site without service interruption.  To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure.  If you’re using cloud hosting, are you redundant to alternate availability zones and regions?  Are you using geographically distributed load balancing?  Do you have multiple clustered databases on the backend, and multiple load-balanced webservers?

All of these requirements will increase uptime, but may not bring you close to zero downtime.  For that you’ll need thorough testing.  The solution is to pull the trigger on sections of your infrastructure, and prove that it fails over quickly without noticeable outage.  The ultimate test is the outage itself.
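A failure drill of this kind can be sketched as a toy simulation in Python.  The `Backend` class and `serve` function are hypothetical stand-ins for real servers and a real load balancer; the point is only the shape of the test: take a component down on purpose and confirm that requests are still answered.

```python
class Backend:
    """Hypothetical server with an on/off switch for failure drills."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def serve(backends, request):
    """Try each backend in order and return the first successful response."""
    for backend in backends:
        try:
            return backend.handle(request)
        except ConnectionError:
            continue  # fail over to the next backend
    raise RuntimeError("total outage: no backend could serve the request")


pool = [Backend("web1"), Backend("web2")]
before = serve(pool, "/home")

pool[0].healthy = False  # the drill: take the primary down on purpose
after = serve(pool, "/home")

print(before)
print(after)
```

If the second request had raised an error instead of being answered by the standby, the drill would have exposed a gap in redundancy before a real outage did.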

Sean Hull on Quora: What is zero downtime and why is it important?

Feature Flags – What are they and why are they important?

Feature flags are switches that developers architect into their web applications to allow a feature to be turned on or off.  It is simple sounding in description, but harder to implement or enable after the fact.

These switches allow the systems team to operationalize new application functionality.  They make it possible to turn hot button features on or off as needed.  This can bring tremendous power and flexibility to the operations team for deployments where traffic patterns and site usage patterns cannot be known in advance.  It can increase uptime and availability of the overall site, by minimizing the impact any new feature might have.

Feature flags can also be implemented as feature dials, allowing the feature to be exposed to a percentage of users, select users, or some other meaningful way to turn it up or down gradually.
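One common way to build such a dial is to hash each user into a stable bucket and compare it to the percentage setting.  The sketch below shows the idea in Python; the flag names, the `FLAGS` dictionary and the helper functions are all made up for the example, and a real application would load the settings from somewhere the operations team can change at runtime.

```python
import hashlib


def bucket(feature, user_id):
    """Map a (feature, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user_id, flags):
    """flags maps feature name -> percentage of users (0-100) who see it."""
    percent = flags.get(feature, 0)  # unknown features stay off
    return bucket(feature, user_id) < percent


# Hypothetical settings: a dial at 25%, a fully-on flag and a fully-off flag.
FLAGS = {"new_checkout": 25, "beta_search": 100, "chat_widget": 0}

enabled = sum(is_enabled("new_checkout", uid, FLAGS) for uid in range(10_000))
print(f"new_checkout is on for {enabled} of 10000 users")
```

Because the bucket depends only on the feature name and the user id, a given user sees a consistent experience from one request to the next, and raising the percentage only ever adds users.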

Sean Hull asks on Quora: What are feature flags and why are they important?

Venue Analytics – What is it and why is it important?

Analytics provide insight into what your web traffic represents.  It helps you answer questions like:

  • Who visits my website and what do they read?
  • What do those users click on?
  • How can I turn those users into customers?

Venue analytics is a growing area of tracking that provides this type of insight to venues, restaurants, and other bricks and mortar businesses.  If users are clicking around on Google, Yelp, Menupages, or New York Mag or finding a restaurant some other way, they are typically using their mobile phones to do so.  So venue analytics provides tools to businesses to answer questions like:

  • Who is searching for an Italian restaurant like mine?
  • What other restaurants did they browse before coming to my restaurant?
  • They browsed my restaurant, but went elsewhere, why?
  • What can I do to entice customers when they are browsing by mobile phone?

Sean Hull asks on Quora: What is venue analytics and why is it important?

Devops – What is it and why is it important?

Devops is one of those fancy contractions that tech folks just love.  One part development or developer, and another part operations.  It imagines a blissful marriage where the team that develops software and builds features that fit the business, works closely and in concert with an operations and datacenter team that thinks more like developers themselves.

In the long tradition of technology companies, two separate cultures comprise these two roles.  Developers, focused on development languages, libraries, and functionality that match the business requirements keep their gaze firmly in that direction.  The servers, network and resources those components of software are consuming are left for the ops teams to think about.

So too, ops teams are squarely focused on uptime, resource consumption, performance, availability, and always-on.  They will be the ones woken up at 4am if something goes down, and are thus sensitive to version changes, unplanned or unmanaged deployments, and resource heavy or resource wasteful code and technologies.

Lastly, there are the QA teams tasked with quality assurance, testing, and making sure the ongoing stream of features doesn’t break anything previously working or introduce new show stoppers.

Devops is a new and, I think, growing area where the three teams work more closely together.  But devops also speaks to the emerging area of cloud deployments, where servers can be provisioned with command line API calls, and completely scripted.  In this new world, infrastructure components all become components in software, and thus infrastructure itself, long the domain of manual processes and labor intensive tasks, becomes repeatable and amenable to the techniques of good software development.  Suddenly version control, configuration management, and agile development methodologies can be applied to operations, bringing a whole new level of professionalism to deployments.

Sean Hull asks on Quora – What is devops and why is it important?

Configuration Management – What is it and why is it important?

Every software service or component on a server requires configurations. In your desktop applications you set preferences for what your default page will be, how you’d like your margins set, or whether to save and restore cookies each time you restart.

Enterprise applications also require complex configuration settings.  Want to monitor a webserver and a database with Nagios?  That’s set in the config file.  Want to start MySQL with 8G of memory for InnoDB?  That’s also set in a config file.  What’s more, config files contain server specific settings, based on IP address or the server’s role, webserver or database for example.  The webserver may also have memcache and outbound email services running.

With more traditional deployments, the systems administrator will set up each physical box, and configure those services based on the business needs.  As you bring online tens or hundreds of servers, however, you can quickly see how labor intensive this process would be, and also how much repetition there is.

Enter configuration management into the picture.  Previously I blogged about tools like Puppet that can bring great new best practices to the table. There is also cfengine, and the newer Chef which incorporates cloud deployments as well into the mix.  Configuration management allows you to remotely administer servers, install packages, manage dependencies, install configurations based on a central copy, and even define roles and templates for new servers.  This brings a whole new level of professionalism to deployments, and also newfound power and flexibility.
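The core idea of a central copy plus per-role settings can be sketched in a few lines of Python.  This is not how Puppet, cfengine or Chef are written or used; it is only an illustration of templating, and the host names, IP addresses, roles and memory sizes are invented for the example.

```python
from string import Template

# Central copy of the MySQL config, with placeholders for per-server values.
MY_CNF = Template(
    "[mysqld]\n"
    "bind-address = $ip\n"
    "innodb_buffer_pool_size = $innodb_memory\n"
)

# Settings that depend on the server's role (hypothetical roles and sizes).
ROLE_DEFAULTS = {
    "database": {"innodb_memory": "8G"},
    "staging-database": {"innodb_memory": "1G"},
}

# Inventory of servers to configure (hypothetical hosts and addresses).
SERVERS = [
    {"host": "db1", "ip": "10.0.0.11", "role": "database"},
    {"host": "db-staging1", "ip": "10.0.1.11", "role": "staging-database"},
]


def render_config(server):
    """Merge role defaults with server-specific values and fill in the template."""
    settings = dict(ROLE_DEFAULTS[server["role"]])
    settings.update(server)
    return MY_CNF.substitute(settings)


for server in SERVERS:
    print(f"# {server['host']}")
    print(render_config(server))
```

Adding another server is then a one-line change to the inventory and not a manual setup session, which is where the savings come from as the server count grows.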

We’ll be writing more about configuration management, especially in the context of cloud deployments such as Amazon EC2 so please stay tuned.

Sean Hull asks on Quora – What is configuration management and why is it important?