For most large web applications, uptime is of foremost importance. Customers experience any outage as a frustration, or as an opportunity to move to a competitor. What's more, for a site that includes e-commerce, it can mean real lost sales.
Zero Downtime describes a site without service interruption. To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure. If you're using cloud hosting, are you redundant across alternate availability zones and regions? Are you using geographically distributed load balancing? Do you have multiple clustered databases on the backend, and multiple load balanced webservers?
All of these measures will increase uptime, but they may not bring you close to zero downtime. For that you'll need thorough testing. The solution is to deliberately pull the trigger on sections of your infrastructure and prove that the site fails over quickly, without a noticeable outage. The ultimate test is the outage itself.
Feature flags are switches that developers architect into their web applications to allow a feature to be turned on or off. Simple as that sounds, they are much harder to implement or retrofit after the fact.
These switches allow the systems team to operationalize new application functionality, turning hot-button features on or off as needed. This can bring tremendous power and flexibility to the operations team for deployments where traffic and site usage patterns cannot be known in advance, and it can increase the uptime and availability of the overall site by minimizing the impact any new feature might have.
Feature flags can also be implemented as feature dials, allowing the feature to be exposed to a percentage of users, select users, or some other meaningful way to turn it up or down gradually.
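A minimal sketch of a flag that doubles as a dial. All the names here are hypothetical, not any particular flag library; the key idea is hashing the user id so each user's experience stays stable as operations turns the dial up.

```python
import hashlib

# Hypothetical flag registry. In practice this would live in a config
# service or database so ops can change it at runtime without a deploy.
FLAGS = {
    "new_checkout": {"enabled": True, "percentage": 25},  # dial: 25% of users
    "beta_search":  {"enabled": False, "percentage": 0},  # hard off switch
}

def is_enabled(flag_name: str, user_id: str) -> bool:
    flag = FLAGS.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False
    # Map the user deterministically into a 0-99 bucket, so turning the
    # dial from 25 to 50 only adds users, never flip-flops existing ones.
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < flag["percentage"]
```

Raising `percentage` from 25 to 100 gradually exposes the feature to everyone; setting `enabled` to `False` is the emergency kill switch.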
Analytics provides insight into what your web traffic represents. It helps you answer questions like:
Who visits my website and what do they read?
What do those users click on?
How can I turn those users into customers?
Venue analytics is a growing area of tracking that provides this type of insight to venues, restaurants, and other brick-and-mortar businesses. Whether users are clicking around on Google, Yelp, Menupages, or New York Mag, or finding a restaurant some other way, they are typically using their mobile phones to do so. So venue analytics provides tools to help businesses answer questions like:
Who is searching for an Italian restaurant like mine?
What other restaurants did they browse before coming to my restaurant?
They browsed my restaurant but went elsewhere. Why?
What can I do to entice customers when they are browsing by mobile phone?
Devops is one of those fancy contractions that tech folks just love: one part development or developer, and another part operations. It imagines a blissful marriage where the team that develops software and builds features that fit the business works closely and in concert with an operations and datacenter team that thinks more like developers themselves.
In the long tradition of technology companies, these two roles are made up of two separate cultures. Developers keep their gaze firmly on development languages, libraries, and functionality that match the business requirements. The servers, network, and resources that software consumes are left for the ops teams to think about.
Ops teams, in turn, are squarely focused on uptime, resource consumption, performance, and always-on availability. They are the ones woken up at 4am if something goes down, and are thus sensitive to version changes, unplanned or unmanaged deployments, and resource-heavy or wasteful code and technologies.
Lastly there are the QA teams, tasked with quality assurance and testing, making sure the ongoing deluge of new features doesn't break anything previously working or introduce new show stoppers.
Devops is a new and, I think, growing area where the three teams work more closely together. But devops also speaks to the emerging world of cloud deployments, where servers can be provisioned with command-line API calls and completely scripted. In this new world, infrastructure itself becomes software, and thus infrastructure, long the domain of manual processes and labor-intensive tasks, becomes repeatable and amenable to the techniques of good software development. Suddenly version control, configuration management, and agile development methodologies can be applied to operations, bringing a whole new level of professionalism to deployments.
Every software service or component on a server requires configuration. In your desktop applications you set preferences for what your default page will be, how you'd like your margins set, or whether to save and restore cookies each time you restart.
Enterprise applications also require complex configuration settings. Want to monitor a webserver and a database with Nagios? That's set in a config file. Want to start MySQL with 8G of memory for InnoDB? That's also set in a config file. What's more, config files contain server-specific settings based on IP address or the server's role, webserver or database for example. The webserver may also have memcache and outbound email services running.
With more traditional deployments, the systems administrator will set up each physical box and configure those services based on the business needs. As you bring online tens or hundreds of servers, however, you can quickly see how labor intensive this process becomes, and how much repetitive work it involves.
Enter configuration management. Previously I blogged about tools like Puppet that can bring great new best practices to the table. There is also cfengine, and the newer Chef, which incorporates cloud deployments into the mix as well. Configuration management allows you to remotely administer servers, install packages, manage dependencies, install configurations from a central copy, and even define roles and templates for new servers. This brings a whole new level of professionalism to deployments, along with newfound power and flexibility.
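The core idea can be sketched in a few lines: render each server's config from one central template plus role-specific settings. Tools like Puppet and Chef do this (and far more) declaratively; the template, roles, and values below are made-up examples.

```python
from string import Template

# Central template: one source of truth for every server's my.cnf.
MY_CNF_TEMPLATE = Template("""[mysqld]
server-id = $server_id
innodb_buffer_pool_size = $innodb_mem
bind-address = $ip_address
""")

# Role definitions: a dedicated database box gets far more InnoDB
# memory than a webserver running a small local instance.
ROLES = {
    "database": {"innodb_mem": "8G"},
    "webserver": {"innodb_mem": "1G"},
}

def render_my_cnf(server_id: int, ip_address: str, role: str) -> str:
    """Merge server-specific values with role defaults and render."""
    settings = {"server_id": server_id, "ip_address": ip_address, **ROLES[role]}
    return MY_CNF_TEMPLATE.substitute(settings)
```

Bringing up server number 47 is then one function call, not an hour of hand-editing, and every box of a given role is guaranteed identical.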
We’ll be writing more about configuration management, especially in the context of cloud deployments such as Amazon EC2 so please stay tuned.
Web applications and websites get measurable traffic, recorded in metrics such as pageviews, unique visitors, and visits. All of this activity translates into hits to a webserver, and work for a database to retrieve information for those pages.
During a typical month your application might get 150,000 visits, roughly 35,000 per week. Then during one week when a large ad campaign hits, or some marketing feature goes viral, you suddenly get 500,000 visits! This is a “good problem to have” on the business side, but it poses great challenges to an infrastructure, as it represents roughly a fourteenfold increase over normal weekly traffic. What's more, if you do your capacity planning around that peak, you'll have over 90% of your computing power and servers sitting idle most of the year (assuming that spike is just a blip).
Therein lies the challenge of seasonal traffic variations. Capacity planning attempts to watch for trends in traffic and growth of your user base over time, but large spikes like the one described above can be difficult to predict. Such is the whim of the masses.
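The back-of-the-envelope math from the scenario above, written out (the 30-day month is an assumption for the conversion):

```python
# Baseline: 150,000 visits in a ~30-day month, converted to a weekly rate.
WEEKS_PER_MONTH = 30 / 7

baseline_weekly = 150_000 / WEEKS_PER_MONTH   # 35,000 visits/week
peak_weekly = 500_000                          # the viral week

spike_factor = peak_weekly / baseline_weekly   # ~14x normal weekly traffic

# If you provision for the peak, this fraction of capacity idles the
# rest of the year:
idle_fraction = 1 - baseline_weekly / peak_weekly  # ~0.93
```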
Processes are said to be coupled when they are tightly bound together and dependent on one another.
A loose analogy for decoupling might be replacing a traffic light with a traffic circle: traffic keeps moving, reducing the overall wait time for any car entering the intersection.
Decoupling a web application might involve replacing a makeshift queue your application currently implements in a database table with a message queuing service such as RabbitMQ or Amazon's SQS.
Ultimately decoupling promotes scalability, as you can scale the pieces of your infrastructure that your capacity planning identifies as bottlenecks. What's more, you can make those pieces redundant, increasing high availability at the same time.
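An in-process sketch of the pattern, using Python's standard-library queue as a stand-in for RabbitMQ or SQS (the broker API differs, but the producer/consumer decoupling is the same; with a real broker the queue lives outside the process, so producers and consumers scale independently):

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def producer(order_ids):
    # The web request path just enqueues and returns immediately;
    # it never waits on the slow worker.
    for order_id in order_ids:
        task_queue.put(order_id)

def consumer():
    # A background worker drains the queue at its own pace.
    while True:
        order_id = task_queue.get()
        if order_id is None:        # sentinel: shut down cleanly
            break
        results.append(f"processed order {order_id}")
        task_queue.task_done()

worker = threading.Thread(target=consumer)
worker.start()
producer([101, 102, 103])
task_queue.put(None)
worker.join()
```

If the worker falls behind during a traffic spike, orders pile up in the queue rather than timing out user requests, and you can add consumers to drain the backlog.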
Migration, in the context of enterprise and web-based applications, means moving from one platform to another. Database migrations are particularly complicated: you face all the challenges of changing your software platform, where some old features are missing or behave differently, while new features become available that you'd like to take advantage of.
In the world of databases, some developers try to build database-independent applications, especially using ORMs (object-relational mappers). On the surface this seems like a great option: build your application to use only standard components and features, and you can easily move to a different platform when requirements dictate. Unfortunately, things are not quite that simple.
Database-independent applications necessarily shoot for the lowest common denominator of all of your database platforms, lowering the bar on what high-performance features you might take advantage of on the platform you are currently using.
Here are some scenarios:
Building an application which needs to support multiple database backends for customer sites
Building an application in dev and test as a proof of concept, with a possible port to an alternate database in the future
Don’t want to be locked into one vendor, but have plans for only one platform currently.
These are all good reasons to think about features, and database platforms from the outset.
For situation #1, you need to be most serious about cross-platform compatibility from the start. Build a module for each database platform, with platform-specific code isolated in that module. If a feature you want is available on only one platform, the other platform's module will have to include its own implementation of that feature. By isolating all database-specific interactions in one module, you have also put boundaries around that code; if you choose to support another database platform in the future, you merely need to write another database interaction module.
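The module-per-platform structure described above might look like the following sketch. The class and method names are illustrative only; the point is that each platform's SQL idiom, here MySQL's `ON DUPLICATE KEY UPDATE` versus PostgreSQL's `ON CONFLICT`, lives behind one shared interface.

```python
from abc import ABC, abstractmethod

class DatabaseModule(ABC):
    """Boundary for all platform-specific database code."""

    @abstractmethod
    def upsert_user_sql(self, table: str) -> str:
        """Return this platform's idiom for insert-or-update."""

class MySQLModule(DatabaseModule):
    def upsert_user_sql(self, table: str) -> str:
        return (f"INSERT INTO {table} (id, name) VALUES (%s, %s) "
                "ON DUPLICATE KEY UPDATE name = VALUES(name)")

class PostgresModule(DatabaseModule):
    def upsert_user_sql(self, table: str) -> str:
        return (f"INSERT INTO {table} (id, name) VALUES (%s, %s) "
                "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name")

# Application code depends only on the interface, never on a platform:
def save_user_statement(db: DatabaseModule) -> str:
    return db.upsert_user_sql("users")
```

Supporting a third database later means writing one more subclass, while the application code calling `save_user_statement` is untouched.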
For situation #2, you would use a similar tactic, but you won't necessarily have to implement all the routines in that module for the alternate platform. Just keep those features and differences in mind during the development phase. Where possible, document those differences and comment code liberally. This will go a long way towards preparing you if you do decide to move to a different database backend.
In situation #3, this may be more of a philosophical concern. Don't get overly dragged down by it, as it's hypothetical at this stage. Sometimes developers labor under this concern because of previous bad experiences migrating to a new database platform, but to some degree this is the nature of the beast. Database platforms include a myriad of different features, datatypes, storage methods, and coding languages. In many ways this is where their power lies.
Backups are obviously a crucial component in any enterprise application. Modern internet components are prone to failure, and backups keep your bases covered. Here’s what you should consider:
Is your database backed up, including object structures, data, stored procedures, grants, and logins?
Is your webserver doc-root backed up?
Is your application source code in version control and backed up?
Are your server configurations backed up? Relevant config files might include those for Apache, MySQL, memcache, PHP, email (Postfix or qmail), Tomcat, Solr, or any other software your application requires.
Are your cron or supporting scripts and jobs backed up?
Have you tested all of these components and your overall documentation with a fire drill? This is the proof that you’ve really covered all the angles.
If you do your backups right, you should be able to restore without a problem.
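A tiny sketch of a fire-drill helper covering the checklist above: verify that each expected backup artifact exists and is recent. The filenames and the max-age policy are examples; a real drill would also restore the backups and validate the results, not just check the files.

```python
import os
import time

# Example artifact names corresponding to the checklist above.
EXPECTED_BACKUPS = [
    "mysqldump.sql.gz",      # database: schema, data, routines, grants
    "docroot.tar.gz",        # webserver document root
    "etc-configs.tar.gz",    # apache/mysql/memcache/php config files
    "crontabs.txt",          # cron and supporting jobs
]

def check_backups(backup_dir: str, max_age_days: int = 2) -> list:
    """Return a list of problems found (empty list means the check passed)."""
    problems = []
    cutoff = time.time() - max_age_days * 86400
    for name in EXPECTED_BACKUPS:
        path = os.path.join(backup_dir, name)
        if not os.path.exists(path):
            problems.append(f"missing: {name}")
        elif os.path.getmtime(path) < cutoff:
            problems.append(f"stale: {name}")
    return problems
```

Run from cron, a non-empty result becomes an alert, so you find out about a broken backup job before the day you need the backup.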
Highly available systems build redundancy into the application and architecture layers to mitigate disasters. Since computing systems are made from commodity hardware and components that are prone to failure, redundancy at every layer is key.
Redundancy of switches, network interfaces, and load balanced webservers is fairly straightforward and run-of-the-mill, but clustering your database tier is another trick entirely. With MySQL, master-master active-passive can work quite well, using circular replication to send all changes to both nodes. Both nodes are able to handle production traffic, and you pick the active one simply by configuring your application to point to it. Use a technology like MMM or Pacemaker to front your database cluster with a virtual IP (VIP), so no application or webserver changes are required to switch which node takes on the master role.
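As a sketch, the my.cnf settings that make master-master circular replication workable (the server IDs and offsets are example values; each node needs a distinct server-id, and the auto_increment settings keep the two masters from generating colliding primary keys):

```ini
# node1 my.cnf — node2 uses server-id = 2 and auto_increment_offset = 2
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
log-slave-updates        = 1
auto_increment_increment = 2   # two nodes in the replication circle
auto_increment_offset    = 1   # node1 generates odd ids, node2 even
```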
Redundant components are important in a single datacenter, but what if that datacenter goes out or gets hit by a natural disaster? Is your whole business down? That's where geographic redundancy and geo-load-balanced DNS come in. Having redundant copies of your whole site on both the east and west coasts with geo-DNS provides the next level of high availability.