What is disaster recovery and why is it important?

Disaster recovery involves anticipating a major business outage and planning contingencies to avoid the loss of revenue, customers or sales.

All of the technology components that make up your enterprise applications should be carefully considered against loss.  What happens if this database server disappears?  Do we have all the data backed up somewhere?  Have we tested that backup to restore it?  How long does it take to restore?  Can we reconnect the application to said database?  What if the network goes down?  How about if the whole datacenter goes out?

Planning for disaster recovery is important whether you’re hosted in-house or with a hosting provider.  Consider Amazon’s EC2 outage in April, when various availability zones went out.  Affected customers who had their databases backed up properly, with offsite and tested copies, and who also kept copies of other components such as webserver document roots and software configurations, would have been able to rebuild their entire infrastructure in an alternate availability zone or region.  Remember, it was only a small component of Amazon Web Services that was out.
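The questions above about testing a backup and timing the restore can be turned into a small routine check.  What follows is only a minimal sketch: it uses Python’s built-in sqlite3 module as a stand-in for your real database, and the function name, the customers table and the file name are all made up for illustration.

```python
import os
import sqlite3
import tempfile
import time


def backup_and_test_restore(source, backup_path, table):
    """Back up `source` to `backup_path`, then check the copy can be read back."""
    # Take the backup using sqlite3's built-in online backup API.
    target = sqlite3.connect(backup_path)
    source.backup(target)
    target.close()

    # The restore test: open the backup as if it were the only copy left,
    # and time how long it takes to get an answer out of it.
    started = time.perf_counter()
    restored = sqlite3.connect(backup_path)
    restored_rows = restored.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    restored.close()
    restore_seconds = time.perf_counter() - started

    source_rows = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return {
        "source_rows": source_rows,
        "restored_rows": restored_rows,
        "restore_seconds": restore_seconds,
        "ok": source_rows == restored_rows,
    }


if __name__ == "__main__":
    # Hypothetical demo data; a real check would point at your own database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO customers (name) VALUES (?)", [("Ann",), ("Bob",)])
    db.commit()
    with tempfile.TemporaryDirectory() as folder:
        print(backup_and_test_restore(db, os.path.join(folder, "backup.db"), "customers"))
```

A row count is a weak check on its own.  A fuller test would also reconnect the application to the restored copy, as the questions above suggest.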

Sean Hull asks on Quora: Disaster Recovery – What is it and why is it important?

Scalability – What is it and why is it important?

Scaling comes in a few different flavors.  Vertical scaling involves growing the computing power of a single server, adding memory, faster or more CPUs and/or faster disk I/O.

Horizontal scaling involves adding additional computing resources or servers in parallel and then load balancing across them.

Scalability describes how well an application accommodates that scaling.  With web applications, the middle tier, aka the webservers, is fairly easy to scale horizontally, and most enterprise class applications already do this with commercial load balancers, in either hardware or software.

Doing the same with the database tier, however, can be trickier.  Enter MySQL replication, which facilitates fairly painless horizontal scalability.  Build your application architecture so that read-only transactions are segmented apart from write/update transactions, and you can send the latter to one master database and the former to a handful of replicated slaves.  With a typical web application that is less than 10% writes and more than 90% reads, there is the potential to add as many as 5-10 servers horizontally and increase application throughput by as much as 500-1000%.
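The read/write split described above might look something like the sketch below.  It is an illustration only: the ReadWriteRouter class and the server names are invented, and it simply treats any statement beginning with SELECT as a read.  A real application would route inside its database layer, and would send reads that must see the very latest write to the master, since replicated slaves can lag slightly behind.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; with none configured, reads use the master.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Simplification: only plain SELECT statements count as reads here.
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.master


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
    for statement in ("SELECT * FROM pages", "UPDATE pages SET hits = hits + 1", "SELECT 1"):
        print(statement, "->", router.route(statement))
```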

Sean Hull asks on Quora: What is scalability and why is it important?

Cloud Computing – What is it and why is it important?

Cloud Computing has a few varied meanings, from API services such as Twitter to web-based (read cloud-based) email services such as Gmail and Yahoo.

An even bigger tectonic shift is happening, though, in the area of infrastructure and hosting: the move to cloud-based solutions.  No longer is provisioning a slow ordering process, followed by a multi-year contract and commitment with an associated high price tag.  Now computing resources can be provisioned and spun up in seconds, even allowing for auto-scaling, which brings new computing resources online dynamically as seasonal traffic patterns demand.

  • uniquely suited to applications with seasonal traffic requirements
  • supports disaster recovery effectively for free
  • allows temporary provisioning of test environments
  • facilitates auto-scaling of bare metal servers
  • no huge budgetary outlay, pay for only what you use
  • bring up resources in seconds – supports true agile development

What’s more, since cloud resources are all provisioned in software through an API, the cloud encourages treating infrastructure as a whole as software.  Now the complete rebuild of all your systems, from spin-up to package configuration to application configuration, can be scripted in software and managed in version control.

Sean Hull asks the question on Quora: What is Cloud Computing?

Website Optimization – What is it and why is it important?

When you enter a website name in your browser or click on a Google result, you set off a cascade of events.  Your browser requests the various pieces and components that make up the webpage from the remote server which hosts that website.  Those pieces are sent back to you, and your browser assembles them.

There are many moving parts in that process.  Anywhere along the way you can hit a snag, slowing down the overall process of that page displaying.  Website Optimization attempts to identify all of those processes and components, and organize them by slowest to fastest.  This allows us to focus our attention on the slowest part of the process.  Like a physician looking at your vascular system, it allows the performance expert to identify and then fix those pipes that are slowing you down.
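As a rough sketch of that ranking step, the snippet below sorts a set of timings from slowest to fastest and shows each one’s share of the total.  The component names and millisecond figures are invented for illustration.  In practice they would come from your own measurements.

```python
def rank_slowest_first(timings_ms):
    """Return (component, milliseconds, share of total) tuples, slowest first."""
    total = sum(timings_ms.values())
    ranked = sorted(timings_ms.items(), key=lambda item: item[1], reverse=True)
    return [(name, ms, ms / total if total else 0.0) for name, ms in ranked]


if __name__ == "__main__":
    # Hypothetical measurements for one page load, in milliseconds.
    page_load = {
        "dns_lookup": 40,
        "server_response": 620,
        "images": 300,
        "javascript": 150,
        "tls_handshake": 90,
    }
    for name, ms, share in rank_slowest_first(page_load):
        print(f"{name:16} {ms:5} ms  {share:.0%}")
```

Whatever sits at the top of that list is where the tuning effort goes first.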

Since website performance has been shown to directly influence customer retention, conversion, and user experience, overall website performance and optimization are key to your business success.

Sean Hull asks on Quora: What is website optimization and why is it important?

SQL – What is it and why is it important?

The What:

SQL is a difficult acronym for a difficult language, but what it does is shuttle information into and out of your database in an organized manner.  Your web applications and developers have to speak it, and your database, whether Oracle, MySQL, Postgres or some other, will return information using this computing dialect.

The Why:

Since every movement on your website, from page to page (sessions) and purchase to purchase, involves interaction using these queries, writing them well can have a huge impact on your website performance.  How big?  We’ve fixed queries by adding indexes or rewriting them and seen improvements of as much as 100x.  That’s converting pages that take ten seconds to ones that take 1/10 of a second.  Be especially vigilant about those queries generated by Object Relational Mappers like Active Record, the ORM layer in Ruby on Rails.
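To see what adding an index changes, you can ask the database how it plans to run a query before and after.  The sketch below uses SQLite through Python’s built-in sqlite3 module purely as a stand-in.  The orders table, its columns and the index name are hypothetical, and Oracle, MySQL and Postgres each have their own EXPLAIN syntax and output.

```python
import sqlite3

# Hypothetical query of the kind a web page might run on every request.
QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def build_demo_db():
    """Create an in-memory table with some made-up order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(1000)],
    )
    conn.commit()
    return conn


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last of the four columns in each row.
    return " | ".join(row[3] for row in rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print("before:", query_plan(conn, QUERY, (42,)))
    conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
    print("after: ", query_plan(conn, QUERY, (42,)))
```

Without the index the plan should report a scan of the whole table, and with it a search using the index.  That difference is where the large speedups come from on big tables.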

What is SQL on Quora

Open Source – What is it and why is it important?

Open Source is a term well understood by the technology set, but not well enough by everyone else.

Open Source for the software industry is like generic drugs for the pharmaceutical industry.  It enables more players to come to the table, and it is a huge driving force behind internet infrastructures, which are built on Linux, Apache and many other technologies.  It is the backbone of companies like Google, and facilitates cloud services from the likes of Amazon EC2, Joyent, Rackspace and many others.

It is the rising tide that lifts all boats, if you will.

Sean Hull’s writing on Quora.

Web Cornerstone – iHeavy Insights Newsletter 80

I’ve recently been doing a lot of Search Engine Optimization for my corporate website.  SEO makes your site more visible and keyword rich for the search engines, helping your customers find you more easily.  In that process it quickly becomes clear how important the website is to your business.  On the internet it is surely the cornerstone of your business, your virtual storefront.

Site Speed Is Key

Study after study from Google and others has shown that the speed and responsiveness of a website directly affect user experience and, in turn, use of your product or service.  It translates directly to the business bottom line.

Speed is accomplished by first measuring current speed and performance, then tuning and optimizing the various layers of technology which support it, then measuring again to determine the speedup.  You can then tie those changes to customer retention, higher click through rates, more time spent on the site, and higher conversions.  Apply a dollar value to your conversions and you can then estimate the direct value of that effort.

Interface to Customers

The image your company projects is formed first by your website.  Its usability and simplicity affect how customers feel about interacting with your products and services.  What’s more, the information, solutions, tools and downloads available also reflect directly on your business.

Analytics tools like Google’s or Yahoo’s allow you to track conversions.  These are important metrics that show when customers are taking action.  They are traditionally used to record a purchase on an ecommerce site, but can just as easily be put to useful work tracking bookmarks, form submissions, phone calls or callback requests, downloads, newsletter signups, likes and a lot more.  If you do use conversions for these diverse functions, be sure to assign each a dollar value, estimated from the percentage of those who convert who later become paying customers in one way or another.
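The dollar-value estimate above is simple arithmetic, sketched below.  All of the figures are invented for illustration: a 5% signup-to-customer rate, a $400 average customer value, and 120 signups in a month.

```python
def conversion_value(later_customer_rate, average_customer_value):
    """Estimated dollar value of a single conversion, such as a newsletter signup."""
    return later_customer_rate * average_customer_value


def period_value(conversions, later_customer_rate, average_customer_value):
    """Estimated dollar value of all conversions counted in a period."""
    return conversions * conversion_value(later_customer_rate, average_customer_value)


if __name__ == "__main__":
    # Hypothetical: 5% of signups later become customers worth $400 on average.
    per_signup = conversion_value(0.05, 400.0)
    print(f"each signup is worth about ${per_signup:.2f}")
    print(f"120 signups are worth about ${period_value(120, 0.05, 400.0):.2f}")
```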

It’s All About Operations

Your website should be backed by a solid and flexible content management system so you can make SEO an ongoing process, along with tuning and optimization.  WordPress, Drupal and Joomla are just a few.  Use caching at every layer, and optimize images for faster download.  Of course you’ll also need to apply best practices for disaster recovery, making sure your database and content are backed up regularly, as well as your server configuration components.

All of this boils down to web operations, that hidden support providing your internet technology foundation.

Book Review: The Art of SEO by Enge, Spencer, Fishkin & Stricchiola

Search Engine Optimization is for sure one part art, but it is certainly a lot of science too, and this book really covers all of the angles.  Even if you don’t plan to do the SEO yourself, it’s good to have a strong grounding in the material so you can make intelligent decisions.

Plan to do research and brainstorm on the main keywords your customers use, what your competitors do differently and tune your site and CMS to best fit those searches.  You’ll learn to optimize title tags, content and anchor text to be keyword rich, and most importantly of all plan a link building campaign to grow inbound links to your site and thus your authority and reputation on the internet.

Art of SEO is comprehensive, easy to follow & thorough, but also easy enough to dip into here and there for pointers if that’s all you need.  I also like O’Reilly’s layouts and font, so it’s very easy on the eyes.

Professional Deployments Use Puppet For Configuration Management

Puppet is a configuration management tool that can be used to great advantage managing the configurations of a large fleet of servers in an enterprise.

My first thought upon finishing Turnbull & McCune’s book was that it could well have been titled Pro Deployments, for it covers a whole host of topics, integrating Puppet with a lot of other related tools.

Some of the advanced topics it covers in depth include:

  • integrating Puppet with version control such as git
  • setup of the standard dev, test and production environments
  • conditional application of generalized configs
  • managing nagios & load balancer configs to automatically add new nodes
  • capitalizing on puppet forge modules (like rpm packages)
  • testing your puppet configs with cucumber
  • reporting with the dashboard and the command line

iHeavy Insights 79 – Plumbing the Interwebs

I meet new people all the time.  It’s a way of life in New York.  One of the first questions new people ask each other is “What do you do?”.  It begins to sound like a cliche after a while, but it can also provide endless fascinating discussions as there are so many people with different professions in New York.  Some choose a titled answer: “I’m an investment banker”, “I’m an emcee”, “I’m an executive recruiter”.  I find that answering “Web Scalability Consultant” or “Web Operations Expert” only draws confused looks.

A Plumber By Another Name

The solution of course is to tell a good story.  Stories illustrate what titles and crusty vernacular cannot.  I’ve used analogies to surgeons or mechanics, but they operate on something people can relate to, something right in front of them: people, or the vehicles we use every day.  With the internet, there is a huge hidden infrastructure that most people don’t see every day.  They may vaguely know it’s there, but it’s still hidden out of sight.

That’s why I think plumbing provides such an apt visual.  As it turns out the internet is built with countless data pipes both large and small, coming into your home or lying across the bottom of the Atlantic Ocean.  These pipes plug into routers, the high speed traffic lights and traffic cops.  Ultimately they feed into datacenters, huge rooms filled with racks of computers, holding your website’s crown jewels.  Therein lie the images and status updates from your Facebook profile, your banking transactions from your personal bank account or credit card, your Netflix movie stream, or the email you sent via Gmail.  Even your instant messaging stream and the data from your favorite iPhone app are stored and retrieved from here.

Amazon Outage

The recent Amazon outage has been high profile enough that a lot of folks who don’t follow the latest trends in web operations, devops, and datacenter automation still heard about this event.  Turns out it’s had a silver lining for Amazon, because now everyone is scrutinizing how many sites actually rely on this goliath of a hosting provider.

As it turns out the root of the Amazon outage was indeed a plumbing problem.  Amazon has shown rather high transparency, publishing intimate details of the problem and its resolution.  Read more.

A network misconfiguration cascaded through the system, creating countless failures.  If you imagine water repairs being done in a large New York City building, they often ask tenants to turn off their water, so the taps won’t all come on at the same time when service is restored.  Similarly intricate problems complicated the Amazon effort, slowing down attempts to restore everything after the incident.  I wrote at length about the outage if you’re interested, read more.

BOOK REVIEW:  Game-Based Marketing by Zichermann & Linder

There are so many new books coming out all the time, it’s tough to sift and find the good ones.  Anyone with a website as their storefront, whether they are a product company or a services company, can gain from reading this book.

From leaderboards to frequent flyer programs, badges and more, this book is full of real-world examples where game-based principles are put into action.  On the internet, where attention is a rarer and rarer commodity, these concepts will surely make a big difference to your business.

Amazon book link – Game Based Marketing

Amazon EC2 Outage – Failures, Lessons and Cloud Deployments

Now that we’ve had a chance to take a deep breath after last week’s AWS outage, I’ll offer some comments of my own.  Hopefully just enough time has passed to begin to have a broader view, and put events in perspective.

Despite what some reports may have announced, Amazon wasn’t down, but rather a small part of Amazon Web Services went down.  A failure, yes.  Beyond their service level agreement of 99.95%, yes also.  Survivable, yes to this last question too.

Learning From Failure

The business management conversation du jour is all about learning from failure, rather than trying to avoid it.  Harvard Business Review’s April issue headlined with “The Failure Issue – How to Understand It, Learn From It, and Recover From It”.  The Economist’s April 16th issue had some similarly interesting pieces, one by Schumpeter, “Fail often, fail well”, and another in the April 23rd issue, “Lessons from Deepwater Horizon and Fukushima”.

With all this talk of failure there is surely one takeaway.  Complex systems will fail, and it is in the anticipation of that failure that we gain the most.  Let’s stop howling and look at how to handle these situations intelligently.

How Do You Rebuild A Website?

In the cloud you will likely need two things: (a) scripts to rebuild all the components in your architecture (spin up servers, fetch source code, fetch software and configuration files, configure load balancers and mount your database) and, more importantly, (b) a database backup from which you can rebuild your current dataset.

Want to stick with EC2?  Build out your infrastructure in an alternate availability zone or region and you’re back up and running in hours.  Or better yet, have an alternate cloud provider on hand to handle these rare outages.  The choice is yours.
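The rebuild described above boils down to an ordered list of steps that can live in version control.  Here is a minimal sketch of that shape in plain Python.  Every step is a stub: the server names, region and backup file name are invented, and a real version would call your cloud provider’s API and your configuration tooling where the comments indicate.

```python
def require(ctx, key):
    """Fail loudly if an earlier step has not produced what this step needs."""
    if key not in ctx:
        raise RuntimeError(f"missing prerequisite: {key}")


def spin_up_servers(ctx):
    # Stub: a real script would ask the cloud provider for servers in ctx["region"].
    ctx["servers"] = ["web-1", "web-2", "db-1"]


def fetch_code_and_configs(ctx):
    require(ctx, "servers")
    # Stub: check out source code, software and configuration files here.
    ctx["deployed"] = True


def configure_load_balancer(ctx):
    require(ctx, "deployed")
    ctx["load_balancer"] = [s for s in ctx["servers"] if s.startswith("web-")]


def restore_database(ctx):
    require(ctx, "servers")
    # Stub: load the most recent tested backup onto the database server.
    ctx["database"] = "restored from " + ctx["backup"]


REBUILD_STEPS = [
    ("spin up servers", spin_up_servers),
    ("fetch code and configs", fetch_code_and_configs),
    ("configure load balancer", configure_load_balancer),
    ("restore database", restore_database),
]


def rebuild(region, backup, steps=REBUILD_STEPS):
    """Run each step in order and return the final state plus the steps completed."""
    ctx = {"region": region, "backup": backup}
    completed = []
    for name, action in steps:
        action(ctx)
        completed.append(name)
    return ctx, completed


if __name__ == "__main__":
    state, done = rebuild("alternate-region", "nightly-backup.sql.gz")
    print(done)
    print(state["database"])
```

Because the region is just a parameter, the same script can be pointed at an alternate availability zone, region or provider.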

Mitigate risk?  Yes indeed, failure is more common in the cloud, but recovery is also easier.  Failure should pressure the adoption of best practices and force discipline in deployments, not make you more of a gunslinger!

Want to see an extreme example of how this can play in your favor?  Read Jeff Atwood’s discussion of the so-called Chaos Monkey, a component whose sole job is to kill off servers in the Netflix environment at random.  Now that type of gunslinging will surely keep everyone on their toes!  Here’s a Wired article that discusses Chaos Monkey.

George Reese of enStratus discusses the recent failure at length.  Though I would argue with calling Amazon’s outage the Cloud’s Shining Moment, all of his points are wise, and this is the direction we should all be moving.

Going The Way of Commodity Hardware

Though it is still not obvious to everyone, I’ll spell it out loud and clear.  Like it or not, the cloud is coming.  Look at these numbers.

Furthermore, the recent outage also highlights how many internet sites rely on cloud computing and Amazon EC2, and how heavily.

Way back in 2001 I authored a book for O’Reilly called “Oracle and Open Source”.  In it I discussed the technologies I was seeing in the real world: Oracle on the backend and Linux, Apache, and PHP, Perl or some other language on the frontend.  These were the technologies that startups were using.  They were fast, cheap and, with the right smarts, reliable too.

Around that time Oracle started smelling the coffee and ported its enterprise database to Linux.  The equation for them was simple.  Customers that were previously paying tons of money to their good friend and confidant Sun for hardware could now spend 1/10th as much on hardware and shift a lot of that leftover cash to, you guessed it, Oracle!  The hardware wasn’t as good, but who cares, because you can get a lot more of it.

Despite a long entrenched and trusted brand like Sun being better and more reliable, guess what?  Folks still switched to commodity hardware.  Now this is so obvious, no one questions it.  But the same trend is happening with cloud computing.

Performance is variable, disk I/O can be iffy, and what’s more, the recent outage illustrates, front and center, that the servers and network can crash at any moment.  Who in their right mind would want to move to this platform?

If that’s the question you’re stuck on, you’re still stuck on the old model.  You have not truly comprehended the power to build infrastructure with code, to provision through automation, and to really embrace managing those components as software.  Just as the internet itself has the ability to route around political strife and network outages, so too does cloud computing bring that power to mom & pop web shops.

Conclusions

  • Have existing investments in hardware?  Slow and cautious adoption makes most sense for you.
  • Have seasonal traffic variations?  An application like this is uniquely suited to the cloud.  In fact some gaming applications, which can autoscale to 10x or 100x servers under load, are newly solvable with the advent of cloud computing.
  • Are you currently paying a lot for disaster recovery systems that primarily lie idle?  Script your infrastructure for rebuilding from bare metal, and save that part of the budget for more useful projects.
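For the seasonal-traffic case in the list above, the heart of autoscaling is a sizing rule like the one sketched here.  The per-server capacity and the floor and ceiling values are assumptions chosen for illustration.  Real autoscaling services add their own triggers, cooldown periods and health checks on top of a rule like this.

```python
import math


def desired_servers(requests_per_second, capacity_per_server, min_servers=2, max_servers=200):
    """How many servers the current load calls for, kept within a floor and a ceiling."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(requests_per_second / capacity_per_server)
    return max(min_servers, min(max_servers, needed))


if __name__ == "__main__":
    # Hypothetical: each server handles about 100 requests per second.
    for load in (150, 1500, 15000):
        print(load, "requests/sec ->", desired_servers(load, 100), "servers")
```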