
Open Source – What is it and why is it important?

Open Source is a term well understood by the technology set, but not well enough by everyone else.

Open Source is to the software industry what generic drugs are to the pharmaceutical industry.  It enables more players to come to the table, and it is a huge driving force behind internet infrastructure, which is built on Linux, Apache and many other technologies.  It is the backbone of companies like Google, and it facilitates cloud services from the likes of Amazon EC2, Joyent, Rackspace and many others.

It is the rising tide that lifts all boats, if you will.

Sean Hull’s writing on Quora.

Web Cornerstone – iHeavy Insights Newsletter 80

I’ve recently been doing a lot of Search Engine Optimization for my corporate website.  SEO makes your site more visible and keyword rich for the search engines, helping your customers find you more easily.  In that process it quickly becomes clear how important the website is to your business.  On the internet it is surely the cornerstone of your business, your virtual storefront.

Site Speed Is Key

Study after study from Google and others has shown that the speed and responsiveness of a website directly affect user experience, and in turn use of your product or service.  That translates directly to the business bottom line.

Improving speed starts with measuring current performance, then tuning and optimizing the various layers of technology that support the site, then measuring again to determine the speedup.  You can then tie those changes to customer retention, higher click-through rates, more time spent on the site, and higher conversions.  Apply a dollar value to your conversions and you can estimate the direct value of that effort.
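
The payoff of a tuning pass can be made concrete with a simple before-and-after measurement.  Here's a minimal sketch, with made-up timings:

```python
def speedup_percent(before_ms, after_ms):
    """Percentage improvement in response time between two measurements."""
    return (before_ms - after_ms) / before_ms * 100

# Hypothetical measurements: the page took 1200 ms before tuning, 900 ms after.
print(speedup_percent(1200, 900))  # 25.0
```

Tie that percentage to the conversion and retention numbers below and you have a dollar figure to show for the work.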

Interface to Customers

The image your company projects is formed first by your website.  The usability and simplicity affects how customers feel about interacting with your products and services.  What’s more the information, solutions, tools and downloads available also reflect directly on your business.

Analytics tools like Google’s or Yahoo’s allow you to track conversions.  These are important metrics that show when customers are taking action.  They are traditionally used to record purchases on an ecommerce site, but can just as easily be put to work tracking bookmarks, form submissions, phone calls or callback requests, downloads, newsletter signups, likes and a lot more.  If you do use conversions for these diverse functions, be sure to assign a dollar value, estimated from the percentage of those who convert and later become paying customers in one way or another.
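
That dollar-value estimate is simple arithmetic once you have guessed a conversion-to-customer rate.  A sketch with invented numbers:

```python
def conversion_dollar_value(conversions, customer_rate, customer_value):
    """Estimated value of a soft conversion such as a signup or download:
    how many converters later become customers, times what a customer is worth."""
    return conversions * customer_rate * customer_value

# Hypothetical: 400 newsletter signups, 5% eventually become customers worth $200.
print(conversion_dollar_value(400, 0.05, 200))  # 4000.0
```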

It’s All About Operations

Your website should be backed by a solid and flexible content management system so you can make SEO, tuning and optimization an ongoing process.  WordPress, Drupal and Joomla are just a few.  Use caching at every layer, and optimize images for faster download.  Of course you’ll also need to apply best practices for disaster recovery, making sure your database and content are backed up regularly, along with your server configuration components.

All of this boils down to web operations, that hidden support providing your internet technology foundation.

Book Review: The Art of SEO by Enge, Spencer, Fishkin & Stricchiola

Search Engine Optimization is for sure one part art, but it is certainly a lot of science too, and this book really covers all of the angles.  Even if you don’t plan to do the SEO yourself, it’s good to have a strong grounding in the material so you can make intelligent decisions.

Plan to do research and brainstorm on the main keywords your customers use, what your competitors do differently and tune your site and CMS to best fit those searches.  You’ll learn to optimize title tags, content and anchor text to be keyword rich, and most importantly of all plan a link building campaign to grow inbound links to your site and thus your authority and reputation on the internet.

Art of SEO is comprehensive, easy to follow & thorough, but also easy enough to dip into here and there for pointers if that’s all you need.  I also like O’Reilly’s layouts and font, so it’s very easy on the eyes.

Professional Deployments Use Puppet For Configuration Management

Puppet is a configuration management tool that can be used to great advantage managing the configurations of a large fleet of servers in an enterprise.

My first thought upon finishing Turnbull & McCune’s book was that it could well have been titled Pro Deployments, for it covers a whole host of topics, integrating Puppet with a lot of other related tools.

Some of the advanced topics it covers in depth include:

  • integrating Puppet with version control such as git
  • setup of the standard dev, test and production environments
  • conditional application of generalized configs
  • managing nagios & load balancer configs to automatically add new nodes
  • capitalizing on puppet forge modules (like rpm packages)
  • testing your puppet configs with cucumber
  • reporting with the dashboard and the command line

iHeavy Insights 79 – Plumbing the Interwebs

I meet new people all the time.  It’s a way of life in New York.  One of the first questions new people ask each other is “What do you do?”.  It begins to sound like a cliche after a while, but it can also spark endlessly fascinating discussions, as there are so many people with different professions in New York.  Some give a titled answer: “I’m an investment banker”, “I’m an emcee”, “I’m an executive recruiter”.  I find that “Web Scalability Consultant” or “Web Operations Expert” only draws confused looks.

A Plumber By Another Name

The solution of course is to tell a good story.  Stories illustrate what titles and crusty vernacular cannot.  I’ve used analogies to surgeons or mechanics; they operate on something people can relate to, right in front of them.  People or vehicles we use every day.  With the internet, though, there is a huge hidden infrastructure that most people never see.  They may vaguely know it’s there, but it’s still hidden out of sight.

That’s why I think plumbing provides such an apt visual.  As it turns out the internet is built with countless data pipes both large and small, coming into your home or lying across the bottom of the Atlantic Ocean.  These pipes plug into routers, the high-speed traffic lights and traffic cops of the network.  Ultimately they feed into datacenters, huge rooms filled with racks of computers holding your website’s crown jewels.  Here reside the images and status updates from your Facebook profile, the transactions from your personal bank account or credit card, your Netflix movie stream, and the email you sent via Gmail.  Even your instant messaging stream and the data from your favorite iPhone app are all stored and retrieved from here.

Amazon Outage

The recent Amazon outage has been high profile enough that a lot of folks who don’t follow the latest trends in web operations, devops, and datacenter automation still heard about this event.  It turns out it has had a silver lining for Amazon, because now everyone is scrutinizing how many sites actually rely on this goliath of a hosting provider.

As it turns out the root of the Amazon outage was indeed a plumbing problem.  Amazon has shown rather high transparency, publishing intimate details of the problem and its resolution.  Read more.

A misconfigured network cascaded through the system, creating countless failures.  If you imagine water repairs being done in a large New York City building, the workers often ask tenants to turn off their taps, so they won’t all come on at the same time when service is restored.  Similarly intricate problems complicated the Amazon effort, slowing down attempts to restore everything after the incident.  I wrote at length about the outage if you’re interested; read more.

BOOK REVIEW:  Game-Based Marketing by Zicherman & Linder

There are so many new books coming out all the time, it’s tough to sift and find the good ones.  Anyone with a website as their storefront, whether they are a product company or a services company, can gain from reading this book.

From leaderboards to frequent flyer programs, badges and more this book is full of real-world examples where game-based principles are put into action.  On the internet where attention is a rarer and rarer commodity, these concepts will surely make a big difference to your business.

Amazon book link – Game Based Marketing

Amazon EC2 Outage – Failures, Lessons and Cloud Deployments

Now that we’ve had a chance to take a deep breath after last week’s AWS outage, I’ll offer some comments of my own.  Hopefully just enough time has passed to begin to have a broader view, and put events in perspective.
Despite what some reports may have announced, Amazon wasn’t down; rather, a small part of Amazon Web Services went down.  A failure?  Yes.  Beyond their service level agreement of 99.95%?  Yes to that too.  Survivable?  Also yes.

Learning From Failure

The business management conversation du jour is all about learning from failure, rather than trying to avoid it.  Harvard Business Review’s April issue headlined with “The Failure Issue – How to Understand It, Learn From It, and Recover From It”.  The Economist’s April 16th issue had some similarly interesting pieces, one by Schumpeter, “Fail often, fail well”, and another in the April 23rd issue, “Lessons from Deepwater Horizon and Fukushima”.

With all this talk of failure there is surely one takeaway.  Complex systems will fail, and it is in the anticipation of that failure that we gain the most.  So let’s stop howling and look at how to handle these situations intelligently.

How Do You Rebuild A Website?

In the cloud you will likely need two things: (a) scripts to rebuild all the components in your architecture: spin up servers, fetch source code, fetch software and configuration files, configure load balancers and mount your database; and more importantly (b) a database backup from which you can rebuild your current dataset.
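
Those two pieces can be sketched as one ordered rebuild script.  Every step name below is a hypothetical placeholder for your own provisioning and restore tooling:

```python
def rebuild_site(steps):
    """Run rebuild steps in order, returning a log of what ran for auditing."""
    log = []
    for name, step in steps:
        step()
        log.append(name)
    return log

# (a) infrastructure scripts, then (b) restore the latest database backup.
# The lambdas are stand-in no-ops; real steps would call your cloud APIs.
steps = [
    ("spin up servers",            lambda: None),
    ("fetch source code & config", lambda: None),
    ("configure load balancer",    lambda: None),
    ("restore database backup",    lambda: None),
]
print(rebuild_site(steps))
```

The point is that the whole sequence lives in version-controlled code, so it can be rerun in a fresh availability zone at any time.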

Want to stick with EC2?  Build out your infrastructure in an alternate availability zone or region, and you’re back up and running in hours.  Better yet, have an alternate cloud provider on hand to handle these rare outages.  The choice is yours.

Mitigate risk?  Yes indeed failure is more common in the cloud, but recovery is also easier.  Failure should pressure the adoption of best practices and force discipline in deployments, not make you more of a gunslinger!

Want to see an extreme example of how this can play in your favor?  Read Jeff Atwood’s discussion of the so-called Chaos Monkey, a component whose sole job is to randomly kill off servers in the Netflix environment.  Now that type of gunslinging will surely keep everyone on their toes!  Here’s a Wired article that discusses Chaos Monkey.
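
The idea behind Chaos Monkey fits in a few lines.  This is not Netflix’s implementation, just a toy sketch of the principle: kill a random server and see whether the fleet survives:

```python
import random

def chaos_monkey(fleet, rng=random):
    """Terminate one randomly chosen server from the fleet (toy sketch)."""
    victim = rng.choice(sorted(fleet))
    fleet.remove(victim)
    return victim

fleet = {"web-1", "web-2", "web-3"}
victim = chaos_monkey(fleet)
# The remaining servers must keep serving traffic without the victim.
```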

George Reese of enStratus discusses the recent failure at length, arguing for calling Amazon’s outage the cloud’s shining moment.  All of his points are wise, and this is the direction we should all be moving.

Going The Way of Commodity Hardware

Though it is still not obvious to everyone, I’ll spell it out loud and clear.  Like it or not, the cloud is coming.  Look at these numbers.

Furthermore the recent outage also highlights how much, and how many, internet sites rely on cloud computing and Amazon EC2.

Way back in 2001 I authored an O’Reilly book called “Oracle and Open Source”.  In it I discussed the technologies I was seeing in the real world: Oracle on the backend and Linux, Apache, and PHP, Perl or some other language on the frontend.  These were the technologies that startups were using.  They were fast, cheap and, with the right smarts, reliable too.

Around that time Oracle started smelling the coffee and ported its enterprise database to Linux.  The equation for them was simple.  Customers that were previously paying tons of money to their good friend and confidant Sun for hardware could now spend 1/10th as much on hardware, and shift a lot of that leftover cash to (you guessed it) Oracle!  The hardware wasn’t as good, but who cares, because you can get a lot more of it.

Despite a long entrenched and trusted brand like Sun being better and more reliable, guess what?  Folks still switched to commodity hardware.  Now this is so obvious, no one questions it.  But the same trend is happening with cloud computing.

Performance is variable, disk I/O can be iffy, and what’s more the recent outage illustrates front and center, the servers and network can crash at any moment.  Who in their right mind would want to move to this platform?

If that’s the question you’re stuck on, you’re still stuck on the old model.  You have not truly comprehended the power to build infrastructure with code, to provision through automation, and to really embrace managing those components as software.  Just as the internet itself has the ability to route around political strife and network outages, so too does cloud computing bring that power to mom & pop web shops.

Conclusions

  • Have existing investments in hardware?  Slow and cautious adoption makes most sense for you.
  • Have seasonal traffic variations?  An application like this is uniquely suited to the cloud.  In fact some gaming applications, which can autoscale to 10x or 100x servers under load, are newly solvable with the advent of cloud computing.
  • Are you currently paying a lot for disaster recovery systems that primarily lie idle?  Script your infrastructure for rebuilding from bare metal, and save that part of the budget for more useful projects.

Cloud Computing Use Cases

Cloud Computing may not make sense for all application types.  But as with the adoption of commodity hardware and Linux over a decade ago, economic considerations will continue to pressure adoption.

This article is part of a multi-part series Intro to EC2 Cloud Deployments

What types of applications do fit well in the cloud?

  • Applications with Seasonal Traffic Patterns
  • Proof-of-Concept Applications
  • Quick Temporary Dev & Test Environments
  • CPU Intensive Applications
  • On-Demand or Unknown Future Demand

Seasonal Traffic Patterns

Web applications often show the following traffic patterns.  Traffic is steady for weeks or months, then experiences a spike in traffic.  That spike may be due to a launch of a new product or service, a new marketing or advertising campaign or sudden user interest.  Inevitably you’ll need more servers and compute power to handle that spike.  That is your peak capacity requirement.

With traditional servers you would need to buy enough servers or big enough ones to support that load or else suffer outages.  What’s more you’d have to plan in advance in order to have those servers online and integrated into the web infrastructure.

With Cloud Computing, you already have spinup scripts for your server types, and can bring additional compute power online with only a few commands.  Even better with AWS Autoscaling, you can define rules to have new servers spinup for you automatically!
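
The heart of such a rule is a simple threshold policy.  A simplified sketch in that spirit (the thresholds and bounds here are invented, not AWS defaults):

```python
def desired_capacity(current, cpu_pct, scale_up_at=70, scale_down_at=30,
                     min_servers=2, max_servers=20):
    """Threshold rule in the spirit of AWS Autoscaling: add a server when
    average CPU runs hot, remove one when it idles, within fixed bounds."""
    if cpu_pct > scale_up_at:
        return min(current + 1, max_servers)
    if cpu_pct < scale_down_at:
        return max(current - 1, min_servers)
    return current

print(desired_capacity(4, 85))  # 5: traffic spike, add a server
print(desired_capacity(4, 20))  # 3: quiet period, shed a server
```

The real service evaluates metrics like these for you on a schedule, which is exactly why it suits spiky seasonal traffic.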

Proof-of-Concept Applications

If you’re in the process of testing a new business idea or internet startup, you may not have the budget to order all sorts of heavy iron to support it.  Cloud Computing complements this type of requirement very nicely.  You need dev servers, voila they’re up and running.  Quickly and cheaply.  You may not know what you’ll need in six months or if your idea will take off, and don’t have to risk a big purchase.  Buy only what you need.

Dev and Test Environments

Another application type that really complements cloud computing well is dev and test environments.  You may want to clone your production servers, or bring on a temporary test environment with all of the same components as production.  But you don’t need that setup all of the time.  Just bring the servers online when you need them and stop them when you’re done testing.  You won’t get instance charges while the servers are stopped, but the server images will remain resident on your EBS snapshots!

CPU Intensive Applications

Server farms are used for all sorts of applications, such as SETI or the Human Genome Project.  These applications require legions of servers working together to churn through large amounts of data.  They are uniquely suited to cloud computing, as they are CPU-intensive.  Once you are done, you can easily decommission all of those servers.

Online gaming is another CPU intensive application.  As users access Facebook applications such as Farmville, it’s hard to know in advance what those demands will be from day-to-day.  Enabling a feature like AWS Autoscaling means the compute power does a lot of the capacity planning for you, responding dynamically to need. We wrote a piece on autoscaling MySQL databases.

On-Demand or Unknown Future Requirements

Any other types of applications that have on-demand needs, and for which you don’t know what the future will look like, match cloud computing well.  You avoid the up-front costs of buying a whole rack of servers, and keep servers offline when they’re not busy.

Hey you… made it this far?  Grab our newsletter – scalable startups.

iHeavy Insights 78 – Degrade Gracefully

Your recent social media campaign has gone viral.  It’s what you’ve been dreaming about, pinning your hopes on, and all of your hard work is now coming to fruition.  Tens of thousands of internet users, hordes of them in fact, are now descending on your website.  Only one problem: it went down!

That’s a situation you want to avoid.  Luckily there are some best practices for avoiding scenarios like the one I described.  In engineering it’s termed “degrade gracefully”: that is, continue functioning, but with the heaviest features disabled.

Browsing Only, But Still Functioning

One way to do this is for your site to have a browsing only mode.  On the database side you can still be functioning with a read-only database.  With a switch like that, your site will continue to function while pointed to any of your read-only replication slaves.  What’s more you can load balance across those easily, and keep your site up and running.
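
A browsing-only switch can be sketched as a tiny database router.  This is an illustrative sketch, not a real driver; the class and host names are invented:

```python
class DbRouter:
    """Route writes to the master and reads round-robin across replicas;
    in browsing-only mode, reject writes so the site stays up on
    read-only slaves (sketch)."""
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self.read_only = False
        self._next = 0

    def route(self, query):
        is_write = query.strip().split()[0].upper() in ("INSERT", "UPDATE", "DELETE")
        if is_write:
            if self.read_only:
                raise RuntimeError("site is in browsing-only mode")
            return self.master
        # load balance reads across the read-only slaves
        host = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return host

router = DbRouter("master-db", ["slave-1", "slave-2"])
router.read_only = True
print(router.route("SELECT * FROM products"))  # slave-1
```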

Decoupling

In software development, decoupling involves breaking apart components or pieces of an application that should not depend on one another.  One way to do this is to use a queuing system such as Amazon’s SQS to allow pieces of the application to queue up work to be done.  This makes those pieces asynchronous, i.e. they’ll return right away.  Another way is to expose services internal to your site through web services.  These individual components can then be scaled out as needed.  This makes them more highly available, and reduces the need to scale your memcache, webservers or database servers – the hardest pieces to scale.
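
That queuing pattern can be sketched with Python’s standard library standing in for SQS:

```python
import queue
import threading

work = queue.Queue()
done = []

def worker():
    # A background worker drains the queue, SQS-style.
    while True:
        job = work.get()
        if job is None:        # shutdown sentinel
            break
        done.append(f"processed {job}")

t = threading.Thread(target=worker)
t.start()

# The web-facing code only enqueues and returns immediately (asynchronous);
# the slow work happens out of band.
work.put("resize-image-42")
work.put(None)
t.join()
print(done)  # ['processed resize-image-42']
```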

Identify Features You Can Disable

Typically your application will have features that are more superfluous, or that are not part of the core functionality.  Perhaps you have star ratings, or some other components that are heavy.  Work with the development and operations teams to identify those areas of the application that are heaviest, and that would warrant disabling if the site hits heavy storms.

Once you’ve done all that, document how to disable and reenable those features, so other team members will be able to flip the switches if necessary.
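
Those switches are commonly implemented as feature flags.  A minimal sketch, with invented flag names:

```python
# Feature flags let operations disable heavy, non-core features under load.
FLAGS = {"star_ratings": True, "related_items": True}

def feature_enabled(name):
    return FLAGS.get(name, False)

def render_product_page():
    parts = ["description"]            # core functionality always renders
    if feature_enabled("star_ratings"):
        parts.append("star_ratings")
    if feature_enabled("related_items"):
        parts.append("related_items")
    return parts

FLAGS["star_ratings"] = False          # flip the switch when storms hit
print(render_product_page())  # ['description', 'related_items']
```

In a real deployment the flags would live in a shared store or config file so any team member can flip them without a code push.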


Cloud Computing – Disciplined Deployments

With traditional managed hosting solutions, we have best practices, we have business continuity plans, we have disaster recovery, we document our processes and all the moving parts in our infrastructure.  At least we pay lip service to these goals, though from time to time we admit to getting sidetracked by bigger fish to fry, high priorities and the emergency of the day.  We add “firedrill” to our todo list, promising we’ll test restoring our backups.  But many times we find it is only in the event of an emergency that we discover whether we actually have all the pieces backed up and can reassemble them properly.

** Original article — Intro to EC2 Cloud Deployments **

Cloud Computing is different.  These goals are no longer lofty ideals, but must be put into practice.  Here’s why.

  1. Virtual servers are not as reliable as physical servers
  2. Amazon EC2 has a lower SLA than many managed hosting providers
  3. Devops introduces a new paradigm: infrastructure scripts can be version controlled
  4. EC2 environment really demands scripting and repeatability
  5. New flexibility and peace of mind

Unreliable Servers

EC2 virtual servers can and will die.  Your spinup scripts and infrastructure should consider this possibility not as some far off anomalous event, but a day-to-day concern.  With proper scripts and testing of various scenarios, this should become manageable.  Use snapshots to backup EBS root volumes, and build spinup scripts with AMIs that have all the components your application requires.  Then test, test and test again.

Amazon EC2’s SLA – Only 99.95%

The computing industry throws the 99.999% or five-nines uptime SLA standard around a lot.  That amounts to less than six minutes of downtime per year.  Amazon’s 99.95% allows for 263 minutes of downtime per year; greater downtime merely gets you a credit on your account.  With that in mind, repeatable processes and scripts to bring your infrastructure back up in different availability zones or even different datacenters are a necessity.  Along with your infrastructure scripts, offsite backups also become a wise choice.  You should further take advantage of availability zones and regions to make your infrastructure more robust.  By using private IP addresses and networking, you can host a MySQL database slave in a separate zone, for instance.  You can also use GDLB, or Geographically Distributed Load Balancing, to send customers on the west coast to that zone, and those on the east coast to one closer to them.  In the event that one region or availability zone goes out, your application is still responding, though perhaps with slightly degraded performance.
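
The downtime arithmetic behind those figures is easy to check:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(sla_pct):
    """Minutes per year an uptime SLA permits the service to be down."""
    return MINUTES_PER_YEAR * (1 - sla_pct / 100)

print(round(allowed_downtime_minutes(99.999), 1))  # 5.3   (five nines)
print(round(allowed_downtime_minutes(99.95)))      # 263   (Amazon EC2)
```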

Devops – Infrastructure as Code

With traditional hosting, you either physically manage all of the components in your infrastructure, or have someone do it for you.  Either way a phone call is required to get things done.  With EC2, every piece of your infrastructure can be managed from code, so your infrastructure itself can be managed as software.  Whether you’re using waterfall method, or agile as your software development lifecycle, you have the new flexibility to place all of these scripts and configuration files in version control.  This raises manageability of your environment tremendously.  It also provides a type of ongoing documentation of all of the moving parts.  In a word, it forces you to deliver on all of those best practices you’ve been preaching over the years.

EC2 Environment Considerations

When servers get restarted they get new IP addresses – both private and public.  This may affect configuration files from webservers to mail servers, and database replication too, for example.  Your new server may mount an external EBS volume which contains your database.  If that’s the case your start scripts should check for that, and not start MySQL until it finds that volume.  To further complicate things, you may choose to use software raid over a handful of EBS volumes to get better performance.
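
The “don’t start MySQL until the volume is there” check can be sketched as a polling loop.  The mount check here is a hypothetical callable; a real start script might parse /proc/mounts instead:

```python
import time

def wait_for_volume(is_mounted, timeout_s=300, poll_s=5, sleep=time.sleep):
    """Block until the EBS data volume is mounted, then allow MySQL to start.
    `is_mounted` is a placeholder for a real mount check."""
    waited = 0
    while not is_mounted():
        if waited >= timeout_s:
            raise TimeoutError("EBS volume never appeared; not starting MySQL")
        sleep(poll_s)
        waited += poll_s
    return True

# Simulated check: the volume shows up on the third poll.
state = {"polls": 0}
def fake_check():
    state["polls"] += 1
    return state["polls"] >= 3

print(wait_for_volume(fake_check, sleep=lambda s: None))  # True
```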

The more special cases you have, the more you quickly realize how important it is to manage these things in software.  The more the process needs to be repeated, the more the scripts will save you time.

New Flexibility in the Cloud

Ultimately if you take into consideration less reliable virtual servers, and mitigate that with zones and regions, and automated scripts, you can then enjoy all the new benefits of the cloud.

  • autoscaling
  • easy test & dev environment setup
  • robust load & scalability testing
  • vertically scaling servers in place – in minutes!
  • pause a server – incurring only storage costs for days or months as you like
  • cheaper costs for applications with seasonal traffic patterns
  • no huge up-front costs

MySQL Cluster In The Cloud – Managers Guide

The term clustering is often used loosely in the context of enterprise databases.  In relation to MySQL in the cloud you can configure:

  1. Master-master active/passive
  2. Sharded MySQL Database
  3. NDB Cluster

Master-Master active/passive replication

Also sometimes known as circular replication, this is used for high availability.  You can perform operations on the inactive node (backups, ALTER TABLEs or other slow operations), then switch roles so the inactive node becomes active.  You would then perform the same operations on the former master.  Applications see “zero downtime” because they are always pointing at the active master database.  In addition the inactive master can be used as a read-only slave to run SELECT queries and large reporting queries.  This is quite powerful, as typical web applications tend to have 80% or more of their work performed by read-only queries such as browsing, viewing, and verifying data and information.

Sharded MySQL Database

This is similar to what in the Oracle world is called “application partitioning”.  In fact before Oracle 10 most Parallel Server and RAC installations required you to do this.  For example a user table might be sharded by putting names A-F on node A, G-L on node B and so forth.

You can also achieve this somewhat transparently with user_ids.  MySQL has an autoincrement column type to handle serving up unique ids.  It also has a cluster-friendly feature called auto_increment_increment.  So in an example where you had *TWO* nodes, all EVEN numbered IDs would be generated on node A and all ODD numbered IDs on node B.  The nodes would also be replicating changes to each other, yet avoiding collisions.
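
The even/odd scheme can be illustrated outside of MySQL: each node simply draws ids from its own arithmetic sequence.  A Python sketch of the idea, not the server’s actual implementation:

```python
from itertools import islice

def id_stream(offset, increment=2):
    """Mimic MySQL's auto_increment_offset / auto_increment_increment:
    each node hands out ids from its own arithmetic sequence."""
    n = offset
    while True:
        yield n
        n += increment

# Two masters, increment 2: node A takes the evens, node B the odds.
node_a = list(islice(id_stream(offset=2), 4))  # [2, 4, 6, 8]
node_b = list(islice(id_stream(offset=1), 4))  # [1, 3, 5, 7]
assert not set(node_a) & set(node_b)           # the sequences never collide
```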

Obviously all this has to be done with care, as the database is not otherwise preventing you from doing things that would break replication and your data integrity.

One further caution with sharding your database is that although it increases write throughput by horizontally scaling the master, it ultimately reduces availability.   An outage of any server in the cluster means at least a partial outage of the cluster itself.

NDB Cluster

This is actually a storage engine, and can be used in conjunction with InnoDB and MyISAM tables.  Normally you would use it sparingly for a few special tables, providing availability and read/write access to multiple masters.  This is decidedly *NOT* like Oracle RAC though many mistake it for that technology.

MySQL Clustering In The Cloud

The most common MySQL cluster configuration we see in the Amazon EC2 environment is by far the Master-Master configuration described above.  By itself it provides higher availability of the master node, plus a single read-only node with which you can horizontally scale your application’s queries.  What’s more, you can add additional read-only slaves to this setup, allowing you to scale out tremendously.