
Professional Deployments Use Puppet For Configuration Management

Puppet is a configuration management tool that can be used to great advantage managing the configurations of a large fleet of servers in an enterprise.

My first thought upon finishing Turnbull & McCune’s book was that it could well have been titled Pro Deployments, for it covers a whole host of topics, integrating Puppet with a lot of other related tools.

Some of the advanced topics it covers in depth include:

  • integrating Puppet with version control such as git
  • setup of the standard dev, test and production environments
  • conditional application of generalized configs
  • managing Nagios & load balancer configs to automatically add new nodes
  • capitalizing on Puppet Forge modules (like rpm packages)
  • testing your Puppet configs with Cucumber
  • reporting with the dashboard and the command line

iHeavy Insights 79 – Plumbing the Interwebs

I meet new people all the time.  It’s a way of life in New York.  One of the first questions new people ask each other is “What do you do?”.  It begins to sound like a cliche after a while, but it can also provide endless fascinating discussions as there are so many people with different professions in New York.  Some choose a titled answer: “I’m an investment banker”, “I’m an emcee”, “I’m an executive recruiter”.  I find that answering “Web Scalability Consultant” or “Web Operations Expert” only earns confused looks.

A Plumber By Another Name

The solution of course is to tell a good story.  Stories illustrate what titles and crusty vernacular cannot.  I’ve used analogies to surgeons or mechanics, since they operate on something people can relate to and see in front of them: people, or the vehicles we use every day.  With the internet, though, there is a huge hidden infrastructure that most people don’t see every day.  They may vaguely know it’s there, but it’s still hidden out of sight.

That’s why I think plumbing provides such an apt visual.  As it turns out the internet is built with countless data pipes both large and small, coming into your home or lying across the bottom of the Atlantic Ocean.  These pipes plug into routers, which act as high speed traffic lights and traffic cops.  Ultimately they feed into datacenters, huge rooms filled with racks of computers, holding your website’s crown jewels.  Therein lie the images and status updates from your Facebook profile, your banking transactions from your personal bank account or credit card, your Netflix movie stream, or the email you sent via Gmail.  Even your instant messaging stream and the data from your favorite iPhone app are stored and retrieved from here.

Amazon Outage

The recent Amazon outage has been high profile enough that a lot of folks who don’t follow the latest trends in web operations, devops, and datacenter automation still heard about this event.  Turns out it’s had a silver lining for Amazon because now everyone is scrutinizing how many sites actually rely on this goliath of a hosting provider.

As it turns out the root of the Amazon outage was indeed a plumbing problem.  Amazon has shown rather high transparency, publishing intimate details of the problem and its resolution.

A misconfigured network cascaded through the system creating countless failures.  If you imagine water repairs being done in a large New York City building, they often ask tenants to turn off their water, so the taps won’t all come on at the same time when service is restored.  Similarly intricate problems complicated the Amazon effort, slowing down attempts to restore everything after the incident.  I wrote at length about the outage, if you’re interested in reading more.

BOOK REVIEW:  Game-Based Marketing by Zichermann & Linder

There are so many new books coming out all the time, it’s tough to sift and find the good ones.  Anyone with a website as their storefront, whether they are a product company or a services company, can gain from reading this book.

From leaderboards to frequent flyer programs, badges and more this book is full of real-world examples where game-based principles are put into action.  On the internet where attention is a rarer and rarer commodity, these concepts will surely make a big difference to your business.

Amazon book link – Game Based Marketing

Amazon EC2 Outage – Failures, Lessons and Cloud Deployments

Now that we’ve had a chance to take a deep breath after last week’s AWS outage, I’ll offer some comments of my own.  Hopefully just enough time has passed to begin to have a broader view, and put events in perspective.

Despite what some reports may have announced, Amazon wasn’t down; rather, a small part of Amazon Web Services went down.  Was it a failure?  Yes.  Did it exceed the downtime allowed by their 99.95% service level agreement?  Yes again.  Was it survivable?  Yes to that too.

Learning From Failure

The business management conversation du jour is all about learning from failure, rather than trying to avoid it.  Harvard Business Review’s April issue headlined with “The Failure Issue – How to Understand It, Learn From It, and Recover From It”.  The Economist’s April 16th issue had some similarly interesting pieces, one by Schumpeter, “Fail often, fail well”, and another in the April 23rd issue, “Lessons from Deepwater Horizon and Fukushima”.

With all this talk of failure there is surely one takeaway.  Complex systems will fail, and it is in the anticipation of that failure that we gain the most.  Let’s stop howling and look at how to handle these situations intelligently.

How Do You Rebuild A Website?

In the cloud you will likely need two things: (a) scripts to rebuild all the components in your architecture (spin up servers, fetch source code, fetch software and configuration files, configure load balancers and mount your database) and, more importantly, (b) a database backup from which you can rebuild your current dataset.

Want to stick with EC2?  Build out your infrastructure in an alternate availability zone or region and you’re back up and running in hours.  Or better yet, have an alternate cloud provider on hand to handle these rare outages.  The choice is yours.

Mitigate risk?  Yes indeed failure is more common in the cloud, but recovery is also easier.  Failure should pressure the adoption of best practices and force discipline in deployments, not make you more of a gunslinger!

Want to see an extreme example of how this can play in your favor?  Read Jeff Atwood’s discussion of the so-called Chaos Monkey, a component whose sole job is to kill off servers in the Netflix environment at random.  Now that type of gunslinging will surely keep everyone on their toes!  Here’s a Wired article that discusses Chaos Monkey.

George Reese of enStratus discusses the recent failure at length.  Though I would argue with calling Amazon’s outage the Cloud’s Shining Moment, all of his points are wise and this is the direction we should all be moving.

Going The Way of Commodity Hardware

Though it is still not obvious to everyone, I’ll spell it out loud and clear.  Like it or not, the cloud is coming.  Look at these numbers.

Furthermore the recent outage also highlights how many internet sites rely on cloud computing and Amazon EC2, and how heavily.

Way back in 2001 I authored a book for O’Reilly called “Oracle and Open Source”.  In it I discussed the technologies I was seeing in the real world: Oracle on the backend and Linux, Apache, and PHP, Perl or some other language on the frontend.  These were the technologies that startups were using.  They were fast, cheap and, with the right smarts, reliable too.

Around that time Oracle started smelling the coffee and ported its enterprise database to Linux.  The equation for them was simple.  Customers that were previously paying tons of money to their good friend and confidant Sun for hardware could now spend 1/10th as much on hardware and shift a lot of that leftover cash to – you guessed it – Oracle!  The hardware wasn’t as good, but who cares because you can get a lot more of it.

Despite a long entrenched and trusted brand like Sun being better and more reliable, guess what?  Folks still switched to commodity hardware.  Now this is so obvious, no one questions it.  But the same trend is happening with cloud computing.

Performance is variable, disk I/O can be iffy, and what’s more the recent outage illustrates front and center, the servers and network can crash at any moment.  Who in their right mind would want to move to this platform?

If that’s the question you’re stuck on, you’re still stuck on the old model.  You have not truly comprehended the power to build infrastructure with code, to provision through automation, and really embrace managing those components as software.  Just as the internet itself has the ability to route around political strife and network outages, so too does cloud computing bring that power to mom & pop web shops.

Conclusions

  • Have existing investments in hardware?  Slow and cautious adoption makes most sense for you.
  • Have seasonal traffic variations?  An application like this is uniquely suited to the cloud.  In fact some gaming applications, which must autoscale to 10x or 100x servers under load, are newly solvable with the advent of cloud computing.
  • Are you currently paying a lot for disaster recovery systems that primarily lie idle?  Script your infrastructure for rebuilding from bare metal, and save that part of the budget for more useful projects.

Cloud Computing Use Cases

Cloud Computing may not make sense for all application types.  But as with the adoption of commodity hardware and Linux over a decade ago, economic considerations will continue to pressure adoption.

This article is part of a multi-part series Intro to EC2 Cloud Deployments

What types of applications do fit well in the cloud?

o Applications with Seasonal Traffic Patterns
o Proof-of-concept Applications
o Quick Temporary Dev & Test Environments
o CPU Intensive Applications
o On-Demand or Unknown Future Demand

Seasonal Traffic Patterns

Web applications often show the following traffic pattern.  Traffic is steady for weeks or months, then suddenly spikes.  That spike may be due to a launch of a new product or service, a new marketing or advertising campaign or sudden user interest.  Inevitably you’ll need more servers and compute power to handle that spike.  That is your peak capacity requirement.

With traditional servers you would need to buy enough servers or big enough ones to support that load or else suffer outages.  What’s more you’d have to plan in advance in order to have those servers online and integrated into the web infrastructure.

With Cloud Computing, you already have spinup scripts for your server types, and can bring additional compute power online with only a few commands.  Even better, with AWS Autoscaling you can define rules to have new servers spin up for you automatically!
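To make the idea of a scaling rule concrete, here is a minimal sketch in Python.  It is not the AWS Autoscaling API; the function name, the CPU thresholds and the fleet limits are all assumptions chosen for illustration.  It only shows the kind of decision such a rule encodes: add a server when the fleet is busy, remove one when it is idle, and never go outside the bounds you set.

```python
def desired_servers(current, avg_cpu, low=30.0, high=70.0, minimum=2, maximum=20):
    """Return the server count a simple threshold rule would ask for.

    All thresholds and limits here are hypothetical defaults.
    """
    if avg_cpu > high:
        target = current + 1   # scale out one server at a time
    elif avg_cpu < low:
        target = current - 1   # scale in when the fleet is mostly idle
    else:
        target = current       # inside the comfortable band: do nothing
    # Clamp to the fleet limits so a runaway metric cannot run up the bill.
    return max(minimum, min(maximum, target))
```

The upper limit matters as much as the thresholds: it is the ceiling on what a traffic spike can cost you.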

Proof-of-Concept Applications

If you’re in the process of testing a new business idea or internet startup, you may not have the budget to order all sorts of heavy iron to support it.  Cloud Computing complements this type of requirement very nicely.  You need dev servers, voila they’re up and running.  Quickly and cheaply.  You may not know what you’ll need in six months or if your idea will take off, and don’t have to risk a big purchase.  Buy only what you need.

Dev and Test Environments

Another application type that really complements cloud computing well is dev and test environments.  You may want to clone your production servers, or bring on a temporary test environment with all of the same components as production.  But you don’t need that setup all of the time.  Just bring the servers online when you need them and stop them when you’re done testing.  You won’t get instance charges while the servers are stopped, and the server images remain on their EBS volumes, incurring only storage costs!

CPU Intensive Applications

Server farms are used for all sorts of applications such as SETI or the Human Genome Project.  These applications require legions of servers working together to churn through large amounts of data.  They are uniquely suited to cloud computing, as they are CPU-intensive.  Once you are done, you can easily decommission all of those servers.

Online gaming is another CPU-intensive application.  As users access Facebook applications such as Farmville, it’s hard to know in advance what those demands will be from day to day.  Enabling a feature like AWS Autoscaling means the platform does a lot of the capacity planning for you, responding dynamically to need.  We wrote a piece on autoscaling MySQL databases.

On-Demand or Unknown Future Requirements

Any other types of applications that have on-demand needs, and for which you don’t know what the future will look like, match cloud computing well.  You avoid the up-front costs of buying a whole rack of servers, and keep servers offline when they’re not busy.


Cloud Computing – Disciplined Deployments

With traditional managed hosting solutions, we have best practices, we have business continuity plans, we have disaster recovery, we document our processes and all the moving parts in our infrastructure.  At least we pay lip service to these goals, though from time to time we admit to getting side tracked with bigger fish to fry, high priorities and the emergency of the day.  We add “firedrill” to our todo list, promising we’ll test restoring our backups.  But many times we find it is in the event of an emergency that we are forced to find out if we actually have all the pieces backed up and can reassemble them properly.

** Original article — Intro to EC2 Cloud Deployments **

Cloud Computing is different.  These goals can no longer be lofty ideals, but must be put into practice.  Here’s why.

  1. Virtual servers are not as reliable as physical servers
  2. Amazon EC2 has a lower SLA than many managed hosting providers
  3. Devops introduces a new paradigm: infrastructure scripts can be version controlled
  4. EC2 environment really demands scripting and repeatability
  5. New flexibility and peace of mind

Unreliable Servers

EC2 virtual servers can and will die.  Your spinup scripts and infrastructure should consider this possibility not as some far off anomalous event, but a day-to-day concern.  With proper scripts and testing of various scenarios, this should become manageable.  Use snapshots to backup EBS root volumes, and build spinup scripts with AMIs that have all the components your application requires.  Then test, test and test again.

Amazon EC2’s SLA – Only 99.95%

The computing industry throws the 99.999% or five-nines uptime SLA standard around a lot.  That amounts to less than six minutes of downtime per year.  Amazon’s 99.95% allows for 263 minutes of downtime per year.  Greater downtime merely gets you a credit on your account.  With that in mind, repeatable processes and scripts to bring your infrastructure back up in different availability zones or even different datacenters are a necessity.  Along with your infrastructure scripts, offsite backups also become a wise choice.

You should further take advantage of availability zones and regions to make your infrastructure more robust.  By using private IP addresses and networking, you can host a MySQL database slave in a separate zone, for instance.  You can also do GDLB or Geographically Distributed Load Balancing to send customers on the west coast to that zone, and those on the east coast to one closer to them.  In the event that one region or availability zone goes out, your application is still responding, though perhaps with slightly degraded performance.
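The downtime figures above follow directly from the SLA percentages.  As a quick check, this short Python sketch converts an uptime percentage into allowed minutes of downtime per year; it assumes a 365-day year, and the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def allowed_downtime_minutes(sla_percent):
    """Minutes per year a service may be down and still meet its SLA."""
    return MINUTES_PER_YEAR * (100.0 - sla_percent) / 100.0


for sla in (99.999, 99.95):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per year")
```

Five nines works out to roughly 5.3 minutes a year, while 99.95% works out to about 262.8 minutes, or a little under four and a half hours.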

Devops – Infrastructure as Code

With traditional hosting, you either physically manage all of the components in your infrastructure, or have someone do it for you.  Either way a phone call is required to get things done.  With EC2, every piece of your infrastructure can be managed from code, so your infrastructure itself can be managed as software.  Whether you’re using waterfall method, or agile as your software development lifecycle, you have the new flexibility to place all of these scripts and configuration files in version control.  This raises manageability of your environment tremendously.  It also provides a type of ongoing documentation of all of the moving parts.  In a word, it forces you to deliver on all of those best practices you’ve been preaching over the years.

EC2 Environment Considerations

When servers get restarted they get new IP addresses – both private and public.  This may affect configuration files from webservers to mail servers, and database replication too, for example.  Your new server may mount an external EBS volume which contains your database.  If that’s the case your start scripts should check for that, and not start MySQL until they find that volume.  To further complicate things, you may choose to use software RAID over a handful of EBS volumes to get better performance.
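As a rough illustration of the start-script check described above, here is a Python sketch that waits for the database volume to be mounted before starting MySQL.  The mount point `/data/mysql`, the timeout values and the `service mysql start` command are assumptions; substitute whatever your own servers use.

```python
import os
import subprocess
import time


def wait_for_mount(path, timeout=300, interval=5,
                   is_mounted=os.path.ismount, sleep=time.sleep):
    """Poll until `path` is a mount point; return True if it appeared in time."""
    waited = 0
    while True:
        if is_mounted(path):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def start_mysql_when_ready(data_mount="/data/mysql"):
    # Hypothetical mount point and service command; adjust for your setup.
    if not wait_for_mount(data_mount):
        raise RuntimeError(f"{data_mount} never mounted; refusing to start MySQL")
    subprocess.run(["service", "mysql", "start"], check=True)
```

Refusing to start is the important part: a MySQL that comes up against an empty directory on the root disk is worse than one that stays down.  The `is_mounted` and `sleep` parameters exist only so the polling logic can be exercised in a test without a real volume.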

The more special cases you have, the more you quickly realize how important it is to manage these things in software.  The more the process needs to be repeated, the more the scripts will save you time.

New Flexibility in the Cloud

Ultimately if you take into consideration less reliable virtual servers, and mitigate that with zones and regions, and automated scripts, you can then enjoy all the new benefits of the cloud.

  • autoscaling
  • easy test & dev environment setup
  • robust load & scalability testing
  • vertically scaling servers in place – in minutes!
  • pause a server – incurring only storage costs for days or months as you like
  • cheaper costs for applications with seasonal traffic patterns
  • no huge up-front costs

Migrating MySQL to Oracle Guide

Also find Sean Hull’s ramblings on Twitter @hullsean.

Migrating from MySQL to Oracle can be as complex as picking up your life and moving from the country to the city.  Things in the MySQL world are often just done differently than they are in the Oracle world.  Our guide will give you a bird’s-eye view of the differences to help you determine what is the right path for you.

** See also: Oracle to MySQL Migration Considerations **

MySQL comes from a more open-source or DIY background.  It is a world of Unix and Linux administrators, and even developers, carrying the responsibilities of a DBA.

  1. Installation & Administration Considerations
  2. Query and Optimizer Differences
  3. Security Strengths and Weaknesses
  4. Replication & High Availability
  5. Table Types & Storage Engines
  6. Applications, Connection Pooling, Stored Procedures and More
  7. Backups & Disaster Recovery
  8. Community – MySQL & Oracle Differences
  9. TCO, Licensing, and Cloud Considerations
  10. Advanced Oracle Features – Missing in MySQL

Check back soon as we update each of these sections.

Oracle to MySQL Migration Considerations

There are a lot of forms of transportation, from walking to bike riding, motorcycles and cars to buses, trains and airplanes.  Each mode of transport will get you from point A to point B, but one may be faster or more comfortable, and another more cost effective.  It’s important to keep in mind when comparing databases like Oracle and MySQL that there are indeed a lot of feature differences, a lot of cultural differences, and a lot of cost differences.  There are also a lot of impassioned people on both sides arguing about the tomfoolery of the other.  Hopefully we can dispel some of the myths and discuss the topic fairly.

** See also: Migrating MySQL to Oracle Guide **

As a long time Oracle DBA turned MySQL expert, I’ve spent time with clients running both database engines and many migrating from one to the other.  I can speak to many of the differences between the two environments.  I’ll cover the following:

  1. Query & Optimizer Limitations
  2. Security Differences
  3. Replication & HA Are Done Differently
  4. Installation & Administration Simplicity
  5. Watch Out – Triggers, Stored Procedures, Materialized Views & Snapshots
  6. Huge Community Support – Open-source Add-ons
  7. Enter The Cloud With MySQL
  8. Backup and Recovery
  9. Miscellaneous Considerations

Check back again as we edit and publish the various sections above.

How To Build Highly Scalable Web Applications For The Cloud

Scalability in the cloud depends a lot on application design.  Keep these important points in mind when you are designing your web application and you will scale much more naturally and easily in the cloud.

** Original article — Intro to EC2 Cloud Deployments **

1. Think twice before sharding

  • it increases your infrastructure and application complexity
  • it reduces availability – more servers mean more outages
  • you have to worry about globally unique primary keys
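To show what the primary key concern involves, here is one possible scheme sketched in Python: pack a shard number into the low bits of each id, so that ids generated independently on different shards can never collide.  The bit width and function names are assumptions for illustration, and this is only one of several common approaches (UUIDs or a central id service are others).

```python
SHARD_BITS = 10  # hypothetical: leaves room for 1,024 shards


def make_global_id(shard_id, local_id):
    """Combine a shard number and that shard's local sequence value."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    return (local_id << SHARD_BITS) | shard_id


def split_global_id(global_id):
    """Recover (shard_id, local_id) so the application can route a lookup."""
    return global_id & ((1 << SHARD_BITS) - 1), global_id >> SHARD_BITS
```

Even this small scheme has to be understood by every piece of code that creates or looks up a row, which is exactly the added complexity the first bullet warns about.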

2. Bake read/write database access into the application

  • allows you to check for stale data, fall back to the write master
  • creates higher availability for read-only data
  • gracefully degrade to read-only website functionality if master goes down
  • horizontal scalability melds nicely with cloud infrastructure and IAAS
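A minimal sketch of what baking read/write access into the application can look like, in Python.  The class and its parameters are hypothetical.  Each replica is paired with a function that reports its replication lag in seconds, or None if it is unreachable; with MySQL that number might come from `Seconds_Behind_Master` in `SHOW SLAVE STATUS`.  Degrading to a read-only site when the master is down is not covered here.

```python
import random


class ReadWriteRouter:
    """Pick a database per query: master for writes, a fresh replica for reads."""

    def __init__(self, master, replicas, max_lag_seconds=5):
        self.master = master      # name or connection for the write master
        self.replicas = replicas  # dict: replica -> callable returning lag or None
        self.max_lag_seconds = max_lag_seconds

    def for_write(self):
        return self.master

    def for_read(self):
        fresh = []
        for replica, lag_fn in self.replicas.items():
            lag = lag_fn()
            if lag is not None and lag <= self.max_lag_seconds:
                fresh.append(replica)
        # Fall back to the master when every replica is stale or unreachable.
        return random.choice(fresh) if fresh else self.master
```

Because the application asks the router for a connection on every query, adding another read replica becomes a configuration change rather than a code change.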

3. Save application state in the database

  • avoid in-memory locking structures that won’t scale with multiple web application servers
  • consider a database field for managing application locks
  • consider stored procedures for isolating and insulating developers from db particulars
  • a last updated timestamp field can be your friend

4. Consider Dynamic or Auto-scaling

  • great feature of the cloud, spin up new servers to handle load on-demand
  • lean towards being proactive rather than reactive and measure growth and trends
  • watch the procurement process closely lest it come back to bite you

5. Set Up Monitoring and Metrics

  • see trends over time
  • spot application trouble and bottlenecks
  • determine if your tuning efforts are paying off
  • review a traffic spike after the fact

The cloud is not a silver bullet that can automatically scale any web application.  Software design is still a crucial factor.  Bake in these features with the right flexibility and foresight, and you’ll manage your website’s growth patterns with ease.


Introduction to EC2 Cloud Deployments

Cloud Computing holds a lot of promise, but there are also a lot of speed bumps in the road along the way.

In this six part series we’re going to cover a lot of ground.  We don’t intend this series to be an overly technical nuts and bolts howto.  Rather we will discuss high level issues and answer questions that come up for CTOs, business managers, and startup CEOs.

Some of the tantalizing issues we’ll address include:

  • How do I make sure my application is built for the cloud with scalability baked into the architecture?
  • I know disk performance is crucial for my database tier.  How do I get the best disk performance with Amazon Web Services & EC2?
  • How do I keep my AWS passwords, keys & certificates secure?
  • Should I be doing offsite backups as well, or are snapshots enough?
  • Cloud providers such as Amazon seem to have poor SLAs (service level agreements).  How do I mitigate this using availability zones & regions?
  • Cloud hosting environments like Amazon’s provide no perimeter security.  How do I use security groups to ensure my setup is robust and bulletproof?
  • Cloud deployments change the entire procurement process, handing a lot of control over to the web operations team.  How do I ensure that finance and ops are working together, and a ceiling budget is set and implemented?
  • Reliability of Amazon EC2 servers is much lower than traditional hosted servers.  Failure is inevitable.  How do we use this fact to our advantage, forcing discipline in the deployment and disaster recovery processes?  How do I make sure my processes are scripted & firedrill tested?
  • Snapshot backups and other data stored in S3 are somewhat less secure than I’d like.  Should I use encryption to protect this data?  When and where should I use encrypted filesystems to protect my more sensitive data?
  • How can I best use availability zones and regions to geographically disperse my data and increase availability?

As we publish each of the individual articles in this series we’ll link them to the titles below.  So check back soon!

  • Building Highly Scalable Web Applications for the Cloud
  • Managing Security in Amazon Web Services
  • MySQL Databases in the Cloud – Best Practices
  • Backup and Recovery in the Cloud – A Checklist
  • Cloud Deployments – Disciplined Infrastructure
  • Cloud Computing Use Cases