In search of a good book on Chef itself, I picked up this new title from O’Reilly. It’s one of their new-format books: small, only 75 pages.
There was some very good material in this book. Mr. Nelson-Smith writes in a readable, informative style. The discussion of the risks of infrastructure as code was instructive. With the advent of APIs for building out virtual data centers, the idea of automating every aspect of systems administration, and of building infrastructure itself as code, is a new one, so an honest discussion of the risks of such an approach is bold and much needed. I also liked the introduction to Chef itself and the discussion of installation.
Unfortunately, Chef isn’t really the main focus of this book. The book spends a lot of time introducing us to Agile development, and specifically test-driven development. These are worthy goals, and this is the first time I’ve seen the topic treated in relation to provisioning cloud infrastructure, but I felt too much time was spent on it.
Amazon Web Services is a division of Amazon the bookseller, but this part of the business is devoted solely to infrastructure and internet servers. These are the building blocks of data centers, the workhorses of the internet. AWS’s cloud computing offering allows a business to set up, or “spin up” in cloud computing jargon, new compute resources at will. Need a small single-CPU, 32-bit Ubuntu server with two 20 GB disks attached? One command and 30 seconds later, you can have it!
As we discussed previously, infrastructure provisioning has evolved dramatically over the past fifteen years, from something that took time and cost a lot to the fast, automatic process it is today with cloud computing. This has also brought a dramatic culture shift in the way systems administration is done: from a fairly manual process of physical machines and software configuration, one that took weeks to set up new services, to a scriptable, automatable process that takes seconds.
This new realm of cloud infrastructure and provisioning is called Infrastructure as a Service, or IaaS, and Amazon Web Services is one of the largest providers of such compute resources. They’re not the only ones, of course. Others include:
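To make that “one command” concrete, here is a minimal Python sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and AWS credentials and a region are configured. The AMI ID is a placeholder (the image you choose determines the OS and architecture), and `t2.micro`, the device names `/dev/sdf` and `/dev/sdg`, and the helper names `build_request` and `launch` are illustrative assumptions, not anything from the original post.

```python
"""Sketch: request one small server with two 20 GB data disks from EC2.

Assumes boto3 is installed and AWS credentials/region are configured.
The AMI ID is a placeholder; choose a real Ubuntu image for your region.
"""


def build_request(ami_id: str, instance_type: str = "t2.micro") -> dict:
    """Build keyword arguments for EC2's RunInstances call (hypothetical helper)."""
    # Two extra 20 GiB EBS volumes, attached at assumed device names.
    extra_disks = [
        {"DeviceName": name, "Ebs": {"VolumeSize": 20, "DeleteOnTermination": True}}
        for name in ("/dev/sdf", "/dev/sdg")
    ]
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": extra_disks,
    }


def launch(ami_id: str) -> str:
    """Launch the instance and return its ID. This call incurs real charges."""
    import boto3  # imported here so build_request works without boto3 installed

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**build_request(ami_id))
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # Only prints the request; call launch() with a real AMI ID to actually spin up.
    print(build_request("ami-xxxxxxxx"))
```

Separating the request from the API call keeps the “what” (the server description) reviewable and version-controllable, which is the same idea the infrastructure-as-code discussion above is getting at.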
- Rackspace Cloud
Cloud computing is still in its infancy, but it is growing quickly. Amazon themselves had a major data center outage in April, which we discussed in detail. It sent some hot internet startups into a tailspin!
In the old days…
You would have a closet in your startup company with a rack of computers. Provisioning involved:
- Decide on your architectural direction: what, where and how
- Order the new hardware
- Wait weeks for the packages to arrive
- Set up the hardware, wire things together, power up
- Discover that some component is missing or has failed, and order a replacement
- Wait longer…
- Finally get all the pieces set up
- Configure software components and go
Then some industrious folks realized that power and data connections to your own office weren’t reliable, so datacenters sprang up. With datacenters, most of the steps above didn’t change, except that once the hardware arrived you had to send your engineers out to the datacenter to set it up. Trips back and forth ate up time and energy.
Then along came managed hosting. Managed hosting saved companies a lot of headaches, wasted man-hours and other resources. It let your company spend more of its effort on what it does well, running the business, and less on managing hardware and infrastructure. Provisioning now became:
- Decide on architecture direction
- Call the hosting provider and talk to a salesperson
- Wait a day or two
- Set up & configure software components and go
Obviously this new state of affairs improved infrastructure provisioning dramatically. It simplified the process and sped it up as well. What’s more, a managed hosting provider could keep spare parts and standard components on hand in much greater volume than a small firm could. That’s a big plus. This evolution continued because it was a win-win for everyone. The only downside came when engineers made mistakes and the finger-pointing began. Despite all that, a managed hosting provider that does nothing but hosting can do it better and more reliably than you can yourself.
So where are we today? We are all either doing cloud provisioning of infrastructure or looking into it. What is cloud provisioning? It is a complete paradigm shift, but along the same trajectory described above. It removes all the waiting. No waiting for the sales team or the ordering process; that’s automatic. No waiting for engineers to set up the servers; they’re already set up, and they are allocated by your software and scripts. Even the setup and configuration of the operating system, software components and services on each server is automatic.
This is such a dramatic shift that we are still feeling its effects. Traditional operations teams have little experience with this arrangement, and perhaps little trust in virtual servers. Business units are also not used to handing control of infrastructure spending over to ops teams, or to scripts and software.
However, huge economic pressures, along with the promise of new operational flexibility, continue to push firms toward this new model. Gartner predicts this trend will only continue. The advantages of cloud infrastructure provisioning include:
- Metered payment – no huge outlay of cash for new infrastructure
- Infrastructure as a service – scripted components automate and reduce manual processes
- Devops – manage infrastructure like code, with version control and reproducibility
- Take unused capacity offline easily & save on those costs
- Disaster Recovery is free – reuse scripts to build standard components
- Easily meet seasonal traffic requirements – spin up additional servers instantly
It may sound like a pessimistic view of computing systems, but the fact is that every component that makes up the modern Internet stack has some failure rate. Looking at that realistically, planning for breakdowns so that you can manage them well is essential.
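The metered-payment and seasonal-traffic points above come down to simple arithmetic. The sketch below compares provisioning for peak load all year against paying by the hour for only the servers you run. Every number in it (server prices, hourly rate, the demand pattern) is a made-up assumption for illustration; with different prices the comparison can come out the other way.

```python
"""Sketch: provisioning for peak vs. paying only for servers you run.

All prices and demand figures are hypothetical, for illustration only.
"""

HOURS_PER_MONTH = 730  # rough average number of hours in a month


def fixed_cost(peak_servers: int, monthly_cost_per_server: float, months: int) -> float:
    """Keep enough servers for peak load running every month."""
    return peak_servers * monthly_cost_per_server * months


def metered_cost(servers_per_month: list[int], hourly_rate: float) -> float:
    """Pay by the hour only for the servers actually running each month."""
    return sum(n * hourly_rate * HOURS_PER_MONTH for n in servers_per_month)


# Hypothetical demand: 10 servers for ten months, 30 during a two-month holiday peak.
demand = [10] * 10 + [30] * 2

if __name__ == "__main__":
    print("fixed:  ", fixed_cost(max(demand), 100.0, len(demand)))
    print("metered:", metered_cost(demand, 0.20))
```

With these assumed figures the fixed approach pays for 30 servers all year, while the metered approach pays for 30 only during the two peak months; that gap is what “take unused capacity offline” and “spin up additional servers” buy you.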
Failures in traditional datacenters
In your own datacenter, or that of your managed hosting provider, sit racks and racks of servers. Typically a proactive system administrator will keep a lot of spare parts around: hard drives, switches, additional servers and so on. You don’t need them now, but you don’t want to be in the position of having to order new equipment when something fails. That would increase your recovery time dramatically.
Besides keeping extra components lying around, you also typically want to avoid the so-called single point of failure, so you duplicate power systems, switches, database servers, webservers and so on. RAID is now more or less standard in modern servers, because losing a commodity SATA drive is so common, yet this redundancy makes a drive failure a non-event. We expect it, and so we design for it.
And while we are prudent enough to perform backups regularly and document the layout of our systems, the environment in a traditional datacenter is rarely completely scripted. Testing backups and restoring the database may be common, but a full fire drill to rebuild everything is rare.
Failure in the Cloud
In the last decade we saw Linux on commodity hardware take over as the internet platform of choice because of the huge cost difference compared to traditional hardware from vendors such as Sun or HP. The hardware was more likely to fail, but being 1/10th the price meant you could build in redundancy to cover yourself and still save money.
The latest wave of cloud providers is bringing the same kind of cost savings. But cloud-hosted servers, for instance in Amazon EC2, are much less reliable than the typical rack-mounted servers you might have in your datacenter.
We all agree that planning for disaster recovery is a really good idea, but it sometimes gets pushed aside by other priorities. In the cloud it moves front and center as an absolute necessity. This forces a new, more robust approach: rebuilding your environment with scripts that document and formalize your processes.
This is a good thing, because hardware failure then becomes an expected occurrence. Failures are a given; how quickly you recover is what makes the difference.
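The “cheaper but less reliable, so add redundancy” reasoning can be worked through numerically. This sketch assumes server failures are independent and that the service stays up as long as at least one of n redundant servers is up; the availability figures and prices are hypothetical. In practice failures are often correlated (shared power, the same switch or rack), which is why avoiding single points of failure matters as much as adding servers.

```python
"""Sketch: redundant commodity servers vs. one premium server.

Assumes independent failures, and that the service is up while at least
one of the n redundant servers is up. All numbers are hypothetical.
"""


def parallel_availability(single: float, n: int) -> float:
    """Probability that at least one of n independent servers is up."""
    return 1 - (1 - single) ** n


# Hypothetical premium server: 99.9% available, costs 10 units.
PREMIUM_AVAILABILITY, PREMIUM_PRICE = 0.999, 10.0

# Hypothetical commodity server: 1/10th the price, only 99% available.
COMMODITY_AVAILABILITY, COMMODITY_PRICE = 0.99, 1.0

if __name__ == "__main__":
    for n in (1, 2, 3):
        avail = parallel_availability(COMMODITY_AVAILABILITY, n)
        print(f"{n} commodity: availability={avail:.6f}, cost={n * COMMODITY_PRICE}")
    print(f"1 premium:   availability={PREMIUM_AVAILABILITY:.6f}, cost={PREMIUM_PRICE}")
```

Under these assumptions, two commodity servers (about 99.99% available) already beat the single premium server (99.9%) at a fifth of the price.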
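A toy Python sketch of what “rebuilding your environment with scripts” can look like: the desired state is declared once, and a converge step is idempotent, so it can be rerun on an existing server (doing nothing) or on a fresh replacement (rebuilding everything). The “server” is just a dictionary standing in for a real machine, and the package, file and service names are hypothetical. This is an illustration of the idea behind tools such as Chef, not Chef’s actual API.

```python
"""Toy sketch: rebuild an environment from a declarative description.

The "server" is a plain dict standing in for a real machine; all
resource names are hypothetical. The key property is idempotence:
rerunning converge() only does work that is still missing.
"""

DESIRED_STATE = {
    "packages": {"nginx", "postgresql"},
    "files": {"/etc/motd": "Managed by script. Do not edit by hand.\n"},
    "services_running": {"nginx"},
}


def converge(server: dict, desired: dict) -> list[str]:
    """Add whatever the server is missing from desired; return actions taken."""
    actions = []
    installed = server.setdefault("packages", set())
    for pkg in sorted(desired["packages"] - installed):
        installed.add(pkg)
        actions.append(f"install {pkg}")
    files = server.setdefault("files", {})
    for path, content in sorted(desired["files"].items()):
        if files.get(path) != content:
            files[path] = content
            actions.append(f"write {path}")
    running = server.setdefault("services_running", set())
    for svc in sorted(desired["services_running"] - running):
        running.add(svc)
        actions.append(f"start {svc}")
    return actions


if __name__ == "__main__":
    server = {}  # a freshly provisioned, empty machine
    print(converge(server, DESIRED_STATE))  # first run does all the work
    print(converge(server, DESIRED_STATE))  # second run: nothing left to do
    replacement = {}  # the original "failed"; start again from scratch
    print(converge(replacement, DESIRED_STATE))  # same steps rebuild it
```

Because the same script both documents and rebuilds the environment, a disaster recovery drill is simply running it against a new machine.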
Cloud Application Architectures by George Reese
I originally picked up this book expecting a very hands-on guide to cloud deployments, especially on EC2. That is not what this book is, though. It’s actually a very good CTO-targeted book, covering difficult questions such as cost comparisons between cloud and traditional datacenter hosting, security implications, disaster recovery, performance and service levels. The book is very readable and not overly technical.