Professional Deployments Use Puppet For Configuration Management
Puppet is a configuration management tool that can be used to great advantage in managing the configurations of a large fleet of servers in an enterprise.
My first thought upon finishing Turnbull & McCune's book was that it could well have been titled Pro Deployments, for it covers a whole host of topics, integrating Puppet with a lot of other related tools.
Some of the advanced topics it covers in depth include:
- integrating Puppet with version control such as git
- setup of the standard dev, test and production environments
- conditional application of generalized configs
- managing nagios & load balancer configs to automatically add new nodes
- capitalizing on puppet forge modules (like rpm packages)
- testing your puppet configs with cucumber
- reporting with the dashboard and the command line
Amazon EC2 Outage – Failures, Lessons and Cloud Deployments
Now that we've had a chance to take a deep breath after last week's AWS outage, I'll offer some comments of my own. Hopefully just enough time has passed to begin to have a broader view, and put events in perspective.
Despite what some reports may have announced, Amazon wasn't down; rather, a small part of Amazon Web Services went down. Was it a failure? Yes. Did it go beyond their service level agreement of 99.95%? Yes again. Was it survivable? Yes to that as well.
Learning From Failure
The business management conversation du jour is all about learning from failure, rather than trying to avoid it. Harvard Business Review's April issue headlined with "The Failure Issue - How to Understand It, Learn From It, and Recover From It". The Economist had some similarly interesting pieces: one by Schumpeter in the April 16th issue, "Fail often, fail well", and another in the April 23rd issue, "Lessons from Deepwater Horizon and Fukushima".
With all this talk of failure there is surely one takeaway. Complex systems will fail and it is in the anticipation of that failure that we gain the most. Let's stop howling and look at how to handle these situations intelligently.
How Do You Rebuild A Website?
In the cloud you will likely need two things: (a) scripts to rebuild all the components in your architecture (spin up servers, fetch source code, fetch software and configuration files, configure load balancers and mount your database) and, more importantly, (b) a database backup from which you can rebuild your current dataset.
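Here's a minimal sketch of what the skeleton of such a rebuild script might look like. The step names follow the list above, but the runner itself and its function names are my own illustration; the real work inside each step depends entirely on your environment.

```python
from typing import Callable, List, Tuple

# A rebuild step is a human-readable name plus a function that does the work.
Step = Tuple[str, Callable[[], None]]


def run_rebuild(steps: List[Step]) -> List[str]:
    """Run rebuild steps in order and stop at the first failure.

    Returns the names of the steps that completed, so a failed run
    shows exactly how far the rebuild got.
    """
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"step '{name}' failed: {exc}")
            break
        completed.append(name)
    return completed


def placeholder() -> None:
    """Hypothetical stand-in; a real step would call your provisioning tools."""


REBUILD_STEPS: List[Step] = [
    ("spin up servers", placeholder),
    ("fetch source code", placeholder),
    ("fetch software and configuration files", placeholder),
    ("configure load balancers", placeholder),
    ("restore database from backup", placeholder),
]

print(run_rebuild(REBUILD_STEPS))
```

Keeping the steps in a plain ordered list means the same list doubles as documentation of what a rebuild involves, and it can live in version control alongside everything else.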
Want to stick with EC2? Build out your infrastructure in an alternate availability zone or region and you're back up and running in hours. Or better yet, have an alternate cloud provider on hand to handle these rare outages. The choice is yours.
Mitigate risk? Yes, indeed. Failure is more common in the cloud, but recovery is also easier. Failure should pressure the adoption of best practices and force discipline in deployments, not make you more of a gunslinger!
Want to see an extreme example of how this can play in your favor? Read Jeff Atwood's discussion of the so-called Chaos Monkey, a component whose sole job is to kill off servers in the Netflix environment at random. Now that type of gunslinging will surely keep everyone on their toes! Here's a Wired article that discusses Chaos Monkey.
George Reese of enStratus discusses the recent failure at length. Though I would argue with calling Amazon's outage the Cloud's Shining Moment, all of his points are wise, and this is the direction we should all be moving.
Going The Way of Commodity Hardware
Though it is still not obvious to everyone, I'll spell it out loud and clear. Like it or not, the cloud is coming. Look at these numbers.
Furthermore, the recent outage highlights just how many internet sites rely on cloud computing, and on Amazon EC2 in particular.
Way back in 2001 I authored a book for O'Reilly called "Oracle and Open Source". In it I discussed the technologies I was seeing in the real world: Oracle on the backend and Linux, Apache, and PHP, Perl or some other language on the frontend. These were the technologies that startups were using. They were fast, cheap and, with the right smarts, reliable too.
Around that time Oracle started smelling the coffee and ported its enterprise database to Linux. The equation for them was simple. Customers that were previously paying tons of money to their good friend and confidant Sun for hardware could now spend 1/10th as much on hardware and shift a lot of that leftover cash to, you guessed it, Oracle! The hardware wasn't as good, but who cares, because you can get a lot more of it.
Despite a long entrenched and trusted brand like Sun being better and more reliable, guess what? Folks still switched to commodity hardware. Now this is so obvious, no one questions it. But the same trend is happening with cloud computing.
Performance is variable, disk I/O can be iffy, and what's more, as the recent outage illustrates front and center, the servers and network can crash at any moment. Who in their right mind would want to move to this platform?
If that's the question you're stuck on, you're still stuck on the old model. You have not truly comprehended the power to build infrastructure with code, to provision through automation, and to really embrace managing those components as software. Just as the internet itself has the ability to route around political strife and network outages, so too does cloud computing bring that power to mom & pop web shops.
Conclusions
- Have existing investments in hardware? Slow and cautious adoption makes most sense for you.
- Have seasonal traffic variations? An application like this is uniquely suited to the cloud. In fact, some gaming applications, which can autoscale to 10x or 100x servers under load, are newly solvable with the advent of cloud computing.
- Are you currently paying a lot for disaster recovery systems that primarily lie idle? Script your infrastructure for rebuilding from bare metal, and save that part of the budget for more useful projects.
Migrating Business To The Cloud – Advantages and Challenges
Cloud Computing Use Cases
Cloud Computing may not make sense for all application types. But as with the adoption of commodity hardware and Linux over a decade ago, economic considerations will continue to pressure adoption.
** Original article -- Intro to EC2 Cloud Deployments **
What types of applications do fit well in the cloud?
- Applications with Seasonal Traffic Patterns
- Proof-of-concept Applications
- Quick Temporary Dev & Test Environments
- CPU Intensive Applications
- On-Demand or Unknown Future Demand
Seasonal Traffic Patterns
Cloud Computing – Disciplined Deployments
With traditional managed hosting solutions, we have best practices, we have business continuity plans, we have disaster recovery, we document our processes and all the moving parts in our infrastructure. At least we pay lip service to these goals, though from time to time we admit to getting sidetracked with bigger fish to fry, higher priorities and the emergency of the day. We add "firedrill" to our todo list, promising we'll test restoring our backups. But many times it is only in the event of an emergency that we are forced to find out whether we actually have all the pieces backed up and can reassemble them properly.
** Original article -- Intro to EC2 Cloud Deployments **
Cloud Computing is different. These goals can no longer be lofty ideals, but must be put into practice. Here's why.
- Virtual servers are not as reliable as physical servers
- Amazon EC2 has a lower SLA than many managed hosting providers
- Devops introduces a new paradigm: infrastructure scripts can be version controlled
- EC2 environment really demands scripting and repeatability
- New flexibility and peace of mind
Unreliable Servers
EC2 virtual servers can and will die. Your spinup scripts and infrastructure should consider this possibility not as some far off anomalous event, but a day-to-day concern. With proper scripts and testing of various scenarios, this should become manageable. Use snapshots to back up EBS root volumes, and build spinup scripts with AMIs that have all the components your application requires. Then test, test and test again.
Amazon EC2's SLA - Only 99.95%
The computing industry throws the 99.999% or five-nines uptime SLA standard around a lot. That amounts to less than six minutes of downtime per year. Amazon's 99.95% allows for 263 minutes of downtime per year. Greater downtime merely gets you a credit on your account. With that in mind, repeatable processes and scripts to bring your infrastructure back up in different availability zones or even different datacenters are a necessity. Along with your infrastructure scripts, offsite backups also become a wise choice. You should further take advantage of availability zones and regions to make your infrastructure more robust. By using private IP addresses and the private network, you can host a MySQL database slave in a separate zone, for instance. You can also do GDLB, or Geographically Distributed Load Balancing, to send customers on the west coast to that zone, and those on the east coast to one closer to them. In the event that one region or availability zone goes out, your application is still responding, though perhaps with slightly degraded performance.
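The downtime figures above are easy to check yourself. Here's a quick sketch of the arithmetic, assuming the SLA is measured over a 365-day year (that period is my assumption; check how your provider's agreement actually defines it). The function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(sla_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the downtime, in minutes, that an uptime SLA permits over a period."""
    return (1 - sla_percent / 100) * period_minutes


for sla in (99.999, 99.95):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Five nines works out to a little over five minutes a year, while 99.95% comes to roughly 263 minutes, or nearly four and a half hours.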
Devops - Infrastructure as Code
With traditional hosting, you either physically manage all of the components in your infrastructure, or have someone do it for you. Either way a phone call is required to get things done. With EC2, every piece of your infrastructure can be managed from code, so your infrastructure itself can be managed as software. Whether you're using the waterfall method or agile as your software development lifecycle, you have the new flexibility to place all of these scripts and configuration files in version control. This raises the manageability of your environment tremendously. It also provides a type of ongoing documentation of all of the moving parts. In a word, it forces you to deliver on all of those best practices you've been preaching over the years.
EC2 Environment Considerations
When servers get restarted they get new IP addresses, both private and public. This may affect configuration files from webservers to mail servers, and database replication too, for example. Your new server may mount an external EBS volume which contains your database. If that's the case, your start scripts should check for that, and not start MySQL until they find that volume. To further complicate things, you may choose to use software RAID over a handful of EBS volumes to get better performance.
The more special cases you have, the more you quickly realize how important it is to manage these things in software. The more the process needs to be repeated, the more the scripts will save you time.
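As one example of such a special case, here's a rough sketch of a start script that holds off on MySQL until the EBS volume is actually mounted. The function names and the timeout values are my own, and the callable that starts the database is left to you since it varies by distribution.

```python
import os
import time
from typing import Callable


def wait_for_mount(path: str, timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll until `path` is a mount point; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.ismount(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def start_when_mounted(path: str, start_service: Callable[[], None],
                       timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Call start_service() only once `path` is mounted.

    Returns True if the service was started, False if the volume never appeared.
    """
    if not wait_for_mount(path, timeout_s, poll_s):
        return False
    start_service()
    return True
```

You might call it as `start_when_mounted("/var/lib/mysql", start_mysql)`, where both the path and `start_mysql` are placeholders for whatever your setup uses. Refusing to start is the important part: a database that comes up on an empty directory is worse than one that doesn't come up at all.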
New Flexibility in the Cloud
Ultimately if you take into consideration less reliable virtual servers, and mitigate that with zones and regions, and automated scripts, you can then enjoy all the new benefits of the cloud.
- autoscaling
- easy test & dev environment setup
- robust load & scalability testing
- vertically scaling servers in place - in minutes!
- pause a server - incurring only storage costs for days or months as you like
- cheaper costs for applications with seasonal traffic patterns
- no huge up-front costs
Migrating MySQL to Oracle Guide
Migrating from MySQL to Oracle can be as complex as picking up your life and moving from the country to the city. Things in the MySQL world are often just done differently than they are in the Oracle world. Our guide will give you a bird's-eye view of the differences to help you determine what is the right path for you.
** See also: Oracle to MySQL Migration Considerations **
MySQL comes from a more open-source or DIY background, one of Unix and Linux administrators and even developers carrying the responsibilities of a DBA.
- Installation & Administration Considerations
- Query and Optimizer Differences
- Security Strengths and Weaknesses
- Replication & High Availability
- Table Types & Storage Engines
- Applications, Connection Pooling, Stored Procedures and More
- Backups & Disaster Recovery
- Community - MySQL & Oracle Differences
- TCO, Licensing, and Cloud Considerations
- Advanced Oracle Features - Missing in MySQL
Check back soon as we update each of these sections.
Oracle to MySQL Migration Considerations
There are a lot of forms of transportation, from walking to bike riding, motorcycles and cars to buses, trains and airplanes. Each mode of transport will get you from point A to point B, but one may be faster or more comfortable, and another more cost effective. It's important to keep in mind when comparing databases like Oracle and MySQL that there are indeed a lot of feature differences, a lot of cultural differences, and a lot of cost differences. There are also a lot of impassioned people on both sides arguing about the tomfoolery of the other. Hopefully we can dispel some of the myths and discuss the topic fairly.
** See also: Migrating MySQL to Oracle Guide **
As a long time Oracle DBA turned MySQL expert, I've spent time with clients running both database engines and many migrating from one to the other. I can speak to many of the differences between the two environments. I'll cover the following:
- Query & Optimizer Limitations
- Security Differences
- Replication & HA Are Done Differently
- Installation & Administration Simplicity
- Watch Out - Triggers, Stored Procedures, Materialized Views & Snapshots
- Huge Community Support - Open-source Add-ons
- Enter The Cloud With MySQL
- Backup and Recovery
- Miscellaneous Considerations
Check back again as we edit and publish the various sections above.
How To Build Highly Scalable Web Applications For The Cloud
Scalability in the cloud depends a lot on application design. Keep these important points in mind when you are designing your web application and you will scale much more naturally and easily in the cloud.
** Original article -- Intro to EC2 Cloud Deployments **
1. Think twice before sharding
- it increases your infrastructure and application complexity
- it reduces availability - more servers mean more outages
- you have to worry about globally unique primary keys
2. Bake read/write database access into the application
- allows you to check for stale data, fall back to write master
- creates higher availability for read-only data
- gracefully degrade to read-only website functionality if master goes down
- horizontal scalability melds nicely with cloud infrastructure and IAAS
3. Save application state in the database
- avoid in-memory locking structures that won't scale with multiple web application servers
- consider a database field for managing application locks
- consider stored procedures for isolating and insulating developers from db particulars
- a last updated timestamp field can be your friend
4. Consider Dynamic or Auto-scaling
- great feature of the cloud, spin up new servers to handle load on-demand
- lean towards being proactive rather than reactive and measure growth and trends
- watch the procurement process closely lest it come back to bite you
5. Set Up Monitoring and Metrics
- see trends over time
- spot application trouble and bottlenecks
- determine if your tuning efforts are paying off
- review a traffic spike after the fact
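To make point 2 a little more concrete, here's a minimal sketch of the decision an application could make for each query. The `Node` class, the lag threshold and the function names are all hypothetical; in a real application the health and lag numbers would come from your own monitoring of the MySQL servers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a database server and its health data."""
    name: str
    is_up: bool = True
    lag_seconds: float = 0.0  # replication lag behind the master


class ReadOnlyMode(Exception):
    """Raised when a write is requested but the master is unavailable."""


def pick_node(master: Node, replicas: List[Node], is_write: bool,
              max_lag_s: float = 5.0) -> Node:
    """Choose which server should handle a query.

    Writes always go to the master. Reads go to the freshest healthy
    replica, fall back to the master when every replica is down or stale,
    and are still served from a stale replica if the master is gone too.
    """
    if is_write:
        if not master.is_up:
            raise ReadOnlyMode("master is down; degrade to read-only")
        return master

    healthy = [r for r in replicas if r.is_up]
    fresh = [r for r in healthy if r.lag_seconds <= max_lag_s]
    if fresh:
        return min(fresh, key=lambda r: r.lag_seconds)
    if master.is_up:
        return master
    if healthy:
        return min(healthy, key=lambda r: r.lag_seconds)
    raise RuntimeError("no database servers are available")
```

Catching `ReadOnlyMode` in the web tier is what lets the site keep serving pages, minus the write features, while the master is being rebuilt.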
The cloud is not a silver bullet that can automatically scale any web application. Software design is still a crucial factor. Bake in these features with the right flexibility and foresight, and you'll manage your website's growth patterns with ease.
Have questions or need help with scalability? Call us: +1-212-533-6828
Introduction to EC2 Cloud Deployments
Cloud Computing holds a lot of promise, but there are also a lot of speed bumps in the road along the way.
In this six-part series we're going to cover a lot of ground. We don't intend this series to be an overly technical nuts and bolts howto. Rather we will discuss high level issues and answer questions that come up for CTOs, business managers, and startup CEOs.
Some of the tantalizing issues we'll address include:
- How do I make sure my application is built for the cloud with scalability baked into the architecture?
- I know disk performance is crucial for my database tier. How do I get the best disk performance with Amazon Web Services & EC2?
- How do I keep my AWS passwords, keys & certificates secure?
- Should I be doing offsite backups as well, or are snapshots enough?
- Cloud providers such as Amazon seem to have poor SLAs (service level agreements). How do I mitigate this using availability zones & regions?
- Cloud hosting environments like Amazon's provide no perimeter security. How do I use security groups to ensure my setup is robust and bulletproof?
- Cloud deployments change the entire procurement process, handing a lot of control over to the web operations team. How do I ensure that finance and ops are working together, and a ceiling budget is set and implemented?
- Reliability of Amazon EC2 servers is much lower than traditional hosted servers. Failure is inevitable. How do we use this fact to our advantage, forcing discipline in the deployment and disaster recovery processes? How do I make sure my processes are scripted & firedrill tested?
- Snapshot backups and other data stored in S3 are somewhat less secure than I'd like. Should I use encryption to protect this data? When and where should I use encrypted filesystems to protect my more sensitive data?
- How can I best use availability zones and regions to geographically disperse my data and increase availability?
As we publish each of the individual articles in this series we'll link them to the titles below. So check back soon!
