Tag Archives: cloud deployments

3 things CEOs should know about the Cloud

You’ve heard all the buzz and spiel about the cloud, and there are good reasons to want to get there. On-demand compute power makes new levels of scalability possible. Low up-front costs mean moving capital expenditure to operating expenditure and saving a bundle in the process. We won’t give you any more of the rah-rah marketing hoopla. You’ve heard enough of that. We’ll gently play devil’s advocate for a moment and give you a few things to think about when deploying applications with a cloud provider. Our focus is mainly on Amazon EC2.

You might also be interested in a wide-reaching introduction to deploying on Amazon EC2.

  1. Funky Performance

    One of the biggest hurdles we see clients struggle with on Amazon EC2 is performance. This is rooted in the nature of shared resources. Servers, just like desktops, rely on CPU, memory, network and disk. In the virtual datacenter you can be given more than your fair share without even knowing it. More bandwidth, more CPU, more disk? Who would complain? But if your application behaves erratically while it suddenly competes for disk resources, you’ll quickly feel the flip side of that coin. Stocks go up, and they can just as easily come right back down.

    Variability around disk I/O seems to be the factor that hits applications the hardest, especially the database tier of many web applications. A quick latency probe like the sketch at the end of this list can make that variability visible. If your application requires extremely high database transaction throughput, you would do well to consider physical servers and a real RAID array to host your database server. Read more about IOPS

  2. Uncertain Reliability – A Loaded Gun

    Everybody has heard the saying: don’t hand someone a loaded gun. In the case of Amazon servers, you really do load your applications onto fickle and neurotic servers.

    Imagine you open a car rental business. You could have two brand new fully reliable cars to rent out to customers. Your customers would be very happy, but you’d have a very small business. Alternatively you could have twenty used Pintos. You’d have some breaking down a lot, but as long as you keep ten of them rented at a time, your business is booming.

    In the Amazon world you have all the tools to keep your Ford Pintos running, but it’s important to think long and hard about reliability, redundancy, and automation. Read more about Failures, Lessons & the Chaos Monkey

  3. Iffy Support

    Managed hosting providers vary drastically in the support you can expect. Companies like Rackspace, Servint or Datapipe have support built into their DNA. They’ve grown up around having a support tech your team can reach when they’re having trouble.

    Amazon takes the opposite approach. They give you all the tools to do everything yourself. But in a crunch it can be great to have that service available to help troubleshoot and diagnose a problem. Although they’re now offering support contracts, it’s not how they started out.

    If you have a crack operations team at your disposal, or you hire a third-party provider like Heavyweight Internet Group, Amazon Web Services gives you the flexibility and power to build phenomenal, scalable architectures. But if you’re a very small team without tons of technical know-how, you may well do better with a service-oriented provider like Rackspace et al.
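
Returning to the performance point (#1 above): a quick way to see disk I/O variability for yourself is to time a series of small synchronous writes on the volume in question. Below is a minimal, hedged sketch in Python using only the standard library; the file path and sample count are arbitrary choices, not recommendations.

    # Rough probe of write+fsync latency on a volume, to observe the
    # variability described in point #1. Stdlib only; adjust path/samples.
    import os
    import time

    PATH = "/var/tmp/iolatency.probe"   # assumed to live on the volume under test
    SAMPLES = 100
    BLOCK = b"x" * 4096                 # one 4 KB block per write

    latencies = []
    with open(PATH, "wb") as f:
        for _ in range(SAMPLES):
            start = time.time()
            f.write(BLOCK)
            f.flush()
            os.fsync(f.fileno())        # force the write through to the device
            latencies.append(time.time() - start)

    latencies.sort()
    print("median %.4fs  worst %.4fs" % (latencies[len(latencies) // 2], latencies[-1]))
    os.remove(PATH)

Run it a few times during the day; on a shared EBS volume the worst-case numbers can swing far more than on a dedicated RAID array.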

A few more considerations…

  • Will your cloud provider go out of business?
  • Could a subpoena against your provider draw you into the net?
  • Since you don’t know where your sensitive data is, should you consider encryption?
  • Should you keep additional backups outside of the cloud?
  • Should you use multiple cloud providers?
  • Should you be concerned about the lack of perimeter security?

Scale Quickly Like Birchbox – Startup Scalability 101

One of the great things about the Internet is how it has made it easier to put great ideas into practice. Whether the idea is a way to improve people’s lives or a new way to sell an old-fashioned product, there’s nothing like a good little startup tale of creative disruption to deliver us from something old and tired.

We work with a lot of startup firms and we love being part of the atmosphere of optimism and ingenuity, peppered with a bit of youthful zeal – something very indie-rock-and-roll about it. But whether they are just starting out or already picking up pace, every startup faces the same challenges in scaling a business. Recently we were reminded of this when we watched Inc’s video interview with Birchbox founders Hayley Barna and Katia Beauchamp. Continue reading Scale Quickly Like Birchbox – Startup Scalability 101

The New Commodity Hardware Craze aka Cloud Computing

Does anyone remember 15 years ago, when the dot-com boom was just starting?  A lot of companies were running on Sun.  Sun was the best hardware you could buy for the price.  It was reliable, and a lot of engineers had experience with its operating system, SunOS, a flavor of Unix.

Yet suddenly companies were switching to cheap crappy hardware.  The stuff failed more often, had lower quality control, and cheaper and slower buses.  Despite all of that, cutting edge firms and startups were moving to commodity hardware in droves.  Why was it so? Continue reading The New Commodity Hardware Craze aka Cloud Computing

Migrating to the Cloud – Why and why not?

A lot of technical forums and discussions have highlighted the limitations of EC2 and how it loses on performance when compared to physical servers of equal cost.  They argue that you can get much more hardware and bigger iron for the same money, so it seems foolhardy to turn to the cloud.  Why the mad rush to the cloud, then?  If all you’re looking at is performance, it might seem odd indeed.  But look at it another way: if performance is not as good, then performance is clearly not the driving factor behind cloud adoption.

CIOs and CTOs are often asking questions more along the lines of, “Can we deploy in the cloud and settle with the performance limitations, and if so how do we get there?”

Another question: “Is it a good idea to deploy your database in the cloud?”  It depends!  Let’s take a look at some of the strengths and weaknesses, then you decide.

8 big strengths of the cloud

  1. Flexibility in disaster recovery – it becomes a script, with no need to buy additional hardware (see the sketch after this list)
  2. Easier rollout of patches and upgrades
  3. Reduced operational headache – scripting and automation become central
  4. Uniquely suited to seasonal traffic patterns – keep online only the capacity you’re using
  5. Low initial investment
  6. Auto-scaling – set thresholds and deploy new capacity automatically
  7. Easy compromise response – take the server offline and spin up a new one
  8. Easy setup of dev, QA & test environments
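
Strength #1 deserves a concrete illustration: when your whole environment can be rebuilt from an image plus scripts, disaster recovery really does become a script. Here is a minimal, hedged sketch using the boto3 library (not part of the original article); the AMI ID, key pair and security group are placeholders for your own pre-baked image and settings.

    # Launch a replacement server from a pre-built AMI -- DR as a script.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",        # placeholder: your application AMI
        InstanceType="m5.large",                # placeholder instance size
        MinCount=1,
        MaxCount=1,
        KeyName="ops-keypair",                  # placeholder key pair
        SecurityGroupIds=["sg-0123456789abcdef0"],
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # Wait for it to come up, then hand it to your configuration management
    # tooling (Puppet, Chef, etc.) to finish the build.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    print("replacement instance running:", instance_id)

No purchase orders and no racking of hardware: the same script serves for recovery drills and for real failures.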

Some challenges with deploying in the cloud

  1. Big cultural shift in how operations is done
  2. Lower SLAs and less reliable virtual servers – mitigate with automation
  3. No perimeter security – new model for managing & locking down servers
  4. Where is my data?  — concerns over compliance and privacy
  5. Variable disk performance – can be problematic for MySQL databases
  6. New procurement process can be a hurdle

Many of these challenges can be mitigated.  The promise of infrastructure deployed in the cloud is huge, so digging in with gradual adoption is perhaps the best option for many firms.  You can mitigate the weaknesses of the cloud in a few ways:

  • Use encrypted filesystems and backups where necessary
  • Keep offsite backups in-house or at an alternate cloud provider
  • Work around variable EBS performance by caching at every layer of your application stack (see the caching sketch below)
  • Employ configuration management & automation tools such as Puppet & Chef
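
As a small illustration of the caching point, here is a hedged sketch in Python using memcached via the pymemcache library (one reasonable client choice among several). The query function and key names are hypothetical; the idea is simply that a warm cache absorbs most reads, so variable EBS latency is felt far less often.

    from pymemcache.client.base import Client

    cache = Client(("127.0.0.1", 11211))        # assumes a local memcached daemon

    def run_expensive_query(user_id):
        # Stand-in for the real database query.
        return b"report-for-%d" % user_id

    def fetch_report(user_id):
        """Hit the database only on a cache miss."""
        key = "report:%d" % user_id
        cached = cache.get(key)
        if cached is not None:
            return cached
        result = run_expensive_query(user_id)   # hypothetical DB call
        cache.set(key, result, expire=300)      # keep it for five minutes
        return result

The same pattern repeats at other layers: page caches in front of the webserver, query caches at the database, object caches in the application.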

Quora discussion – Why or why not to migrate to the cloud?

Root Cause Analysis – What is it and why is it important?

Root Cause Analysis is the means of identifying the ultimate source and cause of an outage.  When an outage causes serious downtime of a website, organizations are typically in crisis mode.  Urgency of resolution sometimes pushes aside due process, change management and general caution.  Root Cause Analysis attempts, as much as possible, to preserve logfiles, configurations, and the current state of systems for later analysis.

With traditional physical servers, hardware failure, operator error, or a security breach can cause outages.  Since you’re dealing with one physical machine, resolving the issue necessarily means disturbing the very things that broke.  So caution and the needs of later analysis must be balanced against immediate problem resolution.

Another silver lining of cloud-hosted solutions is around root cause analysis.  If a server was breached, for example, that server can immediately be shut down while its current state is preserved as a disk or EBS snapshot.  A new server can then be fired up from an AMI, rebuilt from your scripts or template, and you’re back up and running.  Save the snapshot for later analysis.
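
Here is a hedged sketch of that quarantine step using the boto3 library (one way to script it, not the only one). The instance and volume IDs are placeholders for the compromised server.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    COMPROMISED_INSTANCE = "i-0123456789abcdef0"   # placeholder
    COMPROMISED_VOLUME = "vol-0123456789abcdef0"   # its root EBS volume (placeholder)

    # 1. Preserve the current state of the disk for later forensic analysis.
    snap = ec2.create_snapshot(
        VolumeId=COMPROMISED_VOLUME,
        Description="forensic snapshot before rebuilding " + COMPROMISED_INSTANCE,
    )
    print("snapshot started:", snap["SnapshotId"])

    # 2. Take the breached server offline immediately.
    ec2.stop_instances(InstanceIds=[COMPROMISED_INSTANCE])

    # A replacement can now be launched from a known-good AMI and rebuilt from
    # your configuration management scripts while the snapshot is analyzed offline.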

The same approach can be used to analyze outages caused by operator error.  Hardware failures are more common and more expected in cloud-hosted environments, so this should, and really must, push adoption of infrastructure best practices, that is, having scripts at hand that rebuild everything from bare metal.

More discussion of root cause analysis by Sean Hull on Quora.

Configuration Management – What is it and why is it important?

Every software service or component on a server requires configurations. In your desktop applications you set preferences for what your default page will be, how you’d like your margins set, or whether to save and restore cookies each time you restart.

Enterprise applications also require complex configuration settings.  Want to monitor a webserver and a database with Nagios?  That’s set in a config file.  Want to start MySQL with 8GB of memory for InnoDB?  That’s also set in a config file.  What’s more, config files contain server-specific settings based on IP address or on the server’s role, webserver or database for example.  The webserver may also have memcache and outbound email services running.

With more traditional deployments, the systems administrator will set up each physical box and configure those services based on the business needs.  As you bring tens or hundreds of servers online, however, you can quickly see how labor-intensive this process becomes, and how much of the work is repeated.

Enter configuration management.  Previously I blogged about tools like Puppet that can bring great new best practices to the table.  There is also cfengine, and the newer Chef, which incorporates cloud deployments into the mix as well.  Configuration management allows you to remotely administer servers, install packages, manage dependencies, install configurations based on a central copy, and even define roles and templates for new servers.  This brings a whole new level of professionalism to deployments, along with newfound power and flexibility.
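
To make the roles-and-templates idea concrete, here is a toy sketch in plain Python (deliberately not Puppet or Chef syntax): one central template plus per-role settings produces each server’s config file. The file contents, role names and values are made up for the example.

    from string import Template

    MY_CNF_TEMPLATE = Template(
        "[mysqld]\n"
        "bind-address = $ip_address\n"
        "innodb_buffer_pool_size = $innodb_memory\n"
    )

    ROLES = {
        "database":  {"innodb_memory": "8G"},
        "reporting": {"innodb_memory": "2G"},
    }

    def render_config(role, ip_address):
        """Merge role-level defaults with server-specific values."""
        settings = dict(ROLES[role], ip_address=ip_address)
        return MY_CNF_TEMPLATE.substitute(settings)

    print(render_config("database", "10.0.1.15"))

Real configuration management tools add package installation, dependency ordering, and convergence on top of this basic pattern.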

We’ll be writing more about configuration management, especially in the context of cloud deployments such as Amazon EC2 so please stay tuned.

Sean Hull asks on Quora – What is configuration management and why is it important?

Auto-scaling – What is it and why is it important?

With cloud-based hosting solutions, new servers can be provisioned and “spun up” with a few options on the command line.  This opens a whole new dimension for infrastructure, allowing software scripts to bring new computing power into your web infrastructure.

Internet based applications often exhibit seasonal traffic patterns where traffic stays steady or grows slowly over a period, but then experiences a sharp spike in demand requiring much higher computing resources to meet customer demand.

Enter auto-scaling, an even more powerful feature of cloud-based offerings.  Define roles for your webservers and database servers, set capacity rules that control how much traffic will trigger new servers to be rolled out, and watch your infrastructure scale automatically to meet the needs of your internet application.
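
Here is a hedged sketch of that idea using the boto3 library and EC2 Auto Scaling (one of several ways to wire this up). The AMI ID, names, capacity numbers and CPU threshold are placeholders, not a recommended production policy.

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    autoscaling.create_launch_configuration(
        LaunchConfigurationName="web-launch-config",   # placeholder name
        ImageId="ami-0123456789abcdef0",               # pre-baked webserver AMI (placeholder)
        InstanceType="m5.large",
    )

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-tier",
        LaunchConfigurationName="web-launch-config",
        MinSize=2,                                     # steady-state capacity
        MaxSize=10,                                    # ceiling for traffic spikes
        AvailabilityZones=["us-east-1a", "us-east-1b"],
    )

    # Add capacity automatically when average CPU across the group passes 60%,
    # and remove it again when traffic subsides.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-tier",
        PolicyName="scale-on-cpu",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
            "TargetValue": 60.0,
        },
    )

The ceiling (MaxSize) is also where finance and operations meet: it caps how much capacity, and therefore spend, the automation is allowed to add.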

Managing Security in Amazon Web Services

Security is on everyone’s mind when talking about the cloud.  What are some important considerations?

For the web operations team:

  1. AWS has no perimeter security, should this be an overriding concern?
  2. How do I manage authentication keys?
  3. How do I harden my machine images?

** Original article — Intro to EC2 Cloud Deployments **

Amazon’s security groups can provide strong security if used properly.  Create security groups with specific minimum privileges, and do not expose your sensitive data – your database, for instance – to the internet directly, but only to other security groups.  On the positive side, AWS security groups mean there is no single point to mount an attack against, as there is with a traditional enterprise’s network perimeter.  What’s more, there is no opportunity to accidentally erase network rules, since they are defined in groups in AWS.
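
Here is a hedged sketch of that minimum-privilege pattern using the boto3 library; the VPC ID is a placeholder, and the ports and group names should be adapted to your own stack.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    VPC_ID = "vpc-0123456789abcdef0"   # placeholder

    web_sg = ec2.create_security_group(
        GroupName="web-tier", Description="public web servers", VpcId=VPC_ID)["GroupId"]
    db_sg = ec2.create_security_group(
        GroupName="db-tier", Description="database servers", VpcId=VPC_ID)["GroupId"]

    # Web tier: only HTTPS from the internet.
    ec2.authorize_security_group_ingress(
        GroupId=web_sg,
        IpPermissions=[{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    )

    # Database tier: MySQL traffic only from members of the web security group,
    # never from the internet directly.
    ec2.authorize_security_group_ingress(
        GroupId=db_sg,
        IpPermissions=[{"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
                        "UserIdGroupPairs": [{"GroupId": web_sg}]}],
    )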

Authentication keys can be managed in a couple of different ways.  One way is to build them into the AMI.  From there, any server spun up from that AMI will be accessible to the owner of those credentials.  Alternatively, a more flexible approach is to pass in the credentials when you spin up the server, allowing you to dynamically control who has access to that server.

Hardening your AMIs in EC2 is much like hardening any Unix or Linux server.  Disable unused user accounts, SSH password authentication, and unnecessary services.  Consider a tool like AppArmor to fence applications in and keep them out of areas where they don’t belong.  This should be an ongoing process, repeated if the unfortunate happens and you are compromised.

You should also consider:

  • The AWS password recovery mechanism is not as secure as a traditional managed hosting provider’s.  Use a very strong password to lock down your AWS account and monitor its usage.
  • Consider encrypted filesystems for your database mount point.  Pass in the decryption key at server spin-up time.
  • Consider storing particularly sensitive data outside of the cloud and exposing it through an SSL API call.
  • Consider encrypting your backups, since S3 security is not proven (see the sketch after this list).
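
As an illustration of the last point, here is a hedged sketch that encrypts a backup on the client side, before it ever reaches S3, using the cryptography library and boto3. The bucket name, file path and key handling are simplified placeholders; in practice the key itself must be stored safely outside the cloud provider.

    import boto3
    from cryptography.fernet import Fernet

    KEY = Fernet.generate_key()          # generate once, store securely outside AWS
    BUCKET = "example-db-backups"        # placeholder bucket name

    with open("/backups/nightly.sql.gz", "rb") as f:   # placeholder dump file
        ciphertext = Fernet(KEY).encrypt(f.read())

    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key="nightly.sql.gz.enc",
        Body=ciphertext,
    )

Without the key, the object stored in S3 is useless to anyone who obtains it.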

For CTOs and Operations Managers:

  1. Where is my data physically located?
  2. Should I rely entirely on one provider?
  3. What if my cloud provider does not sufficiently protect the network?

Although you do not know where your data is physically located in S3 and EC2, you have the choice of whether or not to encrypt your data and/or the entire filesystem.  You also control access to the server.  So from a technical standpoint it may not matter whether you control where the server is physically.  Of course laws, standards and compliance rules may dictate otherwise.

You also don’t want to put all your eggs in one basket.  All sorts of things can happen to a provider, from going out of business, to lawsuits that directly or indirectly affect you, to political pressure as in the WikiLeaks case.  A cloud provider may well choose the easier road and pull the plug rather than deal with complicated legal entanglements.  For all these reasons you should be keeping regular backups of your data, either on in-house servers or at a second provider.

As a further insurance option, consider host intrusion detection software.  This will give you additional peace of mind against the potential of your cloud provider not sufficiently protecting their own network.

Additionally consider that:

  • A simple password recovery mechanism in AWS is all that sits between a hacker and your infrastructure.  Choose a very secure password, and monitor its usage.
  • EC2 servers are not nearly as reliable as traditional physical servers.  Test your deployment scripts and your disaster recovery scenarios again and again.
  • Responding to a compromise will be much easier in the cloud.  Spin up the replacement server, and keep the EBS volume around for later analysis.

As with any new paradigm, there is an element of the unknown and unproven that we are understandably concerned about.  Still, cloud-hosted servers and computing can be just as secure, if not more secure, than traditional managed servers or servers you can physically touch in-house.

Introduction to EC2 Cloud Deployments

Cloud Computing holds a lot of promise, but there are also a lot of speed bumps in the road along the way.

In this six-part series we’re going to cover a lot of ground.  We don’t intend this series to be an overly technical nuts-and-bolts howto.  Rather, we will discuss high-level issues and answer questions that come up for CTOs, business managers, and startup CEOs.

Some of the tantalizing issues we’ll address include:

  • How do I make sure my application is built for the cloud with scalability baked into the architecture?
  • I know disk performance is crucial for my database tier.  How do I get the best disk performance with Amazon Web Services & EC2?
  • How do I keep my AWS passwords, keys & certificates secure?
  • Should I be doing offsite backups as well, or are snapshots enough?
  • Cloud providers such as Amazon seem to have poor SLAs (service level agreements).  How do I mitigate this using availability zones & regions?
  • Cloud hosting environments like Amazon’s provide no perimeter security.  How do I use security groups to ensure my setup is robust and bulletproof?
  • Cloud deployments change the entire procurement process, handing a lot of control over to the web operations team.  How do I ensure that finance and ops are working together, and a ceiling budget is set and implemented?
  • Reliability of Amazon EC2 servers is much lower than that of traditional hosted servers.  Failure is inevitable.  How do we use this fact to our advantage, forcing discipline in the deployment and disaster recovery processes?  How do I make sure my processes are scripted & fire-drill tested?
  • Snapshot backups and other data stored in S3 are somewhat less secure than I’d like.  Should I use encryption to protect this data?  When and where should I use encrypted filesystems to protect my more sensitive data?
  • How can I best use availability zones and regions to geographically disperse my data and increase availability?

As we publish each of the individual articles in this series we’ll link them to the titles below.  So check back soon!

  • Building Highly Scalable Web Applications for the Cloud
  • Managing Security in Amazon Web Services
  • MySQL Databases in the Cloud – Best Practices
  • Backup and Recovery in the Cloud – A Checklist
  • Cloud Deployments – Disciplined Infrastructure
  • Cloud Computing Use Cases
  • Newsletter 74 – Design For Failure

    It may sound like a pessimistic view of computing systems, but the fact is that all of the components that make up the modern Internet stack have a certain failure rate. So looking at that realistically, and planning for breakdowns so you can manage them better, is essential.

    Failures in traditional datacenters

    In your own datacenter, or that of your managed hosting provider, sit racks and racks of servers. Typically a proactive system administrator will keep a lot of spare parts around: hard drives, switches, additional servers, and so on. Although you don’t need them now, you don’t want to be in the position of having to order new equipment when something fails; that would increase your recovery time dramatically.

    Besides keeping extra components lying around, you also typically want to avoid the so-called single point of failure: dual power supplies, switches, database servers, webservers and so on. We also see RAID as more or less standard in all modern servers, since the loss of a commodity SATA drive is so common. Yet this redundancy makes such a loss a non-event. We expect it and so design for it.

    And while we are prudent enough to perform backups regularly and document the layout of systems, rarely is the environment in a traditional datacenter completely scripted. Although testing backups and restoring the database may be common, a full fire drill to rebuild everything is rarer.

    Failure in the Cloud

    In the last decade we saw Linux on commodity hardware take over as the internet platform of choice because of the huge cost differential compared to traditional hardware such as Sun or HP.  The hardware was more likely to fail, but at one tenth the price you could build in redundancy to cover yourself and still save money.

    The latest wave of cloud providers is bringing the same kind of cost savings. But cloud-hosted servers, for instance in Amazon EC2, are much less reliable than the typical rack-mounted servers you might have in your datacenter.

    Planning for disaster recovery, we agree, is a really good idea, but sometimes it gets pushed aside by other priorities. In the cloud it moves front and center as an absolute necessity. This forces a new, more robust approach to rebuilding your environment, with scripts documenting and formalizing your processes.

    This is all a good thing, as hardware failure becomes an expected occurrence. Failures are a given; it’s how quickly you recover that makes the difference.

    Book Review:

    Cloud Application Architectures by George Reese
    Originally picked up this book expecting a very hands on guide to cloud deployments, especially on EC2. That is not what this book is though. It’s actually a very good CTO targeted book, covering difficult questions like cost comparisons between cloud and traditional datacenter hosting, security implications, disaster recovery, performance and service levels. The book is very readable, and not overly technical.