Category Archives: Business

Backup and Recovery in EC2 – 5 Point Checklist

Best practices for backups and disaster recovery aren’t tremendously different in the cloud than in a managed hosting environment.  But they are more crucial since cloud servers are less reliable than physical servers.  Also the security aspect may play a heightened role in the cloud.  Here are some points to keep in mind.

Read the original article: Intro to EC2 Cloud Deployments.

1. Perform multiple types of backups
2. Keep non-proprietary backups offsite
3. Test your backups – perform firedrills
4. Encrypt backups in S3
5. Perform Replication Integrity Checks

Deploying MySQL on Amazon EC2 – 8 Best Practices

Also find Sean Hull’s ramblings on twitter @hullsean.

There are a lot of considerations for deploying MySQL in the Cloud.  Some concepts and details won’t be obvious to DBAs used to deploying on traditional servers.  Here are eight best practices which will certainly set you off on the right foot.

This article is part of a multi-part series Intro to EC2 Cloud Deployments.

1. Replication

Master-Slave replication is easy to set up, and provides a hot online copy of your data.  One or more slaves can also be used for scaling your database tier horizontally.

Master-Master active/passive replication can also be used to bring higher uptime, and allow some operations such as ALTER statements and database upgrades to be done online with no downtime.  The secondary master can be used for offloading read queries, and additional slaves can also be added as in the master-slave configuration.

Caution: A MySQL slave can drift silently out of sync with the master.  If you’re using statement-based replication, be sure to perform regular integrity checks so that drift is caught early.  Here’s our guide to bulletproofing MySQL replication.
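
One way to picture such an integrity check is to compare per-table checksums taken on the master and on a slave.  The sketch below is only an illustration: it assumes you have already collected the checksums into dictionaries (for example with MySQL’s CHECKSUM TABLE statement while writes are paused), and the function name and sample values are hypothetical.

```python
def find_drifted_tables(master_checksums, slave_checksums):
    """Return sorted names of tables that differ on, or are missing from, the slave."""
    drifted = []
    for table, master_sum in master_checksums.items():
        # A table that is absent on the slave counts as drift too.
        if table not in slave_checksums or slave_checksums[table] != master_sum:
            drifted.append(table)
    return sorted(drifted)


# Hypothetical checksum values gathered from each server.
master = {"users": 3141592653, "orders": 2718281828, "items": 1618033988}
slave = {"users": 3141592653, "orders": 2718281000}

print(find_drifted_tables(master, slave))
```

Any table this reports would need to be re-synced or the slave rebuilt; a dedicated tool is likely a better fit for large production tables.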

2. Security

You’ll want to create an AWS security group for databases which opens port 3306, but not to the internet at large, only to your AWS-defined webserver security group.  You may also decide to use a single box and security group which allows port 22 (ssh) from the internet at large.  All ssh connections will then come in through that box, and internal security groups (database & webserver groups) should only allow port 22 connections from that security group.
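
The layout above can be modelled as plain data and audited with a few lines of Python.  This is a rough sketch, not AWS API code: the group names, the rule format and the helper function are all made up for illustration.

```python
OPEN_TO_WORLD = "0.0.0.0/0"

# Hypothetical rule sets: each rule names a port and a source, which is either
# a CIDR block or the name of another security group.
database_group = [
    {"port": 3306, "source": "webserver-group"},
    {"port": 22, "source": "ssh-gateway-group"},
]
gateway_group = [
    {"port": 22, "source": OPEN_TO_WORLD},
]


def ports_open_to_world(rules):
    """Return the sorted ports that accept traffic from any internet host."""
    return sorted({rule["port"] for rule in rules if rule["source"] == OPEN_TO_WORLD})


print(ports_open_to_world(database_group))  # nothing should be exposed here
print(ports_open_to_world(gateway_group))   # only ssh on the gateway box
```

Running a check like this against an export of your real rules is one way to confirm that port 3306 never ends up open to the internet.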

When you set up replication, you’ll be creating users and granting privileges.  You’ll need to grant to the wildcard ‘%’ hostname designation, as your internal and external IPs will change each time a server dies.  This is safe since you expose your database server port 3306 only to other AWS security groups, and to no internet hosts.

You may also decide to use an encrypted filesystem for your database mount point, your database backups, and/or your entire filesystem.  Be particularly careful of your most sensitive data.  If compliance requirements dictate, choose to store very sensitive data outside of the cloud and secure network connections to incorporate it into application pages.

Be particularly careful of your AWS logins.  The password recovery mechanism in Amazon Web Services is all that prevents an attacker from controlling your entire infrastructure, after all.

3. Backups

There are a few ways to back up a MySQL database.  By far the easiest way in EC2 is using the AWS snapshot mechanism for EBS volumes.  Keep in mind you’ll want to encrypt these snapshots, as S3 may not be as secure as you might like.  Although you’ll need to lock your MySQL tables during the snapshot, it will typically only take a few seconds before you can release the database locks.
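
The lock, snapshot, unlock sequence might be sketched as follows.  The two SQL statements are standard MySQL; everything else is an assumption for illustration: `cursor` is any DB-API style cursor, `take_snapshot` is a hypothetical callable that starts the EBS snapshot, and `RecordingCursor` is a stand-in so the sketch runs without a server.

```python
def snapshot_with_read_lock(cursor, take_snapshot):
    """Hold a global read lock only for as long as the snapshot call takes."""
    # The lock belongs to this connection, so keep it open until UNLOCK TABLES.
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        return take_snapshot()
    finally:
        # Always release the lock, even if the snapshot call fails.
        cursor.execute("UNLOCK TABLES")


class RecordingCursor:
    """Stand-in for a real cursor; it just records the statements it is given."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cursor = RecordingCursor()
result = snapshot_with_read_lock(cursor, lambda: "snapshot-started")
print(cursor.statements)
```

The try/finally is the important design choice: a failed snapshot call should never leave the database locked.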

Now snapshots are great, but they can only be used within the AWS environment, so it also behooves you to be performing additional backups, and moving them offsite either to another cloud provider or to your own internal servers.  For this your choices are logical backups or hotbackups.

mysqldump can perform logical backups for you.  These backups perform SELECT * on every table in your database, so they can take quite some time, and really destroy the warm blocks in your InnoDB buffer cache.  What’s more, rebuilding a database from a dump can take quite some time.  All these factors should be considered before deciding a dump is the best option for you.

xtrabackup is a great open source tool available from Percona.  It can perform hotbackups of all MySQL tables including MyISAM, InnoDB and XtraDB if you use them.  This means the database will be online, not locking tables, with smarter less destructive hits to your buffer cache and database server as a whole.  The hotbackup will build a complete copy of your datadir, so bringing up the server from a backup involves setting the datadir in your my.cnf file and starting.

We wrote a handy guide to using hotbackups to set up replication.

4. Disk I/O

Obviously disk I/O performance is of paramount importance for any database server, including MySQL.  In AWS you do not want to use instance store storage at all.  Be sure your AMI is built on EBS, and further, use a separate EBS mount point for the database datadir.

An even better configuration than the above, though slightly more complex to set up, is a software RAID stripe across a number of EBS volumes.  Linux’s software RAID will create an md0 device file, on top of which you then create a filesystem – use xfs.  Keep in mind that this arrangement will require some care during snapshotting, but can still work well.  The performance gains are well worth it!

5. Network & IPs

When configuring Master & Slave replication, be sure to use the internal or private IPs and internal domain names so as not to incur additional network charges.  The same goes for your webservers which will point to your master database, and one or more slaves for read queries.

6. Availability Zones

Amazon Web Services provides a tremendous leap in options for high availability.  Take advantage of availability zones by putting one or more of your slaves in a separate zone where possible.  Using internal or private IP addresses and names keeps network charges down here too, though traffic between availability zones may still be billed at a small per-GB rate, so check current AWS pricing.

7. Disaster Recovery

EC2 servers are, out of the gate, *NOT* as reliable as traditional servers.  This should send shivers down your spine if you’re trying to treat AWS like a traditional hosted environment.  You shouldn’t.  It should force you to get serious about disaster recovery.  Build bulletproof scripts to spin up your servers from custom-built AMIs and test them.  Finally you’re taking disaster recovery as seriously as you always wanted to.  Take advantage of Availability Zones as well, and test various failure scenarios.

8. Vertical and Horizontal Scaling

Interestingly, vertical scaling can be done quite easily in EC2.  If you start with a 64-bit AMI, you can stop such a server without losing the root EBS mount.  From there you can start a new, larger instance in EC2 using that existing EBS root volume, and voila, you’ve VERTICALLY scaled your server in place.  This is quite a powerful feature at the system administrator’s disposal.  Devops has never been smarter!  You can do the same to scale *DOWN* if you are no longer using all the power you thought you’d need.  Combine this phenomenal AWS feature with a MySQL master-master active/passive configuration, and you can scale vertically with ZERO downtime.  Powerful indeed.

We wrote an EC2 Autoscaling Guide for MySQL that you should review.

Along with vertical scaling, you’ll also want the ability to scale out, that is, add more servers to the mix as required, and scale back when your needs reduce.  Build smarts into your application so you can point SELECT queries to read-only slaves.  Many web applications do the bulk of their work in SELECTs, so being able to scale those horizontally is very powerful and compelling.  By baking this logic into the application you also allow the application to check for slave lag.  If your slave is lagging slightly behind the master you can see stale data, or missing data.  In those cases your application can choose to go to the master to get the freshest data.
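
The routing decision described here can be sketched in a few lines of Python.  This is a minimal illustration under assumptions: the function name and the five second threshold are hypothetical, and the lag figure is assumed to come from something like the Seconds_Behind_Master value that SHOW SLAVE STATUS reports.

```python
def choose_server(is_write, slave_lag_seconds, max_lag_seconds=5):
    """Pick "master" or "slave" for a query.

    slave_lag_seconds is None when the lag is unknown, for example
    because replication has stopped.
    """
    if is_write:
        # Writes always go to the master.
        return "master"
    if slave_lag_seconds is None or slave_lag_seconds > max_lag_seconds:
        # The slave may serve stale or missing rows, so read from the master.
        return "master"
    return "slave"


print(choose_server(is_write=False, slave_lag_seconds=1))
print(choose_server(is_write=False, slave_lag_seconds=30))
print(choose_server(is_write=True, slave_lag_seconds=0))
```

How much lag is tolerable depends on the page being served; a threshold per query type is a reasonable refinement.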

What about RDS?

Wondering whether RDS is right for you? It may be. We wrote a comprehensive guide to evaluating RDS over MySQL.

Managing Security in Amazon Web Services

Security is on everyone’s mind when talking about the cloud.  What are some important considerations?

For the web operations team:

  1. AWS has no perimeter security, should this be an overriding concern?
  2. How do I manage authentication keys?
  3. How do I harden my machine images?

** Original article — Intro to EC2 Cloud Deployments **

Amazon’s security groups can provide strong security if used properly.  Create security groups with specific minimum privileges, and do not expose your sensitive data (i.e. your database) to the internet directly, but only to other security groups.  On the positive side, AWS security groups mean there is no single point to mount an attack against, as there is with a traditional enterprise’s network security.  What’s more, there is no opportunity to accidentally erase network rules since they are defined in groups in AWS.

Authentication keys can be managed in a couple of different ways.  One way is to build them into the AMI.  From there any server spun up from that AMI will be accessible by the owner of those credentials.  Alternatively, a more flexible approach would be to pass in the credentials when you spin up the server, allowing you to dynamically control who has access to that server.

Hardening your AMIs in EC2 is much like hardening any Unix or Linux server.  Disable unneeded user accounts, ssh password authentication, and unnecessary services.  Consider a tool like AppArmor to fence applications in and keep them out of areas where they don’t belong.  This can be an ongoing process that is repeated if the unfortunate happens and you are compromised.

You should also consider:

  • The AWS password recovery mechanism is not as secure as that of a traditional managed hosting provider.  Use a very strong password to lock down your AWS account and monitor its usage.
  • Consider encrypted filesystems for your database mount point.  Pass in the decryption key at server spin-up time.
  • Consider storing particularly sensitive data outside of the cloud and exposing it through an SSL-protected API call.
  • Consider encrypting your backups.  S3 security is not proven.

For CTOs and Operations Managers:

  1. Where is my data physically located?
  2. Should I rely entirely on one provider?
  3. What if my cloud provider does not sufficiently protect the network?

Although you do not know where your data is physically located in S3 and EC2, you have the choice of whether or not to encrypt your data and/or the entire filesystem.  You also control access to the server.  So from a technical standpoint it may not matter whether you control where the server is physically.  Of course laws, standards and compliance rules may dictate otherwise.

You also don’t want to put all your eggs in one basket.  There are all sorts of things that can happen to a provider, from going out of business, to lawsuits that directly or indirectly affect you, to even political pressure as in the WikiLeaks case.  A cloud provider may well choose the easier road and pull the plug rather than deal with any complicated legal entanglements.  For all these reasons you should be keeping regular backups of your data either on in-house servers, or alternatively at a second provider.

As a further insurance option, consider host intrusion detection software.  This will give you additional peace of mind against the potential of your cloud provider not sufficiently protecting their own network.

Additionally consider that:

  • A simple password recovery mechanism in AWS is all that sits between a hacker and your infrastructure.  Choose a very secure password, and monitor its usage.
  • EC2 servers are not nearly as reliable as traditional physical servers.  Test your deployment scripts, and your disaster recovery scenarios again and again.
  • Responding to a compromise will be much easier in the cloud.  Spin up a replacement server, and keep the EBS volume around for later analysis.

As with any new paradigm there is an element of the unknown and unproven which we are understandably concerned about.  Cloud hosted servers and computing can be just as secure if not more secure than traditional managed servers, or servers you can physically touch in-house.

How To Build Highly Scalable Web Applications For The Cloud

Scalability in the cloud depends a lot on application design.  Keep these important points in mind when you are designing your web application and you will scale much more naturally and easily in the cloud.

** Original article — Intro to EC2 Cloud Deployments **

1. Think twice before sharding

  • it increases your infrastructure and application complexity
  • it reduces availability – more servers mean more outages
  • you have to worry about globally unique primary keys

2. Bake read/write database access into the application

  • allows you to check for stale data and fall back to the write master
  • creates higher availability for read-only data
  • gracefully degrade to read-only website functionality if master goes down
  • horizontal scalability melds nicely with cloud infrastructure and IAAS

3. Save application state in the database

  • avoid in-memory locking structures that won’t scale with multiple web application servers
  • consider a database field for managing application locks
  • consider stored procedures for isolating and insulating developers from db particulars
  • a last updated timestamp field can be your friend
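
As a rough sketch of the last three ideas, here is an application lock kept in a database row, with a last updated timestamp used to reclaim locks from servers that have gone away.  It uses Python’s built-in sqlite3 purely as a stand-in for your real database; the table, column and lock names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_locks (name TEXT PRIMARY KEY, owner TEXT, last_updated REAL)"
)
conn.execute("INSERT INTO app_locks VALUES ('nightly-report', NULL, 0)")


def try_acquire(conn, name, owner, now, stale_after=300):
    """Claim the lock if it is free or its holder has not refreshed it recently."""
    cur = conn.execute(
        "UPDATE app_locks SET owner = ?, last_updated = ? "
        "WHERE name = ? AND (owner IS NULL OR last_updated < ?)",
        (owner, now, name, now - stale_after),
    )
    conn.commit()
    # Exactly one row changes only when this caller won the lock.
    return cur.rowcount == 1


print(try_acquire(conn, "nightly-report", "web1", now=1000))  # lock is free
print(try_acquire(conn, "nightly-report", "web2", now=1010))  # web1 still holds it
print(try_acquire(conn, "nightly-report", "web2", now=2000))  # web1's claim is stale
```

Because the claim is a single UPDATE, the database decides the winner, so it behaves the same however many web servers are competing.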

4. Consider Dynamic or Auto-scaling

  • great feature of cloud, spin up new servers to handle load on-demand
  • lean towards being proactive rather than reactive and measure growth and trends
  • watch the procurement process closely lest it come back to bite you

5. Set Up Monitoring and Metrics

  • see trends over time
  • spot application trouble and bottlenecks
  • determine if your tuning efforts are paying off
  • review a traffic spike after the fact

The cloud is not a silver bullet that can automatically scale any web application.  Software design is still a crucial factor.  Bake in these features with the right flexibility and foresight, and you’ll manage your website’s growth patterns with ease.

Metrics Bridge Gap Between IT & Business Units

On the business side we’ve all seen requests for hardware purchases that seem astronomical, or somehow out of proportion to the project at hand.  And on the IT side we’ve been faced with the challenge of selling capital expenditures on technology, as demands grow.

Collecting statistics on real usage of server systems, and then connecting the dots to business metrics, is an excellent way to bridge the gap.  This allows IT to draw a concrete connection between technology investment and reaching business goals.

Metrics and drawing the dotted line in this way also educates folks on both sides of the tracks.  It educates technologists on exactly how technology purchases can be justified, by their direct return to the business.  And it educates finance and business executives on how those hardware purchases directly contribute to business growth.

iHeavy Insights 75 – Recognizing Quality

Finding good vendors who provide professional services may have a lot in common with finding good restaurants.  There may be an abundance of them, while the best ones remain difficult to find.

A long line does not mean quality food

Some restaurants have a long line because they have slow service.  If that’s because you’re getting quality personalized service, great.  But if it’s because of incompetence and general disorganization or because they can’t keep quality help, that’s another story.

Hype and marketing can bring a lot of customers to a new restaurant.  Sometimes it’s a celebrity chef or architect.  If that’s what you’re after then you may be at the right place.  If you’re looking for the best home cooked meal, you may have to keep looking.

Convenience and location can also bring long lines.  A restaurant on the main street or square is usually not the one with the best food.

A better way to find quality

Take a look at how long the restaurant has been around.  A service provider who has been in business for a long time has obviously been successful at acquiring customers, solving their problems, and charging a fee that matches both their needs and those of their customers.

Check the testimonials of your provider.  If their website doesn’t list some, ask for one or two customers that they’ve worked with recently.

Pay attention to service.  If you are a small fish for your vendor, it’s likely that service will be affected.  If, on the other hand, you are one of your vendor’s bigger clients, they’ll likely give much more attention to you.  Notice how regular customers at a restaurant or lounge tend to get the best service.

Book Review:  The Power of Pull by Hagel, Brown & Davison

A lot of really influential people like this book.  Joichi Ito, Richard Florida and Eric Schmidt to name a few.  Enterprises are faced with a bewildering array of challenges from finding good people, to retaining them, and putting them to work in the most creative ways.  This book brings another new and welcome perspective on the future of building and growing successful organizations.

Introduction to EC2 Cloud Deployments

Cloud Computing holds a lot of promise, but there are also a lot of speed bumps in the road along the way.

In this six part series we’re going to cover a lot of ground.  We don’t intend this series to be an overly technical nuts and bolts howto.  Rather we will discuss high level issues and answer questions that come up for CTOs, business managers, and startup CEOs.

Some of the tantalizing issues we’ll address include:

  • How do I make sure my application is built for the cloud with scalability baked into the architecture?
  • I know disk performance is crucial for my database tier.  How do I get the best disk performance with Amazon Web Services & EC2?
  • How do I keep my AWS passwords, keys & certificates secure?
  • Should I be doing offsite backups as well, or are snapshots enough?
  • Cloud providers such as Amazon seem to have poor SLAs (service level agreements).  How do I mitigate this using availability zones & regions?
  • Cloud hosting environments like Amazon’s provide no perimeter security.  How do I use security groups to ensure my setup is robust and bulletproof?
  • Cloud deployments change the entire procurement process, handing a lot of control over to the web operations team.  How do I ensure that finance and ops are working together, and a ceiling budget is set and implemented?
  • Reliability of Amazon EC2 servers is much lower than traditional hosted servers.  Failure is inevitable.  How do we use this fact to our advantage, forcing discipline in the deployment and disaster recovery processes?  How do I make sure my processes are scripted & firedrill tested?
  • Snapshot backups and other data stored in S3 are somewhat less secure than I’d like.  Should I use encryption to protect this data?  When and where should I use encrypted filesystems to protect my more sensitive data?
  • How can I best use availability zones and regions to geographically disperse my data and increase availability?

As we publish each of the individual articles in this series we’ll link them to the titles below.  So check back soon!

  • Building Highly Scalable Web Applications for the Cloud
  • Managing Security in Amazon Web Services
  • MySQL Databases in the Cloud – Best Practices
  • Backup and Recovery in the Cloud – A Checklist
  • Cloud Deployments – Disciplined Infrastructure
  • Cloud Computing Use Cases

    iHeavy Insights 73 – It’s Easy

    In the business of technology consulting, there are times when I’ve heard this statement.  It’s Easy!  Perhaps the single biggest thing I’ve learned through a decade and a half of consulting is, people use this phrase when they are feeling overly confident.

    What do I mean by that?  Well it turns out in psychology there are all sorts of things we communicate with our spoken language & body language.  Some of those things we aren’t even conscious of.  In the case of the statement “It’s easy” your first thought may be about all the intricate details that have yet to be ironed out, all the hiccups that may happen along the way.  Or you may just simply be thinking of Murphy’s Law that always seems to rear its ugly head at the worst time.

    Truth is, when you hear this statement, you may also be inclined to think of it as a statement of fact: the person is saying that they have reviewed all the facts and ascertained that task x is in fact a trivial one.

    Of course one doesn’t want to be the naysayer either, but you can raise concerns while still acknowledging both sides.  My tack is first to repeat what the person said in more detailed language.  By reiterating all of the details, it can sometimes illustrate right there some hidden complexity and weaken the sense of triviality to the task at hand.

    A Software Developer

    A few years back I had a subcontractor developer working on a project.  We went over some details about what changes needed to happen.  A web-based analytical tool needed some additional search functionality.  We went over how that search would index documents in the site.  The developer explained to me “That’s easy.  No problem”.  I was suspicious.

    As development unfolded we hit a bump in the road.  Besides the database indexing, additional XML documents needed to be indexed in order for the search function to work properly.  That added quite a bit of additional complexity because the search solution the developer had envisioned couldn’t deal with that XML data.

    A Business Prospect

    I was recently reviewing a contract with a prospect, and going over items and deliverables.  They explained that for the database portion we’ll just use Amazon RDS, instead of installing MySQL and configuring the server manually.  “This piece will be easy”.  Unfortunately using Amazon’s solution is still not push-button in any event.  These types of oversimplifications are fine if you’re working on a time and materials basis because the complexity of the project will unfold organically, and the process will educate everyone involved.  But if you are trying to do a fixed fee project, these can be a harbinger of trouble later on.

    Conclusion

    When you hear people say “That’s Easy” understand that they are only expressing their confidence, despite their assurances.  If you are not equally confident, you’ll both need to discuss details until you reach a middle point.  If it scares you to hear someone say something is EASY, think of it as a warning flag that you are not both on the same page.  Then remedy the situation with ample communication.

    BOOK REVIEW:  Satyajit Das – Traders, Guns & Money

    Financial Times has very high standards, and with an endorsement by Gillian Tett on the back, you know you are on the right track to some excellent material.  Das’ exposé explores the inside world of derivatives, the so-called WMD of the financial world.  Along the way you’ll enjoy wacky stories of rogue traders, $70,000 meals, LIBOR numbers, delusional thinking, and even more about financial risk.  It illustrates exactly why Warren Buffett said “You only find out who is swimming naked when the tide goes out.”

    Newsletter 72 – Don't Over Engineer

    It’s not a caution you hear very often, but a worthy one now as ever.  Don’t Over Engineer solutions to your problems, features in your product, moving parts in your infrastructure, or solutions for your customers.

    Five Levels of Settings

    Years ago during the dot-com boom we were involved with a project for a financial services firm.  We were building out a subscription-based web service for them.  As part of the requirements gathering, we discussed various components and features that the site should have.  Their vision included various levels of settings and customizations that the user could control to filter and tune presentation.  It all appeared very Rube Goldbergian to us.

    Our suggestion was to drastically reduce the initial settings, simplify the process, shorten development and then find out what real customers wanted.  In this case the client is always right, so we went ahead and built out the complex settings scheme.  Once the product rolled out, however, customers indeed had much different usage patterns than either our development team or the client had expected.  Their demands in turn drove a different direction, but one matching real-world requirements that only the customers ultimately understood.

    Features But No Customers

    A colleague of mine is in the process of building out software for a web service.  Currently they’ve built version one of the product as they envisioned.  They have no customers.  They’re in the process of reviewing the product and deciding on a second round of new features to add.  Let me repeat the earlier part – they have no customers.

    I asked my colleague, why not launch with it as it is, and then see what your customers want.  Their response – we don’t want to launch until we’re ready.

    Well chances are you’ll never be ready because you don’t know where you’re going.  You can’t really know where you’re going until your customers tell you.

    Zero Percent Downtime

    I’ve spoken with many managers and CEOs about infrastructure and architecture over the years.  I remember one instance where I was asking about expectations, and downtime.  The manager’s response – we want zero downtime.  Well, that’s not necessarily practical in the real world.  Well, let’s put everything in place we can to get as close as we can to that.

    The real world is messier.  Data centers have power outages.  In fact the east coast had over a day of power loss.  Averaged out that is one hour per year over the past thirty years.  Adding more components, more software to detect anomalies, more redundancy behind redundancy has its own commensurate costs.  At a certain point you have to err on the side of simplicity, as ultimately the complexity of the system itself contributes to outages and downtime.
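
The averaging above is easy to check with a little arithmetic.  This is a back-of-the-envelope sketch: the 24 hour figure is an assumed lower bound for “over a day”, and the function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24


def average_outage_hours_per_year(total_outage_hours, years):
    """Spread a total amount of downtime evenly across a number of years."""
    return total_outage_hours / years


def availability_percent(outage_hours_per_year):
    """Convert yearly downtime into an uptime percentage."""
    return 100 * (1 - outage_hours_per_year / HOURS_PER_YEAR)


yearly = average_outage_hours_per_year(24, 30)  # just under one hour per year
print(round(yearly, 2))
print(round(availability_percent(yearly), 4))
```

Even a full day of lost power, averaged this way, still leaves uptime well above 99.9 percent, which is why chasing literal zero downtime gets expensive quickly.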

    Evolution of a Company

    iPhone applications are everywhere these days.  A couple of years ago we were gathering feedback and opinion from experts and investors about a concept we had to build a venue and event management platform.  A colleague put us in touch with the CEO of a company building an iPhone platform.  After discussing our concept he explained that they had started out with a very similar concept.  But over the past two years their company had evolved quite a bit from that starting point.  They simply responded to their customers’ requests for features, and grew organically from there.

    Conclusion

    It’s not easy to engineer a perfect widget from the start.  You usually don’t know what customers want, or how they will use your product or service.  Or further you don’t always know what the real world will deliver up.  So it is very easy to over engineer it, and miss the mark.  Better to build less, build small and release early.  Then let your customers or real-world dictates decide your next move.

    Book Review – The Four Agreements – A Practical Guide To Personal Wisdom by Don Miguel Ruiz

    This small little book is full of some very big ideas.

    1. Be Impeccable With Your Word.
    2. Don’t Take Anything Personally.
    3. Don’t Make Assumptions.
    4. Always Do Your Best.

    Whether you find personal insights, or help in your business relationships, this book will surely give you some fresh perspectives.

    iHeavy Insights 71 – Business Continuity Planning

    BCP, or business continuity planning, sometimes also called Business Continuity and Resiliency Planning, is the process of protecting your business against disaster.

    There are a lot of risks associated with running a business, from competitors and hackers, to legal entanglements, to natural disasters or power outages.  Given my subject matter expertise, I’ll focus on the datacenter.

    Your computing resources are hosted at a colocation facility or perhaps in the cloud.  So one part of BCP would be looking at redundancy and high availability.  Do you have two webservers, two database servers, two networks, routers, power systems?  Can you fail over easily?  Have you tested the failover and documented the process?

    Once you have considered the issues within your datacenter, you can look at bigger outages such as power outages, or general datacenter downtime.  Even with solid SLAs, mistakes do happen.  If your business can’t handle this risk, you can look at using multiple datacenters – say on the east and west coast, or perhaps on different continents.  In that case global server load balancing (GSLB) may work for you, allowing load balancing to bring you to the physically closest servers or datacenter to service your request.  In the event one of those datacenters is unavailable, all traffic will be routed to the available one.
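
The routing behaviour described above might look something like this in miniature.  It is only a sketch of the decision logic, not how any particular GSLB product works; the datacenter names, regions and latency figures are invented.

```python
def pick_datacenter(datacenters, client_region):
    """Return the name of the nearest healthy datacenter, or None if all are down."""
    healthy = [dc for dc in datacenters if dc["healthy"]]
    if not healthy:
        return None
    # "Nearest" here means the lowest latency to the client's region.
    return min(healthy, key=lambda dc: dc["latency_ms"][client_region])["name"]


# Hypothetical two-coast setup.
datacenters = [
    {"name": "east", "healthy": True, "latency_ms": {"new-york": 10, "san-francisco": 70}},
    {"name": "west", "healthy": True, "latency_ms": {"new-york": 70, "san-francisco": 10}},
]

print(pick_datacenter(datacenters, "new-york"))  # closest one wins
datacenters[0]["healthy"] = False                # simulate an outage in the east
print(pick_datacenter(datacenters, "new-york"))  # traffic moves to the survivor
```

The health flag is the piece worth testing in a firedrill, since it decides whether failover actually happens.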

    As more and more deployments happen in the cloud, the process of failover and testing becomes more crucial.  And that’s a good thing.  With many clients I suggest doing a firedrill to run through all the disaster recovery steps.  This makes sure all your backups are complete, but more importantly that the process is well documented.  When an outage happens it’s not the time you want to put all the pieces of the puzzle back together, and figure out that one piece is missing.

    Cloud deployments, though, push you to automate processes, create images of server configurations and generally script the process of spinning up new servers.  That’s because virtualization requires it.  That pressure will only serve to improve recoverability and thus support business process continuity further.

    Book Review: Richard Florida – The Rise of the Creative Class

    Florida’s idea of creative is wide ranging.  The folks he includes are everyone whose job involves creating new ideas, new technologies, or new content.  Well written and exhaustive, he provides an insightful look at the horizon of our changing job market and economy.
