5 core pieces of the Amazon Cloud puzzle to get your project off the ground

amazon cloud automation

One of the most common engagements I do is working with firms in and around the NYC startup sector. I evaluate AWS infrastructures & applications built in the Amazon cloud.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve seen some patterns in customers usage of Amazon. Below is a laundry list of the most important ones.

On our products & pricing page you can see more detail including how we perform a performance review and a sample executive summary.

1. Use automation

When you first start using Amazon Web Services to host your application, you like many before you may think of it like you’re old school hosting. Setup a machine, configure it, get your code running. The traditional model of systems administration. It’s fine for a single server, but if you’re managing a more complex deploy with continuous integration, or want to be resilient to regular server failures you need to do more.

Enter the various automation tools on offer. The simplest of the three is Elastic Beanstalk. If you’re using a very standard stack & don’t need a lot of customizations, this may well work for you.

With more complex deployments you’ll likely want to look at Opsworks Sounds familiar? That’s because it *is* Opscode Chef. Everything you can do with Chef & all the templates out there will work with Amazon’s offering. Let AWS manage your templates & make sure your servers are in the right state, just like hosted chef.

If you want to get down to the assembly language layer of infrastructure in Amazon, you’ll eventually be dealing with CloudFormation. This is JSON code which defines everything, from a server with an attached EBS volume, to a VPC with security rules, IAM users & everything inbetween. It is ultimately what these other services utilize under the hood.

Also: Is Amazon too big to fail?

2. Use Advisor & Alerts

Amazon has a few cool tools to help you manage your infrastructure better. One is called Trusted Advisor . This helps you by looking at your aws usage for best practices. Cost, performance, security & high availability are the big focal points.

In order to make best use of alerts, you’ll want to do a few things. First define an auto scaling group. Even if you don’t want to use autoscaling, putting your instance into one allows amazon to do the monitoring you’ll want.

Next you’ll want to analyze your CloudWatch metrics for usage patterns. Notice a spike, could be a job that is running, or it could be a seasonal traffic spike that you need to manage. Once you have some ideas here, you can set alerts around normal & problematic usage patterns.

Related: Are we fast approaching cloud-mageddon?

3. Use Multi-factor at Login

If you haven’t already done so, you’ll want to enable multi-factor authentication on your AWS account. This provides much more security than a password (even a sufficiently long one) can ever do. You can use Google authenticator to generate the mfa codes and associated it with your smartphone.

While you’re at it, you’ll want to create at least one alternate IAM account so you’re not logging in through the root AWS account. This adds a layer of security to your infrastructure. Consider creating an account for your command line tools to spinup components in the cloud.

You can also use MFA for your command line SSH logins. This is also recommended & not terribly hard to setup.

Read: When hosting data on Amazon turns bloodsport

4. Use virtual networking

Amazon offers Virtual Private Cloud which allows you to create virtual networks within the Amazon cloud. Set your own ip address range, create route tables, gateways, subnets & control security settings.

There is another interesting offering called VPC peering. Previously, if you wanted to route between two VPCs or across the internet to your office network, you’d have to run a box within your VPC to do the networking. This became a single point of failure, and also had to be administered.

With VPC peering, Amazon can do this at the virtualization layer, without extra cost, without single point of failure & without overhead. You can even use VPC peering to network between two AWS accounts. Cool stuff!

Also: Are SQL databases dead?

5. Size instances & I/O

I worked with one startup that had been founded in 2010. They had initially built their infrastructure on AWS so they chose instances based on what was available at the time. Those were m1.large & m1.xlarge. A smart choice at the time, but oh how things evolve in the amazon world.

Now those instance types are “previous generation”. Newer instances offer SSD, more CPU & better I/O for roughly the same price. If you’re in this position, be sure to evaluate upgrading your instances.

If you’re on Amazon RDS, you may not be able to get to the newer instance sizes until you upgrade your database. Does upgrading MySQL involve much more downtime on Amazon RDS? In my experience it surely does.

Along with instance sizes, you’ll also want to evaluate disk I/O options. By default instances in amazon being multi-tenant, use disk as a shared resource. So they’ll see it go up & down dramatically. This can kill database performance & can be painful. There are expensive solutions. Consider looking at provisioned IOPS and additional SSD storage.

Also: Is the difference between dev & ops a four-letter word?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Some thoughts on 12 factor apps

12 factor app

I was talking with a colleague recently about an upcoming project.

In the summary of technologies, he listed 12 factor, microservices, containers, orchestration, CI and nodejs. All familiar to everyone out there, right?

Join 28,000 others and follow Sean Hull on twitter @hullsean.

Actually it was the first I had heard of 12 factor, so I did a bit of reading.

1. How to treat your data resources

12 factor recommends that backing services be treated like attached resources. Databases are loosely coupled to the applications, making it easier to replace a badly behaving database, or connect multiple ones.

Also: Is the difference between dev & ops a four-letter word?

2. Stay loosely coupled

In 12 Fractured Apps Kelsey Hightower adds that this loose coupling can be taken a step further. Applications shouldn’t even assume the database is available. Why not fall back to some useful state, even when the database isn’t online. Great idea!

Related: Is Amazon too big to fail?

3. Degrade gracefully

A read-only or browse-only mode is another example of this. Allow your application to have multiple decoupled database resources, some that are read-only. The application behaves intelligently based on what’s available. I’ve advocated those before in Why Dropbox didn’t have to fail.

Read: When hosting data on Amazon turns bloodsport

Conclusion

The twelve-factor app appears to be an excellent guideline on building cleaning applications that are easier to deploy and manage. I’ll be delving into it more in future posts, so check back!

Read: Are we fast approaching cloud-mageddon?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Does FedRAMP formalize what good devops already do?

fedramp-logo

amazon-govcloud

Amazon’s GovCloud provides a specialized region within Amazon’s global footprint of datacenters. These are hosted within the United States, and provide a subset of the full Amazon cloud functionality.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

However, hosting within GovCloud is not the whole story. Beyond this, you’ll want to implement FedRAMP compliant procedures & policies.

Are these policies new? As a seasoned systems administrator of Unix & Linux networks, you’ll likely find these very familiar best practices. What they do however, is formalize those into a set of procedures for testing compliance. And that’s a good thing.

1. Use a bastion box

A bastion box is a single point of entry for all your SSH activity. Instead of allowing SSH access to any of your servers from *anywhere* on the internet, you limit it to one box. This box is hardened with multi-factor authentication for security, only opens port 22, monitors & logs access, and funnels movement to all your other boxes. Thus you gain a virtual perimeter that you’re already familiar with in more traditional firewall setups.

Also: Ward Cunningham explains the high cost of technical debt (video)

2. Monitor & scan for vulnerabilities

Monitoring, scanning & logging are all key facilities for security management. Regular patch management of each of your servers, is essential to protect from newly discovered vulnerabilities. FedRAMP also requires scanning by tools such as Nessus or Retina.

Also centralizing your authorization, access & error logs allows easy monitoring & alerting of threats & improper access attempts.

Related: Do managers underestimate the cost of operations?

3. Policy of least privilege

The policy of least privilege is an old friend in computing & managing unix systems. It means first to eliminate all privileges (default to none) and then grant only those a user requires to do his or her work.

In Amazon it means not using the root account for provisioning infrastructure, it means a clear separation of dev, test & production environments. It limits who can access production & especially make changes there. It limits who can see sensitive data.

As well, you’ll use Access Control Lists (ACL’s) and security groups to control which servers can reach which other servers, whom on the internet can touch specific servers & ports, and so forth. These are the Amazon Cloud equivalent of perimeter security you may be familiar with in more traditional firewalls.

Read: When hosting data on Amazon turns bloodsport

4. Encrypt your data

If you want to be truly secure, you’ll want to encrypt your data at rest. You can do this by using encrypted filesystems in Linux. That way data is in a digital envelope, even on disk. Only when data is read into memory is it unencrypted. This provides additional insurance, because your EBS snapshots, backups & so forth are all hidden from prying eyes.

Also: Why dropbox didn’t have to fail

5. Conclusion

Amazon’s GovCloud provides access to a subset of their cloud offerings including EC2 their elastic compute cloud virtual servers, EBS the elastic block storage their own storage area network, S3 for file storage, VPC, IAM, RDS, Elasticache & Redshift.

FedRAMP formalizes what good systems administrators do already. Secure systems, deliver reliability & high availability & protect from unauthorized entry.

Also: Is Amazon too big to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why Dropbox didn’t have to fail

dropbox outage dec 2015

Dropbox is currently experiencing a *major* outage. See the dropbox status page to get an update.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve written about outages a lot before. Are these types of major failures avoidable? Can we build better, with redundant services so everything doesn’t fall over at once?

Here’s my take.

1. Browse only mode

The first thing Dropbox can do to be more resilient is to build a browsing only mode into the application. Often we hear about this option for performaing maintenance without downtime. But it’s even more important during a real outage like Dropbox is currently experiencing.

Not if but *when* it happens, you don’t have control over how long it lasts. So browsing only can provide you with real insurance.

For a site like Dropbox it would mean that the entire website is still up and operating. Customers can browse their documents, view listings of files & download those files. However they would not be able to *add* or change files during the outage. Thus only a very small segment of customers is interrupted, and it becomes a much smaller PR problem to manage.

Facebook has experienced outages of service. People hardly notice because they’ll often only see a message when they try to comment on someone’s wall post, send a message or upload a photo. The site is still operating, but not allowing changes. That’s what a browsing only mode affords you.

A browsing only mode can make a big difference, keeping most of the site up even when transactions or publish are blocked.

Drupal is an open source platform that powers big publishing sites like Adweek, hollywoodreporter.com & economist.com. It supports a browsing only mode out of the box. An outage like this one would only stop editors from publishing new stories temporarily. It would be a huge win to sites that get 50 to 100 million with-an-m visitors per month.

Also: Is Amazon too big to fail

2. Redundancy

There are lots of components to a web infrastructure. Two big ones are webservers & databases. Turns out Dropbox could make both tiers redundant. How do we do it?

On the database side, you can take advantage of Amazon’s RDS & either read-replicas or Multi-AZ. Each have different service characteristics, so you’ll need to evaluate your app to figure out what works best.

You can also host MySQL, Percona or Mariadb direclty on Amazon instances yourself & then use replication.


Using redundant components like placing webservers and databases in multiple regions, Dropbox could avoid a major outage like they’re experiencing this weekend.

Wondering about MySQL versus RDS? Here are some uses cases.

Now that you’re using multiple zones & regions for your database the hard work is completed. Webservers can be hosted in different regions easily, and don’t require complicated replication to do it.

Related: Are SQL databases dead?

3. Feature flags

On/off switches are something we’re all familiar with. We have them in the fuse box in our house or apartment. And you’ll also find a bigger larger shutoff in the basement.

Individual on/off switches are valuable because they allow us to disable inessential features. We can build them into heavier parts of a website, allowing us to shutdown features in an emergency. Host components in multiple availability zones for extra piece of mind.

Read: 5 Things toxic to scalability

4. Simian armies

Netflix has taken a more progressive & proactive approach to outages. They introduce their own! Yes that’s right they bake redundancy & automation right into all of their infrastructure, then have a loose canon piece of software called Chaos Monkey that periodically kills servers. Did I hear that right? Yep it actually nocks components offline, to actively test the system for resiliency.

Take a look at the Netflix blog for details on intentional load & stress testing.

Also: When hosting data on Amazon turns bloodsport

5. Multiple clouds

If all these suggestions aren’t enough for you, taking it further you could do what George Reese of enstratus recommends and use multiple cloud providers. Not being dependant on one company could help in many situations, not just the ones described here.

Basic Amazon EC2 best practices require building redundancy into your infrastructure. Virtual servers & on-demand components are even less reliable than commodity hardware we’re familiar with. Because of that, we must use Amazon’s automation to insure us against expected failure.

Also: Why I like Etsy’s site performance report

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Five things I learned at NY CTO Summit 2015

cto summit 2015

Enjoyed attending the New York CTO Summit yesterday with a notable list of presenters. Looking forward to the slides. Links to follow.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

1. Product is a reflection of teams

Conway’s law was repeated by three different presenters!

Also: Is the difference between dev & ops a four-letter word?

2. Agile government

Government efficiency can be tackled with startup efficiencies!

Related: Is AWS enabling Angellist to boil the VC business?

3. Learning culture

There are lots of benefits to building a learning culture, not least is making the business succeed.

Read: Do managers underestimate operational cost?

4. Don’t report to finance

Let’s remember how important which teams report to whom is. It can make or break your technology initiatives.

Also: Is Amazon too big to fail?

5. Course correction & size

The cost of changing course gets bigger as your org does.

Also: Airbnb didn’t have to fail

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters