Category Archives: Cloud Computing

Is Zero downtime even possible on RDS?

amazon rds mysql

Join 29,000 others and follow Sean Hull on twitter @hullsean.

Oh RDS, you offer such promise, but damn it if the devil isn’t always buried in the details.

Diving into a recent project, I’ve been looking at upgrading RDS MySQL. Major MySQL upgrades can be a bit messy. Since the entire engine is rebuilt, query performance can change, syntax can break, and triggers & stored procedures can run into problems.

That’s not even getting into it with storage engines. Still have some tables on MyISAM? Beware.

My conclusion: if you want zero downtime, or even close to it, you’re going to want to roll your own MySQL on EC2 instances.

Read: Why high availability is so very hard to deliver

1. How long did that upgrade take?

The first thing I set out to do was upgrade a test instance. One of the first questions my client asked was, how long did that take? “Ummm… you know I can’t tell you clearly.” For an engineer this is the worst feeling. We live & die by finding answers. When your hands are tied, you really can’t say what’s going on behind the curtain.

While I’m sitting at the web dashboard, I feel like I’m trying to pick up a needle with thick leather gloves. Nothing to grasp here. At one point the dashboard was still spinning, and I was curious what was happening. I logged out and back in again, and found the entire upgrade step had already completed. I think that added five minutes to perceived downtime.

Sure I can look at the RDS instance log, and tell you when RDS logged various events. But when did the machine go offline, and when did it return for users? That’s a harder question to answer.

Without command line access, I can’t monitor the process carefully and minimize downtime. I can only give you a broad brush idea of what’s happening.
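
If you want at least the timestamps RDS itself reports, the command line tools can pull the event log. A quick sketch, assuming the unified aws CLI is installed and using a made-up instance identifier:

$ aws rds describe-events \
    --source-type db-instance \
    --source-identifier my-production-db \
    --duration 180

That tells you when RDS logged each step, but it still won’t tell you exactly when users lost and regained access.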

Also: RDS or MySQL 10 use cases

2. Did we need to restart the instance?

RDS insists on rebooting the instance itself every time it performs a “Modify” operation. Often restarting the MySQL process would have been enough! This is like hunting squirrels with a bazooka. Definitely overkill.
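
For the record, the same Modify can be driven from the command line rather than the dashboard. A hedged sketch with a made-up identifier and instance class, assuming the unified aws CLI:

$ aws rds modify-db-instance \
    --db-instance-identifier my-production-db \
    --db-instance-class db.m3.large \
    --apply-immediately

Either way, RDS alone decides whether that change triggers a full instance reboot.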

As a DBA, it’s frustrating to watch the minutes spin by while your hands are tied. At some point I’m starting to wonder… Why am I even here?

Related: Howto automate MySQL slow query analysis with Amazon RDS

3. EBS Snapshots are blunt instruments

RDS provides some protection against a failed upgrade. The process will automatically snapshot your volume before it begins. That’s great. If I spend

See also: Is Amazon RDS hard to manage

4. Even promoting a read-replica sucks

I also evaluated using a read-replica. Here you spin up a slave first, then upgrade *THAT* box to 5.6 ahead of your master. While your master is still sending data to the slave, your downtime would in theory be minimal: put the master in read-only mode, wait a few seconds for the slave to catch up, switch the application to point at the slave, then promote it!

All of that would work well from the command line, since your instances don’t restart. But with RDS, it takes over seven long minutes!
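
On self-managed EC2 boxes that cutover is only a handful of commands. A rough sketch, assuming classic binlog replication and made-up hostnames; RDS simply doesn’t give you this level of control:

# stop writes on the old master
$ mysql -h master.example.com -e "SET GLOBAL read_only = ON;"

# wait until the replica reports Seconds_Behind_Master: 0
$ mysql -h replica.example.com -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master

# then promote the replica and repoint the application at it
$ mysql -h replica.example.com -e "STOP SLAVE; RESET SLAVE ALL;"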

Read this: 5 Reasons to move data to Amazon Redshift

5. RDS can upgrade to MySQL 5.6!

MySQL 5.6 introduced a new timestamp datatype which allows for fractional seconds. Great feature, but it means the on-disk data structures are different. Uh oh!

If you’re replicating from MySQL 5.5 to 5.6, it will break, because rows flow out of the master in the old format and collide with the 5.6-formatted datafiles! Not good.

The fix requires running ALTER commands on the master beforehand, which in turn locks up tables. So promoting a read-replica turns out to be a non-starter for 5.5 to 5.6. It doesn’t really save much.
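
To give a flavor of what that means, here’s the kind of blocking ALTER involved; the database and table names are invented, and on a large table this will block writes for the duration:

# a null ALTER forces a full table rebuild; it locks out writes while it runs
$ mysql mydb -e "ALTER TABLE orders ENGINE=InnoDB;"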

All of this devil in the details stuff is terrible when you don’t have command line access.

Read: Are SQL databases dead?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 reasons to move data to Amazon Redshift

redshift amazon

Join 28,000 others and follow Sean Hull on twitter @hullsean.

Amazon is rolling out new database offerings at a rapid clip. I wondered, did MySQL and MongoDB just have a beautiful baby called Aurora? That was last month.

Another that’s been out for a while is the data warehouse offering called RedShift.

1. Old-fashioned SQL interface

Ok, yes Redshift can support petabyte databases and this in itself is staggering to consider. But just after you digest that little fact, you’ll probably discover that it’s SQL compatible.

This is a godsend. It means the platform can leverage all of the analytical tools already in the marketplace, ones your organization is already familiar with. Many, such as Looker and Chartio, are already certified on RedShift.

Also: Are SQL Databases Dead?

2. Lots of ways to load data

After you build your first cluster, the first question on your mind will be, “How do I get my data into RedShift?” Fortunately there are lots of ways.

Stage in S3 & use COPY

Everyone using AWS is already familiar with S3, and RedShift uses this as a staging ground. Create a bucket for your csv or other datafiles, then parallel load them with the special COPY command.

For those coming from the Oracle world, this is like SQL*Loader, which doesn’t go through the SQL engine, but directly loads data as blocks into datafiles. Very fast, very parallel.
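
Here’s a rough sketch of a parallel load, assuming a psql client pointed at your cluster; the hostname, table, bucket and credentials are all placeholders:

$ psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics <<'SQL'
COPY events
FROM 's3://my-staging-bucket/events/'
CREDENTIALS 'aws_access_key_id=YOUR-KEY;aws_secret_access_key=YOUR-SECRET'
DELIMITER ',' GZIP;
SQL

Pointing COPY at a prefix rather than a single file lets RedShift split the work across its slices.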

AWS Data Pipeline

Some folks are leveraging the AWS Data Pipeline to copy MySQL tables straight into RedShift.

FlyData for Amazon MySQL RDS

I’m in the process of evaluating FlyData Sync. This is a service-based solution which connects to your Amazon RDS for MySQL instance, captures binlog data much like Oracle’s GoldenGate does, and ships it across to RedShift for you.

If you have constantly changing data, this may be ideal, as you’re not limited to the one-shot data load implied by the basic COPY command approach.

Read: What is ETL and why is it important?

3. Very fast or very big nodes

There are essentially two types of compute nodes for RedShift. DW2 are dense compute nodes running on SSD. As we all know, these are very fast solid state drives, and bring huge disk I/O benefits. Perfect for a data warehouse. They cost about $1.50/TB per hour.

The second type is DW1, the so-called dense storage nodes. These can scale up to a petabyte of storage. They run on traditional spinning disks, so they aren’t SSD fast. They’re also around $0.50/TB per hour. So a lot cheaper.

Amazon recommends that if you have less than 1TB of data, you go with Dense Compute or DW2. That makes sense, as you get SSD speed right out of the gate.
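
If you want to kick the tires, standing up a small dense compute cluster is a single CLI call. A sketch with invented identifiers & credentials, assuming the unified aws CLI:

$ aws redshift create-cluster \
    --cluster-identifier reporting-test \
    --node-type dw2.large \
    --number-of-nodes 2 \
    --master-username admin \
    --master-user-password 'Pick-A-Real-1'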

Related: What is a data warehouse?

4. distkeys, sortkeys & compression

The nice thing about NoSQL databases is you don’t have to jump through all the hoops trying to shard your data with a traditional database like MySQL. That’s because distribution is supported right out of the box.

When you create tables you’ll choose a distkey. You can only have one per table, so be sure it’s the column you join on most often. A timestamp field, or user_id, perhaps would make sense. You’ll choose a diststyle as well. ALL means keep an entire copy of the table on each node, KEY means distribute rows based on the distkey, and EVEN, the default, simply spreads rows evenly across the nodes.

RedShift also has sortkeys. You can have more than one of these on a table, and they work something like b-tree indexes: they determine the physical order rows are stored in, which speeds up range scans and sorting.
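
Putting the distkey & sortkey pieces together, a hypothetical table definition might look like this; the columns are purely illustrative:

$ psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics <<'SQL'
CREATE TABLE page_views (
    user_id    BIGINT        NOT NULL,   -- the column we join on most often
    viewed_at  TIMESTAMP     NOT NULL,
    url        VARCHAR(2048)
)
DISTSTYLE KEY
DISTKEY (user_id)
SORTKEY (viewed_at);
SQL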

Check: 8 Questions to ask an AWS expert

5. Compression, defragmentation & constraints

Being a columnar database, Redshift also supports column encodings, or compression. LZO is often used for varchar columns; bytedict and runlength are also common. One way to determine these is to load a sample of data, say 100,000 rows. From there you can run ANALYZE COMPRESSION on the table, and RedShift will make recommendations.

A much easier way, however, is to use the COPY command with COMPUPDATE ON. During the initial load, this tells RedShift to analyze data as it is loaded and set the column compression types. This is by far the most streamlined approach.
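
Both approaches boil down to a statement or two. A sketch using the same placeholder table & bucket as above:

$ psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics <<'SQL'
-- let the initial load sample the data and pick column encodings
COPY events
FROM 's3://my-staging-bucket/events/'
CREDENTIALS 'aws_access_key_id=YOUR-KEY;aws_secret_access_key=YOUR-SECRET'
COMPUPDATE ON;

-- or, after loading a sample, ask RedShift for its recommendations
ANALYZE COMPRESSION events;
SQL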

RedShift also supports table constraints, however they don’t restrict data. Sounds useless, right? Except they do inform the optimizer. What does that mean? If you know you have a primary key id column, tell RedShift about it. No, it won’t enforce the constraint, but since your source database does, you can pass that information along to RedShift to help it optimize queries.

You’ll also find some of the defragmentation options from Oracle & MySQL present in Redshift. There is VACUUM, which reorganizes the table & resets the high water mark while it remains online for updates. And then there is a deep copy, which is more thorough and faster, but locks the table, taking it offline for updates while it runs.
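
The online option is a single command, usually paired with ANALYZE to refresh planner statistics; again a sketch against the placeholder table:

$ psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics <<'SQL'
VACUUM events;    -- reclaim space & resort rows while the table stays online
ANALYZE events;   -- refresh statistics for the query planner
SQL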

Related: Is Oracle killing MySQL?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Did MySQL & Mongo have a beautiful baby called Aurora?

amazon aurora slide

Amazon recently announced RDS Aurora, a new addition to their database-as-a-service offerings.

Here’s Mark Callaghan’s take on what’s happening under the hood and thoughts from Fusheng Han.

Amazon is uniquely positioned with RDS to take on offerings like Clustrix. So it’s definitely worth reading Dave Anselmi’s take on Aurora.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

1. Big availability gains

One of the big improvements that Aurora seems to offer is around availability. You can replicate with Aurora’s own replication, or alternatively with MySQL binlog-type replication. They also duplicate data twice in each of three availability zones, for six copies in total.

All this is done over their SSD storage network which means it’ll be very fast indeed.

Read: What’s best RDS or MySQL? 10 Use Cases

2. SSD means 5x faster

The Amazon RDS Aurora FAQ claims it’ll be 5x faster than equivalent hardware, by making use of its proprietary SSD storage network. This will be a welcome feature to anyone already running on MySQL or RDS for MySQL.

Also: Is MySQL talent in short supply?

3. Failover automation

Unplanned failover takes just a few minutes. Here customers will really be benefiting from the automation that Amazon has built around this process. Existing customers can do all of this of course, but typically require operations teams to anticipate & script the necessary steps.

Related: Will Oracle Kill MySQL?

4. Incremental backups & recovery

The new Aurora supports incremental backups & point-in-time recovery. This is traditionally a fairly manual process. In my experience MySQL customers are either unaware of the feature, or not interested in using it due to complexity. Restoring last night’s backup avoids the hassle.

I predict automation around this will be a big win for customers.

Check out: Are SQL Databases dead?

5. Warm restarts

RDS Aurora separates the buffer cache from the MySQL process. Amazon has probably accomplished this by some recoding of the stock MySQL kernel. What that means is this cache can survive a restart. Your database will then start with a warm cache, avoiding any service brownout.

I would expect this is a feature that looks great on paper, but one customers will rarely benefit from.

See also: The Myth of Five Nines – Is high availability overrated?

Unanswered questions

The FAQ says point-in-time recovery up to the last five minutes. What happens to data in those five minutes?

Presumably Aurora’s duplication & read-replicas provide this additional protection.

If Amazon implemented Aurora as a new storage engine, doesn’t that mean new code?

As with anything your mileage may vary, but Innodb has been in the wild for many years. It is widely deployed, and thus tested in a variety of environments. Aurora may be a very new experiment.

Will real-world customers actually see 500% speedup?

Again, your mileage may vary. Let’s wait & see!

Related: 5 Things toxic to scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How do you get prepared for Infrastructure Engineering jobs?

datacenter-rack

I just started contributing to a great site called Career Dean. It offers a forum where students and new college graduates can learn from those with established careers in industry.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

A recent question…

Application infrastructure is not something we learned in my college, and it’s definitely not something I will learn anytime soon in my current job (I work as a mobile developer for a mid-sized startup). I also think it’s not something you can just goof around with in your own computer. 

Do companies prepare their software engineers when hiring infrastructure engineers, or do they all expect you to know your skills and tools? 

Also: Is automation killing old-school operations

For example, my guess is that Facebook has a huge infrastructure team making the site usable and fast for as many people as possible. Where can you learn those skills, or get prepared for that type of job? Do you think it is possible to self-learn those skills?

Here’s my take on some of this. Since the invention of Linux, experimenting with infrastructure has been within reach. In the present day there are some even better reasons to experiment & teach yourself about this important aspect of devops & backend server management.

Early Linux circa 1992

Before Linux (in the 80’s, we’re talking) it was a lot harder. Into the 90’s Linux came on the scene and you could cobble together parts (video card, motherboard, memory, IDE or SCSI bus & disks) and build a 486 tower. You could then start building Linux, because of course everything had to be hand rolled (compiled by hand & usually debugged)!

Also: Is five nines availability a myth in todays datacenters?

Present day virtualization

Fast forward 20 years, and it’s an incredible time to be messing with infrastructure. Why? Because virtualization means you can do it all right on your laptop.

Also: Are SQL databases dead?

What to learn

Start learning Vagrant. It automates the provisioning of virtual machines on your own desktop. You can boot those Linux boxes to your heart’s content, network between them, hack them, run services on them, and build your skills.
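
Getting a first box running locally only takes a few commands. A minimal sketch, assuming VirtualBox is installed; the box name & URL are just examples:

$ vagrant box add precise64 http://files.vagrantup.com/precise64.box
$ vagrant init precise64
$ vagrant up
$ vagrant ssh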

I’d also recommend digging into Docker. It is the lightning-fast younger brother of virtualization.
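
A single command gets you an isolated container to poke around in; the image tag is just an example:

$ docker run -i -t ubuntu:14.04 /bin/bash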

Also: Is Oracle trying to kill MySQL?

Fundamentals

You really need those fundamentals. Build some 1.x Linux kernels and see if you can get ‘em running. That’ll teach you some hacking & troubleshooting skills. Find forums to get answers.

Also take a look at CoreOS. It has some really cool stuff around infrastructure management & automation.

Also: Is the art of resistance important to devops success?

After all of that, you might want to play around with Puppet or Chef. Learn how to set up continuous integration, Jenkins, etc.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Today’s startups: assemble at your own risk

devops divide

I was talking with Todd Hoff recently over at High Scalability about a trend I’ve seen of late.

ME: I really liked this post by Zoli Kahan from Clay.io.  AWS, cloudflare, docker, haproxy, mysql, mongo, memcache, ansible.  They use just about every technology being talked about these days.  

Todd: Yah, that’s why I asked to republish it. I thought it was a good updated sampler stack.

ME: That said I defy you to find a team that actually *KNOWS* all those technologies.  

Todd: Agreed. Systems are a lot of assembly these days, which doesn’t mean we know how to build the parts being assembled.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

The article I was referring to was: How Clay.io Built their 10x Arch Using AWS, Docker, HAProxy & Lots More

1. Dizzying array of technologies in use

I’ve been working with startups since the mid-nineties. In those days most application stacks consisted of a PHP application running on Apache, with Oracle on the backend. Both webserver & db ran on Sun Solaris. Hardware was reliable. Most attention was focused on fitting everything in memory, and monitoring the servers for swapping, and disk failure. Boy have those days changed.

I see dozens of startups each year, so I see a lot of very cutting-edge environments. Here’s a peek at what I’m seeing these days:

Database: MySQL, Postgres & Oracle, to Mongodb, Cassandra & Couchbase

Caching: Memcache or Redis

Search: Solr

Webservers: Apache, Nginx, Lighttpd

Load balancers: haproxy, Zen

Languages: PHP, Python & Ruby

Publishing: Drupal, WordPress, Joomla

Continuous Integration: Jenkins

Metrics: Cacti, collectd, NewRelic

Monitoring: Nagios, Ganglia, Munin, OpenNMS

Automation: Ansible, Chef, Puppet, Docker & Vagrant

Logs: Logstash

DDOS & CDN: Cloudflare, Ultradns

Whew… That’s a long list!! And we’re not even considering the API’s that many applications are now building on.

Also: Are generalists better at scaling the web?

2. Shortcuts abound

Early on, startups don’t have enough working capital to hire a huge engineering team. That means everyone is stretched. With a list of technologies that is ever growing, something’s gotta give.

They may cut corners by handing the web & technical operations work to a developer who has some skills in that area. But I continue to ask… Does a four-letter word divide dev & ops?

Read: Which tech do startups use most?

3. More things to break & master

Ownership of a piece of the software stack, such as a database, means mastery of…

o features in current versions
o bugs of current versions
o vulnerabilities of various versions
o troubleshooting
o best practices
o backup & reliability

For example, at a lot of shops where I dig into the database, I find low-hanging fruit, such as misconfigured startup settings, poor table layout or index usage.

I see similar things when a networking expert pores over the haproxy configuration, or runs ping tests across the network. Most of these components are set up with fairly vanilla configurations, leaving loose ends and frayed threads.

Check out: Why I can’t raise the bar at every firm

4. Many startups carrying technical debt

I’ve seen a growing reliance on ORMs, which is worrying. Build your foundation on a crutch, and it gets very hard to eliminate down the line. Here are Ward Cunningham’s warnings on technical debt.

Related: Are SQL Databases Dead?

5. Long term support & viability

At one five-year-old firm, I was brought in to address scalability problems. I met with the team and was asked to provide a comprehensive review. The first thing I found was that all the original engineers had long since left, so the code was new to everyone. As I dug in, I found multiple versions of Apache along with Nginx on some other servers. Their stack was built on a patchwork of Python, Ruby & PHP. Digging in further, we found a complicated web of dependencies for digital assets, mounted across servers & unmonitored.

Lack of standards is common in environments like these. Without an operational or architectural lead, developers are left to make decisions with what is directly in front of them. Though a decision of what language to use may appear simple at the outset, it carries long term consequences.

Will that language or technology be supported in five years? Will the community survive? Will your firm be able to hire people with that skill set? Will engineers still be excited about it?

See also: Is high availability overrated? Is five nines a myth?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is automation killing old-school operations?

puppet logo

Join 27,000 others and follow Sean Hull on twitter @hullsean.

I was shocked to find this article on ReadWrite: The Truth About DevOps: IT Isn’t Dead; It’s not even Dying. Wait a second, do people really think this?

Truth is I have heard whispers of this before. I was at a meetup recently where the speaker claimed “With more automation you can eliminate ops. You can then spend more on devs”. To an audience of mostly developers & startup founders, I can imagine the appeal.

1. Does less ops mean more devs?

If you’re listening to a platform service sales person or a developer who needs more resources to get his or her job done, no one would be surprised to hear this. If we can automate away managing the stack, we’ll be able to clear the way for the real work that needs to be done!

This is a very seductive perspective. But it may be akin to taking on technical debt, ignoring the complexity of operations and the perspective that can inform a longer view.

chef logo

Puppet Labs’ Luke Kanies says “Become uniquely valuable. Become great at something the market finds useful.” I couldn’t agree more.

Read: Are SQL Databases Dead?

2. What happens when developers leave?

I would argue that ops have a longer view of product lifecycle. I for one have been brought in to many projects after the first round of developers have left, and teams are trying to support that software five years after the first version was built.

That sort of long term view, of how to refresh performance, and revitalize code is a unique one. It isn’t the “building the future” mindset, the sexy products, and disruptive first mover “we’re changing the world” mentality.

It’s a more stodgy & conservative one. The mindset is of reliability, simplicity, and long term support.

Also: How to hire a developer that doesn’t suck

3. What’s your mandate?

From what I’ve seen, devs & ops are divided by a four letter word.

That word I believe is “risk”. Devs have a mandate from the business to build features & directly answer to customer requests today. Ops have a mandate to reliability, working against change and thinking in terms of making all that change manageable.

Different mandates mean different perspectives.

Related: What is Devops & why is it important?

4. Can infrastructure live as code?

Puppet, along with infrastructure automation & configuration management tools like Chef, offers the promise of fully automated infrastructure. But the truth is much, much more complex. As typical technology stacks expand from load balancer, webserver & database to multiple databases, caching server, search server, puppet masters, package repositories, monitoring & metrics collection & jump boxes, we’re all reaching a saturation point.

Yes, automation helps with that saturation, but ultimately you need people with wide-ranging skills to manage the complex web of dependencies when things fail.

And fail they will.

Check out: Why are MySQL DBA’s and ops so hard to find?

5. ORM’s and architecture

If you aren’t familiar with them, ORMs are a rather dry-sounding name for a component that is regularly overlooked. It’s middleware sitting between the application & database, and it drastically simplifies developers’ lives. It helps them write better code and get on with the work of delivering to the business. It’s no wonder they’re popular.

But as Ward Cunningham eloquently explains, they are surely technical debt that eventually must get paid. Indeed.

There is broad agreement among professional DBAs: each query should be written, tuned, and deployed, just like any other bit of code. Handing that process to a library is doomed to failure. Yet ORMs are still evolving, and the dream lives on.

And all that because devs & ops have completely different perspectives. We need both of them to run modern internet applications. Let’s not forget that, folks. :)

Read this: Do managers and CTO’s underestimate operational costs?

Want more? Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to deploy on Amazon EC2 with Vagrant

vagrant logo

Join 16,000 others and follow Sean Hull on twitter @hullsean.

Why do I want Vagrant?

Vagrant is a really powerful tool for managing virtual machines. If you’re a developer it can make it push-button simple to setup a dev box on your laptop. It manages the images, and uses configuration files to describe specifics of your machines.

In the amazon environment, you can deploy machines just as easily as on your desktop. That’s pretty exciting for those of us already familiar with Vagrant. With that I’ve provided a simple 7 step howto for doing just that!

Also: Are SQL Databases Dead?

1. Use the Mac OS X installer

Fetch your download file here:

Vagrant Installer Downloads

Run the installer. It should do the right thing!

Also: Why Oracle Won’t Kill MySQL

2. Install the vagrant-aws plugin


$ vagrant plugin install vagrant-aws

Also: Bulletproofing MySQL Replication with Checksums

3. Fetch a vagrant box image

Box images vary depending on your “provider” which is vagrant-speak for the environment you’re running in. For aws, they’re some simple json files that tell Vagrant how to work in that environment.

The creator of the plugin has provided a dummy box. Let’s fetch it:


$ vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box

This command is straight out of the readme. What does it do? Take a look:


$ cd /var/root/.vagrant.d/boxes/dummy/aws

$ cat metadata.json
{
"provider": "aws"
}

There’s also the info.json file which looks like this:


$ cat info.json
{"url":"https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box","downloaded_at":"2014-01-14 17:42:33 UTC"}

There’s not a whole lot going on here. If you’re deploying VirtualBox VMs with Vagrant, you’d see a VMware4 disk image. But with Amazon, it stores its own AMIs on S3, so Vagrant simply fetches and runs them for you.

Related: Intro to EC2 Cloud Deployments

4. Configure Vagrantfile

Create a directory to hold your vagrant metadata. This would be the name of your machine:


$ cd /var/root
$ mkdir testaws
$ cd testaws
$ vagrant init

Edit the file as follows:


Vagrant.configure("2") do |config|
  # config.vm.box = "sean"

  config.vm.provider :aws do |aws, override|
    # AWS credentials (use your own)
    aws.access_key_id     = "AAAAIIIIYYYY4444AAAA"
    aws.secret_access_key = "c344441LooLLU322223526IabcdeQL12E34At3mm"

    # name of the SSH keypair already uploaded to EC2
    aws.keypair_name = "iheavy"

    # the AMI determines the OS of the instance
    aws.ami = "ami-7747d01e"

    # how Vagrant will SSH into the instance
    override.ssh.username = "ubuntu"
    override.ssh.private_key_path = "/var/root/iheavy_aws/pk-XHHHHHMMMAABPEDEFGHOAOJH1QBH5324.pem"
  end
end

If you’re familiar with the Amazon command line tools, you’ve probably already set up environment variables like these. Otherwise they may not be familiar to you, so let’s go through them:

Your access_key_id and secret_access_key are two pieces of information Amazon uses to identify your instances and bill you. They’re unique to your environment, so keep them close to the vest. Here’s how you create or find them on your aws dashboard.

The keypair_name is your personal SSH key. You may have one on your laptop which you use to access other servers. If so, you can upload it to the Amazon environment. If not, you can also use the dashboard to create your own. Whenever you spin up a server, you can instruct Amazon to drop that key on the box in the right place. Then you’ll have secure command line access to the box, without a password. Great for automation!
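
If you don’t have a keypair in your account yet, the CLI can generate one; a sketch assuming the unified aws CLI, with the key name matching the Vagrantfile above:

$ aws ec2 create-key-pair --key-name iheavy \
    --query 'KeyMaterial' --output text > iheavy.pem
$ chmod 600 iheavy.pem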

Next is your AMI. This is an important choice, as it determines the OS of the machine you’ll spin up, and many other characteristics. You can go with an Amazon Linux AMI, but I quite like the Alestic ones from Eric Hammond. Trusted & reliable.

Looking for an Ubuntu AMI? Try this AMI locator tool.

Check this: 8 Best Practices for Deploying MySQL on AWS

5. Startup the box

Starting an instance once you’ve configured your Vagrantfile is pretty straightforward.


$ vagrant up --provider=aws

Related: How to autoscale MySQL on Amazon EC2

6. Verify in the Amazon dashboard

Jump over to your amazon dashboard with this link. If you’re logged in already, that will take you to your EC2 instances. You should see a new one, based on the parameters in your Vagrantfile.

Read: Why devops talent is in short supply

7. Login to your Amazon instance

Last but not least, you’ll want to login. Note I’m explicitly specifying my SSH key here. Your path may vary…


$ ssh -i ./iheavy.pem ubuntu@ec2-50-220-50-40.compute-1.amazonaws.com

Also: 5 more things deadly to scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why cloud computing is the spotify-cation of hosting

dvd collection

Join 16,000 others and follow Sean Hull on twitter @hullsean.

1. Music collections of old

Way way back in the 70’s I remember riding around in a VW beetle. Maybe I’d be driving with my dad or my uncle. Everybody seemed to own a VW! What everybody also had was a huge collection of 8-track tapes in a big box. You’d dig through the box and find what you wanted to play, then pop in the tape. It was exciting, because before 8-tracks you only had records, and you couldn’t play those in the car!

But even record collections were new in the 60’s. Before that, most music was consumed live or on the radio.

Also: Why a killer title can make or break your content efforts

2. When books left the library

A similar trend followed for books and reading. Although newspapers have been sold by subscription for a lot longer, books were mostly consumed in libraries. But the consumer itch to build collections eventually built Barnes & Noble into a powerhouse brick and mortar store.

Internet disruption of that business model came too. Enter Amazon’s Kindle. Although you theoretically *buy* digital books, if you read the fine print you’ll see you actually rent them in perpetuity. In fact there have been cases where Amazon has reached into devices and removed previously purchased media.

Related: Why AirBNB didn’t have to fail

3. Managing collections (even stolen ones) is hard work

When you download music or movies, either from iTunes or god forbid grabbing it off of Bittorrent networks, you need to put it somewhere. You’ll store it on your laptop hard drive or, if your collection is large enough, on some shared storage system at home. And you’ll also probably never back it up.

The thing is, hard drives themselves have a life of about two to four years. As an operations guy I manage data every day. Backups are a big part of that process, so that when the media fails, you won’t lose the collection of movies & music you built lovingly over so many years.

Sadly most people learn the hard way. And when you learn this lesson you probably think, where did all that time go? What did I even *have* in my collection?

Also: Are SQL Databases Dead?

4. Why music & movie theft was just a blip on the historical radar

I’m also a bit of a Doctor Who fan. Since it’s a rather obscure British TV show (or was), I spent some time buying many of the old episodes on DVD. Or I *did* rather, until Netflix started offering the whole classic collection on subscription. They did this with Star Trek too. Now I have no reason to fish through my shelves for a DVD. Why would I?

As users become more accustomed to the subscription model, they’re less likely to want to build a whole collection of media. This goes for books, music & videos alike. Who would bother downloading off of Bittorrent, managing a home collection, and all that trouble when you can just subscribe? Easy. No mess!

Read: Why Oracle Won’t Kill MySQL

5. Subscriptions, subscriptions everywhere!

Whether you managed a datacenter of physical servers in-house, or bought servers managed by a hosting company, before the subscription model you had to worry about moving parts. You had to worry about failing hard drives, memory & all the rest.

Then along comes Amazon Web Services and its EC2 servers, bringing the subscription model to hosting too. This raises the bar on the biggest failing component, hard drives, by putting all data on EBS, their virtual storage network. All of this reduces the drudgery for a lot of organizations.

What Spotify is doing with music, Netflix is doing for movies & TV shows, and Kindle is doing for books. That same trend has brought great disruption to internet & server hosting. Startups and consumers win big in this game.

Can you think of any businesses where a subscription model might work? They may be ripe for disruption by a new startup.

Check out: Why your startup is failing at Devops

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

The Most Important AWS Feature for Performance and Scalability

Join 6100 others and follow Sean Hull on twitter @hullsean.

The Foundation of Speed

All servers use disk to store files. Operating system libraries, webserver & application code, and most importantly databases all use disk constantly.

So disk speed is crucial to server speed.


Disk speed is crucial for MySQL databases. It has been a real challenge in multi-tenant environments like Amazon’s EBS. The provisioned IOPS feature addresses this head on, allowing customers to lock in great MySQL database performance!

Also check out: Five more things Deadly to Scalability.

Disk Performance on Multi-tenant EBS

Amazon’s EBS, or Elastic Block Store, is a virtualized network storage solution. You can think of it as RAIDed disks, accessed & provisioned over a high speed network.

Related: Why Generalists are Better at Scaling the Web

Since Amazon is a multi-tenant environment, other customers are using that same network, and hitting those same disks. So if your neighbors are seeing a lot of traffic to disk, your web application can slow down. Not good!

What is Provisioned IOPS

We’ll agree that it’s one of the worst branded features ever, but you should know about it and use it, especially for your MySQL databases.

Provisioned means that you lock in performance in advance, and IOPS stands for I/O operations per second. Think of it as google juice for your cloud database servers!
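
For example, here’s roughly how you’d carve out an EBS volume with locked-in IOPS using the aws CLI; the size, zone & IOPS figures are placeholders:

$ aws ec2 create-volume \
    --availability-zone us-east-1a \
    --size 200 \
    --volume-type io1 \
    --iops 2000

RDS exposes the same knob through an IOPS setting when you create or modify a database instance.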

Also: How I increased my blog pagerank to 5

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Here’s a sample

Business Agility at AWS re:Invent

Also find Sean Hull’s ramblings on twitter @hullsean.

Although I couldn’t be in Vegas to attend re:Invent, there is so much online it’s almost better than being at the conference. From an ongoing live stream of keynotes and sessions, to an archived collection on Youtube.

The big wins

You may have heard of all the great things that Amazon or cloud computing can do, but I thought Andy Jassy summarized these nicely in these six points.

1. Replace capex with opex
2. Lower total cost of ownership
3. No guessing about capacity
4. Encourage agility & innovation
5. Differentiation
6. Global from the start

Redshift

By far the biggest announcement at the show is Amazon’s new Redshift product. It is a fully managed data warehouse solution that scales to petabytes in its cloud. Currently two business intelligence tools are supported, namely Jaspersoft and MicroStrategy.

In 2003 Amazon was a 5 billion dollar company. Today AWS adds the same infrastructure capacity every day to its availability zones!

Reduced prices by 25% for S3

As a lot of folks know, Amazon has always been about cheaper prices. That model has been disruptive in the book selling industry, and in a huge way in the infrastructure and datacenter industry. As more customers sign up, economies of scale mean they can offer the same hardware & services for lower prices.

With that, they’re announcing a whopping 25% price cut for S3. To me this speaks to their continuing push to dominate the market by driving prices downward.

Amazon’s Channel on Youtube

If you weren’t able to attend the conference, or want to recap some highlights you might have missed, they have put up a great AWS Channel on Youtube.

Some of the speakers include Sharon Chiarella, VP Mechanical Turk; Glenn Hazard, CEO, Xceedium; Todd Barr, CMO of Alfresco; Bright Fulton, Operations for Swipely; Colin Percival, FreeBSD Developer; Ted Dunning, Chief Application Architect of MapR Technologies; James Broberg, CTO & Founder of MetaCDN; Mitchell Garnaat, Sr. Engineer; David Etue, Vice President, SafeNet; and Mike Culver, Sr. Consultant, to name just a few.

Read this far? Grab our Scalable Startups for more tips and special content.