Category Archives: Cloud Computing

Is AWS the patient that needs constant medication?

storm coming

I was just reading High Scalability about Why Swiftype moved off Amazon EC2 to Softlayer and saw great wins!

We’ve all heard by now how awesome the cloud is. Spinup infrastructure instantly. Just add water! No up front costs! Autoscale to meet seasonal application demands!

But less well known or even understood by most engineering teams are the seasonal weather patterns of the cloud environment itself!

Join 28,000 others and follow Sean Hull on twitter @hullsean.

Sure there are firms like Netflix, who have turned the fickle cloud into one of virtues & reliability. But most of the firms I work with everyday, have moved to Amazon as though it’s regular bare-metal. And encountered some real problems in the process.

1. Everyday hardware outages

Many of the firms I’ve seen hosted on AWS don’t realize the servers fail so often. Amazon actually choosing cheap commodity components as a cost-savings measure. The assumption is, resilience should be built into your infrastructure using devops practices & automation tools like Chef & Puppet.

The sad reality is most firms provision the usual way, through the dashboard, with no safety net.

Also: Is your cloud speeding for a scalability cliff

2. Ongoing network problems

Network latency is a big problem on Amazon. And it will affect you more. One reason is you’re most likely sitting on EBS as your storage. EBS? That’s elastic block storage, it’s Amazon’s NAS solution. Your little cheapo instance has to cross the network to get to storage. That *WILL* affect your performance.

If you’re not already doing so, please start using their most important & easily missed performance feature – provisioned IOPS.

Related: The chaos theory of cloud scalability

3. Hard to be as resilient as netflix

We’ve by now heard of firms such as Netflix building their Chaos Monkey to actively knock out servers, in effort to test their ability to self-healing infrastructure.

From what I’m seeing at startups, most have a bit of devops in place, a bit of automation, such as autoscaling around the webservers. But little in terms of cross-region deployments. What’s more their database tier is protected only by multi-az or just a read-replica or two. These are fine for what they are, but will require real intervention when (not if) the server fails.

I recommend building a browse-only mode for your application, to eliminate downtime in these cases.

Read: 8 questions to ask an aws expert

4. Provisioning isn’t your only problem

But the cloud gives me instant infrastructure. I can spinup servers & configure components through an API! Yes this is a major benefit of the cloud, compared to 1-2 hours in traditional environments like Softlayer or Rackspace. But you can also compare that with an outage every couple of years! Amazon’s hardware may fail a couple times a hear, more if you’re unlucky.

Meanwhile you’re going to deal with season weather problems *INSIDE* your datacenter. Think of these as swarms of customers invading your servers, like a DDOS attack, but self-inflicted.

Amazon is like a weak immune system attacking itself all the time, requiring constant medication to keep the host alive!

Also: 5 Things toxic to scalability

5. RDS is going to bite you

Besides all these other problems, I’m seeing more customers build their applications on the managed database solution MySQL RDS. I’ve found RDS terribly hard to manage. It introduces downtime at every turn, where standard MySQL would incur none.

In my experience Upgrading RDS is like a shit-storm that will not end!

Also: Does open source enable the cloud?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

The chaos theory of cloud scalability

The.Rohit - Flickr

The.Rohit – Flickr

Reading Benedict Evans weekly newsletter, you’re bound to bump into something new & useful. His newsletter covers Mobile, but that also means it touches on a lot of other areas of tech, innovation & startups.

This week he pointed me to A Weissman’s The Chaos Theory of Startups. He argues a VC’s job is to help a startup identify the right framework. It’s about finding the signal in the noise.

Join 29,000 others and follow Sean Hull on twitter @hullsean.

I think you can carry this idea over to technical operations today. There are a few key maxims I follow to keep you on the scalable track.

1. Degrade gracefully

You’ve heard it before, but have you done anything about it?

Build a read-only or browse only mode into your application. Do it now. You will thank me. When your database goes down unexpectedly (with RDS this might happen sooner than you think), you want to be able to use your lovely read-only slave database. Browse only mode, forces developers to add read-only support in most application functions, keeping the site up and running, without a full visible and ugly outage.

Which brings me to point two, be sure to have copies of your production database. Real live, only read-only copies. In Amazon speak, this is a read-replica, in MySQL this is a slave database. Most startups I see these days have this, but if you’re one of the ones dragging your feet, do this now.

Also: Is the difference between dev & ops a four-letter word?

2. Monitor & measure

Amazon’s cloudwatch is fine for what it is, and so is New Relic. But employing a dedicated tool just for monitoring, such a Nagios & cacti can give you much more granular intelligence about what’s happening with your infrastructure. Nagios gives you the monitoring & alerting, Cacti gives you the history. It’s like a BI reporting tool for infrastructure.

Related: Is automation killing old-school operations?

3. Keep components simple

Keep it simple stupid? Don’t adopt new technologies, languages, or versions of software, without first vetting them. Ask questions:

o Is there an existing piece of software or service that can overlap this new one, killing two birds with one stone?
o Does everybody know this new technology?
o Does this choice of technology solve any other broad problems we have?
o Is there a large community around the project?
o Are there a lot of engineers with experience in this chosen technology?

Tellingly, many startups don’t have an operations person to start with. In those, the danger is developers choose new solutions, with no push back.

I asked… Does a four letter word divide dev and ops?

Read: Do managers underestimate operational cost?

4. Don’t force database abstraction

Object Relational Modelers, aka database middleware, are great in theory. We want a library that takes database & SQL drudgery away from developers. Why reinvent the wheel?

The trouble is database independent code doesn’t work, and never has. ORMs are painfully inefficient, selecting all columns, or repeatedly reading rows from tables. This causes serious traffic jams inside your database.

They come in various guises, Cake PHP, Active Record for Ruby, Hibernate for Java, SQL Alchemy for Python.

Also: Is the difference between dev & ops a four-letter word?

5. Be asynchronous

This means don’t make your application code wait. Make asynchronous calls to APIs & check back later, use software queues so traffic backups don’t clog your components & communication.

Avoid any type of two-phase or multi-phase commit. These are common in clustered databases, forcing a serialization point so nodes can agree on what data looks like. However they’re toxic to scalability. Look for technologies that use eventually consistent algorithms.

Also: Is the difference between dev & ops a four-letter word?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is upgrading RDS like a shit-storm that will not end?

aws logo

Join 29,000 others and follow Sean Hull on twitter @hullsean.

Can RDS worsen an outage ?? That’s another way to think about this question. In my experience, it very clearly increases outages, by tying one or both hands behind your back. Believe me when I say, that is terribly frustrating when you’re putting out fires!

1. Changing Parameters

An everyday occurance, is the need to change database parameters. Want to enable a login, great no problem. Except in RDS it becomes a problem! Ok, you’re thinking, why is that?

In regular MySQL, you login with the shell & issue SET GLOBAL parameter = value; Nice, easy, straightforward. No servers restarting, no nonsense. If the parameter requires a reboot, MySQL will tell you.

In RDS, the process is waay more complex. First you edit a parameter group. You can copy an existing one, or change the one you’re using. If that parameter group applies to many servers, be careful!

Ok, what next? Now you APPLY that new parameter group. You can do so immediately, or during the next maintenance window. Here’s the tricky part. Is Amazon going to restart my instance? That’s something your boss or manager will surely ask you. Well you might think it would only do so if the parameter in question required it. But I tried to enable the general log recently and Amazon tells me the status of “pending-reboot”. This change shouldn’t require that! I’m sitting there scared Amazon might suddenly decide to reboot a production server for no reason!

This is where you feel you’ve lost control. You can dig through docs all you want, but you can’t ever say for sure if a managed service will behave predictably. There’s already more layers of software between you and your relational database. Not what you want.

Also: Did MySQL & Mongo have a beautiful baby called Aurora?

2. How much longer?

Another question you’ll ask yourself is, how long will this maintenance take? With MySQL at the command line, you can run through test after test & time the process. When you go to perform tasks offhours, you already have a clear picture.

With RDS, things can’t be predicted. Servers are restarted when they needn’t be. Rebuilds take forever, and you have no progress bar. EBS performance has a hiccup and your snapshot time doubles. The troubles go on and on.

Related: Is automation killing old-school operations?

3. Why did Amazon just force an OS upgrade?

Here’s another surprise I ran into. Again we have a managed solution, so Amazon must take opportunities when they can. But you pay for it in unpredictability.

Going to perform a MySQL 5.1 to 5.5 upgrade, and I’d run through test after test in advance. Timed the process to about 45 minutes. Then went to do it in production. Amazon decided to throw in the OS upgrade too, adding 40 minutes of surprise time. What’s worse? No progress bar on that either.

Upgrades are nerve wracking enough, without this kind of stuff scaring the daylights out of you.

Read: Do managers underestimate operational cost?

4. What’s happening on my server?

All of the questions about progress are opaque on RDS because you lack command line. You can’t watch processes, disk I/O or any of the granular stuff. In my surgery analogy below, it’s as though you can’t touch the patient, find their pulse or guage if their skin is cold, clammy or pale.

Also: Is the difference between dev & ops a four-letter word?

5. Surgery with blunt instruments

At the end of the day, RDS feels like surgery with blunt instruments. If command line were your scalpel, windows & GUI tools may be your remote video surgery. And worse still, RDS would be like doing surgery on the Opportunity Mars rover, after it’s landed & stuck in a valley. Everything is delayed, it’s hard to tell what’s going on, and the worst environment to work in when you have an emergency with your database.

If you have any operations experience, deploy your own MySQL on an EC2 instance. You’ll thank yourself later.

Also: Is zero downtime even possible on RDS?

Upside to RDS

Is there any upside? Why do people use it? Push-button replication. Check. Push-button multi-az, check. Those are great if you have no DBA. Automated backups so you don’t shoot yourself in the foot, check.

I guess there is *something* to love.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is Zero downtime even possible on RDS?

amazon rds mysql

Join 29,000 others and follow Sean Hull on twitter @hullsean.

Oh RDS, you offer such promise, but damn it if the devil isn’t always buried in the details.

Diving into a recent project, I’ve been looking at upgrading RDS MySQL. Major MySQL upgrades can be a bit messy. Since the entire engine is rebuilt, queries performance can change, syntax can break, and surely triggers & stored procedures can have problems.

That’s not even getting into it with storage engines. Still have some tables on MyISAM? Beware.

The conclusion I would make is if you want zero downtime, or even nearly zero, you’re going to want to roll your own MySQL on EC2 instances.

Read: Why high availability is so very hard to deliver

1. How long did that upgrade take?

First thing I set out to do was upgrade a test instance. One of the first questions my client asked, how long did that take? “Ummm… you know I can’t tell you clearly.” For an engineer this is the worst feeling. We live & die by finding answers. When your hands are tied, you really can’t say what’s going on behind the curtain.

While I’m sitting at the web dashboard, I feel like I’m trying to pickup a needle with thick leather gloves. Nothing to grasp here. At one point the dashboard was still spinning, and I was curious what was happening. I logged out and back in again, and found the entire upgrade step had already completed. I think that added five minutes to perceived downtime.

Sure I can look at the RDS instance log, and tell you when RDS logged various events. But when did the machine go offline, and when did it return for users? That’s a harder question to answer.

Without command line, I can’t monitor the process carefully, and minimize downtime. I can only give you a broad brush idea of what’s happening.

Also: RDS or MySQL 10 use cases

2. Did we need to restart the instance?

RDS insists on rebooting the instance itself, everytime it performs a “Modify” operations. Often restarting the MySQL process would have been enough! This is like hunting squirrels with a bazooka. Definitely overkill.

As a DBA, it’s frustrating to watch the minutes spin by while your hands are tied. At some point I’m starting to wonder… Why am I even here?

Related: Howto automate MySQL slow query analysis with Amazon RDS

3. EBS Snapshots are blunt instruments

RDS provides some protection against a failed upgrade. The process will automatically snapshot your volume before it begins. That’s great. If I spend

See also: Is Amazon RDS hard to manage

4. Even promoting a read-replica sucks

I also evaluated using a read-replica. Here you spinup a slave first. You then upgrade *THAT* box to 5.6 ahead of your master. While your master is still sending data to the slave, your downtime would in theory be very minimal. Put master in read-only mode, wait few seconds for slave to catchup and switch application to point to slave, then promote it!

All that would work well with command line, as your instances don’t restart. But with RDS, it takes over seven long minutes!

Read this: 5 Reasons to move data to Amazon Redshift

5. RDS can upgrade to MySQL 5.6!

MySQL 5.6 introduced a new timestamp datatype which allows for fractional seconds. Great feature, but it means the on-disk datastructures are different. Uh oh!

If you’re doing replication with MySQL 5.5 to 5.6 it will break because the rows will flow out in one size, and break the 5.6 formatted datafiles! Not good.

The solution requires running ALTER commands run on the master beforehand. That in turn locks up tables. So it turns out promoting a read-replica is a non-starter for 5.5 to 5.6. Doesn’t really save much.

All of this devil in the details stuff is terrible when you don’t have command line access.

Read: Are SQL databases dead?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 reasons to move data to Amazon Redshift

redshift amazon

Join 28,000 others and follow Sean Hull on twitter @hullsean.

Amazon is rolling out new database offerings at a rapid clip. I wondered Did MySQL and Mongodb just have a beautiful baby called Aurora? That was last month.

Another that’s been out for a while is the data warehouse offering called RedShift.

1. old-fashioned SQL interface

Ok, yes Redshift can support petabyte databases and this in itself is staggering to consider. But just after you digest that little fact, you’ll probably discover that it’s SQL compatible.

This is a godsend. It means the platform can leverage all of the analytical tools already in the marketplace, ones that your organization is already familiar. Many are already certified on RedShift such as Looker and Chart IO.

Also: Are SQL Databases Dead?

2. Lots of ways to load data

After you build your first cluster, the first question on your mind will be, “How do I get my data into RedShift?” Fortunately there are lots of ways.

Stage in S3 & use COPY

Everyone using AWS is already familiar with S3, and RedShift uses this as a staging ground. Create a bucket for your csv or other datafiles, then parallel load them with the special COPY command.

For those coming from the Oracle world, this is like SQL*Loader, which doesn’t go through the SQL engine, but directly loads data as blocks into datafiles. Very fast, very parallel.

AWS Data Pipeline

Some folks are leveraging the AWS Data Pipeline to copy MySQL tables straight into RedShift.

FlyData for Amazon MySQL RDS

I’m in the process of evaluating FlyData sync. This is a service based solution which connects to your Amazon RDS for MySQL instance, capturing binload data much like Oracle’s GoldenGate does, and ships it across to RedShift for you.

If you have constantly changing data, this may be ideal as you don’t have a one-shot dataload option, implied by the basic COPY command solution.

Read: What is ETL and why is it important?

3. Very fast or very big nodes

There are essentially two types of compute nodes for RedShift, DW2 are dense compute running on SSD. As we all know, these are very fast solid state memory drives, and bring huge disk I/O benefits. Perfect for a data warehouse. They cost about $1.50/Tb per hour.

The second type is DW1 or so-called dense storage nodes. These can scale up to a petabyte of storage. They are running on traditional storage disks so aren’t SSD fast. They’re also around $0.50/Tb per year. So a lot cheaper.

Amazon recommends if you’re less than 1Tb of data, go with Dense Compute or DW2. That makes sense as you get SSD speed right out of the gates.

Related: What is a data warehouse?

4. distkeys, sortkeys & compression

The nice thing about NoSQL databases is you don’t have to jump through all the hoops trying to shard your data with a traditional database like MySQL. That’s because distribution is supported right out of the box.

When you create tables you’ll choose a distkey. You can only have one on a table, so be sure it’s the column you join on most often. A timestamp field, or user_id, perhaps would make sense. You’ll choose diststyle as well. ALL means keep an entire copy of the table on each node, key means organize based on this distkey, and EVEN the default means let Amazon try to figure it out.

RedShift also has sortkeys. You can have more than one of these on your table, and they are something like b-tree indexes. They order values, and speed up sorting.

Check: 8 Questions to ask an AWS expert

5. Compression, defragmentation & constraints

Being a columnar database, Redshift also supports collumn encodings or compression. There is LZO often used for varchar columns, bytedict and runlength are also common. One way to determine these is to load a sample of data, say 100,000 rows. From there you can ANALYZE COMPRESSION on the table, and RedShift will make recommendations.

A much easier way however, is to use the COPY command with COMPUPDATE ON. During the initial load, this will tell RedShift to analyze data as it is loaded and set the column compression types. This is by far the most streamlined approach.

RedShift also supports Table constraints, however they don’t restrict data. Sounds useless right? Execept they do inform the optimizer. What’s that mean? If you know you have a primary key id column, tell RedShift about it. No it won’t enforce that but since your source database is, you’re able to pass along that information to RedShift for optimizing queries.

You’ll also find some of the defragmentation options from Oracle & MySQL present in Redshift. There is vacuum which reorganizes the table & resets the high water mark, while it is still online for updates. And then there is Deep Copy which is more thorough, but takes the table offline to do it. It’s faster, but locks the table.
o deep copy

Related: Is Oracle killing MySQL?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Did MySQL & Mongo have a beautiful baby called Aurora?

amazon aurora slide

Amazon recently announced RDS Aurora a new addition to their database as a service offerings.

Here’s Mark Callaghan’s take on what’s happening under the hood and thoughts from Fusheng Han.

Amazon is uniquely positioned with RDS to take on offerings like Clustrix. So it’s definitely worth reading Dave Anselmi’s take on Aurora.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

1. Big availability gains

One of the big improvements that Aurora seems to offer is around availability. You can replicate with aurora, or alternatively with MySQL binlog type replication as well. They’re also duplicating data two times in three different availability zones for six copies of data.

All this is done over their SSD storage network which means it’ll be very fast indeed.

Read: What’s best RDS or MySQL? 10 Use Cases

2. SSD means 5x faster

The Amazon RDS Aurora FAQ claims it’ll be 5x faster than equivalent hardware, but making use of it’s proprietary SSD storage network. This will be a welcome feature to anyone already running on MySQL or MySQL for RDS.

Also: Is MySQL talent in short supply?

3. Failover automation

Unplanned failover takes just a few minutes. Here customers will really be benefiting from the automation that Amazon has built around this process. Existing customers can do all of this of course, but typically require operations teams to anticipate & script the necessary steps.

Related: Will Oracle Kill MySQL?

4. Incremental backups & recovery

The new Aurora supports incremental backups & point-in-time recovery. This is traditionally a fairly manual process. In my experience MySQL customers are either unaware of the feature, or not interested in using it due to complexity. Restore last nights backup and we avoid the hassle.

I predict automation around this will be a big win for customers.

Check out: Are SQL Databases dead?

5. Warm restarts

RDS Aurora separates the buffer cache from the MySQL process. Amazon has probably accomplished this by some recoding of the stock MySQL kernel. What that means is this cache can survive a restart. Your database will then start with a warm cache, avoiding any service brownout.

I would expect this is a feature that looks great on paper, but one customers will rarely benefit from.

See also: The Myth of Five Nines – Is high availability overrated?

Unanswered questions

The FAQ says point-in-time recovery up to the last five minutes. What happens to data in those five minutes?

Presumably aurora duplication & read-replicas provide this additional protection.

If Amazon implemented Aurora as a new storage engine, doesn’t that mean new code?

As with anything your mileage may vary, but Innodb has been in the wild for many years. It is widely deployed, and thus tested in a variety of environments. Aurora may be a very new experiment.

Will real-world customers actually see 500% speedup?

Again your mileage may vary. Lets wait & see!

Related: 5 Things toxic to scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How do you get prepared for Infrastructure Engineering jobs?

datacenter-rack

I just started contributing to a great site called Career Dean. It offers a forum where students and new college graduates can learn from those with established careers in industry.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

A recent question…

Application infrastructure is not something we learned in my college, and it’s definitely not something I will learn anytime soon in my current job (I work as a mobile developer for a mid-sized startup). I also think it’s not something you can just goof around with in your own computer. 

Do companies prepare their software engineers when hiring infrastructure engineers, or do they all expect you to know your skills and tools? 

Also: Is automation killing old-school operations

For example, My guess is that Facebook has a huge infrastructure team making the site usable and fast for as many people as possible. Where can you learn that skills, or get prepared for that time of job? Do you think it is possible to self-learn those skills?

Here’s my take on some of this. Since the invention of Linux, experimenting with infrastructure has been within reach. In the present day there are some even better reasons to experiment & teach yourself about this important aspect of devops & backend server management.

Early Linux circa 1992

Before Linux (in the 80′s we’re talking about) it was a lot harder. Into the 90′s Linux came on the scene and you could cobble together parts, video, motherboard, memory, ide or scsi bus & disks & build a 486 tower. You could then start building linux. I mean because of course everything had to be hand rolled (compiled by hand & debugged usually)!

Also: Is five nines availability a myth in todays datacenters?

Present day virtualization

Fast forward 20 years, and it’s an incredible time to be messing with infrastructure. Why? Because virtualization means you can do it all right on your laptop.

Also: Are SQL databases dead?

What to learn

Start learning Vagrant. It automates the provisioning of virtual machines on your own desktop. You can boot those linux boxes to your hearts content, network between them, hack them, run services on them, build your skills.

I’d also recommend digging into docker. It is the lightening fast younger brother to Virtualization.

Also: Is Oracle trying to kill MySQL?

Fundamentals

You really need those fundamentals. Build some 1.x Linux kernels and see if you can get ‘em running. That’ll teach you some hacking & troubleshooting skills. Find forums to get answers.

Also take a look at CoreOS. It has some really cool stuff around infrastructure management & automation.

Also: Is the art of resistance important to devops success?

After all of that, you might want to play around with puppet or chef. Learn how to setup continuous integration, jenkins etc.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Today’s startups: assemble at your own risk

devops divide

I was talking with Todd Hoff recently over at High Scalability about a trend I’ve seen of late.

ME: I really liked this post by Zoli Kahan from Clay.io.  AWS, cloudflare, docker, haproxy, mysql, mongo, memcache, ansible.  They use just about every technology being talked about these days.  

Todd: Yah, that’s why I asked to republish it. I thought it was a good updated sampler stack.

ME: That said I defy you to find a team that actually *KNOWS* all those technologies.  

Todd: Agreed. Systems are a lot of assembly these days, which doesn’t mean we know how to build the parts being assembled.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

The article I was referring to was: How Clay.io Built their 10x Arch Using AWS, Docker, HAProx & Lots More

1. Dizzying array of technologies in use

I’ve been working with startups since the mid-nineties. In those days most application stacks consisted of a PHP application running on Apache, with Oracle on the backend. Both webserver & db ran on Sun Solaris. Hardware was reliable. Most attention was focused on fitting everything in memory, and monitoring the servers for swapping, and disk failure. Boy have those days changed.

I see dozens of startups each year, so I see a lot of very cutting edge environments. Here’s a peak at what I’m seeing these days:

Database: MySQL, Postgres & Oracle, to Mongodb, Cassandra & Couchbase

Caching: Memcache or Redis

Search: Solr

Webservers: Apache, Nginx, Lighttpd

Load balancers: haproxy, Zen

Languages: PHP, Python & Ruby

Publishing: Drupal, WordPress, Joomla

Continuous Integration: Jenkins

Metrics: Cacti, collectd, NewRelic

Monitoring: Nagios, Ganglia, Munin, OpenNMS

Automation: Ancible, Chef, Puppet, Docker & Vagrant

Logs: Logstash

DDOS & CDN: Cloudflare, Ultradns

Whew… That’s a long list!! And we’re not even considering the API’s that many applications are now building on.

Also: Are generalists better at scaling the web?

2. Shortcuts abound

Startups early on, don’t have enough working capital to hire a huge engineering team. So that means everyone is stretched. With a list of technologies that is ever growing, something’s gotta give.

These may cut corners by handing the web & technical operations work to a developer who has some skills. But I continue to ask… Does a four-letter word divide dev & ops?

Read: Which tech do startups use most?

3. More things to break & master

Ownership of a software stack, such as a database means mastery of…

o features in current versions
o bugs of current versions
o vulnerabilities of various versions
o troubleshooting
o best practices
o backup & reliability

For example a lot of shops where I dig into the database, I find low hanging fruit, such as misconfigured startup settings, table layout or index usage.

I see similar things when a networking expert pours over the haproxy configuration, or runs ping tests across the network. Most of these components are setup with fairly vanilla configurations, leaving loose ends and frayed threads.

Check out: Why I can’t raise the bar at every firm

4. Many startups carrying technical debt

I’ve seen a growing reliance on ORM’s which is worrying. Build your foundation on a crutch, and it gets very hard to eliminate down the line. Here are Ward Cunningham’s warnings on technical debt.

Related: Are SQL Databases Dead?

5. Long term support & viability

At one five year old firm, I was brought in to address scalability problems. I met with the team and was asked to provide a comprehensive review. The first thing I found was all the original engineers had long since left, so the code was new for everyone. As I dug my heels in, I found multiple versions of Apache along with Nginx on some other servers. Their stack was built on a patchwork of Python, Ruby & PHP. Then digging in further, we found a complicated web of dependencies for digital assets, mounted across servers & unmonitored.

Lack of standards is common in environments like these. Without an operational or architectural lead, developers are left to make decisions with what is directly in front of them. Though a decision of what language to use may appear simple at the outset, it carries long term consequences.

Will that language or technology be supported in five years? Will the community survive? Will your firm be able to hire people with that skill set? Will engineers still be excited about it?

See also: Is high availability overrated? Is five nines a myth?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is automation killing old-school operations?

puppet logo

Join 27,000 others and follow Sean Hull on twitter @hullsean.

I was shocked to find this article on ReadWrite: The Truth About DevOps: IT Isn’t Dead; It’s not even Dying. Wait a second, do people really think this?

Truth is I have heard whispers of this before. I was at a meetup recently where the speaker claimed “With more automation you can eliminate ops. You can then spend more on devs”. To an audience of mostly developers & startup founders, I can imagine the appeal.

1. Does less ops mean more devs?

If you’re listening to a platform service sales person or a developer who needs more resources to get his or her job done, no one would be surprised to hear this. If we can automate away managing the stack, we’ll be able to clear the way for the real work that needs to be done!

This is a very seductive perspective. But it may be akin to taking on technical debt, ignoring the complexity of operations and the perspective that can inform a longer view.

chef logo

Puppet Labs’ Luke Kanies says “Become uniquely valuable. Become great at something the market finds useful.”. I couldn’t agree more.

Read: Are SQL Databases Dead?

2. What happens when developers leave?

I would argue that ops have a longer view of product lifecycle. I for one have been brought in to many projects after the first round of developers have left, and teams are trying to support that software five years after the first version was built.

That sort of long term view, of how to refresh performance, and revitalize code is a unique one. It isn’t the “building the future” mindset, the sexy products, and disruptive first mover “we’re changing the world” mentality.

It’s a more stodgy & conservative one. The mindset is of reliability, simplicity, and long term support.

Also: How to hire a developer that doesn’t suck

3. What’s your mandate?

From what I’ve seen, devs & ops are divided by a four letter word.

That word I believe is “risk”. Devs have a mandate from the business to build features & directly answer to customer requests today. Ops have a mandate to reliability, working against change and thinking in terms of making all that change manageable.

Different mandates mean different perspectives.

Related: What is Devops & why is it important?

4. Can infrastructure live as code?

Puppet along with infrastructure automation & configuration management tools like Chef offer the promise of fully automated infrastructure. But the truth is much much more complex. As typical technology stacks expand from load balancer, webserver & database, to multiple databases, caching server, search server, puppet masters, package repositories, monitoring & metrics collection & jump boxes we’re all reaching a saturation point.

Yes automation helps with that saturation, but ultimately you need people with those wide ranging skills, to manage the complex web of dependencies when things fail.

And fail they will.

Check out: Why are MySQL DBA’s and ops so hard to find?

5. ORM’s and architecture

If you aren’t familiar, ORM’s are a rather dry sounding name for a component that is regularly overlooked. It’s a middleware sitting between application & database, and they drastically simplify developers lives. It helps them write better code and get on with the work of delivering to the business. It’s no wonder they are popular.

But as Ward Cunningham elloquently explains, they are surely technical debt that eventually must get paid. Indeed.

There is broad agreement among professional DBA’s. Each query should be written, each one tuned, and each one deployed. Just like any other bit of code. Handing that process to a library is doomed to failure. Yet ORM’s are still evolving, and the dream still lives on.

And all that because devs & ops have a completely different perspective. We need both of them to run modern internet applications. Lets not forget folks. :)

Read this: Do managers and CTO’s underestimate operational costs?

Want more? Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to deploy on Amazon EC2 with Vagrant

vagrant logo

Join 16,000 others and follow Sean Hull on twitter @hullsean.

Why do I want Vagrant?

Vagrant is a really powerful tool for managing virtual machines. If you’re a developer it can make it push-button simple to setup a dev box on your laptop. It manages the images, and uses configuration files to describe specifics of your machines.

In the amazon environment, you can deploy machines just as easily as on your desktop. That’s pretty exciting for those of us already familiar with Vagrant. With that I’ve provided a simple 7 step howto for doing just that!

Also: Are SQL Databases Dead?

1. Use the Mac OS X installer

Fetch your download file here:

Vagrant Installer Downloads

Run the installer. It should do the right thing!

Also: Why Oracle Won’t Kill MySQL

2. Install the vagrant-aws plugin


$ vagrant plugin install vagrant-aws

Also: Bulletproofing MySQL Replication with Checksums

3. Fetch a vagrant box image

Box images vary depending on your “provider” which is vagrant-speak for the environment you’re running in. For aws, they’re some simple json files that tell Vagrant how to work in that environment.

The creator of the plugin has provided a dummy box. Let’s fetch it:


$ vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box

This command is straight out of the readme. What does it do? Take a look:


$ cd /var/root/.vagrant.d/boxes/dummy/aws

$ cat metadata.json
{
"provider": "aws"
}

There’s also the info.json file which looks like this:


$ cat info.json
{"url":"https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box","downloaded_at":"2014-01-14 17:42:33 UTC"}

There’s not a whole lot going on here. If you’re deploying VirtualBox VMs with Vagrant, you’d see a VMware4 disk image. But with Amazon, it stores it’s own AMIs on S3, so Vagrant simply fetches them and runs them for you.

Related: Intro to EC2 Cloud Deployments

4. Configure Vagrantfile

Create a directory to hold your vagrant metadata. This would be the name of your machine:


$ cd /var/root
$ mkdir testaws
$ cd testaws
$ vagrant init

Edit the file as follows:


Vagrant.configure("2") do |config|
# config.vm.box = "sean"

config.vm.provider :aws do |aws, override|
aws.access_key_id = "AAAAIIIIYYYY4444AAAA”
aws.secret_access_key = "c344441LooLLU322223526IabcdeQL12E34At3mm”
aws.keypair_name = "iheavy"

aws.ami = "ami-7747d01e"

override.ssh.username = "ubuntu"
override.ssh.private_key_path = "/var/root/iheavy_aws/pk-XHHHHHMMMAABPEDEFGHOAOJH1QBH5324.pem"
end
end

If you’re familiar with the Amazon command line tools, you’ve probably setup environment variables. Otherwise these may not be familiar to you, so lets go through them:

Your access_key_id and secret_access_key are two pieces of information Amazon uses to identify your instances and bill you. Those are unique to your environment so keep them close to the vest. Here’s how you create them or find them on your aws dashboard.

The keypair_name is your personal SSH key. You may have one on your laptop which you use to access other servers. If so you can upload to the amazon environment. If not you can also use the dashboard to create your own. Whenever you spinup a server, you can instruct amazon to drop that key on the box in the right place. Then you’ll have secure command line access to the box, without password. Great for automation!

Next is your AMI. This is an important choice, as it determines the OS of the machine you’ll spinup, and many other characteristics. You can go with a Amazon Linux AMI but I quite like the Alestic ones from Eric Hammond. Trusted & reliable.

Looking for an ubuntu AMI? Try this ami locator tool.

Check this: 8 Best Practices for Deplying MySQL on AWS

5. Startup the box

Starting an instance once you’ve configured your Vagrantfile is pretty straightforward.


$ vagrant up —-provider=aws

Related: How to autoscale MySQL on Amazon EC2

6. Verify in the Amazon dashboard

Jump over to your amazon dashboard with this link. If you’re logged in already, that will take you to your EC2 instances. You should see a new one, based on the parameters in your Vagrantfile.

Read: Why devops talent is in short supply

7. Login to your Amazon instance

Last but not least, you’ll want to login. Note I’m explicitly specifying my SSH key here. Your path may vary…


$ ssh -i ./iheavy.pem ubuntu@ec2-50-220-50-40.compute-1.amazonaws.com

Also: 5 more things deadly to scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters