Secrets of a happy Amazon hacker – IAM, MFA & locking down your account

If you’re still using just a password to log in to your AWS account, it’s time to batten down the hatches. With a little work you can dramatically improve security.

1. Install command line tools

First get hold of the AWS command line tools. They’re Python based, so you’ll need the package manager “pip” first.

$ curl -O
$ pip install awscli

Next configure your access key & secret key. You can edit the file below or use “$ aws configure”

$ cat ~/.aws/credentials
[default]
aws_access_key_id = AAAAAAAAAAAAAAAABCD
aws_secret_access_key = ABcdefghijklmnop!mnors323
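
Since that file holds your keys in plain text, it is worth keeping it readable only by you. Here is a minimal sketch of one way to do that. The temporary directory and the key values are placeholders, and AWS_SHARED_CREDENTIALS_FILE is only needed if you keep the file somewhere other than ~/.aws/credentials.

```shell
#!/bin/sh
# Sketch: write a credentials file with a profile section and lock it down.
# CRED_DIR is a throwaway location for illustration (an assumption);
# the key values are the dummy ones from the text above.
CRED_DIR="$(mktemp -d)"
CRED_FILE="$CRED_DIR/credentials"

# New files created from here on are private to the current user.
umask 077
cat > "$CRED_FILE" <<'EOF'
[default]
aws_access_key_id = AAAAAAAAAAAAAAAABCD
aws_secret_access_key = ABcdefghijklmnop!mnors323
EOF
chmod 600 "$CRED_FILE"

# Point the CLI at this file instead of the default ~/.aws/credentials.
export AWS_SHARED_CREDENTIALS_FILE="$CRED_FILE"
```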


Also: Is Amazon too big to fail?

2. Create a new user

You don’t want to be using your AWS root user for everything. So we’ll create a new user called “seancli”.

$ aws iam create-user --user-name "seancli"
$ aws iam create-login-profile --user-name "seancli" --password "seanpass"

Related: Did Airbnb have to fail?

3. Give admin privileges

We want our new user to be able to administer things, so let’s give him administrator privileges to AWS resources. AdministratorAccess is an AWS-managed policy that bundles those permissions.

$ aws iam create-group --group-name "admin"
$ aws iam attach-group-policy --group-name "admin" --policy-arn "arn:aws:iam::aws:policy/AdministratorAccess"
$ aws iam add-user-to-group --group-name "admin" --user-name "seancli"

Read: When hosting data on Amazon turns bloodsport

4. Enable MFA

Now for the fun bit. Enable multi-factor authentication. This is important for really making your AWS account secure. Remember anyone who gets into your account can delete *ALL* your infrastructure, and/or spin up servers which cost a lot of money. So a password alone is not sufficient.

MFA uses your phone (or a key fob if you like) as the second factor.

A. Install Google Authenticator
B. Log in to your AWS dashboard
C. Click your name menu then select “Security Credentials”


D. Open the Multi-factor section


E. Click “activate MFA” & a QR code will display


F. Open your Google Authenticator app & click (+)
G. Select scan barcode
H. Point your smartphone camera at the QR code from step E.

You’ll be asked to enter *two* consecutive six-digit sequences. Once completed, try logging in again.
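
If you would rather enroll the “seancli” user from the command line, the console steps have a rough CLI equivalent. This is only a sketch: it assumes your CLI credentials are allowed to manage IAM, and the account ID, device name, file path and the two codes in the example are placeholders.

```shell
#!/bin/sh
# Build the ARN (serial number) of a virtual MFA device.
# $1 = AWS account ID, $2 = device name (both supplied by the caller)
mfa_arn() {
    printf 'arn:aws:iam::%s:mfa/%s\n' "$1" "$2"
}

# Create a virtual MFA device and save its QR code as a PNG.
# $1 = device name, $2 = output file for the QR code
create_mfa_device() {
    aws iam create-virtual-mfa-device \
        --virtual-mfa-device-name "$1" \
        --outfile "$2" \
        --bootstrap-method QRCodePNG
}

# Bind the device to a user with two consecutive codes from the app.
# $1 = user, $2 = device ARN, $3 = first code, $4 = second code
enable_mfa() {
    aws iam enable-mfa-device \
        --user-name "$1" \
        --serial-number "$2" \
        --authentication-code1 "$3" \
        --authentication-code2 "$4"
}

# Example (placeholder values, not run here):
#   create_mfa_device seancli /tmp/seancli-qr.png
#   enable_mfa seancli "$(mfa_arn 123456789012 seancli)" 123456 654321
```

Scan the saved PNG with Google Authenticator just as you would the QR code in the console.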

Also: Are SQL Databases Dead?

5. Test with command line

After you’ve created your new user, you should test it to make sure you can log in properly.
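
One possible way to run that test is sketched below. It assumes you have stored the new user’s access keys under a CLI profile (the profile name and the device ARN in the example are placeholders). The first function asks AWS which identity the CLI is using; the last one requests temporary credentials backed by a code from your authenticator app.

```shell
#!/bin/sh
# Print the ARN of the identity the CLI is currently using.
# $1 = profile name (assumed to exist in your CLI configuration)
whoami_aws() {
    aws sts get-caller-identity --profile "$1" --query Arn --output text
}

# Succeed only if the argument is a six-digit authenticator code.
is_token_code() {
    case "$1" in
        [0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Request temporary credentials backed by an MFA code.
# $1 = profile, $2 = MFA device ARN, $3 = six-digit code
mfa_session() {
    is_token_code "$3" || { echo "token code must be six digits" >&2; return 1; }
    aws sts get-session-token --profile "$1" \
        --serial-number "$2" --token-code "$3"
}

# Example (placeholder values, not run here):
#   whoami_aws seancli
#   mfa_session seancli arn:aws:iam::123456789012:mfa/seancli 123456
```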

Also: 5 Reasons to move data to Amazon Redshift

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is AWS enabling AngelList to boil the VC business?

Just finished listening to Ben Thompson & James Allworth discuss how Amazon Web Services is impacting the venture capital business.

My mind is blown!

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I heard all about AngelList getting $400m from China’s CSC. I didn’t really understand the significance until I saw Fred Wilson’s post Outsider vs Disruptor.

Are VCs nervous, I wondered?

The argument goes, as it takes less capital to get started, way more people can step in to help you get going. Startups don’t need a VC from their first day.

1. Is Amazon boiling the VC frog alive?

As the story goes, if you turn up the temperature slowly, the frog won’t notice that he’s being boiled.

Ben Thompson at 9:30 in the podcast:

“I think the real enabler of this is Amazon. Back in the 90’s you had to go buy Sun servers, Oracle databases, and you had to spend hundreds of thousands if not millions of dollars. And they were all up-front costs. And that’s what Venture Capital is good for.”

Indeed, with the advent of AWS, startups can build their application in the cloud with *ZERO* upfront costs, and only dollars per hour. This is truly a sea change.

Also: A history lesson for cloud detractors – January 2012

2. Dell’s 67 billion dollar buy of EMC

o largest acquisition in tech history
o enterprise tech & enterprise storage

At 50:36 in the podcast, James Allworth says:

“I have this mental image of what used to be this massive land mass, and all these companies fighting it out, and eventually the ocean is rising, AWS is rising, and it’s leaving an increasingly small amount of land mass, and there are fewer & fewer of them, and it’s going to be very interesting to see whether any land mass remains when AWS is finished with it. And I guess this Dell EMC thing, the argument is, well, there’s gonna be a little bit left & we’re going to take whatever it is because we’re the biggest. But it remains to be seen whether there’s gonna be anything left for anyone at all.”

Dell buying EMC is apparently the largest acquisition in tech history at $67 billion, according to Bloomberg. That sure does say a lot about Amazon’s downward pressure & commoditization.

Though I didn’t know EMC would be bought by Dell for such a ridiculous sum, I was arguing this back in 2011 – the New commodity hardware craze.

Related: Is Amazon too big to fail?

3. WeWork & the disappearing server room

Ben Thompson makes a really fascinating point at 46:30 of the podcast:

“There’s been a big shift from the valley to San Francisco. All the big companies of yesteryear are in the valley and almost all of the unicorns are in San Francisco, and this is also because of AWS…

You can’t afford to pay square footage for servers in San Francisco, but if your startup is only some people, a desk & some computers… suddenly it’s much more viable. You have companies running businesses out of WeWork offices… the only reason WeWork can exist is because you don’t need to have servers, because all the servers are housed by Amazon. The fundamental fabric of Silicon Valley is changed because of AWS.”

Yet again, Amazon has impacted the valley in a huge way.

Also: Are we fast approaching cloud-mageddon?

4. Google & iPhone scale

“You could make the argument that AWS is right up there with Google & right up there with the iPhone in its fundamental transformation of industry after industry.”

And while Amazon is fully enabled by Linux, and didn’t invent utility computing, they have surely

Read: When hosting data on Amazon turns bloodsport

Does Linux tell the Gilgamesh story of hacker culture?

Is the command line still essential?
Was Stephenson right about his Linux

It’s been a while since I read Stephenson’s essay on Linux. It’s one of those pieces that’s so well written, we need to go back to it now & then.

This quote caught my eye right away.

“…as living in a commune, where much lip service was paid to ideals of peace, love and harmony, had deprived them of normal, socially approved outlets for their control freakdom, it tended to come out in other invariably more sinister ways. Applying this to the case of Apple Computer will be left as an exercise for the reader, and not a very difficult exercise.”

Anyone who has read about Steve Jobs will chuckle at this one.

1. The Hole Hawg of the internet

When Stephenson wrote this it was 1999. Linux adoption was growing at internet startups, where cost was everything, and risks could be taken. Remember this was before Google & Amazon had grown into the two biggest data center companies. Without Linux, neither would be where it is today!

Linux was and is today more like a Hole Hawg for the internet: powerful, but dangerous in the wrong hands. 🙂

“The Hole Hawg is like the genie of the ancient fairy tales, who carries out his master’s instructions literally and precisely and with unlimited power, often with disastrous, unforeseen consequences.”

Also: Why I like Etsy’s site performance report

2. Unix as oral history, our Gilgamesh

“Unix, by contrast is not so much a product as it is a painstakingly compiled oral history of the hacker subculture. It is our Gilgamesh. What made old epics like Gilgamesh so powerful and so long-lived was that they were living bodies of narrative that many people knew by heart, and told over and over again — making their own personal embellishments whenever it struck their fancy.”

Also: Are SQL Databases dead?

3. The bizarre Trinity Torvalds, Stallman & Gates

“In trying to understand the Linux phenomenon, then, we have to look not to a single innovator but to a sort of bizarre Trinity, Linus Torvalds, Richard Stallman and Bill Gates. Take away any of these three & Linux would not exist.”

And indeed we must thank all three of these characters for where the internet stands today. The cloud is possible because of Linux & cheap Intel hardware. And the GNU free software to go along with it.

Related: Did MySQL & Mongo have a beautiful baby called Aurora?

4. On the meaning of “Open Source”

“Source files are useless to your computer, and of little interest to most users, but they are of gigantic cultural & political significance, because Microsoft & Apple keep them secret, while Linux makes them public. They are the family jewels. They are the sort of thing that in Hollywood thrillers is used as a McGuffin: the plutonium bomb core, the top-secret blueprints, the suitcase of bearer bonds, the reel of microfilm.”

Read: When hosting data on Amazon turns bloodsport

5. What about Apple today?

“The ideal OS for me would be one that had a well-designed GUI that was easy to set up and use, but that included terminal windows where I could revert to the command line interface and run GNU software when it made sense.”

Stephenson wrote this before Apple rebuilt their OS to sit on top of Unix. And that’s where we are today with Mac OS X!

Also: Are we fast approaching cloud-mageddon??

When hosting data on Amazon turns bloodsport

There’s a strong trend to automation across the cloud. That’s a great thing for startups because it reduces operational headaches & lets them focus on building products.

But as that trend begins to touch the database tier, all sorts of complications emerge. Let’s take a look at some of the tradeoffs.

1. Database as a service trend

I was recently reading Baron Schwartz’s article on the trend to database as a service.

I work with a lot of venture backed startups & pay close attention to what’s happening in New York & SF. From where I’m standing I see a similar trend. As automation simplifies management across the application stack, from load balancers to web & search servers, the same advantages are moving to database management.

Also: How to automate MySQL analysis on Amazon RDS

2. How Amazon RDS helps

Amazon’s RDS offers firms a data solution for Oracle & SQL Server as well as MySQL. For those just starting, it offers a long list of advantages.

o quick push-button deployment in minutes
o standardized parameter settings that just work
o ability to scale up or down from the dashboard
o automated backups
o multi-az so you can sleep at night

This brings a huge advantage to startups. Many have a team of developers but aren’t large enough to need an operations team and can’t afford a dedicated database administrator.

Amazon is obviously helping these firms raise the bar. And that’s a good thing.

Related: RDS or MySQL 10 use cases

3. How Amazon RDS hurts

As you get bigger, your needs will grow too. You’ll have tens of millions of customers, and with more customers comes an even higher bar. Zero downtime becomes critical. It’s then that Amazon’s solution starts to become frustrating.

Unpredictable upgrades

MySQL upgrades on RDS are a messy activity. Amazon will restart the instance, back up the instance, perform the upgrade, then restart again. Each of these restarts takes a few minutes. The whole operation may have you down for ten minutes. This becomes more frustrating when your hands are completely tied. You don’t know when or what will happen!

When you roll-your-own instance, an upgrade can be performed in a matter of seconds. No instance restarts are necessary and you can monitor the process to know exactly where you are. This is the kind of control you’re going to want if you have millions of customers relying on your site & uptime.

Unnecessary slow restarts

When you apply parameter changes on RDS, some require a MySQL restart. Amazon forces the whole server to restart, increasing this downtime from a few seconds (when you roll your own) to many minutes. And while some parameters can be changed online, Amazon can provoke some strange behavior that is not always predictable.

With the frequency of these types of changes, you’ll quickly grow tired and frustrated with RDS.

EBS Snapshots are not portable

As mentioned above, Amazon uses its standard filesystem snapshot technology to perform backups. While this works well, it can be slow & unpredictable in a multi-tenant environment.

When you roll your own, you can take advantage of xtrabackup, and perform hot backups against your database with zero downtime. This is a real godsend. What’s more they are portable, and can be moved to any other server even ones not hosted in Amazon’s cloud!

Promoting a read-replica is slow too!

One feature that Amazon touts is creating copies or “read replicas” of your data. These are great and can facilitate easy copying of data. However, promoting one again brings unnecessary restarts, which are slow.

When you roll your own, you can promote a read-replica or read-only slave in seconds. A few seconds can seem invisible to end users, while minutes will be perceived as a real outage or downtime.

Read: Is zero downtime even possible with RDS?

4. Is migration an option?

So what to do? As I mentioned above, there are real advantages to startups deploying their first database. It really does help. I would argue for many it can be a good place to start.

If you’re starting to outgrow RDS and frustrated with the limitations, performance tuning headaches & unneeded downtime, luckily you have options.

Migrating off of RDS onto a physical server can be done in a number of ways.

o slave off of the master

Here you build a MySQL slave on a standard EC2 instance, with your RDS instance as the master. When you’re caught up, bring your site down temporarily. Reset the slave & set to read-write mode. Then point your webservers at your new EC2 instance and bring the site back up. If done carefully 10 to 20 seconds of downtime should be plenty.

Don’t forget to run through the process with a firedrill first!

o dump & import

Another way to move your data may be mysqldump. This option would be slower & bring a lot more downtime, but is possibly necessary in some cases.

Also: 5 Reasons to move data to Amazon Redshift

5. Speed: It’s the database

Fred Wilson says speed is the number one feature of a web application. If customers are frustrated & waiting, they may leave & not come back. On the web it can be everything.

Many firms are rushing to database as a service to simplify administration. While that’s wonderful at the beginning, as you grow performance will become more of a day-to-day concern. And when it does, the database is going to be big on your list of headaches.

Web application performance inevitably involves the database, and when it does, your decision to choose database as a service may come into question. Don’t be afraid to bite the bullet and manage things yourself when that time comes.

Also: Is upgrading RDS like a shit-storm that will not end?

Is upgrading RDS like a shit-storm that will not end?

Can RDS worsen an outage? That’s another way to think about this question. In my experience, it very clearly makes outages worse, by tying one or both hands behind your back. Believe me when I say, that is terribly frustrating when you’re putting out fires!

1. Changing Parameters

An everyday occurrence is the need to change database parameters. Want to enable a log? Great, no problem. Except in RDS it becomes a problem! Ok, you’re thinking, why is that?

In regular MySQL, you log in with the shell & issue SET GLOBAL parameter = value; Nice, easy, straightforward. No servers restarting, no nonsense. If the parameter can’t be changed at runtime and requires a restart, MySQL will tell you.

In RDS, the process is way more complex. First you edit a parameter group. You can copy an existing one, or change the one you’re using. If that parameter group applies to many servers, be careful!

Ok, what next? Now you APPLY that new parameter group. You can do so immediately, or during the next maintenance window. Here’s the tricky part. Is Amazon going to restart my instance? That’s something your boss or manager will surely ask you. Well you might think it would only do so if the parameter in question required it. But I tried to enable the general log recently and Amazon tells me the status of “pending-reboot”. This change shouldn’t require that! I’m sitting there scared Amazon might suddenly decide to reboot a production server for no reason!
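
For comparison, here is roughly what that dance looks like from the aws command line. Treat it as a sketch: the parameter group name and instance identifier in the example are placeholders, and it assumes the instance already uses a custom parameter group (the default groups can’t be edited).

```shell
#!/bin/sh
# Build one entry for the --parameters option of
# "aws rds modify-db-parameter-group".
# $1 = parameter name, $2 = value, $3 = immediate | pending-reboot
param_spec() {
    printf 'ParameterName=%s,ParameterValue=%s,ApplyMethod=%s\n' "$1" "$2" "$3"
}

# Change a parameter in a custom parameter group.
# $1 = parameter group name, remaining arguments as for param_spec
set_rds_param() {
    group="$1"; shift
    aws rds modify-db-parameter-group \
        --db-parameter-group-name "$group" \
        --parameters "$(param_spec "$@")"
}

# Show whether the instance is in sync or waiting for a reboot.
# $1 = DB instance identifier
param_status() {
    aws rds describe-db-instances \
        --db-instance-identifier "$1" \
        --query 'DBInstances[0].DBParameterGroups[].ParameterApplyStatus' \
        --output text
}

# Example (placeholder names, not run here):
#   set_rds_param my-mysql-params general_log 1 immediate
#   param_status seandb
```

Checking param_status before and after a change at least tells you whether RDS thinks a reboot is pending, even if it can’t tell you why.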

This is where you feel you’ve lost control. You can dig through docs all you want, but you can’t ever say for sure if a managed service will behave predictably. There’s already more layers of software between you and your relational database. Not what you want.

Also: Did MySQL & Mongo have a beautiful baby called Aurora?

2. How much longer?

Another question you’ll ask yourself is, how long will this maintenance take? With MySQL at the command line, you can run through test after test & time the process. When you go to perform tasks offhours, you already have a clear picture.

With RDS, things can’t be predicted. Servers are restarted when they needn’t be. Rebuilds take forever, and you have no progress bar. EBS performance has a hiccup and your snapshot time doubles. The troubles go on and on.

Related: Is automation killing old-school operations?

3. Why did Amazon just force an OS upgrade?

Here’s another surprise I ran into. Again we have a managed solution, so Amazon must take opportunities when they can. But you pay for it in unpredictability.

I was going to perform a MySQL 5.1 to 5.5 upgrade, and I’d run through test after test in advance. I timed the process at about 45 minutes, then went to do it in production. Amazon decided to throw in the OS upgrade too, adding 40 minutes of surprise time. What’s worse? No progress bar on that either.

Upgrades are nerve wracking enough, without this kind of stuff scaring the daylights out of you.

Read: Do managers underestimate operational cost?

4. What’s happening on my server?

All of the questions about progress are opaque on RDS because you lack command line. You can’t watch processes, disk I/O or any of the granular stuff. In my surgery analogy below, it’s as though you can’t touch the patient, find their pulse or gauge if their skin is cold, clammy or pale.

Also: Is the difference between dev & ops a four-letter word?

5. Surgery with blunt instruments

At the end of the day, RDS feels like surgery with blunt instruments. If command line were your scalpel, Windows & GUI tools may be your remote video surgery. And worse still, RDS would be like doing surgery on the Opportunity Mars rover, after it’s landed & stuck in a valley. Everything is delayed, it’s hard to tell what’s going on, and it’s the worst environment to work in when you have an emergency with your database.

If you have any operations experience, deploy your own MySQL on an EC2 instance. You’ll thank yourself later.

Also: Is zero downtime even possible on RDS?

Upside to RDS

Is there any upside? Why do people use it? Push-button replication. Check. Push-button multi-az, check. Those are great if you have no DBA. Automated backups so you don’t shoot yourself in the foot, check.

I guess there is *something* to love.

Is automation killing old-school operations?

I was shocked to find this article on ReadWrite: The Truth About DevOps: IT Isn’t Dead; It’s not even Dying. Wait a second, do people really think this?

Truth is I have heard whispers of this before. I was at a meetup recently where the speaker claimed “With more automation you can eliminate ops. You can then spend more on devs”. To an audience of mostly developers & startup founders, I can imagine the appeal.

1. Does less ops mean more devs?

If you’re listening to a platform service sales person or a developer who needs more resources to get his or her job done, no one would be surprised to hear this. If we can automate away managing the stack, we’ll be able to clear the way for the real work that needs to be done!

This is a very seductive perspective. But it may be akin to taking on technical debt, ignoring the complexity of operations and the perspective that can inform a longer view.

Puppet Labs’ Luke Kanies says “Become uniquely valuable. Become great at something the market finds useful.” I couldn’t agree more.

Read: Are SQL Databases Dead?

2. What happens when developers leave?

I would argue that ops have a longer view of product lifecycle. I for one have been brought in to many projects after the first round of developers have left, and teams are trying to support that software five years after the first version was built.

That sort of long term view, of how to refresh performance, and revitalize code is a unique one. It isn’t the “building the future” mindset, the sexy products, and disruptive first mover “we’re changing the world” mentality.

It’s a more stodgy & conservative one. The mindset is of reliability, simplicity, and long term support.

Also: How to hire a developer that doesn’t suck

3. What’s your mandate?

From what I’ve seen, devs & ops are divided by a four letter word.

That word I believe is “risk”. Devs have a mandate from the business to build features & directly answer to customer requests today. Ops have a mandate to reliability, working against change and thinking in terms of making all that change manageable.

Different mandates mean different perspectives.

Related: What is Devops & why is it important?

4. Can infrastructure live as code?

Puppet along with infrastructure automation & configuration management tools like Chef offer the promise of fully automated infrastructure. But the truth is much much more complex. As typical technology stacks expand from load balancer, webserver & database, to multiple databases, caching server, search server, puppet masters, package repositories, monitoring & metrics collection & jump boxes we’re all reaching a saturation point.

Yes automation helps with that saturation, but ultimately you need people with those wide ranging skills, to manage the complex web of dependencies when things fail.

And fail they will.

Check out: Why are MySQL DBA’s and ops so hard to find?

5. ORMs and architecture

If you aren’t familiar, ORM (object-relational mapper) is a rather dry sounding name for a component that is regularly overlooked. It’s middleware that sits between application & database, and it drastically simplifies developers’ lives. It helps them write better code and get on with the work of delivering to the business. It’s no wonder ORMs are popular.

But as Ward Cunningham eloquently explains, they are surely technical debt that eventually must get paid. Indeed.

There is broad agreement among professional DBAs. Each query should be written, each one tuned, and each one deployed, just like any other bit of code. Handing that process to a library is doomed to failure. Yet ORMs are still evolving, and the dream still lives on.

And all that because devs & ops have completely different perspectives. We need both of them to run modern internet applications. Let’s not forget, folks. 🙂

Read this: Do managers and CTO’s underestimate operational costs?

How to automate MySQL slow query analysis with Amazon RDS


If you’ve used relational databases for more than ten minutes, I hope you’ve heard of slow queries. They are those pesky little gremlins that are slowing down your startup, and preventing the scalability you so desperately need.

Luckily there’s a solution. What I’ve found is if I send a report to developers every week, it keeps these issues front and center, for folks that are very busy indeed.

The script below is for RDS, but you can surely modify it if you have a physical server or roll-your-own MySQL box on Amazon. Take a look & enjoy!

1. Install Percona tools

Percona, as many probably already know, is a wildly successful services firm that supports MySQL and related technologies. They also have a very popular & scalable MySQL distribution by the same name.

Even if you’re not using Percona MySQL, you definitely want to get hold of the Percona Toolkit. It provides all sorts of useful tools, including the one this article is based on, pt-query-digest.

This tool takes your stock MySQL slow query logfile as input, and summarizes it into a very useful and readable report. Formerly mk-query-digest, it’s now called pt-query-digest. See below.

You can install the percona tools easily by grabbing the repository file and installing that with rpm. From there you can just use yum or apt-get depending on your distribution.

Related: Why a killer title can make or break your content efforts

2. install aws command line tool

Amazon has consolidated all its command line tools into a single one called just “aws”. The options can be a little arcane, and the error messages misleading besides. What’s good though is that it is slightly easier to install & configure.

Do you already use Python? Install it this way:

$ pip install awscli

If not, you’ll need to dig into the aws cli installation instructions further.

Also: Do managers underestimate operational costs?

3. edit .aws/config

After you get the tool installed, you need to set up your environment. I edited a file named /home/shull/.aws/config as follows:

[default]
region = us-east-1
aws_access_key_id = BLIBJZMKLWIL5UTNRBMQ
aws_secret_access_key = MF5J/2z7HmN92lQUrV12ZO/FBXNjDVjL52TNRWsG

Those access_key_id and secret_access_key you can find on your amazon dashboard. Click upper right hand corner under your name, select the menu item “Security Credentials”.

Check out: Are SQL Databases Dead?

4. edit

I wrote the script below so you can fairly easily edit it.

#!/bin/bash

# get the rds db instanceID from command line (or crontab) entry
AWS_INSTANCE=$1

# here's where we'll store the latest slowquery.log
# (assumed location, adjust as needed)
SLOWLOG=/tmp/rds_slowquery.log
#SLOWLOG=`/bin/ls -tr /home/shull/*.log | /usr/bin/tail -1`

# fetch slow query log from rds box
# here I always grab the latest one.
/usr/local/bin/aws rds download-db-log-file-portion --db-instance-identifier "$AWS_INSTANCE" --output text --log-file-name slowquery/mysql-slowquery.log > "$SLOWLOG"

# query report output (assumed location, adjust as needed)
SLOWREPORT=/tmp/rds_slowquery_report.txt

# pt-query-digest location (assumed; check with "which pt-query-digest")
PT_QUERY_DIGEST=/usr/bin/pt-query-digest

# run the tool to get analysis report
"$PT_QUERY_DIGEST" "$SLOWLOG" > "$SLOWREPORT"

# today's date in a variable
# (escaping % as \% is only needed inside a crontab line, not in a script)
TODAY=`/bin/date +%m/%d/%Y-%H:%M`
#YESTERDAY=`/bin/date -d "1 day ago" +%m/%d/%Y-%H:%M`

# report subject
SUBJECT="Sean Query Report -- $TODAY "

# recipient
EMAIL="[email protected]"

# send an email using mailx
/usr/bin/mailx -s "$SUBJECT" "$EMAIL" < "$SLOWREPORT"

Note, if you don't have mailx installed, it should be available in your repository. Use apt-get or yum as necessary to get it installed.

Also: Is high availability overrated & near impossible to deliver?

5. Add to crontab

After you've tested the above script from command line, you will want to add it to a weekly cron job. Voila, automation! Don't forget to chmod +x to make it executable. 🙂

00 09 * * 5 /home/shull/ seandb

Read: Are MySQL DBA's impossible to find?

If you use MySQL in the Amazon cloud, you need to ask yourself this question

Are you serious about backups?

If you’re just using Amazon EBS snapshots, that may not be sufficient. There’s a good chance it won’t protect you against your next data loss.

That’s why I like to have a few different types of backups.

Also: 5 more things deadly to scalability

Protect against operator error

mysqldump is a tool every DBA is familiar with. Same as a hot backup or snapshot, you say? Just more labor? Not true.

A dump allows you to restore one table, or one schema. That’s why they’re also known as logical backups. What’s more you can edit the file, remove indexes, change object names, or datatypes. All these can be essential in the screwy and unpredictable event of a real world outage.
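
To make the single-table restore concrete, here is a small sketch that pulls one table’s section out of a full dump. It assumes the dump was written with mysqldump’s default comments, so each table starts with a “-- Table structure for table” line. The function name and the file and database names in the example are made up for illustration.

```shell
#!/bin/sh
# Print only the section of a mysqldump file that belongs to one table.
# $1 = dump file, $2 = table name
extract_table() {
    awk -v t="$2" '
        # Each table section starts with this comment line.
        /^-- Table structure for table `/ {
            keep = ($0 == "-- Table structure for table `" t "`")
        }
        keep { print }
    ' "$1"
}

# Example (placeholder names, not run here):
#   extract_table full_dump.sql orders > orders_only.sql
#   mysql mydb < orders_only.sql
```

One caveat with this approach: for the last table in the file, the output also includes whatever follows it in the dump, so give the extracted file a quick look before loading it.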

Expect the unexpected!

Read: Why devops talent is in short supply

Test those backups regularly

If you haven’t actually tried to restore, you really don’t know if you have everything. Did you back up stored procedures & database code? How about grants? Database events? How about cronjobs? What about the my.cnf file? And your replication configuration?

Yes there are a lot of little pieces, and testing your backups by rebuilding everything is an attempt to poke holes in your plan, and hit issues before d-day!

Related: MySQL interview guide for managers and candidates alike

Replication isn’t a backup

Replication is getting better and better in MySQL. It used to fail regularly. MyISAM was very unpredictable. But even in the comfortable realm of InnoDB, there can still be data drift. If you’re on MySQL 5.0 or 5.1, you should consider performing regular checksums. These test the integrity of data and compare what’s actually in master & slave. See: Bulletproofing MySQL replication with checksums.
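
As a rough illustration of the idea, the sketch below compares checksum output saved from the master and from the slave. It is a simplified stand-in, not the Percona tool: it assumes you captured the output of CHECKSUM TABLE on each server into a file, and that replication was caught up when you did, otherwise tables will differ simply because of lag. The function and file names are hypothetical.

```shell
#!/bin/sh
# Compare two files of "table<TAB>checksum" pairs, e.g. saved with
#   mysql -N -e "CHECKSUM TABLE shop.users, shop.orders" > master.txt
# run once against the master and once against the slave.
# Prints each table in the slave file whose checksum is absent from,
# or different in, the master file.
# $1 = master output, $2 = slave output
drifted_tables() {
    awk 'FILENAME == ARGV[1] { sum[$1] = $2; next }
         !($1 in sum) || sum[$1] != $2 { print $1 }' "$1" "$2"
}

# Example (placeholder file names, not run here):
#   drifted_tables master.txt slave.txt
```

On a busy production pair you would want pt-table-checksum from the Percona Toolkit, which runs its checksums through replication so both sides are compared at the same point.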

Read: Why high availability is so very hard to deliver

Have you considered security around your backup files?

While you’re thinking about backups, make sure the files themselves are secure. Remember they contain your crown jewels. Hopefully individual data that’s sensitive is encrypted, but still you should secure their final resting place as well.

If you’re using S3, consider encrypting the file before shipping it up to the bucket.
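
Here is one way that could look, sketched with openssl. It assumes OpenSSL 1.1.1 or newer (for the -pbkdf2 option) and a passphrase kept in a key file that never leaves your servers. The file names and bucket name in the example are placeholders.

```shell
#!/bin/sh
# Encrypt a backup file with a passphrase stored in a key file.
# $1 = input file, $2 = encrypted output, $3 = key file
encrypt_backup() {
    openssl enc -aes-256-cbc -salt -pbkdf2 -in "$1" -out "$2" -pass "file:$3"
}

# Reverse of encrypt_backup, for restore tests.
# $1 = encrypted file, $2 = decrypted output, $3 = key file
decrypt_backup() {
    openssl enc -d -aes-256-cbc -pbkdf2 -in "$1" -out "$2" -pass "file:$3"
}

# Example (placeholder names, not run here):
#   encrypt_backup dump.sql.gz dump.sql.gz.enc /root/backup.key
#   aws s3 cp dump.sql.gz.enc s3://my-backup-bucket/
```

Keep a copy of the key file somewhere other than the bucket, and include a decrypt in your regular restore tests.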

Read this: Why a four letter word divides dev and ops

Cloud Operations Interview

What does a cloud computing expert need to know? How do you hire a cloud computing expert? Competition for operations & DBAs is fierce, so you’ll want to know how to find the best.

If you’re a systems administrator or ops person, you may want to prepare for an interview for such a position. Meanwhile, if you’re a director of IT or operations, a recruiter or manager in HR, you’ll want to have some idea how to find the right candidate.

Here’s my guide to do just that. You may also jump to part two Cloud Deployment Interview or the last part three Cloud DBA, Architecture and Management Interview.

1. Solid unix systems administrator

At the top of the list, a cloud operations expert needs to understand Unix and more importantly Linux. Here are some sample questions to get the conversation moving:

o What is web operations and what have you done day-to-day?

Prepare some stories.

o What’s your favorite feature of the Linux kernel?

This is an open ended question, but a systems administrator should have some knowledge here. The kernel is the most basic piece of software that runs when a computer boots up, whether it is a desktop or a server. This piece of software coordinates everything, manages resources, and directs traffic.

o Name some distributions of Linux. What is a distro?

Linux is built by a collaborative team of thousands on the internet. That’s what makes it open source. A distribution includes the operating system, along with a collection of software to go along with it. All the supporting utilities, libraries and servers must be compiled and held in a repository. That’s what makes up a distribution. Debian, Red Hat and Ubuntu are a few popular ones.

A cloud operations expert needs to have a wide-ranging skillset: Unix administration, architecture, scalability, database & webserver administration, troubleshooting & performance, and load & stress testing. You’ll also want someone who has learned hard lessons from some failures, has some war stories to tell and has a hard nose for stability.

o What’s the difference between Apache and Nginx?

These two pieces of software are both webservers, that is they speak the HTTP protocol, and can serve HTML pages. They also have a myriad of plugins to support different languages and features. The difference? Nginx (pronounced engine-X) is a newer incarnation. It’s been rearchitected from the ground up, building on all the things learned from Apache over the years. It’s tighter, more efficient code, and it’s easier to configure.

You might also enjoy our Intro to EC2 Cloud Deployments Guide.

o What is a key value store? examples?

There are lots of examples of these types of databases. At their simplest they store a value under a key, and they are often used as a very simple memory cache that can interface with most applications. Memcached is a popular example of a key value store. Redis, CouchDB and Voldemort can also do this.

o What is a page cache? Reverse proxy cache? examples?

These are all the same thing. They are basically a very minimal webserver without all the plugins or bells and whistles. You put one of these in front of your webserver to handle all the easy stuff, and speed up overall throughput. Varnish is a popular example.

o What filesystem do you prefer?

This is a bit arcane, but one should have some opinions here. xfs is a popular filesystem, though ext3 and ext4 are also common. Emphasize the journaling aspect here. Journaling means that if you pull the cord or your server crashes, the filesystem can recover upon reboot. It does this by journaling changes, much as a database keeps a redo log of recent changes to database tables.

o Command line tools

There are lots of commands in the day-to-day toolbox of a web ops expert. Here are some examples:
rsync (pronounced our-sync) – sync files between servers & do checksums to allow easy restarts
scp (pronounced s-c-p) – secure copy, similar to rsync but no checksums, so less reliable
curl (pronounced kurl) – diagnose & test URLs and HTTP from the command line
cron (pronounced cron) – run commands at scheduled times
ssh (pronounced s-s-h) – secure shell, the most basic tool to reach a cloud server
ifconfig (pronounced if-config) – check the network interfaces on the server
vi/emacs (pronounced v-i and e-macks) – terminal editors, to modify config files
uptime (pronounced up-time) – display the current load average of the server
top (pronounced top) – interactive display of system metrics like memory, load, swap & processes
ps (pronounced p-s) – shows running processes on the server
/var/log/messages – essential system logfile
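
To show how a few of these fit together, here is a minimal sketch of a load check built on uptime and ps. The function names and the threshold of 4 are arbitrary choices for the example, not a recommendation.

```shell
# load1: read `uptime` output on stdin, print the 1-minute load average.
load1() {
    sed -E 's/.*load averages?: *([0-9.]+).*/\1/'
}

# overloaded LOAD THRESHOLD: succeed (exit 0) when LOAD exceeds THRESHOLD.
overloaded() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load > max) }'
}

# Typical use on a server:
#   if overloaded "$(uptime | load1)" 4; then
#       ps aux --sort=-%cpu | head -5    # GNU ps: the top CPU consumers
#   fi
```

A candidate who can explain why a load average of 4 is fine on an 8-core box and worrying on a single core has usually spent real time with these tools.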

o What are application servers? How are they different from webservers?

Tomcat & Glassfish are two examples of application servers. These handle heavier weight languages & applications like Java. An application server is on some level just a more heavy-duty webserver, and these days Apache can be thought of as an application server also.

2. Cloud concepts

o What is virtualization? What is a hypervisor?

Virtualization allows you to run one or more computers within a computer. You can do virtualization on a desktop, sharing network, memory, cpu and disk resources among a number of virtual servers. But more importantly in cloud computing or IaaS offerings you can do virtualization at the datacenter level. The hypervisor layer is a datacenter virtualization technology that provisions server resources, and balances shared network and disk resources.

o What is an image?

In the Amazon world, the AMI or Amazon Machine Image is a snapshot of a server’s state at one moment in time. This image is taken at the block level, and includes the master boot record, the first block on disk that a server boots from. The state of a server, when it is shut down, is whatever is stored on disk, and so in this image: all config files, logfiles, and anything else written to disk.

o What is multi-tenant?

This means that there are multiple servers sharing resources. The tenants are the customers who each want to get the server, cpu, memory, network and disk that they paid for.

o What is the downside to shared resources?

Contention for resources is always the challenge. If your fellow tenants are not very thirsty, this can work to your advantage. But if they’re also heavy users, the hypervisor layer has to manage the balancing act. You may get a spike of disk I/O at one point, but later get a dearth. This can cause a relational database like MySQL or Oracle to suddenly look stalled.

o What is instance-store? What is ebs?

Instance store servers were Amazon’s original offering, where servers had their own local (and slow) storage. This storage was ephemeral, so all machine state was lost when the server was stopped or terminated. These servers also booted slowly. EBS, the Elastic Block Store, is a virtualized storage option, similar to NAS or NFS. You can create arbitrary chunks of storage, and attach them to servers, all from command line APIs. Cool!
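
Creating and attaching a volume from the command line looks roughly like this with the aws tools. This is a sketch: the IDs, zone, size and device name are placeholders, and the small `run` wrapper (not part of the aws tools) only prints each command unless you set DRY_RUN=0.

```shell
# run: print the command when DRY_RUN=1 (the default here), otherwise execute it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# create_volume AZ SIZE_GIB: create an empty EBS volume in one availability zone.
create_volume() {
    run aws ec2 create-volume --availability-zone "$1" --size "$2"
}

# attach_volume VOLUME_ID INSTANCE_ID DEVICE: attach it to a running server.
attach_volume() {
    run aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3"
}

# Placeholder values; the volume and the server must be in the same zone.
create_volume us-east-1a 100
attach_volume vol-0123456789abcdef0 i-0123456789abcdef0 /dev/sdf
```

Once attached, the volume shows up as a block device on the server, ready for a filesystem.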

o What is virtual private cloud?

With the VPC offering, Amazon drops a router into your existing datacenter. You can then provision virtual servers to your heart’s content, and they all appear to be servers in your existing datacenter. Elastically scale, within the network and security model you’re already using.

o What is a hybrid approach to cloud adoption?

Keeping your investments in hardware and datacenter is obviously an appealing option for firms that have a large existing environment. A hybrid approach with a VPC allows you to get your feet wet, but still keep essential applications on physical servers.

o What is Amazon EC2?

Elastic Compute Cloud refers to the virtual servers you spin up in Amazon Web Services.

o What is Amazon RDS, Oracle RDS, MySQL RDS?

Amazon has various relational and non-relational database offerings. RDS stands for Relational Database Service, its managed offering for relational databases such as MySQL and Oracle.

RDS or roll your own – which is better? Here are some use cases to help you decide.

o What is multi-az?

Amazon’s infrastructure offering isn’t just a single datacenter with servers. The beauty of what they’ve built is that they offer a number of datacenters (called availability zones) in each of many regions such as Northern Virginia, Oregon and Singapore.

Incidentally, multi-az is key to how businesses can protect themselves from failure. Amazon recently had an outage, but Airbnb, Reddit & Foursquare didn’t have to fail.

o What does a CDN do? How does it work? examples?

A CDN is a content delivery network. Remember all those files that make up a webpage? Images, video, css files? Turns out serving these components from servers *closer* to your customer makes their webpages load much faster. CDNs are networks of servers that hold the content of your pages, and serve them faster.

It works by replacing content paths with a special one from your provider. A simple change in your code will allow content to dynamically load from across the web. Cool!

CloudFront is Amazon’s offering coupled with S3 for file storage. Akamai is another big provider.
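
The path replacement described above can be sketched with a one-line rewrite. This assumes your static files live under /static/ and that your provider gave you a hostname; the function name, the /static/ prefix and the CloudFront-style hostname are all placeholders.

```shell
# rewrite_static_urls CDN_HOST
# Reads HTML on stdin and points src/href attributes under /static/ at the CDN.
rewrite_static_urls() {
    sed -E "s#(src|href)=\"/static/#\1=\"https://$1/static/#g"
}

# Example:
#   rewrite_static_urls d111111abcdef8.cloudfront.net < index.html > index.cdn.html
```

In a real application this usually lives in a template helper or config setting rather than a sed pass, but the effect on the generated HTML is the same.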

We’re not done yet. In part two on deployments and part three of this series, we’ll hit on other important skills a cloud ops expert should have, including scripting, database administration (see our MySQL Interview Guide), scalability, performance, configuration management, metrics, monitoring, and some all-important war stories!

Here are some questions to pique your interest:

o Why does the API battle between Amazon & Eucalyptus (FOSS) matter?
o Do you use command line tools? why?
o What can go wrong with backups? how do we test them?
o Should we encrypt filesystems in the cloud? what are the risks?
o Should we use offsite backups?
o What is DRBD?
o Why is auditing important? access control?
o What is load balancing? why is it difficult with databases?
o How do you perform a benchmark? perform load testing?
o Why use a package manager? can we install from source?

Our Deploying MySQL on Amazon EC2 Guide is also related to this interview process.

You may also jump to part two, Cloud Deployment Interview, or part three, Cloud DBA, Architecture and Management Interview.


$1000 per hour Servers, Anyone?

Amazon’s spot market for computing power is set up as an open market for surplus servers. The price is dynamic and depends on demand. So when demand is low, you can get computing instances for rock bottom prices. When you do that you normally set a range of prices you’re willing to pay. If it goes over your top end, your instances get killed and re-provisioned for someone else. Obviously this wouldn’t work for all applications, like a website that has to be up all the time, but for computing power, say to run some huge hedge fund analytics, it might fit perfectly.