Category Archives: Cloud Computing

Is Redshift outpacing Hadoop as the big data warehouse for startups?


More and more startups are looking at Redshift as a cheaper & faster solution for big data & analytics.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Saggi Neumann posted a pretty good side-by-side comparison of Redshift & Hadoop, and concluded it was a tie; the right choice depends on your individual use case.

Meanwhile Bitly engineering concluded Redshift was much easier.

1. More agile

One thing pointed out by the bitly blog post, which I've seen countless times, is the slow iteration cycle. Write your map-reduce job, run it, test, debug, then submit it to your cluster. Waiting for it to come back, you might feel like you're submitting a stack of punched cards. Resolve the errors that come back, then rerun on your cluster. Over & over & over again.

With Redshift you're writing SQL, so you're iterating through syntax errors quickly. What's more, since Redshift is a column-compressed database, you can do full table scans on columns without indexes.

What that means for you and me is that queries just run. And they run blazingly fast!
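
For a concrete feel, here's the kind of ad-hoc aggregate (cluster endpoint, table & columns all hypothetical) that would mean writing a custom map-reduce job on Hadoop. On Redshift it's just SQL, happily scanning the whole column with no index in sight:

$ psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics
analytics=# SELECT event_type, count(*)
analytics-# FROM events
analytics-# WHERE created_at >= '2015-01-01'
analytics-# GROUP BY event_type;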

Also: When hosting data on Amazon turns bloodsport

2. Cheap

Redshift is pretty darn cheap.

Saggi's article above quotes Redshift at $1000/TB/yr for reserved instances, and $3700/TB/yr for on-demand. That compares with a Hadoop cluster at $5000/TB/yr.

But neither comes within spitting distance of the old world of Oracle, where customers host big iron servers in their own datacenter, paying north of a million dollars between hardware & license costs. Amazon cloud FTW!

Related: Did dropbox have to fail?

3. Even faster

Airbnb's nerds blog has a post showing Redshift costing 25% of a Hadoop cluster, while delivering a 5x performance boost. That's pretty darn impressive!

Flydata has done benchmarks showing 10x speedup.

Read: Are SQL Databases dead?

4. SQL Toolchains

Redshift speaks the PostgreSQL dialect & wire protocol, so your existing SQL toolchain, from command line clients to BI & reporting tools, mostly just works. That's a far cry from the custom tooling a Hadoop cluster typically requires.

Also: 5 core pieces of the Amazon cloud puzzle to get your project off the ground

5. Limitations

o data loading

You load data into Redshift using the COPY command. This command reads flat files from S3 and dumps them into tables. It can be extremely fast if you do things in parallel. However getting your data into those flat files is up to you.
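
Here's a minimal sketch of a COPY (cluster endpoint, bucket & keys all hypothetical). Note Redshift only reads the files; getting them into S3 in the first place is your job:

$ psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics <<'SQL'
COPY events
FROM 's3://my-bucket/events/part'
CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
DELIMITER '|' GZIP;
SQL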

There are a few solutions to this.

– amazon data pipeline

This is Amazon's own toolchain, which allows you to move data from RDS & other Amazon-hosted data sources. Data Pipeline moves data in batches, not realtime. It also doesn't take care of schema changes, so you have to handle those manually.

I mentioned it in my 5 reasons to move data to Amazon Redshift

– Flydata service

Flydata is a service with a monthly subscription which will connect to your RDS database, and move the data into Redshift. This seems like a no-brainer, and given the hefty price tag of thousands per month, you'd expect it to cover your bases.

In my experience there are a lot of problems & it still requires a lot of administration. When schema changes happen, those have to be carefully applied on Redshift. What's more, there's no silver bullet for the datatype differences.

Also: Some thoughts on 12 factor apps

Flydata also makes use of the binary logs to replicate your data. Anything that doesn't show up in the binary logs is going to cause you trouble. That includes sessions where sql_log_bin=0 has been set, and statements that carry a no-logging hint. Also watch out for replicate-ignore-db options in your my.cnf. ON DELETE CASCADE will trip you up too, because those downstream changes happen via the constraint inside MySQL and never reach the binary log. But... drumroll please, Redshift doesn't support ON DELETE CASCADE anyway. In our case the child tables ended up with extra rows, and some queries broke.
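
For example, a session like this (host & table hypothetical; it requires SUPER privileges, so it applies to self-managed servers) makes changes MySQL never writes to the binary log, and a binlog-based replicator will silently miss them:

$ mysql -h mydb.example.com -u admin -p
mysql> SET SESSION sql_log_bin = 0;  -- binary logging off for this session only
mysql> DELETE FROM audit_log WHERE created_at < '2014-01-01';
mysql> SET SESSION sql_log_bin = 1;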

– scripts such as Donors choose loader

Donors Choose has open sourced their nightly Redshift loader script. It appears to reload all data each night. This will nicely sidestep the ON DELETE CASCADE problem. As you grow though you may quickly hit a point where you can’t load the entire data set each night.

Their script sources from Postgres, though I’m interested to see if it can be modified for MySQL RDS.

– Tried & failed with Tungsten replicator

Theoretically Tungsten replicator can do the above. What’s more it seems like a tool custom made for such a use case. I tried for over a month to troubleshoot. I worked closely with the team to iron out bugs. I wrote wrestling with bears or how I tamed Tungsten replicator for MySQL and then I wrote a second article Tungsten replicator the good the bad & the ugly. Ultimately I did get some data moving between MySQL RDS & Redshift, however certain data clogged the system & it wouldn’t work for any length of time.

Also: Secrets of a happy Amazon hacker or how to lock down your account with IAM and multi-factor authentication

o data types & character sets

There are a few things to keep in mind here. Redshift counts bytes, not characters, so a varchar(5) in MySQL or some other database may need to be varchar(20) in Redshift. Even then I had cases where it still didn't fit & I had to grow the field by another factor of 4.
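
A sketch of the conversion (table & cluster hypothetical), assuming the worst case of 4-byte UTF-8 characters:

$ psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d analytics <<'SQL'
-- MySQL declares VARCHAR in characters; Redshift counts bytes.
-- Worst-case UTF-8 is 4 bytes per character, so pad 4x:
CREATE TABLE users (name VARCHAR(20));   -- was VARCHAR(5) in MySQL
SQL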

I also ran into problems around string character encodings. According to the docs Redshift handles 4-byte UTF-8.

Redshift doesn't support ARRAYs, BIT, BYTEA, INTERVAL & TIME, ENUM, JSON and a bunch of others. So don't go into it expecting full Postgres support.

What you will get are multibyte characters, numeric, character, datetime, boolean and some type conversion.

Also: Is the difference between dev & ops a four-letter word?

o rebalancing

If and when you want to add nodes, expect some downtime. Yes, theoretically the database remains online while it's shipping data to the new nodes & redistributing things, but the latency can start to feel like an outage. What's more, the whole operation can easily stretch into hours.

Also: Is AWS enabling startups which enable AngelList Syndicates to boil the VC business?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 core pieces of the Amazon Cloud puzzle to get your project off the ground


One of the most common engagements I do is working with firms in and around the NYC startup sector. I evaluate AWS infrastructures & applications built in the Amazon cloud.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I've seen some patterns in customers' usage of Amazon. Below is a laundry list of the most important ones.

On our products & pricing page you can see more detail including how we perform a performance review and a sample executive summary.

1. Use automation

When you first start using Amazon Web Services to host your application, you, like many before you, may treat it like old-school hosting. Set up a machine, configure it, get your code running. The traditional model of systems administration. That's fine for a single server, but if you're managing a more complex deploy with continuous integration, or want to be resilient to regular server failures, you need to do more.

Enter the various automation tools on offer. The simplest of these is Elastic Beanstalk. If you're using a very standard stack & don't need a lot of customization, this may well work for you.

With more complex deployments you'll likely want to look at OpsWorks. Sounds familiar? That's because it *is* Opscode Chef. Everything you can do with Chef & all the templates out there will work with Amazon's offering. Let AWS manage your templates & make sure your servers are in the right state, just like hosted Chef.

If you want to get down to the assembly language layer of infrastructure in Amazon, you'll eventually be dealing with CloudFormation. This is JSON code which defines everything, from a server with an attached EBS volume, to a VPC with security rules, IAM users & everything in between. It's ultimately what these other services use under the hood.
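
As a minimal sketch (stack name & AMI id hypothetical), here's a one-server template and the call that launches it as a stack:

$ cat > web.json <<'EOF'
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "WebServer": {
      "Type": "AWS::EC2::Instance",
      "Properties": { "InstanceType": "t2.micro", "ImageId": "ami-12345678" }
    }
  }
}
EOF
$ aws cloudformation create-stack --stack-name web --template-body file://web.json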

Also: Is Amazon too big to fail?

2. Use Advisor & Alerts

Amazon has a few cool tools to help you manage your infrastructure better. One is called Trusted Advisor. It examines your AWS usage against best practices. Cost, performance, security & high availability are the big focal points.

In order to make best use of alerts, you'll want to do a few things. First, define an auto scaling group. Even if you don't want to use autoscaling, putting your instance into one allows Amazon to do the monitoring you'll want.

Next you'll want to analyze your CloudWatch metrics for usage patterns. Notice a spike? It could be a job that's running, or it could be a seasonal traffic surge that you need to manage. Once you have some ideas here, you can set alerts around both normal & problematic usage patterns.
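
As a sketch (group name & SNS topic hypothetical), here's an alert that fires when CPU sits above 80% for ten minutes:

$ aws cloudwatch put-metric-alarm \
    --alarm-name cpu-high \
    --namespace AWS/EC2 --metric-name CPUUtilization \
    --dimensions Name=AutoScalingGroupName,Value=my-asg \
    --statistic Average --period 300 --evaluation-periods 2 \
    --threshold 80 --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts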

Related: Are we fast approaching cloud-mageddon?

3. Use Multi-factor at Login

If you haven't already done so, you'll want to enable multi-factor authentication on your AWS account. This provides much more security than a password (even a sufficiently long one) ever can. You can use Google Authenticator to generate the MFA codes and associate it with your smartphone.

While you're at it, you'll want to create at least one alternate IAM user so you're not logging in through the root AWS account. This adds a layer of security to your infrastructure. Consider creating a separate account for your command line tools to spin up components in the cloud.

You can also use MFA for your command line SSH logins. This is also recommended & not terribly hard to setup.

Read: When hosting data on Amazon turns bloodsport

4. Use virtual networking

Amazon offers Virtual Private Cloud which allows you to create virtual networks within the Amazon cloud. Set your own ip address range, create route tables, gateways, subnets & control security settings.
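
Carving one out from the command line looks roughly like this (ids & CIDR ranges hypothetical):

$ aws ec2 create-vpc --cidr-block 10.0.0.0/16
$ aws ec2 create-subnet --vpc-id vpc-1a2b3c4d --cidr-block 10.0.1.0/24
$ aws ec2 create-internet-gateway
$ aws ec2 attach-internet-gateway --internet-gateway-id igw-1a2b3c4d --vpc-id vpc-1a2b3c4d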

There is another interesting offering called VPC peering. Previously, if you wanted to route between two VPCs or across the internet to your office network, you’d have to run a box within your VPC to do the networking. This became a single point of failure, and also had to be administered.

With VPC peering, Amazon can do this at the virtualization layer, without extra cost, without single point of failure & without overhead. You can even use VPC peering to network between two AWS accounts. Cool stuff!

Also: Are SQL databases dead?

5. Size instances & I/O

I worked with one startup that had been founded in 2010. They had initially built their infrastructure on AWS so they chose instances based on what was available at the time. Those were m1.large & m1.xlarge. A smart choice at the time, but oh how things evolve in the amazon world.

Now those instance types are “previous generation”. Newer instances offer SSD, more CPU & better I/O for roughly the same price. If you’re in this position, be sure to evaluate upgrading your instances.

If you’re on Amazon RDS, you may not be able to get to the newer instance sizes until you upgrade your database. Does upgrading MySQL involve much more downtime on Amazon RDS? In my experience it surely does.

Along with instance sizes, you'll also want to evaluate disk I/O options. Amazon instances are multi-tenant by default, so disk is a shared resource, and you'll see I/O throughput swing up & down dramatically. This can kill database performance & be painful to live with. The solutions cost more: consider provisioned IOPS and additional SSD storage.
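
For example (size & rate hypothetical), a volume with guaranteed I/O looks like this:

$ aws ec2 create-volume --volume-type io1 --iops 2000 --size 200 \
    --availability-zone us-east-1a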

Also: Is the difference between dev & ops a four-letter word?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Does FedRAMP formalize what good devops already do?


Amazon’s GovCloud provides a specialized region within Amazon’s global footprint of datacenters. These are hosted within the United States, and provide a subset of the full Amazon cloud functionality.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

However, hosting within GovCloud is not the whole story. Beyond this, you’ll want to implement FedRAMP compliant procedures & policies.

Are these policies new? As a seasoned systems administrator of Unix & Linux networks, you'll likely find them very familiar best practices. What FedRAMP does, however, is formalize them into a set of procedures for testing compliance. And that's a good thing.

1. Use a bastion box

A bastion box is a single point of entry for all your SSH activity. Instead of allowing SSH access to any of your servers from *anywhere* on the internet, you limit it to one box. This box is hardened with multi-factor authentication for security, only opens port 22, monitors & logs access, and funnels movement to all your other boxes. Thus you gain a virtual perimeter that you’re already familiar with in more traditional firewall setups.
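
On the client side, a minimal sketch (hostnames hypothetical): route all SSH through the bastion with a ProxyCommand, so internal boxes never see a direct connection:

$ cat ~/.ssh/config
Host bastion
    HostName bastion.example.com
    User ec2-user
Host 10.0.*
    User ec2-user
    ProxyCommand ssh -W %h:%p bastion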

Also: Ward Cunningham explains the high cost of technical debt (video)

2. Monitor & scan for vulnerabilities

Monitoring, scanning & logging are all key facilities for security management. Regular patch management of each of your servers is essential to protect against newly discovered vulnerabilities. FedRAMP also requires scanning by tools such as Nessus or Retina.

Also centralizing your authorization, access & error logs allows easy monitoring & alerting of threats & improper access attempts.
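
Centralizing can start as simply as forwarding syslog to one log host. A sketch with rsyslog (hostname hypothetical):

$ cat /etc/rsyslog.d/forward.conf
# forward everything to the central log host (@@ = TCP, @ = UDP)
*.* @@logs.internal.example.com:514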

Related: Do managers underestimate the cost of operations?

3. Policy of least privilege

The policy of least privilege is an old friend in computing & managing unix systems. It means first to eliminate all privileges (default to none) and then grant only those a user requires to do his or her work.

In Amazon it means not using the root account for provisioning infrastructure, it means a clear separation of dev, test & production environments. It limits who can access production & especially make changes there. It limits who can see sensitive data.

As well, you’ll use Access Control Lists (ACL’s) and security groups to control which servers can reach which other servers, whom on the internet can touch specific servers & ports, and so forth. These are the Amazon Cloud equivalent of perimeter security you may be familiar with in more traditional firewalls.
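
For example (group id & addresses hypothetical), SSH only from the bastion's address, HTTPS from anywhere:

$ aws ec2 authorize-security-group-ingress --group-id sg-0aa11bb2 \
    --protocol tcp --port 22 --cidr 10.0.0.5/32
$ aws ec2 authorize-security-group-ingress --group-id sg-0aa11bb2 \
    --protocol tcp --port 443 --cidr 0.0.0.0/0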

Read: When hosting data on Amazon turns bloodsport

4. Encrypt your data

If you want to be truly secure, you’ll want to encrypt your data at rest. You can do this by using encrypted filesystems in Linux. That way data is in a digital envelope, even on disk. Only when data is read into memory is it unencrypted. This provides additional insurance, because your EBS snapshots, backups & so forth are all hidden from prying eyes.
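
A minimal sketch with LUKS on an attached EBS volume (device name & mount point hypothetical):

$ cryptsetup luksFormat /dev/xvdf             # initialize the encrypted container
$ cryptsetup luksOpen /dev/xvdf securedata    # unlock; prompts for the passphrase
$ mkfs.ext4 /dev/mapper/securedata
$ mount /dev/mapper/securedata /data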

Also: Why dropbox didn’t have to fail

5. Conclusion

Amazon's GovCloud provides access to a subset of their cloud offerings: EC2, the elastic compute cloud of virtual servers; EBS, the elastic block storage that serves as their storage area network; S3 for file storage; plus VPC, IAM, RDS, Elasticache & Redshift.

FedRAMP formalizes what good systems administrators do already: secure systems, deliver reliability & high availability & protect from unauthorized entry.

Also: Is Amazon too big to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why Dropbox didn’t have to fail

dropbox outage dec 2015

Dropbox is currently experiencing a *major* outage. See the dropbox status page to get an update.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve written about outages a lot before. Are these types of major failures avoidable? Can we build better, with redundant services so everything doesn’t fall over at once?

Here’s my take.

1. Browse only mode

The first thing Dropbox can do to be more resilient is to build a browsing only mode into the application. Often we hear about this option for performing maintenance without downtime. But it's even more important during a real outage like the one Dropbox is currently experiencing.

Not if but *when* it happens, you don’t have control over how long it lasts. So browsing only can provide you with real insurance.

For a site like Dropbox it would mean that the entire website is still up and operating. Customers can browse their documents, view listings of files & download those files. However they would not be able to *add* or change files during the outage. Thus only a very small segment of customers is interrupted, and it becomes a much smaller PR problem to manage.

Facebook has experienced outages of service. People hardly notice because they’ll often only see a message when they try to comment on someone’s wall post, send a message or upload a photo. The site is still operating, but not allowing changes. That’s what a browsing only mode affords you.

A browsing only mode can make a big difference, keeping most of the site up even when transactions or publishing are blocked.

Drupal is an open source platform that powers big publishing sites like Adweek, hollywoodreporter.com & economist.com. It supports a browsing only mode out of the box. An outage like this one would only stop editors from publishing new stories temporarily. That's a huge win for sites that get 50 to 100 million, with-an-m, visitors per month.

Also: Is Amazon too big to fail

2. Redundancy

There are lots of components to a web infrastructure. Two big ones are webservers & databases. Turns out Dropbox could make both tiers redundant. How do we do it?

On the database side, you can take advantage of Amazon's RDS with either read-replicas or Multi-AZ. Each has different service characteristics, so you'll need to evaluate your app to figure out what works best.

You can also host MySQL, Percona or MariaDB directly on Amazon instances yourself & then use replication.
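
On RDS the redundancy pieces are each a single call. A sketch (instance identifiers hypothetical):

$ aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica \
    --source-db-instance-identifier mydb-master
$ aws rds modify-db-instance --db-instance-identifier mydb-master \
    --multi-az --apply-immediately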


Using redundant components, like webservers and databases in multiple regions, Dropbox could avoid a major outage like the one they're experiencing this weekend.

Wondering about MySQL versus RDS? Here are some use cases.

Now that you're using multiple zones & regions for your database, the hard work is done. Webservers can be hosted in different regions easily, and don't require complicated replication to do it.

Related: Are SQL databases dead?

3. Feature flags

On/off switches are something we're all familiar with. We have them in the fuse box in our house or apartment. And you'll also find a bigger master shutoff in the basement.

Individual on/off switches are valuable because they allow us to disable inessential features. We can build them into heavier parts of a website, allowing us to shut down features in an emergency. Host components in multiple availability zones for extra peace of mind.

Read: 5 Things toxic to scalability

4. Simian armies

Netflix has taken a more progressive & proactive approach to outages. They introduce their own! Yes, that's right: they bake redundancy & automation right into all of their infrastructure, then have a loose cannon piece of software called Chaos Monkey that periodically kills servers. Did I hear that right? Yep, it actually knocks components offline, to actively test the system for resiliency.

Take a look at the Netflix blog for details on intentional load & stress testing.

Also: When hosting data on Amazon turns bloodsport

5. Multiple clouds

If all these suggestions aren't enough for you, you could take it further and do what George Reese of enstratus recommends: use multiple cloud providers. Not being dependent on one company could help in many situations, not just the ones described here.

Basic Amazon EC2 best practices require building redundancy into your infrastructure. Virtual servers & on-demand components are even less reliable than the commodity hardware we're familiar with. Because of that, we must use Amazon's automation to insure ourselves against expected failure.

Also: Why I like Etsy’s site performance report

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Secrets of a happy Amazon hacker – IAM, MFA & locking down your account


If you’re still using a password to login to your AWS account it’s time you batten down the hatches. With a little work you can dramatically improve security.

1. install command line tools

First get ahold of the AWS command line tools. They're Python based, so you'll need the package manager "pip" first.


$ curl -O https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py
$ pip install awscli

Next configure your access key & secret key. You can edit the file below or use “$ aws configure”


$ cat .aws/credentials
[default]
aws_access_key_id = AAAAAAAAAAAAAAAABCD
aws_secret_access_key = ABcdefghijklmnop!mnors323

 

Also: Is Amazon too big to fail?

2. Create a new user

You don’t want to be using your aws root user for everything. So we’ll create a new user called “seancli”.


$ aws iam create-user --user-name "seancli"
$ aws iam create-login-profile --user-name "seancli" --password "seanpass"

Related: Did Airbnb have to fail?

3. give admin privileges

We want our new user to be able to administer things. So let’s give him administrator privileges to AWS resources. AdministratorAccess is a collection of permissions & a policy managed by AWS.

 

$ aws iam create-group --group-name "admin"
$ aws iam attach-group-policy --group-name "admin" --policy-arn "arn:aws:iam::aws:policy/AdministratorAccess"
$ aws iam add-user-to-group --group-name "admin" --user-name "seancli"

Read: When hosting data on Amazon turns bloodsport

4. Enable MFA

Now for the fun bit. Enable multi-factor authentication. This is important for really making your AWS account secure. Remember, anyone who gets into your account can delete *ALL* your infrastructure, and/or spin up servers which cost a lot of money. So a password alone is not sufficient.

MFA uses your phone (or a key fob if you like) as the second factor.

A. Install Google Authenticator
B. Login to your aws dashboard
C. Click your name menu then select “Security Credentials”

amazon security credentials

D. Open the Multi-factor section

activate amazon mfa

E. Click “activate MFA” & a QR code will display

virtual mfa device amazon

F. Open your Google Authenticator app & click (+)
G. Select scan barcode
H. Point your smartphone camera at the QR code from step E.

You’ll be asked to enter *two* consecutive six-digit sequences. Once completed, try logging in again.
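
If you prefer the command line, the same association can be done there too; a sketch, assuming the virtual device already exists (ARN & codes hypothetical):

$ aws iam enable-mfa-device --user-name seancli \
    --serial-number arn:aws:iam::123456789012:mfa/seancli \
    --authentication-code1 123456 --authentication-code2 654321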

Also: Are SQL Databases Dead?

5. Test with command line

After you’ve created your new user, you should test it to make sure you can login properly.
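
A quick sanity check might look like this (profile name hypothetical): configure a profile with the new user's keys, then make a couple of harmless calls:

$ aws configure --profile seancli            # enter the new user's access & secret keys
$ aws iam get-user --profile seancli         # should return the seancli user
$ aws ec2 describe-instances --profile seancli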

Also: 5 Reasons to move data to Amazon Redshift

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is AWS enabling AngelList to boil the VC business?


Just finished listening to Ben Thompson & James Allworth discuss how Amazon Web Services is impacting the venture capital business.

My mind is blown!

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I heard all about AngelList getting $400M from China's CSC. I didn't really understand the significance until I saw Fred Wilson's post Outsider vs Disruptor.

Are VCs nervous, I wondered?

The argument goes: as it takes less capital to get started, way more people can step in to help you get going. Startups don't need a VC from their first day.

1. Is Amazon boiling the VC frog alive?

As the story goes, if you turn up the temperature slowly, the frog won’t notice that he’s being boiled.

Ben Thompson at 9:30 in the podcast:

“I think the real enabler of this is Amazon. Back in the 90’s you had to go buy Sun servers, Oracle databases, and you had to spend hundreds of thousands if not millions of dollars. And they were all up-front costs. And that’s what Venture Capital is good for.”

Indeed, with the advent of AWS, startups can build their application in the cloud with *ZERO* upfront costs, paying only dollars per hour. This is truly a sea change.

Also: A history lesson for cloud detractors – January 2012

2. Dell’s 67 billion dollar buy of EMC

o largest acquisition in tech history
o enterprise tech & enterprise storage

At 50:36 in the podcast, James Allworth says:

“I have this mental image of what used to be this massive land mass, and all these companies fighting it out and eventually the ocean is rising, aws is rising and it’s leaving an increasingly small amount of land mass and there are fewer & fewer of them and it’s going to be very interesting to see whether any land mass remains when aws is finished with it, and i guess this DELL EMC thing, the argument is well there’s gonna be a little bit left & we’re going to take whatever it is because we’re the biggest but it remains to be seen whether there’s gonna be anything left for anyone at all”

Dell buying EMC is apparently the largest acquisition in tech history at 67 billion according to Bloomberg. That sure does say a lot about Amazon’s downward pressure & commoditization.

Though I didn’t know EMC would be bought by Dell for such a ridiculous sum, I was arguing this back in 2011 – the New commodity hardware craze .

Related: Is Amazon too big to fail?

3. Wework & the disappearing server room

Ben Thompson makes a really fascinating point at 46:30 of the podcast:

“There’s been a big shift from the valley to san Francisco all the big companies of yesteryear are in the valley and almost all of the unicorns are in san francisco, and this is also because of AWS…

You can’t afford to pay square footage for servers in San Francisco, but if your startup is only some people, a desk & some computers… suddenly it’s much more viable you have companies running businesses out of wework offices… the only reason wework can exist is because you don’t need to have servers because all the servers are housed by amazon the fundamental fabric of the silicon valley is changed because of aws”

Yet again, Amazon has impacted the valley in a huge way.

Also: Are we fast approaching cloud-mageddon?

4. Google & iphone scale


“You could make the argument that AWS is right up there with Google & right up there with the iPhone in its fundamental transformation of industry after industry.”

And while Amazon is fully enabled by Linux, and didn't invent utility computing, they have surely done more than anyone to bring it to the mainstream.

Read: When hosting data on Amazon turns bloodsport

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Does Linux tell the Gilgamesh story of hacker culture?


Is the command line still essential?
Was Stephenson right about Linux?

It’s been a while since I read Stephenson’s essay on Linux. It’s one of those pieces that’s so well written, we need to go back to it now & then.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

This quote caught my eye right away.

“…as living in a commune, where much lip service was paid to ideals of peace, love and harmony, had deprived them of normal, socially approved outlets for their control freakdom, it tended to come out in other invariably more sinister ways. Applying this to the case of Apple Computer will be left as an exercise for the reader, and not a very difficult exercise.”

Anyone who has read about Steve Jobs will chuckle at this one.

1. The Hole Hawg of the internet

When Stephenson wrote this it was 1999. Linux adoption was growing at internet startups, where cost was everything, and risks could be taken. Remember, this was before Google & Amazon had grown into the two biggest data center companies. Without Linux, neither would be where they are today!


Linux was and is today more like a Hole Hawg for the internet, powerful, but dangerous in the wrong hands. :)


“The Hole Hawg is like the genie of the ancient fairy tales, who carries out his master's instructions literally and precisely and with unlimited power, often with disastrous unforeseen consequences.”

Also: Why I like Etsy’s site performance report

2. Unix as oral history, our Gilgamesh



“Unix, by contrast is not so much a product as it is a painstakingly compiled oral history of the hacker subculture. It is our Gilgamesh. What made old epics like Gilgamesh so powerful and so long-lived was that they were living bodies of narrative that many people knew by heart, and told over and over again — making their own personal embellishments whenever it struck their fancy.”

Also: Are SQL Databases dead?

3. The bizarre Trinity Torvalds, Stallman & Gates


“In trying to understand the Linux phenomenon, then, we have to look not to a single innovator but to a sort of bizarre Trinity, Linus Torvalds, Richard Stallman and Bill Gates. Take away any of these three & Linux would not exist.”

And indeed we must thank all three of these characters for where the internet stands today. The cloud is possible because of Linux & cheap intel hardware. And the GNU free software to go along with it.

Related: Did MySQL & Mongo have a beautiful baby called Aurora?

4. On the meaning of “Open Source”


“Source files are useless to your computer, and of little interest to most users, but they are of gigantic cultural & political significance, because Microsoft & Apple keep them secret, while Linux makes them public. They are the family Jewels. They are the sort of thing that in Hollywood thrillers is used as a McGuffin: the plutonium bomb core, the top-secret blueprints, the suitcase of bearer bonds, the reel of microfilm.”

Read: When hosting data on Amazon turns bloodsport

5. What about Apple today?


“The ideal OS for me would be one that had a well-designed GUI that was easy to set up and use, but that included terminal windows where I could revert to the command line interface and run GNU software when it made sense.”

Stephenson wrote this before Apple rebuilt their OS to sit on top of Unix. And that's where we are today with Mac OS X!

Also: Are we fast approaching cloud-mageddon??

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why you should build a portable cloud


I was recently browsing Ycombinator News. It is always an endless trove of the curious & interesting.

I stumbled onto Karanbir Singh’s post The Portable Cloud. I was curious, what is that?

Join 28,000 others and follow Sean Hull on twitter @hullsean.

This story gave me warm fuzzies... I was excited in a similar way when Linux was first released. That was many years back, through the mists of time, in 1992. I had recently graduated in computer science, and one of my favorite classes was Operating Systems. We worked to build an OS following Andrew Tanenbaum's book & Minix.

After graduation, I heard about the Linux project & got excited. I was hearing whispers online that Linux could really completely replace Windows. So I bought all the parts to build a 486 tower: graphics card, motherboard, memory, IDE drives, keyboard & monitor. This ran into the thousands of dollars. Hardware wasn't cheap then! I even ordered an optical mouse, because it felt like sitting at a Sun workstation, at home!

I remember putting all this together, and loading the first floppy disk into the thing. Did I image the disks properly? Will it really load something?

Up comes the bios and sure enough it’s booting off of the floppy drive. I thought “Wow, mother of god! This is amazing!”

From there I had init running, and soon the very seat of the soul, the Unix OS itself. That felt so darn cool.

After that I'd spend weeks configuring x-windows; getting a GUI up seemed like mission impossible. And then you'd go about tweaking and rebuilding your kernel for this or that.

Thank you to Karanbir for rekindling this memory. It’s a great one!

For those starting out now as a developer, operations, or cloud site reliability engineer, I would totally recommend following Karanbir’s instructions. Here’s why!

1. Learning by building

My favorite thing about building a server myself is that there's something physical going on. You're plugging in a cable for the disk bus. Bus is no longer just a concept, but a thing you can hold. You're plugging in memory; you can look at it & say, oh, this is a chip, it's different than a disk drive. You can hold the drive and say, oh, there's a miniature little record player in there, with a magnetic arm. Cool!

Also: Is Amazon too big to fail?

2. Linux early beginnings

Another thing I remember about those days, was feeling like I was part of something big. I knew operating systems were crucial. And I knew that Windows wasn’t working. I knew it wouldn’t scale to the datacenter anyway.

I realized I wasn’t the only one to think this way. There were many others as excited as me, who were contributing code, and debug reports.

Related: Did Airbnb have to fail?

3. Debugging & problem solving

Building your own server involves a ton of debugging. In those days you had to compile all those support programs, using the GNU C compiler. You’d run make and get a whole slew of errors, and fire up your editor and get to work.

Configuring your windowing system meant figuring out where the right driver was, and also buying a graphics card that was *supported*. You would then tweak the refresh frequencies, resolution, and so forth. There was no auto detection. You could actually fry your monitor, if you set those numbers wrong!

Read: When hosting data on Amazon turns bloodsport

4. Ownership of the stack

These days you hear a lot about “fullstack engineers”. There is no doubt in my mind that this is the way to become one. Basic systems administration requires you to compile other people's software & troubleshoot it. All those developer skills will come in very handy.

Building servers forces you to see all the hardware, and how it fits together into a greater performant whole. It also gives you an appreciation for speed. Use one bus such as IDE, or another such as SCSI, and you'll experience a different performance profile. Because all that software Unix is paging in and out of memory, it's doing so by reading & writing to disk!

Also: Are SQL Databases Dead?

5. Learn to be a generalist

I've argued before that being a generalist allows you to better scale the web. I still think this, and I believe it was this formative experience, building servers from individual parts, that taught me how the larger whole fits together.

This allows me to look at problems today, and jump to causes of performance problems quickly.

Also: Does Volkswagen highlight the bloodsport in benchmarks?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

When hosting data on Amazon turns bloodsport


There’s a strong trend to automation across the cloud. That’s a great thing for startups because it reduces operational headaches & lets them focus on building products.

Join 31,000 others and follow Sean Hull on twitter @hullsean.

But as that trend begins to touch the database tier, all sorts of complications emerge. Let’s take a look at some of the tradeoffs.

1. Database as a service trend

I was recently reading Baron Schwartz’s article on the trend to database as a service.

I work with a lot of venture backed startups & pay close attention to what’s happening in New York & SF. From where I’m standing I see a similar trend. As automation simplifies management across the application stack, from load balancers to web & search servers, the same advantages are moving to database management.

Also: How to automate MySQL analysis on Amazon RDS

2. How Amazon RDS helps

Amazon’s RDS offers firms a data solution for Oracle & SQL Server as well as MySQL. For those just starting, it offers a long list of advantages.

o quick push-button deployment in minutes
o standardized parameters settings that just work
o ability to scale up or down from the dashboard
o automated backups
o multi-az so you can sleep at night

This brings a huge advantage to startups. Many have a team of developers but aren’t large enough to need an operations team and can’t afford a dedicated database administrator.

Amazon is obviously helping these firms raise the bar. And that’s a good thing.

Related: RDS or MySQL 10 use cases

3. How Amazon RDS hurts

As you get bigger, your needs will grow too. You’ll have tens of millions of customers, and with more customers comes an even higher bar. Zero downtime becomes critical. It’s then that Amazon’s solution starts to become frustrating.

Unpredictable upgrades

MySQL upgrades on RDS are a messy activity. Amazon will restart the instance, back it up, perform the upgrade, then restart again. Each of these restarts takes a few minutes. The whole operation may have you down for ten minutes. This becomes even more frustrating because your hands are completely tied. You don't know when or what will happen!

When you roll-your-own instance, an upgrade can be performed in a matter of seconds. No instance restarts are necessary and you can monitor the process to know exactly where you are. This is the kind of control you’re going to want if you have millions of customers relying on your site & uptime.

Unnecessary slow restarts

When you apply parameter changes on RDS, some require a MySQL restart. Amazon forces the whole instance to restart, stretching the downtime from a few seconds (when you roll your own) to many minutes. And while some parameters can be changed online, Amazon can provoke strange behavior that is not always predictable.

With the frequency of these types of changes, you’ll quickly grow tired and frustrated with RDS.
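
For reference, a typical change looks like this (group & value hypothetical); anything flagged pending-reboot drags in one of those full restarts:

$ aws rds modify-db-parameter-group \
    --db-parameter-group-name mydb-params \
    --parameters "ParameterName=max_connections,ParameterValue=500,ApplyMethod=immediate"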

EBS Snapshots are not portable

As mentioned above, Amazon uses its standard filesystem snapshot technology to perform backups. While this works well, it can be slow & unpredictable in a multi-tenant environment.

When you roll your own, you can take advantage of xtrabackup, and perform hot backups against your database with zero downtime. This is a real godsend. What’s more they are portable, and can be moved to any other server even ones not hosted in Amazon’s cloud!
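
Rolling your own, a hot backup is a couple of commands with Percona's tools (credentials & paths hypothetical):

$ innobackupex --user=backup --password=SECRET /data/backups/
$ innobackupex --apply-log /data/backups/2015-09-01_03-00-05/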

Promoting a read-replica is slow too!

One feature that Amazon touts is creating copies or “read replicas” of your data. These are great and can facilitate easy copying of data. However promoting these again brings unnecessary restarts which are slow.

When you roll your own, you can promote a read-replica or read-only slave in seconds. A few seconds can seem invisible to end users, while minutes will be perceived as a real outage or downtime.
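
Promoting a self-managed replica is just a few statements; a sketch (host & credentials hypothetical, details vary with your topology):

$ mysql -h replica.example.com -u admin -pSECRET <<'SQL'
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only = OFF;
SQL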

Read: Is zero downtime even possible with RDS?

4. Is migration an option?

So what to do? As I mentioned above, there are real advantages to startups deploying their first database. It really does help. I would argue for many it can be a good place to start.

If you’re starting to outgrow RDS and frustrated with the limitations, performance tuning headaches & unneeded downtime, luckily you have options.

Migrating off of RDS onto a physical server can be done in a number of ways.

o slave off of the master

Here you build a MySQL slave on a standard EC2 instance, with your RDS instance as the master. When you’re caught up, bring your site down temporarily. Reset the slave & set to read-write mode. Then point your webservers at your new EC2 instance and bring the site back up. If done carefully 10 to 20 seconds of downtime should be plenty.

Don’t forget to run through the process with a firedrill first!
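
A sketch of pointing the EC2 replica at the RDS master (endpoint, credentials & binlog coordinates all hypothetical; RDS binlogs are named mysql-bin-changelog.*):

$ mysql -h myec2-db.example.com -u admin -pSECRET <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='mydb.abc123.us-east-1.rds.amazonaws.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl-password',
  MASTER_LOG_FILE='mysql-bin-changelog.000123',
  MASTER_LOG_POS=107;
START SLAVE;
SQL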

o dump & import

Another way to move your data is mysqldump. This option is slower & brings a lot more downtime, but may be necessary in some cases.
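
In its simplest form it's a straight pipe (hosts & credentials hypothetical), with the site down for the duration:

$ mysqldump -h mydb.abc123.us-east-1.rds.amazonaws.com -u admin -pSECRET \
    --single-transaction --routines --triggers mydb \
  | mysql -h myec2-db.example.com -u admin -pSECRET mydb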

Also: 5 Reasons to move data to Amazon Redshift

5. Speed: It’s the database

Fred Wilson says speed is the number one feature of a web application. If customers are frustrated & waiting, they may leave & not come back. On the web it can be everything.

Many firms are rushing to database as a service to simplify administration. While that’s wonderful at the beginning, as you grow performance will become more of a day-to-day concern. And when it does, the database is going to be big on your list of headaches.

Web application performance inevitably involves the database and while it does, your decision to choose database as a service may come into question. Don’t be afraid to bite the bullet and manage things yourself when that time comes.

Also: Is upgrading RDS like a shit-storm that will not end?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How 1and1 failed me


I manage this blog myself. Not just the content, but also the technology it runs on. The systems & servers are from a hosting company called 1and1.com. And recently I had some serious problems.

Join 31,000 others and follow Sean Hull on twitter @hullsean.

The publishing platform, WordPress, was a few versions out of date. Because of that, some vulnerabilities surfaced.

1. Malware from Odessa

While my eyes were on content, some Russian hackers managed to scan my server &, due to the older version of WordPress, found a way to install malware onto the box. It would be invisible to most users, but was nevertheless dangerous. As a domain name with a fifteen year life, this site has some credibility among the algorithms & search engines. There's some trust there.

Google identified the malware, and emailed me about it. That was the first I was alerted in mid-August. That was a few days before I left for vacation, but given the severity of it, I jumped on the problem right away.

Also: Why I say Always be publishing

2. Heading off a lockout

I ordered up a new server from 1and1.com to rebuild. I then set to work moving over content, and completely reinstalled the latest version of wordpress.

Since it was within the old theme that the malware files had been hidden, I eliminated that whole directory & all files, and configured the blog with the newest wordpress theme.

Around that time I got some communication from 1and1. As it turns out they had been notified by google as well. Makes sense.

Given the shortage of time, and my imminent vacation, I quickly called 1and1. As always their support team was there & easy to reach. This felt reassuring. I explained the issue, how it occurred, and all the details of how the server & publishing system had been rebuilt from the ground up.

This was August 24th timeframe. As I had received emails about a potential lockout, I was reassured by the support specialist that the problem had been resolved to their satisfaction.

Read: Do managers underestimate operational cost?

3. Vacation implosion

I happily left for vacation knowing that all my hard work had been well spent.

Meantime, around August 25th, 1and1.com sent me further emails asking for “additional details”. Apparently the “I'm going on vacation” note had not made it to their security division. Another day went by, and since they received no email from me, the server was locked!

Being locked means the server is completely unreachable. Totally offline. No bueno! That's certainly frustrating, but websites do go down. What happened next was worse.

Since I use Mailchimp to host my newsletter, I write it well in advance each month. Just like clockwork the emails went out to my 1100 subscribers on September 1st. Many of those were opened & hundreds clicked on the link. And there they were, faced with a blank screen & browser. Nothing. Zilch! Offline!

Also: Why I use Airbnb chat even when texting is easier

4. The aftermath

As I return to connectivity, I begin sifting through my emails. I receive quite a few from friends and colleagues explaining that they couldn't view my newsletter. I immediately remember my conversation with 1and1, their assurances that the server wouldn't be locked out, and that all was well. I'm thinking “I bet that server got locked out anyway”. Damn it, I'm angry.

Taking a deep breath, I call up 1and1 and get on the line with a support tech. Being careful not to show my frustration, I explain the situation again. I also explain how my server was down for two weeks and how it was offline during a key moment when my newsletter goes out.

The tech is able to reach out to the security department & explain things again. Without any additional changes to my server or technical configuration, they are then able to unlock the server. Sad proof of a bureaucratic mixup if there ever was one.

Also: Is Amazon too big to fail?

5. Reflections on complexity

For me this example illustrates the complexity in modern systems. As the internet gets more & more complex, some argue that we are building a sort of house of cards. So many moving parts, so many vendors, so many layers of software & so many pieces to patch & update.

As things get more complex, there are more cracks for the hackers to exploit. And patching those up becomes ever more daunting.

Related: Are we fast approaching cloud-mageddon?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters