Category Archives: Cloud Computing

Why is everyone suddenly talking about Amazon Redshift?

par accel redshift

It seems like all I hear these days is Redshift, Redshift, Redshift!

I met up with a recruiter today. We talked about this & that. The usual. Then when he came to the topic of technology he said,

“yeah it seems as though suddenly everybody is looking for Redshift & Snowflake”

As I blogged about before, I don’t work with recruiters, I learn a lot from them.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Luckily I got to cut my teeth on Redshift about a year ago. I was senior database engineer managing Amazon & MySQL RDS, and they wanted to build a data warehouse. Bingo!

Here’s the big takeaway from my discussion today. Recruiters have their fingers on the pulse!

1. We need an Amazon expert

Here’s what else I’m hearing everywhere. “We’re migrating to AWS, can you help?” Complexity & confusion around the new virtual networking, moving into the cloud, and tuning applications & components to get the same performance as before. All of these are real & present needs for firms.

Related: Is data your dirty little secret?

2. We need a Redshift expert

Amazon bought Par Accel, a bleedingly fast warehouse. It uses SQL. It looks like Postgres, and handles petabytes. You read that petabytes! It’s so good in fact that it seems a lot of folks are now dumping Hadoop.

Incredible as that sounds, Redshift is delivering *that* kind of speed on that kind of big data. Wow! What’s more you skip the whole Hadoop cycle of write, test, debug, schedule job, fix bugs, and stir. With SQL you bring back the iterative agile process!

Read: 5 cloud challenges I’m thinking about today

3. We need a Hadoop expert

Ok, for those enterprises who aren’t sold on Redshift yet, there is still a ton of Hadoop out there. And for good reason.

Apache Spark is also getting really big now too. It’s an easier to manage successor to Hadoop, based around much of the same concepts.

Also: 5 core pieces of the Amazon cloud puzzle to get your project off the ground

4. We need strong Python skills

Python is everywhere. Amazon’s command line interface is python based. You see it everywhere. If it’s not in your wheelhouse get it there!

Also: Why Dropbox didn’t have to fail

5. We need communicators

Another interesting thing the recruiter said

“I was surprised & a little shocked that you suggested we meet for coffee. Most developers are hard to get out to have a conversation with.”

Good communicators are as in-demand as ever! Being able to and happy to talk with people who aren’t deeply technical, and distill complex technical jargon into plain english. And do that with a smile too & enjoy it?

That’s special!

Also: Should we be muddying the waters? Use cases for MySQL & Mongodb

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is data your dirty little secret?

data comparison cloud

While I was fumbling for the dictionary to figure out what polyglot persistence was, the CTO had decided to build a warehouse on Redshift.

“Everybody’s moving data there.” He declared. I looked on quizzically.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

“That’s a very new database engine”, I chimed in. “Lets do some testing”.

And there began an adventurous ride into the bleeding edge of Amazon’s new data service offerings!

1. The data scientist comes crying

Are our transactional database we were using Amazon RDS for MySQL. It’s a great managed service that eliminates some of the headaches of doing it yourself. I wrote about thisRDS or MySQL Use Cases.

We needed some way to get data over to Redshift. We evaluated AWS Data Pipeline, but it wasn’t realtime enough. In a pinch we decided on a service called Flydata. After weeks of effort to get it setup & administered we had it running smoothly.

I since discovered some pipelining solutions dedicated to Redshift such as Alooma – modern data plumbing, RJMetrics pipeline and Domo. I *did not* manage to get Tungsten working. It supports redshift on paper, but has a lot of growing up to do.

Until one data when the data scientist shows up at my desk. “We have problems in our data on redshift.”. I look back confused. “Are you sure? Can you tell me where you’re seeing that?” I respond.

Also: When hosting data on Amazon turns bloodsport

2. Deleted data reappears in Redshift!

He sends me over some queries, that I rerun myself. I see extra data in Redshift too, data that had been deleted in MySQL. Strange. We dig deeper together trying to figure out what’s happening.

We find that the tables with extra data are child tables of a parent where data was deleted. Imagine Citibank deletes a customer, they also want to delete the records for monthly bills. Otherwise those will just be hanging around, and won’t match up anymore with a parent record for a customer. In real life Citibank probably doesn’t delete like this but it’s a helpful example.

The first thing I do is open a ticket with Flydata. After all we hadn’t gotten any errors logged. Things *must* be running correctly.

After highlighting the severity of the issue, we setup a conference call with Flydata. Digging further they discover the problem. Child table data can’t get deleted on Redshift, because it doesn’t support ON DELETE CASCADE. Wait what?

Turns out Flydata makes use of the MySQL transaction log to move data. In mysql to mysql replication this works fine because downstream you also have MySQL. It also implements on delete cascade so those child records will get cleaned up correctly. Since Redshift doesn’t have this, there’s no way for Flydata to instruct Redshift what to do. Again I said, wait what?

My surprise wasn’t that a new unproven technology like Redshift had a lot of holes & missing features. My surprise was that Flydata was just silently ignoring the problem. No logged messages to tell the customer about inconsistencies. At least?

Related: Is Amazon too big to fail?

3. The problem – comparing data

As you might imagine, this is a terrible way to find out about data problems. As the person tasked with moving data between these systems, eyes were on me. My thought was, we chose a service-based solution, so they’re manage data movement. If there’s a problem, they’ll surely alert us.

From there the conversation became, ok, how do we figure out where all these data differences are? Is it widespread or isolated to a few tables? Can we mitigate it with changed queries? Cleanup on a daily basis? These are some questions that’ll immediately come to mind.

To answer them we needed a way to compare table data across different databases. This is hard to do within a homogenous environment where server versions & datatypes are likely to be the same. It is much more complicated when you’re working across heterogenous systems.

Read: 5 Reasons to move data to amazon redshift

4. Build some way to spot check data

Although this still doesn’t seem a solved problem, there are some tools. One way is to perform checksums on tables & rows. These can then be compared to find differences.

This drew me to find Jason Friedman’s
table hash script on Github. It can work across MySQL, Postgres & redshift. Pretty cool stuff if you ask me.

One problem remains. Databases are always in flux. As such you may find discrepancies based on data that hasn’t been moved yet. Data that’s just changed in the last few minutes.

If you refresh data nightly, you may for example be able to stop a slave to compare data at an instant in time.

Also: Is Redshift outpacing hadoop as the warehouse for startups?

5. The mentality: treat data as a product & monitor

Solving tough problems like these is a work in progress. What it taught me is that:

You should own your data pipeline

This allows you to be vigilant about monitoring, throw errors if data is different, and ultimately treat data as a product. Owning the pipeline will mean you can build monitoring around stages, and automate spot checks on your data.

You won’t get it perfect, but you want to know when it isn’t.

Also: 5 core pieces of the Amazon puzzle to get your project off the ground

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 tech challenges I’m thinking about today

fast fish

Technical operations & startup tech are experiencing an incredible upheaval which is bringing a lot of great things.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Here are some of the questions it raises for me.

1. Are we adopting Docker without enough consideration?

Container deployments are accelerating at a blistering pace. I was reading Julian Dunn recently, and he had an interesting critical post Are container deployments like an oncoming train?

He argues that we should be wary of a few trends. One of taking legacy applications and blindly containerizing them. Now we can keep them alive forever. :) He also argues that there is a tendency for folks who aren’t particularly technical or qualified who start evangelizing it everywhere. A balm for every ailment!

Also: Is Amazon too big to fail?

2. Is Redshift supplanting hadoop & spark for startup analytics?

In a recent blog post I asked Is Redshift outpacing hadoop as the big data warehouse for startups.

On the one hand this is exciting. Speed & agile is always good right? But what of more Amazon & vendor lock-in?

Related: Did Dropbox have to fail?

3. Does devops automation make all of operations a software development exercise?

I asked this question a while back on my blog. Is automation killing old-school operations?

Automation suites like Chef & Puppet are very valuable, in enabling the administration of fleets of servers in the cloud. They’re essential. But there’s some risk in moving further away from the bare metal, that we might weaken our everyday tuning & troubleshooting skills that are essential to technical operations.

Read: When hosting data on Amazon turns bloodsport?

4. Is the cloud encouraging the old pattern of throwing hardware at the problem?

Want to scale your application? Forget tighter code. Don’t worry about tuning SQL queries that could be made 1000x faster. We’re in the cloud. Just scale out!

That’s right with virtualization, we can elastically scale anything. Infinitely. :)

I’ve argued that throwing hardware at the problem is like kicking the can down the road. Eventually you have to pay your technical debt & tune your application.

Also: Are SQL databases dead?

5. Is Amazon disrupting venture capital itself?

I’m not expert on the VC business. But Ben Thompson & James Allworth surely are. And they suggested that because of AWS, startups can setup their software for pennies.

This resonates loud & clear for me. Why? Because in the 90’s I remember startups needing major venture money to buy Sun hardware & Oracle licenses to get going. A half million easy.

They asked Is Amazon Web Services enabling AngelList syndicates to disrupt the Venture capital business? That’s a pretty interesting perspective. It would be ironic if all of this disruption that VC’s bring to entrenched businesses, began unravel their own business!

Also: Are we fast approaching cloud-mageddon?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is Redshift outpacing Hadoop as the big data warehouse for startups?

redshift hadoop killer

More and more startups are looking at Redshift as a cheaper & faster solution for big data & analytics.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Saggi Neumann posted a pretty good side-by-side comparison of Redshift & Hadoop and concluded they were tied based on your individual use case.

Meanwhile Bitly engineering concluded Redshift was much easier.

1. More agile

One thing pointed out by the bitly blog post, which I’ve seen countless times, is the slow iteration cycle. Write your map-reduce job, run, test, debug, then run on your cluster. Wait for it to return and you might feel like you’re submitting a stack of punched cards. LOL Resolve the errors that come back and then rerun on your cluster. Over & over & over again.

With Redshift you’re writing SQL, so your iterating through syntax errors quickly. What’s more since Redshift is a column-compressed database, you can do full table scans on columns without indexes.

What that means for you and me is that queries just run. And they run blazingly fast!

Also: When hosting data on Amazon turns bloodsport

2. Cheap

Redshift is pretty darn cheap.

Saggi’s article above quotes Redshift at $1000/TB/yr for reserved, and $3700/TB/yr for on-demand. This compared with a hadoop cluster at $5000/TB/yr.

But neither will come with spitting distance of the old-world of Oracle, where customers host big iron servers in their own datacenter, paying north of a million dollars between hardware & license costs. Amazon cloud FTW!

Related: Did dropbox have to fail?

3. Even faster

Airbnb’s nerds blog has a post showing it costing 25% of a Hadoop cluster, and getting 5x performance boost. That’s pretty darn impressive!

Flydata has done benchmarks showing 10x speedup.

Read: Are SQL Databases dead?

4. SQL Toolchains

***

Also: 5 core pieces of the Amazon cloud puzzle to get your project off the ground

5. Limitations

o data loading

You load data into Redshift using the COPY command. This command reads flat files from S3 and dumps them into tables. It can be extremely fast if you do things in parallel. However getting your data into those flat files is up to you.

There are a few solutions to this.

– amazon data pipeline

This is Amazon’s own toolchain, which allows you to move data from RDS & other Amazon hosted data sources. Data pipeline does not move data realtime, but in batch. Also it doesn’t take care of schema changes so you have to do that manually.

I mentioned it in my 5 reasons to move data to Amazon Redshift

– Flydata service

Flydata is a service with a monthly subscription which will connect to your RDS database, and move the data into Redshift. This seems like a no brainer, and given the heft pricetag of thousands per month, you’d expect it to cover your bases.

In my experience there are a lot of problems & it still required a lot of administration. When schema changes happen, those have to be carefully applied on Redshift. What’s more there’s no silver bullet around the datatype differences.

Also: Some thoughts on 12 factor apps

Flydata also makes use of the binary logs to replicate your data. Anything that doesn’t show up in the binary logs is going to cause you trouble. That includes when you do sql_log_bin=0 in the session, an SQL statement includes a no logging hint. Also watch out for replicate-ignore-db options in your my.cnf. But it also will fail if you use ON DELETE CASCADE. That’s because these downstream changes happen via Constraint in MySQL. But… drumroll please, Redshift doesn’t support ON DELETE CASCADE. In our case the child tables ended up with extra rows, and some queries broke.

– scripts such as Donors choose loader

Donors Choose has open sourced their nightly Redshift loader script. It appears to reload all data each night. This will nicely sidestep the ON DELETE CASCADE problem. As you grow though you may quickly hit a point where you can’t load the entire data set each night.

Their script sources from Postgres, though I’m interested to see if it can be modified for MySQL RDS.

– Tried & failed with Tungsten replicator

Theoretically Tungsten replicator can do the above. What’s more it seems like a tool custom made for such a use case. I tried for over a month to troubleshoot. I worked closely with the team to iron out bugs. I wrote wrestling with bears or how I tamed Tungsten replicator for MySQL and then I wrote a second article Tungsten replicator the good the bad & the ugly. Ultimately I did get some data moving between MySQL RDS & Redshift, however certain data clogged the system & it wouldn’t work for any length of time.

Also: Secrets of a happy Amazon hacker or how to lock down your account with IAM and multi-factor authentication

o data types & character sets

There are a few things here to keep in mind. Redshift counts bytes, so if in mysql or some other database you had a varchar(5) it may be varchar(20) in Redshift. Even then I had cases where it still didn’t fit & I had to make the field bigger by 4.

I also ran into problems around string character encodings. According to the docs Redshift handles 4-byte UTF-8.

Redshift doesn’t support ARRAYs, BIT, BYTEA, DATE/TIME, ENUM, JSON and a bunch of others. So don’t go into it expecting full Postgres support.

What you will get are multibyte characters, numeric, character, datetime, boolean and some type conversion.

Also: Is the difference between dev & ops a four-letter word?

o rebalancing

If and when you want to add nodes, expect some downtime. Yes theoretically the database is online while it’s shipping data to the new nodes & redistributing things, the latency can start to feel like an outage. What’s more it can easily push into the hours to do.

Also: Is AWS enabling startups which enable AngelList Syndicates to boil the VC business?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 core pieces of the Amazon Cloud puzzle to get your project off the ground

amazon cloud automation

One of the most common engagements I do is working with firms in and around the NYC startup sector. I evaluate AWS infrastructures & applications built in the Amazon cloud.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve seen some patterns in customers usage of Amazon. Below is a laundry list of the most important ones.

On our products & pricing page you can see more detail including how we perform a performance review and a sample executive summary.

1. Use automation

When you first start using Amazon Web Services to host your application, you like many before you may think of it like you’re old school hosting. Setup a machine, configure it, get your code running. The traditional model of systems administration. It’s fine for a single server, but if you’re managing a more complex deploy with continuous integration, or want to be resilient to regular server failures you need to do more.

Enter the various automation tools on offer. The simplest of the three is Elastic Beanstalk. If you’re using a very standard stack & don’t need a lot of customizations, this may well work for you.

With more complex deployments you’ll likely want to look at Opsworks Sounds familiar? That’s because it *is* Opscode Chef. Everything you can do with Chef & all the templates out there will work with Amazon’s offering. Let AWS manage your templates & make sure your servers are in the right state, just like hosted chef.

If you want to get down to the assembly language layer of infrastructure in Amazon, you’ll eventually be dealing with CloudFormation. This is JSON code which defines everything, from a server with an attached EBS volume, to a VPC with security rules, IAM users & everything inbetween. It is ultimately what these other services utilize under the hood.

Also: Is Amazon too big to fail?

2. Use Advisor & Alerts

Amazon has a few cool tools to help you manage your infrastructure better. One is called Trusted Advisor . This helps you by looking at your aws usage for best practices. Cost, performance, security & high availability are the big focal points.

In order to make best use of alerts, you’ll want to do a few things. First define an auto scaling group. Even if you don’t want to use autoscaling, putting your instance into one allows amazon to do the monitoring you’ll want.

Next you’ll want to analyze your CloudWatch metrics for usage patterns. Notice a spike, could be a job that is running, or it could be a seasonal traffic spike that you need to manage. Once you have some ideas here, you can set alerts around normal & problematic usage patterns.

Related: Are we fast approaching cloud-mageddon?

3. Use Multi-factor at Login

If you haven’t already done so, you’ll want to enable multi-factor authentication on your AWS account. This provides much more security than a password (even a sufficiently long one) can ever do. You can use Google authenticator to generate the mfa codes and associated it with your smartphone.

While you’re at it, you’ll want to create at least one alternate IAM account so you’re not logging in through the root AWS account. This adds a layer of security to your infrastructure. Consider creating an account for your command line tools to spinup components in the cloud.

You can also use MFA for your command line SSH logins. This is also recommended & not terribly hard to setup.

Read: When hosting data on Amazon turns bloodsport

4. Use virtual networking

Amazon offers Virtual Private Cloud which allows you to create virtual networks within the Amazon cloud. Set your own ip address range, create route tables, gateways, subnets & control security settings.

There is another interesting offering called VPC peering. Previously, if you wanted to route between two VPCs or across the internet to your office network, you’d have to run a box within your VPC to do the networking. This became a single point of failure, and also had to be administered.

With VPC peering, Amazon can do this at the virtualization layer, without extra cost, without single point of failure & without overhead. You can even use VPC peering to network between two AWS accounts. Cool stuff!

Also: Are SQL databases dead?

5. Size instances & I/O

I worked with one startup that had been founded in 2010. They had initially built their infrastructure on AWS so they chose instances based on what was available at the time. Those were m1.large & m1.xlarge. A smart choice at the time, but oh how things evolve in the amazon world.

Now those instance types are “previous generation”. Newer instances offer SSD, more CPU & better I/O for roughly the same price. If you’re in this position, be sure to evaluate upgrading your instances.

If you’re on Amazon RDS, you may not be able to get to the newer instance sizes until you upgrade your database. Does upgrading MySQL involve much more downtime on Amazon RDS? In my experience it surely does.

Along with instance sizes, you’ll also want to evaluate disk I/O options. By default instances in amazon being multi-tenant, use disk as a shared resource. So they’ll see it go up & down dramatically. This can kill database performance & can be painful. There are expensive solutions. Consider looking at provisioned IOPS and additional SSD storage.

Also: Is the difference between dev & ops a four-letter word?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Does FedRAMP formalize what good devops already do?

fedramp-logo

amazon-govcloud

Amazon’s GovCloud provides a specialized region within Amazon’s global footprint of datacenters. These are hosted within the United States, and provide a subset of the full Amazon cloud functionality.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

However, hosting within GovCloud is not the whole story. Beyond this, you’ll want to implement FedRAMP compliant procedures & policies.

Are these policies new? As a seasoned systems administrator of Unix & Linux networks, you’ll likely find these very familiar best practices. What they do however, is formalize those into a set of procedures for testing compliance. And that’s a good thing.

1. Use a bastion box

A bastion box is a single point of entry for all your SSH activity. Instead of allowing SSH access to any of your servers from *anywhere* on the internet, you limit it to one box. This box is hardened with multi-factor authentication for security, only opens port 22, monitors & logs access, and funnels movement to all your other boxes. Thus you gain a virtual perimeter that you’re already familiar with in more traditional firewall setups.

Also: Ward Cunningham explains the high cost of technical debt (video)

2. Monitor & scan for vulnerabilities

Monitoring, scanning & logging are all key facilities for security management. Regular patch management of each of your servers, is essential to protect from newly discovered vulnerabilities. FedRAMP also requires scanning by tools such as Nessus or Retina.

Also centralizing your authorization, access & error logs allows easy monitoring & alerting of threats & improper access attempts.

Related: Do managers underestimate the cost of operations?

3. Policy of least privilege

The policy of least privilege is an old friend in computing & managing unix systems. It means first to eliminate all privileges (default to none) and then grant only those a user requires to do his or her work.

In Amazon it means not using the root account for provisioning infrastructure, it means a clear separation of dev, test & production environments. It limits who can access production & especially make changes there. It limits who can see sensitive data.

As well, you’ll use Access Control Lists (ACL’s) and security groups to control which servers can reach which other servers, whom on the internet can touch specific servers & ports, and so forth. These are the Amazon Cloud equivalent of perimeter security you may be familiar with in more traditional firewalls.

Read: When hosting data on Amazon turns bloodsport

4. Encrypt your data

If you want to be truly secure, you’ll want to encrypt your data at rest. You can do this by using encrypted filesystems in Linux. That way data is in a digital envelope, even on disk. Only when data is read into memory is it unencrypted. This provides additional insurance, because your EBS snapshots, backups & so forth are all hidden from prying eyes.

Also: Why dropbox didn’t have to fail

5. Conclusion

Amazon’s GovCloud provides access to a subset of their cloud offerings including EC2 their elastic compute cloud virtual servers, EBS the elastic block storage their own storage area network, S3 for file storage, VPC, IAM, RDS, Elasticache & Redshift.

FedRAMP formalizes what good systems administrators do already. Secure systems, deliver reliability & high availability & protect from unauthorized entry.

Also: Is Amazon too big to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why Dropbox didn’t have to fail

dropbox outage dec 2015

Dropbox is currently experiencing a *major* outage. See the dropbox status page to get an update.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve written about outages a lot before. Are these types of major failures avoidable? Can we build better, with redundant services so everything doesn’t fall over at once?

Here’s my take.

1. Browse only mode

The first thing Dropbox can do to be more resilient is to build a browsing only mode into the application. Often we hear about this option for performaing maintenance without downtime. But it’s even more important during a real outage like Dropbox is currently experiencing.

Not if but *when* it happens, you don’t have control over how long it lasts. So browsing only can provide you with real insurance.

For a site like Dropbox it would mean that the entire website is still up and operating. Customers can browse their documents, view listings of files & download those files. However they would not be able to *add* or change files during the outage. Thus only a very small segment of customers is interrupted, and it becomes a much smaller PR problem to manage.

Facebook has experienced outages of service. People hardly notice because they’ll often only see a message when they try to comment on someone’s wall post, send a message or upload a photo. The site is still operating, but not allowing changes. That’s what a browsing only mode affords you.

A browsing only mode can make a big difference, keeping most of the site up even when transactions or publish are blocked.

Drupal is an open source platform that powers big publishing sites like Adweek, hollywoodreporter.com & economist.com. It supports a browsing only mode out of the box. An outage like this one would only stop editors from publishing new stories temporarily. It would be a huge win to sites that get 50 to 100 million with-an-m visitors per month.

Also: Is Amazon too big to fail

2. Redundancy

There are lots of components to a web infrastructure. Two big ones are webservers & databases. Turns out Dropbox could make both tiers redundant. How do we do it?

On the database side, you can take advantage of Amazon’s RDS & either read-replicas or Multi-AZ. Each have different service characteristics, so you’ll need to evaluate your app to figure out what works best.

You can also host MySQL, Percona or Mariadb direclty on Amazon instances yourself & then use replication.


Using redundant components like placing webservers and databases in multiple regions, Dropbox could avoid a major outage like they’re experiencing this weekend.

Wondering about MySQL versus RDS? Here are some uses cases.

Now that you’re using multiple zones & regions for your database the hard work is completed. Webservers can be hosted in different regions easily, and don’t require complicated replication to do it.

Related: Are SQL databases dead?

3. Feature flags

On/off switches are something we’re all familiar with. We have them in the fuse box in our house or apartment. And you’ll also find a bigger larger shutoff in the basement.

Individual on/off switches are valuable because they allow us to disable inessential features. We can build them into heavier parts of a website, allowing us to shutdown features in an emergency. Host components in multiple availability zones for extra piece of mind.

Read: 5 Things toxic to scalability

4. Simian armies

Netflix has taken a more progressive & proactive approach to outages. They introduce their own! Yes that’s right they bake redundancy & automation right into all of their infrastructure, then have a loose canon piece of software called Chaos Monkey that periodically kills servers. Did I hear that right? Yep it actually nocks components offline, to actively test the system for resiliency.

Take a look at the Netflix blog for details on intentional load & stress testing.

Also: When hosting data on Amazon turns bloodsport

5. Multiple clouds

If all these suggestions aren’t enough for you, taking it further you could do what George Reese of enstratus recommends and use multiple cloud providers. Not being dependant on one company could help in many situations, not just the ones described here.

Basic Amazon EC2 best practices require building redundancy into your infrastructure. Virtual servers & on-demand components are even less reliable than commodity hardware we’re familiar with. Because of that, we must use Amazon’s automation to insure us against expected failure.

Also: Why I like Etsy’s site performance report

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Secrets of a happy Amazon hacker – IAM, MFA & locking down your account

aws logo

If you’re still using a password to login to your AWS account it’s time you batten down the hatches. With a little work you can dramatically improve security.

1. install command line tools

First get ahold of the aws comand line tools. They’re python based so you’ll need the package manager “pip” first.


$ curl -O https://bootstrap.pypa.io/get-pip.py
$ pip install awscli

Next configure your access key & secret key. You can edit the file below or use “$ aws configure”


$ cat .aws/credentials
[default]
aws_access_key_id = AAAAAAAAAAAAAAAABCD
aws_secret_access_key = ABcdefghijklmnop!mnors323

 

Also: Is Amazon too big to fail?

2. Create a new user

You don’t want to be using your aws root user for everything. So we’ll create a new user called “seancli”.


$ aws create-user --user-name "seancli"
$ aws iam create-login-profile --user-name "seancli" --password "seanpass"

Related: Did Airbnb have to fail?

3. give admin privileges

We want our new user to be able to administer things. So let’s give him administrator privileges to AWS resources. AdministratorAccess is a collection of permissions & a policy managed by AWS.

 

$ aws iam create-group –group-name “admin”
$ aws iam attach-group-policy –group-name “admin” –policy-arn “arn:aws:iam::aws:policy/AdministratorAccess”
$ aws iam add-user-to-group –group-name “admin” –user-name “seancli”

Read: When hosting data on Amazon turns bloodsport

4. Enable MFA

Now for the fun bit. Enable multi-factor authentication. This is important for really making your aws account secure. Remember anyone who gets into your account can delete *ALL* your infrastructure, and/or spinup servers which cost a lot of money. So just a password alone is not sufficient.

MFA uses your phone (or a key fob if you like) as the second factor.

A. Install Google Authenticator
B. Login to your aws dashboard
C. Click your name menu then select “Security Credentials”

 

amazon security credentials

 

 

 

 

 

 

D. Open the Multi-factor section

 

activate amazon mfa

 

 

 

 

 

 

E. Click “activate MFA” & a QR code with display

 

virtual mfa device amazon

 

 

 

 

F. Open your Google Authenticator app & click (+)
G. Select scan barcode
H. Point your smartphone camera at the QR code from step E.

You’ll be asked to enter *two* consecutive six-digit sequences. Once completed, try logging in again.

Also: Are SQL Databases Dead?

5. Test with command line

After you’ve created your new user, you should test it to make sure you can login properly.

Also: 5 Reasons to move data to Amazon Redshift

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is AWS enabling AngelList to boil the VC business?

aws logo

Just finished listening to Ben Thompson & James Allworth discuss how Amazon Web Services is impacting the venture capital business.

My mind is blown!

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I heard all about AngelList getting 400m from China’s csc. I didn’t really understand the significance until I saw Fred Wilson’s post Outsider vs Disruptor.

Are VCs nervous, I wondered?

The argument goes, as it takes less capital to get started, way more people can step in to help you get going. Startups don’t need a VC from their first day.

1. Is Amazon boiling the VC frog alive?

As the story goes, if you turn up the temperature slowly, the frog won’t notice that he’s being boiled.

Ben Thompson at 9:30 in the podcast:

“I think the real enabler of this is Amazon. Back in the 90’s you had to go buy Sun servers, Oracle databases, and you had to spend hundreds of thousands if not millions of dollars. And they were all up-front costs. And that’s what Venture Capital is good for.”

Indeed, with the advent of AWS, startups can build their application in the cloud with *ZERO* upfront costs, and only dollars per hour. This is truly a seachange.

Also: A history lesson for cloud detractors – January 2012

2. Dell’s 67 billion dollar buy of EMC

o largest acquisition in tech history
o enterprise tech & enterprise storage

At 50:36 in the podcast, James Allworth says:

“I have this mental image of what used to be this massive land mass, and all these companies fighting it out and eventually the ocean is rising, aws is rising and it’s leaving an increasingly small amount of land mass and there are fewer & fewer of them and it’s going to be very interesting to see whether any land mass is remains when aws is finished with it, and i guess this DELL EMC thing, the argument is well there’s gonna be a little bit left & we’re going to take whatever it is because we’re the biggest but it remains to be seen whether there’s gonna be anything left for anyone at all”

Dell buying EMC is apparently the largest acquisition in tech history at 67 billion according to Bloomberg. That sure does say a lot about Amazon’s downward pressure & commoditization.

Though I didn’t know EMC would be bought by Dell for such a ridiculous sum, I was arguing this back in 2011 – the New commodity hardware craze .

Related: Is Amazon too big to fail?

3. Wework & the disappearing server room

Ben Thompson makes a really fascinating point at 46:30 of the podcast:

“There’s been a big shift from the valley to san Francisco all the big companies of yesteryear are in the valley and almost all of the unicorns are in san francisco, and this is also because of AWS…

You can’t afford to pay square footage for servers in San Francisco, but if your startup is only some people, a desk & some computers… suddently it’s much more viable you have companies running businesses out of wework offices… the only reason wework can exist is because you don’t need to have servers because all the servers are housed by amazon the fundamental fabric of the silicon valley is changed because of aws”

Yet again, Amazon has impacted the valley in a huge way.

Also: Are we fast approaching cloud-mageddon?

4. Google & iphone scale


“You could make the argument that AWS is right up there with Google & right up there with the iPhone in it’s fundamental transformation of industry after industry.”

And while Amazon is fully enabled by Linux, and didn’t invent utility computing, they have surely

Read: When hosting data on Amazon turns bloodsport

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Does Linux tell the Gilgamesh story of hacker culture?

stephenson command line

Is the command line still essential?
Was Stephenson right about his Linux

It’s been a while since I read Stephenson’s essay on Linux. It’s one of those pieces that’s so well written, we need to go back to it now & then.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

This quote caught my eye right away.

“…as living in a commune, where much lip service was paid to ideals of peace, love and harmony, had deprived them of normal, socially approved outlets for their control freakdom, it tended to come out in other invariably more sinister ways. Applying this to the case of Apple Computer will be left as an exercise for the reader, and not a very difficult exercise.”

Anyone who has read about Steve Jobs will chuckle at this one.

1. The Hole Hawg of the internet

When Stephenson wrote this it was 1999. Linux adoption was growing at internet startups, where cost was everything, and risks could be taken. Remember this was before the two biggest data center companies even existed, namely Google & Amazon. Without Linux, neither would be here today!

hole hawg power

Linux was and is today more like a Hole Hawg for the internet, powerful, but dangerous in the wrong hands. :)


“The Hole Hawg is like the genie of the ancient fairy tales, who carries out his masters instructions literally and precisely and with unlimited power, often with disasterous unforseen consequences.”

Also: Why I like Etsy’s site performance report

2. Unix as oral history, our Gilgamesh

gilgamesh unix


“Unix, by contrast is not so much a product as it is a painstakingly compiled oral history of the hacker subculture. It is our Gilgamesh. What made old epics like Gilgamesh so powerful and so long-lived was that they were living bodies of narrative that many people knew by heart, and told over and over again — making their own personal embellishments whenever it struck their fancy.”

Also: Are SQL Databases dead?

3. The bizarre Trinity Torvalds, Stallman & Gates


“In trying to understand the Linux phenomenon, then, we have to look not to a single innovator but to a sort of bizarre Trinity, Linus Torvalds, Richard Stallman and Bill Gates. Take away any of these three & Linux would not exist.”

And indeed we must thank all three of these characters for where the internet stands today. The cloud is possible because of Linux & cheap intel hardware. And the GNU free software to go along with it.

Related: Did MySQL & Mongo have a beautiful baby called Aurora?

4. On the meaning of “Open Source”


“Source files are useless to your computer, and of little interest to most users, but they are of gigantic cultural & political significance, because Microsoft & Apple keep them secret, while Linux makes them public. They are the family Jewels. They are the sort of thing that in Hollywood thrillers is used as a McGuffin: the plutonium bomb core, the top-secret blueprints, the suitcase of bearer bonds, the reel of microfilm.

Read: When hosting data on Amazon turns bloodsport

5. What about Apple today?


“The ideal OS for me would be one that had a well-designed GUI that was easy to set up and use, but that included terminal windows where I could revert to the command line interface and run GNU software when it made sense.”

Stephenson wrote this before Apple has rebuilt their OS to sit on top of Unix. And that’s where we are today with Mac OS X!

Also: Are we fast approaching cloud-mageddon??

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters