Category Archives: All

How to deploy on Amazon EC2 with Vagrant

vagrant logo

Join 16,000 others and follow Sean Hull on twitter @hullsean.

Why do I want Vagrant?

Vagrant is a really powerful tool for managing virtual machines. If you’re a developer it can make it push-button simple to set up a dev box on your laptop. It manages the images, and uses configuration files to describe the specifics of your machines.

In the Amazon environment, you can deploy machines just as easily as on your desktop. That’s pretty exciting for those of us already familiar with Vagrant. With that in mind, I’ve provided a simple 7 step howto for doing just that!

Also: Are SQL Databases Dead?

1. Use the Mac OS X installer

Fetch your download file here:

Vagrant Installer Downloads

Run the installer. It should do the right thing!

Also: Why Oracle Won’t Kill MySQL

2. Install the vagrant-aws plugin


$ vagrant plugin install vagrant-aws

Also: Bulletproofing MySQL Replication with Checksums

3. Fetch a vagrant box image

Box images vary depending on your “provider”, which is Vagrant-speak for the environment you’re running in. For AWS, they’re just a few simple JSON files that tell Vagrant how to work in that environment.

The creator of the plugin has provided a dummy box. Let’s fetch it:


$ vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box

This command is straight out of the readme. What does it do? Take a look:


$ cd /var/root/.vagrant.d/boxes/dummy/aws

$ cat metadata.json
{
"provider": "aws"
}

There’s also the info.json file which looks like this:


$ cat info.json
{"url":"https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box","downloaded_at":"2014-01-14 17:42:33 UTC"}

There’s not a whole lot going on here. If you’re deploying VirtualBox VMs with Vagrant, you’d see a VMware4 disk image. But Amazon stores its own AMIs, so there’s no disk image for Vagrant to download. It simply asks EC2 to launch an instance from the AMI you choose.

Related: Intro to EC2 Cloud Deployments

4. Configure Vagrantfile

Create a directory to hold your vagrant metadata. This would be the name of your machine:


$ cd /var/root
$ mkdir testaws
$ cd testaws
$ vagrant init

Edit the file as follows:


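Vagrant.configure("2") do |config|
  # The placeholder box added in step 3; the real image is the AMI below.
  config.vm.box = "dummy"

  config.vm.provider :aws do |aws, override|
    # Placeholder credentials; substitute your own and keep them private.
    aws.access_key_id = "AAAAIIIIYYYY4444AAAA"
    aws.secret_access_key = "c344441LooLLU322223526IabcdeQL12E34At3mm"
    # Name of the EC2 key pair to install on the instance.
    aws.keypair_name = "iheavy"

    # AMI to launch; this determines the OS.
    aws.ami = "ami-7747d01e"

    # Login user and the private key matching the key pair above.
    override.ssh.username = "ubuntu"
    override.ssh.private_key_path = "/var/root/iheavy_aws/pk-XHHHHHMMMAABPEDEFGHOAOJH1QBH5324.pem"
  end
end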

If you’re familiar with the Amazon command line tools, you’ve probably set up environment variables for these already. Otherwise they may not be familiar to you, so let’s go through them:

Your access_key_id and secret_access_key are two pieces of information Amazon uses to identify your account, authenticate your requests, and bill you. Those are unique to your environment so keep them close to the vest. Here’s how you create them or find them on your AWS dashboard.

The keypair_name is the name of your SSH key pair in EC2. You may already have a key on your laptop which you use to access other servers. If so you can upload its public half to the Amazon environment. If not you can use the dashboard to create a new one. Whenever you spin up a server, you can instruct Amazon to drop that key on the box in the right place. Then you’ll have secure command line access to the box, without a password. Great for automation!

Next is your AMI. This is an important choice, as it determines the OS of the machine you’ll spin up, and many other characteristics. You can go with an Amazon Linux AMI but I quite like the Alestic ones from Eric Hammond. Trusted & reliable.

Looking for an Ubuntu AMI? Try this AMI locator tool.

Check this: 8 Best Practices for Deploying MySQL on AWS

5. Start up the box

Starting an instance once you’ve configured your Vagrantfile is pretty straightforward.


$ vagrant up --provider=aws

Related: How to autoscale MySQL on Amazon EC2

6. Verify in the Amazon dashboard

Jump over to your Amazon dashboard with this link. If you’re logged in already, that will take you to your EC2 instances. You should see a new one, based on the parameters in your Vagrantfile.

Read: Why devops talent is in short supply

7. Log in to your Amazon instance

Last but not least, you’ll want to log in. Note I’m explicitly specifying my SSH key here. Your path may vary…


$ ssh -i ./iheavy.pem ubuntu@ec2-50-220-50-40.compute-1.amazonaws.com

Also: 5 more things deadly to scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why cloud computing is the spotify-cation of hosting

dvd collection

1. Music collections of old

Way way back in the 70’s I remember riding around in a VW Beetle. Maybe I’d be driving with my dad or my uncle. Everybody seemed to own a VW! What everybody also had was a huge collection of 8-track tapes in a big box. You’d dig through the box and find what you wanted to play, then pop in the tape. It was exciting because before 8-tracks you only had records, and you couldn’t play those in the car!

But even record collections were new in the 60’s. Before that, most music was consumed live or on the radio.

Also: Why a killer title can make or break your content efforts

2. When books left the library

A similar trend followed for books and reading. Although newspapers have been sold by subscription for a lot longer, books were mostly consumed in libraries. But the consumer itch to build collections eventually built Barnes & Noble into a powerhouse brick and mortar store.

Internet disruption of that business model came too. Enter Amazon’s Kindle. Although you theoretically *buy* digital books, if you read the fine print you’ll see you actually rent them in perpetuity. In fact there have been cases where Amazon has reached into devices and removed previously purchased media.

Related: Why AirBNB didn’t have to fail

3. Managing collections (even stolen ones) is hard work

When you download music or movies, either from iTunes or god forbid by grabbing them off of BitTorrent networks, you need to put them somewhere. You’ll store them on your laptop hard drive or, if your collection is large enough, on some shared storage system at home. And you’ll probably never back any of it up.

The thing is hard drives themselves have a life of about two to four years. As an operations guy I manage data every day. Backups are a big part of that process, so that when the media fails, you don’t lose the collection of movies & music you built lovingly over so many years.

Sadly most people learn the hard way. And when you learn this lesson you probably think, where did all that time go? What did I even *have* in my collection?

Also: Are SQL Databases Dead?

4. Why music & movie theft was just a blip on the historical radar

I’m also a bit of a Doctor Who fan. Since it’s a rather obscure British TV show (or was) I spent some time buying many of the old episodes on DVD. Or I *did* rather, until Netflix started offering the whole classic collection on subscription. They did this with Star Trek too. Now I have no reason to fish through my shelves for a DVD. Why would I?

As users become more accustomed to the subscription model, they’re less likely to want to build a whole collection of media. This goes as well for books, music & videos. Who would bother downloading off of BitTorrent, managing a home collection, and all that trouble when you can just subscribe? Easy. No mess!

Read: Why Oracle Won’t Kill MySQL

5. Subscriptions, subscriptions everywhere!

Whether you managed a datacenter of physical servers in-house, or bought servers managed by a hosting company, before the subscription model you had to worry about moving parts. You had to worry about failing hard drives, memory & all the rest.

Then along came Amazon Web Services and its EC2 servers, bringing the subscription model to hosting too. It takes care of the component that fails most, hard drives, by putting all data on EBS, their virtual storage network. All of this raises the bar for a lot of organizations and reduces the drudgery.

What Spotify is doing for music, Netflix is doing for movies & tv shows, and Kindle is doing for books. That same trend has brought great disruption to internet & server hosting. Startups and consumers win big in this game.

Can you think of any businesses where a subscription model might work? They may be ripe for disruption by a new startup.

Check out: Why your startup is failing at Devops

Are SQL Databases Dead?

mesa verde city

I like the image of this city of Mesa Verde. It’s fascinating to see how ancient cities were built, especially as an inhabitant of one of the world’s largest cities today, New York.

I’m a long time relational database guy. I worked at scores of dot-coms in the 90’s as an old-guard Oracle DBA, and pivoted to MySQL in the new century. Would a guy like me who’s seen 20 years of relational database dominance really believe they could be dying?

There’s a lot to be excited about in this new realm of db, and some interesting bigger trends that are pushing things in a new way.

1. Growing use of ORMs

ORM probably sounds like some strange fossil archeologists just dug up in the ancient city of Mesa Verde. But they’re important. You may know them by their real-life names: Hibernate, Active Record, SQLAlchemy and CakePHP. There are many others. Object-relational mappers provide a middleware between developers and the SQL of your chosen relational database. They abstract away the nitty gritty, and encapsulate it into a library.

In a way they’re like code generators. Markus Winand talks about them in SQL Performance Explained, warning of the “eager fetching” problem. This is DBA speak for specifying all columns (SELECT *) or fetching all rows, when you don’t need them all. It’s inefficient because you ask the database to read & cache all that data, send it across the network, and then discard it on the webserver side. Like a lazy housekeeper, you let the clutter & dust grow until it overwhelms you.
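To make the eager fetching cost concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The articles table, its columns and the payload_size helper are made up for illustration, and an in-memory SQLite database stands in for whatever database your ORM talks to.

```python
import sqlite3

# In-memory database with a hypothetical table that has one bulky column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [(f"Post {i}", "x" * 10_000) for i in range(100)],
)


def payload_size(rows):
    """Rough size of a result set: total characters across all columns."""
    return sum(len(str(col)) for row in rows for col in row)


# Eager: every column of every row, even though a listing page needs
# only ten titles.
eager = conn.execute("SELECT * FROM articles").fetchall()

# Lean: just the columns and rows the page will actually display.
lean = conn.execute("SELECT id, title FROM articles LIMIT 10").fetchall()

print("eager:", len(eager), "rows,", payload_size(eager), "chars")
print("lean: ", len(lean), "rows,", payload_size(lean), "chars")
```

Most ORMs let you ask for just the columns and rows you need. The point is to check what SQL they actually generate on your behalf.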

Martin Fowler is co-author of the great book NoSQL Distilled. He tries to walk the fence in his post ORM Hate, trying to balance developers’ love of ORMs and the obvious need for scalability. Ted Neward calls ORMs the Vietnam of Computer Science.

Mattias Geniar points out that bad ORMs are infinitely worse than bad SQL, and Drewsky makes The Case Against ORM Frameworks over on High Scalability.

If you agree the ORM conversation is still a huge mess, you’ll be excited to know that NoSQL sidesteps it completely. NoSQL databases are built out of the box to interface more like data structures than rows and columns. So you eliminate the scalability problems ORMs introduce when you go NoSQL. That makes developers happy, and pleases DBAs and techops too. Win!

Read: Why Oracle won’t kill MySQL

2. Widening field of options

NoSQL databases are not simply key value stores, though some like Memcache and Riak do fit that mold.

MongoDB offers configurable consistency & durability & the advantages of document storage, so there’s no need for an ORM here. You also have a mix of indexing options that go a little deeper than other NoSQL solutions. It’s a sort of middle ground solution that offers the best of both worlds.

Cassandra is a powerful db that is clustered out of the box. All nodes are writeable, and there are various ways to handle conflict resolution to suit your needs. Cassandra can grow big, and naturally takes advantage of cloud nodes. It also has a nice feature to naturally age out data, based on settings you control. No more monumental archiving jobs.

HBase is the database part of Hadoop, based on Google’s seminal Bigtable paper.

Redis is another option with growing popularity. It’s a key-value store, but one that allows more complex data in its buckets, such as hashes, lists, sets and sorted sets. Developers should be salivating at this one.

Also: 5 Great Things about Markus Winand’s Book SQL Performance Explained

3. Lowering bar

The old world of relational databases treats data as sacrosanct. DBAs are tasked with protecting its integrity & consistency. They manage backups to protect against disaster. In this world, every bit of data written is as sacred as any other, whether it’s your bank account balance, or a comment added to a facebook discussion.

But modern non-relational databases introduce the idea of eventual consistency. DBAs and architects would say we are relaxing our consistency and durability requirements. What they mean is data can get slightly out of sync and we’re ok with that. We’ll build our web applications to plan for that, or even, in the case of Riak, expose the levers of durability directly to the developers, allowing them to make some changes instant, while leaving others more lax and lazy.

Check this: Why high availability is so very hard to deliver

4. Cloud demands

Virtualized environments like Amazon EC2 give easy access to legions of servers. Availability zones & regions only widen the deployment options. So deploying a single writeable master, the way traditional relational databases work best, is not natural.

Databases like Cassandra, Mongo & Redis are clustered right out of the box. They grew up in this virtual datacenter environment and feel comfortable there.

Related: Why I wrote the book on Oracle & Open Source

5. Only DBAs understand them

Devs may whine at this statement, and to be fair it’s a generalization. The popularity of ORMs speaks volumes here. Anything to eliminate the dreaded SQL writing. Meanwhile DBAs bemoan the use of ORMs for they represent everything they’re trying to fix.

SQL is hard enough, but the ugly truth is each database vendor has their own implementation, their own optimizations, their own optimal tweaks. Even between database versions, SQL code may not perform consistently.

Identifying slow SQL and tweaking it remains one of the primary tasks of performance tuning, for this reason. It hasn’t changed much in my two decades on the job.

Also: Why bemoaning AWS performance sounds like Linux detractors circa 1999

How to do technical assessment for a merger acquisition

tsmitty11 big eats small fish

My newsletter goes out the first few days of every month. I invite everyone I’ve ever worked with, so many of my present and former colleagues & clients receive it. I often receive a few emails in the days following, with requests for help, advice & expertise.

Recently I was referred on a merger acquisition project. The project involved evaluating the technology of the target company, to understand how it would fit in with existing infrastructure, and what challenges they might face.

1. What’s your tech stack?

The scope of the project included evaluating existing application code. We looked closely at:

o What programming languages & versions were in use?
o What volume of code was there, when was it built, and how was it maintained?
o Was the code well commented & organized?
o Are those languages or technologies popular in the marketplace today?
o Are the foundation technologies still supported, either commercially or by open source communities?

Related: When you’re hired to solve a people problem

2. What’s your team stack?

One of the points that came up during the discovery phase was subject matter expertise. We found that the acquiring company’s DNA was around Windows, SQL Server & IIS. Their applications had been developed mostly in C#, so their team had skills around that stack.

The target company, on the other hand, was Oracle on Unix. Here the team & expertise had a different heritage.

These two stacks have very different people & cultures behind them. Those would likely introduce unique challenges were you to merge those two firms, as the former would not have skills and expertise to manage & maintain the latter. Trimming teams down, or consolidating hardware & components would likely prove challenging.

Also: A CTO must never do this

3. Where are your pain points?

We also evaluated current problem areas in the target application.

o Where were team members struggling most?
o What type of performance problems existed?
o Was there data mismatch or redundancy?
o Were business units struggling in some area to report on the data they needed?

Read this: When you have to take the fall

4. Style of software development

In software development, the traditional building model is called waterfall. It involves speccing out requirements, spending a long period writing code, and then releasing it all at once to be tested & deployed. Typically during the development period, there isn’t a working version of the system, as it’s undergoing change.

The risk with waterfall is that what comes out the other end won’t be what the feature specification teams envisioned. Worse still is when the resulting product is full of bugs, or has major performance problems. The healthcare.gov website built for the ACA was developed this way, resulting in a whole host of problems on launch. It’s why I’ve advocated for real techops at Healthcare.gov.

These days, the modern and arguably superior model is called agile software development. This involves writing code in much, much smaller chunks, and for each one writing a small test to verify that it performs its function. These unit tests, run automatically on every change, allow software to be continuously deployed, perhaps multiple times per day.
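As a rough illustration of what one of those small tests looks like, here is a sketch in Python. The normalize_zip helper is hypothetical, invented for this example, and the tests are plain functions of the kind a runner such as pytest would pick up and run on every change.

```python
def normalize_zip(raw: str) -> str:
    """Hypothetical helper: trim whitespace and keep the 5-digit ZIP prefix."""
    digits = raw.strip().split("-")[0]
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"not a US ZIP code: {raw!r}")
    return digits


def test_normalize_zip_accepts_zip_plus_four():
    # A ZIP+4 with stray whitespace should come back as the plain 5 digits.
    assert normalize_zip(" 10001-1234 ") == "10001"


def test_normalize_zip_rejects_garbage():
    # Anything that is not five digits should be refused, not passed along.
    try:
        normalize_zip("abcde")
    except ValueError:
        return
    raise AssertionError("expected ValueError for a non-numeric ZIP")
```

Each test covers one small behavior, so when a change breaks something the failing test points straight at it.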

Check this: How we do a performance review

5. Legacy or Open Source

A last point of evaluation is the use of legacy software components such as Oracle, versus the more nimble and open source components many internet firms use, such as Linux, Apache, MySQL & PHP or Python. These latter components make up the stack of many modern web facing applications, are supported by the internet community, and provide flexibility & configurability in the cloud.

Fred Wilson has advocated for Open Source in various postings on his A VC blog. Although these technologies offer great opportunity & strength, your team may not have experience bushwhacking through the DIY world of open source and that would be a major consideration for a merger.

The DNA and culture of the acquiring company and the target company have a huge impact on whether those technologies will all play well together as the firm grows up.

Also: A sample executive summary we did for Acme Startup, Inc.

5 great things about Markus Winand’s book SQL Performance Explained

markus winand sql performance explained book

1. Covers databases broadly

You may not have noticed, but there’s a whole spectrum of relational databases on offer. Of course in the database world, most get infatuated with one, and that becomes their bread & butter before long. Their life, their passion, their devotion.

That’s fine as far as it goes, but Winand really stands out, offering a spectrum of ideas and optimization techniques for different platforms. If you’re an Oracle-only or MySQL-only DBA you’ll gain a lot from this book. Even more importantly, if you work in professional services and need to communicate with DBAs brought up on one of these platforms, it becomes like a Rosetta Stone for SQL query tuning.

Read: Why devops talent is in short supply

2. Shows you how to fish

I find database books and methods fall into two broad categories. There’s the “call Oracle support” method, where you’ll be handed one very specific set of steps, commands, and a path to solve each specific problem. It’s more about memorization. It’s like they actually hand you the fish.

Then there is the investigative method, where you learn how to use a magnifying glass to look at fingerprints, and check for DNA samples, and interrogate suspects. You know, learn the tools of the trade.

That’s what Markus brings you, in all its delicious glory.

Read: Why high availability is so very hard to deliver

3. It’s concise

Another gripe I have of technical books is that the publishing model is, this is a textbook and since the price tag is large, let’s make the physical book large! Of course no one wants to carry those books around. I even recently bought a kindle to solve this problem.

SQL Performance Explained is more a paperback book form factor, and that means you can tote it around with you easily, and keep it with you at work. Read it on the train, commuting to work.

200 pages packed cover to cover with all sorts of good chapters, including a primer on indexes & types, scalability & performance, joins, clustering, Top-N queries, DML, and more.

Read this: Why a four letter word divides dev and ops

4. It’s technical but accessible

If you’re a real rock bottom beginner, you might want to dig a bit more on your SQL syntax, and some of the basics. You could also keep a 101 book side-by-side, while you’re reading this book.

For the intermediate & advanced DBAs out there, this book will sit comfortably in your paws as you flip the pages and learn something new. For instance just today I learned that Postgres indexes NULLs, while Oracle leaves a row out of an index when all of its indexed columns are NULL. Learn something new every day.

Related: MySQL interview guide for managers and candidates alike

5. Gives you answers you can use today

After twenty years of consulting, I’ve seen a few patterns emerge. Besides the spectrum of team & communication challenges, firms hitting the performance wall often have issues with their relational databases.

Yes those databases are sometimes on the wrong hardware, or there are other obscure problems with setup or configuration. But the bulk of issues center on badly written SQL.

SQL is a much reviled language and often misunderstood. And it doesn’t seem like developers have gotten that much better at it over the years. It would explain the rise of NoSQL databases, as they often speak REST or XML, no need for pesky sequel.

One parting note. For all the devs and architects out there who want to sing the virtues of ORMs, this book hits that squarely on the nose. By showing how differently each relational database implements SQL, performs work, and optimizes, Winand also illustrates the naivete behind trying to write database independent application code.

If you’re a developer and don’t know how to profile a query or run EXPLAIN PLAN, don’t walk, run to your closest Amazon.com store and get this book!
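If you’ve never looked at a query plan, here is a minimal sketch of the idea using Python’s built-in sqlite3 module and SQLite’s EXPLAIN QUERY PLAN statement. The orders table and the index name are invented for the example, and the syntax and output of Oracle’s EXPLAIN PLAN or MySQL’s EXPLAIN differ, so treat this as an illustration of the workflow rather than a recipe for your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def query_plan(sql, params=()):
    """Return the human-readable detail text of each step in the plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


# Without an index SQLite has to scan the whole table.
plan_before = query_plan(QUERY, (42,))

# Add an index on the filtered column and ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first should report a scan of the table and the second a search using the new index.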

Also: 5 more things deadly to scalability

Criticisms

If I were to offer two slight criticisms, it would be these. First, the index is a bit wonky. When I look under “P” for example, there’s no Postgres, while one quarter of the book is obviously devoted to that platform. Further, NULLs are covered in depth in various places in the book, yet NULL has only one entry in the index, p54 on Oracle. So the index could be a bit more robust to be useful.

The other criticism is more perhaps my bias. On page 96, when he discusses ORMs I thought he was rather… shall we say gentle. Although he clearly states that “eager fetching” is problematic, I don’t think he goes far enough to condemn it. In my experience ORMs are always trouble.

Then again why am I complaining? Their use keeps me forever employed.

Want a copy? Markus Winand’s book site has all the goods!

Why Healthcare.gov desperately needs techops

healthcare.gov logo

1. Tech-what? A quick education

Techops is operational excellence. It’s the handoff when code is complete. These are the folks who are up at 2am when a website is down. They manage servers, keep the pipes clean, and the hackers out. They also help plan for capacity needs, and may help with load testing too.

If you’re new to technology, imagine a movie set. Story writers (programmers) have already done their part. The producers (venture capital folks) have financed the project. The director (architect) is there trying to put the vision together. But the folks who manage everything on set, from sound guys, camera guys, lighting people, and all the coordination, this is operations. In web application deployments it is devops, sysops or techops.

Also: Why the Twitter IPO is afraid of scalability

2. In contrast with Obama election campaign

Notice how phenomenally well the Obama for America project was run. Like a finely tuned machine. Harper Reed and team pulled off one of the most data-backed election campaigns in history.

That project used AWS cloud technologies to the fullest, from devops tools like Puppet and Asgard, collaboration tools like Campfire & Github, and superb monitoring & instrumentation tools NewRelic and Chartbeat.

Clearly Obama knows how to run an election. Something is drastically different with the healthcare.gov project. Too many cooks in the kitchen, perhaps?

Read: Why your startup needs professional techops

3. A failure in capacity planning

Many popular news outlets covered the outage, but most blamed it on “bugs”. But when a site dies under load, while it’s working in test & QA, that’s a failure of load testing and capacity planning.

I would wager a good bet, database tuning would definitely help as it’s the most common and prevalent cause of

Read this: What four letter word divides dev and ops?

4. More testing & more Agility needed

Modern software projects take advantage of continuous integration & agile methods. That is they make small incremental changes. Developers build unit tests, and the code is always in a working state. There is no multi-month dev cycle, where your current software is in doubt.

Reports indicate that the healthcare.gov software was being designed & developed using the old and, most agree, inferior waterfall method of software development. The New Yorker criticizes it in Don’t go chasing waterfalls.

Read: Why devops talent is in short supply

5. Caching is desperately needed

All high performance, high scale websites need to take advantage of various types of caching, as I’ve discussed in detail before, from browser caching to page & object caching on the server side.

Hayden James investigated in depth, and found healthcare.gov severely lacking. Again this is a huge failure in techops, sysops or devops. It’s not a bug, and not something the developers are responsible for delivering.
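To give a flavor of server-side object caching, here is a minimal in-process sketch in Python. The TTLCache class and the render_plans_page function are hypothetical names invented for this example. A production site would more likely put memcached, a CDN or a reverse proxy in front of the application, but the principle is the same: do the expensive work once and reuse the result until it expires.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory object cache where each entry expires after a fixed time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()
        self._items[key] = (now + self._ttl, value)
        return value


render_calls = 0


def render_plans_page() -> str:
    """Stand-in for an expensive page render that hits the database."""
    global render_calls
    render_calls += 1
    return "<html>list of plans</html>"


cache = TTLCache(ttl_seconds=60)
for _ in range(1000):
    page = cache.get_or_compute("/plans", render_plans_page)

print("requests served: 1000, renders performed:", render_calls)
```

A thousand requests arrive, but the page is rendered only on the first one. Everything after that is served from memory until the 60 second lifetime runs out.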

Why Oracle won’t kill MySQL

oracle mysql database

1. MySQL does not compete with Oracle

It’s a myth that MySQL somehow poses a threat to Oracle. Oracle’s customers tend to be large enterprises running apps like E-Business Suite. These are certified to run on Oracle, and further they sit close to finance.

MySQL tends to be a choice of scrappy but nimble startups for their web-facing applications. They want to deploy in the cloud, and don’t want to deal with licenses. Plus they have the techops chops to handle the bushwhacking of open source.

Related: Why I wrote the book on Oracle & Open Source

2. Oracle bought Sun for the hardware business

Remember when Oracle acquired Sun? A lot of folks assumed Larry was after MySQL. Grab it & slowly smother it. But actually it was more frosting on the cake. Larry had for years expressed interest in cubes and clusters, and building an Oracle appliance. Whether this ever came to profitable fruition in the form of Exadata remains to be seen. But buying Sun for a song helped him do this.

Also: Why bemoaning AWS performance sounds like Linux detractors circa 1999

3. Larry blows with the wind on open source

He’s money minded, so you’ll see that money comes first in his decisions.

In the late 90’s when a customer might spend $100k on Sun and $100k on Oracle licenses, Larry realized porting to Linux and pushing commodity hardware would be a win. So he pushed Linux, and customers could now spend $20k on commodity hardware and $180k on Oracle licenses. Imagine a 10 million dollar budget if you’re having trouble with the math here.
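The math works out like this. A small Python sketch, using the figures above and assuming the same hardware to license split simply scales up to the 10 million dollar budget. The oracle_take helper is a made-up name for the example.

```python
def oracle_take(budget, hardware, licenses):
    """Dollars of a budget that go to licenses for a given hardware/license split."""
    return budget * licenses // (hardware + licenses)


budget = 10_000_000

# Sun era: $100k hardware for every $100k of Oracle licenses.
before = oracle_take(budget, hardware=100_000, licenses=100_000)

# Commodity Linux era: $20k hardware for every $180k of Oracle licenses.
after = oracle_take(budget, hardware=20_000, licenses=180_000)

print(f"Oracle's share before: ${before:,}")
print(f"Oracle's share after:  ${after:,}")
```

The customer spends the same total either way. Oracle’s slice goes from half of the budget to nine tenths of it.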

He also eventually moved the middle tier to Apache for similar reasons. I would argue Oracle Corp overall pays lip service to open source, though they do contribute to some degree.

Read: Why MySQL dbas are so hard to find

4. MySQL support business is real

What’s more, just as they adopted Linux and then offered their “Unbreakable Linux” distro with pricey support along with it, they’re doing similar things with MySQL. For enterprise customers, and those already comfortable with making the call to Redwood Shores, sales folks will happily direct them to support contracts and enterprise add-ons. Naturally.

Read: Why your startup needs real techops

5. There are real viable alternatives to keep balance

And let’s not forget folks, there are already a bunch of forks. There’s the popular and ever growing MariaDB, which Google has put their muscle behind.

Of course let’s not forget the very popular, very capable, and very bulletproof Percona distribution, along with the Percona Toolkit and xtrabackup for real hot backups.

And for those looking to experiment, there’s Drizzle, a work in progress and a complete rewrite, which unfortunately makes it not a drop-in replacement.

Read this: What’s the four letter word dividing dev and ops?

Lulzsec, Anonymous and the sorry state of internet security

zalgo text

If you’ve been hiding under a rock for the past few years, you might not have heard of Anonymous, the headline grabbing hacker group that’s famous for attacking Citibank, eBay, Sony, the FBI, CIA and the websites of various world governments.

Parmy Olson takes us on a ride, through tales that are riveting, and quite a bit scary for what they reveal about today’s internet, and the false sense of security we all have.

Kids these days!

By now you’ve probably heard their names T-flow, Topiary, Sabu, & Kayla. And then there was AVunit, pwnsauce, Sup_g, and Havij. Cool characters, sitting at keyboards all over the world hatching menacing attacks, and seeming more organized than they actually were…

Topiary jumped into the role as spokesman for the group. Listening to this live hack only seems amusing in retrospect, now that the group has been brought down…

Read: Why devops talent is in short supply

For all the subcultures you’ve never heard of…

Today’s internet is rife with fascinating subcultures, many I’d never heard of. Parmy’s book on Anonymous takes us to the door of all these places, and gives us a candid peek at what goes on there. Kids these days are up to no good!

The bizarro Encyclopedia Dramatica is a wikipedia of weirdness. And then there’s Googledorks, a hacker’s delight of exploits (ways to break into systems online) and hacks.

And let’s not forget 4chan, the online community and forum that hatched Anonymous.

You thought ASCII art was cool, but have you heard of zalgo text? That’s the text garbling software that created this post’s image.

If you’re looking to dig a little deeper, browse over to Know Your Meme, a sort of urban dictionary for internet subcultures.

Don’t forget the 47 rules of the internet. I’m still looking for rules two through thirty-three. Does this have something to do with this 33?

Read: How to evaluate an independent consultant expert

With only a very thin blanket to secure us…

If you’re not already a touch paranoid about the risks of online banking, social networks and identity theft, you will be after reading this tale.

Anonymous troublemakers were able to send SWAT teams to unsuspecting people’s homes, crowdsource personal information, and social engineer their way to facts about someone and then dox them, publishing all that personal information online.

On the more technical side, many sites are vulnerable to SQL injection, a rather technical-sounding method of tricking websites into dumping the contents of their databases back to a hacker. There’s even an automated tool called sqlmap to help you with the dirty work.
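To make the idea concrete, here is a minimal sketch of how an injection slips through, and how a parameterized query closes the hole. The table, the data and the function names are all invented for illustration, and an in-memory SQLite database stands in for whatever a real site would run.

```python
import sqlite3

# Hypothetical table and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "s3cret"), ("bob", "hunter2")],
)

def lookup_unsafe(name):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = "SELECT name, password FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_safe(name):
    # Safer: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT name, password FROM users WHERE name = ?", (name,)
    ).fetchall()

# The attacker closes the quote and adds a condition that is always true.
attack = "nobody' OR '1'='1"
unsafe_rows = lookup_unsafe(attack)  # every row in the table comes back
safe_rows = lookup_safe(attack)      # no user has that literal name
```

The unsafe version hands back the whole table, while the parameterized version treats the attack string as just an odd-looking name and returns nothing.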

And then there are the very illegal denial of service attack tools like the ominous-sounding Low Orbit Ion Cannon. Please don’t try this at home!

The worst offenders of all are definitely the botnets, swarms of infected computers that can be controlled from a central location to wreak havoc on users and internet firms alike. Thanks Bill!

As a parting word, take a quick look at this instructional video on using BackTrack 5, a hacking & security testing tool…

Also: Why a killer title can make or break your content efforts

The older roots of hacking circa 80’s and 90’s

I remember back in the 80’s when WarGames came out. It was a scary premise. With the cold war between the US and the former Soviet Union in full swing, it felt very real.

The end of the 80’s brought Clifford Stoll hunting a hacker through his computer systems in The Cuckoo’s Egg.

And then along came Kevin Mitnick, thumbing his nose at US agents and leaving havoc in his wake.

The Anonymous story turns more political when they meet the likes of Julian Assange, but even that isn’t new. Remember the Pentagon Papers?

What’s really new is how much the internet has grown, while computers have not gotten more secure over that period. It has all grown more brittle, with many websites and personal computers steered by unsuspecting users.

Read: Why high availability is so very hard to deliver

Surprisingly soft landing

One thing that really surprised me in this tale was the sentences many Anons received. The way the headlines read, this was real all-out warfare on governments and corporations alike. But reading the judgements, it appears judges had a different perspective.

Although there were certainly compromises of personal information, the group really wasn’t responsible for a huge amount of theft & fraud. Sure they took down some websites, but whom does that really harm? It makes great headlines, but the bigger systems behind the scenes are actually more secure than that.

“IRC is just the crap out of everyone’s minds…” – Topiary on words thought-typed in IRC chats

After flipping through to the end, it seems we’ve taken a ride through the internet underground, but not through the criminal underworld. That is surely out there, but it’s not run by this scattered team of reclusive misfits.

Related: Why Airbnb didn’t have to fail


When fat fingers take down your business

apple sad mac fail


Github goes nuclear

I was flipping through reddit last night, and hit this crazy story: strange pushes on GitHub. For those who don’t know, GitHub houses source code. It’s version control for the software world. Lots of projects use it to keep track of changes.

Jenkins is a continuous integration platform. Someone working on the project accidentally did a force push up to the server. They overwrote not only their own work, but the work in hundreds of other plugin repositories unrelated to their own project.

This is like doing a demolition to put up a new building, and taking down all the buildings on your block and the next. Not very neighborly, to say the least. At the time of this writing, they’re still doing cleanup and digging through the rubble.

Read: Why DynamoDB can increase availability

How to kill a database

I worked at a startup a few years back that had an interesting business model. Users would sit and watch videos, and get paid for their time. Watch the video, note the code, enter the code, earn cash. Somehow the advertisers had found a way to make this work.

The whole infrastructure ran on Amazon EC2 servers, and was managed by RightScale. Well, it was actually managed by a West Coast outsourcing shop, whose specialty was managing deployments on RightScale.

The site kept its information in a MySQL database. They had various scripts to spin up slaves, remaster, switch roles and so forth. Of course MySQL can be finicky and is prone to throwing surprises your way from time to time.

One time this automation failed in a big way, switching production customers over to a database that took way way way too long to rebuild. Because their automation didn’t perform checksums to bulletproof the setup, it couldn’t know that the data hadn’t finished moving!
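Here is a rough sketch of the kind of check that was missing: compare a checksum of each table on the old master against the new database, and refuse to switch over while any table differs. The function names and sample rows are made up for illustration, and plain Python lists stand in for real tables. On a live MySQL setup you would more likely lean on CHECKSUM TABLE or a tool such as pt-table-checksum.

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum over an iterable of row tuples."""
    h = hashlib.sha256()
    for line in sorted(repr(row) for row in rows):
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()

def tables_out_of_sync(master_tables, replica_tables):
    """Return the names of tables whose contents differ."""
    mismatched = []
    for name, rows in master_tables.items():
        replica_rows = replica_tables.get(name, [])
        if table_checksum(rows) != table_checksum(replica_rows):
            mismatched.append(name)
    return mismatched

# Hypothetical data: the rebuild has not finished copying "payouts".
master = {
    "users": [(1, "ann"), (2, "bob")],
    "payouts": [(1, 5.00), (2, 7.50)],
}
replica = {
    "users": [(2, "bob"), (1, "ann")],
    "payouts": [(1, 5.00)],
}

pending = tables_out_of_sync(master, replica)
safe_to_switch = not pending
```

With the sample data above, `pending` names the half-copied table and `safe_to_switch` stays false, which is exactly the signal the failover script needed before pointing customers at the new database.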

Customers sure did notice, though, when the site fell over. Yes, this was a failure of automation. It wasn’t a failure of the RightScale platform, though, but of the outsourcing firm managing the process, checking the pieces and components and ensuring the computer systems did their thing to completion. Huge fail!

Read: Why devops talent is in short supply

Your website will fail

Sites big and small fail. Hopefully these stories illustrate that fact. I’ve said over and over why perfect availability is a pipe dream.

At the end of the day, the difference between the successful sites and the sloppy ones isn’t failure versus perfection. It’s *how* they fail, and how they get back up on their feet. What type of planning did they do for disaster recovery, as many firms in NYC did before and after Sandy?

Also: Why startups need both devs and ops for scalability

Reducing failure

So instead of thinking about eliminating failure, let’s think about *reducing* how often it happens, and when it does, reducing the fallout. One thing you can do is sign up for Scalable Startups, where we share tips once a month on the topic. Meanwhile try to put these best practices into play.

1. Test your DR plan by running real life fire drills
2. Use more than one hosting provider, data center or cloud provider
3. Give each op or end user the least privileges they need to do their job
4. Embrace a culture of caution in operations
5. Check, double check and triple check those fat fingers!

Read this: Why a four letter word divides dev and ops


What is eventually consistent and will it work for you?

aws dynamodb

In the tech world we’re fond of inventing all sorts of technical terms that are admittedly kind of confusing.


I recently attended an excellent talk called Data at Scale, part of the Database Month series that Eric Benari hosts. In it Mark Uhrmacher presented some phenomenal solutions which worked for flash sale site ideeli. They allowed the company to support its incredible business model, where 15% of traffic would happen in 15 minutes every day. He called it a “self-imposed denial of service attack”. Interesting analogy.
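A quick back-of-the-envelope calculation shows why that analogy fits. This sketch assumes the 15% figure is a share of the whole day’s traffic, and compares the rate during the rush with the day’s average rate. The variable names are mine.

```python
MINUTES_PER_DAY = 24 * 60

share_of_traffic = 0.15  # 15% of the day's traffic...
window_minutes = 15      # ...arrives in a 15 minute window

# Fraction of the day that the window covers (roughly 1%).
share_of_day = window_minutes / MINUTES_PER_DAY

# How many times the daily average rate the site sees during the rush.
peak_vs_average = share_of_traffic / share_of_day

print(round(peak_vs_average, 1))  # prints 14.4
```

In other words, the database has to survive a load more than fourteen times its daily average, on a schedule, every single day.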

What occurred to me though, is that a lot of companies and startups struggle to understand which database solutions will work for them, and what the strengths and weaknesses of each are, and further what tradeoffs they’ll grapple with.

One concept that we hear a lot is “eventually consistent”. Many of the new NoSQL databases achieve their speed & availability this way. But what’s it all about?

Let’s change a smartphone contact

I’m sure you have a smartphone in your pocket, and for demonstration’s sake I’ll use the iPhone configured with iCloud.

Let’s go ahead and dial up your *OWN* contact card. Click “Edit” and go ahead and change something. Let’s change your title to “rock star”. Now click “Done”. We’ll wait a minute. Now go to your desktop and open up Contacts. Scroll through to your contact and verify that the Title field now shows “rock star”.

How does all this happen? When you click the “Done” button, the iPhone sends changes up to iCloud. iCloud then lets your laptop know a change has happened, and the two sync up.

Now let’s run through the same exercise, but change it in two places. We’ll change the smartphone contact to “Founder” and the desktop Contacts record title to “Consultant”. Wait a little bit and you’ll notice they will both eventually show “Consultant”.

Also: Why a killer title can make or break your content efforts

How long were laptop & phone out of sync?

As you probably noticed, iCloud seems to lean in favor of the desktop client. It’s not clear to me what rules it uses here, nor does it seem to be configurable. Nevertheless, eventually both the desktop and smartphone will have the same contact card for you. Quite a feat of magic!

Read: Why high availability is so very hard to deliver

Handling collisions

There is only one *YOU* and presumably your digital rolodex reflects that too. You have one and only one contact card. Or do you? As far as these digital tools are concerned there are actually THREE! One on your desktop, one in iCloud and one on your phone. Each time you change in any of those places, it syncs *UP* to iCloud and then down to the other devices.

Collisions happen if you make changes in two places. Imagine if you’re a road warrior and your laptop was offline for some days, or your smartphone for that matter. In those cases the syncing would happen much later, and collisions would be more likely.
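The three copies and their collisions can be sketched in a few lines. I don’t know what rule iCloud really applies, so this sketch assumes one common policy, “last write wins”, where the most recently edited copy overwrites the others. The class, field names and timestamps are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Card:
    title: str
    updated_at: float  # time of the last edit (hypothetical field)

def merge(a, b):
    """Last write wins: keep whichever copy was edited most recently."""
    return a if a.updated_at >= b.updated_at else b

def sync(replicas):
    """Resolve all copies to a single winner and hand it to every replica."""
    winner = replicas[0]
    for card in replicas[1:]:
        winner = merge(winner, card)
    return [Card(winner.title, winner.updated_at) for _ in replicas]

# Two edits collide: the phone says "Founder", the desktop says "Consultant".
phone = Card("Founder", updated_at=100.0)
cloud = Card("rock star", updated_at=50.0)
desktop = Card("Consultant", updated_at=101.0)

phone, cloud, desktop = sync([phone, cloud, desktop])
print(phone.title, cloud.title, desktop.title)
```

Until `sync` runs, the three copies disagree. Afterwards they all show “Consultant”, and the “Founder” edit is silently gone. That gap, and that lost edit, are what “eventually consistent” is describing.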

Related: Why the twitter IPO made a shocking admission on scalability

In the high frequency world of online databases

With online databases, all of this becomes vastly more complex. Web based applications may have 100,000 simultaneous users. Some may be coming from EMEA while others come from the Americas. It gets pretty darn complex when you have databases in each of those regions.

We deploy applications this way so that one datacenter, say the one in the East Coast region, can fail while all the others still operate. They can still change data, read and write, without being impacted by the New York outage.

Once that datacenter is restored, the databases will then sync up and reconcile missing data.

Also: Why a killer title can make or break your content efforts

MariaDB and Amazon RDS read replicas

MySQL and its variants, MariaDB, Percona and Amazon RDS, can do something like this with read replicas. The read-only copies of the database are asynchronous and take time to catch up to changes. You can have the read-only copies in different regions.

This improves availability for browsing your application, but not for making changes. In other words, MySQL can use this method to scale reads but not writes. That’s why I recommend your applications also support a browse-only mode, so that browsing stays available even if your authoritative master dies.
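As a rough sketch of what that looks like in application code, the router below sends reads to a replica and writes to the master, and drops into browse-only mode when the master is down. The class, the node names and the simple “starts with SELECT” test are all simplifying assumptions of mine, not how any particular driver or proxy does it.

```python
import random

class ReadOnlyMode(Exception):
    """Raised when a write is attempted while the master is unavailable."""

class Router:
    def __init__(self, master, replicas):
        self.master = master            # the single writable node
        self.replicas = list(replicas)  # asynchronous read-only copies
        self.master_up = True

    def node_for(self, sql):
        """Pick the node that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read:
            # Reads scale out: any replica will do, though it may lag.
            return random.choice(self.replicas)
        if not self.master_up:
            raise ReadOnlyMode("master is down, browse-only mode")
        return self.master

router = Router("master", ["replica-east", "replica-west"])
read_node = router.node_for("SELECT * FROM products")
write_node = router.node_for("UPDATE products SET price = 10")

# Simulate losing the master: browsing still works, changes do not.
router.master_up = False
still_browsing = router.node_for("SELECT * FROM products")
```

The application catches `ReadOnlyMode` and shows a friendly “changes are temporarily disabled” message instead of falling over.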

Although you can try to do the same for writes by sharding your MySQL instances, this starts to get very messy very fast. Imagine backing up 10 shards, 10x the complexity, and even more when you want to go and do a restore.

Read: Why devops talent is in short supply

Amazon’s DynamoDB

Amazon’s DynamoDB is a technology based around the original Dynamo whitepaper, which attempts to solve a whole class of problems by relaxing consistency constraints and settling for eventual consistency.

What you get is more availability: it’s hard for the whole cluster to go down. That’s great for applications because they can continue to operate if one or more nodes fail. It also scales writes, which is a sort of holy grail in the database world as it’s typically hard to do.

But remember all this comes at a cost. Traditionally scaling writes is hard to do because all changes are kept in one place. You maintain a single authoritative master. If you want to imagine why this matters, think back to our smartphone example. We changed our contact card on our phone and our desktop at the same time. One of those two changes won the battle. But that’s a case where we’re not overly concerned.

If you imagine a bank doing the same thing, and you wire $1000 via phone and desktop, you can quickly see that there is a whole class of applications that won’t be happy with eventual consistency. Your web application may be one of those. Or it may not. Consider carefully before you go with Amazon RDS or DynamoDB as your datastore.
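Here is the bank scenario as a toy sketch. Each replica checks the balance against its own, possibly stale, copy and accepts the wire, and the problem only shows up when the replicas reconcile. The class and function names are invented for illustration. Real systems are more sophisticated, but the underlying hazard is the same.

```python
class Replica:
    def __init__(self, balance):
        self.balance = balance  # this replica's local view of the account
        self.pending = []       # debits accepted here, not yet replicated

    def wire(self, amount):
        # Each replica checks only its own view of the balance.
        if amount > self.balance:
            return False
        self.balance -= amount
        self.pending.append(amount)
        return True

def reconcile(start_balance, replicas):
    """Apply every accepted debit once the replicas finally sync up."""
    total_debits = sum(sum(r.pending) for r in replicas)
    return start_balance - total_debits

# The account holds $1000, and both replicas start with that view.
phone = Replica(1000)
desktop = Replica(1000)

phone_ok = phone.wire(1000)      # accepted
desktop_ok = desktop.wire(1000)  # also accepted, the view is stale

final_balance = reconcile(1000, [phone, desktop])
print(phone_ok, desktop_ok, final_balance)  # prints True True -1000
```

Both wires succeed, and the account ends up $1000 overdrawn. With a single authoritative master, the second wire would have been rejected.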

Read: Why startups need more than great developers to achieve scalability
