Category Archives: Scalability

If you use MySQL in the Amazon cloud, you need to ask yourself this question

Join 25,000 others and follow Sean Hull on twitter @hullsean.

Are you serious about backups?

If you’re just using Amazon EBS snapshots, that may not be sufficient. There’s a good chance it won’t protect you against your next data loss.

That’s why I like to have a few different types of backups

Also: 5 more things deadly to scalability

Protect against operator error

mysqldump is a tool every DBA is familiar with. Same as a hotbackup or snapshot you say? Just more labor? Not true.

A dump allows you to restore one table, or one schema. That’s why they’re also known as logical backups. What’s more you can edit the file, remove indexes, change object names, or datatypes. All these can be essential in the screwy and unpredictable event of a real world outage.

Expect the unexpected!

Read: Why devops talent is in short supply

Test those backups regularly

If you haven’t actually tried to restore, you really don’t know if you have everything. Did you backup stored procedures & database code? How about grants? Database events? How about cronjobs? What about the my.cnf file? And your replication configuration?

Yes there are a lot of little pieces, and testing your backups by rebuilding everything is an attempt to poke holes in your plan, and hit issues before d-day!

Related: MySQL interview guide for managers and candidates alike

Replication isn’t a backup

Replication is getting better and better in MySQL. It used to fail regularly. MyiSAM was very unpredictable. But even in the comfortable realm of Innodb, there can still be data drift. If you’re on MySQL 5.0 or 5.1, you should consider performing regular checksums. These test the integrity of data and compare what’s actually in master & slave. Bulletproofing MySQL replication with checksums.

Read: Why high availability is so very hard to deliver

Have you considered security around your backup files?

While you’re thinking about backups, make sure the files themselves are secure. Remember they contain your crown jewels. Hopefully individual data that’s sensitive is encrypted, but still you should secure their final resting place as well.

If you’re using S3, consider encrypting the file before shipping it up to the bucket.

Read this: Why a four letter word divides dev and ops

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 cloud ideas that aren’t actually true

storm coming

Join 20,000 others and follow Sean Hull’s scalability, startup & innovation content on twitter @hullsean.

Cloud computing is heralding us into a wonderful era where computing can be bought in small increments, like a utility. This changes the whole way we plan, manage budgets, and accelerates startups making them more agile.

But it’s not all wine & roses up there. I’ve heard a few refrains from clients over the years, and thought I’d share some of the most common.

1. Scaling is automatic

Rather recently I was working with a client on building some sophisticated reports. They needed to slice & dice customer data, over various time series, and summarize with invoices & tracking data. Unfortunately their dataset was large, in the half terabyte range.


Client: Can we just load all this data into the cloud?
Me: Yes we can do that. Build a system in Amazon public cloud, can support large datasets.
Client: I want it to scale easily. So we won’t have these slow reports. And as we add data, it’ll just manage it easily for us.
Me: Well it’s a little bit more complicated than that, unfortunately.

Unfortunately this is a rather familiar conversation that I have quite often. A lot of the press around cloud scalability, centers around auto-scaling, Amazon’s renowned & superb virtualization feature. Yes it’s true you can roll out webservers to scale out this way, but that’s not the end of the story. Typically web applications have a lot of components, from caching servers, to search servers, and of course their backend datastore.

But can we scrap our relational database, such as MySQL and go with one that scales out of the box like Riak, Cassandra or Dynamodb?

Those NoSQL solutions are built to be distributed from the start, it’s true. And they lend themselves to that type of architecture. However, if you’ve built up a dataset in MySQL or Oracle, and more so an application around that, you’ll have to migrate data into the NoSQL solution. That process will take some time.

Like teaching a fish to fly, it make take some time. They do well in water, but evolution takes a bit longer.

Related: RDS or MySQL 10 use cases

2. Disaster recovery is free

In the traditional datacenter, when you want DR, you setup a parallel environment. Hopefully not in the same room, same city or same coast even. Preferrably you do so in a different region. What you can’t get around is dishing out cash for that second datacenter. You need the servers, just in case.

In the cloud, things are different. That’s why we’re here, right? In amazon you have regions already setup & available for plugin-n-play use. Setup your various components, servers, software & configure. Once you’ve verified you can failover to the parallel environment you can just turn off all those instances. Great, no big charges for all that iron that you’d pay for to keep the rooms warm in an old-school datacenter. Or do you?

As it turns out, since you don’t have this environment running all the time, you’ll want to test it more often, run fire drills to bring the servers back online. That’ll incur some costs in terms of manpower. You’ll also want to include in there some scripts to start those servers up, and/or some detailed documentation on how to do that. And don’t lose that documentation, either will you?

You may also want to build some infrastructure as code unit tests. Things change, code checkouts evolve, especially in the agile & continuous integration world. Devops beware!

Read this: Why a killer title can make or break your content efforts

3. Machines are fast

Fast, fast, fast. That’s what we expect, things keep getting faster, right? Hard to believe then that the world of computing took a big step backward when it jumped into the cloud. Something similar happened when we jumped to commodity Linux a decade ago.

In amazon, it’s a multi-tenant world. And just like apartment buildings, popular restaurants, or busy highways you must share. When things are quiet you may have the road to yourself, but it’ll never be as quiet as a dirt road in the country!

Amazon is making big strides though. They now offer memory optimized & storage optimized instances. And an even bigger development is the addition of the most important feature for performance & scalability. That said the network & EBS can still be a real bottleneck.

Also: What is a relational database & why is it important?

4. Backups aren’t necessary

I’ve experienced a few horror stories over the years. I wrote about one noteworthy one When fat fingers take down your business.

True EBS snapshots make backing up your whole server, well a snap! That said a few extra steps have to happen (flush the filesystem & lock tables) to make this work for a relational database like MySQL or Oracle. And suddenly you have a verification step that you also need to perform. You see no backups are valid until they’ve been restored, remember?

But even with these wonderful disk snapshots, you’ll still want to do database dumps, and perhaps table dumps. Operator error, deleting the wrong data, or dropping the wrong tables, will always be a risk. Ignore backups at your own peril!

Check this: Why CTOs underestimate operational costs

5. Outages won’t happen

In an ideal world, everything is redundant, and outages will be a thing of the past. We’ll finally reach five nines uptime and devops everywhere will be out of work. :)

It’s true that Amazon provides all the components to build redundancy into your architecture, and very cutting edge firms that have taken netflix’s approach with chaos monkey are seeing big improvements here. But AirBNB did fail and at root it was an Amazon outage that shouldn’t ever happen.

Read: Why Oracle won’t kill MySQL

Get more. Monthly insights about scalability, startups & innovation.. Our latest Are SQL Databases Dead?

Why I can’t raise the bar at every firm

Screen-shot-2012-08-02-at-1.28.35-PM

Join 17,000 others and follow Sean Hull on twitter @hullsean. Also check out: Who is Sean Hull?

It may seem counterintuitive. If I am not the best solution provider, why on earth would I highlight it?

I believe by pointing those cases out, I also underline the clients and problems that I’m particularly well suited too, and for which I can really provide value. Read on!

1. People Problems

Sometimes, you’re hired to solve a particular problem which is framed as a technical one. Some process needs to be reworked, recoded or retooled. It’s framed as a technical problem, yet as things unfold the client already has the expertise in-house to solve & write the code. What then?

It may be that the right people aren’t communicating, project managers aren’t seeing the issues, or part of the human systems are gummed up. We can’t raise the technical bar, but we can help getting those folks talking.

I wrote about this before in When You’re Hired to Solve a People Problem.

Also: Why are oil spills & financial instability related to datacenter outages?

2. ORM Usage & Technical Debt

If you’ve read my blog you know I am not very fond of Object Relational Modelers.

I would also argue as Ward Cunningham does so elloquently that technical debt can be a real and pressing problem.

Here we would help identify and frame the problem, though the work of raising the bar technically involves the longer process of retooling & refactoring your code base.

Related: Why database choices are tricky

3. Where Commodity & Offshore Works

Some firms are already making use of odesk or offshoring resources, where you might pay as little as $150/day. If you have a very technical manager or CTO, such a solution may work well for you.

At the other end of the spectrum are the high priced senior consultants from firms like Oracle, Percona or Pythian. Yes they may set you back as much as $3500/day.

In those cases a scalability & performance review may make sense. Here’s how.. Although specialists are necessary, remember to ask yourself Why generalists are better at scaling the web.

We sit in the sweet spot between the two options. With low overhead, our prices are more affordable. At the same time you’re getting a whole lot more than a commodity solution. We’ll communicate in plain language with folks at every technical level. And for many firms that in itself is a value add.

Check this: Does Oracle Aim to Kill MySQL?

4. Existing team did their homework

Believe it or not, I’ve gone into consulting engagements where the existing team has really really done their homework.

In those cases it becomes much harder to raise the bar technically. In those cases I can help when existing team missed something. But more importantly, I can validate a correct setup, or identify technical debt.

Having an outside perspective then, can provide reassurance. As I see ten to fifteen new environments per year, I’ve seen hundreds in the past decade & a half. That’s helpful perspective in itself.

Read: Does Oracle Aim to Kill MySQL?

5. Availability & Uptime Are Already High

I wrote in depth about high availability in the Myth of Five Nines

At the end of the day, availability can only approach perfection, not actually reach it. That’s a property of complex systems. If your uptime is already extremely high, again we can validate your environment, review and provide & summarize findings. But we may not be able to raise the bar.

If that’s you, it’s a good problem to have!

Also: Why AirBNB Didn’t Have to Fail

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why Scalability Is Big Business

Russian_Dolls

Join 16,500 others and follow Sean Hull on twitter @hullsean.

1. Complexity Is Growing

Despite automation & the mass migration to the cloud, or perhaps because of it, complexity continues to grow. Back in the dot com era a typical infrastructure included a load balancer, a couple web servers, one oracle database, and that was pretty much it.

Now that has multiplied. Pile on top of that three to five more webservers, a search server, a page cache, an object cache, one or more slave databases and more. You may have a utility server with jenkins for continuous automation, monitoring applications like nagios and cacti, your source code repository and perhaps configuration management like Puppet or Chef.

That’s not only more moving parts, it’s a wider swath of skills and technologies to understand. That’s one reason Generalists Are Better At Scaling The Web.

Also: Are SQL Databases Dead?

2. Developer Mandate: Features

The pressure to build features that can directly be monetized is obvious. Startups especially have the pressure to grow fast and grow now. So security, technical debt, and scalability often take a back seat. What’s more in small scrappy and lean startups, ops sometimes falls on the shoulders of one competent but overworked developer.

Related: Why Oracle Won’t Kill MySQL

3. Startups Growing Pains

With hyper growth, startups can go from 100 customers to millions overnight. That kind of popularity is a good problem to have. But if your app hits a wall and suddenly falls over, everyone is scrambling. The pressure builds, as fear of losing that traction mounts, and heads are put on the chopping block.

Read: AirBNB Didn’t Have To Fail

4. Missing Browse-only Mode & Feature Flags

Ever been browsing for airline tickets, then go to order and get an error? Try again later? If so you’re familiar with a browse-only mode. This is a very powerful addition to any web application but is very often left out. Some mistakenly believe it won’t work for their application, as users will always be changing data.

Ever visited a website that has star ratings, only to find them missing? Or temporarily unable to edit your rating for a piece of content? This amounts to what’s called a feature flag. These powerful switches give operations teams the ability to disable heavy features, while the side is under tremendous load. They can take a huge burden off the shoulders of your servers when you hit that scalability cliff.

Check this: Why I Don’t Work With Recruiters

5. Operations as an afterthought

I outlined some of the top reasons Why Startups Desperately Need Techops. It is a repeating refrain. Priorities of a growing startup often involve taking on technical debt. But if that isn’t managed carefully you’ll run into some of the problems that Ward Cunningham Warns Us About.

Also: 5 Things Are Toxic To Scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Are SQL Databases Dead?

mesa verde city

I like the image of this city of Mesa Verde. It’s fascinating to see how ancient cities were built, especially as an inhabitant of one of the worlds largest cities today, New York.

I’m a long time relational database guy. I worked at scores of dot-coms in the 90′s as an old-guard Oracle DBA, and pivoted to MySQL into the new century. Would a guy like me who’s seen 20 years of relational database dominance really believe they could be dying?

There’s a lot to be excited about in this new realm of db, and some interesting bigger trends that are pushing things in a new way.

Join 15,100 others and follow Sean Hull on twitter @hullsean.

1. Growing use of ORMs

ORM probably sounds like some strange fossil archeologists just dug up in the ancient city of Mesa Verde. But they’re important. You may know them by their real-life names, Hibernate, Active Record, SQL Alchemy and Cake. There are many others. Object Relational Modelers provide a middleware between developers and the SQL of your chosen relational database. They abstract away the nitty gritty, and encapsulate it into a library.

In a way they’re like code generators. Mark Winand talks about them in SQL Performance Explained warning of the “eager fetching” problem. This is DBA speak for specifying all columns (SELECT *) or fetching all rows, when you don’t need them all. It’s inefficient in terms of asking the database to read & cache all that data, but also to send it across the network and then discard it on the webserver side. Like a lazy housekeeper the clutter & dust will grow to overwhelm you.

Martin Fowler is the author of the great book NoSQL Distilled. He tries to walk the fence in his post ORM Hate, trying to balance developers love of ORMs, and the obvious need for scalability. Ted Neward calls ORMs the Vietnam of Computer Science. Mattias Geniar points out that BAD ORMs are infinitely worse than bad SQL and another on High Scalability by Drewsky ”>The Case Against ORM Frameworks.

If you agree the ORM conversation is still a huge mess, you’ll be excited to know that NoSQL sidesteps it completely. They’re built out of the box to interface more like data structures, than reading rows and columns. So you eliminate the scalability problems they introduce when you go NoSQL. That makes developers happy, and pleases DBAs and techops too. Win!

Read: Why Oracle won’t kill MySQL

2. Widening field of options

NoSQL databases are not simply key value stores, though some like Memcache and Riak do fit that mold.

Mongodb offers configurable consistency & durability & the advantages of document storage, no need for an ORM here. You also have a mix of indexing options, that go a little deeper than other NoSQL solutions. A sort of middle ground solution that offers the best of both worlds.

Cassandra, a powerful db that is clustered out of the box. All nodes are writeable, and there are various ways to handle conflict resolution to suit your needs. Cassandra can grow big, and naturally takes advantage of cloud nodes. It also has a nice feature to naturally age out data, based on settings you control. No more monumental archiving jobs.

Hbase is the database part of Hadoop, based on Google’s seminal Bigtable paper.

Redis is another option with growing popularity. It’s a key-value store, but allowing more complex data in it’s buckets, such as hashes, lists, sets and sorted sets. Developers should be salivating at this one.

Also: 5 Great Things about Markus Winand’s Book SQL Performance Explained

3. Lowering bar

The old world of relational databases treat data as sacrosanct. DBAs are tasked with protecting it’s integrity & consistency. They manage backups to protect against disaster. In this world, every bit of data written is as sacred as any other, whether it’s your bank account balance, or a comment added to a facebook discussion.

But modern non-relational databases introduce the idea of eventually consistent. DBAs and architects would say we are relaxing our durability requirements. What they mean is data can get slightly out of sync and we’re ok with that. We’ll build our web applications to plan for that, or even in the case of Riak expose the levers of durability directly to the developers, allowing them to make some changes instant, while others more lax and lazy.

Check this: Why high availability is so very hard to deliver

4. Cloud demands

Virtualized environments like Amazon EC2, give easy access to legions of servers. Availability zones & regions only widen the deployment options. So deploying a single writeable master, the way traditional relational databases work best, is not natural.

Databases like Cassandra, Mongo & Redis are clustered right out of the box. They grew up in this virtual datacenter environment and feel comfortable there.

Related: Why I wrote the book on Oracle & Open Source

5. Only DBAs understand them

Devs may whine at this statement, and to be fair it’s a generalization. The popularity of ORMs speaks volumes here. Anything to eliminate the dreaded SQL writing. Meanwhile DBAs bemoan the use of ORMs for they represent everything they’re trying to fix.

SQL is hard enough, but the ugly truth is each database vendor has their own implementation, their own optimizations, their own optimal tweaks. Even between database versions, SQL code may not perform consistently.

Identifying slow SQL and tweaking it remains one of the primary tasks of performance tuning, for this reason. It hasn’t changed much in my two decades on the job.

Also: Why bemoaning AWS performance sounds like Linux detractors circa 1999

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 great things about Markus Winand’s book SQL Performance Explained

markus winand sql performance explained book

Join 12,100 others and follow Sean Hull on twitter @hullsean.

1. Covers databases broadly

You may not have noticed, but there’s a whole spectrum of relational databases on offer. Of course in the database world, most get infatuated with one, and that becomes their bread & butter before long. Their life, their passion, their devotion.

That’s fine as far as it goes, but Winand really stands out, offering a spectrum of ideas and optimization techniques for different platforms. If you’re an Oracle-only or MySQL-only dba you’ll gain a lot from this book but even more importantly if you work in professional services, and need to communicate with DBAs brought up on one of these platforms, it becomes like a rosetta stone for SQL query tuning.

Read: Why devops talent is in short supply

2. Shows you how to fish

I find database books and methods fall into two sort of broad categories. There’s the call Oracle support method, where you’ll be handed one very specific set of steps, commands, and a path to solve each specific problem. It’s more about memorization, it’s like they actually hand you the fish.

Then there is the investigative method, where you learn how to use a magnifying glass to look at fingerprints, and check for DNA samples, and interrogate suspects. You know learn the tools of the trade.

That’s what Markus brings you, in all it’s delicious glory.

Read: Why high availability is so very hard to deliver

3. It’s concise

Another gripe I have of technical books is that the publishing model is, this is a textbook and since the price tag is large, let’s make the physical book large! Of course no one wants to carry those books around. I even recently bought a kindle to solve this problem.

SQL Performance Explained is more a paperback book form factor, and that means you can tote it around with you easily, and keep it with you at work. Read it on the train, commuting to work.

200 pages packed cover to cover with all sorts of good chapters, including a primer on indexes & types, scalability & performance, joins, clustering, Top-N queries, DML, and more.

Read this: Why a four letter word divides dev and ops

4. It’s technical but accessible

If you’re a real rock bottom beginner, you might want to dig a bit more on your SQL syntax, and some of the basics. You could also keep a 101 book side-by-side, while you’re reading this book.

For the intermediate & advanced DBAs out there, this book will sit comfortably in your paws as you flip the pages and learn something new. For instance just today I learned that Postgres can index NULLs while MySQL, Oracle and SQL*Server cannot. Learn something new everyday.

Related: MySQL interview guide for managers and candidates alike

5. Gives you answers you can use today

After twenty years of consulting, I’ve seen a few patterns emerge. Besides the spectrum of team & communication challenges, firms hitting the performance wall often have issues with their relational databases.

Yes those databases are sometimes on the wrong hardware, or their are other obscure problems with setup or configuration. But the bulk of issues center on badly written SQL.

SQL is a much reviled language and often misunderstood. And it doesn’t seem like developers have gotten that much better at it over the years. It would explain the rise of NoSQL databases, as they often speak REST or xml, no need for pesky sequel.

One parting note. For all the devs and architects out there, who want to sing the virtues of ORMs, this book hits that squarely in the nose. By showing how differently each relational database implements SQL, performs work, and optimizes, Winand also illustrates the naivete behind trying to write database independent application code.

If you’re a developer and don’t know how to profile a query or run explain plan, don’t walk, run to your closest Amazon.com store and get this book!

Also: 5 more things deadly to scalability

Criticisms

If I were to offer two slight criticisms, it would be these. First, the index is a bit wonky. When I look under “P” for example, there’s no Postgres, while one quarter of the book is obviously devoted to that platform. Further, looking up NULL which are covered in depth, in various places in the book, only has one entry in the index, p54 on Oracle. So the index could be a bit more robust to be useful.

The other criticism is more perhaps my bias. On page 96, when he discusses ORMs I thought he was rather… shall we say gentle. Although he clearly states that “eager fetching” is problematic, I don’t think he goes far enough to condemn it. In my experience ORMs are always trouble.

Then again why am I complaining, their use keeps me forever employed.

Want a copy? Markus Winand’s book site has all the goods!.

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why Healthcare.gov desperately needs techops

healthcare.gov logo

Join 15,000 others and follow Sean Hull on twitter @hullsean.

1. Tech-what? A quick education

Techops is operational excellence. It’s the handoff when code is complete. These are the folks who are up at 2am when a website is down. They manage servers, keep the pipes clean, and the hackers out. They also help plan for capacity needs, and may help with load testing too.

If you’re new to technology, imagine a movie set. Story writers (programmers) have already done their part. The producers (venture capital folks) have financed the project. The director (architect) is there trying to put the vision together. But the folks who manage everything on set, from sound guys, camera guys, lighting people, and all the coordination, this is operations. In web application deployments it is devops, sysops or techops.

Also: Why the Twitter IPO is afraid of scalability

2. In contrast with Obama election campaign

Notice how phenomenally well Obama for America project was run. Like a finely tuned machine. Harper Reed and team pulled off one of the most data backed election campaigns in history.

That project used AWS cloud technologies to the fullest, from devops tools like Puppet and Asgard, collaboration tools like Campfire & Github, and superb monitoring & instrumentation tools NewRelic and Chartbeat.

Clearly Obama knows how to run an election. Something is drastically different with the healthcare.gov project. Too many cooks in the kitchen, perhaps?

Read: Why your startup needs professional techops

3. A failure in capacity planning

Many popular news outlets covered the outage, but most pointed to “bugs”, which caused the outage. But when a site dies under load, while it’s working in test & Q/A, that’s a failure of load testing, and capacity planning.

I would wager a good bet, database tuning would definitely help as it’s the most common and prevalent cause of

Read this: What four letter word divides dev and ops?

4. More testing & more Agility needed

Modern software projects take advantage of continuous integration & agile methods. That is they make small incremental changes. Developers build unit tests, and the code is always in a working state. There is no multi-month dev cycle, where your current software is in doubt.

Reports indicate that the healthcare.gov software was being designed & developed using this old and most agree inferior method of software development, the waterfall method. New Yorker criticises it in Don’t go chasing waterfalls.

Read: Why devops talent is in short supply

5. Caching is desperately needed

All high performance, high scale websites need to take advantage of various types of caching as I’ve discussed in detail before. From browser caching, to page & object caching on the server side.

Hayden James investigated in depth, and found healthcare.gov severely lacking. Again this is a huge failure in techops, sysops or devops. It’s not a bug, and not something the developers are responsible to deliver.

Read: Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

When fat fingers take down your business

apple sad mac fail

Join 14,000 others and follow Sean Hull on twitter @hullsean.

Github goes nuclear

I was flipping through reddit last night, and hit this crazy story. strange pushes on GitHub. For those who don’t know, github houses source code. It’s version control for the software world. Lots of projects use it, to keep track of change management.

Jenkins is a continuous integration platform. Someone working on the project accidentally did a force push up to the server. They overwrote not only their own work, but the work of hundreds of other plugins unrelated to his own project.

This is like doing a demolition to put up a new building, and taking down all the buildings on your block and the next. Not very neighborly, to say the list. They’re still at the time of this writing, doing cleanup, and digging through the rubble.

Read: Why DynamoDB can increase availability

How to kill a database

I worked a startup a few years back that had an interesting business model. Users would sit and watch videos, and get paid for their time. Watch the video, note the code, enter the code, earn cash. Somehow the advertisers had found a way to make this work.

The whole infrastructure ran on Amazon EC2 servers, and was managed by Rightscale. Well it was actually managed by an west coast outsourcing shop, whose specialty was managing deployments on Righscale.

The site kept it’s information in a MySQL database. They had various scripts to spinup slaves, remaster, switch roles and so forth. Of course MySQL can be finicky and is prone to throwing surprises your way from time to time.

One time this automation failed in a big way, switching over production customers to a database that took way way way too long to rebuild. As their automation didn’t perform checksums to bulletproof the setup it couldn’t know that all the data wasn’t finished moving!

Customers sure did notice though when the site fell over. Yes this was a failure of automation. But not of the Rightscale platform, but of the outsourcing firm managing the process, checking the pieces and components and ensuring the computer systems did their thing to completion. Huge fail!

Read: Why devops talent is in short supply

Your website will fail

Sites big and small fail. Hopefully these stories illustrate that fact. I’ve said over and over why perfect availability is a pipe dream.

At the end of the day, the difference between the successful sites and the sloppy ones isn’t failure and perfection. It’s *how* they fail, and how they get back up on their feet. What type of planning did they do for disaster recovery like many firms in NYC did before and after Sandy.

Also: Why startups need both devs and ops for scalability

Reducing failure

So instead of thinking about eliminating failure, let’s think about *reducing* it from happening, and when it does, reducing the fallout. One thing you can do is signup for scalable startups where we share tips once a month on the topic. Meanwhile try to put these best practices into play.

1. Test your DR plan by running real life fire drills
2. Use more than one hosting provider, data center or cloud provider
3. Give each op or end user the least privileges they need to do their job
4. Embrace a culture of caution in operations
5. Check, double check and triple check those fat fingers!

Read this: Why a four letter word divides dev and ops

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What is eventually consistent and will it work for you?

aws dynamodb

In the tech world we’re fond of inventing all sorts of technical terms, that are admittedly kind of confusing.

Join 13,500 others and follow Sean Hull on twitter @hullsean.

I was attending an excellent talk recently called Data at Scale, part of the Database Month series that Eric Benari hosts. In it Mark Uhrmacher presented some phenomenal solutions which worked for flash site ideeli. It allowed them to support their incredible business model, where 15% of traffic would happen in 15 minutes everyday. As he called it a “self-imposed denial of service attack”. Interesting analogy.

What occurred to me though, is that a lot of companies and startups struggle to understand which database solutions will work for them, and what the strengths and weaknesses of each are, and further what tradeoffs they’ll grapple with.

One concept that we hear a lot is “eventually consistent”. Many of the new NoSQL databases achieve their speed & availability this way. But what’s it all about?

Let’s change a smartphone contact

I’m sure you have a smartphone in your pocket, and for demonstration sake I’ll use the iphone configured with iCloud.

Let’s go ahead and dial up your *OWN* contact card. Click “Edit” and go ahead and change something. Let’s change your title to “rock star”. Now click “Done”. We’ll wait a minute. Now go to your desktop and open up Contacts. Scroll through to your contact and verify that the Title field now shows “rock star”.

How does all this happen? When you click the “Done” button, the iphone sends changes up to iCloud. iCloud then lets your laptop know a change has happened and those then sync up.

Now let’s run through the same exercise, but change it in two places. We’ll change the smartphone contact to “Founder” and the desktop Contacts record title to “Consultant”. Wait a little bit and you’ll notice they will both eventually show “Consultant”.

Also: Why a killer title can make or break your content efforts

How long were laptop & phone out of sync?

As you probably noticed, the iCloud seems to lean in favor of the desktop client. It’s not clear to me what rules it uses here, nor does it seem to be configurable. Nevertheless eventually both the desktop and smartphone with have the same contact card for you. Quite a feat of magic!

Read: Why high availability is so very hard to deliver

Handling collisions

There is only one *YOU* and presumably your digital rolodex reflects that too. You have one and only one contact card. Or do you? As far as these digital tools are concerned there are actually THREE! One on your desktop, one in iCloud and one on your phone. Each time you change in any of those places, it syncs *UP* to iCloud and then down to the other devices.

Collisions happen if you make changes in two places. Imagine if you’re a road warrior and your laptop was offline for some days, or your smartphone for that matter. In those cases that syncing would happen much later, and collisions more likely.

Related: Why the twitter IPO made a shocking admission on scalability

In the high frequency world of online databases

With online databases, all of this becomes vastly more complex. Web based applications may have 100,000 simultaneous users. Some may be coming from IMEA while others the Americas. It gets pretty darn complex when you have databases in each of those regions.

We deploy applications this way, so one datacenter, say the East Coast region one version, can fail, but all the others still operate. They can still change data, read and write, without being impacted by the New York outage.

Once that datacenter is restored, the databases will then sync up and reconcile missing data.

Also: Why a killer title can make or break your content efforts

MariaDB and Amazon RDS read replicas

MySQL and it’s variants of MariaDB, Percona and Amazon RDS can do something like this with read-replicas. The read-only copies of the database are asynchronous and take time to catch up to changes. You can have the read-only copies in different regions.

This improves availability for browsing your application, but not for making changes. In other words MySQL can use this method to scale reads but not writes. That’s why I recommend your applications also support a browse only mode which means availability won’t be impacted if your authoritative master dies.

Although you can try to do the same for writes by sharding your MySQL instances, this starts to get very messy very fast. Imagine backing up 10 shards, 10x the complexity, and even more when you want to go and do a restore.

Read: Why devops talent is in short supply

Amazon’s Dynamo DB

Amazon’s DynamoDB is a technology based around the original Dynamo whitepaper which attempts to solve a whole class of problems by easing eventually consistent constraints.

What you get is more availability, it’s hard for the whole cluster to go down. That’s great for applications because they can continue to operate if one or more nodes fails. It also scales writes, which is a sort of holy grail in the database world as it’s typically hard to do.

But remember all this comes at a cost. Traditionally scaling writes is hard to do because all changes are kept in one place. You maintain a single authoritative master. If you want to imagine why this matters, think back to our smartphone example. We changed our contact card on our phone and our desktop at the same time. One of those two changes won the battle. But that’s a case where we’re not overly concerned.

If you imagine a bank doing the same thing, and you wire $1000 via phone and desktop, you can quickly see that there is a whole class of applications that won’t be happy with eventually consistent. Your web application may be one of those. Or it may not. Consider carefully before you go with Amazon RDS or DynamoDB as your datastore.

Read: Why startups need more than great developers to achieve scalability

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Round up of recent scalability, startup & social media posts

strawberries

If you’re checking back in, we’ve written a lot of new content recently. Here are some highlights for digging a little deeper.

Join 13,000 others and follow Sean Hull on twitter @hullsean.

1. Why you should evaluate carefully before hiring a consultant

You’re a startup, and you’re grappling with some particularly thorny problems. You’ve gotten pocked and scratched, and are still struggling with big issues. So you’ve decided to hire a consultant, now what?

Evaluating consultants is a key step to ensure you find someone you can work with. But how is the process different from interviewing a candidate for a fulltime role? Here’s our thoughts on it.

2. Why a killer title can make or break your content efforts

For devops & techops bloggers out there, I’ve put together this quick howto guide. Titles really make the difference as to whether your content gets noticed, or ends up dying on the vine.

Don’t let it happen. Practice some creative title writing and other tips and you’ll be zooming your way to the top!

3. Why real world high availability is so hard to deliver

Five nines, goes the saying, is the gold standard for availability. But if it is really a standard, then why the heck isn’t anybody really achieving it?

4. Why a four letter word divides dev and ops

The on-going battle between developers and operations teams rages, devops be damned. Here’s our take on the age-old turf war!

5. Why Amazon RDS doesn’t support Percona or MariaDB

Should I use Amazon RDS or build my own MySQL box on EC2? It’s a question I hear constantly from clients and prospects. The answer of course is it depends!

In this short article, I hit on some of the typical use cases, and discuss which solution is best. If you’re interested in Percona & MariaDB, you’ll want to take a look.

6. Why techops talent is in short supply

Database administrators? Systems administrators? Ops teams? They don’t carry the sexy allure that rock star developers do, but once code is deployed, and out in the wild, these are the swat teams, and national guardsmen that you’ll rely on everyday. They’ll monitor your systems, and when necessary wake at 3am to repair things that have fallen over.

Despite their crucial role in web application deployments in the cloud, they remain in short supply.

7. 5 more things deadly to scalability

Scalability is the goal every fast growth startup struggles with. Here are some key best practices to keep reliability and capacity in the crosshairs.

8. Why the Twitter IPO makes a shocking admission about scalability

Flip through a tech company IPO filing, and you’ll find some rather vulnerable admissions about data centers and fragile architectures. How can this even be possible, for a major internet firm that’s dealt with the fail whale many times before?

9. Why reaching journalists with email fails where social media & twitter succeed

After reading Adrienne Erin’s 7 deadly sins of pitching I felt discouraged. Everything she said in there I had done. Pitching is a game neither writers or journalists enjoy. I’d long since given up on it.

Then I thought about it some more. Actually I’d had some good success reaching journalists on social media. I just didn’t really think of it as pitching per se. That’s because it was more like getting into the conversation. It was almost like the networking and hob nobbing we do naturally at conferences and meetups. So I wrote about what worked for me. Read more

10. 25 Rumsfelds Rules for startups & managers with tweetible links

Donald Rumsfeld, what can be said? What can’t be said? Well for all controversy and bad press you have to give him credit for some great one liners.

I picked up his new book, and couldn’t put it down. There’s inspiration on every page!

So I selected out my twenty five favorite quotes, and included them here for your twitter enjoyment!

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters