Category Archives: Database Operations

Why startups need techops

devops divide

I was at a talk recently on node.js. Even if I’m not working with a technology directly, it’s exciting to see what’s out there, and node.js is bringing some hyper fast performance to a certain category of web applications.

During the keynote, the speaker mentioned a service to deploy applications on. I can’t name names unfortunately but it was a cloud solution on top of which you could deploy your application. Go this route
and you can do without an operations team. Avoid overhead of hiring ops, he claimed. And hey, then you can hire more developers!

To be fair I’ve heard much of the same thing at DBA or linux conferences. I can’t count the number of stories that start with “what some idiot developer did that took down our production systems…”.

Yes, it seems dev & ops are still just a tad bit adverserial.

Join 13,000 others and follow Sean Hull on twitter @hullsean.

1. My little known origins as a developer

Many colleagues and clients I’ve met in the New York City startup industry know me primarily as an operations & scalability guy. I tune databases, infrastructure and components to make things lightening fast.

I spent my earliest years at university on the computer lab operations staff. We watched and managed, made sure level zero backups were taken care of, and moved the tapes. Directly after college, I started at a software firm. I did C++ GUI development on the Mac, using the toolbox libraries with Metrowerks Codewarrior. I built split windows, and scroll bars, and displayed rows of data with nice resizable columns. All this wasn’t built into the class library, so for a lot of it we needed to roll our own solution.

We always had a long list of features coming from the business units. I also fielded many support calls, often from the windows platform as the code there hadn’t been managed and built as carefully. But that too was instructive as you could feel the pain of customers day-to-day challenges. It also illustrated the tradeoffs between new code and features, and existing bug fixes and support.

Also: Why generalists are better at scaling the web

2. A trip through the dot-com bubble as Oracle DBA

Through a circuitous path, I moved to New York in the mid-nineties and joined a startup. I had the opportunity to wear a lot of hats there, and apply my computer lab and Linux operating systems experience to the challenge of managed Oracle. I got a lot more involved with operations quick.

As the dot-com bubble grew, I saw a hot and growing demand for Oracle DBAs as most startups used Oracle, but the talent was in short supply. In one startup 80 million dollars was on the line as performance hobbled the website, and investors feared the worst.

Read: Why the Twitter IPO made a shocking admission about scalability

3. Different priorities & mandates

I remember working at Starmedia a media darling at the time. I was analyzing the database & server systems, and finding some code & jobs running during peak daytime hours. Management claimed that could not be the case. Yet for the next days and weeks I saw the same jobs running. I held strong and spoke truth to power as they say. That’s not always easy when you have a lot of investors, screaming CTOs and 100+ hour weeks. But eventually the source of the job was located, and disabled. And the website returned to it’s speedy self.

These experiences though do underline in my mind the different priorities and focus that developers and operations staff have.

Techops, system administrators & DBAs are typically averse to change. They fight it tooth and nail. That isn’t because they like to be curmudgeons though. They are typically very concerned about the business, but from a dramatically different perspective of stability, and reliability, even at 2am in the morning. They are concerned about the longevity of data, consistency, and durability of it.

Developers on the other hand have a different mandate. They are responsible for new business features, solutions to business requirements. Rapid prototyping & reactive or agile is embraced because it means you can deliver quicker to the business.

Crucially, both of these folks care very much for the business. Just with very different priorities.

Check this: Why AirBNB didn’t have to fail

4. Can developers do operations for you?

In a lot of small startups, the initial phase is obviously on building a product. That’s the build phase, and not surprisingly you hire a lot of developers. As you should. But as you grow you may find the operational tasks that are defaulting to one or more developers are taking more and more of their time. As your customer base grows and you’ve seen your first few spikes, it’s time to start thinking about hiring for a real ops role.

In summary, yes they can, but perhaps not well.

Related: How to hire a developer that you can work with

5. Volume discount, made to order or instant coffee

You may choose to go with instant coffee, by bringing someone in-house. You may find the right talent is hard to find. I wrote about this: Why techops and DBAs are in short supply.

Alternatively you may prefer a volume discount from one of the larger remote DBA or managed support solutions such as Oracle’s, Pythian or Percona. These guys all provide great service, but keep in mind how big of a fish you are. You’ll likely work through a ticketing system, and in some cases different engineers will look at your systems at different times. You will likely need either a very hands-on technical CTO or other in-house person to take ownership, and manage things closely.

The third option is a made-to-order coffee. Yes you pay more for Toby’s, Blue Bottle, or Ninth Street Espresso but you get what you pay for as they say. A boutique shop or independent consultant will provide a lot more hand holding, help your internal staff get up to speed, and communicate intimately about the process. If you’re a more non-technical CTO, or you’re very busy running the business, this solution may make a lot of sense for you.

Also: Why cloud detractors need a history lesson

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

No tools to reconcile MySQL with two masters

Here’s the last nail in the coffin. We hope this convinces you not to write to multiple masters with MySQL replication.

Reason 10 – No tools to reconcile inconsistent masters

For all the endless reasons we’ve already outlined MySQL replication is prone to failure. We know it’s going to happen, you now know too.

When replication breaks, your application may still be writing to both sides, both masters. Now you have data differences on both sides, random rows, in random tables.

Percona Toolkit checksum tool is great when you have a single authoritative master. It can compare & show you diffs. But when you’ve changed two masters, you have no authoritative side. Neither one is the last word on your data. You’re then faced with an intractable problem of reconciling your data in some manual or do-it-yourself method.

Not only would such a scenario be hairy and prone to error, but you would effectively have an outage for the duration. All ugly scenarios to be sure.

Click through to the end for multi-master solutions that work with MySQL.

PREV: Reason 9 – Temp tables can break replication

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

Temp tables can break replication

If you’re not convinced yet that writing to dual masters is a bad idea, we have a couple more reasons.

9. Temp tables break replication after restart

MySQL’s replication is sensitive to temporary tables. You shouldn’t use them. If your queries create them to work against, and a node crashes, the temp tables will be missing upon restart. You can imagine what this will do to all those queries which expect to see a temp table. That’s right, they’ll fail.

Theoretically you could have a start script which creates and populates such temp tables. However this may add operational complexity if not carefully coordinated with developers & change management.

NEXT: Reason 10 – No tools to reconcile inconsistent masters

PREV: Reason 8 – Crashed nodes can corrupt your cluster

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

Crashed nodes corrupt your MySQL cluster

Writing to two masters is like walking around with a loaded shotgun. Eventually one of your instances will fail and when it does, replications position & synchronization information could easily become corrupt!

Reason 8 – Crashed nodes cause big problems

MySQL instances, unfortunately can crash. When that happens, they don’t always sync the replication position properly. That’s a big risk. It’s one of the reasons why operational DBAs know that replication slaves need to be rebuilt from time to time.

If the replication position doesn’t sync properly when MySQL restarts it may do so at the wrong position and replay or miss some transactions. What this would do to an active-active setup is unpredictable. Again no single authoritative master means trouble!

MySQL 5.5 has introduced some parameters to address this. During my tests I’ve seen a performance hit, so use them with caution.

[code]
sync_binlog
sync_relay_log
sync_master_info
sync_relay_log_info
[/code]

NEXT: Reason 9 – Crashed nodes cause big problems

PREV: Reason 7 – Can’t add third node easily

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

10 reaons active-active is hard and how to solve it

Multi-master replication provides redundant copies of your most important business assets. What’s more it allows applications to scale out, which is perfect for cloud hosting solutions like Amazon Web Services.

But when you decide you need to scale your write capacity, you may be considering active-active setup. This is dangerous, messy and prone to failure. We’ve outlined how.

Click through to the end for multi-master solutions that work with MySQL.

Reason 1 – auto_increment introduces new problems

o MySQL’s auto_inc settings make it more difficult to change servers around in your overall replication topology

o Using auto_inc settings can cause MySQL to introducing gaps in your primary keys which is a waste of space

o Such a solution would require all tables to have auto_inc primary keys

NEXT: Reason 2 – MySQL replication is brittle to start with

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

The Art of Resistance

Sometimes, you have to be the bad guy. Be resistant to change. Here’s a story about how stubbornness pays off. As we’ve written about before A 4 letter word divides Dev & Ops.

I had one experience working as the primary MySQL DBA for an internet startup. Turns out they had Oracle for some applications too. And another DBA just to handle the Oracle stuff.

So it came time for Oracle guy to go on vacation. Suddenly these Oracle systems landed on my shoulders. We reviewed everything in advance, then he bid his goodbyes.

Almost as soon as he was out the door I started getting requests to change things.
“Oh we have to add this field”, or “oh we’d like to make this table change”.

I resisted enough to hold off development for a week.

“We’re now getting more heat from the CTO. Apparently certain pages on the site don’t save some very important content properly. It’s costing the business a lot of money. We need to make this change.”

My response -

[quote]No I can’t sanction this. If you want to do it, understand that it could well break Oracle’s replication.[/quote]

Oracle’s multi-master replication notoriously requires a lot of baby sitting. We held off for a few more days, and the Oracle DBA returned from his much needed vacation.

When discussing it afterward he said…

[quote]Am very glad you didn’t change those fields in the database. It would indeed have broken replication and caused problems with the backups![/quote]

Moral of the story…

  • apply the brakes around tight turns
  • don’t be afraid to say, we’re not going to do this before testing
  • you may have to say – we’re not going to do this at all…

A History lesson for Cloud Detractors

Computing history

We’ve all seen cloud computing discussed ad nauseam on blogs, on Twitter, Quora, Stack Exchange, your mom’s Facebook page… you get the idea. The tech bloggers and performance experts often pipe in with their graphs and statistics showing clearly that dollar-for-dollar, cloud hosted virtual servers can’t compete with physical servers in performance, so why is everyone pushing them? It’s just foolhardy, they say.

On the other end, management and their bean counters would simply roll their eyes saying this is why the tech guys aren’t running the business.

Seriously, why the disconnect? Open source has always involved a lot of bushwacking…

Continue reading

What Ops doesn’t tell you about your MySQL Database

MySQL is a very scalable platform which has proven robust even in the most dense and complex data environments. MySQL’s indispensable replication function is ‘sold’ as being fail-safe so you have little to sweat about as long as your backups are running regularly. But what the ops guys aren’t telling you is MySQL performs replication with tiny margins of error that could cause big problems in times of disaster.

Replication database backup

The Scenario

Imagine the scene, you use replication to backup your data. Your secondary database is your peace of mind. It’s the always-on clone of your crown jewels. You even perform backups off of it so you don’t impact your live website. Your backups run without errors. Your slave database runs without errors. Then the dreaded day comes when your primary database fails. You instruct your team to switchover your application to point to your live backup database. The site comes online again. But all is not right. You notice subtle differences and your team begins to question how deep the data divide could be.

The Problem with MySQL replication

Although MySQL replication is fairly easy to setup, and even to keep running without error, you may have unseen problems. MySQL’s core technology to replicate data between master and slave is primarily statement based. Various scenarios can cause what in other database platforms you might call database corruption, that is silent drifting of data from what tables and rows contain on the master. It is no fault of your own, or perhaps one might argue even of your operations team. It is a fundamental flaw in how MySQL performs replication.

The Solution

Fortunately there is a solution. Checksums, the wonderful computational tool for comparing things can be put to work nicely to compare database. The Percona Toolkit (formerly maatkit) includes just such a utility for use with MySQL. It can be used to check the integrity of your slave databases.
If you’ve never performed such a check, you should do so ASAP. If your database has been running for months at a stretch, chances are there could be differences lying undiscovered between the two systems.

Depending on the volume changing in your database, you can continue to use this tool periodically to confirm that all is consistent. If integrity checks fail, there is another tool in Maatkit to syncronize differences, and bring everything back to order.

Book Review – Effective MySQL

Effective MySQL: Optimizing SQL Statements

by Ronald Bradford

No Nonsense, Readable, Practical, and Compact

Effective MySQLI like that this book is small; 150 pages means you can carry it easily.  It’s also very no nonsense.  It does not dig too deeply into theory unless it directly relates to your day-to-day needs.  And those needs probably cluster heavily around optimizing SQL queries, as those pesky developers are always breaking things ;)

Jokes aside, this new book out on Oracle Press is a very readable volume. Bradford has drawn directly from real-world experience to give you the right bite size morsels you need in your day-to-day MySQL activities. Continue reading

Oracle Announces Paid MySQL Add-ons

 Oracle starts charging for MySQL Add-ons

Exciting news, Oracle just announced commercial MySQL extensions that they’ll be offering paid extensions to the core MySQL free product.

To be sure, this has raised waves of concern among the community, but on the whole I suspect it will be a good thing for MySQL.  This brings more commercial addons to the table, which only increases the options for customers.  Many will continue to use the core database product only, and avoid license hassles while others will surely embark on a hybrid approach if it solves their everyday business problems. Continue reading