MySQL needs single master to check data integrity

MySQL slaves can drift out of sync. Many of our clients are surprised to find some data differences in their replication topology, once we do some checking and sniffing around. Such checks require a single reliable or authoritative master to compare against.

Click through to the end for multi-master solutions that work with MySQL.

Reason 5 – No way to do integrity checking

MySQL’s statement based replication has a number of problems. It is notoriously bad at handling stored procedures. There are also lots of problems around non-deterministic functions. When you’re lucky it fails with an error. But many environments have replication problems and data inconsistencies that they don’t even know about.

This problem of data drift is well known in MySQL replication. Luckly there is a solution. Formerly part of Maatkit, now redubbed percona toolkit, the table checksum tool attacks this problem head on. pt-table-checksum is a great tool, but won’t help you much when you don’t have a single master.

Which database holds the correct view of all your data?

NEXT: Reason 6 – Active-active breaks transaction isolation

PREV: Reason 4 – Cannot clone & build a slave without a single master

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

MySQL requires an authoritative master to build slaves

In MySQL database operations, you often need to rebuild slaves. They fail for a lot of different reasons, fall out of sync or crash. When this happens you may find you need to reclone and start fresh. This is normally done by finding your authoritative master database, and doing a hotbackup.

Click through to the end for multi-master solutions that work with MySQL.

Reason 4 – You cannot reclone without single master

If you use active-active multi-master replication, you no longer have a single authoritative master copy of all your data. That means you have no reliable master to clone from, if a replica breaks.

If you’re an operational dba, you probably know that MySQL replication breaks regularly. You always need to be able to go back to a fresh copy when things get mixed up on the slaves.

NEXT: Reason 5 – MySQL row-based replication has limitations

PREV: Reason 3 – Limitations of MySQL row-based replication

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

Limitations of MySQL row-based replication

MySQL offers a few different options for how you perform replication. Statement-based has been around a lot longer, and though it has some troublesome characteristics they’re known well and can be managed. What’s more it supports online schema changes with multi-master active-passive setup. We recommend this solution.

Row-based replication is newer. It attempts to address problems like those introduced by non-deterministic functions, and replicating stored procedures. But it introduces it’s own challenges.

Click through to the end for multi-master solutions that work with MySQL.

3. Row-based replication limitations

Row-based replication addresses some of the limitations of statement based replication.

o works better with stored procedures
o reduces problems associated with non-deterministic functions

But it creates a few of it’s own, some are show-stoppers:

o won’t work if target table storage engine, column order, data types or row itself are different or missing.
o doesn’t write SQL to the binlogs – useful for troubleshooting
o harder to do point-in-time recovery without SQL in binlogs
o harder to do online schema changes by switching masters

NEXT: Reason 4 – Cannot reclone without a single master

PREV: Reason 2 – MySQL Replication is prone to failure

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

10 reaons active-active is hard and how to solve it

Multi-master replication provides redundant copies of your most important business assets. What’s more it allows applications to scale out, which is perfect for cloud hosting solutions like Amazon Web Services.

But when you decide you need to scale your write capacity, you may be considering active-active setup. This is dangerous, messy and prone to failure. We’ve outlined how.

Click through to the end for multi-master solutions that work with MySQL.

Reason 1 – auto_increment introduces new problems

o MySQL’s auto_inc settings make it more difficult to change servers around in your overall replication topology

o Using auto_inc settings can cause MySQL to introducing gaps in your primary keys which is a waste of space

o Such a solution would require all tables to have auto_inc primary keys

NEXT: Reason 2 – MySQL replication is brittle to start with

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

Why does MySQL replication fail?

When considering active-active multi-master, you must consider it’s foundation technology. Although MySQL replication is straightforward to setup, it can fail in a myriad of ways. Most of those are known and well understood. We can solve them only if we use the technology in the standard way.

Click through to the end for multi-master solutions that work with MySQL.

Reason 2 – Replication is brittle to start with

- statement based – throw it over the fence architecture
- use non-deterministic functions causes unpredictable behavior
- no globally unique transaction id (like Oracle SCN – system change number)
- no built in checksums – pt-table-checksum is a must
- replication position syncing to index files prone to breaking
- temp tables disappear after restart
- row-based still a new code path – doesn’t support zero downtime
- even with row-based mysql can fall back to statement
- row-based does not include SQL in binlogs
- MySQL replica slaves die frequently, require re-clone from authoritative master

NEXT: Reason 3 – MySQL row-based replication has limitations

PREV: Reason 1 – auto increment settings create new problems

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

Don't Miss Percona Live 2013

The biggest event on the MySQL calendar is the yearly Percona Live and it’s just around the corner.

This year you’ll be able to pick from a whopping 110 technical sessions by 90 different speakers from companies like Facebook, Amazon, Google and Linkedin. Learn what’s happening at the cutting edge of open source database deployments. Besides the technical sessions, there’s plenty of hobnobbing with fellow DBAs, developers, and industry folks. A great opportunity all around.

It’s just a couple of weeks away, so don’t waste any time. You can register here and enjoy a 20% discount code “SH-Live”!

Get to it!

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

When You Have to Take the Fall

Also find Sean Hull’s ramblings on twitter @hullsean.

One of the biggest jobs in operations is monitoring. There are so many servers, databases, webservers, search servers, backup servers. Each has lots of moving parts, lots that can go wrong. Typically if you have monitoring, and react to that monitoring, you’ll head off bigger problems later.

A problem is brewing

We, myself & the operations team started receiving alerts for one server. Space was filling up. Anyone can relate to this problem. You fill up your dropbox, or the drive on your laptop and all sorts of problems will quickly bubble to the surface.

Also check out – Why generalists are better at scaling the web.

As we investigated over the coming days, a complicated chain of processes and backups were using space on this server. Space that didn’t belong to them.

Dinner boils over

What happened next was inevitable. The weekly batch jobs kicked off and failed for lack of space. Those processes were not being monitored. Business units then discovered missing data in their reports and a firestorm of emails ensued.

Hiring? Get our MySQL DBA Interview Guide for managers, recruiters and candidates alike.

Why weren’t these services being monitored, they wanted to know.

Time to shoot the messenger

Having recently seen a changing of the guard, and a couple of key positions left vacant, it was clear that the root problem was communication.

Looking for talent? Why is it so hard to find a mythical MySQL DBA or devops expert these days?

I followed up the group emails, explaining in polite tone that we do in fact have monitoring in place, but that it seemed a clear chain of command was missing, and this process fell through the cracks.

I quickly received a response from the CTO requesting that I not send “these types of emails” to the team and to direct issues directly to him.

You might also like: A CTO Must Never Do This

A consultants job

As the sands continued to shift, a lead architect did emerge, one who took ownership of the products overall. Acting as a sort of life guard with a higher perch from which to watch, we were able to escalate important issues & he would then prioritize the team accordingly.

Are you a startup grappling with scalability? Keep in mind these 5 things toxic to scalability

Sometimes things have to break a little first.

What’s more a consultants job isn’t necessarily to lead the pack, nor to force management to act. A consultant’s job is to provide the best advice possible & to raise issues to the decision makers. And yes sometimes it means being a bit of a fall guy.

Those are the breaks of the game.

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

Ken Auletta Gets Us Googled

Also find Sean Hull’s ramblings on twitter @hullsean.

The title sounds vaguely fatalistic, the end of the world is nigh, that kind of thing. It turns out though that Auletta is a journalist having reported over the years a lot on old media. So when he says “as we know it” he’s speaking as much to old media as he is to the tech vanguard.

You’ll also want to check out The Big Switch – Rewiring the World From Edison to Google by Nicholas Carr.

But what makes his book superb isn’t just his phenomenal journalistic skills, in digging up all the facts and serving up a fair and accurate presentation of things. I think it’s important that he’s not a cheerleader at all, and approaches the topic with a critical eye as much to old media who ignored many of the warning signs in early 2000′s as to google who he emphasizes has been hubristic, at times arrogant, and has struggled with issues of privacy and copyright as they’ve built their technology.

Also check Jeff Jarvis’ What Would Google Do.

What makes this book even more important though is to step back and think of it as a case study in how the internet has become such a disruptive force. And in that light, google is a business which has rode that wave as much as it has defined it. Interestingly Google was not afraid to bring him to Mountain View to speak in their AtGoogleTalks series, and that video is now up on YouTube.

Ken Auletta on At Google Talks

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample

When You're Hired to Solve a People Problem

Also find Sean Hull’s ramblings on twitter @hullsean.

A good five years ago I worked for a firm in online education. Among various products they provided through their website, they were struggling with a process to get content churned out more quickly. The bottleneck was slowing down their business, and limiting the new products they could offer.

Help Us Publish, Please…

Among a number of things I was asked to look at, one particularly vexing problem surrounded publishing. Adding new products had become a cumbersome & difficult process. It took days sometimes weeks. For obvious reasons the stake holders wanted to wrestle this process out of the hands of engineering, and place it were it arguably belonged in the hands of the business units.

[quote]
When you’re hired to solve specific technical problems it only figures that you go looking for software solutions. But sometimes the problems turn out to involve the people and processes of an organization. Getting them unstuck is one of the biggest challenges an professional services consultant can face. But it is also one of the most rewarding to solve.
[/quote]

Bumping into Fear, Uncertainty & Doubt

As I dug into the meat of the problem I began to work closely with the database administrator. He was a very smart gentleman & friendly in his own way. But he also spoke with a very thick accent and brusqueness about his manner that proved difficult at times. After working together for some awhile, however I began to win him over, and he started to trust me.

Looking for a top-flight database administrator? Here’s our interview guide for recruiters, managers and candidates alike

It became apparent that he was rather resistant to handing over the keys to the publish process to non-technical folks in other departments. Having handled his share of outages, and bungling screw ups, which sometimes fall on operations during some of the least hospitable hours on the dial, I could understand his concern. What’s more he knew the code which had grown unwieldy.

If I were to use a polite euphemism I would call it spaghetti code.

Management, Managers & Trouble Brewing

Around then the CTO decided to send a manager to sniff around. Unfortunately the manager in question was a very hands off type. His edict was simply to get this done in two weeks, and proceeded to go on vacation. Upon his return when things were still hitting snags, things started to go south.

Despite AWS failures firms like AirBNB and Reddit didn’t have to go down.

Though some of the process had been automated, I refused to move the changes into full push-button automation without first testing on dev environments. Of course those requests had fallen on deaf ears.

Problem comes to a head

Next the hands off manager escalated things upstream, of course adding his own spin on the situation. Shortly thereafter I’m called into the CTOs office only to get royally chewed out. A serious smack down which I’ll admit came almost out of nowhere.

A related article which readers also found quite popular: A CTO Must Never Do This

Oh, honestly I’m not complaining. On some level this is the job of the consultant. To act as the third party, wise or unbiased second opinion, and even punching bag at times.

Once things calmed down, I explained the situation from top to bottom. Yes there was messy code, and yes the process was complex, but it could of course be automated. What really stood in the way was a very resistant engineer who currently owned the process.

As much of the Sandy recovery continues, Devops can learn real lessons from the hurricane.

The CTO for his part concurred, having had trouble communicating with the engineer himself, and really not liking him much. He then appointed a proper project manager to oversee redoing the publish process from scratch.

A Plea for Cooperation

If I were to do it all again, for my part I’d sniff out the people dynamics more carefully. It’s often the case the companies have the engineering talent in house to solve a particular problem, but not the will or knowledge to put it into play.

Is your business growing? Having trouble scaling? Here’s how we do a performance review. It’s the first step on your way to hyper growth.

To managers & CTOs I’d encourage where possible to look for people, process and communication issues. Try to ferret out when something is an engineering problem, or whether it is one of people, silos and territory.

Want more? Grab our Scalable Startups monthly for more tips and special content. Here’s a sample