Juggling apples & oranges in the datacenter


In which a few choice words become one serious accident…

The Backstory

More than five years ago now, I worked for a shop in the business of news & information for the legal and real estate sectors. It was a fairly large organization with a number of Oracle- and MySQL-backed applications. The whole place ran on Sun servers, with a team of systems administrators, developers, and of course editors & content folks.

I was the primary database administrator for almost an entire year back then. I reported directly to the CTO. She was bright, competent and great to work for.

Although she had a technical background, she often spoke in terms of products and gave very high-level directives when making requests. This was made more confusing by the environment’s lack of naming conventions, so product names often didn’t match server or database names.

I tended to take a very paranoid approach. I’d ask over and over for clarification, and let some time pass before actually executing a request.

A Changing of the Guard

After I had spent many months as a contractor DBA, the firm finally located a full-time guy to replace me. It’s no easy task finding a DBA these days, especially for MySQL.

He was a very bright guy with a lot of technical knowledge. A bit wet behind the ears, but fully capable of managing an enterprise database shop.

Nuking the database

After two weeks on the job, something unpleasant happened.

Imagine a chef and a cook in a kitchen, confusing a dish with a vegetable.

[quote]
Chef says, “Toss the avocado”
Cook throws the avocado salad in the trash thinking it’s rotten.
Chef comes back later, puzzled: “I wanted you to mix it up!”
[/quote]

In the datacenter the conversation went something like this…

[quote]
CTO: Drop the journal database & rebuild.
DBA: OK. Give me a few minutes.
CTO: What did you do? The whole application is offline now!
[/quote]

From there, scrambling ensued. After nearly six hours of screaming and firefighting, everything was finally restored from backups and the application was brought back online.

Naming – product or components?

Semantics is very important. Those in the trenches tend to take requests word for word, while those managing the troops tend to frame requests in terms of products, divisions & the vantage point of the business.

That’s why naming conventions can be so important. You don’t want to be talking about apples when you really mean oranges.

Living with dysfunction

As environments grow over years and years, they tend to evolve into a spaghetti of confusing names & relationships. It’s the nature of enterprise environments.

- big confusion can mean big mistakes
- check & recheck – be risk averse and a bit paranoid
- check yourself, your shell, your hostname, your login
- ask questions & clarify repeatedly
- let some time pass before executing a destructive command
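
The checklist above can be partly automated. Here is a minimal Python sketch of one way to do it, not a tool that was in use at that shop: before anything destructive runs, it spells out the login and hostname, and it refuses unless the operator retypes the exact target name. The function names and the `journal` example are hypothetical, chosen only for illustration.

```python
import getpass
import socket


def describe_context():
    """Return the hostname and login the command would run under."""
    return {"host": socket.gethostname(), "user": getpass.getuser()}


def confirm_destructive(action, target, typed_name, context=None):
    """Allow the action only if the operator retyped the exact target name.

    Returns (allowed, message). The message spells out who is about to do
    what, and where, so a wrong host or login stands out.
    """
    context = context or describe_context()
    summary = f"{action} '{target}' as {context['user']}@{context['host']}"
    if typed_name.strip() != target:
        return False, f"REFUSED: {summary} (typed '{typed_name.strip()}')"
    return True, f"CONFIRMED: {summary}"
```

In practice you would pass in whatever the operator typed at a prompt, for example `confirm_destructive("DROP DATABASE", "journal", typed)`, and only proceed when the first returned value is true. Retyping the name forces a second look at the target, which is exactly the pause a bare “yes/no” prompt fails to provide.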
