Category Archives: All

Top MySQL DBA interview questions (Part 1)

MySQL DBA interview questions

Also find Sean Hull’s ramblings on twitter @hullsean.

MySQL DBAs are in greater demand now than they’ve ever been. While some firms are losing the fight for talent, promising startups with a progressive bent are getting first dibs with the best applicants. Whatever the case, interviewing for a MySQL DBA is a skill in itself so I thought I’d share a guide of top MySQL DBA interview questions to help with your screening process.
It’s long and detailed with some background to give context so I will be publishing this in two parts.

The history of the DBA as a career

In the Oracle world of enterprise applications, the DBA has long been a strong career path. Companies building their sales staff required Peoplesoft or SAP, and those deploying the financial applications or e-business suite needed operations teams to manage those systems. At the heart of that operations team were database administrators or DBAs, a catchall title that included the responsibility of guarding your businesses crown jewels. Security of those data assets, backups, management and performance were all entrusted to the DBA.

In the world of web applications, things have evolved a bit differently. Many a startup are driven only by developers. In those smaller shops, Operations tasks are designated to one developer who takes on the additional responsibility of managing systems. In that scenario, Operations or DBA duties becomes a sort of secondary role to the primary one of building the application. Even in cases where the startup creates a specific operations role with one person managing systems administration, chances are they don’t also have DBA experience. Instead, these startups are more likely to manage the database as a typical Linux application.

When I grow up I (don’t) want to be a MySQL DBA

Where do they come from, and why don’t a lot of computer science folks gravitate towards operations, and DBA? This may be in part due to and the romance of certain job roles which we discussed in a past article, The Mythical MySQL DBA. This pattern appeared a lot in the Oracle world as well. Many folks who were career DBAs actually moved to that role from the business side. In fact you’d find that many didn’t have a computer science or engineering background in the first place. In my experience I saw many Linux and Unix administrators with a stronger foundation who would fit into the DBA role but were simply not interested in it. The same can be said of the MySQL side of the house. Computer science grads don’t get out of school aiming for a career in ops or as a DBA because it has never been regarded as the pinnacle. It’s typically the PROGRAMMERS who become the rockstars in a cool startup.

But as the Internet grows into a richer and more complex medium, things are changing. People talk about scalability, high availability, zero downtime and performance tuning. When brief outages cost millions in losses expectations are very high and that requires skilled, experienced DBAs.

We’ve made a list of comprised of skill questions, general questions and ‘good-to-know’ questions. Have fun grilling your candidate with them, although bear in mind that with interviews it’s not about knowing it all, rather how the person demonstrates critical thinking skills.

Skills Questions

  1. Why are SQL queries so fundamental to database performance?
  2. This is the one question which a DBA should have an answer to. If they can’t answer this question, they’re unlikely to be a good candidate.

    After a MySQL server is setup and running, with many of the switches and dials set to use memory, and play well with other services on the Linux server, queries remain an everyday challenge. Why is this?

    SQL queries are like little programs in and of themselves. They ask the database server to collect selections of records, cross tabulate them with other collections of records, then sort them, and slice and dice them. All of this requires MySQL to build temporary tables, perform resource intensive sorts and then organize the output in nice bite size chunks.

    Unfortunately there are many ways to get the syntax and the results right, yet not do so efficiently. This might sound like a moot point, but with modern websites you may have 5000 concurrent users on your site, each hitting pages that have multiple queries inside them.

    What makes this an ongoing challenge is that websites are typically a moving target, with business requirements pushing new code changes all the time. New code means new queries, which pose ongoing risks to application stability.

  3. Indexes – too many, too few; what’s the difference?
  4. Indexes are very important to the smooth functioning of a relational database. Imagine your telephone book of yore. I can look up all the people with last name of “Hull” in Manhattan because I have the proper index. But most yellow pages don’t include an index for *first* names even though they might occaisionally come in handy, for example with the names “Star” or “Persephone”.

    You can imagine that, if you had a phone book which you maintain and update, everytime you add or remove a name you also have to update the index. That’s right, and the same goes for your relational database.

    So therein lies the trade off, and it’s an important one. When you are *modifying* your data, adding, updating or removing records, you must do work to keep the index up to date. More indexes mean more work. However when you’re looking up data or *querying* in SQL speak, more indexes mean more ways of looking up data fast. One more trade off is that indexes take up more pages in your phonebook, and so too they take up more space on disk.

  5. Backup & Recovery – explain various types & scenarios for restore
  6. Backups come in a few different flavors that the DBA should be familiar with.

    Cold backups involve shutdown down the database server (mysqld) and then backing up all the data files by making a copy of them to another directory. To be really thorough, the entire datadir including binlogs, log files, /etc/my.cnf config file should also be backed up. The cold backup is a database in itself, and can be copied to an alternate server and mounted as-is.

    Logical backups involve using the mysqldump tool. This locks tables while it runs to maintain consistency of changing data, and can cause downtime. The resulting dump file contains CREATE DATABASE, CREATE TABLE & CREATE INDEX statements to rebuild the database. Note the file itself is not a database, but rather a set of instructions which can tell a MySQL server *HOW* to reconstruct the database. Important distinction here.

    Hot backups are a great addition to the mix as they allow the physical database data files to be backed up *WHILE* the server is up and running. In MySQL this can be achieved with the xtrabackup tool, available from Percona. Despite the name, it works very well with MyISAM and InnoDB tables too, so don’t worry if you’re not using xtradb tables.

    There are a few different restore scenarios, and the candidate should be able to describe how these various backups can be restored, and what the steps to do so would be. In addition they should understand what point-in-time recovery is, and how to perform that as well. After restoring one of the above three backup types, the DBA would use the mysqlbinlog utility to apply any subsequent transactions from the binary logs. So if the backup was made at 2am last night, and you restore that backup, the mysqlbinlog tool would be used to dig up transactions since 2am, and apply them to that restored database.

  7. Troubleshooting Performance
  8. Since this is an ongoing challenge with relational databases, a good grasp of it is crucial. One way to challenge the candidate would be to describe a recent performance problem you experienced with your infrastructure, and ask them how they would go about resolving it.

    If they struggle with the particulars of what you ran into, ask them to describe a big performance challenge they solved, what the cause was, and how they performed analysis.

    Typically, first steps involve mitigating the immediate problem by finding out what changed in the environment either operationally or code changes. If there is a bug that was hit, or other strange performance anomaly, the first stop is usually looking at log files. MySQL server error logs, and the slow query log are key files. From there, analyzing those files during the timeframe where problems occurred should yield some clues.

    You might also hope to hear some comment about metrics collection in this discussion. Tools such as cacti, munin, opennms, or ganglia are invaluable tools for drilling down on a past event or outage, and sifting through server stats to find trouble.

  9. Joins – describe a few kinds and how the server performs them
  10. A basic understanding of INNER JOIN and OUTER JOIN would be a great start. A simple example might be employees and departments. If you have four employees and two departments, an INNER JOIN of these tables together will give you the departments employees belong to. Add another employee without assigning her to a department, and the inner join won’t display her. Further adding a new department which doesn’t yet contain employees won’t display either. However performing an OUTER JOIN will give you those matches with null in the department field, and null in the employee field respectively.

    Thought of with another example, take a credit card company. One tables contains cardholders identity, their number, address, and other personal information. A second table contains their account activity. When they first join, they don’t have any monthly statements, so an INNER JOIN of cardholders with statements will yield no rows. However an OUTER JOIN on those two tables will yield a record, with a null for the statements columns.

Feeling like a MySQL expert yet? In Part 2 of Top MySQL DBA Interview Questions we’ll walkthrough four more questions plus a bonus.

The Age of the Platform by Phil Simon

The Age of the Platform book coverI picked up Phil Simon’s The Age of the Platform after running into his blog, and some of his writing online. Simon is an interesting guy with an obvious strong technical background. He’s also an accomplished speaker and you can find several videos of his speaking online.

The first thing that struck me about this book was how it came to be. The book was funded through Kickstarter, an online platform for people to fund their creative projects. Perhaps it was Simon trying to drive home the point of his book. But it gets better, he self-published the book through Motion Publishing. Furthermore the book isn’t cheap for a paperback at $20. That said I admire that he has obviously eaten his own dog food, as the proverbial saying goes, and done it himself.

The premise of the book is that we’re entering a new age exemplified by four companies, namely Google, Facebook, Amazon and Apple. He takes us through a quick history of each company, then illustrates their successes and how each of them have successfully created platforms to extend their reach. Continue reading The Age of the Platform by Phil Simon

A History lesson for Cloud Detractors

Computing history

We’ve all seen cloud computing discussed ad nauseam on blogs, on Twitter, Quora, Stack Exchange, your mom’s Facebook page… you get the idea. The tech bloggers and performance experts often pipe in with their graphs and statistics showing clearly that dollar-for-dollar, cloud hosted virtual servers can’t compete with physical servers in performance, so why is everyone pushing them? It’s just foolhardy, they say.

On the other end, management and their bean counters would simply roll their eyes saying this is why the tech guys aren’t running the business.

Seriously, why the disconnect? Open source has always involved a lot of bushwacking…

Continue reading A History lesson for Cloud Detractors

What Wouldn't Google Do?

What Would Google DoIn his latest book, What Would Google Do? Jeff Jarvis seems to have authored a gushing tribute to the search giant that has pledged to do no evil. He paints a very optimistic picture, and shows us over and over how Google has opened up industries, and how that same openness helps consumers like you and I.

Jarvis, if you don’t know him by name, has been a journalist for some time, but gained particular cred and notoriety when he blogged with the headline “Dell lies. Dell Sucks” after his horrible experiences with Dell computers and customer service.

While digging through Googly chapters, on Real Estate, Publishing, Entertainment, Shopping, Education and even Airlines, Jarvis serves up anecdotes on how a more open approach can help these industries adapt to a new business environment brought about by the Internet. He cites interesting examples like Gary Vaynerchuk, the creator of the hilarious and insanely popular winelibrary.tv show about wines, and now a public speaker on social media and brand building; and Brazilian author Paulo Coelho pirating his own works.

Taking the cue from some of these successes Jarvis goes on to propagate the idea that sharing and dishing out services for free is the way to make money. The irony that you have to buy his book for him to tell you that deserves a chuckle, and also raises the question of whether he himself buys all of that (pun inevitable). Indeed openness is great for consumers as most of us would agree. A level playing field increases competition, drives down prices for consumers. But it also drives down profits and margins. Continue reading What Wouldn't Google Do?

The Problem with Startup Bootcamps

instant startups

Scanning Crains NY Business recently, I saw an article on ‘starting up’ in 54 hours.  It’s the brainchild of Marc Nager, Clint Nelsen and Franck Nouyrigat called Startup Weekend. Startup bootcamps seem to be the current extra-curricular activity of choice these days. Wharton is also getting in on it with their Innovation Tournament. Then there is the 48 Hour Startup  and of course let’s not forget the 3 Day Startup.

So what’s my beef?  Truth be told I admire the ambition, the optimism, and the openness of these efforts.  And for sure these bootstrapping marathons do introduce entrepreneurs to future colleagues and partners, get them asking the right questions about financing, customers, revenue, competition and so forth.

My problem with these events is they frame startups as something you *can* do quickly. As if it were a Lego set or pop-up book that gives instant results and gratification. Sure startups are 21st century tech-driven business that provide innovative products in a very short development cycle but a lot of the day-to-day running of the business are still very mundane 20th century sensibilities; not unlike running a mom and pop store, a laundromat, deli or sandwich shop.

Continue reading The Problem with Startup Bootcamps

Scalability Rules for managers and startups

Scalability RulesAbbott and Fisher’s previous book, The Art of Scalability received good reviews for shifting the way we think about scalability from merely splitting databases and adding servers, to include the human factors that weigh heavily on its success. Together with the authors’ distinguished pedigree (PayPal, Amazon, and eBay between them), I picked up a copy of their second book, Scalability Rules – 50 Principles for Scaling Web Sites without a second thought.

If Art was about laying a strong foundation for a scalable organization then Rules is the reference point for when you actually tackle the growth challenges. It acts as a reminder when you come to a crossroad of decision-taking, to keep with the principles of scaling. Each guiding principle is clearly explained and illustrated with examples. It also prescribes how and when to apply the rules. Continue reading Scalability Rules for managers and startups

How about an easier tip jar?

Tip Jar Walking around New York you find yourself stopping at plenty of different places to grab some takeout for lunch. There are Vietnamese sandwich places, pizza shops, noodle bars, taco stands, juice bars and of course your daily coffee shop. You’ll find an endless variety.

As is customary in New York, even for takeout there is usually a tip jar at the checkout. Many of them have a large bowl, or glass jar in which you can throw your change as tips, or if you really love the place and service, a couple of dollars.

Of late I’ve noticed a few have placed those small plastic boxes with a tiny slot on the top. You try to put some change in the slot, and half of the money falls on the floor. It’s as frustrating as threading a needle while suffering from astigmatic vision. Now when I come to a place that has this plastic box, I don’t even bother tipping. I get a headache thinking about my change falling all over the floor. All I keep thinking is, why make it so difficult to tip?

Continue reading How about an easier tip jar?

Review: Here Comes Everybody by Clay Shirky

Here Comes EverybodyClay Shirky tells a great story. Here Comes Everybody begins with a case of a lost phone in a taxi cab, and the extraordinary turn of events that led to the owner retrieving it. From photos posted online, to NYPD who were uninterested in following up, to taking it all online. Through that online publicity, the story got picked up by the NY Times and CNN, which put pressure on the police to track down the taxi.  It’s a great example that illustrates the nuances, both good and bad, powerful and persistent that the Internet can unleash.

Throughout the book he weaves stories about the network effect, friends and friends of friends, and how that impacts information, organization, and the spread of ideas. Citing examples such as the SCO vs Linux court case and Groklaw, flash mobs and political organization, Shirky notes how all these events were influenced and facilitated by the Internet. Continue reading Review: Here Comes Everybody by Clay Shirky

The Mythical MySQL DBA

I’ve  been getting more than my fair share of calls from recruiters of late. Even in this depressed economic climate where jobs are rarer than a cab at rush-hour, it’s heartening to know that tech engineers are in great demand. And it’s even more heartening to think that demand for MySQL DBAs has never been better.

My reckoning was confirmed by a Bloomberg news report about stalwart retailers suffering from a dearth of talented engineers. Bloomberg cited Target’s outage-prone e-commerce site as a symptom of, among other things the market’s shortage. One of the challenges old-timers like Target face is having to compete with Silicon Valley startups as a fulfilling and ultimately, financially rewarding place to work. Continue reading The Mythical MySQL DBA

What Ops doesn’t tell you about your MySQL Database

MySQL is a very scalable platform which has proven robust even in the most dense and complex data environments. MySQL’s indispensable replication function is ‘sold’ as being fail-safe so you have little to sweat about as long as your backups are running regularly. But what the ops guys aren’t telling you is MySQL performs replication with tiny margins of error that could cause big problems in times of disaster.

Replication database backup

The Scenario

Imagine the scene, you use replication to backup your data. Your secondary database is your peace of mind. It’s the always-on clone of your crown jewels. You even perform backups off of it so you don’t impact your live website. Your backups run without errors. Your slave database runs without errors. Then the dreaded day comes when your primary database fails. You instruct your team to switchover your application to point to your live backup database. The site comes online again. But all is not right. You notice subtle differences and your team begins to question how deep the data divide could be.

The Problem with MySQL replication

Although MySQL replication is fairly easy to setup, and even to keep running without error, you may have unseen problems. MySQL’s core technology to replicate data between master and slave is primarily statement based. Various scenarios can cause what in other database platforms you might call database corruption, that is silent drifting of data from what tables and rows contain on the master. It is no fault of your own, or perhaps one might argue even of your operations team. It is a fundamental flaw in how MySQL performs replication.

The Solution

Fortunately there is a solution. Checksums, the wonderful computational tool for comparing things can be put to work nicely to compare database. The Percona Toolkit (formerly maatkit) includes just such a utility for use with MySQL. It can be used to check the integrity of your slave databases.
If you’ve never performed such a check, you should do so ASAP. If your database has been running for months at a stretch, chances are there could be differences lying undiscovered between the two systems.

Depending on the volume changing in your database, you can continue to use this tool periodically to confirm that all is consistent. If integrity checks fail, there is another tool in Maatkit to syncronize differences, and bring everything back to order.