- Why are SQL queries so fundamental to database performance?
This is the one question which a DBA should have an answer to. If they can’t answer this question, they’re unlikely to be a good candidate.
After a MySQL server is setup and running, with many of the switches and dials set to use memory, and play well with other services on the Linux server, queries remain an everyday challenge. Why is this?
SQL queries are like little programs in and of themselves. They ask the database server to collect selections of records, cross tabulate them with other collections of records, then sort them, and slice and dice them. All of this requires MySQL to build temporary tables, perform resource intensive sorts and then organize the output in nice bite size chunks.
Unfortunately there are many ways to get the syntax and the results right, yet not do so efficiently. This might sound like a moot point, but with modern websites you may have 5000 concurrent users on your site, each hitting pages that have multiple queries inside them.
What makes this an ongoing challenge is that websites are typically a moving target, with business requirements pushing new code changes all the time. New code means new queries, which pose ongoing risks to application stability.
- Indexes – too many, too few; what’s the difference?
Indexes are very important to the smooth functioning of a relational database. Imagine your telephone book of yore. I can look up all the people with last name of “Hull” in Manhattan because I have the proper index. But most yellow pages don’t include an index for *first* names even though they might occaisionally come in handy, for example with the names “Star” or “Persephone”.
You can imagine that, if you had a phone book which you maintain and update, everytime you add or remove a name you also have to update the index. That’s right, and the same goes for your relational database.
So therein lies the trade off, and it’s an important one. When you are *modifying* your data, adding, updating or removing records, you must do work to keep the index up to date. More indexes mean more work. However when you’re looking up data or *querying* in SQL speak, more indexes mean more ways of looking up data fast. One more trade off is that indexes take up more pages in your phonebook, and so too they take up more space on disk.
- Backup & Recovery – explain various types & scenarios for restore
Backups come in a few different flavors that the DBA should be familiar with.
Cold backups involve shutdown down the database server (mysqld) and then backing up all the data files by making a copy of them to another directory. To be really thorough, the entire datadir including binlogs, log files, /etc/my.cnf config file should also be backed up. The cold backup is a database in itself, and can be copied to an alternate server and mounted as-is.
Logical backups involve using the mysqldump tool. This locks tables while it runs to maintain consistency of changing data, and can cause downtime. The resulting dump file contains CREATE DATABASE, CREATE TABLE & CREATE INDEX statements to rebuild the database. Note the file itself is not a database, but rather a set of instructions which can tell a MySQL server *HOW* to reconstruct the database. Important distinction here.
Hot backups are a great addition to the mix as they allow the physical database data files to be backed up *WHILE* the server is up and running. In MySQL this can be achieved with the xtrabackup tool, available from Percona. Despite the name, it works very well with MyISAM and InnoDB tables too, so don’t worry if you’re not using xtradb tables.
There are a few different restore scenarios, and the candidate should be able to describe how these various backups can be restored, and what the steps to do so would be. In addition they should understand what point-in-time recovery is, and how to perform that as well. After restoring one of the above three backup types, the DBA would use the mysqlbinlog utility to apply any subsequent transactions from the binary logs. So if the backup was made at 2am last night, and you restore that backup, the mysqlbinlog tool would be used to dig up transactions since 2am, and apply them to that restored database.
- Troubleshooting Performance
Since this is an ongoing challenge with relational databases, a good grasp of it is crucial. One way to challenge the candidate would be to describe a recent performance problem you experienced with your infrastructure, and ask them how they would go about resolving it.
If they struggle with the particulars of what you ran into, ask them to describe a big performance challenge they solved, what the cause was, and how they performed analysis.
Typically, first steps involve mitigating the immediate problem by finding out what changed in the environment either operationally or code changes. If there is a bug that was hit, or other strange performance anomaly, the first stop is usually looking at log files. MySQL server error logs, and the slow query log are key files. From there, analyzing those files during the timeframe where problems occurred should yield some clues.
You might also hope to hear some comment about metrics collection in this discussion. Tools such as cacti, munin, opennms, or ganglia are invaluable tools for drilling down on a past event or outage, and sifting through server stats to find trouble.
- Joins – describe a few kinds and how the server performs them
A basic understanding of INNER JOIN and OUTER JOIN would be a great start. A simple example might be employees and departments. If you have four employees and two departments, an INNER JOIN of these tables together will give you the departments employees belong to. Add another employee without assigning her to a department, and the inner join won’t display her. Further adding a new department which doesn’t yet contain employees won’t display either. However performing an OUTER JOIN will give you those matches with null in the department field, and null in the employee field respectively.
Thought of with another example, take a credit card company. One tables contains cardholders identity, their number, address, and other personal information. A second table contains their account activity. When they first join, they don’t have any monthly statements, so an INNER JOIN of cardholders with statements will yield no rows. However an OUTER JOIN on those two tables will yield a record, with a null for the statements columns.