Oracle to MySQL – prepare to bushwhack through the open source jungle

oracle to mysql

I was recently approached by a healthcare company for advice on suitable database solutions capable of executing its new initiative. The company was primarily an Oracle shop so naturally, they began by shopping for possible Oracle solutions.

The CTO relayed his conversation with the Oracle sales rep, who at first recommended an Oracle solution that, expensive as it may have been, ultimately aligned with the company’s existing technology and experience. Unfortunately this didn’t match their budget and so predictably, the Oracle sales rep whipped out a MySQL-based solution as an alternative.

Having worked as an Oracle DBA throughout the dot-com years, I know the technology well. I also know the cultural differences between enterprises that choose Oracle solutions and those that choose open-source ones.

This encounter with the healthcare firm struck me as a classic conundrum for today’s companies who are under pressure to meet business targets under a tight budget, and in a very short time.

Can an open-source solution like MySQL be the answer to such huge demands?

The Oracle sales rep will likely nod excitedly and say no sweat. But as a consultant I could only manage an equivocal yes.

As the healthcare CTO rattled off the list of products he wanted to use, specific RTOs and RPOs (recovery time objective + recovery point objective – all I could think was to react with concern.

In my experience with startup after startup I’ve seen plenty of different MySQL installations but I’d never heard of one with the technology stack he described. What’s more I’d never heard of these solutions described with the Oracle Corp titles.

On one hand I wanted to discuss the merits of the solution he was keen to implement, while on the other, I was expressing concern over possible directions and paths we might take.

An Oracle cluster is not a MySQL cluster

The solution Oracle suggested was a MySQL Cluster. The term cluster unfortunately means different things to different people. Such loose usage of the word dilutes its meaning. In particular a lot of Oracle technologists expect that this solution might be similar to Oracle’s Real Application Cluster technology. It’s not. There are a lot of limitations, and frankly it’s really just a different beast.

The list also included various management dashboards which Oracle likes to push, but which I rarely see in my consulting assignments. What’s more I heard nothing about replication integrity considering that replication problems are an ongoing concern for real-world MySQL installations due to the particular technology used under the hood. There are reliable solutions to this problem but none yet available from Oracle. In fact, this is a big problem but one that may be completely off the sales guys’ radar.

Don’t let sales frame your architecture

Honestly, I don’t have a particularly large axe to grind with the sales guys. They have a job to do, and providing solutions which bring revenue to their firm and commissions for themselves is what puts food on their tables. Each party is motivated in different ways. But as a company shopping for solutions, this should be kept clearly in mind when starting down that road.

Beware prescribed architectural frameworks that appear too easy because they almost always don’t do what they say on the tin. Unfortunately sales folks don’t have experiencing designing architectures in the real world, so they can’t really know how the technologies work beyond the data sheet with feature bullet points.

As we all know in the technology space, all software come with bugs and real-world experience does not match the feature lists in the brochures. In law they have de jure and de facto. The former describes what is written and the latter, what’s practiced. For technology solutions, its never just adding water for something to work.

Do your homework

Before you embark on a new trip through the open source technology jungle, do some due diligence. Read up on real-world solutions, and how other large firms are using the technology. What configurations are they having success with? Which are causing trouble for a lot of people.

One of the great advantages of open-source are the very vibrant communities, forums and discussion groups where people are glad to share their experiences and offer advice.

Allow sufficient time to test and
bring your team up to speed

This is very important one. Shifting from an enterprise that relies primarily on Oracle for it’s relational database solution over to one that relies on open source technologies is a very big step indeed. Open-source technologies tend to be much more do-it-yourself and roll your own. Oracle solutions tend much more toward predefined paths and solutions and prescriptions for customers.

There are merits to each of these paths, with attendant pros and cons. But they are decidedly different. It’s likely that your team will also require time to get up to speed, not just with the particular software components, but with the new process by which things happen in the open-source space. Allow sufficient time for this shift to take place, lest you create more problems than solutions.

A handy guide for PHP and MongoDB Web Development

PHP and MongoDBWhat makes a beginner’s guide handy is when it speaks to your intuition. It anticipates the burning questions that follow from a newbie trying to grasp new concepts and it quickly answers them. PHP and MongoDB Web Development – Beginner’s Guide is one such guide.

I hadn’t heard of Packt Publishing or Rubayeet Islam before picking up this title and I must say I’m impressed. Based in Birmingham, with offices in Mumbai, part of Packt’s business model is to give part of the royalties earned from its books to the open source projects they cover.

I already had a working knowledge of MongoDB, mostly from an operational perspective. If you are new to MongoDB you’ll certainly appreciate how this book is structured. it cuts to the chase, diving right into the nuts and bolts of installing the pieces you’ll need such as database and drivers, and getting your first application running.

From there they take you through a basic web application step-by-step, with chapters on session management, MapReduce and GridFS. EVery time I flip through the pages of a technical book, I find I always have questions in the background; ‘what about performance?’ or ‘How do I troubleshoot these pieces as I’m building them?’

What I liked about this book is that almost as quickly as I’d formulate some question about performance, I’d happen upon answers in the book, as if it knew what would come to my mind at each point of the

I was thinking about tuning and application performance and then found chapter 9 which discusses MongoDB’s explain facility, similar to that of MySQL. From there they cover index creation, hints, and finally profiling. These are all important topics for a developer, ones that he or she should have in mind while building their applications. So I was happy to see good coverage of that even in a self-avowed beginners’ guide.

Building apps that talk to both MySQL and MongoDB

Another interesting chapter was one introducing the idea of building an application that can talk to both MySQL and MongoDB and using those two datastores for different purposes. Again while I’m reading it I start thinking about operational concerns, and I start asking how one would support such an architecture. And then just like clockwork, Islam answers that very question.

He explains the challenges around data consistency and operational support in detail. It’s a great way to introduce a topic without necessarily pushing that adoption per se. Islam is clearly an experienced programmer, with much reasoned advice to share.

The book had great utility but I do have a few complaints.

First off the font is a little funky, and hard to read after a while. In that same vein, some of the screenshots are very wide and as such were zoomed down. This made those tiny and not very readable. Also the screenshots aren’t really consistent, some are black on white and some white text on black terminal which ended up being impossible to read.

Lastly I would have liked to see more use case discussions. Particularly, when should I consider a NoSQL database like MongoDB over a relational database? Which types of applications are really well suited? Which aren’t? What about versus other NoSQL’s? The same with GridFS. There was some caution there after the material was introduced but more discussion about what applications it is well suited for would be useful.

Those few complaints aside, the book is overall very good and perhaps the publishers will consider improving the type and diagrams in the next edition. It definitely sticks to it’s cover page motto “Learn by doing: less theory, more results”.

Top MySQL DBA interview questions (Part 2)

Also find Sean Hull’s ramblings on twitter @hullsean.

Continuing from our Top MySQL DBA interview questions (Part 1) here are five more questions that test a MySQL DBA’s knowledge, with two that will help suss out some personality traits.

  1. Disk I/O
  2. Disk performance should be an ever present concern to a DBA.  So although they don’t need to be a storage specialist, they should have a working knowledge.  Ask them about RAID versions, mirroring versus striping, and so forth.  Mirroring combines two disks as a unit.  Every write is duplicated on both disks.  If you lose one disk, you have an immediate copy.  Like a tandem truck that has spare tires running in parallel.  Lose one, and you don’t have to pull over immediately to replace it.  Striping spreads I/O over multiple disks so you on the one hand increase throughput linearly as you add disks.  That’s because you have more disks working for you.  At the same time you increase risk with each new disk you add, because the failure rate is then the sum total of all those disks.

    For relational databases the best RAID level is 10, which is striping over mirrored sets.  You use more disks, but disks are cheap compared to the hassle of any outage.

    If you’re deploying on Amazon, your candidate should be familiar with the Elastic Block Storage offering also known as EBS.  This is virtualized storage, so it introduces a whole world of operational flexibility.  No longer do you have to jump through hoops to attach, add or reconfigure storage on your servers.  It can all be done through command line API calls.  That said EBS suffers from variability problems as with any other shared resource.  Although Amazon guarantees your average throughput, the I/O you get at a given time can swing wildly from low to high.  Consider Linux software RAID across multiple EBS volumes to mitigate against this.

  3. How would you setup master/slave & master/master replication?
  4. A basic replication setup involves creating a full dump of the primary database, while it’s tables are locked.  The DBA should capture the master status, logfile & position at that time.  She should then copy the dump file to the secondary machine & import the full dump.  Finally the CHANGE MASTER TO statement should be run to point this database instance to it’s master.  Lastly START SLAVE should be issued.  If all goes well SHOW SLAVE STATUS should show YES for both of these status variables:

    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes

    Master-master replication is similar, except one additional step.  After the above steps have run, you know that your application is not pointing at the slave database.  If you’re not sure, verify that fact first.  Now determine the logfile name & position on the slave with SHOW MASTER STATUS.  Return to the primary box, and run the CHANGE MASTER TO command to make it slave from the secondary box.  You’ve essentially asked MySQL to create a circular loop of replication.

    How does MySQL avoid getting into an infinite loop in this scenario?  The server_id variable must be set, and be unique for all MySQL instances in your replication topology.

    For extra credit, ask the candidate about replication integrity checking.  As important as this piece is to a solid reliable replication setup, many folks in the MySQL world are not aware of the necessity.  Though replication can be setup, and running properly, that does not mean it will keep your data clean and perfect.  Due to the nature of statement based replication, and non-deterministic functions and/or non-transactional tables, statements can make their way into the binary logs, without completing.  What this means is they may then complete on the slave, resulting in a different row set on the same table in master & slave instance.

    Percona’s pt-table-checksum is the preventative tool to use.  It can build checksums of all your tables, and then propagate those checksums through replication to the slave.  An additional check can then be run on the slave side to confirm consistency, or show which rows & data are different.

  5. How are Users & Grants different in MySQL than other DBs?
  6. Creating a grant in MySQL can effectively create the user as well.  MySQL users are implemented in a very rudimentary fashion.  The biggest misunderstanding in this area surrounds the idea of a user.  In most databases a username is unique by itself.  In MySQL it is the *combination* of user & hostname that must be unique.  So for example if I create user sean@localhost, sean@server2 and sean@server3, they are actually three distinct users, which can have distinct passwords, and privileges.  It can be very confusing that sean logging in from the local command line has different privileges or password than sean logging in from server2 and server3.  So that’s an important point.

  7. How might you hack a MySQL server?
  8. This is a good opportunity for the candidate to show some creativity with respect to operations and Linux servers.  There are all sorts of ways into a database server:

    a.bad, weak or unset passwords
    b.files with incorrect permissions – modifying or deleting filesystem files can take a database down or corrupt data
    c.intercepting packets – could reveal unencrypted data inside the database
    d.unpatched software – bugs often reveal vulnerabilities that allow unauthorized entry
    e.moving, disabling or interrupting the backup scripts – a possible timebomb until you need to restore
    f.DNS spoofing, could allow login as a different user
    g.generous permissions – may allow an unprivileged user access to protected data

    There are endless possibilities here.  Listening for creative thinking here, reveals how much that person will think thoroughly and effectively about protecting your systems from those same threats.

  9. Brain teasers, riddles and coding problems
  10. Google for a long time was a fan of these types of tests at interviews, but I’m not at all.  For one thing you filter for good test takers, and for another the candidate has no resources – either books or the internet at their disposal.

    Why not instead ask them to tell a story.  Storytelling conveys a lot of things.  It conveys a bit of teaching ability, which extends far beyond internalizing some multiple choice questions.  It tells you more about their personality, which as I’ve said is very important.  It shows how they solve problems, as they’ll take you through their process.  And gives them an opportunity to tell you a real world triumph they presided over.

Personality Questions

In my experience, some of the most important traits of a new hire center around personality traits, and how they might mix with your existing team.  Being punctual for an interview, for instance sets a precedent for many things.  But that door swings both ways, so if you want to hire these types of folks, don’t keep them waiting either!

Pay attention to whether or not the candidate takes some lead in the conversation at all.  This can indicate the person is a self starter.  Obviously a great candidate will also listen carefully and patiently to what you have to say, but may then take the ball and run with it somewhat.

Listen for signals that the person is active in the field, posting on forums, and attending conferences, meetups and forums on technology topics.  You might also ask them if they blog, and what topics interest them.

Top MySQL DBA interview questions (Part 1)

MySQL DBA interview questions

Also find Sean Hull’s ramblings on twitter @hullsean.

MySQL DBAs are in greater demand now than they’ve ever been. While some firms are losing the fight for talent, promising startups with a progressive bent are getting first dibs with the best applicants. Whatever the case, interviewing for a MySQL DBA is a skill in itself so I thought I’d share a guide of top MySQL DBA interview questions to help with your screening process.
It’s long and detailed with some background to give context so I will be publishing this in two parts.

The history of the DBA as a career

In the Oracle world of enterprise applications, the DBA has long been a strong career path. Companies building their sales staff required Peoplesoft or SAP, and those deploying the financial applications or e-business suite needed operations teams to manage those systems. At the heart of that operations team were database administrators or DBAs, a catchall title that included the responsibility of guarding your businesses crown jewels. Security of those data assets, backups, management and performance were all entrusted to the DBA.

In the world of web applications, things have evolved a bit differently. Many a startup are driven only by developers. In those smaller shops, Operations tasks are designated to one developer who takes on the additional responsibility of managing systems. In that scenario, Operations or DBA duties becomes a sort of secondary role to the primary one of building the application. Even in cases where the startup creates a specific operations role with one person managing systems administration, chances are they don’t also have DBA experience. Instead, these startups are more likely to manage the database as a typical Linux application.

When I grow up I (don’t) want to be a MySQL DBA

Where do they come from, and why don’t a lot of computer science folks gravitate towards operations, and DBA? This may be in part due to and the romance of certain job roles which we discussed in a past article, The Mythical MySQL DBA. This pattern appeared a lot in the Oracle world as well. Many folks who were career DBAs actually moved to that role from the business side. In fact you’d find that many didn’t have a computer science or engineering background in the first place. In my experience I saw many Linux and Unix administrators with a stronger foundation who would fit into the DBA role but were simply not interested in it. The same can be said of the MySQL side of the house. Computer science grads don’t get out of school aiming for a career in ops or as a DBA because it has never been regarded as the pinnacle. It’s typically the PROGRAMMERS who become the rockstars in a cool startup.

But as the Internet grows into a richer and more complex medium, things are changing. People talk about scalability, high availability, zero downtime and performance tuning. When brief outages cost millions in losses expectations are very high and that requires skilled, experienced DBAs.

We’ve made a list of comprised of skill questions, general questions and ‘good-to-know’ questions. Have fun grilling your candidate with them, although bear in mind that with interviews it’s not about knowing it all, rather how the person demonstrates critical thinking skills.

Skills Questions

  1. Why are SQL queries so fundamental to database performance?
  2. This is the one question which a DBA should have an answer to. If they can’t answer this question, they’re unlikely to be a good candidate.

    After a MySQL server is setup and running, with many of the switches and dials set to use memory, and play well with other services on the Linux server, queries remain an everyday challenge. Why is this?

    SQL queries are like little programs in and of themselves. They ask the database server to collect selections of records, cross tabulate them with other collections of records, then sort them, and slice and dice them. All of this requires MySQL to build temporary tables, perform resource intensive sorts and then organize the output in nice bite size chunks.

    Unfortunately there are many ways to get the syntax and the results right, yet not do so efficiently. This might sound like a moot point, but with modern websites you may have 5000 concurrent users on your site, each hitting pages that have multiple queries inside them.

    What makes this an ongoing challenge is that websites are typically a moving target, with business requirements pushing new code changes all the time. New code means new queries, which pose ongoing risks to application stability.

  3. Indexes – too many, too few; what’s the difference?
  4. Indexes are very important to the smooth functioning of a relational database. Imagine your telephone book of yore. I can look up all the people with last name of “Hull” in Manhattan because I have the proper index. But most yellow pages don’t include an index for *first* names even though they might occaisionally come in handy, for example with the names “Star” or “Persephone”.

    You can imagine that, if you had a phone book which you maintain and update, everytime you add or remove a name you also have to update the index. That’s right, and the same goes for your relational database.

    So therein lies the trade off, and it’s an important one. When you are *modifying* your data, adding, updating or removing records, you must do work to keep the index up to date. More indexes mean more work. However when you’re looking up data or *querying* in SQL speak, more indexes mean more ways of looking up data fast. One more trade off is that indexes take up more pages in your phonebook, and so too they take up more space on disk.

  5. Backup & Recovery – explain various types & scenarios for restore
  6. Backups come in a few different flavors that the DBA should be familiar with.

    Cold backups involve shutdown down the database server (mysqld) and then backing up all the data files by making a copy of them to another directory. To be really thorough, the entire datadir including binlogs, log files, /etc/my.cnf config file should also be backed up. The cold backup is a database in itself, and can be copied to an alternate server and mounted as-is.

    Logical backups involve using the mysqldump tool. This locks tables while it runs to maintain consistency of changing data, and can cause downtime. The resulting dump file contains CREATE DATABASE, CREATE TABLE & CREATE INDEX statements to rebuild the database. Note the file itself is not a database, but rather a set of instructions which can tell a MySQL server *HOW* to reconstruct the database. Important distinction here.

    Hot backups are a great addition to the mix as they allow the physical database data files to be backed up *WHILE* the server is up and running. In MySQL this can be achieved with the xtrabackup tool, available from Percona. Despite the name, it works very well with MyISAM and InnoDB tables too, so don’t worry if you’re not using xtradb tables.

    There are a few different restore scenarios, and the candidate should be able to describe how these various backups can be restored, and what the steps to do so would be. In addition they should understand what point-in-time recovery is, and how to perform that as well. After restoring one of the above three backup types, the DBA would use the mysqlbinlog utility to apply any subsequent transactions from the binary logs. So if the backup was made at 2am last night, and you restore that backup, the mysqlbinlog tool would be used to dig up transactions since 2am, and apply them to that restored database.

  7. Troubleshooting Performance
  8. Since this is an ongoing challenge with relational databases, a good grasp of it is crucial. One way to challenge the candidate would be to describe a recent performance problem you experienced with your infrastructure, and ask them how they would go about resolving it.

    If they struggle with the particulars of what you ran into, ask them to describe a big performance challenge they solved, what the cause was, and how they performed analysis.

    Typically, first steps involve mitigating the immediate problem by finding out what changed in the environment either operationally or code changes. If there is a bug that was hit, or other strange performance anomaly, the first stop is usually looking at log files. MySQL server error logs, and the slow query log are key files. From there, analyzing those files during the timeframe where problems occurred should yield some clues.

    You might also hope to hear some comment about metrics collection in this discussion. Tools such as cacti, munin, opennms, or ganglia are invaluable tools for drilling down on a past event or outage, and sifting through server stats to find trouble.

  9. Joins – describe a few kinds and how the server performs them
  10. A basic understanding of INNER JOIN and OUTER JOIN would be a great start. A simple example might be employees and departments. If you have four employees and two departments, an INNER JOIN of these tables together will give you the departments employees belong to. Add another employee without assigning her to a department, and the inner join won’t display her. Further adding a new department which doesn’t yet contain employees won’t display either. However performing an OUTER JOIN will give you those matches with null in the department field, and null in the employee field respectively.

    Thought of with another example, take a credit card company. One tables contains cardholders identity, their number, address, and other personal information. A second table contains their account activity. When they first join, they don’t have any monthly statements, so an INNER JOIN of cardholders with statements will yield no rows. However an OUTER JOIN on those two tables will yield a record, with a null for the statements columns.

Feeling like a MySQL expert yet? In Part 2 of Top MySQL DBA Interview Questions we’ll walkthrough four more questions plus a bonus.

The Age of the Platform by Phil Simon

The Age of the Platform book coverI picked up Phil Simon’s The Age of the Platform after running into his blog, and some of his writing online. Simon is an interesting guy with an obvious strong technical background. He’s also an accomplished speaker and you can find several videos of his speaking online.

The first thing that struck me about this book was how it came to be. The book was funded through Kickstarter, an online platform for people to fund their creative projects. Perhaps it was Simon trying to drive home the point of his book. But it gets better, he self-published the book through Motion Publishing. Furthermore the book isn’t cheap for a paperback at $20. That said I admire that he has obviously eaten his own dog food, as the proverbial saying goes, and done it himself.

The premise of the book is that we’re entering a new age exemplified by four companies, namely Google, Facebook, Amazon and Apple. He takes us through a quick history of each company, then illustrates their successes and how each of them have successfully created platforms to extend their reach. Continue reading The Age of the Platform by Phil Simon