Category Archives: MySQL

Easy MySQL replication with hotbackups


Setting up replication in MySQL is something we need to do quite often. Slaves die, replication fails, or tables and data get out of sync. Whenever we build a slave, we must start with a snapshot of all the data from the master database.

mysqldump is the tried and true method of doing this, however it requires that we lock all the tables in the database. If we’re dumping a large database, that could mean a significant period where no writes can happen for the duration of the backup. For many environments read-only is still an outage.

Enter hotbackups to the rescue. Percona provides a tool called xtrabackup that performs hotbackups of a running MySQL database, with no blocking. It’s able to do this because of InnoDB & multi-version concurrency control (MVCC). Luckily we don’t need to dig into the guts to enjoy the benefits of this great technology.

Here’s a quick step-by-step guide to using xtrabackup to create a slave.

  1. Install xtrabackup

    If you don’t have any Percona software already on your server, don’t worry. You don’t need to use the Percona distribution to use xtrabackup, but you will need their repository installed. Here’s how:

    $ rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm

    From there simply install xtrabackup:

    $ yum install -y xtrabackup
  2. Snapshot the master datadir

    The innobackupex utility comes with xtrabackup, and does the heavy lifting. So we can just use that to perform the backup.

    $ innobackupex /data/backup/

    Now we’ll see a new directory created inside /data/backup which looks something like this:

    /data/backup/2012-04-08_04-36-15/
  3. Apply transaction logs

    The backup which xtrabackup created above is of the current state of the database. However there are transactions which are incomplete, and others which have not been flushed to their datafiles. In other words the backup as-is would be similar to the datadir of a database that just crashed. Some additional transactions must still be applied.

    To apply those changes, use the following command on the backup directory you created above:

    $ innobackupex --apply-log /data/backup/2012-04-08_04-36-15/
  4. Copy to slave

    $ scp -r /data/backup/2012-04-08_04-36-15 root@newslave:/data/
  5. Stop MySQL

    $ /etc/init.d/mysql stop
  6. Swap datadir

    $ cd /data
    $ mv mysql mysql_old
    $ mv 2012-04-08_04-36-15 mysql

  7. Adjust my.cnf parameters

    At minimum you need to set the server_id to a unique value. The IP address with the periods removed can make a good server_id.
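
    For example, a slave at 10.20.30.40 might use the following in /etc/my.cnf (a sketch; the value is illustrative):

    [mysqld]
    server_id = 10203040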

  8. Start MySQL

    $ /etc/init.d/mysql start
  9. Point to master & start the slave

    One very nice thing about xtrabackup is that it automatically captures the master info, so we’ll easily be able to find out the current log file & log position! That’s a very nice feature.

    Find out where the slave should start from:

    $ cat /data/mysql/xtrabackup_binlog_info
    log_bin.000027 2973624

    Now tell MySQL where the new master is:

    mysql> change master to
        -> master_user='rep',
        -> master_password='rep',
        -> master_host='10.20.30.40',
        -> master_log_file='log_bin.000027',
        -> master_log_pos=2973624;

    Now start the slave:

    mysql> start slave;

    Lastly verify that it is running properly:

    mysql> show slave status\G

    You should see the following:

    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
  10. Test replication

    Once you have replication up and running, you should test it as well. I like to keep a test table installed in the test schema for this purpose.
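
    If that table doesn’t already exist, create it on the master first. A minimal sketch – the single varchar column here is an assumption, chosen to match the insert below:

    master> create table test.sean_test (msg varchar(255));

    With the table in place, you can test as follows: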

    master> insert into sean_test values ('xtrabackup is a great way to create a slave with MySQL');

    Then verify that you see that row on your new slave:

    slave> select * from sean_test;

    Once you’ve used xtrabackup a few times, I’m sure you’ll be converted. It makes building a slave much simpler in MySQL. It captures the file & position for you and what’s more there is no dump file to apply – which typically takes a lot of time too! All in all the tool makes you more efficient, and allows you to snapshot slaves anytime you like.

    Now that you have replication working, you should add the icing to the cake. MySQL’s statement based replication is powerful, but even when it’s not throwing errors, the databases can get silently out of sync. In a future article we’ll discuss how to bulletproof your replication setup with a tool that performs checksums on your tables. That will give you professional enterprise class data protection in MySQL.

Autoscaling MySQL on Amazon EC2


Autoscaling your webserver tier is typically straightforward. Image your apache server with source code or without, then sync down files from S3 upon spinup. Roll that image into the autoscale configuration and you’re all set.
With the database tier though, things can be a bit tricky. The typical configuration we see is to have a single master database where your application writes. But scaling out or horizontally on Amazon EC2 should be as easy as adding more slaves, right? Why not automate that process?

Below we’ve set out to answer some of the questions you’re likely to face when setting up slaves against your master. We’ve included instructions on building an AMI that automatically spins up as a slave. Fancy!

  1. How can I autoscale my database tier?
    1. Build an auto-starting MySQL slave against your master.
    2. Configure those to spinup. Amazon’s autoscaling load balancer is one option; another is a roll-your-own solution, monitoring thresholds on servers and spinning up or dropping off slaves as necessary.
  2. Does an AWS snapshot capture subvolume data or just the SIZE of the attached volume?

    If you have an attached EBS volume and you create a new AMI off of that, you will capture the entire root volume, plus your attached volume data. In fact we find this a great way to create an auto-building slave in the cloud.

  3. How do I freeze MySQL during an AWS snapshot?

    mysql> flush tables with read lock;
    mysql> system xfs_freeze -f /data

    At this point you can use the Amazon web console, ylastic, or ec2-create-image API call to do so from the command line. When the server you are imaging off of above restarts – as it will do by default – it will start with /data partition unfrozen and mysql’s tables unlocked again. Voila!

    If you’re not using xfs for your /data filesystem, you should be. It’s fast! The xfsprogs docs seem to indicate this may also work with foreign filesystems. Check the docs for details.
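
    From the command line, creating the image might look something like this (a sketch; the instance id is made up):

    $ ec2-create-image i-1234abcd -n "mysql-slave-ami"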

  4. How do I build an AMI MySQL slave that autoconnects to master?

    Install the mysql_serverid script described below, then:

    1. Configure mysql to use your /data EBS mount.
    2. Set all your my.cnf settings including server_id
    3. Configure the instance as a slave in the normal way.
    4. When using GRANT to create the ‘rep’ user on master, specify the host with a subnet wildcard. For example ‘10.20.%’. That will subsequently allow any 10.20.x.y servers to connect and replicate (see the sketch just after this list).
    5. Point the slave at the master.
    6. When all is running properly, edit the my.cnf file and remove server_id. Don’t restart mysql.
    7. Freeze the filesystem as described above.
    8. Use the Amazon console, ylastic or API call to create your new image.
    9. Test it of course, to make sure it spins up, sets server_id and connects to master.
    10. Make a change in the test schema, and verify that it propagates to all slaves.
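
    That grant might look something like this (a sketch; the password is illustrative):

    mysql> grant replication slave on *.* to 'rep'@'10.20.%' identified by 'rep_password';
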
  5. How do I set server_id uniquely?

    As you hopefully already know, in a MySQL replication environment each node requires a unique server_id setting. In my Amazon Machine Images, I want the server to start up and, if it doesn’t find the server_id in the /etc/my.cnf file, to add it there, correctly! Is that so much to ask?

    Here’s what I did. Fire up your editor of choice and drop in this bit of code:

    #!/bin/sh
    if grep -q "server_id" /etc/my.cnf
    then
        : # do nothing - it's already set
    else
        # extract numeric component from hostname - should be the internal IP in an Amazon environment
        export server_id=`echo $HOSTNAME | sed 's/[^0-9]*//g'`
        echo "server_id=$server_id" >> /etc/my.cnf
        # restart mysql
        /etc/init.d/mysql restart
    fi

    Save that snippet at /root/mysql_serverid. Also be sure to make it executable:

    $ chmod +x /root/mysql_serverid

    Then just append it to your /etc/rc.local file with an editor or echo:

    $ echo "/root/mysql_serverid" >> /etc/rc.local

    Assuming your my.cnf file does *NOT* contain the server_id setting when you re-image, then it’ll set this automagically each time you spinup a new server off of that AMI. Nice!

  6. Can you easily slave off of a slave? How?

    It’s not terribly different from slaving off of a normal master.

    1. First enable slave updates. The setting is not dynamic, so if you don’t already have it set, you’ll have to restart your slave:

      log_slave_updates=true
    2. Get an initial snapshot of your slave data. You can do that the locking way:

      mysql> flush tables with read lock;
      mysql> show master status\G
      mysql> system mysqldump -A > full_slave_dump.mysql
      mysql> unlock tables;

      You may also choose to use Percona’s excellent xtrabackup utility to create hotbackups without locking any tables. We are very lucky to have an open-source tool like this at our disposal. MySQL Enterprise Backup from Oracle Corp can also do this.

    3. On the slave, seed the database with your dump created above.

      $ mysql < full_slave_dump.mysql
    4. Now point your slave to the original slave.

      mysql> change master to master_user='rep', master_password='rep', master_host='192.168.0.1', master_log_file='server-bin-log.000004', master_log_pos=399;
      mysql> start slave;
      mysql> show slave status\G

  7. Slave master is set as an IP address. Is there another way?

    It’s possible to use hostnames in MySQL replication, however it’s not recommended. Why? Because of the wacky world of DNS. Suffice it to say MySQL has to do a lot of work to resolve those names into IP addresses. A hiccup in DNS can potentially interrupt all MySQL services, as sessions will fail to authenticate. To avoid this problem do two things:

    1. Set this parameter in my.cnf:

      skip_name_resolve = true

    2. Remove entries in the mysql.user table where the hostname is not an IP address. Those entries will be invalid for authentication after setting the above parameter.
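
    To spot the entries that will break, a query like this works (a sketch; the pattern assumes valid hosts contain only digits, dots and wildcards):

      mysql> select user, host from mysql.user where host not regexp '^[0-9.%]+$';
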
  8. Doesn’t RDS take care of all of this for me?

    RDS is Amazon’s Relational Database Service which is built on MySQL. Amazon’s RDS solution presents MySQL as a service, which brings certain benefits to administrators and startups:

    • Simpler administration. Nuts and bolts are handled for you.
    • Push-button replication. No more struggling with the nuances and issues of MySQL’s replication management.
    Simplicity of administration of course has its downsides. Depending on your environment, these may or may not be dealbreakers.

    • No access to the slow query log. This is huge. The single best tool for troubleshooting slow database response is this log file. Queries are a large part of keeping a relational database server healthy and happy, and without this facility, you are severely limited.

    • Locked-in downtime window. When you sign up for RDS, you must define a thirty minute maintenance window. This is a weekly window during which your instance *COULD* be unavailable. When you host yourself, you may not require as much downtime at all, especially if you’re using master-master MySQL and a zero-downtime configuration.

    • Can’t use Percona Server to host your MySQL data. You won’t be able to do this in RDS. Percona Server is a high performance distribution of MySQL which typically rolls in serious performance tweaks and updates before they make it to the community edition. Well worth the effort to consider it.

    • No access to the filesystem, server metrics & command line. Again for troubleshooting problems, these are crucial. Gathering data about what’s really happening on the server is how you begin to diagnose and troubleshoot a server stall or pileup.

    • You are beholden to Amazon’s support services if things go awry. That’s because you won’t have access to the raw iron to diagnose and troubleshoot things yourself. Want to call in an outside consultant to help you debug or troubleshoot? You’ll have your hands tied without access to the underlying server.

    • You can’t replicate to a non-RDS database. Have your own datacenter connected to Amazon via VPC? Want to replicate to a cloud server? RDS won’t fit the bill. You’ll have to roll your own – as we’ve described above. And if you want to replicate to an alternate cloud provider, again RDS won’t work for you.

Top MySQL DBA interview questions (Part 2)


Continuing from our Top MySQL DBA interview questions (Part 1) here are five more questions that test a MySQL DBA’s knowledge, with two that will help suss out some personality traits.

  1. Disk I/O

    Disk performance should be an ever-present concern to a DBA.  So although they don’t need to be a storage specialist, they should have a working knowledge.  Ask them about RAID versions, mirroring versus striping, and so forth.  Mirroring combines two disks as a unit.  Every write is duplicated on both disks.  If you lose one disk, you have an immediate copy.  Like a tandem truck that has spare tires running in parallel.  Lose one, and you don’t have to pull over immediately to replace it.  Striping spreads I/O over multiple disks, so on the one hand you increase throughput linearly as you add disks.  That’s because you have more disks working for you.  At the same time you increase risk with each new disk you add, because the failure rate is then the sum total of all those disks.

    For relational databases the best RAID level is 10, which is striping over mirrored sets.  You use more disks, but disks are cheap compared to the hassle of any outage.

    If you’re deploying on Amazon, your candidate should be familiar with the Elastic Block Storage offering also known as EBS.  This is virtualized storage, so it introduces a whole world of operational flexibility.  No longer do you have to jump through hoops to attach, add or reconfigure storage on your servers.  It can all be done through command line API calls.  That said EBS suffers from variability problems as with any other shared resource.  Although Amazon guarantees your average throughput, the I/O you get at a given time can swing wildly from low to high.  Consider Linux software RAID across multiple EBS volumes to mitigate against this.

  2. How would you setup master/slave & master/master replication?

    A basic replication setup involves creating a full dump of the primary database while its tables are locked.  The DBA should capture the master status, logfile & position at that time.  She should then copy the dump file to the secondary machine & import the full dump.  Finally the CHANGE MASTER TO statement should be run to point this database instance to its master.  Lastly START SLAVE should be issued.  If all goes well SHOW SLAVE STATUS should show YES for both of these status variables:

    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes

    Master-master replication is similar, except for one additional step.  After the above steps have run, you know that your application is not pointing at the slave database.  If you’re not sure, verify that fact first.  Now determine the logfile name & position on the slave with SHOW MASTER STATUS.  Return to the primary box, and run the CHANGE MASTER TO command to make it slave from the secondary box.  You’ve essentially asked MySQL to create a circular loop of replication.

    How does MySQL avoid getting into an infinite loop in this scenario?  The server_id variable must be set, and be unique for all MySQL instances in your replication topology.

    For extra credit, ask the candidate about replication integrity checking.  As important as this piece is to a solid reliable replication setup, many folks in the MySQL world are not aware of the necessity.  Though replication can be setup, and running properly, that does not mean it will keep your data clean and perfect.  Due to the nature of statement based replication, and non-deterministic functions and/or non-transactional tables, statements can make their way into the binary logs, without completing.  What this means is they may then complete on the slave, resulting in a different row set on the same table in master & slave instance.

    Percona’s pt-table-checksum is the preventative tool to use.  It can build checksums of all your tables, and then propagate those checksums through replication to the slave.  An additional check can then be run on the slave side to confirm consistency, or show which rows & data are different.
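
    A run might look something like this (a sketch; the host, user and password are illustrative):

    $ pt-table-checksum --replicate=percona.checksums h=master.example.com,u=checksum_user,p=secret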

  3. How are Users & Grants different in MySQL than other DBs?

    Creating a grant in MySQL can effectively create the user as well.  MySQL users are implemented in a very rudimentary fashion.  The biggest misunderstanding in this area surrounds the idea of a user.  In most databases a username is unique by itself.  In MySQL it is the *combination* of user & hostname that must be unique.  So for example if I create user sean@localhost, sean@server2 and sean@server3, they are actually three distinct users, which can have distinct passwords and privileges.  It can be very confusing that sean logging in from the local command line has different privileges or password than sean logging in from server2 and server3.  So that’s an important point.
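
    For instance (a sketch; the database and passwords are made up):

    mysql> grant select on test.* to 'sean'@'localhost' identified by 'pass1';
    mysql> grant all on test.* to 'sean'@'server2' identified by 'pass2';
    mysql> select user, host from mysql.user where user = 'sean';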

  4. How might you hack a MySQL server?

    This is a good opportunity for the candidate to show some creativity with respect to operations and Linux servers.  There are all sorts of ways into a database server:

    a. bad, weak or unset passwords
    b. files with incorrect permissions – modifying or deleting filesystem files can take a database down or corrupt data
    c. intercepting packets – could reveal unencrypted data inside the database
    d. unpatched software – bugs often reveal vulnerabilities that allow unauthorized entry
    e. moving, disabling or interrupting the backup scripts – a possible timebomb until you need to restore
    f. DNS spoofing – could allow login as a different user
    g. generous permissions – may allow an unprivileged user access to protected data

    There are endless possibilities here.  Listening for creative thinking here reveals how thoroughly and effectively that person will think about protecting your systems from those same threats.

  5. Brain teasers, riddles and coding problems

    Google for a long time was a fan of these types of tests at interviews, but I’m not at all.  For one thing you filter for good test takers, and for another the candidate has no resources – either books or the internet – at their disposal.

    Why not instead ask them to tell a story?  Storytelling conveys a lot of things.  It conveys a bit of teaching ability, which extends far beyond internalizing some multiple choice questions.  It tells you more about their personality, which as I’ve said is very important.  It shows how they solve problems, as they’ll take you through their process.  And it gives them an opportunity to tell you about a real world triumph they presided over.

Personality Questions

In my experience, some of the most important traits of a new hire center around personality, and how they might mix with your existing team.  Being punctual for an interview, for instance, sets a precedent for many things.  But that door swings both ways, so if you want to hire these types of folks, don’t keep them waiting either!

Pay attention to whether or not the candidate takes some lead in the conversation at all.  This can indicate the person is a self starter.  Obviously a great candidate will also listen carefully and patiently to what you have to say, but may then take the ball and run with it somewhat.

Listen for signals that the person is active in the field, posting on forums, and attending conferences, meetups and forums on technology topics.  You might also ask them if they blog, and what topics interest them.

Top MySQL DBA interview questions (Part 1)


MySQL DBAs are in greater demand now than they’ve ever been. While some firms are losing the fight for talent, promising startups with a progressive bent are getting first dibs with the best applicants. Whatever the case, interviewing for a MySQL DBA is a skill in itself so I thought I’d share a guide of top MySQL DBA interview questions to help with your screening process.
It’s long and detailed with some background to give context so I will be publishing this in two parts.

The history of the DBA as a career

In the Oracle world of enterprise applications, the DBA has long been a strong career path. Companies building their sales staff required Peoplesoft or SAP, and those deploying the financial applications or e-business suite needed operations teams to manage those systems. At the heart of that operations team were database administrators or DBAs, a catchall title that included the responsibility of guarding your business’s crown jewels. Security of those data assets, backups, management and performance were all entrusted to the DBA.

In the world of web applications, things have evolved a bit differently. Many a startup is driven only by developers. In those smaller shops, Operations tasks are designated to one developer who takes on the additional responsibility of managing systems. In that scenario, Operations or DBA duties become a sort of secondary role to the primary one of building the application. Even in cases where the startup creates a specific operations role with one person managing systems administration, chances are they don’t also have DBA experience. Instead, these startups are more likely to manage the database as a typical Linux application.

When I grow up I (don’t) want to be a MySQL DBA

Where do they come from, and why don’t a lot of computer science folks gravitate towards operations, and DBA? This may be in part due to the romance of certain job roles, which we discussed in a past article, The Mythical MySQL DBA. This pattern appeared a lot in the Oracle world as well. Many folks who were career DBAs actually moved to that role from the business side. In fact you’d find that many didn’t have a computer science or engineering background in the first place. In my experience I saw many Linux and Unix administrators with a stronger foundation who would fit into the DBA role but were simply not interested in it. The same can be said of the MySQL side of the house. Computer science grads don’t get out of school aiming for a career in ops or as a DBA because it has never been regarded as the pinnacle. It’s typically the PROGRAMMERS who become the rockstars in a cool startup.

But as the Internet grows into a richer and more complex medium, things are changing. People talk about scalability, high availability, zero downtime and performance tuning. When brief outages cost millions in losses, expectations are very high, and that requires skilled, experienced DBAs.

We’ve made a list comprised of skill questions, general questions and ‘good-to-know’ questions. Have fun grilling your candidate with them, although bear in mind that with interviews it’s not about knowing it all, rather how the person demonstrates critical thinking skills.

Skills Questions

  1. Why are SQL queries so fundamental to database performance?

    This is the one question which a DBA should have an answer to. If they can’t answer this question, they’re unlikely to be a good candidate.

    After a MySQL server is setup and running, with many of the switches and dials set to use memory, and play well with other services on the Linux server, queries remain an everyday challenge. Why is this?

    SQL queries are like little programs in and of themselves. They ask the database server to collect selections of records, cross tabulate them with other collections of records, then sort them, and slice and dice them. All of this requires MySQL to build temporary tables, perform resource intensive sorts and then organize the output in nice bite size chunks.

    Unfortunately there are many ways to get the syntax and the results right, yet not do so efficiently. This might sound like a moot point, but with modern websites you may have 5000 concurrent users on your site, each hitting pages that have multiple queries inside them.

    What makes this an ongoing challenge is that websites are typically a moving target, with business requirements pushing new code changes all the time. New code means new queries, which pose ongoing risks to application stability.

  2. Indexes – too many, too few; what’s the difference?

    Indexes are very important to the smooth functioning of a relational database. Imagine your telephone book of yore. I can look up all the people with last name of “Hull” in Manhattan because I have the proper index. But most phone books don’t include an index for *first* names even though they might occasionally come in handy, for example with the names “Star” or “Persephone”.

    You can imagine that, if you had a phone book which you maintain and update, every time you add or remove a name you also have to update the index. That’s right, and the same goes for your relational database.

    So therein lies the trade off, and it’s an important one. When you are *modifying* your data, adding, updating or removing records, you must do work to keep the index up to date. More indexes mean more work. However when you’re looking up data or *querying* in SQL speak, more indexes mean more ways of looking up data fast. One more trade off is that indexes take up more pages in your phonebook, and so too they take up more space on disk.
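
    To make this concrete, here’s a sketch in SQL (the table and column names are made up):

    mysql> alter table phonebook add index last_name_idx (last_name);
    mysql> explain select * from phonebook where last_name = 'Hull'\G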

  3. Backup & Recovery – explain various types & scenarios for restore

    Backups come in a few different flavors that the DBA should be familiar with.

    Cold backups involve shutting down the database server (mysqld) and then backing up all the data files by making a copy of them to another directory. To be really thorough, the entire datadir including binlogs, log files, and the /etc/my.cnf config file should also be backed up. The cold backup is a database in itself, and can be copied to an alternate server and mounted as-is.

    Logical backups involve using the mysqldump tool. This locks tables while it runs to maintain consistency of changing data, and can cause downtime. The resulting dump file contains CREATE DATABASE, CREATE TABLE & CREATE INDEX statements to rebuild the database. Note the file itself is not a database, but rather a set of instructions which can tell a MySQL server *HOW* to reconstruct the database. Important distinction here.

    Hot backups are a great addition to the mix as they allow the physical database data files to be backed up *WHILE* the server is up and running. In MySQL this can be achieved with the xtrabackup tool, available from Percona. Despite the name, it works very well with MyISAM and InnoDB tables too, so don’t worry if you’re not using xtradb tables.

    There are a few different restore scenarios, and the candidate should be able to describe how these various backups can be restored, and what the steps to do so would be. In addition they should understand what point-in-time recovery is, and how to perform that as well. After restoring one of the above three backup types, the DBA would use the mysqlbinlog utility to apply any subsequent transactions from the binary logs. So if the backup was made at 2am last night, and you restore that backup, the mysqlbinlog tool would be used to dig up transactions since 2am, and apply them to that restored database.
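
    For example, rolling forward from a 2am backup might look something like this (a sketch; the binlog names and time are made up):

    $ mysqlbinlog --start-datetime="2012-04-08 02:00:00" mysql-bin.000027 mysql-bin.000028 | mysql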

  4. Troubleshooting Performance

    Since this is an ongoing challenge with relational databases, a good grasp of it is crucial. One way to challenge the candidate would be to describe a recent performance problem you experienced with your infrastructure, and ask them how they would go about resolving it.

    If they struggle with the particulars of what you ran into, ask them to describe a big performance challenge they solved, what the cause was, and how they performed analysis.

    Typically, first steps involve mitigating the immediate problem by finding out what changed in the environment, whether operationally or in code. If there is a bug that was hit, or other strange performance anomaly, the first stop is usually looking at log files. MySQL server error logs, and the slow query log are key files. From there, analyzing those files during the timeframe where problems occurred should yield some clues.

    You might also hope to hear some comment about metrics collection in this discussion. Tools such as cacti, munin, opennms, or ganglia are invaluable for drilling down on a past event or outage, and sifting through server stats to find trouble.

  5. Joins – describe a few kinds and how the server performs them

    A basic understanding of INNER JOIN and OUTER JOIN would be a great start. A simple example might be employees and departments. If you have four employees and two departments, an INNER JOIN of these tables together will give you the departments employees belong to. Add another employee without assigning her to a department, and the inner join won’t display her. Further, adding a new department which doesn’t yet contain employees won’t display either. However performing an OUTER JOIN will give you those matches with null in the department field, and null in the employee field respectively.

    Thought of with another example, take a credit card company. One table contains the cardholder’s identity, their number, address, and other personal information. A second table contains their account activity. When they first join, they don’t have any monthly statements, so an INNER JOIN of cardholders with statements will yield no rows. However an OUTER JOIN on those two tables will yield a record, with nulls for the statement columns.
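
    A sketch of the employee/department queries described above (the schema is made up):

    mysql> select e.name, d.name from employees e inner join departments d on e.dept_id = d.id;
    mysql> select e.name, d.name from employees e left outer join departments d on e.dept_id = d.id;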

Feeling like a MySQL expert yet? In Part 2 of Top MySQL DBA Interview Questions we’ll walk through four more questions plus a bonus.

Deploying MySQL on Amazon EC2 – 8 Best Practices


There are a lot of considerations for deploying MySQL in the Cloud.  Some concepts and details won’t be obvious to DBAs used to deploying on traditional servers.  Here are eight best practices which will certainly set you off on the right foot.

This article is part of a multi-part series Intro to EC2 Cloud Deployments.

1. Replication

Master-Slave replication is easy to setup, and provides a hot online copy of your data.  One or more slaves can also be used for scaling your database tier horizontally.

Master-Master active/passive replication can also be used to bring higher uptime, and allow some operations such as ALTER statements and database upgrades to be done online with no downtime.  The secondary master can be used for offloading read queries, and additional slaves can also be added as in the master-slave configuration.

Caution: MySQL’s replication can drift silently out of sync with the master. If you’re using statement based replication with MySQL, be sure to perform integrity checking to make your setup run smoothly. Here’s our guide to bulletproofing MySQL replication.

2. Security

You’ll want to create an AWS security group for databases which opens port 3306, but don’t allow access to the internet at large.  Only to your AWS defined webserver security group.  You may also decide to use a single box and security group which allows port 22 (ssh) from the internet at large.  All ssh connections will then come in through that box, and internal security groups (database & webserver groups) should only allow port 22 connections from that security group.

When you setup replication, you’ll be creating users and granting privileges.  You’ll need to grant to the wildcard ‘%’ hostname designation as your internal and external IPs will change each time a server dies. This is safe since you expose your database server port 3306 only to other AWS security groups, and no internet hosts.

You may also decide to use an encrypted filesystem for your database mount point, your database backups, and/or your entire filesystem.  Be particularly careful of your most sensitive data.  If compliance requirements dictate, choose to store very sensitive data outside of the cloud and secure network connections to incorporate it into application pages.

Be particularly careful of your AWS logins.  The password recovery mechanism in Amazon Web Services is all that prevents an attacker from controlling your entire infrastructure, after all.

3. Backups

There are a few ways to backup a MySQL database.  By far the easiest way in EC2 is using the AWS snapshot mechanism for EBS volumes.  Keep in mind you’ll want to encrypt these snapshots as S3 may not be as secure as you might like.   Although you’ll need to lock your MySQL tables during the snapshot, it will typically only take a few seconds before you can release the database locks.

Now snapshots are great, but they can only be used within the AWS environment, so it also behooves you to be performing additional backups, and moving them offsite either to another cloud provider or to your own internal servers.  For this your choices are logical backups or hotbackups.

mysqldump can perform logical backups for you.  These backups perform SELECT * on every table in your database, so they can take quite some time, and really destroy the warm blocks in your InnoDB buffer cache.   What’s more rebuilding a database from a dump can take quite some time.  All these factors should be considered before deciding a dump is the best option for you.

xtrabackup is a great open source tool available from Percona.  It can perform hotbackups of all MySQL tables including MyISAM, InnoDB and XtraDB if you use them.  This means the database will be online, not locking tables, with smarter less destructive hits to your buffer cache and database server as a whole.  The hotbackup will build a complete copy of your datadir, so bringing up the server from a backup involves setting the datadir in your my.cnf file and starting.

We wrote a handy guide to using hotbackups to setup replication.

4. Disk I/O

Obviously Disk I/O is of paramount importance for any database server including MySQL.  In AWS you do not want to use instance store storage at all.  Be sure your AMI is built on EBS, and further, use a separate EBS mount point for the database datadir.

An even better configuration than the above, but slightly more complex to configure, is a software RAID stripe across a number of EBS volumes.  Linux’s software RAID will create an md0 device file which you will then create a filesystem on top of – use xfs.  Keep in mind that this arrangement will require some care during snapshotting, but can still work well.  The performance gains are well worth it!
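
Building such a stripe might look something like this (a sketch; the device names and volume count are illustrative):

$ mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
$ mkfs.xfs /dev/md0
$ mount /dev/md0 /data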

5. Network & IPs

When configuring Master & Slave replication, be sure to use the internal or private IPs and internal domain names so as not to incur additional network charges.  The same goes for your webservers which will point to your master database, and one or more slaves for read queries.

6. Availability Zones

Amazon Web Services provides a tremendous leap in options for high availability.  Take advantage of availability zones by putting one or more of your slaves in a separate zone where possible.  Interestingly if you ensure the use of internal or private IP addresses and names, you will not incur additional network charges to servers in other availability zones.

7. Disaster Recovery

EC2 servers are out of the gates *NOT* as reliable as traditional servers.  This should send shivers down your spine if you’re trying to treat AWS like a traditional hosted environment.  You shouldn’t.  It should force you to get serious about disaster recovery.  Build bulletproof scripts to spinup your servers from custom built AMIs and test them.  Finally you’re taking disaster recovery as seriously as you always wanted to.   Take advantage of Availability Zones as well, and various different scenarios.

8. Vertical and Horizontal Scaling

Interestingly vertical scaling can be done quite easily in EC2.  If you start with a 64bit AMI, you can stop such a server, without losing the root EBS mount.  From there you can then start a new larger instance in EC2 and use that existing EBS root volume and voila you’ve VERTICALLY scaled your server in place.  This is quite a powerful feature at the system administrator’s disposal.  Devops has never been smarter!  You can do the same to scale *DOWN* if you are no longer using all the power you thought you’d need.  Combine this phenomenal AWS feature with MySQL master-master active/passive configuration, and you can scale vertically with ZERO downtime.  Powerful indeed.

We wrote an EC2 Autoscaling Guide for MySQL that you should review.

Along with vertical scaling, you’ll also want the ability to scale out, that is add more servers to the mix as required, and scale back when your needs reduce.  Build in smarts in your application so you can point SELECT queries to read-only slaves.  Many web applications exhibit the bulk of their work in SELECTs so being able to scale those horizontally is very powerful and compelling.  By baking this logic into the application you also allow the application to check for slave lag.  If your slave is lagging slightly behind the master you can see stale data, or missing data.  In those cases your application can choose to go to the master to get the freshest data.

What about RDS?

Wondering whether RDS is right for you? It may be. We wrote a comprehensive guide to evaluating RDS over MySQL.


Migrating MySQL to Oracle

This article is from 2006.  MySQL has come a long way since then. MySQL 5.5 is very robust and feature rich, matching Oracle in many different areas including datatypes, stored procedures and functions, high availability solutions, ACID compliance and MVCC, hotbackups, cold backups and dumps, full text and other index options, materialized views and much more.  Here’s a high level mysql feature guide.

What really separates the two technologies is cultural.  MySQL, rooted in the open source tradition is much more do-it-yourself, leaning towards roll your own solutions in many cases. Meanwhile Oracle provides named and proven paths to solve specific problems.

You might also check out: Migrating MySQL to Oracle Guide which is a larger multi-part series & work in progress.

For some basics What is a Database Migration?

Lastly these may be helpful – Migration to MySQL – Limitations, Anomalies, Considerations & Strengths and also Oracle to MySQL Migration Considerations

INTRODUCTION

MySQL is a powerful database platform for many simple web-based applications. Some of its power and speed comes from its simplicity. MySQL is actually a database without proper transactions. What this means in terms of speed is dramatic. However it also means you cannot rollback an update which encounters problems halfway through, and other sessions in the database will immediately see changes. There are many dramatic ramifications of this, as we’ll discuss later. Lastly there are limitations on dataset size. Oracle can obviously handle tables of a terabyte and larger. However since MySQL implements a table as one file, filesize limits as well as internal data organization, and indexing can become major limitations as tables grow to the millions of rows and higher.

When you begin to hit these limitations, whether in your application complexity, or the size of your data, you may consider moving your data and application to the Oracle Database. There you will have a rich set of features both on the programming side with stored procedures, views, materialized views, triggers, and so on. You will also have support for tables and indexes of virtually limitless size, full transaction support, and even sophisticated High Availability features such as Advanced Replication, Data Guard, and even clustering with Oracle’s Real Application Clusters technology.

With all these enticing features, and robustness, you’re eager to jump into Oracle. But don’t move so fast. There is a tremendous amount of planning involved with both moving your data, and porting and testing your application on the new platform. Add to that Oracle licensing, and you’ll need some time to get there.

MySQL vs Oracle – feature comparisons

MySQL is a database fit for some types of applications. These tend to be smaller applications, or those which integrate applications with less sophisticated needs than those running Oracle on the backend.

It makes sense at this point to go through a feature comparison, and see what features MySQL shares with Oracle. Here’s a more in depth feature comparison of MySQL and Oracle.

MySQL shares with Oracle good support for database access including ODBC and JDBC drivers, as well as access libraries for Perl, Python and PHP. MySQL and Oracle both support binary large objects, character, numeric, and date datatypes. They both have primary and unique keys, and as of 4.x with InnoDB, MySQL has foreign keys, and transactions including READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. Both databases have sophisticated language and character set support. MySQL can do table locking, and recently improved to include row-level locking. What’s more if you don’t need transactions, MyISAM tables are extremely fast. MySQL also includes a good data dump utility which you’ll see in action below when we migrate to Oracle. And lastly both databases of course include good b-tree indexes, which no database could be without.

There are, however, quite a number of features we find in Oracle as standard which remain missing in MySQL. Until recently that included row-level locking, true transactions, and subqueries, although as of 4.x those seem to be present. Those features, however, have been present as core technologies in Oracle for years, with very stable and solid implementations, as Oracle’s performance on TPC benchmarks attests. Views are still absent in MySQL, though they may be around the corner now that subqueries are available.

Of course a lot of the high end Oracle features remain completely absent from MySQL, and may never be added. Features such as parallel query, and partitioned tables, which include a whole host of special features such as the ability to take one partition offline without impacting queries on the rest of the table. The indexing on partition tables is sophisticated too, allowing partition elimination, and range scans on indexes of specific partitions. There are other large database features such as special functions for star queries. Oracle has terabyte databases in production, so this fact speaks for itself.

MySQL still has a somewhat limited set of index types. For instance Oracle has reverse key, bitmap, and function based indexes, as well as index organized tables. These are all very powerful features for developers who are trying to squeeze that last bit of performance out of complex SQL queries against large tables. Although MySQL does provide some index statistic collection, Oracle provides the full set of statistics, including histograms, and makes great use of it inside the Cost Based Optimizer. These statistics allow Oracle to better determine the best method of getting the data for your query and putting it together for you with the least use of resources in terms of memory cache, and disk access. This is really key for a database. When running large multi-user applications all of which are vying for data in different tables, you want to load just the data you need, and nothing more. Avoiding full table scans by using the proper index, and using various indexes in a join operation to pull fewer rows from the underlying tables means less disk I/O, which other processes are fighting for, and less use of cache, leaving more for other processes.

MySQL still does not have privilege groups, called ROLES in Oracle.

Oracle can also provide column level privilege control, called virtual private database and although we don’t see it used a lot in Oracle deployments, MySQL lacks this feature as well.

MySQL does not have hotbackups which have been an integral part of Oracle for years. (There are hotbackups now – 2012 – in MySQL; here’s a howto on rebuilding replication using hotbackups guide) In addition, Oracle’s RMAN has become a sophisticated piece of software, and grown to be very stable, providing block level backups so only the data that changed can be included in subsequent backups. This makes nightly backups smaller overall. It also aids tremendously during recovery, providing a lot of automation, and assistance, during those times when you need it most. MySQL’s method is to dump data, and further if you want to guarantee a point in time dump of your data, you have to lock all the tables in the database, potentially slowing down your application tremendously. Lastly MySQL does not have automatic or point in time recovery, a real limitation on heavy use production databases.

MySQL also has some limitations on row sizes. MyISAM tables for instance, can have a maximum 64k of data per row, and InnoDB tables 8k per row. This does not include BLOB and TEXT types.

MySQL does not include database links such as those found in Oracle allowing you to query a table from an external database inside a query in a local database. There is the federated storage engine, but reports are that it’s rather buggy. DB Links can be useful for moving data between databases, and is key to implementing advanced replication in Oracle.

Lastly, MySQL has had some database size limitations. In 3.22 it could only access 4GB of data, however, as of 3.23 the only limitation has been with your operating system, and the size of files it can handle. On Linux with LFS or ReiserFS, this limitation is effectively eliminated. However, Oracle still has incredibly sophisticated storage capabilities, allowing virtually unlimited datafiles to be associated with a tablespace, and thus virtually limitless sized tables.

Updated note: In 5.5 and newer versions of MySQL there are no database size limitations. Also with Innodb you can use global tablespaces or one tablespace per table depending on your configuration. With most databases sitting on RAID or SAN these days, you’re getting pretty much the same deal with both MySQL & Oracle storage-wise.

Migration preparation

So you’ve seen what you can do with Oracle, and management has invested in licensing, and you’re now ready to get things setup in your development environment.

Now is the time to really get up to speed with Oracle. This goes for both Database Administration knowledge, as well as developer and programmer knowledge. The latter requires that you know a lot about Oracle’s features, in particular those which are relevant to your application. The former requires understanding DBA roles, managing database files, memory structures, backups, and so on and so forth.

Thomas Kyte’s books are really excellent, and highly recommended. Check out “Expert One on One” on Wrox Press, and his newer title, “Effective Oracle by Design” which is on Oracle Press. He also has a website, http://asktom.oracle.com.

Also check out Kevin Loney + Marlene Theriault’s DBA Handbook on Oracle Press. Of course don’t forget to read the Oracle docs, which are quite good. Start with the concepts manual for the version of Oracle you plan to go with.

In planning a migration the first thing you should do is take a look at the number, and types of tables in your database. Do that in MySQL as follows:

mysql> show table status;

+------+--------+---------+------------+
| Name | Engine | Version | Row_format |
+------+--------+---------+------------+
| t    | InnoDB |       9 | Fixed      |
| u    | MyISAM |       9 | Fixed      |
+------+--------+---------+------------+
2 rows in set (0.05 sec)

This output is truncated, but serves as a useful example. You will see the tables, types, and a lot of other information about the tables you will be moving.

You’ll next want to review the datatypes of your various tables. CHAR in MySQL maps to CHAR in Oracle, VARCHAR to VARCHAR2, and the various Large Object types to RAW or BLOB in Oracle. DATE, DATETIME, and TIME map to Oracle’s DATE datatype, while TIMESTAMP and YEAR map to NUMBER. Lastly all of the various INT datatypes in MySQL map to NUMBER in Oracle and FLOAT, DOUBLE, REAL, and DECIMAL all map to FLOAT.

To get information about the columns of a table, you can use the ‘describe’ SQL command much like Oracle’s own describe:

mysql> describe t;

+-------+----------+------+-----+---------+----------------+
| Field | Type     | Null | Key | Default | Extra          |
+-------+----------+------+-----+---------+----------------+
| id    | int(11)  |      | PRI | NULL    | auto_increment |
| s     | char(60) | YES  |     | NULL    |                |
+-------+----------+------+-----+---------+----------------+
2 rows in set (0.01 sec)

Another way to get useful descriptions of tables is to use the mysqldump command. Here ‘test’ is the name of the database, and ‘t’ is the name of the table. Leave the table name off and the output will include all the tables in that database.

$ mysqldump test t

--
-- Table structure for table `t`
--

DROP TABLE IF EXISTS `t`;
CREATE TABLE `t` (
  `id` int(11) NOT NULL auto_increment,
  `s` char(60) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

There’s actually quite a bit more output, and depending on the version of MySQL you may see additional comment lines etc. You’ll probably want to redirect the output to a file. Do so as follows:

$ mysqldump test t > t_table.mysql

$ mysqldump test > test_db.mysql

You will also want to get a sense of which columns are indexed to be sure that they get indexed in your Oracle database. Here is an example of how to list the indexes on a table:

mysql> show index from t;

+-------+------------+----------+--------------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
+-------+------------+----------+--------------+-------------+
| t     |          0 | PRIMARY  |            1 | id          |
+-------+------------+----------+--------------+-------------+
1 row in set (0.04 sec)

An enumerated datatype is one where you define a static set of values up front. Oracle does not currently have such a datatype. The closest equivalent is a VARCHAR2 of sufficient size to hold all the various values. Migration Workbench will do just that. If you want Oracle to enforce your set of values, add a check constraint on that column.
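
For example, to have Oracle enforce the set of values on such a column (a sketch; the column and values are made up):

SQL> alter table t add constraint s_chk check (s in ('red', 'green', 'blue'));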

Lastly, if you’re experiencing serious performance problems in MySQL, take a look at the slow query log. MySQL can be configured to dump queries which do not execute within a certain number of seconds to a log file for your review. You can then use the EXPLAIN facility, a much simplified version of the facility found in Oracle, to tune queries for a better execution path, possibly requiring a new index on a certain column or an index rebuild. In many instances restructuring a query can be of substantial benefit.

What’s more, many of these skills of tuning and optimizing queries will translate directly to Oracle. Queries are the lifeblood of your application. Bad ones can kill overall database performance by filling the cache with useless blocks of data and pushing out previously cached, important data. What’s more, inefficient queries cause more disk I/O which impacts the overall performance of your database. These issues hold true for all databases, and getting proficient with them now will bring you up to speed faster with Oracle as you migrate.

Moving your data between MySQL and Oracle

At this point we’re still presuming that you will be moving your data by hand. This is not because we are gluttons for punishment. It is more to illustrate important points. Doing things by hand goes over each and every detail of the process so you understand it. You’ll need to when things go wrong, as they inevitably will. So we’re discussing moving the schema, and then the data by hand for all tables; however, you may end up following the instructions below for using the Oracle Migration Workbench, and then only doing one or two special tables by hand. Or you may decide to use Migration Workbench to build scripts for you as a starting point, and then aggressively edit those for your particular needs. All these options are valid.

So at this point you need to dump data. If you want to load data with Oracle’s SQL*Loader, an easy format to start with is CSV or Comma Separated Values file.

To dump one table named ‘t’ from database named ‘test’ use this bit of code. Note that we’ve broken things up into multiple lines to easily illustrate what’s happening with all those messy SED commands. You’re welcome to modify them for your needs but this works as-is. Note that ^V requires you to type ctrl-V, and ^I requires you to type ctrl-I. Read your editor manual for details on inserting control characters into a file.

#!/bin/bash
# 1. get all rows from table t of database test
# 2. add double quote at beginning
# 3. replace tabs with ","
# 4. add double quote at end
# 5. remove header line
echo 'select * from t;' | mysql test \
 | sed -e 's/^/"/' \
 | sed -e 's/^V^I/","/' \
 | sed -e 's/$/"/' \
 | tail -n +2

Now is your chance to really put all those Oracle skills to work. You want to have CREATE TABLE statements to build each table, and scripts are an excellent way to get you going. They also self-document the process.

Here is an example of how to precreate the above very simple table in Oracle. Edit a file called t.sql and include these lines:

create table T (
  id   NUMBER(10,0) primary key,
  s    CHAR(60));

Save the file, and then fire up sqlplus and do:

SQL> @t.sql

Table created.

SQL> desc t;

 Name                                 Null?    Type
 ------------------------------------ -------- -----------------------
 ID                                   NOT NULL NUMBER(10)
 S                                             CHAR(60)

Now use SQL*Loader to load the CSV data you created earlier. To do that you’ll need to create a control file. This tells SQL*Loader exactly what to do. Edit a file t.ctl and put this in the file:

LOAD DATA
REPLACE
INTO TABLE t
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(id INTEGER EXTERNAL, s CHAR)

Once you’re done, save the file, and execute the following:

$ sqlldr your_username control=t.ctl data=t.dat log=t.log bad=t.bad

This should load your data into the table t that you created earlier. Check the log and bad files for details, and errors.

As with the Oracle Migration documentation, and any good documentation really, we’ll emphasize and reemphasize the need and importance of testing. You cannot assume that your script does things right, nor can you assume that the script Oracle’s Migration Workbench created will do things right. There are an infinite number of anomalies for any dataset, and testing your application is the only way you can verify you are in good shape.

What’s more you also need to verify that your data is correct. Suppose you have a banking application, and you are moving customer data from MySQL to Oracle. Suppose further you have records of monthly deposits and withdrawals against that account. You move the data from MySQL into Oracle, and the web or client based frontend is up and running again, after extensive porting, and testing. Does this guarantee that all the data is correct? Not at all. It means the right fields are in the right places, and probably the datatypes are correct. Imagine that the customer had a very high balance, and when moving to Oracle the field size was too small, and perhaps when the data was loaded, the inserted value failed, or was set to a default value like 0. Obviously you don’t want to find out when that customer comes calling. You want to look through the log files when the data is loaded, and then run some verification software against the db to compare values in the old db and the new db, or to calculate checksums such as running through deposits and withdrawals to make sure the current balances check out. This is really the most important step in the process and can’t be overstated.

Migration Workbench is Oracle’s recommended solution

Oracle’s Migration Workbench is a Java-based GUI tool which runs on various Windows versions, and comes complete with wizards to help you perform your migration. The process is divided into three steps.

First you capture your source database. This is done with a series of plugins which support various databases including of course MySQL. One plugin is available to handle 3.22 and 3.23 of MySQL and another one handles 4.x versions of MySQL. Capturing the source database is the same as the process we describe above manually of looking at your tables in mysql, and the columns, and indexes you are using. This is practical and feasible for a small number of tables; however, with hundreds or even thousands of tables, Oracle Migration Workbench becomes more and more of a necessity.

Second, you migrate the source model to Oracle. This is the process where the Migration Workbench precreates all tables found in the source database, including columns of equivalent datatypes. We describe mappings of MySQL to Oracle datatypes above. Note that Oracle does not have ENUM or an enumerated datatype per se, but it can still migrate this data, and does so to VARCHAR2 in Oracle.

The third and last step is to review the scripts that the Migration Workbench has created, make any changes or modifications and then run them to move your data from your source MySQL database into your new Oracle database.

One thing that is important to remember about a migration is that it will take a lot more time, and end up being a lot more complicated than you expected. I liked this about the documentation. They make it clear from the beginning that planning will be a tremendous help to you in estimating time, and delivering successfully within budget. The documentation is also very thorough in its coverage of MySQL datatypes, and how they translate to Oracle datatypes, as described earlier in this article. And of course there is a strong emphasis on testing. The Migration Workbench provides customizable scripts which both document actions to be performed and provide a way for you to get your fingers into the works.

Keep in mind while using the Migration Workbench that it is NOT all or nothing. You can use the Migration Workbench, and then edit the scripts to leave certain tables alone, or you can migrate them all, then drop the few you want to do by hand using the methods we describe above. Ultimately a mix of the two will probably serve your needs best, as there is always some amount of manual intervention you want to perform for certain tables.

A migration between two databases is not a trivial undertaking. You have a lot of data, and an application which relies on it all being in the right format, with the right relationships. Moving to a new database, with a larger feature set, slightly different syntax, and different ways of doing things takes time and attention, but in the end you’ll be up and running on a sophisticated, scalable, world class database platform.

Oracle has a great set of resources on OTN devoted to migrating to Oracle. In particular there is the Migration Technology Center

Oracle’s Migration Workbench documentation and download page.

On the other side, here’s the MySQL 5.5 Reference Guide.
