Category Archives: Oracle

Are SQL Databases Dead?

mesa verde city

I like this image of the city of Mesa Verde. It's fascinating to see how ancient cities were built, especially as an inhabitant of one of the world's largest cities today, New York.

I'm a longtime relational database guy. I worked at scores of dot-coms in the '90s as an old-guard Oracle DBA, and pivoted to MySQL as the new century began. Would a guy like me, who's seen 20 years of relational database dominance, really believe they could be dying?

There's a lot to be excited about in this new realm of databases, and some interesting bigger trends pushing things in a new direction.

Join 15,100 others and follow Sean Hull on twitter @hullsean.

1. Growing use of ORMs

ORM probably sounds like some strange fossil archeologists just dug up in the ancient city of Mesa Verde. But they're important. You may know them by their real-life names: Hibernate, Active Record, SQLAlchemy and Cake. There are many others. Object-relational mappers provide a middleware layer between developers and the SQL of your chosen relational database. They abstract away the nitty gritty, and encapsulate it into a library.

In a way they're like code generators. Markus Winand talks about them in SQL Performance Explained, warning of the "eager fetching" problem. This is DBA speak for specifying all columns (SELECT *) or fetching all rows when you don't need them all. It's inefficient: you're asking the database to read & cache all that data, send it across the network, and then discard most of it on the webserver side. Like a lazy housekeeper's clutter & dust, the waste will grow to overwhelm you.
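To make that concrete, here's a minimal sketch against a hypothetical users table; the names are made up for illustration:

-- what an eager-fetching ORM typically generates
SELECT * FROM users;

-- what the application actually needed
SELECT first_name, email FROM users WHERE id = 42;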

Martin Fowler is the author of the great book NoSQL Distilled. In his post ORM Hate he tries to walk the fence, balancing developers' love of ORMs against the obvious need for scalability. Ted Neward calls ORMs the Vietnam of Computer Science.

Mattias Geniar points out that bad ORMs are infinitely worse than bad SQL, and there's another piece on High Scalability by Drewsky, The Case Against ORM Frameworks.

If you agree the ORM conversation is still a huge mess, you'll be excited to know that NoSQL sidesteps it completely. These databases are built out of the box to interface more like data structures than rows and columns. So going NoSQL eliminates the scalability problems ORMs introduce. That makes developers happy, and pleases DBAs and techops too. Win!

Read: Why Oracle won’t kill MySQL

2. Widening field of options

NoSQL databases are not simply key-value stores, though some, like Memcache and Riak, do fit that mold.

MongoDB offers configurable consistency & durability plus the advantages of document storage; no need for an ORM here. You also have a mix of indexing options that go a little deeper than other NoSQL solutions. A sort of middle-ground solution that offers the best of both worlds.

Cassandra is a powerful db that comes clustered out of the box. All nodes are writeable, and there are various ways to handle conflict resolution to suit your needs. Cassandra can grow big, and naturally takes advantage of cloud nodes. It also has a nice feature to naturally age out data, based on settings you control. No more monumental archiving jobs.

HBase is the database layer of Hadoop, based on Google's seminal Bigtable paper.

Redis is another option with growing popularity. It's a key-value store, but one allowing more complex data in its buckets, such as hashes, lists, sets and sorted sets. Developers should be salivating at this one.

Also: 5 Great Things about Markus Winand’s Book SQL Performance Explained

3. Lowering the bar

The old world of relational databases treats data as sacrosanct. DBAs are tasked with protecting its integrity & consistency. They manage backups to protect against disaster. In this world, every bit of data written is as sacred as any other, whether it's your bank account balance or a comment added to a Facebook discussion.

But modern non-relational databases introduce the idea of eventual consistency. DBAs and architects would say we are relaxing our consistency & durability requirements. What they mean is data can get slightly out of sync, and we're OK with that. We'll build our web applications to plan for it, or even, in the case of Riak, expose the levers of durability directly to developers, allowing them to make some writes instant while leaving others lax and lazy.

Check this: Why high availability is so very hard to deliver

4. Cloud demands

Virtualized environments like Amazon EC2 give easy access to legions of servers. Availability zones & regions only widen the deployment options. So deploying a single writeable master, the way traditional relational databases work best, is not a natural fit.

Databases like Cassandra, Mongo & Redis are clustered right out of the box. They grew up in this virtual datacenter environment and feel comfortable there.

Related: Why I wrote the book on Oracle & Open Source

5. Only DBAs understand them

Devs may whine at this statement, and to be fair it's a generalization. The popularity of ORMs speaks volumes here: anything to eliminate the dreaded SQL writing. Meanwhile DBAs bemoan the use of ORMs, for they represent everything DBAs are trying to fix.

SQL is hard enough, but the ugly truth is each database vendor has its own implementation, its own optimizations, its own optimal tweaks. Even between versions of the same database, SQL code may not perform consistently.

Identifying slow SQL and tweaking it remains one of the primary tasks of performance tuning, for this reason. It hasn’t changed much in my two decades on the job.

Also: Why bemoaning AWS performance sounds like Linux detractors circa 1999

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 great things about Markus Winand’s book SQL Performance Explained

markus winand sql performance explained book

Join 12,100 others and follow Sean Hull on twitter @hullsean.

1. Covers databases broadly

You may not have noticed, but there’s a whole spectrum of relational databases on offer. Of course in the database world, most get infatuated with one, and that becomes their bread & butter before long. Their life, their passion, their devotion.

That's fine as far as it goes, but Winand really stands out, offering a spectrum of ideas and optimization techniques for different platforms. If you're an Oracle-only or MySQL-only DBA you'll gain a lot from this book. But even more importantly, if you work in professional services and need to communicate with DBAs brought up on one of these platforms, it becomes like a Rosetta Stone for SQL query tuning.

Read: Why devops talent is in short supply

2. Shows you how to fish

I find database books and methods fall into two broad categories. There's the "call Oracle support" method, where you're handed one very specific set of steps, commands, and a path to solve each specific problem. It's more about memorization; it's like they actually hand you the fish.

Then there is the investigative method, where you learn how to use a magnifying glass to look at fingerprints, check for DNA samples, and interrogate suspects. You learn the tools of the trade.

That's what Markus brings you, in all its delicious glory.

Read: Why high availability is so very hard to deliver

3. It’s concise

Another gripe I have with technical books is the publishing model: this is a textbook, and since the price tag is large, let's make the physical book large! Of course no one wants to carry those books around. I even recently bought a Kindle to solve this problem.

SQL Performance Explained has more of a paperback form factor, which means you can tote it around with you easily and keep it with you at work. Read it on the train, commuting to work.

It's 200 pages packed cover to cover with all sorts of good chapters, including a primer on indexes & types, scalability & performance, joins, clustering, Top-N queries, DML, and more.

Read this: Why a four letter word divides dev and ops

4. It’s technical but accessible

If you're a real rock-bottom beginner, you might want to dig a bit deeper into your SQL syntax and some of the basics first. You could also keep a 101 book side-by-side while you're reading this one.

For the intermediate & advanced DBAs out there, this book will sit comfortably in your paws as you flip the pages and learn something new. For instance, just today I learned that Postgres can index NULLs while MySQL, Oracle and SQL Server cannot. Learn something new every day.

Related: MySQL interview guide for managers and candidates alike

5. Gives you answers you can use today

After twenty years of consulting, I’ve seen a few patterns emerge. Besides the spectrum of team & communication challenges, firms hitting the performance wall often have issues with their relational databases.

Yes, those databases are sometimes on the wrong hardware, or there are other obscure problems with setup or configuration. But the bulk of issues center on badly written SQL.

SQL is a much reviled and often misunderstood language. And it doesn't seem like developers have gotten that much better at it over the years. That would explain the rise of NoSQL databases, which often speak REST or XML instead; no need for pesky SQL.

One parting note. For all the devs and architects out there who want to sing the virtues of ORMs, this book hits that argument squarely on the nose. By showing how differently each relational database implements SQL, performs work, and optimizes, Winand also illustrates the naivete behind trying to write database-independent application code.

If you’re a developer and don’t know how to profile a query or run explain plan, don’t walk, run to your closest Amazon.com store and get this book!

Also: 5 more things deadly to scalability

Criticisms

If I were to offer two slight criticisms, it would be these. First, the index is a bit wonky. When I look under "P" for example, there's no Postgres, while a quarter of the book is obviously devoted to that platform. Further, NULLs, which are covered in depth in various places in the book, have only one entry in the index, p54 on Oracle. So the index could be a bit more robust to be useful.

The other criticism is perhaps more my bias. On page 96, when he discusses ORMs, I thought he was rather... shall we say, gentle. Although he clearly states that "eager fetching" is problematic, I don't think he goes far enough to condemn it. In my experience ORMs are always trouble.

Then again why am I complaining, their use keeps me forever employed.

Want a copy? Markus Winand's book site has all the goods!

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Cloud DBA and Management Interview

What does a cloud computing expert need to know? This is the last of a three-part guide to interviewing for a cloud operations position. You can find the others here: part one Operations Interview and part two Deployment Interview.

Here's my guide to doing just that.

1. Database administration experience

Although in some shops the DBA role is a completely separate one, there are many others where the Linux and Operations teams manage these services as well. We have some other material too: Oracle DBA Interview questions and our MySQL DBA Interview Guide. Here's a taste of what to expect.

o What is RAID? Which type is best?

RAID is a way to combine a whole bunch of disks into one logical volume on a server. Databases like Oracle or MySQL do a lot of writing and reading from disk. If there are more disks sharing this work, it's like having more waiters in your restaurant. Faster service.

Although some folks still hang onto RAID 5 as an option, it's generally a very bad one. It carries a serious write penalty because of the parity calculations it must perform. Most databases do a lot of writing, even when user transactions are not doing INSERT or UPDATE. What's more, if a disk fails, RAID 5, although technically online, will be so slow as to be effectively unusable while the long slow rebuild happens.

What’s the answer then? RAID 10! It mirrors each volume, and then stripes across those mirrored sets. Fast I/O, fast recovery. Done & done.

o What are the tradeoffs with more indexes versus fewer?

In all relational databases, you build indexes on data. Indexes are just like the ones you remember from the yellow pages, the phonebooks of yore. An index on first name means you can look up Obama by Barack as well. An index on street address means you can look up the White House by its address. So the more indexes you have, the more ways you can search for & fetch what you want.

On the other hand, the penalty is that whenever you add new data & records to the database, all those indexes must be updated. That's overhead, which slows down writes.

So the tradeoff is: more indexes, faster fetching but slower writing; fewer indexes, slower fetching but faster writing.
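Here's a quick sketch of that tradeoff, using a hypothetical phonebook table:

-- each index speeds up a different kind of lookup
CREATE INDEX idx_first_name ON phonebook (first_name);
CREATE INDEX idx_street ON phonebook (street_address);

-- but now every write must update the table plus both indexes
INSERT INTO phonebook (first_name, last_name, street_address)
VALUES ('Barack', 'Obama', '1600 Pennsylvania Ave');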

o What do NoSQL databases eliminate? How do they achieve great speed?

There are quite a few different types of NoSQL databases, so I'm generalizing quite a lot here. One thing NoSQL databases eliminate is the ability to JOIN data across different tables. By removing this great feature of relational databases, they dramatically simplify the underlying implementation. No free lunch!

What else? Many of these databases cut corners on what's called durability. What is durability? Imagine you're waiting tables and taking orders. It might be quicker to skip writing things down and keep it all in your head. Great, but what if you forget something? You have to go ask for the order again! Faster, but more prone to error. Losing data is not something to be taken lightly, and NoSQL databases don't always flush data to permanent storage.

[quote]
Whether or not a web operations candidate uses command line tools may seem like a small issue. But it speaks to their DNA, and the strength of their foundation. Strength and comfort on the command line is key.
[/quote]

o What is Amazon RDS? When should I use it?

Amazon has a managed relational database solution called RDS. It’s basically MySQL, Oracle or SQL Server, but modified so you can’t shoot yourself in the foot. Administrative tasks are simplified, but so are your configuration options.

I wrote an in-depth Amazon RDS use cases article. It mostly covers MySQL, but the general rules apply to Oracle & SQL Server. At the end of the day, RDS is a lot less configurable and flexible. But if you don't have a regular DBA on staff, it will probably simplify your administration of these servers.

o What are read-replicas? What about Multi-az?

Read-replicas are read-only copies of your data. Using MySQL, these are fairly stock master-slave configurations. Note that since they use the standard technology, they're still asynchronous. So yes, the read-replica can lag behind.
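To see how far behind a MySQL-based read-replica is, check the standard slave status; the output below is truncated to the relevant line, and the number is just an illustration:

mysql> SHOW SLAVE STATUS\G
        Seconds_Behind_Master: 130

Here the replica is about two minutes behind the master. A NULL value means replication is broken entirely.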

Multi-AZ is a proprietary technology, and Amazon doesn't disclose what's under the hood. However it's likely running on top of something like DRBD, which replicates at the block-device level. This allows the underlying disk I/O to be mirrored across the network to another availability zone. You get synchronous copies of your data, and no data consistency problems. Keep in mind though that the alternate server is offline, or cold, and can take time to come online.

o What is the primary bottleneck of hosting databases in the cloud? How has Amazon recently addressed this?

As I explained above, disk I/O remains the largest bottleneck for relational databases, even if the entire dataset fits in memory. Why? Because sorting, joining, and rearranging data can take orders of magnitude more memory to do entirely in memory. And that's not even talking about durability guarantees.

The cloud has traditionally lagged quite a lot behind physical servers in terms of disk I/O, so some internet firms have shied away from moving to the cloud. EBS volumes were typically limited to a few hundred IOPS.

Amazon recently announced Provisioned IOPS. It's a mouthful of a name for a very big development. It means you can provision how fast you want those virtual disks to be. For individual volumes the limit seems to be 2,000 IOPS, but you can also software RAID across many of those virtual disks. For Amazon RDS the limit is reportedly 10,000 IOPS. This new feature will make a huge difference for hosting large, high-I/O databases in Amazon's cloud.

2. Architecture & Management Questions

o Why does the API battle between Amazon & Eucalyptus (FOSS) matter?

As large applications are architected to provision hardware components and resources in the cloud, the API they work through becomes key. Sticking to an open standard for this API means you can change cloud vendors and/or build on multiple ones. We talked about this multi-cloud approach as a key way to avoid outages like the ones AirBNB and Reddit experienced when AWS went down.

Following on the heels of that article, we were quoted about multi-cloud by Brandon Butler in his Network World piece.

o Do you use command line tools? Why?

A good web operations candidate should be very comfortable with command line tools. Everything in Linux is command line. It's like Broadway acting to movie acting, or literature to books. It's the original source and much more powerful; what's more, it indicates and requires much stronger theoretical knowledge of the underlying systems being managed.

o What can go wrong with backups? How do we test them?

Everything can go wrong with them. They can fail to complete. They can be backups of the wrong service or resource. Even the backup software itself can have bugs. The only way to sleep well at night is to run fire drills and restore your application and data top to bottom.

o Should we encrypt filesystems in the cloud? What are the risks?

This depends on your environment and how sensitive your data is. If you're collecting credit card data, for instance, it may be key. However some surprising blips may push other applications to encrypt as well. Bugs in the hypervisor could potentially make your data vulnerable. What's more, if the cloud provider gets subpoenaed, your server and data may well get captured in the net. Better safe than sorry. Remember, you don't know where your data actually resides, but you do control who has access if you're encrypted.

We wrote a very in-depth piece on Deploying on Amazon EC2 where we discuss questions like encryption further.

o Should we use offsite backups?

It’s definitely worth doing this. One more layer of insurance.

o What is load balancing? Why is it difficult with databases?


Load balancing puts a digital traffic circle into your infrastructure, giving you two or more roads to the same resources. However those resources have to be exactly the same. With databases, you are constantly writing to tables and updating records. When you scale those writes horizontally, it becomes very hard to keep all the copies in sync.

[quote]
Relational databases are inherently difficult to scale. Most environments scale a single authoritative master vertically, and add multiple read-only slaves horizontally to allow the application to serve more customers.
[/quote]


o Why use a package manager? Can we install from source?

Package managers simplify the installation of software components. A team such as Red Hat, Ubuntu or Debian builds a distribution and compiles all the components, storing them in a repository. Installing packages this way keeps your setup standard across servers. That allows more automation, and makes it simpler for another admin to figure out what you have down the line, when it passes to someone else's shoulders.

Installing from source is generally a bad idea. Although it allows you to tweak and configure each piece of software the way you want, tightly and efficiently, it also means everything is custom. No commoditization advantages.

o What is horizontal scalability?

This involves adding more hardware, more individual servers to service the same application and users.

o What is vertical scalability?

This means scaling up, or growing your existing single server so it is larger: more memory, more CPU, or faster disk.

o What can go wrong with automatic failover?

Just about everything. Applications and services can stall, disks can fail, servers can hang. What’s more networks can exhibit latency. Automatic failover is ultimately a piece of software or algorithm trying to diagnose and handle situations. And it does so based on a very small list of rules or heuristics. The real world is messy, so this can often lead to false failure detection, and potentially loss of data.

o How do cloud vendors implement vertical scalability?

This may vary dramatically between cloud providers. Ultimately, however, since virtualization allows you to boot a disk image onto any hardware, you can snapshot your current root volume or disk and then boot it on another server, one that is larger or smaller and so forth. About the only thing you need to watch out for is 32-bit versus 64-bit questions.

If you haven't already, don't forget to check out the rest of this series: part one Operations Interview and part two Deployment Interview.

Read this far? Grab our newsletter – startup scalability.

10 ways I avoid trouble in database operations

1. Avoid destructive commands

From time to time I’m working with new recruits and bringing them up to speed in operations. The first thing I emphasize is care with destructive commands.

What do I mean here? Well, there are all sorts of them. SQL commands such as DROP TABLE & DROP DATABASE, but also TRUNCATE and DELETE, are all destructive. They're easy to execute but much harder to undo. Think of all the steps it would take to restore from your backup.

If you are logged in as root there are many, many ways to shoot your own foot. I hope you know this, right? rm has options that can be very difficult to step back from, like -r (recursive) and -f (force). Better not to use the command at all; just move the file or directory aside by renaming it. You can always delete it later.
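The same rename-first trick works inside the database. A sketch in MySQL, with a hypothetical orders table:

-- instead of: DROP TABLE orders;
RENAME TABLE orders TO orders_to_drop;
-- wait a few days; if nothing breaks, drop orders_to_drop then

If some forgotten job still needs the table, you'll find out quickly, and renaming it back takes a second.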

2. Set your command prompts

When working on the command line, your prompt is crucial. You check it over and over to make sure you're working on the right box. At the OS level, your prompt can tell you whether you're root, what directory you're sitting in, and the hostname of the box. With a few different terminals open, it's very easy to execute a heavy loading command or destructive command on the wrong box. Check thrice, cut once!

You can set your mysql prompt too. This provides similar insurance: it can show your default schema, the user you're logged in as, and the hostname or localhost. One more piece in the risk-aversion puzzle.
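For example, you can bake a verbose prompt into your ~/.my.cnf; note that option files need the backslashes doubled:

[mysql]
prompt="\\u@\\h [\\d]> "

That yields a prompt like root@db01 [production]> (hostname and schema here are illustrative), so the user, host and default schema stare back at you before every statement.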

3. Perform backups & test them

I know, I know, we're all doing backups already. Well, I sure hope so. But if you're getting on a system for the first time, your very first impulse should be to check what types of backups are being done. If none are, you should set them up. I don't care how big the database is. If it's an obstacle, you need to sell or educate management on what might happen if disaster strikes. Paint some ugly scenarios. It's not always easy to see the urgency in these things without a good war story or two.

We wrote a guide to using xtrabackup for hotbackups. These can be done online, even while your production database is serving customers, without table locking or other downtime.
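As a rough sketch, a hot backup with Percona's xtrabackup tools looks something like this; the user and backup path are illustrative:

$ innobackupex --user=backup --password=secret /data/backups/

And remember the other half of this tip: a backup you've never test-restored doesn't really count as a backup.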

4. Stay off production machines

This may sound funny to some of you, but I live by it. If it ain't broke, don't go and try to fix it! You don't need to be on all these boxes all the time. That goes for other folks too: don't give devs access to every production box. Too many hands in the pie, so to speak. Also limit root users. Again, if those systems are running well, you don't have to log in and poke around every five minutes. That just invites operator error.

5. Avoid change as much as possible

This one might sound controversial but it’s saved me more than once.

I worked at one firm a few years back managing the MySQL servers. The Oracle DBA was going on vacation for a few weeks, so I was picking up the reins for a bit. I met with the DBA for some brain-dump sessions, and he outlined the main things that can and do go wrong. He also asked that I avoid any table alterations.

Sure enough, ten days into his vacation, a problem arose in the application. One page on the site was failing silently. There was a missing field which needed to be added. I resisted. A fight ensued. Suddenly a lot of money was at stake if this change wasn't pushed through. I continued to resist. I explained that if such a change were not done correctly, it would very likely break replication, toppling a domino chain of other failures and causing an unpredictable mess.

I also knew I only had to hold on for a few more days. The resident DBA would be returning, and he could juggle the change. You see, Oracle was set up to use multi-master replication, so those changes needed to go through a rather complex process to be applied. Done incorrectly, the damage would have taken days to clean up and caused much more financial harm.

The DBA was very thankful for my resistance, and management somewhat magically found a solution to the application problem.

Push back is very important sometimes.

[quote]
Many of these ten tips are great characteristics to select for in the DBA hiring process. If you’re a candidate, emphasize your caution and track record with uptime. If you’re a manager, ask candidates about how they handle these situations. We wrote a MySQL DBA hiring guide too.

[/quote]

6. Monitor important things

You should monitor your OS syslog and MySQL error log for starters, but also watch your slow query log for new activity; analyze it and send reports along to the devs. Monitor your partitions: you don't ever want disks to fill up. Monitor load average, and have a check that a database login or some other simple transaction can succeed. You can even monitor your backups to make sure they complete without error. Use your judgement to decide what checks satisfy these requirements.
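For example, the slow query log can be switched on dynamically in MySQL 5.1 and newer; the one-second threshold below is just an illustration:

mysql> SET GLOBAL slow_query_log = ON;
mysql> SET GLOBAL long_query_time = 1;

From there, tools like mysqldumpslow or pt-query-digest can summarize the log into a report worth sending to the devs.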

7. Use one or more slaves & checksum

MySQL slave databases are a great way to provide insurance. A lagging slave can protect against operator error, or one of those destructive commands we mentioned above. Have it lag a few hours behind, and you'll have that much insurance. At night this slave may be fresh enough to use for backups.

Also, since MySQL uses statement-based replication, data can get out of sync over time. Those problems may or may not flag errors. So use a tool to compare your master and slave for data consistency. We wrote a howto on using checksums to do just that.
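On MySQL 5.6 and newer you can configure the lag right on the slave; on older versions Percona's pt-slave-delay does the same job. A sketch, using a three-hour delay:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_DELAY = 10800;
mysql> START SLAVE;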

8. Be very careful of automatic failover

Automation is wonderful when it works. We dream of a data center that works like clockwork, with robots that never sleep. We can work towards this ideal, and in some cases get close. But it's important to understand that failure is by nature *not* what we predicted. The myriad ways that complex systems can fail boggles the mind, and surprises even seasoned veterans of operations. So maintain a healthy suspicion of this type of automation. Understand that if you automate things to happen at this crucial time, you can potentially put yourself in an even *more* compromised position than a simple failure would.

Sometimes monitoring, alerting, and manual intervention are the more prudent path. Your mileage may vary of course.

9. Be paranoid

It takes many years of doing ops to realize you can never be paranoid enough. Already checked that you’re on the right host, and about to execute some command? Quit the shell prompt and check again. Go back and ask the team if that table really needs to be dropped. Try to rephrase what you’re about to do in different words. Email out again to the team and wait some time before you pull the trigger. Check one more time that you have a fresh backup.

Delay that destructive command as long as you possibly can.

10. Keep it simple

I know, I know, we all want to use that new command or tool, or jump on the latest hardware and take it for a spin. We want to build beautiful architectures that perform great feats of magic. But the fewer the moving parts, the fewer things can go wrong. And in ops, your job is stability and availability. Can you avoid multi-master replication and go with basic master-slave replication in MySQL? That's simpler. Can you have fewer schemas or fewer filter rules? Can you skip the complicated HA layer and use monitoring and manual failover?

Made it this far? Grab our newsletter.

Migrating MySQL to Oracle

This article is from 2006. MySQL has come a long way since then. MySQL 5.5 is very robust and feature-rich, matching Oracle in many areas including datatypes, stored procedures and functions, high availability solutions, ACID compliance and MVCC, hot backups, cold backups and dumps, full-text and other index options, materialized views and much more. Here's a high-level mysql feature guide.

What really separates the two technologies is culture. MySQL, rooted in the open source tradition, is much more do-it-yourself, leaning toward roll-your-own solutions in many cases. Meanwhile Oracle provides named and proven paths to solve specific problems.

You might also check out: Migrating MySQL to Oracle Guide which is a larger multi-part series & work in progress.

For some basics, see What is a Database Migration?

Lastly these may be helpful – Migration to MySQL – Limitations, Anomalies, Considerations & Strengths and also Oracle to MySQL Migration Considerations

INTRODUCTION

MySQL is a powerful database platform for many simple web-based applications. Some of its power and speed comes from its simplicity. MySQL (with its default MyISAM tables) is actually a database without proper transactions. What this means in terms of speed is dramatic. However it also means you cannot roll back an update which encounters problems halfway through, and other sessions in the database will immediately see changes. There are many dramatic ramifications of this, as we'll discuss later. Lastly there are limitations on dataset size. Oracle can obviously handle tables of a terabyte and larger. However, since MySQL implements a table as one file, filesize limits, as well as internal data organization and indexing, can become major limitations as tables grow to millions of rows and beyond.

When you begin to hit these limitations, whether in your application complexity, or the size of your data, you may consider moving your data and application to the Oracle Database. There you will have a rich set of features both on the programming side with stored procedures, views, materialized views, triggers, and so on. You will also have support for tables and indexes of virtually limitless size, full transaction support, and even sophisticated High Availability features such as Advanced Replication, Data Guard, and even clustering with Oracle’s Real Application Clusters technology.

With all these enticing features and robustness, you're eager to jump into Oracle. But don't move too fast. There is a tremendous amount of planning involved in both moving your data and porting and testing your application on the new platform. Add to that Oracle licensing, and you'll need some time to get there.

MySQL vs Oracle – feature comparisons

MySQL is a database fit for some types of applications. These tend to be smaller applications, or those with less sophisticated needs than the applications running Oracle on the backend.

It makes sense at this point to go through a feature comparison, and see what features MySQL shares with Oracle. Here’s a more in depth feature comparison of MySQL and Oracle.

MySQL shares with Oracle good support for database access, including ODBC and JDBC drivers, as well as access libraries for Perl, Python and PHP. MySQL and Oracle both support binary large objects, character, numeric, and date datatypes. Both have primary and unique keys, and as of 4.x with InnoDB, MySQL has foreign keys and transactions, with the isolation levels READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. Both databases have sophisticated language and character set support. MySQL can do table locking, and recently improved to include row-level locking. What's more, if you don't need transactions, MyISAM tables are extremely fast. MySQL also includes a good data dump utility, which you'll see in action below when we migrate to Oracle. And lastly, both databases of course include good b-tree indexes, which no database could be without.
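If you haven't tried those isolation levels in MySQL, here's a minimal sketch with InnoDB (the accounts table is hypothetical):

mysql> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
mysql> START TRANSACTION;
mysql> SELECT balance FROM accounts WHERE id = 1;
mysql> COMMIT;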

There are, however, quite a number of features which come standard in Oracle but remain missing in MySQL. Until recently that list included row-level locking, true transactions, and subqueries, although as of 4.x those seem to be present. However, those have been present as core technologies in Oracle for years, with very stable and solid implementations proven on TPC benchmarks. Views are still absent in MySQL, though they may be around the corner now that subqueries are available.

Of course a lot of the high-end Oracle features remain completely absent from MySQL, and may never be added. Features such as parallel query and partitioned tables, which include a whole host of special abilities such as taking one partition offline without impacting queries on the rest of the table. The indexing on partitioned tables is sophisticated too, allowing partition elimination and range scans on indexes of specific partitions. There are other large-database features such as special functions for star queries. Oracle has terabyte databases in production, so this fact speaks for itself.

MySQL still has a somewhat limited set of index types. For instance Oracle has reverse key, bitmap, and function-based indexes, as well as index-organized tables. These are all very powerful features for developers who are trying to squeeze that last bit of performance out of complex SQL queries against large tables. Although MySQL does provide some index statistics collection, Oracle provides a full set of statistics, including histograms, and makes great use of them inside the Cost Based Optimizer. These statistics allow Oracle to better determine the best method of getting the data for your query and putting it together for you with the least use of resources in terms of memory cache and disk access. This is really key for a database. When running large multi-user applications, all of which are vying for data in different tables, you want to load just the data you need, and nothing more. Avoiding full table scans by using the proper index, and using various indexes in a join operation to pull fewer rows from the underlying tables, means less disk I/O, which other processes are fighting for, and less use of cache, leaving more for other processes.

MySQL still does not have privilege groups, called ROLES in Oracle.

Oracle can also provide column-level privilege control, plus fine-grained access control via its Virtual Private Database feature. Although we don't see it used a lot in Oracle deployments, MySQL lacks this as well.

MySQL does not have hotbackups, which have been an integral part of Oracle for years. (As of 2012 there are hotbackups in MySQL; here's a howto on rebuilding replication using hotbackups.) In addition, Oracle's RMAN has become a sophisticated piece of software, and grown to be very stable, providing block-level backups so only the data that changed is included in subsequent backups. This makes nightly backups smaller overall. It also aids tremendously during recovery, providing a lot of automation and assistance during those times when you need it most. MySQL's method is to dump data, and further, if you want to guarantee a point-in-time dump of your data, you have to lock all the tables in the database, potentially slowing down your application tremendously. Lastly MySQL does not have automatic or point-in-time recovery, a real limitation on heavy-use production databases.

MySQL also has some limitations on row sizes. MyISAM tables, for instance, can have a maximum of 64k of data per row, and InnoDB tables 8k per row. This does not include BLOB and TEXT types.

MySQL does not include database links such as those found in Oracle, which allow you to query a table in an external database from inside a query on a local database. There is the federated storage engine, but reports are that it's rather buggy. DB links can be useful for moving data between databases, and are key to implementing advanced replication in Oracle.

Lastly, MySQL has had some database size limitations. In 3.22 it could only access 4GB of data; as of 3.23 the only limitation has been your operating system and the size of files it can handle. On Linux with LFS or ReiserFS, this limitation is effectively eliminated. However, Oracle still has incredibly sophisticated storage capabilities, allowing virtually unlimited datafiles to be associated with a tablespace, and thus virtually limitless table sizes.

Updated note: In 5.5 and newer versions of MySQL there are no database size limitations. Also with Innodb you can use global tablespaces or one tablespace per table depending on your configuration. With most databases sitting on RAID or SAN these days, you’re getting pretty much the same deal with both MySQL & Oracle storage-wise.

Migration preparation

So you’ve seen what you can do with Oracle, and management has invested in licensing, and you’re now ready to get things setup in your development environment.

Now is the time to really get up to speed with Oracle. This goes for database administration knowledge as well as developer and programmer knowledge. The latter requires that you know a lot about Oracle's features, in particular those relevant to your application. The former requires understanding DBA roles, managing database files, memory structures, backups, and so on and so forth.

Thomas Kyte’s books are really excellent, and highly recommended. Check out “Expert One on One” on Wrox Press, and his newer title, “Effective Oracle by Design” which is on Oracle Press. He also has a website, http://asktom.oracle.com.

Also check out Kevin Loney and Marlene Theriault's DBA Handbook on Oracle Press. Of course don't forget to read the Oracle docs, which are quite good. Start with the Concepts manual for the version of Oracle you plan to go with.

In planning a migration the first thing you should do is take a look at the number, and types of tables in your database. Do that in MySQL as follows:

mysql> show table status;
+------+--------+---------+------------+
| Name | Engine | Version | Row_format |
+------+--------+---------+------------+
| t    | InnoDB |       9 | Fixed      |
| u    | MyISAM |       9 | Fixed      |
+------+--------+---------+------------+
2 rows in set (0.05 sec)

This output is truncated, but serves as a useful example. You will see the tables, types, and a lot of other information about the tables you will be moving.

You'll next want to review the datatypes of your various tables. CHAR in MySQL maps to CHAR in Oracle, VARCHAR to VARCHAR2, and the various large object types to RAW or BLOB in Oracle. DATE, DATETIME, and TIME map to Oracle's DATE datatype, while TIMESTAMP and YEAR map to NUMBER. Lastly, all of the various INT datatypes in MySQL map to NUMBER in Oracle, and FLOAT, DOUBLE, REAL, and DECIMAL all map to FLOAT.
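As a sketch, here's a simple hypothetical table before and after applying those mappings:

-- MySQL
CREATE TABLE customer (
  id      INT,
  name    VARCHAR(100),
  joined  DATETIME,
  balance DECIMAL(10,2)
);

-- Oracle equivalent
CREATE TABLE customer (
  id      NUMBER,
  name    VARCHAR2(100),
  joined  DATE,
  balance FLOAT
);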

To get information about the columns in a table, you can use the 'describe' SQL command, much like Oracle's own describe:

mysql> describe t;
+-------+----------+------+-----+---------+----------------+
| Field | Type     | Null | Key | Default | Extra          |
+-------+----------+------+-----+---------+----------------+
| id    | int(11)  |      | PRI | NULL    | auto_increment |
| s     | char(60) | YES  |     | NULL    |                |
+-------+----------+------+-----+---------+----------------+
2 rows in set (0.01 sec)

Another way to get useful descriptions of tables is to use the mysqldump command. Here ‘test’ is the name of the database, and ‘t’ is the name of the table. Leave the table name off and the output will include all the tables in that database.

$ mysqldump test t
--
-- Table structure for table `t`
--
DROP TABLE IF EXISTS `t`;
CREATE TABLE `t` (
  `id` int(11) NOT NULL auto_increment,
  `s` char(60) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

There’s actually quite a bit more output, and depending on the version of MySQL you may see additional comment lines etc. You’ll probably want to redirect the output to a file. Do so as follows:

$ mysqldump test t > t_table.mysql
$ mysqldump test > test_db.mysql

You will also want to get a sense of which columns are indexed to be sure that they get indexed in your Oracle database. Here is an example of how to list the indexes on a table:

mysql> show index from t;
+-------+------------+----------+--------------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
+-------+------------+----------+--------------+-------------+
| t     |          0 | PRIMARY  |            1 | id          |
+-------+------------+----------+--------------+-------------+
1 row in set (0.04 sec)

An enumerated datatype is one where you define a static set of values up front. Oracle does not currently have such a datatype. The closest equivalent is a VARCHAR2 of sufficient size to hold all the various values, and Migration Workbench will do just that. If you want Oracle to enforce your set of values, add a check constraint on that column.
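For instance, a MySQL column defined as status ENUM('active','inactive') might become the following in Oracle; the table and column names here are illustrative:

ALTER TABLE customer ADD (status VARCHAR2(8));
ALTER TABLE customer ADD CONSTRAINT chk_status
  CHECK (status IN ('active', 'inactive'));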

Lastly, if you're experiencing serious performance problems in MySQL, take a look at the slow query log. MySQL can be configured to dump queries which do not execute within a certain number of seconds to a log file for your review. You can then use the EXPLAIN facility, a much simplified version of the one found in Oracle, to tune queries for a better execution path, possibly requiring a new index on a certain column or an index rebuild. In many instances restructuring a query can be of substantial benefit.

What's more, many of these skills of tuning and optimizing queries will translate directly to Oracle. Queries are the lifeblood of your application. Bad ones can kill overall database performance by filling the cache with useless blocks of data and pushing out previously cached, important data. What's more, inefficient queries cause more disk I/O, which impacts the overall performance of your database. These issues hold true for all databases, and getting proficient with them now will bring you up to speed faster with Oracle as you migrate.
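Running EXPLAIN in MySQL is as simple as prefixing the query. Using the little table t from earlier:

mysql> EXPLAIN SELECT * FROM t WHERE s = 'foo';

In the output, a type of ALL with key NULL means a full table scan, a hint that an index on s might help.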

Moving your data between MySQL and Oracle

At this point we're still presuming that you will be moving your data by hand. This is not because we are gluttons for punishment. It is more to illustrate important points. Doing things by hand walks you through each and every detail of the process so you understand it. You'll need that when things go wrong, as they inevitably will. So we're discussing moving the schema, and then the data, by hand for all tables; however, you may end up following the instructions below for using the Oracle Migration Workbench, and then only doing one or two special tables by hand. Or you may decide to use Migration Workbench to build scripts for you as a starting point, and then aggressively edit those for your particular needs. All these options are valid.

So at this point you need to dump data. If you want to load data with Oracle's SQL*Loader, an easy format to start with is CSV, or Comma Separated Values.

To dump one table named 't' from the database named 'test', use this bit of code. Note that we've broken things up into multiple lines to easily illustrate what's happening with all those messy sed commands. You're welcome to modify them for your needs, but this works as-is. Note that ^V^I means you literally type ctrl-V then ctrl-I to embed a tab character in the sed expression; read your editor manual for details on inserting control characters into a file.

#!/bin/bash
# 1. get all rows from table t of database test
# 2. add a double quote at the beginning of each line
# 3. replace tabs with "," (the /g makes it global, for tables
#    with more than two columns)
# 4. add a double quote at the end of each line
# 5. remove the header line and save the result for SQL*Loader
echo 'select * from t;' | mysql test \
 | sed -e 's/^/"/' \
 | sed -e 's/^V^I/","/g' \
 | sed -e 's/$/"/' \
 | tail -n +2 > t.dat

Now is your chance to really put all those Oracle skills to work. You want to have CREATE TABLE statements to build each table, and scripts are an excellent way to get you going. They also self-document the process.

Here is an example of how to precreate the above very simple table in Oracle. Edit a file called t.sql and include these lines:

create table T (
  id   NUMBER(10,0) primary key,
  s    CHAR(60));

Save the file, and then fire up sqlplus and do:

SQL> @t.sql

Table created.

SQL> desc t;
 Name                                      Null?    Type
 ----------------------------------------- -------- ------------
 ID                                         NOT NULL NUMBER(10)
 S                                                   CHAR(60)

Now use SQL*Loader to load the CSV data you created earlier. To do that you’ll need to create a control file. This tells SQL*Loader exactly what to do. Edit a file t.ctl and put this in the file:

LOAD DATA
REPLACE
INTO TABLE t
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(id INTEGER EXTERNAL, s CHAR)

Once you’re done, save the file, and execute the following:

$ sqlldr your_username control=t.ctl data=t.dat log=t.log bad=t.bad

This should load your data into the table t that you created earlier. Check the log and bad files for details, and errors.

As with the Oracle migration documentation, and any good documentation really, we'll emphasize and re-emphasize the need for and importance of testing. You cannot assume that your script does things right, nor can you assume that the script Oracle's Migration Workbench created will do things right. There are an infinite number of anomalies for any dataset, and testing your application is the only way you can verify you are in good shape.

What's more, you also need to verify that your data is correct. Suppose you have a banking application, and you are moving customer data from MySQL to Oracle. Suppose further you have records of monthly deposits and withdrawals against each account. You move the data from MySQL into Oracle, and the web or client based frontend is up and running again after extensive porting and testing. Does this guarantee that all the data is correct? Not at all. It means the right fields are in the right places, and probably the datatypes are correct. Imagine that a customer had a very high balance, and when moving to Oracle the field size was too small, so when the data was loaded, the insert failed or the value was set to a default like 0. Obviously you don't want to find out when that customer comes calling. You want to look through the log files when the data is loaded, and then run some verification software against the db to compare values in the old db and the new db, or to calculate checksums, such as running through deposits and withdrawals to make sure the current balances check out. This is really the most important step in the process and can't be overstated.

Migration Workbench is Oracle’s recommended solution

Oracle’s Migration Workbench is a Java-based GUI tool which runs on various Windows versions, and comes complete with wizards to help you perform your migration. The process is divided into three steps.

First you capture your source database. This is done with a series of plugins which support various databases, including of course MySQL. One plugin handles MySQL 3.22 and 3.23, and another handles the 4.x versions. Capturing the source database is the same process we describe above manually: looking at your tables in MySQL, and the columns and indexes you are using. This is practical and feasible for a small number of tables; with hundreds or even thousands of tables, however, Oracle Migration Workbench becomes more and more of a necessity.

Second, you migrate the source model to Oracle. This is the process where the Migration Workbench precreates all the tables found in the source database, including columns of equivalent datatypes. We describe the mappings of MySQL to Oracle datatypes above. Note that Oracle does not have ENUM or an enumerated datatype per se, but the Workbench can still migrate this data, and does so to VARCHAR2 in Oracle.

The third and last step is to review the scripts that the Migration Workbench has created, make any changes or modifications, and then run them to move your data from your source MySQL database into your new Oracle database.

One thing that is important to remember about a migration is that it will take a lot more time, and end up being a lot more complicated, than you expected. I liked this about the documentation: they make it clear from the beginning that planning will be a tremendous help in estimating time and delivering successfully within budget. The documentation is also very thorough in its coverage of MySQL datatypes and how they translate to Oracle datatypes, as described earlier in this article. And of course there is a strong emphasis on testing. The Migration Workbench provides customizable scripts which both document the actions to be performed and provide a way for you to get your fingers into the works.

Keep in mind while using the Migration Workbench that it is NOT all or nothing. You can use the Migration Workbench, and then edit the scripts to leave certain tables alone, or you can migrate them all, then drop the few you want to do by hand using the methods we describe above. Ultimately a mix of the two will probably serve your needs best, as there is always some amount of manual intervention you want to perform for certain tables.

A migration between two databases is not a trivial undertaking. You have a lot of data, and an application which relies on it all being in the right format, with the right relationships. Moving to a new database, with a larger feature set, slightly different syntax, and different ways of doing things, takes time and attention. But in the end you'll be up and running on a sophisticated, scalable, world-class database platform.

Oracle has a great set of resources on OTN devoted to migrating to Oracle. In particular there is the Migration Technology Center.

Oracle’s Migration Workbench documentation and download page.

On the other side, here’s the MySQL 5.5 Reference Guide.

Made it this far? Grab our newsletter.