Tag Archives: oracle

Point-in-time Recovery – What is it and why is it important?

Web-facing database servers receive a barrage of activity 24 hours a day.  Sessions are managed for users logging in, ratings are clicked and comments are added.  Web-based ecommerce applications are more complex still.  All of this activity is organized into small chunks called transactions – discrete sets of changes.  If you’re editing a word processing document, it might autosave every five minutes; a spreadsheet like Excel provides a similar feature, along with a built-in mechanism for undoing and redoing recent edits.  These are all analogous to transactions in a database.

Transactions are important because they are all written to logfiles.  That’s what makes replication possible: those changes can be replayed on another database server downstream.

If you have lost your database server because of hardware failure or instance failure in EC2, you’ll be faced with the challenge of restoring your database server.  How is this accomplished?  Well the first step would be to restore from the last full backup you have, perhaps a full database dump that you perform every day late at night.  Great, now you’ve restored to 2am.  How do you get the rest of your data?

That is where point-in-time recovery comes in.  Since those transactions were being written to your transaction logs, all the changes made to your database since the last full backup must be reapplied.  In MySQL this transaction log is called the binlog, and there is a mysqlbinlog utility that reads the transaction log files, and replays those statements.  You’ll tell it the start time – in this case 2am when the backup happened.  And you’ll tell it the end time, which is the point-in-time you want to recover to.  That time will likely be the time you lost your database server hardware.
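
As a rough sketch (the file names and times here are hypothetical), replaying everything from the 2am backup up to the moment of failure looks something like this:

$ mysqlbinlog --start-datetime="2012-03-01 02:00:00" \
              --stop-datetime="2012-03-01 14:35:00" \
              mysql-bin.000142 mysql-bin.000143 | mysql -u root -p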

Point-in-time recovery is crucial to high availability, so be sure to back up your binlogs right alongside the full database backups you take every night.  If you lose the server or disk that the database is hosted on, you’ll want an alternate copy of those binlogs available for recovery!

Quora discussion on Point-in-time Recovery by Sean Hull

Data warehousing – What is it and why is it important?

A data warehouse is a special type of database.  It is used to store large amounts of data – analytics, historical, or customer data – and to build large reports and data mining against it.  It is markedly different from a web-facing or high-transaction database, which typically has many small transactions or pieces of data that are constantly changing, across hundreds or thousands of small user sessions.  Those typically execute in times on the order of 1/100th of a second, while in a data warehouse you have fewer, larger queries which can take minutes to execute.

Data warehouses are tuned for updates happening in bulk via batch jobs, and for large queries which need big chunks of memory to sort and cross-tabulate data from different tables.  Often full table scans are required because of the specialized one-off nature of these reports.  The same queries are not executed over and over.

It’s important not to mix data warehousing databases with transactional databases in the same instance, whether you are dealing with MySQL or Oracle.  That’s because they are tuned totally differently.  It would be like trying to use the same engine for a commuter car and a container ship traveling around the world.  Different jobs require different databases, or at least databases with their dials set for different uses.

Quora discussion of data warehousing – Sean Hull

Relational Database – What is it and why is it important?

A relational database is the warehouse of your data.  Your crown jewels.  It’s your Excel spreadsheet or filing cabinet writ large.  You use them every day and may not know it.  Your smartphone stores its contact database in a relational database, most likely SQLite – the ever-present but ever-invisible embedded database platform.  Your online bank at Citibank or Chase stores all your financial history, statements, contact info, personal data and so forth in a relational database.

  • organized around records
  • data points are columns in a table
  • relationships are enforced with constraints
  • indexing data brings hi-speed access
  • SQL is used to get data in and out of the db
  • triggers, views, stored procs & materialized views may also be supported

Like Excel, relational databases are organized around records.  A record is like a 3×5 card with a number of different data points on it.  Say you have 3×5 cards for your address book.  Each card holds one address, phone number, email, picture, notes and so forth.  By organizing things nicely on cards, with predictable fields on each card such as first name, last name, birthday and so on, you can then search on those data points.  Want all the people in your address book with a birthday of July 5th?  No problem.
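
In SQL terms that address book might look something like this – a sketch only; the table, columns, and MySQL-flavored date functions are purely illustrative:

CREATE TABLE addressbook (
  id         INT PRIMARY KEY,
  first_name VARCHAR(50),
  last_name  VARCHAR(50),
  birthday   DATE,
  email      VARCHAR(100)
);

-- everyone with a July 5th birthday, whatever the year
SELECT first_name, last_name, email
  FROM addressbook
 WHERE MONTH(birthday) = 7
   AND DAY(birthday) = 5;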

While relational databases have great advantages, they require a lot of work to get all of your information into neatly organized tables.  What’s more, the method for getting things into and out of them – SQL – is a quirky and not very friendly language.  Relational databases also have trouble clustering and scaling horizontally.  NoSQL databases have made some headway in these departments, but at a cost to consistency and reliability of data.

As servers continue to get larger, it becomes rarer that a single web-facing database really needs more than a single server – if it’s tuned right, that is.  Going forward, the landscape will probably continue to be populated by a mix of traditional relational databases, newer NoSQL databases, key-value stores, and other technologies yet to be dreamed up.

Sean Hull asks on Quora – What is an rdbms and why are they important?

Query Profiling – What is it and why is it important?

Queries are so-named because of the lovely language we call SQL – structured query language, or “simple query language” as some would have it.  That’s a bit of sarcasm on my part; I’ve never found it particularly simple or elegant.  Profiling them involves finding out how they spend their time, and what they are asking the server to do.  In this way you can make them faster, and improve performance of the whole server, and thus your website.

At any rate, queries ask the database for information.  If they are on the simpler side – something like give me all the users whose last name starts with “hu”, for example – and last name is indexed, that will run very fast.  The database will look up the last name field in the index, find the subset starting with those letters, then look up the records by id and return them to you.  Cache the index blocks, cache the data blocks.  Great!  However, say those are customers, and you want their cellphone calling history.  Now you have to join on another table, matching by some key, hold all those records in memory, shuffle them around, and so on.

So queries are effectively little jobs or bits of work you ask your database server to perform on your behalf.  With websites you typically have hundreds of concurrently running sessions all sending their own little SQL jobs to the server to get processed, and records returned.  Any blip on the radar slows everyone down, so you want them all to run quickly.

That’s where profiling comes in.  MySQL, Oracle, and SQL Server alike all have EXPLAIN-type facilities for showing the plan the database will use to fetch your data.  It shows indexes, sorting, joins, order, and so forth.  All of this contributes to the overall execution time and resources used on the database server.
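
For example, in MySQL you simply prefix the query with EXPLAIN (the table, column, and index names below are invented for illustration); the key, type, and rows columns of the output show which index was chosen, the access method (range scan versus full table scan), and the estimated rows examined:

mysql> EXPLAIN SELECT * FROM customers WHERE last_name LIKE 'hu%';
mysql> EXPLAIN SELECT c.*, h.call_time
    ->   FROM customers c JOIN call_history h ON h.customer_id = c.id
    ->  WHERE c.last_name LIKE 'hu%';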

Quora discussion by Sean Hull – What is query profiling and why is it important?

Oracle to MySQL Migration Considerations

There are a lot of forms of transportation, from walking to bike riding, motorcycles and cars to buses, trains and airplanes.  Each mode of transport will get you from point A to point B, but one may be faster, another more comfortable, and another more cost effective.  It’s important to keep in mind when comparing databases like Oracle and MySQL that there are indeed a lot of feature differences, a lot of cultural differences, and a lot of cost differences.  There are also a lot of impassioned people on both sides arguing at the tomfoolery of the other.  Hopefully we can dispel some of the myths and discuss the topic fairly.

** See also: Migrating MySQL to Oracle Guide **

As a long time Oracle DBA turned MySQL expert, I’ve spent time with clients running both database engines and many migrating from one to the other.  I can speak to many of the differences between the two environments.  I’ll cover the following:

  1. Query & Optimizer Limitations
  2. Security Differences
  3. Replication & HA Are Done Differently
  4. Installation & Administration Simplicity
  5. Watch Out – Triggers, Stored Procedures, Materialized Views & Snapshots
  6. Huge Community Support – Open-source Add-ons
  7. Enter The Cloud With MySQL
  8. Backup and Recovery
  9. Miscellaneous Considerations

Check back again as we edit and publish the various sections above.

Oracle Expert – New York City

Looking for a top-flight DBA?  We are subject matter experts in Oracle and MySQL, especially integrating these database platforms with open source technologies.

We’ve always been about open-source technology, integration & mixing commercial technologies such as Oracle with open-source ones such as MySQL and Linux. As open-source becomes mainstream, and more shops consider moving critical services to these technologies, we continue to provide assistance and expertise for these transitions. Whether it is performance testing and tuning, benchmarking, high availability or recovery, we can provide services for your specific needs.

Oracle DBA Interview Questions

Oracle Database Administrators, often called DBAs, are an indispensable part of your operations team. They manage the systems that house all your business data – your customers, products, transactions, and all that analytical data on what customers are actually doing. If you’ve ever been on the hunt, you may wonder, why the shortage of DBAs? To that we’ll answer, have you ever heard of Dustin Moskovitz?

So you certainly want to entrust that to someone who knows what they’re talking about. Enter the Oracle DBA interview: a process that some will see as a technical test, while others will see it as a gauge of personality, behavior, and work ethic.

From the technical side we thought we’d bring you a quick and dirty checklist of questions. This isn’t an exhaustive list by any means, but is a good place to start and will certainly provide you with a glimpse of their knowledge.

Also if you’re looking to hire a MySQL DBA here’s a guide, and also one for hiring an EC2 expert.

1. What is the difference between RMAN and a traditional hotbackup?

RMAN is faster, can do incremental (changes only) backups, and does not place tablespaces into hotbackup mode.
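
For example, a simple incremental strategy in RMAN might look like this – a sketch only, with retention policy and channel configuration omitted:

RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;
RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;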

2. What are bind variables and why are they important?

With bind variables in SQL, Oracle can cache related queries a single time in the SQL cache (shared SQL area). This avoids a hard parse each time, which saves on the various locking and latching resources we use to check an object’s existence and so on. BONUS: For rarely run queries, especially BATCH queries, we explicitly DO NOT want to use bind variables, as they hide information from the Cost Based Optimizer.

BONUS BONUS: For batch queries from 3rd party apps like PeopleSoft, if we can’t remove bind variables, we can use bind variable peeking!
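
A quick illustration in SQL*Plus (table and column names are invented): the literal form hard parses a distinct statement each time, while the bind form shares one cursor and is soft parsed on reuse.

-- literals: each of these is a distinct statement, hard parsed separately
SELECT order_total FROM orders WHERE customer_id = 101;
SELECT order_total FROM orders WHERE customer_id = 102;

-- bind variable: one shared cursor in the SQL area
VARIABLE cust NUMBER
EXEC :cust := 101
SELECT order_total FROM orders WHERE customer_id = :cust;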

3. In PL/SQL, what is bulk binding, and when/how would it help performance?

Oracle’s SQL and PL/SQL engines are separate parts of the kernel which require context switching, like between unix processes. This is slow, and uses up resources. If we loop over an SQL statement, we are implicitly flipping between these two engines. We can minimize this by loading our data into an array and using a PL/SQL bulk binding operation to do it all in one go!
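
A minimal sketch of the idea using BULK COLLECT and FORALL (the employees table here is just the familiar sample schema, used purely for illustration):

DECLARE
  TYPE id_tab IS TABLE OF employees.employee_id%TYPE;
  l_ids id_tab;
BEGIN
  -- one round trip to pull the ids into a PL/SQL collection
  SELECT employee_id BULK COLLECT INTO l_ids
    FROM employees
   WHERE department_id = 50;

  -- one context switch for the whole batch instead of one per row
  FORALL i IN 1 .. l_ids.COUNT
    UPDATE employees
       SET salary = salary * 1.05
     WHERE employee_id = l_ids(i);

  COMMIT;
END;
/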

4. Why is SQL*Loader direct path so fast?

SQL*Loader with the direct path option can load data ABOVE the high water mark of a table, and DIRECTLY into the datafiles, without going through the SQL engine at all. This avoids all the locking, latching, and so on, and doesn’t impact the db (except possibly the I/O subsystem) at all.

5. What are the tradeoffs between many vs few indexes? When would you want to have many, and when would it be better to have fewer?

Fewer indexes on a table mean faster inserts and updates. More indexes mean faster, more specific WHERE clause lookups, possibly without index merges.

6. What is the difference between RAID 5 and RAID 10? Which is better for Oracle?

RAID 5 is striping with parity: the equivalent of one extra disk is given over to parity data. If we lose a disk, we can reconstruct its contents from the remaining data plus parity.

RAID 10 is mirroring pairs of disks, and then striping across those sets.

RAID 5 was created when disks were expensive. Its purpose was to provide RAID on the cheap. If a disk fails, the I/O subsystem will perform VERY slowly during the rebuild process. What’s more, your likelihood of failure increases dramatically during this period, with all the added load of the rebuild. Even when it is operating normally RAID 5 is slow for everything but reading. Given that, and knowing databases (especially Oracle’s redo logs) experience write activity all the time, we should avoid RAID 5 in all but the rare database that is MOSTLY read activity. Don’t put redologs on RAID 5.

RAID 10 is just all-around goodness. If you lose one disk in a set of 10, for example, you could then lose any one of eight other disks and have no trouble. What’s more, rebuilding does not impact performance at all since you’re simply making a mirror copy. Lastly, RAID 10 performs exceedingly well in all types of databases.

7. When using Oracle export/import what character set concerns might come up? How do you handle them?

Be sure to set NLS_LANG, for example to “AMERICAN_AMERICA.WE8ISO8859P1”. If your source database is US7ASCII, beware of 8-bit characters. Also be wary of multi-byte character sets, as those may require extra attention. And watch export/import output for messages about any “character set conversions” which may occur.
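
For instance, before running the export you might set it in the shell (the username and file names below are placeholders):

$ export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
$ exp scott/tiger file=scott.dmp log=scott_exp.log
$ imp scott/tiger file=scott.dmp log=scott_imp.log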

8. How do you use automatic PGA memory management with Oracle 9i and above?

Set the WORKAREA_SIZE_POLICY parameter to AUTO and set PGA_AGGREGATE_TARGET to a value appropriate for your workload.
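
In SQL*Plus that looks something like this (the 1G target is just a placeholder – size it for your system):

SQL> ALTER SYSTEM SET workarea_size_policy = AUTO;
SQL> ALTER SYSTEM SET pga_aggregate_target = 1G;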

9. Explain two easy SQL optimizations.

a. EXISTS can be better than IN under various conditions

b. UNION ALL is faster than UNION (it avoids the sort used to eliminate duplicates)

10. Name three SQL operations that perform a SORT.

a. CREATE INDEX
b. DISTINCT
c. GROUP BY
d. ORDER BY
e. INTERSECT
f. MINUS
g. UNION
h. UNINDEXED TABLE JOIN

11. What is your favorite tool for day-to-day Oracle operation?

Hopefully we hear some use of command line as the answer!

12. What is the difference between Truncate and Delete? Why is one faster? Can we ROLLBACK both? How would a full table scan behave after each?

Truncate is nearly instantaneous and cannot be rolled back; it is fast because Oracle simply resets the HWM (high water mark). When a full table scan is performed on a table, such as for a sort operation, Oracle reads up to the HWM. So if you delete every single row in a 10 million row table so it is now empty, sorting on that table of 0 rows would still be extremely slow.
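
A quick sketch of the difference (big_table is a hypothetical table name):

-- DELETE: row by row, generates undo/redo, can be rolled back,
-- and leaves the high water mark untouched
DELETE FROM big_table;
ROLLBACK;                 -- this works

-- TRUNCATE: DDL, resets the HWM, nearly instantaneous
TRUNCATE TABLE big_table;
ROLLBACK;                 -- too late, the data is gone

-- after a committed mass DELETE, a full scan still reads
-- every block up to the old HWM
SELECT COUNT(*) FROM big_table;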

13. What is the difference between a materialized view (snapshot) fast refresh versus complete refresh? When is one better, and when the other?

Fast refresh maintains a change log table which records change vectors, not unlike how the redo logs work. There is overhead to this, as with a table that has a LOT of indexes on it: inserts and updates will be slower. However if you are performing refreshes often, like every few minutes, you want to do fast refresh so you don’t have to full-table-scan the source table. Complete refresh is good if you’re going to refresh once a day: it does a full table scan on the source table and recreates the snapshot/mview. Also, inserts/updates on the source table are NOT impacted on tables where complete refresh snapshots have been created.

14. What does the NO LOGGING option do? Why would we use it? Why would we be careful of using it?

It disables the logging of changes to the redologs. It does not disable ALL logging, however, as Oracle still maintains a base level of change information, enough for recovery if you pull the plug on the box, for instance. However it will cause problems if you are using a standby database. Use it to speed up operations like an index rebuild or partition maintenance.

15. Tell me about standby database? What are some of the configurations of it? What should we watch out for?

Standby databases allow us to create a copy of our production db for disaster recovery. We merely switch modes on the standby db and bring it up as read/write. It can be set up as master->slave or master->master. The latter allows the former prod db to become the standby once the cause of the failure is remedied. Watch out for NO LOGGING!! Be sure we’re in archivelog mode.

Hey you! If you made it this far, definitely grab our newsletter.

Apress – Cost-Based Oracle by Jonathan Lewis

The beauty of reading a book by a publisher not sanctioned by Oracle and by an author who doesn’t work for Oracle is that they can openly mention bugs. And there are oh-so-many! This book is a superb introduction to the Cost Based Optimizer, and is not afraid to discuss its many shortcomings. In so doing it also explains how to patch up those shortcomings by giving the CBO more information, either by creating a histogram here and there, or by using the DBMS_STATS package to insert your own statistics in those specific cases where you need to.

Another interesting thing is how this book illustrates, though accidentally, the challenges of proprietary software systems. Much of the book and the author’s time is spent reverse engineering the CBO, Oracle’s bread-and-butter optimizing engine. Source code, and details about its inner workings, are not published or available. And of course that’s intentional. But what’s clear page after page in this book is that the DBA and system tuner, going about their day-to-day tasks, really need inside information about what the optimizer is doing, and so this book goes on a long journey to illuminate much of what the CBO is doing, or in some cases provide very educated guesses and some speculation. In contrast, as we know and hear about often, the open source alternative provides free access to source code, though not necessarily to the goods themselves. What this means in a very real way is that a book like this would not need to be written for an open source application, because the internal code would be a proverbial open book. That said, it remains difficult to imagine how a company like Oracle might pursue a more open strategy given that their bread and butter really is the secrets hidden inside their Cost Based Optimizing engine. At any rate, let’s get back to Jonathan’s book.

Reading this book was like reading a scientist’s notebook. I found it:

o of inestimable value, but sometimes difficult to sift through

o very anecdotal in nature, full of debugging, constantly demonstrating that the CBO is much more faulty and prone to errors than you might imagine

o not always easy to use as a reference – you can’t simply say “I have a query of type X and it is behaving funny, how do I look up information on this?”

o his discussion of the evolution of the product is so good I’ll quote it:

“A common evolutionary path in the optimizer code seems to be the following: hidden by undocumented parameter and disabled in first release; silently enabled but not costed in second release; enabled and costed in third release.”

o has excellent chapter summaries which were particularly good for sifting, and boiling down the previous pages into a few conclusions.

o it will probably be of particular value to Oracle’s own CBO development teams

Chapter highlights

CH2 – Tablescans

explains how to gather system stats, how to use dbms_stats to set ind. stats manually, bind variables can make the CBO blind, bind variable peeking may not help, partition exchange may break global stats for table, use CPU costing when possible

CH3 – Selectivity

big problem with IN lists in 8i, fixed in 9i/10g, but still prob. with NOT IN, uses very good example of astrological signs overlapping birth months, and associated CBO cardinality problems, reminds us that the optimizer isn’t actually intelligent per se, but merely a piece of software

CH4 BTree Access

cost based on depth, #leaf blocks, and clustering factor, try to use CPU costing (system statistics)

CH5 – Clustering Factor

mainly a measure of the degree of random distribution of your data, very important for costing index scans, use dbms_stats to correct when necessary, just giving CBO better information, freelists (procID problem) + freelist groups discussion with RAC

CH6 – Selectivity Issues

there is a big problem with string selectivity, Oracle uses only the first seven characters, will be even more trouble for URLs all starting with “http://”, and with multibyte character sets, trouble when you have db-independent apps which use strings for dates, use histograms when you have problems, can use the tuning advisor for “offline optimization”, Oracle uses transitive closure to transform queries to more easily optimized versions, moves predicates around, sometimes runs astray

CH7 – Histograms

height balanced when > 255 buckets (outside Oracle called equi-depth), otherwise frequency histograms, don’t use cursor sharing as it forces bind variables and blinds the CBO, bind var peeking happens only on the first call, Oracle doesn’t use histograms much, expensive to create, use sparingly, distributed queries don’t pull histograms from the remote site, don’t work well with joins, no impact if you’re using bind vars, if using dbms_stats to hack certain stats be careful of rare codepaths

CH8 – Bitmap Indexes

don’t stop at just one, avoid updates like the plague as can cause deadlocking, opt assumes 80% data tightly packed, 20% widely scattered

CH9 – Query Transformation

partly rule based, peeling the onion w views to understand complex queries, natural language queries often not the most efficient, therefore this transformation process has huge potential upside for Oracle in overall optimization of app code behind the scenes by db engine, always remember Oracle may rewrite your query, sometimes want to block with hints, tell CBO about uniqueness, not NULL if you know this

CH10 – Join Cardinality

makes sensible guess at best first table, continues from there, don’t hide useful information from the CBO, histograms may help with some difficult queries

CH11 – Nested Loops

fairly straightforward costing based on cardinality of each returned set multiplied together

CH12 – Hash Joins

Oracle executes as optimal (all in memory), onepass (doesn’t quite fit so dumped to disk for one pass) and multipass (least attractive sort to disk), avoid scripts writing scripts in prod, best option is to use workarea_size_policy=AUTO, set pga_aggregate_target & use CPU costing

CH 13 – Sorting + Merge Joins

also uses optimal, onepass, & multipass algorithms, need more than 4x dataset size for an in-memory sort, 8x on a 64-bit system, increasing sort_area_size will increase CPU utilization so on CPU-bottlenecked machines sorting to disk (onepass) may improve performance, must always use ORDER BY to guarantee sorted output as Oracle may not need to sort behind the scenes, Oracle very good at avoiding sorts, again try to use workarea_size_policy=AUTO

CH 14 – 10053 Trace

reviews various ways to enable it, a detailed rundown of the trace with comments inline, and highlights; even mentions that VOL 2 + 3 of the book are coming!

Appendix A

be careful when switching from analyze to dbms_stats, in 10g some new histograms will appear with the default dbms_stats options, 10g creates a job to gather stats

Conclusion

I found this book to be full of gems of information that you won’t find anywhere else. If you’re at the more technical end of the spectrum, this is a one-of-a-kind Oracle book and a must-have for your collection. Keep in mind something Jonathan mentions in appendix A: “New features that improve 99% of all known queries may cripple your database because you fall into the remaining 1% of special cases”. If these cases are your concern, then this book will surely prove to be one-of-a-kind for you!

View this review on Amazon.com

Apress – Expert Oracle Database Architecture by Tom Kyte

I have a confession to make. I haven’t read an Oracle book cover-to-cover in almost three years. Sure I skim through the latest titles for what I need and of course check out documentation of the latest releases. That’s what good docs provide: quick reference when you need to check syntax, or details of a particular parameter or feature. But have you ever read some documentation, sifted through a paragraph, page or two, and said to yourself, that’s great, but what about this situation I have right now? Unfortunately documentation doesn’t always speak to your real everyday needs. It is excellent for reference, but doesn’t have a lot of real-world test cases and practical usage examples. That’s where Tom Kyte’s new book comes in, and boy is it a killer.

I’ve read Tom’s books before, and always enjoyed them. But his new APress title really stands out as an achievement. Page after page and chapter after chapter he uses straightforward examples pasted right from the SQL*Plus prompt to illustrate, demonstrate, and illuminate concepts that he is explaining. It is this practical hands on, relentless approach that makes this book 700 pages of goodness.

Already an expert at Oracle? You’ll become more of one after reading this book. With reviewers like Jonathan Lewis, I have to admit I expected this book to be good from the outset. But each chapter delves into a bit more depth around subjects that are central to Oracle programming and administration.

No SCREEN SHOTS!

One of the things I loved about this book most of all is its complete lack of screenshots! But how does one illustrate a concept then, you might ask? These days, with graphical interfaces becoming more and more popular even among technical folks, I run into the question of the command line over and over again. How can you be doing sophisticated database administration of the latest servers running Oracle with the command line? Or another question I often get is, can you really do everything with the command line? The answer to both is a resounding yes; in fact you can do much more with the command line. Luckily for us, Tom is of this school too, and page after page of his book is full of real examples and commands that you can try for yourself, with specific instructions on setting up the environment, using statistics gathering packages, and so on. In an era of computing where GUIs seem to reign like magazines over the best literature of the day, it is refreshing to see some of the best and most technical minds around Oracle still advocate the best tool, the command line, as the interface of choice. In fact it is the command line examples, and happily the complete lack of screenshots, that indeed makes this book a jewel of a find.

Audience

As a DBA you might wonder why I’m talking so highly of a book more focused towards developers. There are a couple of reasons. First, this book is about the Oracle architecture as it pertains to developers. In order for developers to best take advantage of the enterprise investment in Oracle, they need to thoroughly understand the architecture, how specific features operate, which features are appropriate, and how to optimize their code for best interaction with them. Of course a DBA who is trying to keep a database operating in tip-top shape needs to be aware of when developers are not best using Oracle, to identify and bring attention to bottlenecks and problem areas in the application. Second, it is often a DBA’s job to tune an existing database, and the very largest benefits come from tuning application SQL. For instance if a developer has chosen to use a bitmap index on an INSERT/UPDATE intensive table, they’re in for serious problems. Or if a developer forgot to index a foreign key column. This book directly spearheads those types of questions, and when necessary does mention a thing or two of direct importance to DBAs as well.

Highlights

Chapter 2 has an excellent example of creating an Oracle database. You simply write one line to your init.ora – “db_name=sean” for example – and then from the SQL> prompt issue “startup nomount” and then “create database”. Looking at the processes Oracle starts, and the files that are created, can do wonders for your understanding of database, instance, and Oracle in general.
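
That bare-bones exercise looks roughly like this (the SID and paths are made up for illustration; Oracle fills in defaults for everything else):

$ echo "db_name=sean" > $ORACLE_HOME/dbs/initsean.ora
$ export ORACLE_SID=sean
$ sqlplus / as sysdba
SQL> startup nomount
SQL> create database;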

Chapter 3 covers files, files, and more files. Spfile replaces a text init.ora allowing parameters to be modified while an instance is running *AND* stored persistently. He covers redolog files, flashback logs, and change tracking files, as well as import/export dump files, and lastly datapump files.

Chapter 4 covers memory, and specifically some of the new auto-magic options, how they work, and what to watch out for.

Chapter 5 covers processes.

Chapter 6, 7, and 8 cover locking/latching, multiversioning, and transactions respectively. I mention them all here together because to me these chapters are the real meat of the book. And that’s coming from a vegetarian! Seriously, these topics are what I consider to be the most crucial to understanding Oracle, and modern databases in general, and the least understood. They are the darkest corners, but Tom illuminates them for us. You’ll learn about optimistic versus pessimistic locking, and page level, row level, and block level locking in various modern databases such as SQL Server, Informix, Sybase, DB2 and Oracle. Note Oracle is by far in the lead in this department, never locking more than it needs to, which yields the best concurrency with few situations where users block each other. Readers never block, for instance, because of the way Oracle implements all of this. He mentions latch spinning, which Oracle does to avoid a context switch that is more expensive, and how to detect and reduce this type of contention. You’ll learn about dirty reads, phantom reads, and non-repeatable reads, and about Oracle’s read-committed versus serializable modes. What’s more you’ll learn about the implications of these various models on your applications, and what type of assumptions you may have to unlearn if you’re coming to Oracle from developing on another database. If I were to make any criticism at all, I might mention that in this area Tom becomes ever so slightly preachy about Oracle’s superb implementation of minimal locking and non-blocking reads. This is in large part due, I’m sure, to running into so many folks who are used to developing on databases which do indeed dumb you down *BECAUSE* of their implementation, encouraging bad habits with respect to transactions and auto-commit, for instance. One thing is for sure: you will learn a heck of a lot from these three chapters. I know I did.

Chapter 9, Redo & Undo, describes what each is, how to avoid “checkpoint not complete” and why you want to, how to *MEASURE* undo so as to reduce it, how to avoid log file waits (are you on RAID 5? are your redologs on a buffered filesystem?), and what block cleanouts are.

Chapter 10 covers tables. After reading it I’d say the most important types are normal (HEAP), Index Organized, Temporary, and External Tables. Use ASSM where possible as it will save you in many ways, use DBMS_METADATA to reverse engineer objects you’ve created to get all the options, don’t use TEMP tables just to avoid inline views or complex joins as your performance will probably suffer, and learn how to handle LONG/LOB data in tables.

Chapter 11 covers indexes, topics ranging from height, compression count, DESC sorted, colocated data, bitmap indexes and why you don’t want them in OLTP databases, function based indexes and how they’re most useful for user defined functions, why indexing foreign keys is important, and choosing the leading edge of an index. Plus when to rebuild or coalesce and why.

Chapter 12 covers datatypes: why never to use CHAR, using the NLS features, the CAST function, the number datatypes and precision versus performance, raw_to_hex, date arithmetic, handling LOB data and why not to use LONG, BFILEs and the new UROWID.

Chapter 13 discusses partitioning. What I like is he starts the chapter with the caveat that partitioning is not the FAST=TRUE option. That says it all. For OLTP databases you will achieve higher availability, and ease of administration of large objects, as well as possibly reduced contention on larger objects, but it is NOT LIKELY that you will receive query performance improvements because of the nature of OLTP. With a data warehouse, you can use partition elimination on queries that do range or full table scans, which can speed up queries dramatically. He discusses range, list, hash, and composite partitioning, local indexing (prefixed & non-prefixed) and global indexing, why data warehouses tend to use local and OLTP databases tend to use global indexes, and even how you can rebuild your global indexes as you’re doing partition maintenance, avoiding a costly rebuild of THE ENTIRE INDEX and the associated downtime. He also includes a great auditing example.

Chapter 14 covers parallel execution, such as parallel DML, DDL, and so on. Here is where a book like Tom’s is invaluable, as he comes straight out with his opinions on a weighty topic. He says these features are most relevant to DBAs doing one-off maintenance and data loading operations. That is because even in data warehouses, today’s environments often have many, many users. The parallel features are designed to allow single session jobs to utilize the entire system’s resources. He explains that Oracle’s real sweet spot in this realm is parallel DDL, such as CREATE INDEX, CREATE TABLE AS SELECT, ALTER INDEX REBUILD, ALTER TABLE MOVE, and so on.

Chapter 15, the final chapter, covers loading and unloading data. A significant portion of the chapter covers SQL*Loader for completeness, but he goes on to celebrate the wonders of external tables for loading data into Oracle. In particular there is an option in SQL*Loader to generate the CREATE statement for an external table that does the SAME load! This is great stuff. External tables provide advantages over SQL*Loader in almost every way, except perhaps loading over a network, concurrent user access, and handling LOB data. External tables can use complex WHERE clauses, merge data, do fast code lookups, insert into multiple tables, and finally provide a simpler learning curve.

Conclusions

Yum. If you love Oracle, you’ll want to read this book. If you need to know more about Oracle, say, for your job, that’s another reason you might read this book. Oracle is fascinating technology, and Tom’s passion for understanding every last bit of it makes this book both a necessary read, and a very gratifying one.

View this review on Amazon.com

Migrating MySQL to Oracle

This article is from 2006.  MySQL has come a long way since then. MySQL 5.5 is very robust and feature rich, matching Oracle in many different areas including datatypes, stored procedures and functions, high availability solutions, ACID compliance and MVCC, hotbackups, cold backups and dumps, full text and other index options, materialized views and much more.  Here’s a high level mysql feature guide.

What really separates the two technologies is culture.  MySQL, rooted in the open source tradition, is much more do-it-yourself, leaning towards roll-your-own solutions in many cases. Meanwhile Oracle provides named and proven paths to solve specific problems.

You might also check out: Migrating MySQL to Oracle Guide which is a larger multi-part series & work in progress.

For some basics What is a Database Migration?

Lastly these may be helpful – Migration to MySQL – Limitations, Anomalies, Considerations & Strengths and also Oracle to MySQL Migration Considerations

INTRODUCTION

MySQL is a powerful database platform for many simple web-based applications. Some of its power and speed comes from its simplicity. MySQL, with its default MyISAM tables, is actually a database without proper transactions. What this means in terms of speed is dramatic. However it also means you cannot roll back an update which encounters problems halfway through, and other sessions in the database will immediately see changes. There are many dramatic ramifications of this, as we’ll discuss later. Lastly there are limitations on dataset size. Oracle can obviously handle tables of a terabyte and larger. However since MySQL implements a table as one file, filesize limits as well as internal data organization and indexing can become major limitations as tables grow to the millions of rows and higher.

When you begin to hit these limitations, whether in your application complexity, or the size of your data, you may consider moving your data and application to the Oracle Database. There you will have a rich set of features both on the programming side with stored procedures, views, materialized views, triggers, and so on. You will also have support for tables and indexes of virtually limitless size, full transaction support, and even sophisticated High Availability features such as Advanced Replication, Data Guard, and even clustering with Oracle’s Real Application Clusters technology.

With all these enticing features and robustness, you’re eager to jump into Oracle. But don’t move so fast. There is a tremendous amount of planning involved with both moving your data, and porting and testing your application on the new platform. Add to that Oracle licensing, and you’ll need some time to get there.

MySQL vs Oracle – feature comparisons

MySQL is a database fit for some types of applications. These tend to be smaller applications, or applications with less sophisticated needs than those running Oracle on the backend.

It makes sense at this point to go through a feature comparison, and see what features MySQL shares with Oracle. Here’s a more in depth feature comparison of MySQL and Oracle.

MySQL shares with Oracle good support for database access including ODBC and JDBC drivers, as well as access libraries for Perl, Python and PHP. MySQL and Oracle both support binary large objects, character, numeric, and date datatypes. They both have primary and unique keys, and as of 4.x with InnoDB, MySQL has foreign keys and transactions, including the READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE isolation levels. Both databases have sophisticated language and character set support. MySQL can do table locking, and has recently improved to include row-level locking. What’s more, if you don’t need transactions, MyISAM tables are extremely fast. MySQL also includes a good data dump utility which you’ll see in action below when we migrate to Oracle. And lastly both databases of course include good b-tree indexes, which no database could be without.

There are, however, quite a number of features we find as standard in Oracle which remain missing in MySQL. Until recently that included row-level locking, true transactions, and subqueries, although as of 4.x those are present. However, those have been present as core technologies in Oracle for years, with very stable and solid implementations proven on TPC benchmarks. Views are still absent in MySQL, though they may be around the corner now that subqueries are available.

Of course a lot of the high end Oracle features remain completely absent from MySQL, and may never be added. Features such as parallel query, and partitioned tables, which include a whole host of special features such as the ability to take one partition offline without impacting queries on the rest of the table. The indexing on partitioned tables is sophisticated too, allowing partition elimination, and range scans on indexes of specific partitions. There are other large database features such as special functions for star queries. Oracle has terabyte databases in production, so this fact speaks for itself.

MySQL still has a somewhat limited set of index types. For instance Oracle has reverse key, bitmap, and function based indexes, as well as index organized tables. These are all very powerful features for developers who are trying to squeeze that last bit of performance out of complex SQL queries against large tables. Although MySQL does provide some index statistic collection, Oracle provides the full set of statistics, including histograms, and makes great use of them inside the Cost Based Optimizer. These statistics allow Oracle to better determine the best method of getting the data for your query and putting it together for you with the least use of resources in terms of memory cache and disk access. This is really key for a database. When running large multi-user applications, all of which are vying for data in different tables, you want to load just the data you need, and nothing more. Avoiding full table scans by using the proper index, and using various indexes in a join operation to pull fewer rows from the underlying tables, means less disk I/O, which other processes are fighting for, and less use of cache, leaving more for other processes.

MySQL still does not have privilege groups, called ROLES in Oracle.

Oracle can also provide column level privilege control, called virtual private database and although we don’t see it used a lot in Oracle deployments, MySQL lacks this feature as well.

MySQL does not have hot backups, which have been an integral part of Oracle for years. (There are hot backups now – 2012 – in MySQL; here’s a howto guide on rebuilding replication using hotbackups.) In addition, Oracle’s RMAN has become a sophisticated piece of software, and grown to be very stable, providing block level backups so only the data that changed need be included in subsequent backups. This makes nightly backups smaller overall. It also aids tremendously during recovery, providing a lot of automation and assistance during those times when you need it most. MySQL’s method is to dump data, and further, if you want to guarantee a point in time dump of your data, you have to lock all the tables in the database, potentially slowing down your application tremendously. Lastly MySQL does not have automatic or point in time recovery, a real limitation on heavy use production databases.

MySQL also has some limitations on row sizes. MyISAM tables, for instance, can have a maximum of 64k of data per row, and InnoDB tables 8k per row. This does not include BLOB and TEXT types.

MySQL does not include database links such as those found in Oracle, which allow you to query a table from an external database inside a query in a local database. There is the federated storage engine, but reports are that it’s rather buggy. DB links can be useful for moving data between databases, and are key to implementing advanced replication in Oracle.

Lastly, MySQL has had some database size limitations. In 3.22 it could only access 4GB of data; however, as of 3.23 the only limitation has been with your operating system and the size of files it can handle. On Linux with LFS or ReiserFS, this limitation is effectively eliminated. However, Oracle still has incredibly sophisticated storage capabilities, allowing virtually unlimited datafiles to be associated with a tablespace, and thus virtually limitless sized tables.

Updated note: In 5.5 and newer versions of MySQL there are no database size limitations. Also with InnoDB you can use a global tablespace or one tablespace per table depending on your configuration. With most databases sitting on RAID or SAN these days, you’re getting pretty much the same deal with both MySQL & Oracle storage-wise.

Migration preparation

So you’ve seen what you can do with Oracle, and management has invested in licensing, and you’re now ready to get things setup in your development environment.

Now is the time to really get up to speed with Oracle. This goes for both database administration knowledge, as well as developer and programmer knowledge. The latter requires that you know a lot about Oracle’s features, in particular those which are relevant to your application. The former requires understanding DBA roles, managing database files, memory structures, backups, and so on and so forth.

Thomas Kyte’s books are really excellent, and highly recommended. Check out “Expert One-on-One Oracle” from Wrox Press, and his newer title, “Effective Oracle by Design” from Oracle Press. He also has a website, http://asktom.oracle.com.

Also check out Kevin Loney and Marlene Theriault’s DBA Handbook from Oracle Press. Of course don’t forget to read the Oracle docs, which are quite good. Start with the Concepts manual for the version of Oracle you plan to go with.

In planning a migration the first thing you should do is take a look at the number and types of tables in your database. Do that in MySQL as follows:

mysql> show table status;
+------+--------+---------+------------+
| Name | Engine | Version | Row_format |
+------+--------+---------+------------+
| t    | InnoDB |       9 | Fixed      |
| u    | MyISAM |       9 | Fixed      |
+------+--------+---------+------------+
2 rows in set (0.05 sec)

This output is truncated, but serves as a useful example. You will see the tables, types, and a lot of other information about the tables you will be moving.

You’ll next want to review the datatypes of your various tables. CHAR in MySQL maps to CHAR in Oracle, VARCHAR to VARCHAR2, and the various Large Object types to RAW or BLOB in Oracle. DATE, DATETIME, and TIME map to Oracle’s DATE datatype, while TIMESTAMP and YEAR map to NUMBER. Lastly all of the various INT datatypes in MySQL map to NUMBER in Oracle and FLOAT, DOUBLE, REAL, and DECIMAL all map to FLOAT.

To get information about the columns of a table you can use the ‘describe’ SQL command, much like Oracle’s own describe:

mysql> describe t;
+-------+----------+------+-----+---------+----------------+
| Field | Type     | Null | Key | Default | Extra          |
+-------+----------+------+-----+---------+----------------+
| id    | int(11)  |      | PRI | NULL    | auto_increment |
| s     | char(60) | YES  |     | NULL    |                |
+-------+----------+------+-----+---------+----------------+
2 rows in set (0.01 sec)

Another way to get useful descriptions of tables is to use the mysqldump command. Here ‘test’ is the name of the database, and ‘t’ is the name of the table. Leave the table name off and the output will include all the tables in that database.

$ mysqldump test t
--
-- Table structure for table `t`
--

DROP TABLE IF EXISTS `t`;
CREATE TABLE `t` (
  `id` int(11) NOT NULL auto_increment,
  `s` char(60) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

There’s actually quite a bit more output, and depending on the version of MySQL you may see additional comment lines etc. You’ll probably want to redirect the output to a file. Do so as follows:

$ mysqldump test t > t_table.mysql

$ mysqldump test > test_db.mysql

You will also want to get a sense of which columns are indexed to be sure that they get indexed in your Oracle database. Here is an example of how to list the indexes on a table:

mysql> show index from t;
+-------+------------+----------+--------------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
+-------+------------+----------+--------------+-------------+
| t     |          0 | PRIMARY  |            1 | id          |
+-------+------------+----------+--------------+-------------+
1 row in set (0.04 sec)

An enumerated datatype is one where you define a static set of values up front. Oracle does not currently have such a datatype. The closest equivalent is a VARCHAR2 of sufficient size to hold all the various values, and Migration Workbench will do just that. If you want Oracle to enforce your set of values, add a check constraint on that column, as in the sketch below.
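
For example, a MySQL column defined as ENUM('S','M','L','XL') might become the following in Oracle (the table and column names are invented for illustration):

CREATE TABLE shirts (
  id         NUMBER(10,0) PRIMARY KEY,
  shirt_size VARCHAR2(2)
             CHECK (shirt_size IN ('S','M','L','XL'))
);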

Lastly, if you’re experiencing serious performance problems in MySQL, take a look at the slow query log. MySQL can be configured to dump queries which do not execute within a certain number of seconds to a log file for your review. You can then use the EXPLAIN facility, a much simplified version of the facility found in Oracle, to tune queries for a better execution path, possibly requiring a new index on a certain column or an index rebuild. In many instances restructuring a query can be of substantial benefit.
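
The option names vary a little between MySQL versions, but in recent versions the my.cnf settings look something like this (the path and threshold shown are placeholders):

[mysqld]
# log any statement that takes longer than 2 seconds
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 2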

What’s more many of these skills of tuning and optimizing queries will translate directly to Oracle. Queries are the lifeblood of your application. Bad ones can kill overall database performance by filling the cache with useless blocks of data and pushing out previously cached, and important data. What’s more inefficient queries cause more disk I/O which impacts the overall performance of your database. These issues hold true for all databases, and getting proficient with them now will bring you up to speed faster with Oracle as you migrate.

Moving your data between MySQL and Oracle

At this point we’re still presuming that you will be moving your data by hand. This is not because we are gluttons for punishment. It is more to illustrate important points. Doing things by hand takes you over each and every detail of the process so you understand it. You’ll need to when things go wrong, as they inevitably will. So we’re discussing moving the schema, and then the data, by hand for all tables; however you may end up following the instructions below for using the Oracle Migration Workbench, and then only doing one or two special tables by hand. Or you may decide to use Migration Workbench to build scripts for you as a starting point, and then aggressively edit those for your particular needs. All these options are valid.

So at this point you need to dump data. If you want to load data with Oracle’s SQL*Loader, an easy format to start with is a CSV, or Comma Separated Values, file.

To dump one table named ‘t’ from a database named ‘test’, use this bit of code. Note that we’ve broken things up onto multiple lines to more easily illustrate what’s happening with all those messy sed commands. You’re welcome to modify them for your needs, but this works as-is. Note that ^V requires you to type ctrl-V and ^I requires you to type ctrl-I. Read your editor manual for details on inserting control characters into a file.

#!/bin/bash
# 1. get all rows from table t of database test
# 2. add double quote at beginning
# 3. replace tabs with ","
# 4. add double quote at end
# 5. remove header line
echo 'select * from t;' | mysql test \
 | sed -e 's/^/"/' \
 | sed -e 's/^V^I/","/' \
 | sed -e 's/$/"/' \
 | tail -n +2

Now is your chance to really put all those Oracle skills to work. You want to have CREATE TABLE statements to build each table, and scripts are an excellent way to get you going. They also self-document the process.

Here is an example of how to precreate the above very simple table in Oracle. Edit a file called t.sql and include these lines:

create table T (
  id   NUMBER(10,0) primary key,
  s    CHAR(60));

Save the file, and then fire up sqlplus and do:

SQL> @t.sql

Table created.

SQL> desc t;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------
 ID                                        NOT NULL NUMBER(10)
 S                                                  CHAR(60)

Now use SQL*Loader to load the CSV data you created earlier. To do that you’ll need to create a control file. This tells SQL*Loader exactly what to do. Edit a file t.ctl and put this in the file:

LOAD DATA
REPLACE
INTO TABLE t
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(id INTEGER EXTERNAL, s CHAR)

Once you’re done, save the file, and execute the following:

$ sqlldr your_username control=t.ctl data=t.dat log=t.log bad=t.bad

This should load your data into the table t that you created earlier. Check the log and bad files for details, and errors.

As with the Oracle Migration documentation, and any good documentation really, we’ll emphasize and reemphasize the need and importance of testing. You cannot assume that your script does things right, nor can you assume that the script Oracle’s Migration Workbench created will do things right. There are an infinite number of anomalies for any dataset, and testing your application is the only way you can verify you are in good shape.

What’s more you also need to verify that your data is correct. Suppose you have a banking application, and you are moving customer data from MySQL to Oracle. Suppose further you have records of monthly deposits and withdrawals against that account. You move the data from MySQL into Oracle, and the web or client based frontend is up and running again, after extensive porting and testing. Does this guarantee that all the data is correct? Not at all. It means the right fields are in the right places, and probably the datatypes are correct. Imagine that that customer had a very high balance, and when moving to Oracle the field size was too small, and perhaps when the data was loaded, the insert failed, or the value was set to a default like 0. Obviously you don’t want to find out when that customer comes calling. You want to look through the log files when the data is loaded, and then run some verification software against the db to compare values in the old db and the new db, or to calculate checksums, such as running through deposits and withdrawals to make sure the current balances check out. This is really the most important step in the process and can’t be overstated.

Migration Workbench is Oracle’s recommended solution

Oracle’s Migration Workbench is a Java-based GUI tool which runs on various Windows versions, and comes complete with wizards to help you perform your migration. The process is divided into three steps.

First you capture your source database. This is done with a series of plugins which support various databases including, of course, MySQL. One plugin is available to handle 3.22 and 3.23 of MySQL and another one handles 4.x versions of MySQL. Capturing the source database is the same as the process we describe above manually of looking at your tables in MySQL, and the columns and indexes you are using. This is practical and feasible for a small number of tables; however, with hundreds or even thousands of tables, Oracle Migration Workbench becomes more and more of a necessity.

Second, you migrate the source model to Oracle. This is the process where the Migration Workbench precreates all tables found in the source database, including columns of equivalent datatypes. We describe mappings of MySQL to Oracle datatypes above. Note that Oracle does not have ENUM or an enumerated datatype per se, but it can still migrate this data, and does so to VARCHAR2 in Oracle.

The third and last step is to review the scripts that the Migration Workbench has created, make any changes or modifications, and then run them to move your data from your source MySQL database into your new Oracle database.

One thing that is important to remember about a migration is that it will take a lot more time, and end up being a lot more complicated, than you expected. I liked this about the documentation: they make it clear from the beginning that planning will be a tremendous help to you in estimating time, and delivering successfully within budget. The documentation is also very thorough in its coverage of MySQL datatypes, and how they translate to Oracle datatypes, as described earlier in this article. And of course there is a strong emphasis on testing. The Migration Workbench provides customizable scripts which both document the actions to be performed and provide a way for you to get your fingers into the works.

Keep in mind while using the Migration Workbench that it is NOT all or nothing. You can use the Migration Workbench, and then edit the scripts to leave certain tables alone, or you can migrate them all, then drop the few you want to do by hand using the methods we describe above. Ultimately a mix of the two will probably serve your needs best, as there is always some amount of manual intervention you want to perform for certain tables.

A migration between two databases is not a trivial undertaking. You have a lot of data, and an application that relies on it all being in the right format, with the right relationships. Moving to a new database, with a larger feature set, slightly different syntax, and different ways of doing things, takes time and attention, but in the end you’ll be up and running on a sophisticated, scalable, world class database platform.

Oracle has a great set of resources on OTN devoted to migrating to Oracle. In particular there is the Migration Technology Center.

Oracle’s Migration Workbench documentation and download page.

On the other side, here’s the MySQL 5.5 Reference Guide.

Made it this far? Grab our newsletter.