Tag Archives: dba

If you use MySQL in the Amazon cloud, you need to ask yourself this question

Join 25,000 others and follow Sean Hull on twitter @hullsean.

Are you serious about backups?

If you’re just using Amazon EBS snapshots, that may not be sufficient. There’s a good chance it won’t protect you against your next data loss.

That’s why I like to have a few different types of backups

Also: 5 more things deadly to scalability

Protect against operator error

mysqldump is a tool every DBA is familiar with. Same as a hotbackup or snapshot you say? Just more labor? Not true.

A dump allows you to restore one table, or one schema. That’s why they’re also known as logical backups. What’s more you can edit the file, remove indexes, change object names, or datatypes. All these can be essential in the screwy and unpredictable event of a real world outage.

Expect the unexpected!

Read: Why devops talent is in short supply

Test those backups regularly

If you haven’t actually tried to restore, you really don’t know if you have everything. Did you backup stored procedures & database code? How about grants? Database events? How about cronjobs? What about the my.cnf file? And your replication configuration?

Yes there are a lot of little pieces, and testing your backups by rebuilding everything is an attempt to poke holes in your plan, and hit issues before d-day!

Related: MySQL interview guide for managers and candidates alike

Replication isn’t a backup

Replication is getting better and better in MySQL. It used to fail regularly. MyiSAM was very unpredictable. But even in the comfortable realm of Innodb, there can still be data drift. If you’re on MySQL 5.0 or 5.1, you should consider performing regular checksums. These test the integrity of data and compare what’s actually in master & slave. Bulletproofing MySQL replication with checksums.

Read: Why high availability is so very hard to deliver

Have you considered security around your backup files?

While you’re thinking about backups, make sure the files themselves are secure. Remember they contain your crown jewels. Hopefully individual data that’s sensitive is encrypted, but still you should secure their final resting place as well.

If you’re using S3, consider encrypting the file before shipping it up to the bucket.

Read this: Why a four letter word divides dev and ops

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why startups need techops

devops divide

I was at a talk recently on node.js. Even if I’m not working with a technology directly, it’s exciting to see what’s out there, and node.js is bringing some hyper fast performance to a certain category of web applications.

During the keynote, the speaker mentioned a service to deploy applications on. I can’t name names unfortunately but it was a cloud solution on top of which you could deploy your application. Go this route
and you can do without an operations team. Avoid overhead of hiring ops, he claimed. And hey, then you can hire more developers!

To be fair I’ve heard much of the same thing at DBA or linux conferences. I can’t count the number of stories that start with “what some idiot developer did that took down our production systems…”.

Yes, it seems dev & ops are still just a tad bit adverserial.

Join 13,000 others and follow Sean Hull on twitter @hullsean.

1. My little known origins as a developer

Many colleagues and clients I’ve met in the New York City startup industry know me primarily as an operations & scalability guy. I tune databases, infrastructure and components to make things lightening fast.

I spent my earliest years at university on the computer lab operations staff. We watched and managed, made sure level zero backups were taken care of, and moved the tapes. Directly after college, I started at a software firm. I did C++ GUI development on the Mac, using the toolbox libraries with Metrowerks Codewarrior. I built split windows, and scroll bars, and displayed rows of data with nice resizable columns. All this wasn’t built into the class library, so for a lot of it we needed to roll our own solution.

We always had a long list of features coming from the business units. I also fielded many support calls, often from the windows platform as the code there hadn’t been managed and built as carefully. But that too was instructive as you could feel the pain of customers day-to-day challenges. It also illustrated the tradeoffs between new code and features, and existing bug fixes and support.

Also: Why generalists are better at scaling the web

2. A trip through the dot-com bubble as Oracle DBA

Through a circuitous path, I moved to New York in the mid-nineties and joined a startup. I had the opportunity to wear a lot of hats there, and apply my computer lab and Linux operating systems experience to the challenge of managed Oracle. I got a lot more involved with operations quick.

As the dot-com bubble grew, I saw a hot and growing demand for Oracle DBAs as most startups used Oracle, but the talent was in short supply. In one startup 80 million dollars was on the line as performance hobbled the website, and investors feared the worst.

Read: Why the Twitter IPO made a shocking admission about scalability

3. Different priorities & mandates

I remember working at Starmedia a media darling at the time. I was analyzing the database & server systems, and finding some code & jobs running during peak daytime hours. Management claimed that could not be the case. Yet for the next days and weeks I saw the same jobs running. I held strong and spoke truth to power as they say. That’s not always easy when you have a lot of investors, screaming CTOs and 100+ hour weeks. But eventually the source of the job was located, and disabled. And the website returned to it’s speedy self.

These experiences though do underline in my mind the different priorities and focus that developers and operations staff have.

Techops, system administrators & DBAs are typically averse to change. They fight it tooth and nail. That isn’t because they like to be curmudgeons though. They are typically very concerned about the business, but from a dramatically different perspective of stability, and reliability, even at 2am in the morning. They are concerned about the longevity of data, consistency, and durability of it.

Developers on the other hand have a different mandate. They are responsible for new business features, solutions to business requirements. Rapid prototyping & reactive or agile is embraced because it means you can deliver quicker to the business.

Crucially, both of these folks care very much for the business. Just with very different priorities.

Check this: Why AirBNB didn’t have to fail

4. Can developers do operations for you?

In a lot of small startups, the initial phase is obviously on building a product. That’s the build phase, and not surprisingly you hire a lot of developers. As you should. But as you grow you may find the operational tasks that are defaulting to one or more developers are taking more and more of their time. As your customer base grows and you’ve seen your first few spikes, it’s time to start thinking about hiring for a real ops role.

In summary, yes they can, but perhaps not well.

Related: How to hire a developer that you can work with

5. Volume discount, made to order or instant coffee

You may choose to go with instant coffee, by bringing someone in-house. You may find the right talent is hard to find. I wrote about this: Why techops and DBAs are in short supply.

Alternatively you may prefer a volume discount from one of the larger remote DBA or managed support solutions such as Oracle’s, Pythian or Percona. These guys all provide great service, but keep in mind how big of a fish you are. You’ll likely work through a ticketing system, and in some cases different engineers will look at your systems at different times. You will likely need either a very hands-on technical CTO or other in-house person to take ownership, and manage things closely.

The third option is a made-to-order coffee. Yes you pay more for Toby’s, Blue Bottle, or Ninth Street Espresso but you get what you pay for as they say. A boutique shop or independent consultant will provide a lot more hand holding, help your internal staff get up to speed, and communicate intimately about the process. If you’re a more non-technical CTO, or you’re very busy running the business, this solution may make a lot of sense for you.

Also: Why cloud detractors need a history lesson

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

A CTO Must Never Do This…

A couple years back I was contacted to look at a very strange problem.

The firm ran flash sales. An email goes out at noon, the website traffic explodes for a couple of hours, then settles back down to a trickle.

Of course you might imagine where this is going. During that peak, the MySQL database was brought to its knees. I was asked to do analysis during this peak load, and identify and fix problems. Make it go faster, please!

First day on the job I’m working with a team of outsourced DBAs. I was also working with a sort of swat team chatting on SKYPE, while monitoring the systems closely.

Then up popped one comment from a gentlemen I hadn’t worked with. He insisted there was contention for a little known MySQL resource called the AUTO_INC lock. Since I wanted to know more, I asked who the guy was and to my surprise he turned out to be the CTO.

[quote]The CTO was tuning and troubleshooting the database![/quote]

Wow, that’s a first. I thought I’d seen it all. A CTO is normally overseeing technology & the team rather than crawling around in the trenches on the front line.

This all raised some important points

1. The app was having major growing pains
2. Current architecture was not scaling
3. Amazon elasticity was not helping at the database layer
4. People & process were also failing, hence the CTOs hands on approach

It was shocking to see a problem deteriorate to this point, but when you consider the context its understandable. A company like this is struggling with hypergrowth to such a degree, that each day seems like a hurricane storm. With emergency meetings, followed by hardware & application emergencies, trouble seems constant. It can be very difficult to step back and see the larger picture.

The takeaway from this experience…

o Amazon EC2 can’t do it all – consider physical servers for disk intensive apps
o MySQL still has some real scalability limitations
o use technology for its intended purpose – MySQL isn’t great for queueing
o A CTO tuning the database means problems have deteriorated too far

Read all the way to the end? Grab our newsletter – scalable startups.

Extract Transform & Load – What is it and why is it important?

So-called ETL relates to moving data from external sources into and out of relational databases or data warehouses.

Extract

Source systems may store data in an infinite variety of formats.  Extracting involves getting that data into common files for moving to the destination system.  CSV file also known as comma separated values is named because each of the records is stored as one line in the file, and fields are separated by commas, and often surrounded by quotes as well.  In MySQL INTO OUTFILE syntax can perform this function.  If you have a lot of tables to work with, you can script the process using the data dictionary as a lookup for table names, and create a .mysql script to then run with the mysql shell.  In Oracle you would use the spool command in SQL*Plus the command line shell.  Spool sends subsequent output from the screen also to a file.

Transform

This step involves modifying the extracted data in preparation for moving it into the target database server.  It may involve sweeping out blank records, or rearranging columns, or breaking files into smaller subsets of data.  You might also map values differently for instance if one column in the source database was gender with values M/F you might transform those to the strings “Male” and “Female” if that is more useful for your target database server.  Or you might transform those to numerical values, for instance Male & Female might be 0/1 in your target database.

Although I myriad of high level GUI tools exist to perform these functions, the Unix operating system includes a plethora of very powerful tools that every experience System Administrator is familiar with.  Those include grep & sed which operate on regular expressions and can perform data transformation at lightening speed.  Then there is sort which can sort data and send the results to stdout or the file of your choosing.  Other tools include wc – word count, cut which can remove columns and so forth.
Load

This final step involves moving the data into the database server, and it’s final target tables.  For instance in MySQL this might be done with the LOAD DATA INFILE syntax, while in Oracle you might use SQL*Loader, which is a very fast flat file dataloader.

Quora discussion by Sean Hull – What is ETL?

Oracle DBA Interview Questions

Oracle Database Administrator or often called DBAs are an indispensable part of your operations team. They manage the systems that house all your business data, your customers, products, transactions and all that analytical data on what customers are actually doing. If you’ve ever been on the hunt, you may wonder, why the shortage of DBAs? To that we’ll answer, have you ever heard of Dustin Moskovitz?

So you certainly want to entrust that to someone who knows what they’re talking about. Enter the Oracle DBA Interview, a process that some will see as a technical test, while others will see as a fit of personalities, behaviors, and work ethic.

From the technical side we thought we’d bring you a quick and dirty checklist of questions. This isn’t an exhaustive list by any means, but is a good place to start and will certainly provide you with a glimpse of their knowledge.

Also if you’re looking to hire a MySQL DBA here’s a guide, and also one for hiring and EC2 expert.

1. What is the difference between RMAN and a traditional hotbackup?

RMAN is faster, can do incremental (changes only) backups, and does not place tablespaces into hotbackup mode.

2. What are bind variables and why are they important?

With bind variables in SQL, Oracle can cache related queries a single time in the SQL cache (area). This avoids a hard parse each time, which saves on various locking and latching resources we use to check objects existence and so on. BONUS: For rarely run queries, especially BATCH queries, we explicitely DO NOT want to use bind variables, as they hide information from the Cost Based Opitmizer.

BONUS BONUS: For batch queries from 3rd party apps like peoplesoft, if we can’t remove bind variables, we can use bind variable peeking!

3. In PL/SQL, what is bulk binding, and when/how would it help performance?

Oracle’s SQL and PL/SQL engines are separate parts of the kernel which require context switching, like between unix processes. This is slow, and uses up resources. If we loop on an SQL statement, we are implicitely flipping between these two engines. We can minimize this by loading our data into an array, and using PL/SQL bulk binding operation to do it all in one go!

4. Why is SQL*Loader direct path so fast?

SQL*Loader with direct path option can load data ABOVE the high water mark of a table, and DIRECTLY into the datafiles, without going through the SQL engine at all. This avoids all the locking, latching, and so on, and doesn’t impact the db (except possibly the I/O subsystem) at all.

5. What are the tradeoffs between many vs few indexes? When would you want to have many, and when would it be better to have fewer?

Fewer indexes on a table mean faster inserts/updates. More indexes mean faster, more specific WHERE clauses possibly without index merges.

6. What is the difference between RAID 5 and RAID 10? Which is better for Oracle?

RAID 5 is striping with an extra disk for parity. If we lose a disk we can reconstruct from that parity disk.

RAID 10 is mirroring pairs of disks, and then striping across those sets.

RAID 5 was created when disks were expensive. Its purpose was to provide RAID on the cheap. If a disk fails, the IO subsystem will perform VERY slowly during the rebuild process. What’s more your liklihood of failure increases dramatically during this period, with all the added weight of the rebuild. Even when it is operating normally RAID 5 is slow for everything but reading. Given that and knowing databases (especially Oracle’s redo logs) continue to experience write activity all the time, we should avoid RAID5 in all but the rare database that is MOSTLY read activity. Don’t put redologs on RAID5.

RAID10 is just all around goodness. If you lose one disk in a set of 10 for example, you could lose any one of eight other disks and have no troubles. What’s more rebuilding does not impact performance at all since you’re simply making a mirror copy. Lastly RAID10 perform exceedingly well in all types of databases.

7. When using Oracle export/import what character set concerns might come up? How do you handle them?

Be sure to set NLS_LANG for example to “AMERCIAN_AMERICA.WE8ISO8859P1”. If your source database is US7ASCII, beware of 8-bit characters. Also be wary of multi-byte characters sets as those may require extra attention. Also watch export/import for messages about any “character set conversions” which may occur.

8. How do you use automatic PGA memory management with Oracle 9i and above?

Set the WORKAREA_SIZE_POLICY parameter to AUTO and set PGA_AGGREGATE_TARGET

9. Explain two easy SQL optimizations.

a. EXISTS can be better than IN under various conditions

b. UNION ALL is faster than UNION (not sorting)

10. Name three SQL operations that perform a SORT.

a. CREATE INDEX

b. DISTINCT

c. GROUP BY

d. ORDER BY

f. INTERSECT

g. MINUS

h. UNION

i. UNINDEXED TABLE JOIN

11. What is your favorite tool for day-to-day Oracle operation?

Hopefully we hear some use of command line as the answer!

12. What is the difference between Truncate and Delete? Why is one faster?

Can we ROLLBACK both? How would a full table scan behave after?

Truncate is nearly instantaenous, cannot be rolled back, and is fast because Oracle simply resets the HWM. When a full table scan is performed on a table, such as for a sort operation, Oracle reads to the HWM. So if you delete every single solitary row in 10 million row table so it is now empty, sorting on that table of 0 rows would still be extremely slow.

13. What is the difference between a materialized view (snapshot) fast refresh versus complete refresh? When is one better, and when the other?

Fast refresh maintains a change log table, which records change vectors, not unlike how the redo logs work. There is overhead to this, as with a table that has a LOT of indexes on it, and inserts and updates will be slower. However if you are performing refreshes often, like every few minutes, you want to do fast refresh so you don’t have to full-table-scan the source table. Complete refresh is good if you’re going to refresh once a day. Does a full table scan on the source table, and recreats the snapshot/mview. Also inserts/updates on the source table are NOT impacted on tables where complete refresh snapshots have been created.

14. What does the NO LOGGING option do? Why would we use it? Why would we be careful of using it?

It disables the logging of changes to the redologs. It does not disable ALL LOGGING, however as Oracle continues to use a base of changes, for recovery if you pull the plug on the box, for instance. However it will cause problems if you are using standby database. Use it to speed up operations, like an index rebuild, or partition maintenance operations.

15. Tell me about standby database? What are some of the configurations of it? What should we watch out for?

Standby databases allow us to create a copy of our production db, for disaster recovery. We merely switch mode on the target db, and bring it up as read/write. Can setup as master->slave or master->master. The latter allows the former prod db to become the standby, once the failure cause is remedied. Watch out for NO LOGGING!! Be sure we’re in archivelog mode.

Hey you! If you made it this far, definitely grab our newsletter.

MySQL DBA Interview Questions

One of the more popular articles on our site according to Google is the Oracle DBA Interview Questions article we did a few years ago. So with that in mind, we’ve put together a similar article for MySQL DBA Interviews.

1. Explain two ways that MySQL Replication can get out of sync. What are the solutions to these problems?

One way is if your code contains non-deterministic functions such as SYSDATE, USER and UUID. The second way is if you have mixed transactions between InnoDB and MyISAM tables, in some cases those can get replicated incorrectly.

2. How does one create a new user and give it privileges on an existing database? Why should “with grant option” be avoided?

mysql> grant all privileges on test.* to ‘newuser’@’localhost’ identified by ‘mypassword’;

3. Explain the differences, advantages and disadvantages to using the MERGE storage engine, versus using Partitioned tables to manage large datasets.

The MERGE storage behaves much like a view with UNION ALL between the tables. It is easy to add and remove tables and redefine the MERGE table. They are good for logging and huge datasets.

One of the primary differences between partitioning is that a row can exist in one and only one partition, which is not the case with a MERGE table. Also you can cluster data together in certain partitions reducing the amount of work the server may have to do to get at related data. Also partitioned data can be distributed across multiple harddrives.

4. How do you determine what storage engines are installed?

mysql> show global variables like 'have%';                                       
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| have_archive          | NO       |
| have_bdb              | YES      |
| have_blackhole_engine | NO       |
| have_compress         | YES      |
| have_crypt            | YES      |
| have_csv              | NO       |
| have_dynamic_loading  | YES      |
| have_example_engine   | NO       |
| have_federated_engine | NO       |
| have_geometry         | YES      |
| have_innodb           | YES      |
| have_isam             | NO       |
| have_merge_engine     | YES      |
| have_ndbcluster       | NO       |
| have_openssl          | DISABLED |
| have_query_cache      | YES      |
| have_raid             | NO       |
| have_rtree_keys       | YES      |
| have_symlink          | YES      |
+-----------------------+----------+
19 rows in set (0.00 sec)
mysql>

5. How do you get the query cache status? How do you tune it?

mysql> show global status like 'qcache%';                                        
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.16 sec)
mysql>

Tune it by looking at qcache_hits/(qcache_hits+com_selects)

6. What is DRBD?  Explain the advantages and disadvantages to MySQL Replication for High Availability.

7. What is circular replication?  How is it different from master-slave replication?

APress – Expert Oracle DB Arch by Tom Kyte

I have a confession to make. I haven’t read an Oracle book cover-to-cover in almost three years. Sure I skim through the latest titles for what I need and of course check out documentation of the latest releases. That’s what good docs provide, quick reference when you need to check syntax, or details of a particular parameter, or feature, but have you ever read some documentation, sift through a paragraph, page or two, and say to yourself, that’s great, but what about this situation I have right now? Unfortunately documentation doesn’t always

speak to your real everyday needs. It is excellent for reference, but doesn’t

have a lot of real-world test cases, and practical usage examples. That’s where Tom Kyte’s new book comes in, and boy is it a killer.

I’ve read Tom’s books before, and always enjoyed them. But his new APress title really stands out as an achievement. Page after page and chapter after chapter he uses straightforward examples pasted right from the SQL*Plus prompt to illustrate, demonstrate, and illuminate concepts that he is explaining. It is this practical hands on, relentless approach that makes this book 700 pages of goodness.

Already an expert at Oracle? You’ll become more of one after reading this book. With reviewers like Jonathan Lewis I expected this book to be good from the outset I have to admit. But each chapter delves into a bit more depth around subjects that are central to Oracle programming and administration.

No SCREEN SHOTS!

One of the things I loved about this book most of all is its complete lack of screenshots! But how does one illustrate a concept then, you might ask? These days with graphical interfaces becoming more and more popular even among technical folks, I run into the question of the command line over an over again. How can you be doing sophisticated database administration of the latest servers running Oracle with the command line? Or another question I often get is, can you really do everything with the command line? The answer to both is a resounding yes, in fact you can do much more with the command line. Luckily for us, Tom is of this school too, and page after page of his book are full of real examples and commands that you can try for yourself, with specific instructions on

setting up the environment, using statistics gathering packages, and so on. In an era of computing where GUIs seem to reign like magazines over the best literature of the day, it is refreshing to see some of the best and most technical minds around Oracle still advocate the best tool, command line as the interface

of choice. In fact it is the command line examples, and happily the complete lack of screenshots that indeed makes this book a jewel of a find.

Audience

As a DBA you might wonder why I’m talking so highly of a book more focused towards developers. There are a couple of reasons. First this book is about the Oracle architecture, as it pertains to developers. In order for developers to best take advantage of the enterprise investment in Oracle *** they need to thoroughly understand the architecture, how specific features operate, which features are appropriate, and how to optimize their code for best interaction with them. Of course a DBA who is trying to keep a database operating in tip top shape needs to be aware of when developers are not best using Oracle, to identify,

and bring attention to bottlenecks, and problem areas in the application. Second, it is often a DBAs job to tune an existing database, and the very largest benefits come from tuning application SQL. For instance if a developer has chosen to use a bitmap index on an INSERT/UPDATE intensive table, they’re in for serious problems. Or if a developer forgot to index a foreign key column. This book directly spearheads those types of questions, and when necessary does mention a thing or two of direct importance to DBAs as well.

Highlights

Chapter 2 has an excellent example of creating an Oracle database. You simply write one line to your init.ora “db_name=sean” for example, and then from the SQL> prompt issues “startup nomount” and then “create database”. Looking at the processes Oracle starts, and the files that are created can do wonders for your understanding of database, instance, and Oracle in general.

Chapter 3 covers files, files, and more files. Spfile replaces a text init.ora allowing parameters to be modified while an instance is running *AND* stored persistently. He covers redolog files, flashback logs, and change tracking file

s, as well as import/export dump files, and lastly datapump files.

Chapter 4 covers memory, and specifically some of the new auto-magic options, how they work, and what to watch out for.

Chapter 5 covers processes.

Chapter 6, 7, and 8 cover lock/latching, multiversioning, and transactions respectively. I mention them all here together because to me these chapters are the real meat of the book. And that’s coming from a vegetarian! Seriously these

topics are what I consider to be the most crucial to understanding Oracle, and modern databases in general, and the least understood. They are the darkest corners, but Tom illuminates them for us. You’ll learn about optimistic versus pessismistic locking, page level, row level, and block level locking in various modern databases such as SQLServer, Informix, Sybase, DB2 and Oracle. Note Oracle is by far in the lead in this department, never locking more than it needs to, which yields the best concurrency with few situations where users block each other. Readers never block, for instance, because of the way Oracle implements all of this. He mentions latch spinning, which Oracle does to avoid a context switch, that is more expensive, how to detect, and reduce this type of contention. You’ll learn about dirty reads, phantom reads, and non-repeatable reads, and about Oracle’s Read-committed versus Serializable modes. What’s more you’ll learn about the implications of these various models on your applications, and what type of assumptions you may have to unlearn if you’re coming from developing on another database to Oracle. If I were to make any criticism at all, I might mention that in this area Tom becomes ever so slightly preachy about Oracle’s superb implementation of minimal locking, and non-blocking reads. This is in large part due I’m sure to running into so many folks who are used to developing on databases which do indeed dumb you down *BECAUSE* of their implementation, encouraging bad habits with respect to transactions, and auto-commit for instance. One thing is for sure you will learn a heck of a lot from these three chapters, I know I did.

Chapter 9 Redo & Undo describes what each is, how to avoid checkpoint not complete and why you want to, how to *MEASURE* undo so as to reduce it, how to avoid log file waits (are you on RAID5, are your redologs on a buffered filesystem?), and what block cleanouts are.

Chapter 10 covers tables. After reading it I’d say the most important types are normal (HEAP), Index Organized, Temporary, and External Tables. Use ASSM where possible as it will save you in many ways, use DBMS_METADATA to reverse engineer objects you’ve created to get all the options, don’t use TEMP tables to avoid inline views, or complex joins, your performance will probably suffer, and how to handle LONG/LOB data in tables.

Chapter 11 covers indexes, topics ranging from height, compression count, DESC sorted, colocated data, bitmap indexes and why you don’t want them in OLTP data

bases, function based indexes and how they’re most useful for user defined functions, why indexing foreign keys is important, and choosing the leading edge of an index. Plus when to rebuild or coalesce and why.

Chapter 12 covers datatypes, why never to use CHAR, using the NLS features, the CAST function, the number datatypes and precision versus performance, raw_to_hex, date arithmatic, handling LOB data and why not to use LONG, BFILEs and the new UROWID.

Chapter 13 discusses partitioning. What I like is he starts the chapter with the caveat that partitioning is not the FAST=TRUE option. That says it all. For OLTP databases you will achieve higher availability, and ease of administration of large options, as well as possibly reduced contention on larger objects,

but it is NOT LIKELY that you will receive query performance improvements because of the nature of OLTP. With a datawarehouse, you can use partition elimination on queries that do range or full table scans which can speed up queries dramatically. He discusses range, list, hash, and composite partitioning, local indexing (prefixed & non-prefixed) and global indexing. Why datawarehouses tend to use local, and OLTP databases tend to use global indexes, and even how you

can rebuild your global indexes as you’re doing partition maintenance avoiding a costly rebuild of THE ENTIRE INDEX, and associated downtime. He also includes a great auditing example.

Chapter 14 covers parallel execution such as parallel dml, ddl, and so on. Here is where a book like Tom’s is invaluable, as he comes straight out with his opinions on a weighty topic. He says these features are most relevant to DBAs doing one-off maintenance and data loading operations. That is because even in

datawarehouses, todays environments often have many many users. The parallel features are designed to allow single session jobs to utilize the entire system resources. He explains that Oracle’s real sweet spot in this real is parallel

DDL, such as CREATE INDEX, CREATE TABLE AS SELECT, ALTER INDEX REBUILD, ALTER TABLE MOVE, and so on.

Chapter 15, the final chapter covers loading and unloading data. A significant portion of the chapter covers SQL*Loader for completeness, but he goes on to celebrate the wonders of external tables for loading data into Oracle. In particular there is an option in SQL*Loader to generate the CREATE statement for an

external table that does the SAME load! This is great stuff. External tables provide advantages over SQL*Loader in almost every way, except perhaps loading over a network, concurrent user access, and handling LOB data. External tables can use complex where clauses, merge data, do fast code lookups, insert into multiple tables, and finally provide a simpler learning curve.

Conclusions

Yum. If you love Oracle, you’ll want to read this book. If you need to know more about Oracle say, for your job, that’s another reason you might read this book. Oracle is fascinating technology, and Tom’s passion for understanding every last bit of it makes this book both a necessary read, and a very gratifying

one.

View this review on Amazon.com