The beauty of reading a book from a publisher not sanctioned by Oracle, by an author who doesn’t work for Oracle, is that it can openly mention bugs. And there are oh-so-many! This book is a superb introduction to the Cost Based Optimizer, and it is not afraid to discuss its many shortcomings. In doing so it also explains how to patch up those shortcomings by giving the CBO more information, either by creating a histogram here and there, or by using the DBMS_STATS package to insert your own statistics in the specific cases where you need to.
Another interesting thing is how this book illustrates, though accidentally, the challenges of proprietary software systems. Much of the book, and of the author’s time, is spent reverse engineering the CBO, Oracle’s bread and butter optimizing engine. Its source code and the details of its inner workings are not published or available, and of course that’s intentional. But what’s clear page after page is that DBAs and system tuners, going about their day to day tasks, really need inside information about what the optimizer is doing. So this book goes on a long journey to illuminate much of what the CBO does, or in some cases to provide very educated guesses and some speculation. In contrast, as we hear often, the Open Source alternative provides free access to source code, though not necessarily to the goods themselves. In a very real way this means a book like this would not need to be written for an open source alternative, because the internal code would be a proverbial open book. That said, it remains difficult to imagine how a company like Oracle might pursue a more open strategy, given that so much of its value lies in the secrets hidden inside its Cost Based Optimizing engine. At any rate, let’s get back to Jonathan’s book.
Reading this book was like reading a scientist’s notebook. I found it:
o of inestimable value, but sometimes difficult to sift through
o very anecdotal in nature, full of debugging, and constantly demonstrating that the CBO is much more faulty and prone to errors than you might imagine
o not easy to use as a reference: if you have a query of type X that is behaving funny, it may be hard to look up the relevant information
o his discussion of the evolution of the product is so good I’ll quote it:
“A common evolutionary path in the optimizer code seems to be the following: hidden by undocumented parameter and disabled in first release; silently enabled but not costed in second release; enabled and costed in third release.”
o has excellent chapter summaries which were particularly good for sifting, and boiling down the previous pages into a few conclusions.
o it will probably be of particular value to Oracle’s own CBO development teams
CH2 – Tablescans
explains how to gather system stats, how to use dbms_stats to set ind. stats manually, bind variables can make the CBO blind, bind variable peeking may not help, partition exchange may break global stats for table, use CPU costing when possible
CH3 – Selectivity
big problem with IN lists in 8i, fixed in 9i/10g, but still problems with NOT IN, uses a very good example of astrological signs overlapping birth months and the CBO cardinality problems that result, reminds us that the optimizer isn’t actually intelligent per se, but merely a piece of software
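To make the star sign example concrete, here is a small Python sketch of my own (not code from the book). It assumes one person born on each day of a non-leap year and an approximate Sagittarius range of 22 Nov to 21 Dec, and it compares the real row count against what you get by treating the two predicates as independent, each with selectivity 1/12. The function names are hypothetical.

```python
from datetime import date, timedelta

def sign_is_sagittarius(d):
    # Assumption: Sagittarius runs from 22 Nov to 21 Dec (approximate dates)
    return (d.month == 11 and d.day >= 22) or (d.month == 12 and d.day <= 21)

def build_population(year=2023):
    # Assumption: one person born on each day of a non-leap year
    start = date(year, 1, 1)
    return [start + timedelta(days=i) for i in range(365)]

def independent_estimate(num_rows, sel_a, sel_b):
    # Cardinality if the two predicates are treated as unrelated
    return num_rows * sel_a * sel_b

people = build_population()

# Rows that really match "born in December AND Sagittarius"
actual = sum(1 for d in people if d.month == 12 and sign_is_sagittarius(d))

# Estimate from multiplying the two selectivities together
estimate = independent_estimate(len(people), 1 / 12, 1 / 12)

print("actual rows:", actual)
print("independence estimate:", round(estimate, 2))
```

The two columns are strongly correlated, so multiplying the selectivities badly underestimates the number of matching rows.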
CH4 – BTree Access
cost based on depth, #leaf blocks, and clustering factor, try to use CPU costing (system statistics)
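The three inputs above combine in an approximation that is commonly quoted for index range scans under traditional I/O costing: index depth, plus the fraction of leaf blocks visited, plus the fraction of the clustering factor visited. The Python below is my own sketch of that approximation, with hypothetical parameter names; it is not Oracle’s actual code and it ignores CPU costing.

```python
import math

def index_range_scan_cost(blevel, leaf_blocks, clustering_factor,
                          index_sel, table_sel=None):
    # blevel: depth of the index (branch levels above the leaf blocks)
    # index_sel: selectivity of the predicates used to walk the index
    # table_sel: selectivity of predicates applied before visiting the table;
    #            assumed equal to index_sel when not given
    if table_sel is None:
        table_sel = index_sel
    leaf_cost = math.ceil(leaf_blocks * index_sel)
    table_cost = math.ceil(clustering_factor * table_sel)
    return blevel + leaf_cost + table_cost

# Made-up statistics, chosen so the arithmetic is exact
cost = index_range_scan_cost(blevel=2, leaf_blocks=1000,
                             clustering_factor=8000, index_sel=0.125)
print(cost)  # 2 + 125 + 1000 = 1127
```

Notice how the clustering factor term dominates in this example, which is why the next chapter spends so much time on it.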
CH5 – Clustering Factor
mainly a measure of the degree of random distribution of your data, very important for costing index scans, use dbms_stats to correct when necessary, just giving CBO better information, freelists (procID problem) + freelist groups discussion with RAC
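As I understand it, the clustering factor is computed by walking the index in key order and counting how often the table block changes from one entry to the next. A minimal Python sketch of that idea, using made-up block numbers and a hypothetical function name:

```python
def clustering_factor(block_ids_in_index_order):
    # Count how many times the table block changes while reading
    # index entries in key order (the first entry counts as a change)
    count = 0
    previous = None
    for block in block_ids_in_index_order:
        if block != previous:
            count += 1
            previous = block
    return count

# Rows stored in key order: neighbouring keys share a block
well_clustered = [1, 1, 1, 2, 2, 2, 3, 3, 3]

# Same nine rows and three blocks, but keys scattered across blocks
scattered = [1, 2, 3, 1, 2, 3, 1, 2, 3]

print(clustering_factor(well_clustered))  # 3, close to the number of blocks
print(clustering_factor(scattered))       # 9, close to the number of rows
```

A value near the number of blocks means the data is well ordered for that index; a value near the number of rows means almost every index entry costs a separate table block visit.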
CH6 – Selectivity Issues
there is a big problem with string selectivity, Oracle uses only the first seven characters, will be even more trouble for urls all starting with “http://”, and for multibyte character sets, trouble when you have database-independent apps which use strings for dates, use histograms when you have problems, can use the tuning advisor for “offline optimization”, Oracle uses transitive closure to transform queries into versions that are easier to optimize, moves predicates around, sometimes goes astray
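The URL problem is easy to picture with a crude model. The Python sketch below simply truncates each string to its first seven characters (the figure quoted above) and counts how many distinct values survive. This is a simplification I am assuming for illustration; the real optimizer converts strings to numbers rather than truncating them like this, and the sample values are made up.

```python
def distinct_after_truncation(values, prefix_len=7):
    # Crude model: only the first prefix_len characters are "seen"
    return len({v[:prefix_len] for v in values})

urls = [
    "http://example.com/a",
    "http://example.org/b",
    "http://example.net/c",
]
names = ["anderson", "baker", "clark"]

print(distinct_after_truncation(urls))   # 1: every URL collapses to "http://"
print(distinct_after_truncation(names))  # 3: the names stay distinguishable
```

Three different URLs look like a single value once the common prefix eats all seven characters, which is the kind of case where a histogram on the raw column cannot help much.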
CH7 – Histograms
height balanced (called equi-depth outside Oracle) when a column has more distinct values than the available buckets (roughly 255), otherwise frequency histograms, don’t use cursor sharing as it forces bind variables and blinds the CBO, bind variable peeking happens only on the first call, Oracle doesn’t use histograms much, expensive to create, use sparingly, distributed queries don’t pull histograms from the remote site, don’t work well with joins, no impact if you’re using bind variables, if using dbms_stats to hack certain stats be careful of rare codepaths
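For readers who haven’t met the two histogram types, here is a simplified Python sketch of the difference. A frequency histogram stores one count per distinct value, while a height-balanced (equi-depth) histogram sorts the data, cuts it into equal-sized slices and records the value at the end of each slice. This is my own illustration with made-up data, not how Oracle stores or builds its histograms internally.

```python
from collections import Counter

def frequency_histogram(values):
    # One (value, count) pair per distinct value
    return sorted(Counter(values).items())

def height_balanced_endpoints(values, buckets):
    # Sort the data, split it into equal-sized slices, and keep
    # the last value of each slice as that bucket's endpoint
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // buckets - 1] for i in range(1, buckets + 1)]

data = [1, 1, 1, 1, 1, 1, 2, 3, 4, 5]

print(frequency_histogram(data))
# [(1, 6), (2, 1), (3, 1), (4, 1), (5, 1)]

print(height_balanced_endpoints(data, buckets=5))
# [1, 1, 1, 3, 5]
```

In the height-balanced version the popular value 1 shows up as the endpoint of several buckets, which is how skew is detected when there are too many distinct values to count individually.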
CH8 – Bitmap Indexes
don’t stop at just one, avoid updates like the plague as they can cause deadlocking, optimizer assumes 80% of data tightly packed, 20% widely scattered
CH9 – Query Transformation
partly rule based, peeling the onion with views to understand complex queries, natural language queries often not the most efficient, therefore this transformation process has huge potential upside for Oracle in overall optimization of app code behind the scenes by the db engine, always remember Oracle may rewrite your query, sometimes you want to block this with hints, tell the CBO about uniqueness and NOT NULL columns if you know this
CH10 – Join Cardinality
makes sensible guess at best first table, continues from there, don’t hide useful information from the CBO, histograms may help with some difficult queries
CH11 – Nested Loops
fairly straightforward costing based on cardinality of each returned set multiplied together
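The form I usually see quoted for this is: the cost of fetching the outer row set, plus the outer cardinality multiplied by the cost of one probe into the inner table. The Python below is a sketch under that assumption, with hypothetical names and made-up numbers; the book’s own treatment has more detail.

```python
def nested_loop_cost(outer_cost, outer_cardinality, inner_cost_per_probe):
    # Pay once to fetch the outer rows, then once per outer row
    # to probe the inner table
    return outer_cost + outer_cardinality * inner_cost_per_probe

# Made-up figures: outer scan costs 10 and returns 100 rows,
# each inner lookup costs 3
nl_cost = nested_loop_cost(outer_cost=10, outer_cardinality=100,
                           inner_cost_per_probe=3)
print(nl_cost)  # 10 + 100 * 3 = 310
```

This also shows why a bad cardinality estimate for the outer table hurts so much: the error is multiplied straight into the join cost.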
CH12 – Hash Joins
Oracle executes as optimal (all in memory), onepass (doesn’t quite fit so dumped to disk for one pass) and multipass (least attractive sort to disk), avoid scripts writing scripts in prod, best option is to use workarea_size_policy=AUTO, set pga_aggregate_target & use CPU costing
CH13 – Sorting + Merge Joins
also uses optimal, onepass, & multipass algorithms, need more than 4x dataset size for in memory sort, 8x on 64bit system, increasing sort_area_size will incr. CPU util so on CPU bottlenecked machines sorting to disk (onepass) may improve performance, must always use ORDER BY to guarantee sorted output, Oracle may not need to sort behind the scenes, Oracle very good at avoiding sorts, again try to use workarea_size_policy=AUTO
CH14 – 10053 Trace
reviews various ways to enable, detailed rundown of trace with comments inline, and highlights; even mentions that VOL 2 + 3 of the book are coming!
be careful when switching from analyze to dbms_stats, in 10g some new hist will appear w/default dbms_stats options, 10g creates job to gather stats
I found this book to be full of gems of information that you won’t find anywhere else. If you’re at the more technical end of the spectrum, this is a one of a kind Oracle book and a must-have for your collection. Keep in mind something Jonathan mentions in appendix A: “New features that improve 99% of all known queries may cripple your database because you fall into the remaining 1% of special cases”. If those special cases are your concern, then this book will surely prove invaluable to you!