
5 Steps to Cloud Computing

Believe it or not, you can start playing around with virtual servers that are as real and powerful as the physical servers you're already used to deploying. And you can do it for literally pennies per hour.

  1. Sign up for an Amazon account or use the one you buy books with.
  2. Browse over to http://aws.amazon.com & click Sign Up Now
  3. Navigate to AWS Management Console, follow the Amazon EC2 link, and click Launch Instance
  4. Download ElasticFox or the EC2 API tools & configure your credentials for easy browser or command-line control of your virtual infrastructure and deployments.
  5. Terminate instances & delete volumes & snapshots so you’ll have no recurring charges.

At a mere 8 and 1/2 cents per hour, you can play around with the technology with no real ongoing costs.  And you can do it with your existing Amazon account and credit card info.

Good stuff!

5 Tips for Scalability

Your website is slow, but you're not sure why. You do know that it's impacting your business. Are you losing customers to the competition? Here are five quick tips to achieve scalability.

1. Gather Intelligence

With any detective work you need information.  That’s where intelligence comes in.  If you don’t have the right data already, install monitoring and trending systems such as Cacti and Collectd.  That way you can look at where your systems have been and where they’re going.

2. Identify Bottlenecks

Put all that information to use in your investigation. Use stress-testing tools to hit areas of the application, and identify which ones are most troublesome. Some pages get hit A LOT, such as the login page, so slowness there is more serious than in one small report that gets hit by only a few users. Work on the biggest culprits first to get the best bang for your buck.

3. Smooth Out the Wrinkles

Reconfigure your webservers to make more connections to your database, or spin up more servers. On the database tier, make sure you have fast RAIDed disk and lots of memory. Tune queries coming from your application, and look at possible upgrades to servers.

4. Be Agile But Plan for the Future

Can your webserver tier scale horizontally? It's pretty easy to add more servers under a load balancer. How about your database? Chances are, with a little work and some HA magic, your database can scale out with more servers too, moving the bulk of select operations to read-only copies of your primary server while letting the primary focus on transactions and data updates; a sketch for MySQL follows below. Be ready and tested so you know exactly how to add servers without impacting the customers or the application. Don't know how? Look at the big guys like Facebook, and investigate how they're doing it.
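If your database is MySQL, for instance, here is a minimal sketch of bringing a read-only replica online. The hostname and credentials are illustrative, and the exact CHANGE MASTER options (log file and position) depend on your version and setup:

mysql> CHANGE MASTER TO MASTER_HOST='primary.example.com',
    -> MASTER_USER='repl', MASTER_PASSWORD='mypassword';
mysql> START SLAVE;
mysql> SET GLOBAL read_only = ON;   -- keep application writes off the replica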

5. A Going Concern

Most importantly, just like your business, your technology infrastructure is an ongoing work in progress. Stay proactive with monitoring, analysis, trending, and vigilance. Watch application changes, and filter for slow queries. Have new or additional hardware dynamically at the ready for when you need it.

Top Five Useful Twitter Techniques

You've heard all about Twitter, and you may have visited the website, or used it through other tools that bring your messages from LinkedIn or Facebook into Twitter. It turns out there is a lot more to the Twitter world than first meets the eye.

1. Make regular and relevant use of hashtags

2. Focus on your subject matter expertise

3. Monitor trends on Google and twitter.com

4. Link to colleagues you respect and promote cross-pollination of networks.

5. Remember you are broadcasting, so word your tweets appropriately for your audience.

Oracle DBA Interview Questions

Oracle Database Administrators, often called DBAs, are an indispensable part of your operations team. They manage the systems that house all your business data: your customers, products, transactions, and all that analytical data on what customers are actually doing. If you've ever been on the hunt, you may wonder: why the shortage of DBAs? To that we'll answer, have you ever heard of Dustin Moskovitz?

So you certainly want to entrust all that to someone who knows what they're talking about. Enter the Oracle DBA interview, a process that some will see as a technical test, while others will see as a test of personality, behavior, and work ethic.

From the technical side we thought we’d bring you a quick and dirty checklist of questions. This isn’t an exhaustive list by any means, but is a good place to start and will certainly provide you with a glimpse of their knowledge.

Also, if you're looking to hire a MySQL DBA here's a guide, and there's also one for hiring an EC2 expert.

1. What is the difference between RMAN and a traditional hotbackup?

RMAN is faster, can do incremental (changes only) backups, and does not place tablespaces into hotbackup mode.
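For contrast, a traditional hot backup wraps OS-level file copies in backup mode, while RMAN needs no such step. A minimal sketch (the tablespace name is illustrative, and the incremental backup assumes a level 0 base already exists):

SQL> ALTER TABLESPACE users BEGIN BACKUP;
-- copy the datafiles at the OS level, then:
SQL> ALTER TABLESPACE users END BACKUP;

RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE;   -- changed blocks only, no backup mode needed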

2. What are bind variables and why are they important?

With bind variables in SQL, Oracle can cache related queries a single time in the SQL cache (shared pool). This avoids a hard parse each time, which saves on the various locking and latching resources we use to check object existence and so on. BONUS: For rarely run queries, especially BATCH queries, we explicitly DO NOT want to use bind variables, as they hide information from the Cost Based Optimizer.

BONUS BONUS: For batch queries from 3rd party apps like PeopleSoft, if we can't remove bind variables, we can use bind variable peeking!
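A quick SQL*Plus illustration (the employees table and its columns are just sample-schema stand-ins); every execution with a different :empid value reuses the same cached statement:

SQL> variable empid number
SQL> exec :empid := 100
SQL> select last_name from employees where employee_id = :empid;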

3. In PL/SQL, what is bulk binding, and when/how would it help performance?

Oracle's SQL and PL/SQL engines are separate parts of the kernel, and calls between them require a context switch, like between Unix processes. This is slow and uses up resources. If we loop over an SQL statement, we are implicitly flipping between these two engines. We can minimize this by loading our data into an array and using a PL/SQL bulk binding operation to do it all in one go!
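A minimal sketch using FORALL (the table t is hypothetical):

DECLARE
  TYPE num_tab IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  ids num_tab;
BEGIN
  FOR i IN 1 .. 1000 LOOP
    ids(i) := i;
  END LOOP;
  -- one context switch for all 1000 inserts, instead of 1000 switches
  FORALL i IN 1 .. 1000
    INSERT INTO t (id) VALUES (ids(i));
  COMMIT;
END;
/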

4. Why is SQL*Loader direct path so fast?

SQL*Loader with the direct path option can load data ABOVE the high water mark of a table, and DIRECTLY into the datafiles, without going through the SQL engine at all. This avoids all the locking, latching, and so on, and doesn't impact the db (except possibly the I/O subsystem) at all.

5. What are the tradeoffs between many vs few indexes? When would you want to have many, and when would it be better to have fewer?

Fewer indexes on a table mean faster inserts and updates. More indexes mean faster lookups for more specific WHERE clauses, possibly without index merges. It's a write-speed versus read-speed tradeoff.

6. What is the difference between RAID 5 and RAID 10? Which is better for Oracle?

RAID 5 is striping with an extra disk for parity. If we lose a disk we can reconstruct from that parity disk.

RAID 10 is mirroring pairs of disks, and then striping across those sets.

RAID 5 was created when disks were expensive. Its purpose was to provide RAID on the cheap. If a disk fails, the I/O subsystem will perform VERY slowly during the rebuild process. What's more, your likelihood of failure increases dramatically during this period, with all the added load of the rebuild. Even when it is operating normally, RAID 5 is slow for everything but reading. Given that, and knowing databases (especially Oracle's redo logs) experience write activity all the time, we should avoid RAID 5 in all but the rare database that is MOSTLY read activity. Don't put redo logs on RAID 5.

RAID 10 is just all-around goodness. If you lose one disk in a set of 10, for example, you could lose any one of eight other disks and have no trouble. What's more, rebuilding barely impacts performance, since you're simply making a mirror copy. Lastly, RAID 10 performs exceedingly well in all types of databases.

7. When using Oracle export/import what character set concerns might come up? How do you handle them?

Be sure to set NLS_LANG, for example to "AMERICAN_AMERICA.WE8ISO8859P1". If your source database is US7ASCII, beware of 8-bit characters. Also be wary of multi-byte character sets, as those may require extra attention. And watch export/import output for messages about any "character set conversions" which may occur.
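To see what the database itself is using before setting NLS_LANG on the client:

SQL> SELECT value FROM nls_database_parameters
     WHERE parameter = 'NLS_CHARACTERSET';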

8. How do you use automatic PGA memory management with Oracle 9i and above?

Set the WORKAREA_SIZE_POLICY parameter to AUTO and set PGA_AGGREGATE_TARGET to a total memory budget for all sessions' work areas.
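For example (the 512M figure is just an illustration; size it for your workload):

SQL> ALTER SYSTEM SET workarea_size_policy = AUTO;
SQL> ALTER SYSTEM SET pga_aggregate_target = 512M;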

9. Explain two easy SQL optimizations.

a. EXISTS can be better than IN under various conditions

b. UNION ALL is faster than UNION (it skips the sort needed to eliminate duplicates)
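For example, here is the same logical query in both forms (emp and dept are the classic sample-schema tables; which form wins depends on relative table sizes and indexing):

-- IN form
SELECT d.dname FROM dept d
WHERE d.deptno IN (SELECT e.deptno FROM emp e);

-- EXISTS form
SELECT d.dname FROM dept d
WHERE EXISTS (SELECT 1 FROM emp e WHERE e.deptno = d.deptno);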

10. Name three SQL operations that perform a SORT.

a. CREATE INDEX

b. DISTINCT

c. GROUP BY

d. ORDER BY

e. INTERSECT

f. MINUS

g. UNION

h. UNINDEXED TABLE JOIN

11. What is your favorite tool for day-to-day Oracle operation?

Hopefully we hear some use of the command line in the answer!

12. What is the difference between Truncate and Delete? Why is one faster?

Can we ROLLBACK both? How would a full table scan behave after?

Truncate is nearly instantaneous, cannot be rolled back, and is fast because Oracle simply resets the HWM (high water mark). When a full table scan is performed on a table, such as for a sort operation, Oracle reads up to the HWM. So if you delete every single solitary row in a 10-million-row table so it is now empty, sorting on that table of 0 rows would still be extremely slow.
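A quick sketch of the difference (big_t is hypothetical):

SQL> DELETE FROM big_t;        -- generates undo; can be rolled back
SQL> ROLLBACK;                 -- the rows come back
SQL> TRUNCATE TABLE big_t;     -- DDL: resets the HWM, implicitly commits, no rollback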

13. What is the difference between a materialized view (snapshot) fast refresh versus complete refresh? When is one better, and when the other?

Fast refresh maintains a change log table, which records change vectors, not unlike how the redo logs work. There is overhead to this, much as with a table that has a LOT of indexes on it: inserts and updates on the source table will be slower. However, if you are performing refreshes often, like every few minutes, you want to do fast refresh so you don't have to full-table-scan the source table. Complete refresh is good if you're going to refresh once a day; it does a full table scan on the source table and recreates the snapshot/mview. Also, inserts/updates on the source table are NOT impacted on tables where only complete refresh snapshots have been created.
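In sketch form (orders and its columns are illustrative; fast refresh here assumes orders has a primary key the materialized view log can record):

SQL> CREATE MATERIALIZED VIEW LOG ON orders;          -- the change log enabling fast refresh
SQL> CREATE MATERIALIZED VIEW mv_orders
     REFRESH FAST ON DEMAND
     AS SELECT order_id, status FROM orders;
SQL> EXEC DBMS_MVIEW.REFRESH('MV_ORDERS', 'F');       -- 'C' would force a complete refresh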

14. What does the NO LOGGING option do? Why would we use it? Why would we be careful of using it?

It disables the writing of redo for certain bulk operations. It does not disable ALL logging, however; Oracle still records enough change information to keep the database consistent if you pull the plug on the box, for instance. But it will cause problems if you are using a standby database, and objects loaded with NOLOGGING cannot be recovered from backups taken before the operation. Use it to speed up operations like an index rebuild or partition maintenance operations.
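For example (the index name is hypothetical):

SQL> ALTER INDEX big_idx REBUILD NOLOGGING;

-- find tables currently flagged NOLOGGING:
SQL> SELECT table_name FROM dba_tables WHERE logging = 'NO';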

15. Tell me about standby databases. What are some possible configurations? What should we watch out for?

Standby databases allow us to maintain a copy of our production db for disaster recovery. We merely switch the mode on the target db and bring it up as read/write. It can be set up as master->slave or master->master; the latter allows the former prod db to become the standby once the failure cause is remedied. Watch out for NOLOGGING!! Be sure we're in archivelog mode.
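During a failover, the commands on the standby look roughly like this (a sketch; exact steps vary by Oracle version and standby configuration):

SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP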


MySQL DBA Interview Questions

One of the more popular articles on our site according to Google is the Oracle DBA Interview Questions article we did a few years ago. So with that in mind, we’ve put together a similar article for MySQL DBA Interviews.

1. Explain two ways that MySQL Replication can get out of sync. What are the solutions to these problems?

One way is if your code contains non-deterministic functions such as SYSDATE(), USER() and UUID(); with statement-based replication these can produce different values on the slave. The fix is to rewrite the statements to avoid such functions, or to use row-based replication (available from MySQL 5.1). The second way is if you have mixed transactions between InnoDB and MyISAM tables; in some cases those can get replicated incorrectly, and the fix is to keep transactional work on a single engine.

2. How does one create a new user and give it privileges on an existing database? Why should “with grant option” be avoided?

mysql> grant all privileges on test.* to 'newuser'@'localhost' identified by 'mypassword';

Avoid "with grant option" because it lets the new user hand out its privileges to other accounts, so grants can spread beyond the DBA's control.

3. Explain the differences, advantages and disadvantages to using the MERGE storage engine, versus using Partitioned tables to manage large datasets.

The MERGE storage engine behaves much like a view with UNION ALL between the underlying tables. It is easy to add and remove tables and to redefine the MERGE table. They are good for logging and huge datasets.

One of the primary differences with partitioning is that a row can exist in one and only one partition, which is not the case with a MERGE table. Also, you can cluster data together in certain partitions, reducing the amount of work the server may have to do to get at related data. And partitioned data can be distributed across multiple hard drives.
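A minimal MERGE sketch (names are illustrative; the underlying tables must be identically defined MyISAM tables):

mysql> CREATE TABLE log_2009 (id INT, msg VARCHAR(255)) ENGINE=MyISAM;
mysql> CREATE TABLE log_2010 (id INT, msg VARCHAR(255)) ENGINE=MyISAM;
mysql> CREATE TABLE log_all  (id INT, msg VARCHAR(255))
    -> ENGINE=MERGE UNION=(log_2009, log_2010) INSERT_METHOD=LAST;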

4. How do you determine what storage engines are installed?

mysql> show global variables like 'have%';                                       
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| have_archive          | NO       |
| have_bdb              | YES      |
| have_blackhole_engine | NO       |
| have_compress         | YES      |
| have_crypt            | YES      |
| have_csv              | NO       |
| have_dynamic_loading  | YES      |
| have_example_engine   | NO       |
| have_federated_engine | NO       |
| have_geometry         | YES      |
| have_innodb           | YES      |
| have_isam             | NO       |
| have_merge_engine     | YES      |
| have_ndbcluster       | NO       |
| have_openssl          | DISABLED |
| have_query_cache      | YES      |
| have_raid             | NO       |
| have_rtree_keys       | YES      |
| have_symlink          | YES      |
+-----------------------+----------+
19 rows in set (0.00 sec)
mysql>
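On more recent MySQL versions there is also a more direct answer:

mysql> SHOW ENGINES;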

5. How do you get the query cache status? How do you tune it?

mysql> show global status like 'qcache%';                                        
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.16 sec)
mysql>

Tune it by looking at the hit ratio Qcache_hits / (Qcache_hits + Com_select), and adjust query_cache_size and query_cache_limit accordingly.
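For example (the 64MB figure is just an illustration):

mysql> show global status like 'Com_select';
mysql> set global query_cache_size = 64*1024*1024;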

6. What is DRBD? Explain the advantages and disadvantages compared to MySQL Replication for High Availability.

DRBD (Distributed Replicated Block Device) mirrors a block device over the network, like RAID 1 between two servers. Its advantage over MySQL replication is synchronous operation: no committed transaction is lost on failover. The disadvantages are that the standby is cold (MySQL isn't running there, so failover includes InnoDB crash recovery and cache warmup) and the standby cannot serve reads in the meantime.

7. What is circular replication? How is it different from master-slave replication?

In circular (master-master ring) replication, each server is both a master and a slave of another, so writes can be taken on any node. In plain master-slave replication, only the master accepts writes. Circular setups need care to avoid conflicts, for example by setting auto_increment_increment and auto_increment_offset differently on each node.

Apress – Cost-Based Oracle by Jonathan Lewis

The beauty of reading a book from a publisher not sanctioned by Oracle, and by an author who doesn't work for Oracle, is that they can openly mention bugs. And there are oh-so-many! This book is a superb introduction to the Cost Based Optimizer, and is not afraid to discuss its many shortcomings. In so doing, it also explains how to patch up those shortcomings by giving the CBO more information, either by creating a histogram here and there, or by using the DBMS_STATS package to insert your own statistics in those specific cases where you need to.

Another interesting thing is how this book illustrates, though accidentally, the challenges of proprietary software systems. Much of this book, and the author's time, is spent reverse engineering the CBO, Oracle's bread-and-butter optimizing engine. Source code and details about its inner workings are not published or available, and of course that's intentional. But what's clear page after page in this book is that DBAs and system tuners, going about their day-to-day tasks, really need inside information about what the optimizer is doing. So this book goes on a long journey to illuminate much of what the CBO does, or in some cases to provide very educated guesses and some speculation. In contrast, as we know and hear about often, the Open Source alternative provides free access to source code, though not necessarily to the goods themselves. What this means in a very real way is that a book like this would not need to be written for an alternative open source application, because the internal code would be a proverbial open book. That said, it remains difficult to imagine how a company like Oracle might pursue a more open strategy, given that their bread and butter really is the secrets hidden inside their Cost Based Optimizing engine. At any rate, let's get back to Jonathan's book.

Reading this book was like reading a scientist's notebook. I found it:

o of inestimable value, but sometimes difficult to sift through

o very anecdotal in nature, debugging as it goes, and constantly demonstrating that the CBO is much more faulty and prone to errors than you might imagine

o not always easy to use as a reference; you can't simply say "I have a query of type X and it is behaving funny, how do I look up information on this?"

o so good on the evolution of the product that I'll quote it:

"A common evolutionary path in the optimizer code seems to be the following: hidden by undocumented parameter and disabled in first release; silently enabled but not costed in second release; enabled and costed in third release."

o blessed with excellent chapter summaries, which were particularly good for sifting, and for boiling the previous pages down into a few conclusions

o probably of particular value to Oracle's own CBO development teams

Chapter highlights

CH2 – Tablescans

explains how to gather system stats, how to use dbms_stats to set ind. stats manually, bind variables can make the CBO blind, bind variable peeking may not help, partition exchange may break global stats for table, use CPU costing when possible

CH3 – Selectivity

big problem with IN lists in 8i, fixed in 9i/10g, but still prob. with NOT IN, uses very good example of astrological signs overlapping birth months, and associated CBO cardinality problems, reminds us that the optimizer isn’t actually intelligent per se, but merely a piece of software

CH4 BTree Access

cost based on depth, #leaf blocks, and clustering factor, try to use CPU costing (system statistics)

CH5 – Clustering Factor

mainly a measure of the degree of random distribution of your data, very important for costing indx scans, use dbms_stats to correct when necessary, just giving CBO better information, freelists (procID problem) + freelist groups discussion with RAC

CH6 – Selectivity Issues

there is a big problem with string selectivity, Oracle uses only the first seven characters, will be even more trouble for urls all starting with "http://", and multi-byte character sets, trouble when you have db ind. apps which use a string for a date, use histograms when you have problems, can use the tuning advisor for "offline optimization", Oracle uses transitive closure to transform queries to more easily optimized versions, moves predicates around, sometimes runs astray

CH7 – Histograms

height balanced > 255 buckets (outside Oracle called equi-depth), otherwise frequency histograms, don't use cursor sharing as it forces bind variables, blinds CBO, bind var peeking is only first call, Oracle doesn't use histograms much, expensive to create, use sparingly, dist queries don't pull hist from remote site, don't work well with joins, no impact if you're using bind vars, if using dbms_stats to hack certain stats be careful of rare codepaths

CH8 – Bitmap Indexes

don’t stop at just one, avoid updates like the plague as can cause deadlocking, opt assumes 80% data tightly packed, 20% widely scattered

CH9 – Query Transformation

partly rule based, peeling the onion w views to understand complex queries, natural language queries often not the most efficient, therefore this transformation process has huge potential upside for Oracle in overall optimization of app code behind the scenes by db engine, always remember Oracle may rewrite your query, sometimes want to block with hints, tell CBO about uniqueness, not NULL if you know this

CH10 – Join Cardinality

makes sensible guess at best first table, continues from there, don't hide useful information from the CBO, histograms may help with some difficult queries

CH11 – Nested Loops

fairly straightforward costing based on cardinality of each returned set multiplied together

CH12 – Hash Joins

Oracle executes as optimal (all in memory), onepass (doesn’t quite fit so dumped to disk for one pass) and multipass (least attractive sort to disk), avoid scripts writing scripts in prod, best option is to use workarea_size_policy=AUTO, set pga_aggregate_target & use CPU costing

CH 13 – Sorting + Merge Joins

also uses optimal, onepass, & multipass algorithms, need more than 4x dataset size for in memory sort, 8x on 64bit system, increasing sort_area_size will incr. CPU util so on CPU bottlenecked machines sorting to disk (onepass) may improve performance, must always use ORDER BY to guarantee sorted output, Oracle may not need to sort behind the scenes, Oracle very good at avoiding sorts, again try to use workarea_size_policy=AUTO

CH 14 – 10053 Trace

reviews various ways to enable the trace, gives a detailed rundown of a trace file with comments inline, and highlights; even mentions that VOL 2 + 3 of the book are coming!
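For reference, the standard way to turn the trace on for your own session (the statement must be hard parsed to produce output):

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- run the query you want costed, then:
SQL> ALTER SESSION SET EVENTS '10053 trace name context off';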

Appendix A

be careful when switching from analyze to dbms_stats, in 10g some new hist will appear w/default dbms_stats options, 10g creates job to gather stats

Conclusion

I found this book to be full of gems of information that you won't find anywhere else. If you're at the more technical end of the spectrum, this is a one-of-a-kind Oracle book and a must-have for your collection. Keep in mind something Jonathan mentions in appendix A: "New features that improve 99% of all known queries may cripple your database because you fall into the remaining 1% of special cases". If these cases are your concern, then this book will surely prove to be one-of-a-kind for you!

View this review on Amazon.com

Apress – Expert Oracle Database Architecture by Tom Kyte

I have a confession to make. I haven't read an Oracle book cover-to-cover in almost three years. Sure, I skim through the latest titles for what I need, and of course check out the documentation of the latest releases. That's what good docs provide: quick reference when you need to check syntax, or the details of a particular parameter or feature. But have you ever read some documentation, sifted through a paragraph or a page or two, and said to yourself, that's great, but what about this situation I have right now? Unfortunately documentation doesn't always speak to your real everyday needs. It is excellent for reference, but doesn't have a lot of real-world test cases and practical usage examples. That's where Tom Kyte's new book comes in, and boy is it a killer.

I've read Tom's books before, and always enjoyed them. But his new Apress title really stands out as an achievement. Page after page and chapter after chapter, he uses straightforward examples pasted right from the SQL*Plus prompt to illustrate, demonstrate, and illuminate the concepts he is explaining. It is this practical, hands-on, relentless approach that makes this book 700 pages of goodness.

Already an expert at Oracle? You'll become more of one after reading this book. With reviewers like Jonathan Lewis, I have to admit I expected this book to be good from the outset. But each chapter delves into a bit more depth around subjects that are central to Oracle programming and administration.

No SCREEN SHOTS!

One of the things I loved most about this book is its complete lack of screenshots! But how does one illustrate a concept then, you might ask? These days, with graphical interfaces becoming more and more popular even among technical folks, I run into the question of the command line over and over again. How can you be doing sophisticated database administration of the latest servers running Oracle with the command line? Or another question I often get: can you really do everything with the command line? The answer to both is a resounding yes; in fact you can do much more with the command line. Luckily for us, Tom is of this school too, and page after page of his book is full of real examples and commands that you can try for yourself, with specific instructions on setting up the environment, using statistics-gathering packages, and so on. In an era of computing where GUIs seem to reign like magazines over the best literature of the day, it is refreshing to see some of the best and most technical minds around Oracle still advocate the best tool, the command line, as the interface of choice. In fact it is the command line examples, and happily the complete lack of screenshots, that makes this book a jewel of a find.

Audience

As a DBA you might wonder why I'm speaking so highly of a book focused more toward developers. There are a couple of reasons. First, this book is about the Oracle architecture as it pertains to developers. In order for developers to best take advantage of the enterprise investment in Oracle, they need to thoroughly understand the architecture, how specific features operate, which features are appropriate, and how to optimize their code for best interaction with them. Of course a DBA who is trying to keep a database operating in tip-top shape needs to be aware of when developers are not best using Oracle, to identify and bring attention to bottlenecks and problem areas in the application. Second, it is often a DBA's job to tune an existing database, and the very largest benefits come from tuning application SQL. For instance, if a developer has chosen to use a bitmap index on an INSERT/UPDATE-intensive table, they're in for serious problems. Or if a developer forgot to index a foreign key column. This book directly spearheads those types of questions, and when necessary mentions a thing or two of direct importance to DBAs as well.

Highlights

Chapter 2 has an excellent example of creating an Oracle database. You simply write one line to your init.ora, "db_name=sean" for example, and then from the SQL> prompt issue "startup nomount" and then "create database". Looking at the processes Oracle starts, and the files that are created, can do wonders for your understanding of database, instance, and Oracle in general.
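In compressed form, that exercise looks roughly like this (the db_name value is illustrative, and a real init.ora typically needs a few more parameters):

-- init.ora contains the single line: db_name=sean
SQL> startup nomount
SQL> create database;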

Chapter 3 covers files, files, and more files. Spfile replaces a text init.ora, allowing parameters to be modified while an instance is running *AND* stored persistently. He covers redolog files, flashback logs, and change tracking files, as well as import/export dump files, and lastly datapump files.

Chapter 4 covers memory, and specifically some of the new auto-magic options, how they work, and what to watch out for.

Chapter 5 covers processes.

Chapters 6, 7, and 8 cover locking/latching, multiversioning, and transactions respectively. I mention them all here together because to me these chapters are the real meat of the book. And that's coming from a vegetarian! Seriously, these topics are what I consider to be the most crucial to understanding Oracle, and modern databases in general, and the least understood. They are the darkest corners, but Tom illuminates them for us. You'll learn about optimistic versus pessimistic locking, and about page-level, row-level, and block-level locking in various modern databases such as SQL Server, Informix, Sybase, DB2 and Oracle. Note Oracle is by far in the lead in this department, never locking more than it needs to, which yields the best concurrency, with few situations where users block each other. Readers never block, for instance, because of the way Oracle implements all of this. He mentions latch spinning, which Oracle does to avoid a context switch that is more expensive, and how to detect and reduce this type of contention. You'll learn about dirty reads, phantom reads, and non-repeatable reads, and about Oracle's read-committed versus serializable modes. What's more, you'll learn about the implications of these various models on your applications, and what type of assumptions you may have to unlearn if you're coming to Oracle from developing on another database. If I were to make any criticism at all, I might mention that in this area Tom becomes ever so slightly preachy about Oracle's superb implementation of minimal locking and non-blocking reads. This is in large part due, I'm sure, to running into so many folks who are used to developing on databases which do indeed dumb you down *BECAUSE* of their implementation, encouraging bad habits with respect to transactions, and auto-commit for instance. One thing is for sure: you will learn a heck of a lot from these three chapters. I know I did.

Chapter 9, Redo & Undo, describes what each is, how to avoid "checkpoint not complete" and why you want to, how to *MEASURE* undo so as to reduce it, how to avoid log file waits (are you on RAID 5? are your redologs on a buffered filesystem?), and what block cleanouts are.

Chapter 10 covers tables. After reading it, I'd say the most important types are normal (HEAP), index-organized, temporary, and external tables. Use ASSM where possible, as it will save you in many ways. Use DBMS_METADATA to reverse engineer objects you've created to get all the options. Don't use TEMP tables just to avoid inline views or complex joins; your performance will probably suffer. He also covers how to handle LONG/LOB data in tables.

Chapter 11 covers indexes, with topics ranging from height, compression count, DESC sorting, colocated data, bitmap indexes and why you don't want them in OLTP databases, function-based indexes and how they're most useful for user-defined functions, why indexing foreign keys is important, and choosing the leading edge of an index. Plus when to rebuild or coalesce, and why.

Chapter 12 covers datatypes: why never to use CHAR, using the NLS features, the CAST function, the number datatypes and precision versus performance, raw_to_hex, date arithmetic, handling LOB data and why not to use LONG, BFILEs, and the new UROWID.

Chapter 13 discusses partitioning. What I like is that he starts the chapter with the caveat that partitioning is not the FAST=TRUE option. That says it all. For OLTP databases you will achieve higher availability and ease of administration of large objects, as well as possibly reduced contention on larger objects, but it is NOT LIKELY that you will see query performance improvements, because of the nature of OLTP. With a data warehouse, you can use partition elimination on queries that do range or full table scans, which can speed up queries dramatically. He discusses range, list, hash, and composite partitioning, local indexing (prefixed & non-prefixed) and global indexing; why data warehouses tend to use local indexes and OLTP databases tend to use global indexes; and even how you can rebuild your global indexes as you're doing partition maintenance, avoiding a costly rebuild of THE ENTIRE INDEX and the associated downtime. He also includes a great auditing example.

Chapter 14 covers parallel execution, such as parallel DML, DDL, and so on. Here is where a book like Tom's is invaluable, as he comes straight out with his opinions on a weighty topic. He says these features are most relevant to DBAs doing one-off maintenance and data-loading operations. That is because, even in data warehouses, today's environments often have many, many users. The parallel features are designed to allow single-session jobs to utilize the entire system's resources. He explains that Oracle's real sweet spot in this realm is parallel DDL, such as CREATE INDEX, CREATE TABLE AS SELECT, ALTER INDEX REBUILD, ALTER TABLE MOVE, and so on.

Chapter 15, the final chapter, covers loading and unloading data. A significant portion of the chapter covers SQL*Loader for completeness, but he goes on to celebrate the wonders of external tables for loading data into Oracle. In particular, there is an option in SQL*Loader to generate the CREATE statement for an external table that does the SAME load! This is great stuff. External tables provide advantages over SQL*Loader in almost every way, except perhaps loading over a network, concurrent user access, and handling LOB data. External tables can use complex WHERE clauses, merge data, do fast code lookups, insert into multiple tables, and finally provide a simpler learning curve.

Conclusions

Yum. If you love Oracle, you'll want to read this book. If you need to know more about Oracle, say, for your job, that's another reason you might read this book. Oracle is fascinating technology, and Tom's passion for understanding every last bit of it makes this book both a necessary read and a very gratifying one.

View this review on Amazon.com

Dummy's Guide to Linux firewalls

Security experts will probably tell you it's not a good idea to be a dummy and also in charge of your own firewall. They're probably right, but it's a catchy title. In this article, I'll quickly go over some common firewall rules for iptables under Linux.

First things first. If you don't have the right kernel, you're not going to get anywhere. A quick way to find out if all the right pieces are in place is to try to load the iptables NAT kernel module.

$ modprobe iptable_nat

If you get errors, you may need to compile various support into your kernel, and of course you may need to compile the iptable_nat module itself. The easiest way is to download the source RPM for your installed distribution and do 'make menuconfig' with its default configuration; that way, all the things that are currently working with your kernel won't break when you forget to select them. For details see the Linux Firewall using IPTables HOWTO.
Once the module is loaded, start the service:

$ /etc/rc.d/init.d/iptables start

You will also have to have your interfaces up. I did this as follows:

# start the DHCP server for the internal network

/usr/sbin/dhcpd eth0

# bring up the TWC cable connection to the internet

ifup eth1

You’ll need to set some rules. Be sure to get your internet interface, and local network interface right on these commands. First to setup masquerade which allows multiple machines behind your firewall to all share your single dynamically assigned IP address from your internet provider:

$ iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE

On my firewall, eth1 is the device which talks to the ISP, and gets the IP address we’ll use on the internet. The other interface, eth0 is for my local internal network.

Next, be sure to enable VPN traffic through the firewall if you have a VPN connection to your office. Protocols 50 and 51 are ESP and AH (the IPsec protocols), and UDP port 500 is IKE key exchange:

iptables -A INPUT -s 10.0.0.0/24 -p 50 -j ACCEPT

iptables -A INPUT -s 10.0.0.0/24 -p 51 -j ACCEPT

iptables -A INPUT -s 10.0.0.0/24 -p udp --dport 500 -j ACCEPT

Lastly enable ip forwarding:

echo 1 > /proc/sys/net/ipv4/ip_forward

Of course you don't really want to be a dummy forever, so you should read up on the Linux Firewall HOWTO and other Linux docs.

Oracle 10g RAC Versus DataGuard For High Availability

Oracle has two very different technologies, each with its own strengths and weaknesses, that implement high availability solutions. In choosing between the two, it's important to factor in the relevant risks, both small and large, to put the entire picture into perspective.

Two Alternatives

RAC, or Real Application Clusters, is essentially an always-on solution. You have multiple instances on separate servers accessing the same database on shared storage in your network. With existing technology limitations, in practical terms these servers must be on the same local network, in the same datacenter.

Oracle's DataGuard technology, formerly called Standby Database in previous versions, provides a rolling copy of your production database. The standby database is started in read-only mode, constantly receiving change data sent over from the production database, keeping it in sync at all times, and at most only a few minutes behind. Were the production server to fail, the standby could take over in little more than the time a DNS change or IP swap would take. What's more, the standby copy can be at another datacenter, or on another continent!

Software Failure

Before we compare the strengths and weaknesses, let's talk about software risks. In the real world, you can have operator errors, which means someone made a mistake at the keyboard, or someone decided to drop the wrong table and realized their mistake only later. Neither of these solutions protects you from that; you would have to recover either point-in-time or from an export. You could also encounter bugs in software that could cause a crash (downtime) or corruption (data loss and downtime to repair). There are also potential configuration errors, so the more components you have, the more potential problems. And lastly, there is the risk of buying into technologies for which experienced help is hard to find.

Hardware Failure

You could have hardware failure of your server: motherboard, memory, NIC card, or related problems. You could also have failure of a power supply in the disk subsystem, failure of one of those boards, or of the fibre channel switch or IP switch. Hence redundancy in these areas is crucial as well. But you can also have power failure on that floor or in the datacenter as a whole, or someone could trip over the cord.

Larger Failures

Also, in a very real sense, the power grid itself is at some risk. If the Northeast is any indication, 24 hours of outage every 20-30 years is not unusual. Beyond power, there is the potential for fires, earthquakes, and other natural disasters.

Strengths and Weaknesses

RAC's strength is its always-on aspect. The second instance is always available, so as far as hardware failure at the server level goes, it protects you very well.

In terms of weaknesses, however, it does not protect you against anything outside the server: disk subsystem failure, power grid failure, or a natural disaster that impacts the hosting facility. Furthermore, there are more software components in the mix, so there is more software that will have bugs, and more hurdles you can stumble over. Lastly, it may be harder to find resources who have experience with RAC, as it certainly is a bigger can of worms to administer.

DataGuard's strength is that the failover server can be physically remote, even on another continent. This really brings peace of mind, as everything is physically separate. It will survive any failure in the primary system.

In terms of weaknesses, however, there is a slight lag, depending on network latency, amount of change data being generated, and how in-sync you keep the two systems.

Conclusions

In 10g, Oracle really brings to the table world-class High Availability solutions. Both DataGuard and RAC have their strengths and weaknesses. Some sites even use both. Each makes sense in particular circumstances but more often than not, DataGuard will prove to be a robust solution for most enterprises.