Open Insights 28 – High Availability

OPEN INSIGHTS Newsletter
Issue 28 – High Availability
February 1, 2007

by Sean Hull

Founder and Senior Consultant
Heavyweight Internet Group

Welcome back to our Open Insights newsletter. Our readership is now north of 3000 subscribers and growing every day. Thanks to everyone for your support and for forwarding us on to friends and colleagues!

Reading from your BlackBerry or other handheld device? We’ve made some formatting changes which we hope improve the appearance on mobile devices. Let us know if you have any suggestions or comments.


In This Issue:

1. Feature: High Availability
2. Upcoming Speaking Engagements
3. New Articles
4. Audio Interviews
5. Current Reading
6. Lightweight Humor
7. Miscellaneous
8. Past Issues
9. Technical Articles
10. About Heavyweight Internet Group


1. Feature: High Availability

What is High Availability?


In enterprise applications, such as internet websites, payroll systems, or other large computing systems, we talk about high availability when we need those systems to be available all the time. High Availability is the catchphrase for that whole discussion: what does it mean, what can I expect, what are some solutions, and so forth.


One phrase you might hear a lot is Five-9’s. This means systems that are available 99.999% of the time. In a year there are 24×365 = 8,760 hours, so that allows only about 5 1/4 minutes of downtime per year. (The often-quoted 8 3/4 hours of downtime per year corresponds to three nines, or 99.9%.) Why not 100%, you might ask? Well, we will discuss that too.
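
The arithmetic behind the "nines" is easy to check for yourself. Here is a minimal Python sketch; the function name is our own choice for illustration, and it assumes a 365-day year.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime an availability target allows, in minutes."""
    unavailable_fraction = 1 - availability_percent / 100
    return HOURS_PER_YEAR * 60 * unavailable_fraction


# Print the downtime budget for two through five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} minutes ({minutes / 60:.2f} hours) per year")
```

Each extra nine cuts the budget by a factor of ten: 99.9% leaves roughly 8 3/4 hours a year, while 99.999% leaves a little over five minutes.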


Why does it matter?


If your money isn’t available when you go to an ATM, it might be obvious there’s a problem. What about if your favorite search engine, google, isn’t available when you try to reach it? How about online banking? For these day-to-day needs, you can usually tolerate some downtime.


In the corporate environment, however, there is much more at stake. Suppose a large bank does not have access to its database systems and can’t perform transactions. Maybe its customers will go elsewhere. If trading systems are down for even a few minutes and can’t conduct transactions, the firm can lose big time. In each case we have tradeoffs between the costs of more perfect systems and the costs of not being able to do business.


How do various risks play a part?


There are all sorts of events that can get in the way of a transaction. Segments of the internet can go down, name services (DNS) can be interrupted, the data center could have a fire, or an earthquake could bring things to a halt. We can even experience a power grid failure. But beyond all these external forces, we can have hardware failure, data corruption, software bugs, and even operator error. Considering all of these factors, the spectrum of risks becomes clearer, as we understand how many interconnected parts come into play.


Take for example the power grid. The entire Northeast lost power for 24 hours. Now that happened once in roughly 24 years. That means on average one hour per year. That’s not even including the various smaller outages that we’re likely to experience.
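
To put that one hour per year in perspective, here is a rough Python sketch of the availability ceiling it implies. It assumes a site with no backup power that depends entirely on that grid, and it uses only the two figures quoted above; the variable names are ours.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

outage_hours = 24    # length of the blackout, from the text
years_between = 24   # roughly how often it happens, from the text

# Spread the outage evenly over the years between occurrences.
average_outage_hours_per_year = outage_hours / years_between

# Best availability achievable if the grid were the only source of downtime.
best_case_availability = 100 * (1 - average_outage_hours_per_year / HOURS_PER_YEAR)

print(f"Average grid downtime: {average_outage_hours_per_year:.1f} hour(s) per year")
print(f"Availability ceiling:  {best_case_availability:.4f}%")
```

The ceiling works out to just under 99.99%, which is already short of 99.999% before a single server, disk, or operator has failed.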


In computing systems themselves, we can build redundant databases, and all sorts of parallel or grid solutions, but in so doing we introduce more moving parts, more points of failure, and more software bugs. We even introduce more chance for operator error, and potentially more complex upgrades, leading to more downtime.
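
A back-of-the-envelope model shows the tradeoff. The Python sketch below assumes failures are independent, which real systems rarely honor, and every number in it is illustrative rather than measured. The 0.9995 figure for the extra clustering layer in particular is purely hypothetical.

```python
from math import prod


def serial_availability(parts):
    """Every part must be up at once: multiply the availabilities."""
    return prod(parts)


def redundant_availability(part, copies):
    """At least one of several identical, independent copies must be up."""
    return 1 - (1 - part) ** copies


# Illustrative assumptions, not measurements.
PART = 0.999            # network, server, and database each at three nines
CLUSTER_LAYER = 0.9995  # hypothetical cost of the extra moving parts

single_chain = serial_availability([PART, PART, PART])
redundant_db = redundant_availability(PART, 2)
ideal_pair = serial_availability([PART, PART, redundant_db])
realistic_pair = serial_availability([PART, PART, redundant_db, CLUSTER_LAYER])

for label, value in (
    ("single database", single_chain),
    ("redundant pair, free of charge", ideal_pair),
    ("redundant pair plus cluster layer", realistic_pair),
):
    print(f"{label}: {100 * value:.4f}%")
```

Under these assumptions the redundant pair helps, but the added layer hands back about half of the gain. With a less reliable layer, or with operator error added in, the balance can tip the other way.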


Reasonable Goals and Practical Expectations


For those not accustomed to thinking about risk in these terms, a sense of perspective is very important. Our gut instinct is to go for perfect systems, with 100% uptime, and start selecting solutions based on that. But we may miss a lot of important and relevant facts when we do that. If the bar is already lowered by factors completely outside of our control, we get a better perspective on what we can reasonably expect to have in terms of uptime, and plan accordingly. It is much better to fall well within our expected goals than to have the real world come crashing into our forecasts.


Bruce Schneier publishes an excellent newsletter on risk and security called Crypto-Gram. Here’s the latest issue.


What are some commercial and open-source solutions that provide it?


Since we work in the Oracle world, as well as the open-source database world, we’ll describe a few solutions available for Oracle and for MySQL.


Oracle Clustering (SE & EE)


Also known as RAC or Real Application Clusters, this evolved from the former Parallel Server product. RAC is implemented with shared disk between two or more locally connected computers. That is, they’re in the same room, rack, or hosting facility. Based on the discussion above, there are some real-world disasters that you would be vulnerable to here, but what does it help you with? Well, it helps you scale your application on commodity hardware, albeit with a few more moving parts, and somewhat more complicated administration. But with rolling upgrades and smart load balancing, as well as a distributed caching scheme called Cache Fusion, Oracle has taken this technology to the next level, and made it work.


Oracle Data Guard (Enterprise Edition)


For many enterprises, Data Guard is the high availability solution you want and need. It requires Enterprise Edition of the database, so it is not cheap, but it provides near-instantaneous shipping of transaction data to one or more remote databases, keeping them in sync with the production instance up to the second. These standby databases can be very remote, across the country, or around the globe.


Oracle Standby Database (Standard Edition)


The Standby technology is the heart of Oracle’s Data Guard solution, and is available with all versions of the software, including Standard Edition. However, Standard Edition does not provide the software layered on top to make it seamless, synchronous, and trivial to install. With Standard Edition you must settle for a manual standby database. For many enterprises, the cost savings is worth being behind the production system by perhaps fifteen minutes.


Heavyweight Internet Group has just such a solution, so if you’re looking to implement a standby database, and don’t want to buy Oracle EE, please contact us for details.


Tom Kyte’s article on standby versus RAC, and another on standby with Oracle SE


MySQL Clustering


In the MySQL realm, you have clustering available to you. Using the NDB storage engine (as opposed to MyISAM or InnoDB), you get clustering built in. Performance on those clustered tables is likely to be a bit slower than with other storage engines, but you have the confidence that your data is redundant. This, however, does not provide a geographically remote solution. For that you will need to look at MySQL Replication.


MySQL Replication


MySQL Replication can keep various tables in sync, but is best used with a single master. Your databases can be geographically remote; however, it is best to modify the application to point all changes towards one master node.
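
What "point all changes towards one master node" might look like in application code is sketched below in Python. This is only an illustration: the function name and host names are hypothetical, and a real application would also need to handle transactions and the fact that a replica can lag behind the master.

```python
import random


def pick_host(sql, master, replicas):
    """Route plain SELECTs to a replica; send everything else to the master."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    if verb == "SELECT" and replicas:
        return random.choice(replicas)
    # Any change, or anything we are unsure about, goes to the single master.
    return master


# Hypothetical host names, for illustration only.
MASTER = "db-master.example.com"
REPLICAS = ["db-replica-1.example.com", "db-replica-2.example.com"]

print(pick_host("UPDATE accounts SET balance = 0 WHERE id = 1", MASTER, REPLICAS))
print(pick_host("SELECT balance FROM accounts WHERE id = 1", MASTER, REPLICAS))
```

Routing on the first keyword is deliberately conservative: anything that is not clearly a read goes to the master, so no change can land on a replica by mistake.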


Conclusions


High Availability can be, like everything in computing, a loaded term, or the proverbial can of worms. However, with a good sense of real-world risks, and of what can fail in your computing environment, from hardware to software bugs to human error, you will get a better perspective on what is achievable. Only then can you formulate reasonable expectations, and plan for your business needs.

2. Upcoming Speaking Engagements


March 15th, New York Oracle User Group


in bed with Oracle – Lifting The Covers On Database Creation

Database creation, because of better GUI tools, has become a more & more overlooked area of Oracle. We pull back the covers, revealing what Oracle is doing at each stage. Why do we have startup nomount, mount, restrict, and open? What OS resources is Oracle using at each step? How do we issue CREATE DATABASE? What is the simplest init.ora file? How many file descriptors does Oracle use and why? From conception to birth, our microscope will reveal the secrets.


April 15th, Collaborate 2007 – Las Vegas Nevada


in bed with Oracle – Lifting The Covers On Database Creation

Database creation, because of better GUI tools, has become a more & more overlooked area of Oracle. We pull back the covers, revealing what Oracle is doing at each stage. Why do we have startup nomount, mount, restrict, and open? What OS resources is Oracle using at each step? How do we issue CREATE DATABASE? What is the simplest init.ora file? How many file descriptors does Oracle use and why? From conception to birth, our microscope will reveal the secrets.

3. New Articles

Oracle 10g RAC versus DataGuard for High Availability

4. Audio Interviews

This month we have the opportunity to talk with William Hurley aka Whurley, the Chairman of the Open Management Consortium.

In our interview, we discuss open-source and its impact on commercial software and solutions, and wrestle head on with some of the concerns people have on both sides of the fence.

William Hurley is the CTO at Qlusters, where he launched the openQRM project. He has been awarded IBM’s Master Inventor title and multiple awards for innovation at Apple Computer. Prior to joining Qlusters he was CTO and founder at Symbiot. He holds 11 patents for research and development at IBM, Tivoli Systems, and Apple Computer. He was recently elected Chairman of the Open Management Consortium.

5. Current Reading

How to Win Friends and Influence People – Dale Carnegie

Carnegie’s classic book is a great read. It is full of anecdotes, and simple rules to help you understand people better, get along with people better, and allow you to think as though you were in their shoes.


The Meaning of Lost and Mismatched Socks – Perditus Pedale

Dr Pedale takes us along a humorous and enlightening tale, using the metaphor we’re all familiar with, lost socks. A lighthearted read, it nevertheless underlines some of our more quirky human habits, and lets us see them in a new light.


The Wal-Mart Effect: How the World’s Most Powerful Company Really Works – Charles Fishman

Fishman brings the much maligned company to light, discussing them in the context of globalization, and how they impact our economy. As one reviewer notes, “like it or not” they are a mega-corporation that has turned many things, including everyday prices, upside down.


6. Lightweight Humor

Visit Gaping Void’s funny entry:

Fake Walmart Blog.

7. Miscellaneous


Improve Body Language:


Presentation Zen – Lessig Method


Dick Hardt’s famous Identity 2.0 presentation

8. Past Issues

Issue 27: Fragile Foundations

Issue 26: Logistical Fitness

Issue 25: Which Red Button
Issue 24: Consulting Conflicts of Interest
Issue 23: Devil In The Details
Issue 22: Beware of Software Fashion
Issue 21: Open Season, Open Sesame?
Issue 20: Better Web Better Business
Archive: Past Issues

9. Technical Articles

Oracle DBA Interview
Tools for the Intrepid DBA
Oracle9i + RAC on Linux/Firewire
Migrating MySQL to Oracle
MySQL Disaster Recovery

10. About Heavyweight Internet Group

In a nutshell, Oracle. We specialize in everything related to and surrounding the database technology, but specifically setup, admin and tuning of Oracle technology. I have 10 years’ experience with Oracle, wrote a book on the technology, and write and lecture frequently. I’m founder and senior consultant of the company. In capacities where your company might hire Deloitte, AIG, or Oracle Consulting, we can bring the same level of service and experience, at about half the price. Simple equation.

Looking for a top-flight DBA? Visit us on the web at iheavy.com.