MySQL Consulting in New York

Heavyweight Internet Group provides Open Source Database Professional Services and Consulting to fortune 500 companies. With our low overhead, and focused specialty we can offer very competitive prices. Our value add is simple: aggressive pricing, and personalized service. Contact us for details at 212-533-6828. Our services include:

  • New MySQL database setup and administration
  • MySQL & Postgres tuning of problem areas
  • Correcting degraded MySQL application performance
  • Remote DBA – 24×7 Support Services
  • High profile deployments & Migrations to EC2
  • Running MySQL in the cloud, on Amazon Web Services, EC2 & EBS
  • Migrating Web Applications to Amazon RDS
  • Tuning and optimizing challenges for open-source databases and Amazon RDS

We have 20 years of experience working on Open Source databases in all types of industries including banking, finance, education, entertainment, media and government. Our consultants are experts in the field, with published material including books, online and print magazine articles, and lectures. Please also feel free to browse our business newsletter archives. Our monthly newsletter discusses business best practices in Oracle consulting, and Open Source integration. We are conveniently located in Rockefeller Center, and are available for onsite meetings at your New York City offices.

Oracle Expert – New York City

Looking for a top-flight DBA?  We are subject matter experts in Oracle and MySQL, especially integrating these database platforms with open source technologies.

We’ve always been about open-source technology, integration & mixing commercial technologies such as Oracle with open-source ones such as MySQL and Linux. As open-source becomes mainstream, and more shops consider moving critical services to these technologies, we continue to provide assistance and expertise for these transitions. Whether it is performance testing and tuning, benchmarking, high availability or recovery, we can provide services for your specific needs.

call_quote

MySQL Professionals

Looking for a top-flight MySQL DBA to help tune or troubleshoot your application?  You’ve come to the right place.  At Heavyweight Internet Group we have fourteen years experience identifying bottlenecks in your services and systems, and resolving intractable problems.  We tune, optimize and rearchitect your application as necessary, to deliver you a better site.  One that performs better, and handles all of your customers as your business grows.

call_quote

Open Insights 51 – Stretch Your Database Dollar

Open Insights Newsletter

Issue 51 – Stretch Your Database Dollar

  January 1, 2009

by Sean Hull


2009 is finally here.  Let’s put the struggles of 2008 and the last few months behind us, and take the downturn as an opportunity to dig in, work harder, and get creative in business. 


In This Issue:

1. Feature: Stretch Your Database Dollar

2. Current Reading

3. Podcast Reviews

4. Past Issues

5. Technical Articles

6. Audio Interviews

7. Lightweight Humor

8. About Heavyweight Internet Group


1. Feature: Stretch Your Database Dollar

I’ve taken to listening to some podcasts by the Financial Times of late.  A particularly good one which is very relevant to Information Technology is called Digital Business.  It’s a weekly show hosted by Peter Whitehead, Editor of FT.com’s Digital business section. 

The last episode of 2008 is one of particular interest, and I’d recommend it to readers of this newsletter.  In it  Whitehead discusses trends for 2009.  Obviously with the recent shift in markets and availability of credit 2009 will be a lot about stretching your budgets, and making your dollar go further. 

Interviewee Alan Kane suggests that virtualization and going green to save money will both be big trends in 2009.  Steven Pritchard emphasizes Software As A Service, Cloud Computing, and notably open source, will all be big in 2009.  It will also be a year for the public sector, with all the new government spending initiatives planned.  Finally, Whitehead adds that as web2.0 has ramped up it has opened a whole new category of vulnerabilities thus web2.0 security will become a new priority.  He also loves Twitter, and points to it’s recent explosive growth as a trend which will surely be significant in 2009.

Obviously we’ve been talking about open-source for years.  With the heady mix of flexibility, few licensing restrictions, and customizability, it has always appealed to the do-it-yourself side of the IT world.  But as IT managers look to optimise their budgets, I can definitely see how open-source will win even greater consideration. 

In fact anecdotally I’ve already seen an up tick in the number of companies calling us about MySQL in the fourth quarter of 2008.  MySQL has matured from the web-facing mini-database of five or ten years ago to a real force to be reckoned with.

MySQL for those new to the technology, started out as a lightweight, primarily web-facing database.  It’s limited feature set and transactional support posed little problem for the mostly read-only web-based applications it was designed for.  But as MySQL 4 came online, a transactional storage engine named InnoDB was added providing read-consistency, and recoverability of data.  In 5.0 more and more enterprise features have become available, including stored procedures, functions, views, subqueries, complex joins, as well as a whole host of sophisticated caching mechanisms to make data access even faster.  I now hear from clients supporting terabyte MySQL databases, using partitioning and other features previously only available in the more powerful commercial databases like Oracle. 

If I might add my own predictions of trends for 2009, I would reaffirm Pritchard’s forecast of growth in open-source adoption, as IT shops look to squeeze more out of their budgets.  In particular we expect MySQL to be a big part of this.  We’ve already seen shops moving some of their enterprise data to MySQL in an effort to gradually test the waters.  I firmly expect this trend to continue as IT staff & managers’ confidence in the

technology grows, and as familiarity with its strengths and weaknesses improves.  Awareness of where the technology can fit in well, what type of jobs it is well suited for, and how to roll it into the mix without disrupting services will be crucial.

For DBAs all this will mean beginning to look at the technology, and building test and sandbox environments.  For managers it will mean porting small and peripheral applications, testing for performance, reliability, benchmarking, and recoverability.  All of these will weigh heavier as you look at moving more core business functions onto the open-source database platform.


2. Current Reading

Reality Check by Guy Kawasaki

Some like irreverent writing, and some don’t.  Kawasaki’s subtitle is "the irreverent guide to outsmarting, outmanaging, and outmarketing your competition" sets the tone right out of the gates.  I find his writing to be smart and hardhitting, and very relevant.  This new book is no exception.  The Standard has a piece Five reasons why a recession is a good time to start a company.  I have to agree, and Kawasaki’s book will provide you with tons of cut-to-the chase, and get-the-job-done advice.  Highly recommended.


3. Podcast Reviews

I’ve been trolling through iTunes in the last few weeks, and have found some real gems.  With that in mind I decided to add a new section to the newsletter to focus on interesting, and technology relevant shows that I find.  By far the best one I’ve stumbled upon is Financial Times – Digital Business

For instance the September 10th episode of this year talked about Mesh Collaboration, Globalization, and Social Networking.  Definitely worth a listen.

You can read it online with updates almost everyday, and the print edition comes out on alternate wednesdays.  The podcast you can find here.


4. Past Issues

Issue 50 – Do Your Dishes

Issue 49 – Things Fall Apart

Issue 48 – Balancing Time & Money

Issue 47 – Change the Problem

Issue 46 – Interests Aligned

Issue 45 – Contractualities

Issue 44 – Gaining Legs

Newsletter Archives


5. Technical Articles

Intro to Oracle’s Automatic Workload Repository

Intro to PHP + Oracle

Useful PL/SQL Packages

Oracle Automatic Storage Management

Programming Perl + Oracle


6. Audio Interviews

Though we haven’t added a new audio interview in a while, we certainly plan to do some new interviews in the coming months.  So please stay tuned.  In the meantime, please listen to our past audio interviews.

In our last interview we had the opportunity to talk with Norman Yamada CTO of Millburn Corporation.

Norman shares with us his experiences providing world-class computing solutions, and the pros and cons of doing it with open source.

We are hosting our podcast to Odeo.  It is a great service, and provides all the RSS and subscribe links automatically.  So please subscribe if you haven’t already.


7. Lightweight Humor

The Onion does it again, this time with "Apple Employee Fired for Thinking Different"


8. About Heavyweight Internet Group

We’ve always been about open-source technology, integration & mixing commercial technologies such as Oracle with open-source ones such as MySQL and Linux.  As open-source becomes mainstream, and more shops consider moving critical services to these technologies, we continue to provide assistance and expertise for these transitions.  Whether it is performance testing and tuning, benchmarking, high availability or recovery, we can provide services for your specific needs.

Looking for a top-flight DBA?  Visit us on the web at www.iheavy.com.

Oracle DBA Interview Questions

Oracle Database Administrator or often called DBAs are an indispensable part of your operations team. They manage the systems that house all your business data, your customers, products, transactions and all that analytical data on what customers are actually doing. If you’ve ever been on the hunt, you may wonder, why the shortage of DBAs? To that we’ll answer, have you ever heard of Dustin Moskovitz?

So you certainly want to entrust that to someone who knows what they’re talking about. Enter the Oracle DBA Interview, a process that some will see as a technical test, while others will see as a fit of personalities, behaviors, and work ethic.

From the technical side we thought we’d bring you a quick and dirty checklist of questions. This isn’t an exhaustive list by any means, but is a good place to start and will certainly provide you with a glimpse of their knowledge.

Also if you’re looking to hire a MySQL DBA here’s a guide, and also one for hiring and EC2 expert.

1. What is the difference between RMAN and a traditional hotbackup?

RMAN is faster, can do incremental (changes only) backups, and does not place tablespaces into hotbackup mode.

2. What are bind variables and why are they important?

With bind variables in SQL, Oracle can cache related queries a single time in the SQL cache (area). This avoids a hard parse each time, which saves on various locking and latching resources we use to check objects existence and so on. BONUS: For rarely run queries, especially BATCH queries, we explicitely DO NOT want to use bind variables, as they hide information from the Cost Based Opitmizer.

BONUS BONUS: For batch queries from 3rd party apps like peoplesoft, if we can’t remove bind variables, we can use bind variable peeking!

3. In PL/SQL, what is bulk binding, and when/how would it help performance?

Oracle’s SQL and PL/SQL engines are separate parts of the kernel which require context switching, like between unix processes. This is slow, and uses up resources. If we loop on an SQL statement, we are implicitely flipping between these two engines. We can minimize this by loading our data into an array, and using PL/SQL bulk binding operation to do it all in one go!

4. Why is SQL*Loader direct path so fast?

SQL*Loader with direct path option can load data ABOVE the high water mark of a table, and DIRECTLY into the datafiles, without going through the SQL engine at all. This avoids all the locking, latching, and so on, and doesn’t impact the db (except possibly the I/O subsystem) at all.

5. What are the tradeoffs between many vs few indexes? When would you want to have many, and when would it be better to have fewer?

Fewer indexes on a table mean faster inserts/updates. More indexes mean faster, more specific WHERE clauses possibly without index merges.

6. What is the difference between RAID 5 and RAID 10? Which is better for Oracle?

RAID 5 is striping with an extra disk for parity. If we lose a disk we can reconstruct from that parity disk.

RAID 10 is mirroring pairs of disks, and then striping across those sets.

RAID 5 was created when disks were expensive. Its purpose was to provide RAID on the cheap. If a disk fails, the IO subsystem will perform VERY slowly during the rebuild process. What’s more your liklihood of failure increases dramatically during this period, with all the added weight of the rebuild. Even when it is operating normally RAID 5 is slow for everything but reading. Given that and knowing databases (especially Oracle’s redo logs) continue to experience write activity all the time, we should avoid RAID5 in all but the rare database that is MOSTLY read activity. Don’t put redologs on RAID5.

RAID10 is just all around goodness. If you lose one disk in a set of 10 for example, you could lose any one of eight other disks and have no troubles. What’s more rebuilding does not impact performance at all since you’re simply making a mirror copy. Lastly RAID10 perform exceedingly well in all types of databases.

7. When using Oracle export/import what character set concerns might come up? How do you handle them?

Be sure to set NLS_LANG for example to “AMERCIAN_AMERICA.WE8ISO8859P1”. If your source database is US7ASCII, beware of 8-bit characters. Also be wary of multi-byte characters sets as those may require extra attention. Also watch export/import for messages about any “character set conversions” which may occur.

8. How do you use automatic PGA memory management with Oracle 9i and above?

Set the WORKAREA_SIZE_POLICY parameter to AUTO and set PGA_AGGREGATE_TARGET

9. Explain two easy SQL optimizations.

a. EXISTS can be better than IN under various conditions

b. UNION ALL is faster than UNION (not sorting)

10. Name three SQL operations that perform a SORT.

a. CREATE INDEX

b. DISTINCT

c. GROUP BY

d. ORDER BY

f. INTERSECT

g. MINUS

h. UNION

i. UNINDEXED TABLE JOIN

11. What is your favorite tool for day-to-day Oracle operation?

Hopefully we hear some use of command line as the answer!

12. What is the difference between Truncate and Delete? Why is one faster?

Can we ROLLBACK both? How would a full table scan behave after?

Truncate is nearly instantaenous, cannot be rolled back, and is fast because Oracle simply resets the HWM. When a full table scan is performed on a table, such as for a sort operation, Oracle reads to the HWM. So if you delete every single solitary row in 10 million row table so it is now empty, sorting on that table of 0 rows would still be extremely slow.

13. What is the difference between a materialized view (snapshot) fast refresh versus complete refresh? When is one better, and when the other?

Fast refresh maintains a change log table, which records change vectors, not unlike how the redo logs work. There is overhead to this, as with a table that has a LOT of indexes on it, and inserts and updates will be slower. However if you are performing refreshes often, like every few minutes, you want to do fast refresh so you don’t have to full-table-scan the source table. Complete refresh is good if you’re going to refresh once a day. Does a full table scan on the source table, and recreats the snapshot/mview. Also inserts/updates on the source table are NOT impacted on tables where complete refresh snapshots have been created.

14. What does the NO LOGGING option do? Why would we use it? Why would we be careful of using it?

It disables the logging of changes to the redologs. It does not disable ALL LOGGING, however as Oracle continues to use a base of changes, for recovery if you pull the plug on the box, for instance. However it will cause problems if you are using standby database. Use it to speed up operations, like an index rebuild, or partition maintenance operations.

15. Tell me about standby database? What are some of the configurations of it? What should we watch out for?

Standby databases allow us to create a copy of our production db, for disaster recovery. We merely switch mode on the target db, and bring it up as read/write. Can setup as master->slave or master->master. The latter allows the former prod db to become the standby, once the failure cause is remedied. Watch out for NO LOGGING!! Be sure we’re in archivelog mode.

Hey you! If you made it this far, definitely grab our newsletter.

MySQL Optimization

Experiencing performance problems with your MySQL 4.x or 5.x database? We can help. We have over a decade of experience tuning sophisticated multi-tier web applications. This allows us to identify and pinpoint problems quickly, drilldown to the relevant application and SQL, and sift out the problem code.

All this makes your applications run faster, and your customers happy!

call_quote

MySQL DBA Interview Questions

One of the more popular articles on our site according to Google is the Oracle DBA Interview Questions article we did a few years ago. So with that in mind, we’ve put together a similar article for MySQL DBA Interviews.

1. Explain two ways that MySQL Replication can get out of sync. What are the solutions to these problems?

One way is if your code contains non-deterministic functions such as SYSDATE, USER and UUID. The second way is if you have mixed transactions between InnoDB and MyISAM tables, in some cases those can get replicated incorrectly.

2. How does one create a new user and give it privileges on an existing database? Why should “with grant option” be avoided?

mysql> grant all privileges on test.* to ‘newuser’@’localhost’ identified by ‘mypassword’;

3. Explain the differences, advantages and disadvantages to using the MERGE storage engine, versus using Partitioned tables to manage large datasets.

The MERGE storage behaves much like a view with UNION ALL between the tables. It is easy to add and remove tables and redefine the MERGE table. They are good for logging and huge datasets.

One of the primary differences between partitioning is that a row can exist in one and only one partition, which is not the case with a MERGE table. Also you can cluster data together in certain partitions reducing the amount of work the server may have to do to get at related data. Also partitioned data can be distributed across multiple harddrives.

4. How do you determine what storage engines are installed?

mysql> show global variables like 'have%';                                       
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| have_archive          | NO       |
| have_bdb              | YES      |
| have_blackhole_engine | NO       |
| have_compress         | YES      |
| have_crypt            | YES      |
| have_csv              | NO       |
| have_dynamic_loading  | YES      |
| have_example_engine   | NO       |
| have_federated_engine | NO       |
| have_geometry         | YES      |
| have_innodb           | YES      |
| have_isam             | NO       |
| have_merge_engine     | YES      |
| have_ndbcluster       | NO       |
| have_openssl          | DISABLED |
| have_query_cache      | YES      |
| have_raid             | NO       |
| have_rtree_keys       | YES      |
| have_symlink          | YES      |
+-----------------------+----------+
19 rows in set (0.00 sec)
mysql>

5. How do you get the query cache status? How do you tune it?

mysql> show global status like 'qcache%';                                        
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.16 sec)
mysql>

Tune it by looking at qcache_hits/(qcache_hits+com_selects)

6. What is DRBD?  Explain the advantages and disadvantages to MySQL Replication for High Availability.

7. What is circular replication?  How is it different from master-slave replication?