Data warehousing – What is it and why is it important?

A data warehouse is a special type of database.  It is used to store large amounts of data, such as analytics, historical, or customer data, and then build large reports and data mining against it.  It is markedly different from a web-facing or high-transaction database, which typically has many many small transactions or pieces of data that are constantly changing, through many 100′s or 1000′s or small user sessions.  These typically execute in speeds on the order of 1/100th of a second, while in data warehouse you have fewer large queries which can take minutes to execute.

Data warehouses are tuned for updates happening in bulk via batch jobs, and for large queries which need big chunks of memory to sort and cross-tabulate data from different tables.  Often full table scans are required because of the specialized one-off nature of these reports.  The same queries are not executed over and over.

It’s important not to mix data warehousing databases with transactional databases in the same instance, whether you are dealing with MySQL or Oracle.  That’s because they are tuned totally differently.  It would be like trying to use the same engine for commuting to work, and a container ship traveling around the world.  Different jobs require different databases or databases that with their dials set for different uses.

Quora discussion of data warehousing – Sean Hull