The Concept of Data Warehousing is Fundamentally Flawed
Ever step back, think about what you’re doing, and then ask yourself, “Why?” Ever ask the same question about the concept of data warehousing? It’s time to face facts – while it may be necessary, the concept of data warehousing is fundamentally flawed.
Think about it. We already have all the data we need in our operational systems – so let’s get to work! “Hey, wait a minute,” says some wise-sounding tech expert. “Ya know what would be great fun? Let’s design a completely new database called a data warehouse. Then, let’s write programs to bring all of that data into our warehouse. Along the way, let’s integrate it all so we get a business view of it, rather than a source-specific view. Hey, let’s also make sure it’s clean. And, let’s make sure we’ve built all the infrastructure necessary to schedule jobs, trap errors, and verify totals. Oh, and let’s ask company managers and shareholders to pay for all of this.”
OK, is it just me or does this sound insane?
So, what’s better? Well, in a really good world, all your data and systems would be logically integrated from the start AND you’d be able to report directly from them.
In a perfect world, you wouldn’t have to integrate the data from multiple systems – you would have only one system and it would support all of your operational and informational (i.e. reporting and analysis) needs.
So, what – or who – is keeping us from this perfect world?
Who’s The Villain?
Who’s letting us down? Who’s making us spend all that extra money and do all that extra work just so we can actually use the data we capture?
Believe it or not, it’s the hardware vendors!
Hardware vendors? Why? Because they haven’t figured out how to master the laws of physics to give us infinite MIPS — infinite computing power.
Think about it; if we had infinite computing power, we’d put all of our data into a single, enormous, integrated, normalized database. That database would support both our operational and informational needs. It would be complex, but it could be made to look simple by layering views on top of it. It would keep all the history we could ever want because, well, why not? Best of all, response time to any query, no matter how complex, would be instantaneous. Why? Because we’d have infinite computing power.
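To make the "layering views" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is hypothetical and invented for illustration: the customer and sales_order tables, the v_customer_sales view, and the sample rows.

```python
import sqlite3

# In-memory database standing in for the single integrated database.
# The schema and data are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
    -- The view hides the join and presents a simple, report-friendly shape.
    CREATE VIEW v_customer_sales AS
        SELECT c.name                     AS customer_name,
               COUNT(o.order_id)          AS order_count,
               COALESCE(SUM(o.amount), 0) AS total_amount
        FROM customer AS c
        LEFT JOIN sales_order AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name;
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO sales_order VALUES (?, ?, ?)",
                 [(10, 1, 100.0), (11, 1, 50.0)])

# A report writer queries the view and never sees the normalized tables.
view_rows = conn.execute(
    "SELECT customer_name, order_count, total_amount "
    "FROM v_customer_sales ORDER BY customer_name"
).fetchall()
print(view_rows)
```

Note that an ordinary view like this one re-runs its join and aggregation every time it is queried. With infinite MIPS that wouldn't matter; with real hardware, that repeated work is exactly the cost in question.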
So, in the end, data warehousing is really just a way to make up for the fact that computer hardware (and maybe communication) vendors, with as many PhDs as they have, just haven’t done that one little thing we need them to do — create a computer with infinite MIPS.
Is Data Warehousing the Only Solution?
Given the fact that hardware providers are smart yet, clearly, not where we need them to be, we in the business intelligence field have come up with a “dirty” solution to help us get at our data: we build a data warehouse. We, in essence, do a lot of pre-processing on data because we don’t have the horsepower to do it when queries are issued. That pre-processing means integrating, aggregating, and modeling the data into user-friendly formats.
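As a rough illustration of that trade-off, the sketch below does the aggregation once, up front, so that a later query becomes a cheap lookup. The sample rows and the function names (build_monthly_summary, monthly_total) are made up for this example and are not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical operational rows: (region, sale_date, amount).
operational_sales = [
    ("East", "2024-01-05", 120.0),
    ("East", "2024-01-19", 80.0),
    ("West", "2024-01-07", 200.0),
    ("East", "2024-02-02", 50.0),
]

def build_monthly_summary(rows):
    """Pre-process once (say, in a nightly load): total sales per region and month."""
    summary = defaultdict(float)
    for region, sale_date, amount in rows:
        month = sale_date[:7]  # keep the "YYYY-MM" part of the date
        summary[(region, month)] += amount
    return dict(summary)

def monthly_total(summary, region, month):
    """Query time: a dictionary lookup instead of a scan over every sale."""
    return summary.get((region, month), 0.0)

monthly_summary = build_monthly_summary(operational_sales)
print(monthly_total(monthly_summary, "East", "2024-01"))
```

The work hasn't gone away; it has simply been moved from query time to load time, which is the whole bargain a warehouse makes.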
But, is this the only way to do the job? Perhaps, given our lack of infinite MIPS, it is. Still, the idea of a single, enterprise-wide database is enticing. And, actually, there is a partial solution that, while not eliminating the need for informational data stores (i.e. data warehouses and data marts), minimizes the effort required to build them. That partial solution is integrating operational systems or, in its more common form, master data management.
Integrating data before you build a data warehouse has a number of advantages:
- It makes building the warehouse easier and cheaper
- It ensures that, operationally, the whole organization is seeing the same picture (unlike one company that called us for help after different data definitions led to a multimillion-dollar ordering mistake)
- It creates a logical view of the single database concept, bringing you closer to that true picture of one, integrated database underlying your entire company
- It opens you up to reporting out of a new generation of BI tools, ones that integrate data but don’t require traditional data warehouses OR stress your operational systems each time a query is run
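One small piece of what integrating operational systems can look like is a cross-reference that maps each system's local keys to a shared master ID. The sketch below assumes two hypothetical systems ("crm" and "billing") with invented keys and amounts; real MDM products do far more than this.

```python
# Hypothetical cross-reference: (source system, local key) -> master customer ID.
master_xref = {
    ("crm", "C-001"): "CUST-1",
    ("billing", "9001"): "CUST-1",  # same customer, different local key
    ("crm", "C-002"): "CUST-2",
}

# Hypothetical records from each system: (local key, amount).
crm_orders = [("C-001", 100.0), ("C-002", 40.0)]
billing_invoices = [("9001", 100.0), ("9999", 15.0)]

def to_master(system, records):
    """Re-key records to master IDs and collect any keys with no mapping."""
    matched, unmatched = [], []
    for local_key, amount in records:
        master_id = master_xref.get((system, local_key))
        if master_id is None:
            unmatched.append((system, local_key))
        else:
            matched.append((master_id, amount))
    return matched, unmatched

crm_matched, crm_unmatched = to_master("crm", crm_orders)
billing_matched, billing_unmatched = to_master("billing", billing_invoices)
print(crm_matched, billing_matched, billing_unmatched)
```

Once both systems agree on who "CUST-1" is, any warehouse load (or BI tool) that sits on top of them has one less reconciliation job to do. The unmatched list is the part that still needs a human decision.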
So, since we don’t have infinite MIPS (yet), data warehouses and data marts do accomplish a lot and are still mostly necessary for now. But, before building a warehouse, determine whether there are other, less invasive ways to deliver the BI you need. If a warehouse still makes sense, remember that integrating data between your operational systems via an MDM initiative will save you headaches, lower your cost of warehousing and, in some cases, maybe even eliminate the need for a data warehouse.