I build data warehouses. I understand why they’re important, I make a living from them. I also see that traditional, relational data warehouses are on the way out. Their demise is coming from a few technological advances. One of these is the ubiquitous, and growing, Hadoop. But, for small to mid-sized companies, the biggest advance is the growing use of in-memory reporting technologies, like QlikView.
Attributes of New Reporting Technologies
I’ve been working with QlikView for some time now. I’ve also been working with some clients that are in the process of adopting it. Here are some attributes of QlikView, and of similar tools (like Tableau), that are killing the traditional data warehouse:
- They contain their own self-managing data stores.
- They can import data from multiple sources into single, unified data stores.
- They join related data together, like relational databases.
- They provide predictable, blisteringly fast query performance.
- They provide very easy, user-friendly user interfaces.
- They can contain, and rapidly summarize, atomic-level, granular data.
- They can be incrementally refreshed, enabling the storage of history.
- They support rapid development and replace a number of the architecture layers of traditional business intelligence approaches.
Attributes of a Data Warehouse
So, how does this lead to the demise of the data warehouse? Bill Inmon originally defined a data warehouse as a “Subject-oriented, integrated, nonvolatile, time variant collection of data in support of management decisions.” In layman’s terms, a data warehouse is a database used for reporting and analysis and containing data that’s been collected from various source systems.
More Importantly – The Goals of a Data Warehouse
More important than definitions, however, are the goals of the data warehouse:
- To give business people speedy access to data for business intelligence.
- To eliminate the slowness that can be associated with reporting summary data out of complex, source databases.
- To protect the performance of source databases by offloading compute-intensive reporting to other computers.
- To make reporting easy and user-friendly.
- To provide an integrated view of the organization; to make it appear as though its data isn’t spread across separate systems; to make it look, at least for reporting and analysis, like the company is operating from one, central database.
- To save and provide access to history that is frequently discarded or overwritten in source systems.
Have no doubt, a well-designed data warehouse can be great at doing these things – at great cost and with significant complexity.
Do New Reporting Technologies Meet These Goals?
So, can a tool like QlikView replace a traditional data warehouse? Frankly, I see nothing on the above list of goals that these tools can’t do, especially if a company has a master data management program in place that ensures their operational systems already share common keys.
While these new reporting technologies can be pricey, the overall cost of implementing them is almost certainly going to be less than the cost of designing, building and maintaining data warehouses and then purchasing these same, or similar, tools to query from those warehouses.
Caveats – Why You May Still Need a Relational Data Warehouse
There are reasons why you might still need a traditional, relational data warehouse. These include:
- You have specific needs for specialized business intelligence tools that can only be used against SQL-based databases.
- You need real-time reporting of transactions (although, in most cases, this reporting should be done out of operational systems or intermediate operational data stores anyway).
- Your data must support multiple BI tools. Right now, the databases behind tools like QlikView can only be accessed with their own proprietary BI user interfaces. Thus you can’t, for example, access a QlikView associative database with tools like Business Objects, Cognos, MicroStrategy or Excel.
- Your source data is so massive that it will overwhelm the capabilities of your in-memory tool’s database. Applications like telephone call detail come to mind here.
- Your source data cannot be directly loaded into a new-generation BI tool and must be staged somewhere. An example of this is some Cloud-based systems that don’t provide strong programming interfaces for data access.
- Your source data does not share common keys and requires significant massaging to make it useful.
Is the Data Warehouse Dead – or Just Morphing?
Note that not all new BI technologies will kill the data warehouse. Some very powerful ones are still SQL-based. And, of course, SQL-based tools still need relational data sources, i.e. data warehouses.
Finally, it’s perhaps incorrect to say that the data warehouse is dead. It’s really just morphing, or better put, evolving. The definition of the data warehouse says nothing about the kind of storage technology that must be used. Thus, storing that data in an associative database, a multidimensional database, or even on punched cards doesn’t mean you don’t have a data warehouse. The trick, of course, is to make sure you’ve got something that supports your current, and future, needs at a reasonable cost.
My Recommendation
Before embarking on your first data warehouse initiative, I recommend that you take the time to fully determine where you’re trying to get to. Don’t build a data warehouse just because it’s ‘what people do.’ Instead, consider that an in-memory BI tool might address your needs. Even if you might outgrow it in a few years, your investment won’t be lost; you’ll learn a lot about your real needs during those years and you’ll be able to apply the in-memory tools to the bigger data warehouse, if and when you really need it.