BI Predictions for 2015 (1 of 4): Hadoop and Data Warehousing – Get Prepared

While different people will point to different times for the birth of business intelligence, I’ve always argued that BI started with the invention of writing.  Writing was invented, in part, as an accounting device.  How do I know how many cows I have?  How many did I have last year?  Write it down, report it and, bingo – business intelligence!

Regardless of when it started, like just about everything else in the world, BI has changed a lot – and the pace of change is accelerating.  With that in mind, here is the first of four predictions for business intelligence in 2015, and beyond.

Hadoop Matures, But Isn’t (Yet) the Default Technology for Enterprise Data Warehouses

Looking five to ten years out, I have little doubt that Hadoop becomes one of the leading platforms, if not the leading platform, for housing large data warehouses.  The economics and the flexibility are just too strong.  Still, I don’t believe we’re there today.  Why?  Because, even though vendors are investing money, the tool sets are still maturing, and the solutions they are developing point in competing directions.

Consider, for example, the limitations of MapReduce – the batch processing engine at the core of traditional Hadoop implementations.  Because MapReduce jobs run in batch, users find that they can’t query data interactively as soon as it lands in Hadoop.  To address this issue, will near-real-time access to data be provided by Cloudera Impala, by Hortonworks’ Stinger enhancements to Hive, or by some other solution?  The point?  The standards haven’t shaken out…yet.

Also, in its current state, Hadoop can require a fair amount of hand coding in languages that the average BI professional does not know well, such as Java, R, and HiveQL – Hive’s query language, which resembles SQL but is not standard SQL.
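
To give a feel for what that hand coding involves, here is a minimal sketch of the map, shuffle, and reduce pattern written in plain Python.  It is not Hadoop code and does not use any Hadoop API; the function names and the sample "region, amount" records are made up purely for illustration.  In a relational warehouse, the same result would come from a single statement along the lines of SELECT region, SUM(amount) FROM sales GROUP BY region.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: turn each raw 'region, amount' line into a (key, value) pair."""
    for line in records:
        region, amount = line.split(",")
        yield region.strip(), float(amount)


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def total_sales_by_region(records):
    """Run the three steps in order, as a batch job would."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample data, standing in for files stored in Hadoop.
sample = ["east, 100.0", "west, 250.5", "east, 49.5"]
totals = total_sales_by_region(sample)
print(totals)
```

Even in this toy form, a one-line aggregate turns into three functions that someone has to write, test, and maintain, which is roughly the gap a SQL-trained BI professional faces.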

What I’ve noticed in the Hadoop world is that organizations with big-data data warehouses are ‘using’ it.  However, drill down on ‘using’ a bit and you find three realities:

  1. Users who are actually querying against Hadoop fall into the data scientist category – folks who are very technical and whose needs vary widely from day to day.
  2. Most mainstream use tends to be as a staging area (between source systems and data warehouses) or as an archive, but not as the primary data warehouse.
  3. Companies that say they are moving their data warehouses to Hadoop are frequently experimenting with that goal in mind.  For example, I recently spoke with folks from a multibillion-dollar organization that is “moving” its data warehouse from a data warehouse appliance to Hadoop.  In fact, they’re rightfully excited about the fact that they’ve already ported their data into Hadoop.  However, on further analysis, it turns out that while they’ve moved their data into Hadoop, they haven’t tried to query that data yet.  So, while Hadoop is their destination, they’re not there yet.

RECOMMENDATION: Hadoop is inevitable.  However, unless your organization is comfortable striking out on the leading edge, start by experimenting with it.  Grow your skill set.  Then move Hadoop into the data staging arena.  Once you’ve developed skill sets and confidence, consider using it to replace traditional, relational technologies in your data warehouse.

Read BI Predictions 2 of 4

Let us know your thoughts on our BI Prediction 1 of 4 and check “Notify me of new posts by email” below to make sure you don’t miss the next three predictions.  Add your comments below, feel free to email me at, and, of course, have a great holiday season!

