A few weeks ago I wrote about the idea that there are really only two uses for BI: analysis and business process improvement. Today, let’s focus a bit more on analytic users / analysts. In particular, let’s consider the concept of the data ‘sandbox’. To us, a sandbox is an area of storage where a few highly skilled users can import and manipulate large volumes of data. The whole point of doing so is that these users frequently need data other than what’s in the warehouse. We’ve encountered the sandbox concept a few times recently at Dataspace, both in our consulting engagements and in conversations with other data warehousing professionals.
As a rule, sandbox data is only needed by a small user group and it’s only needed temporarily. Once the necessary knowledge is squeezed out, the data can be discarded. However, If the data doesn’t fit these criteria, it may be a good candidate for addition to the data warehouse for widespread access.
Think of a sandbox as a grown up version of Microsoft Access; it’s a place without a ton of IT controls and where folks can throw and manipulate data. With this in mind, there are two basic ways to provide a sandbox. The first is as a completely separate database where users can store and play with data. This is a common use case for Hadoop and other noSQL technologies.
The other option, and one we recently recommended to a client, is to set aside a portion of a data warehouse as a sandbox. This area would only be accessible to analysts and only under certain circumstances. This approach has one major advantage: new data sets that are loaded can easily be tied to existing data warehouse tables. There’s no need to transfer warehouse data to some other environment, and then have to go back for more if the first grab doesn’t contain all the necessary information.
One danger of sandboxes is that end-users might use them to build permanent applications outside the purview of IT. We strongly discourage this since the proper home for permanent analytic data structures is in the data warehouse itself. However, organizations frequently do need to examine data and prototype table designs before building permanent structures. A sandbox is an ideal environment for this type of work. In fact, Tom Carroll, one of our Dataspace consultants, spent a number of years in the telematics division of an auto manufacturer and used this technique extensively.
Regardless of how you implement a sandbox, it is important to establish ground rules before opening the environment to user access. Starting point guidelines might include:
- The environment is for short term studies and prototyping only, and is not to be used as a permanent home for applications or data
- Each user is assigned a controlled amount of storage, and governors are put in place to limit the computing resources they can consume
- Users must demonstrate both technical expertise and a willingness to abide by the environment’s rules before participating
- All sandbox users, and recipients of analyses produced in the sandbox, must understand that the analyses were produced without the level of validation and quality control provided by applications built on the “managed” portion of the data warehouse.
In order to keep the sandbox from becoming a permanent home for databases, some organizations run system reports on the sandbox structures to see how long they have existed. If something was built, say, six months ago, IT is justified in asking its owner what the purpose of the structure is and what the time frame is for the database to become a ‘certified’ piece of the data warehouse.
As for how much space to give sandbox users, the answer is determined by balancing the cost of sandbox space with the need to ensure enough space is available for promising prototyping and analysis activities.
In the end, the “democratization” of technology has made it likely that some end users will have significant database and SQL skills. It can be game changing for both these users, and their organizations, to provide them with space to pursue data-based ideas and possibilities. However, these users must understand that they are working inside IT-supported environments. In fact, we recommend that a written agreement between these users, their managements, and IT be put in place before they are granted sandbox access. This agreement should outline the ground rules for the sandbox, the space to be granted to the user, allowed and disallowed activities, and any other guidelines that will make for a smooth-running environment.
Does your organization have a data sandbox in place? What’s working? What’s not? Feel free to comment below.
Thanks for reading!