The data lab is a more recent term in the big data and data science world. But it’s an important one, because it can be a fast route to uncovering value in new big data as well as business data in an existing data warehouse. Let’s take a look at why you should consider a data lab, and some of the key requirements for success.
There are lots of ways that big data differs from (or maybe “expands upon” is better) the data that sits in a data warehouse and is used to run your business. But perhaps the key difference is that you just don’t know what questions it is capable of answering. Think about monthly sales figures. You can query those to find out who sold what, who bought what and so on. Put more simply, you know what questions that data is capable of answering, and you can ask those questions using typical visualization or reporting tools.
But with new data sources things are different. You have location data, web log files, data from sensors, weather data, traffic flow and more. There’s value hidden in all that data, but you’re not sure what it is. That’s where the data lab comes in.
The data lab is a separate environment built to allow your analysts and data scientists to figure out the value hidden in your data. The data lab helps you find the right questions to ask and, of course, put those answers to work for your business.
But why a separate environment for the data lab? It’s all about resources. Consider the following scenario.
It’s late at night on the last day of the quarter. In one part of the building, finance is busy closing the books, initiating the scripts and applications that will generate the reports for executives the next morning. It’s a critical time. Elsewhere in the building, somebody has lost track of the date as they’ve been working on a particularly vexing problem for days. But perhaps the end is in sight, because a particularly resource-intensive machine learning algorithm has been showing some promise and it’s time to try it out on the whole data set.
If there’s one thing you need in a production environment, it’s predictability. You want workloads to run and finish on time. But when you’re experimenting and trying to figure things out, predictability is not on your list. In the example above, somebody could unintentionally do significant damage to the business with their experimentation, because the experimental workload competes with the quarter-end reporting jobs for the same resources. That’s just one reason why you need to move experimentation away from your production environment.
I’ll identify four key roles that you need to consider.
Data Scientist
A successful data lab project will have these three roles, and others, working together as a team. Demand for those functions will vary over time and with different projects. And very often you’ll find people who can combine two or very occasionally three different functions. For example, surveys have shown that many data scientists spend as much as 80% of their time doing data engineering work (look up the term “data janitor” which on occasion is used pejoratively), readying data for use.
I left off one role, that of developer: somebody who is perhaps more involved with putting the results of the lab to work than in the core work of the lab itself.
The lab is not the end result. Rather, it’s a way to generate new insights that can be put to productive use. It’s important to figure out upfront how you’re going to turn insight into value. And if you’re starting a data lab project for the first time, you want that value to be visible quickly to maintain or gain organizational support for the work. In broad terms, here are three ways to go about monetizing your data lab.
Build Actionable Reports
You can build a data lab anywhere, but the cloud best enables you to address some of the unique issues you encounter in a data lab environment.
Experiments Don’t Always Produce Results
If you're thinking about a data lab in the cloud, a good first step would be to build a data lake to experiment with. Try our free guided trial in the cloud and get started with your data lab.