Or, how to avoid building a Data Swamp
So, you’re thinking about a data lake for your organization. You might even have the green light to start the planning stages.
We’re big believers in the power of the data lake. It can often be a significantly cheaper way to store your data, but that’s not the most attractive part. No, it’s the fact that it can hold your structured and unstructured data, internal and external data, and enable teams across the business to discover new insights.
But here’s the thing – the hype has run away (a little) with the data lake. A data lake is not something you can implement with a snap of your fingers. The rewards are enormous, but it still takes work and strategy, and that’s why we want to help you avoid some mistakes. Let's create an easier path to data lake nirvana.
We’ve gathered insights from experts Larry Fumagalli and David Bayard of Oracle’s Cloud Platform Team for best practices and what not to do.
First, of course, think about whether your data lake is going to be located in the cloud or on premises. Do you have to create your data lake on premises because of regulatory or business requirements? Or can you locate your data lake in the cloud and take advantage of the new data lake architecture, which we’ll describe in more detail below? Perhaps you can talk the exec team into trying cloud if you have your own private cloud.
There are pros and cons to each of these methods, but that’s a topic for an entirely different article. Today, we’ll focus on data lake best practices overall.
1. Start With a Business Problem or Use Case for Your Data Lake
Over and over, we’ve found that customers who start with an actual business problem for their data lake are often more effective. They are more likely to have results to point to and information that will please the higher-ups. They also tend to get the job done more quickly and easily, because they remain focused.
This may seem like a basic piece of information, but we include it here because there still exists a tendency for IT to turn their data lake into a science project; they want to play with it and experiment and build a dream data repository.
And they tend to assume that once that dream is a reality, it will solve all use cases, and business teams will simply come to them with their data questions and issues. In reality, this rarely happens. It’s better to start with a business problem in mind, stay focused, and solve it.
Guest author Sherry Tiao is a Product Marketing Senior Manager for Oracle Big Data Cloud.