The modern enterprise collects huge volumes of valuable data at an ever-accelerating rate. This data can come from internal sources, such as transactions, website activity, and device sensors, or from external sources, including social media, government data, and numerous others. But many companies are still struggling with the next step: Deriving true business value from this data deluge to better understand customers, create products, and enter new markets.
Increasingly, people see the cloud as the solution: Data flows there easily, and the capacity to store it and the computing power to analyze it are virtually limitless. Scaling infrastructure on-premises, by contrast, requires capital investments that not every enterprise can make. But cloud technology, while critical to creating business value from data, is only one piece of the puzzle. Enterprises also need to foster a data-driven culture and ongoing processes to succeed in the data game.
Many enterprises don’t get as much out of their data as they could, because their culture doesn’t support it. They think of data as a liability instead of an opportunity. If an enterprise doesn’t promote and reward equal opportunity and curiosity-driven practices, there’s no reason to believe the way it works with data is any different. In my experience, if you want to generate data-driven insights, it’s rarely the technology that stands in the way.
The best way to move data cost effectively is not to move it at all. That’s the premise on which frameworks like Hadoop were built: Bring computing close to the data to avoid moving the data around. As networking speed and capacity have increased, we’ve realized that you don’t necessarily need to get all of your data in one place. You need to know what it is, where it is, and how to access it in real time.
Solutions like data catalogs can index metadata to make this happen. Cloud computing not only accelerates and facilitates this indexing, but also permits the rapid development and deployment of next-generation applications that use machine learning (ML) and artificial intelligence (AI), without the need to recreate the whole infrastructure.
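To make the data catalog idea concrete, here is a minimal sketch of one: it indexes only metadata (name, location, schema, tags), so datasets can be discovered and accessed in place rather than copied around. All names, URIs, and fields here are illustrative assumptions, not any particular product's API.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    name: str
    location: str               # e.g. an object-store URI; the data itself stays put
    schema: dict                # column name -> type
    tags: list = field(default_factory=list)

class DataCatalog:
    """Indexes metadata about datasets; never stores the data itself."""

    def __init__(self):
        self._index = {}

    def register(self, entry: DatasetEntry):
        self._index[entry.name] = entry

    def find_by_tag(self, tag: str):
        # Discovery query: answer "what do we have, and where is it?"
        return [e for e in self._index.values() if tag in e.tags]

catalog = DataCatalog()
catalog.register(DatasetEntry(
    name="web_clicks",
    location="s3://analytics/web_clicks/",   # hypothetical location
    schema={"user_id": "string", "ts": "timestamp"},
    tags=["clickstream", "external"],
))
print([e.location for e in catalog.find_by_tag("clickstream")])
```

A real catalog adds access control, lineage, and freshness tracking, but the core value is the same: knowing what data exists and where, without moving it.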
With the culture and technology in place, creating value from data drawn from internal and external sources is a five-step process.
Start with domain expertise. Gathering lots of data doesn’t move you forward. You need the domain expertise to identify what datasets are relevant, what parameters of these datasets are relevant, how they relate to each other, and how to interpret the data into actionable analysis.
Develop data expertise. While domain expertise is a crucial competitive differentiator, you also need data science capabilities: people who can transform and map raw data into a form that can be analyzed, and who can develop and train the machine learning models that facilitate automation.
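The transform-and-map part of this step can be as simple as normalizing records from different sources into one analyzable shape. The field names and sources below are assumptions for illustration only:

```python
# Sketch of normalizing raw records from two hypothetical sources
# (different field names, inconsistent types) into one consistent form.

def normalize(record: dict) -> dict:
    """Map a raw record into a consistent, analysis-ready form."""
    return {
        "customer_id": str(record.get("cust_id") or record.get("customer")),
        "amount": float(record.get("amount", 0)),
        "channel": record.get("channel", "unknown").lower(),
    }

raw = [
    {"cust_id": 17, "amount": "42.50", "channel": "WEB"},   # source A
    {"customer": "a-9", "amount": 3},                        # source B
]
clean = [normalize(r) for r in raw]
print(clean)
```

Real pipelines handle far messier inputs, but this is the essence of the step: downstream analysis and model training only work once the data shares a common schema.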
Establish process. Many organizations still struggle with adhering to a consistent software development process, which makes developing data-driven products more challenging. Not only are you managing different versions of the code, but you also have to manage different versions of the data and metadata, like provenance and trust. Without a clear development process, this management can quickly get out of hand.
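One way to keep data versions manageable is to record each version with a content hash plus provenance metadata, so any analysis can be traced back to the exact inputs it used. This structure is a hypothetical sketch, not any specific tool's format:

```python
import hashlib
from datetime import datetime, timezone

def register_version(registry: list, data: bytes, source: str, transform: str) -> dict:
    """Append a new dataset version with a content hash and provenance."""
    entry = {
        "version": len(registry) + 1,
        "sha256": hashlib.sha256(data).hexdigest(),       # identifies the exact content
        "provenance": {"source": source, "transform": transform},
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    registry.append(entry)
    return entry

registry = []
v1 = register_version(registry, b"raw,rows\n1,2\n",
                      source="crm_export", transform="none")
v2 = register_version(registry, b"clean,rows\n1,2\n",
                      source="crm_export", transform="dedupe+normalize")
print(v2["version"], v2["sha256"][:8])
```

The point is the discipline, not the mechanism: every derived dataset should answer "where did this come from, and what was done to it?"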
Acquire the data. Acquisition isn’t the first step, because you have to lay the groundwork with expertise and process first. There’s a lot of data out there, so you need to evaluate it for relevance and accuracy and integrate it with your enterprise data using your domain expertise. Which datasets are important? How do you combine them? Which algorithm should you use? All these difficult questions require clear answers.
Refine, retrain, repurpose. It’s critical to understand that data science is not a one-and-done proposition. It’s an ongoing process that starts with a generic model and is continually refined with real-world data and edge cases.
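The refinement loop can be illustrated with a deliberately simple stand-in for a real ML model: a running-mean estimator that starts from a generic prior and is updated incrementally as real-world observations arrive. The numbers are invented for illustration.

```python
class RunningMeanModel:
    """Toy stand-in for continual model refinement: a running-mean estimator."""

    def __init__(self, prior: float):
        self.estimate = prior   # start from a generic, pre-deployment model
        self.n = 1

    def update(self, observation: float):
        # Fold each new real-world observation into the estimate.
        self.n += 1
        self.estimate += (observation - self.estimate) / self.n

model = RunningMeanModel(prior=100.0)
for obs in [80.0, 90.0, 85.0]:   # production data, including surprises
    model.update(obs)
print(round(model.estimate, 2))  # the estimate has drifted toward reality
```

A production system does the same thing at a larger scale: periodically retraining on fresh data and edge cases so the model keeps tracking the world it operates in.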
Data without context is not very useful, and context is hard to establish without metadata. To get the most out of your data and transform it into actionable knowledge, you must develop an agile data management system. If you don’t take a disciplined approach to data management, it’s easy to lose track of things.
If you put together data, context, and metadata in the right way, you get more than the sum of their parts. You evolve from data to knowledge. This evolution doesn’t happen overnight. Start by managing data, building a culture, and getting the right technology in place. Eventually, you reach a point where you have a unified view of all your data. You’re ready to move your data across clouds, you can integrate your data, and then you can start thinking about things like metadata, knowledge management, and making them work for you.
Data doesn’t have significant value on its own. It’s the ore that we mine, refine, and analyze to create knowledge and insight. In today’s world, that process has to happen at unprecedented speed and scale to be effective. By building your data analytics capabilities on a flexible infrastructure that connects securely with multiple data sources, you can achieve these levels of scale and speed. Many organizations opt for building this infrastructure in the cloud, even if it includes components that remain on-premises.
To learn how Oracle Generation 2 Cloud Infrastructure can power data analytics that help your organization achieve its strategic objectives, visit Oracle Cloud Infrastructure.
George Anadiotis is the founder of Linked Data Orchestration, where he works on the intersection of technology, media, and data.