For our blog segment this week, I am pleased to have as our guest blogger David Teszler, Senior Principal for Advanced Analytics & Market Development. Today he is going to help us understand the business value of creating an Analytics Data Mart, and how Oracle's broad portfolio of data management solutions can assist with this endeavor.
In a previous entry, my colleague Brian Spendolini reviewed best practices and options for moving an Oracle database to the cloud. But why would you want to move an Oracle database to the cloud? Who wants this data? Who are the consumers? What can they get in the cloud that they can't get on-premises? What constraints can be removed? Who cares?
One answer to these questions is to serve the demand from analytic domain experts (data scientists) in every line of business. What are their challenges, and why would you, a reader of the Database Insider, want to talk to them? Hint: they are hungry for the data you manage, and they are not happy being hungry.
Machine learning and advanced analytics are work that wants to follow the data but often can't. The data housed (trapped?) in your data warehouse and transactional systems wants to be used for analytic experimentation. And for good reason, you really don't let analytic teams run endless table scans, derive their own columns, and embed new analytic objects in a running application.
However, it is your best, cleanest, and most trusted data. It reflects what is going on inside your organization. While it's not the only data your data science teams want, it is essential data. How can they design an algorithm to help sell more widgets without data on the sales of widgets, the service experience of widgets, which widgets sell with other widgets, who's best at selling widgets, and what risks widgets expose my firm to?
Why are we making this so hard on the Domain Experts?
The data analytics pipeline is broken. It simply takes too long to unite data with your analytics. Your scarcest resources wait for data. Your best analytic teams are forced to 'wrangle' data that's already organized. Why are they made to download database data to a spreadsheet and load it into an analytics engine under their desk, or into yet another format on another cloud? You did the work, but they can't use it. Then there is the question of accuracy, since your best analysis is done on subsets and snapshots.
Scarcity begets scarcity
Your domain analytics teams resort to this process because, while you possess the data they need and the know-how to deliver it, you are constrained by infrastructure. It would be nice to give everyone their own Exadata, but your budget doesn't support that. The infrastructure scarcity creates data scarcity, which exacerbates your true scarcities of time and skills.
Lost Time is Lost Money
Whatever the accuracy of the models, they are delivered late when time is lost to data management. That time is lost forever, and with it goes the value of analytics in:
• Retaining a customer
• Servicing a product before an outage
• Detecting and preventing fraud
Data Context Matters, Don’t Destroy It Just To Analyze It
Step one is reversing the practice of extracting and moving data to an analytics platform. The extraction step immediately requires a data wrangling step to turn the data into something your analytic tool can use. The extract also eliminates relationships important to your analytics (Master, Detail, Sales to Customer, Sales to Channel, Sales to Time, Sales to Cost, etc.).
The wrangling is therefore more complex than simple format changes; it involves the entire context of your business domain: Customer, Store, Product, Service, Risk, Pricing, and Supply Chain. This context exists in your applications, data marts, and data warehouses. It's just cut off from machine learning. Extracts necessarily destroy this context, and if the context is in question, so are the analytics. Start with your data as is. All of it.
Just Clone to the Cloud and Address Scarcity
If you are a data manager, provide a lab that contains the full data context and analytics tools on the same platform. This could be an operational data store, a full data warehouse, or a few tables from a data mart.
In ONE STEP, you address four scarcities created by your existing data pipeline: Time, Skills, Data and Infrastructure.
Using Oracle Multitenant, you can plug multiple sources into the same database service. Now your domain team can experiment on more data. Since this is just another Oracle database, you can apply data security using the same tools you use today, including carrying over your current redaction and masking policies.
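As a rough sketch, plugging a source into the analytics service and carrying over a redaction policy might look like the following. All names and paths here (salespdb, the XML manifest location, the SALES.CUSTOMERS table) are illustrative assumptions, not a prescribed configuration:

```sql
-- Hypothetical sketch: plug an unplugged source database into the
-- analytics service as a new pluggable database (PDB).
-- The PDB name and manifest path are assumptions for illustration.
CREATE PLUGGABLE DATABASE salespdb
  USING '/u01/app/oracle/oradata/salespdb.xml'
  COPY;

ALTER PLUGGABLE DATABASE salespdb OPEN;

-- Reapply a redaction policy so sensitive columns stay masked in the lab.
-- Schema, table, and column names are assumptions for illustration.
BEGIN
  DBMS_REDACT.ADD_POLICY(
    object_schema => 'SALES',
    object_name   => 'CUSTOMERS',
    policy_name   => 'redact_customer_ssn',
    column_name   => 'SSN',
    function_type => DBMS_REDACT.FULL,   -- redact the entire value
    expression    => '1=1'               -- apply to every query
  );
END;
/
```

The point of the sketch is that the lab is still an Oracle database: the same Multitenant and Data Redaction tooling you use in production applies unchanged.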
If you are an analytics consumer, ask for a full-context lab and exploit the analytics embedded in the database. And YES, you can add your own data, derive your own variables, even source new data in the cloud, and access all of it using familiar languages.
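Deriving your own variables can be as simple as a view over the trusted warehouse tables, so the new features stay next to the data rather than in a spreadsheet. This is a minimal sketch; the sales table and column names are assumptions for illustration:

```sql
-- Hypothetical sketch: derive per-customer features in place as a view.
CREATE VIEW widget_features AS
SELECT s.customer_id,
       SUM(s.amount)                        AS total_spend,
       COUNT(*)                             AS order_count,
       SUM(s.amount) / NULLIF(COUNT(*), 0)  AS avg_order_value,
       MAX(s.sale_date)                     AS last_purchase_date
FROM   sales s
GROUP  BY s.customer_id;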
Machine learning libraries are included
What about the analytics? Here you return to the premise of seeing the database as an analytics engine. To realize this vision, you simply use the analytics tools embedded in Oracle Database Cloud Service High Performance Edition, which include statistical SQL, data mining, advanced analytics, R, graph and spatial analytics, and text analytics. You can also leverage our BI and Visualization services against these sources.
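To give a flavor of the statistical SQL, here is a sketch of in-database analysis that correlates widget price with units sold and fits a simple regression, with no extract step at all. The sales table and its columns are assumptions for illustration:

```sql
-- Hypothetical sketch: run statistics where the data lives,
-- using Oracle's built-in aggregate functions.
SELECT CORR(unit_price, units_sold)           AS price_volume_corr,
       REGR_SLOPE(units_sold, unit_price)     AS slope,
       REGR_INTERCEPT(units_sold, unit_price) AS intercept,
       REGR_R2(units_sold, unit_price)        AS r_squared
FROM   sales
WHERE  sale_date >= DATE '2017-01-01';
```

Because the computation happens inside the database, it runs over all the rows, not a downloaded snapshot, and the full business context remains one join away.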
Of course, you can bring your own tools, such as Zeppelin and Jupyter notebooks, and run them on our compute service.
Make Your Analytics Domain Expert Happy
Brian showed us how to apply everyday Oracle practice to move whole databases to the cloud. Here we outlined how you can help one of your user communities address four common scarcities to deliver more accurate analytics sooner. Unite their need for data and compute by leveraging your current skills and the cloud. You can try it with our partner Vlamis Software Solutions.