Customers generate enormous amounts of data in SaaS applications, and that data is critical to business decisions such as reducing procurement spend or maximizing workforce utilization. Because most customers use multiple SaaS applications, many of these decisions are made in analytical engines outside of SaaS, or require external data to be brought into SaaS so that decisions can be made there. In this blog we examine common data movement and replication needs in the SaaS ecosystem and how Oracle’s Data Integration Platform Cloud (DIPC) enables access to SaaS data and helps with decision making.
The move from on-premises applications to SaaS brought a number of benefits, but it also changed a number of pre-existing assumptions and architectures. Let us examine a few of these changes in the enterprise landscape; the list is by no means comprehensive.
First, on-premises applications in most cases provided access to application data at the database level, typically read-only. This has changed, with hardly any SaaS vendor providing database access. Customers now work with REST APIs (or the older SOAP APIs) to extract and load bulk data. While APIs have many advantages, including removing the dependency on the application schema, they are no match for SQL queries and are subject to preset throttling limits defined by the SaaS vendor.
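To make the throttling point concrete, here is a minimal Python sketch of a paged bulk extract that backs off when the vendor rejects a call. Everything named here is an assumption for illustration: `fetch_page` stands in for whatever REST call a given SaaS API exposes, `RateLimited` is a hypothetical exception your client would raise on a throttling response, and the page size and delays are arbitrary.

```python
import time
from typing import Callable, Dict, Iterator, List


class RateLimited(Exception):
    """Hypothetical error raised by the client when the vendor throttles a call."""


def extract_all(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 500,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield every record, one page at a time, backing off when throttled."""
    offset = 0
    while True:
        for attempt in range(max_retries):
            try:
                page = fetch_page(offset, page_size)
                break
            except RateLimited:
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
        else:
            # The retry loop never hit `break`, so every attempt was throttled.
            raise RuntimeError(f"still throttled after {max_retries} attempts")
        if not page:
            return  # an empty page marks the end of the data set
        yield from page
        offset += len(page)
```

A single SQL query against an on-premises database would replace this whole loop, which is the trade-off the paragraph above describes.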
Second, most customers have multiple SaaS applications, which makes it imperative to merge data from different pillars for any meaningful analysis or insight: Sales with Product, Leads with Contacts, Orders with Inventory, and the list goes on. While each SaaS application provides some analytical capability, most customers prefer modern best-of-breed tools and open architectures for analytical processing of their data. These range from traditional relational databases with Business Intelligence to modern Data Lakes with Spark engines.
Third, most enterprise customers have either an application or an analytical/reporting platform on-premises, which necessitates data movement between cloud and on-premises, i.e., a hybrid cloud deployment.
Fourth, semi-structured and unstructured data sources are increasingly used in decision making. Emails, Twitter feeds, Facebook and Instagram posts, log files and device data all provide context for transactional data in relational systems.
And finally, decision-making timelines are shrinking, and real-time data analysis is needed more often than not. While most SaaS applications provide batch architectures and REST APIs, they struggle to provide robust streaming capability for real-time analysis. Customers need SaaS applications to be part of both Kappa and Lambda style architectures.
Let us take a peek into how Oracle Data Integration Platform Cloud addresses these issues.
Data Integration Platform Cloud (DIPC) is a cloud-based platform for data transformation, integration, replication and governance. DIPC provides batch and real-time data integration among cloud and on-premises environments and brings together the best-of-breed Oracle data integration products, Oracle GoldenGate, Oracle Data Integrator and Oracle Enterprise Data Quality, within one unified cloud platform. You can find more information on DIPC here.
For Oracle’s Fusion applications, such as ERP Cloud, HCM Cloud and Sales Cloud, DIPC supports a number of load and extract methods with out-of-the-box connectors. These include BI Publisher, BI Cloud Connector and other standard SOAP/REST interfaces. The choice of interface depends on the specific use case. For example, to extract large datasets for a given subject area (say, Financials -> Accounts), BI Cloud Connector (BICC) is ideal with its incremental extract setup in Fusion. BICC provides access to Fusion Cloud data via Public View Objects (PVOs). These PVOs are aggregated into Subject Areas (Financials, HCM, CRM, etc.), and BICC can be set up to pull full or incremental extracts either manually or programmatically. DIPC integrates with BI Cloud Connector to kick off an extract, download the PVO data files in chunks, unzip and decrypt them, extract data from the CSV files, read metadata from the mdcsv files and finally load the data to any target such as Database Cloud Service or Autonomous Data Warehouse Cloud Service. For smaller datasets, DIPC can call existing or custom-built BI Publisher reports and load the data to any target.
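The unzip, parse and load steps of such an extract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how DIPC or BICC work internally: it assumes the archive has already been downloaded and decrypted, that each CSV member starts with a header row, and it uses an in-memory SQLite database as a stand-in for the real target. The function name `load_extract` is hypothetical.

```python
import csv
import io
import sqlite3
import zipfile


def load_extract(zip_bytes: bytes, conn: sqlite3.Connection, table: str) -> int:
    """Unzip an extract archive and load every .csv member into `table`.

    Returns the number of data rows loaded. Header names are trusted here;
    a production loader would validate them before building SQL.
    """
    loaded = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-data members such as metadata files
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to load
                cols = ", ".join(f'"{c}"' for c in header)
                marks = ", ".join("?" for _ in header)
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
                rows = list(reader)
                conn.executemany(
                    f'INSERT INTO "{table}" ({cols}) VALUES ({marks})', rows
                )
                loaded += len(rows)
    conn.commit()
    return loaded
```

Calling `load_extract(archive_bytes, sqlite3.connect(":memory:"), "accounts")` would create the table from the first CSV header and append every data row; an incremental extract would simply be a smaller archive fed through the same function.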
For other SaaS applications, DIPC has drivers for Salesforce, Oracle Service Cloud, Oracle Sales Cloud and Oracle Marketing Cloud. These drivers provide a familiar JDBC-style interface for data manipulation while accessing SaaS applications over REST/SOAP APIs. In addition, other SaaS applications that provide JDBC-style drivers, such as NetSuite, can become a source and target for ELT-style processing in DIPC. DIPC also has generic REST and SOAP support, allowing access to any SaaS REST API. You can find the list of sources and targets supported by DIPC here.
DIPC simplifies data integration tasks using Elevated Tasks, and users can expect more wizards and recipes for common SaaS data load and extract tasks in future. The DIPC Catalog is populated with metadata and sample data harvested from SaaS applications. In the DIPC Catalog, users can create Connections to SaaS applications, after which a harvest process is kicked off and populates the Catalog with SaaS Data Entities. From this Catalog, users will be able to create Tasks with Data Entities as Sources and Targets, and wire together a pipeline data flow including JOINs, FILTERs and standard transformation actions. Elevated Tasks can also be built to feed SaaS data to a Data Lake or Data Warehouse such as Oracle Autonomous Data Warehouse Cloud (ADWCS). In addition, a full-featured Oracle Data Integrator is embedded inside DIPC so that existing ODI customers can build out Extract, Load and Transform scenarios for SaaS data integration. Customers can also bring their existing ODI scenarios to DIPC using ODITask. An ODITask is an ODI scenario exported from ODI and imported into DIPC for execution. An ODITask can be wired to SaaS sources and targets.
Figure above shows DIPC Catalog populated with ERP Cloud View Objects.
Figure above shows details for Work Order View Object in DIPC Catalog.
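To illustrate what a pipeline of FILTER and JOIN steps over catalog entities does conceptually, here is a minimal Python sketch. It is not DIPC's execution engine; the `Row` type, the function names and the sample field names used in the note below are all assumptions for illustration, with plain dictionaries standing in for harvested Data Entities.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def filter_rows(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """FILTER step: keep only the rows for which the predicate holds."""
    return (row for row in rows if predicate(row))


def inner_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """JOIN step as a hash join: index the right side, then stream the left."""
    index: Dict[object, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            # Left-side values win when both sides share a column name.
            yield {**rrow, **lrow}
```

For example, `inner_join(filter_rows(work_orders, lambda r: r["status"] == "OPEN"), items, "item_id")` would keep only open work orders and enrich each with its matching item attributes, which is the kind of flow a Task wires together from Sources to a Target.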
For hybrid cloud architectures, DIPC provides a remote agent that includes connectors to a wide range of sources and targets. Customers who wish to move or replicate data from on-premises sources can deploy the agent and have data pushed to DIPC in the cloud for further processing, or vice versa for data being moved to on-premises applications. The remote agent can also be deployed on a non-Oracle cloud for integration with databases running on third-party clouds.
For real-time and streaming use cases from SaaS applications, DIPC includes Oracle GoldenGate, the gold standard in data replication. When permissible, SaaS applications can deploy GoldenGate to stream data to external databases, Data Lakes and Kafka clusters. GoldenGate can either be deployed to read directly from the SaaS production database instance and mine its redo log files, or run on a standby/backup copy of the SaaS database and use the cascading redo log transmission mechanism. This mechanism keeps latency minimal and delivers Change Data Capture of specific SaaS transaction tables to an external database or data warehouse, providing real-time transaction data for business decisions.
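The idea behind Change Data Capture can be sketched independently of any product: a stream of insert, update and delete records is replayed, in order, against a target copy. The Python below is only a conceptual illustration. The `(operation, key, values)` record shape and the `apply_changes` name are assumptions made for this example and do not reflect GoldenGate's trail file format or APIs.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change record: (operation, primary key, column values)
Change = Tuple[str, int, dict]


def apply_changes(target: Dict[int, dict], changes: Iterable[Change]) -> Dict[int, dict]:
    """Replay captured changes, in order, against an in-memory replica."""
    for op, key, values in changes:
        if op == "I":    # insert: create the row
            target[key] = dict(values)
        elif op == "U":  # update: overwrite only the changed columns
            target.setdefault(key, {}).update(values)
        elif op == "D":  # delete: drop the row if it exists
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target
```

Because only the changed rows travel over the wire, the replica stays close to the source without re-extracting whole tables, which is what makes this approach suitable for real-time use cases.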
Using these comprehensive features in DIPC, we are seeing customers sync end-of-day and end-of-month batches of Salesforce Account information into E-Business Suite. Fusion Applications customers are able to extract from multiple OTBI Subject Areas and merge/blend Sales, Financials and Sales / Service objects to create custom datamarts. And in Retail, we have customers using GoldenGate’s change data capture to sync Store data to Retail SaaS Apps at corporate in real time.
In summary, DIPC provides a comprehensive set of features for SaaS customers to integrate data into Data Warehouses, Data Lakes, Databases and with other SaaS Applications in both real-time and batch. You can learn more about DIPC here.