As organizations increasingly rely on cloud-based analytics solutions to derive business value from their data, seamless integration between different platforms becomes critical. One powerful combination is Databricks and Oracle Analytics Cloud (OAC), Oracle’s comprehensive analytics solution. Connecting these systems enables organizations to visualize and analyze diverse datasets at scale, streamlining business intelligence workflows.
Prerequisites
Before getting started with the integration, ensure you have the following prerequisites in place:
Essential Requirements
- Access to Oracle Analytics Cloud (OAC): Confirm you have a valid subscription and the necessary permissions to configure data sources and gateways within OAC.
- Databricks Workspace: Verify you have an active Databricks environment (on Azure, AWS, or other supported cloud providers) with access to data you wish to analyze in OAC.
- Oracle Data Gateway Installer: Download the Oracle Data Gateway, which enables secure connections between OAC and remote data sources.
- Databricks JDBC Driver: Obtain the latest Databricks JDBC driver to facilitate connectivity between the platforms.
Technical Requirements
- Network Connectivity and Permissions: Ensure the compute environment hosting the Data Gateway can communicate with both your Databricks workspace and OAC. The required firewall ports must be open.
- Administrator Privileges: Administrative rights are necessary to install software (such as the Data Gateway) and to upload drivers on your computing resources.
- Service and Authentication Information: Have your Databricks JDBC URL, authentication token, and any database credentials ready for configuration within OAC.
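The network requirement above can be checked before any installation work begins. The following is a minimal sketch, assuming it is run on the machine that will host the Data Gateway and that HTTPS port 443 is the port of interest (the port used in the sample connection string later in this post). The function name `can_reach` is illustrative only and is not part of any Oracle or Databricks tooling.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the hostname and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("<your Databricks hostname>")` and `can_reach("<your OAC hostname>")` from the gateway host gives a quick yes or no for each side. A `False` result usually points to a firewall rule or DNS problem rather than a credential issue, since no authentication is attempted.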
Step-by-Step Integration Process
1. Download and install Oracle Data Gateway
- Begin by downloading the Oracle Data Gateway, an essential component for securely connecting OAC to your on-premises or virtual cloud sources.
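- Install it on a machine that can reach both your Databricks workspace and OAC, such as a virtual machine in the same network as your Databricks compute environment.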
- The installation process includes an agent configuration step where you’ll set up credentials for secure communication.
- After successful installation, you can access the Data Gateway configuration interface at http://localhost:8080/obiee/config.jsp.
2. Acquire and configure the Databricks JDBC driver
- Download the JDBC driver for Databricks and place it in the appropriate directory within your Data Gateway folders. This driver enables effective data communication between Databricks and OAC.
- Figure: file system directory structure showing where the JDBC driver files are placed within the Data Gateway installation folders.
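The driver placement step can also be scripted. The sketch below copies a downloaded driver jar into a target folder. Both paths are placeholders: this post does not name the exact driver folder inside the Data Gateway installation, so `driver_dir` is an assumption you must replace with the folder given in the Oracle documentation for your version. The helper name `install_driver` is likewise hypothetical.

```python
import shutil
from pathlib import Path


def install_driver(jar_path: str, driver_dir: str) -> Path:
    """Copy a JDBC driver jar into driver_dir and return the copied file's path."""
    source = Path(jar_path)
    if source.suffix.lower() != ".jar":
        raise ValueError(f"expected a .jar file, got: {source.name}")
    if not source.is_file():
        raise FileNotFoundError(f"driver jar not found: {source}")
    target_dir = Path(driver_dir)
    # driver_dir is an assumed location; create it only if it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata such as the modification time.
    return Path(shutil.copy2(source, target_dir / source.name))
```

Failing early on a missing or misnamed file avoids restarting the gateway only to find that no driver was loaded.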
3. Restart the Data Gateway
- Once the driver is configured, restart the Data Gateway to ensure the new driver is properly loaded and ready for use. You can find the restart scripts under the binary path: <Home>\DataGateway\domain\bin
4. Enable Data Gateway in OAC
- Log into OAC and enable the Data Gateway connection. This step bridges your analytics environment with the underlying Databricks data.
- Navigate to the OAC Console, click the Remote Data Connectivity tile, enable the Data Gateway, and add an Agent.
5. Configure the Databricks connection in OAC Data Visualization (DV)
- Within OAC’s Data Visualization module, set up the connection to Databricks by supplying the required JDBC details to establish a secure and high-performance link.
Figure: the OAC Data Visualization connection configuration dialog displays the Databricks connection parameters, including hostname, port, database name, and authentication settings with username and password fields.
Important Configuration Notes:
- Authentication Token Usage: When using an access token, the username should be ‘token’ rather than your actual Databricks username.
- SQL Warehouse Connection: When connecting to a Databricks SQL warehouse, use the advanced connection type with the ConnCatalog parameter set to your Databricks catalog. For all-purpose clusters, you can use the basic connection type.
Here’s a sample connection string:
jdbc:databricks://<Hostname>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/8ddbd20ef481e154;ConnCatalog=samples;
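To avoid hand-editing long connection strings, the sample above can be assembled from its parts. This is a minimal sketch that reproduces the same layout. The function name `build_jdbc_url` and its parameter names are illustrative, and the fixed settings (`transportMode=http`, `ssl=1`, `AuthMech=3`) are simply copied from the sample rather than taken from a driver reference.

```python
def build_jdbc_url(hostname: str, http_path: str, catalog: str = "",
                   database: str = "default", port: int = 443) -> str:
    """Assemble a Databricks JDBC URL in the same layout as the sample string."""
    settings = {
        "transportMode": "http",
        "ssl": "1",
        "AuthMech": "3",
        "httpPath": http_path,
    }
    # ConnCatalog is added only when a catalog is supplied, as in the
    # SQL warehouse case described in the configuration notes.
    if catalog:
        settings["ConnCatalog"] = catalog
    options = "".join(f"{key}={value};" for key, value in settings.items())
    return f"jdbc:databricks://{hostname}:{port}/{database};{options}"


# Rebuilds the sample string shown above, placeholder hostname included.
print(build_jdbc_url("<Hostname>", "/sql/1.0/warehouses/8ddbd20ef481e154",
                     catalog="samples"))
```

Leaving `catalog` empty omits `ConnCatalog` entirely, which matches the basic connection case for all-purpose clusters.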
6. Test the connection
- Always test the connection to verify proper setup and troubleshoot any potential issues before moving to production use.
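When a connection test fails, a typo in the connection string is a common first suspect. The sketch below is a local sanity check that parses the semicolon-separated settings of a JDBC URL and reports any that are missing. Which settings count as required is an assumption based on the sample connection string in this post, and the function names are hypothetical. It does not contact Databricks or OAC.

```python
def parse_jdbc_settings(url: str) -> dict:
    """Return the ';'-separated key=value settings of a JDBC URL as a dict."""
    # Everything before the first ';' is the scheme, host, port, and database.
    _, _, tail = url.partition(";")
    settings = {}
    for part in tail.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(url: str,
                     required=("transportMode", "ssl", "AuthMech", "httpPath")) -> list:
    """List the required settings that are absent or empty in the URL."""
    found = parse_jdbc_settings(url)
    return [key for key in required if not found.get(key)]
```

An empty list from `missing_settings` means the string is structurally complete under these assumptions. The test in OAC is still needed to confirm that the token, network path, and warehouse are valid.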
7. Enable the Assistant for Natural Language Query (NLQ)
- To further empower users, activate the Assistant for your connected Databricks dataset. This feature allows users to interact with data using natural language queries (NLQ), dramatically accelerating self-service analytics.
- Figure: the Assistant interface, showing natural language query capabilities with sample questions, conversation-style interaction with data insights, and AI-generated visualizations and charts.
- The Assistant interprets conversational questions and delivers visual insights or answers, making analytics intuitive and accessible to business users without requiring technical expertise.
Important Note: To ensure the Assistant works optimally, your connected dataset must be clearly modeled and include relevant metadata. The quality of NLQ responses depends on the underlying dataset structure and clarity.
Key Limitation: Cache or Refresh Workflow
When integrating remote data sources like Databricks with OAC, be aware of this important limitation:
Direct querying of live data may not always be supported. Data from Databricks is typically imported into OAC using a cache or refresh workflow, meaning analysis is performed on a snapshot of the data rather than in real time. Schedule cache refreshes as needed to keep your OAC datasets current. Alternatively, you could explore connecting through Delta Sharing, described in the documentation topic "Connect to a Database Using Delta Sharing."
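One way to reason about a refresh schedule is to decide how old a snapshot may get before it is no longer acceptable. The sketch below expresses that rule in code. It is not tied to any OAC API: the function names and the idea of recording a "last refresh" timestamp yourself are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(last_refresh: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True if the snapshot is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_refresh > max_age


def next_refresh(last_refresh: datetime, interval: timedelta) -> datetime:
    """Return the time at which the next refresh is due."""
    return last_refresh + interval
```

For example, a dashboard that tolerates data up to one hour old would use `max_age=timedelta(hours=1)` and a refresh interval at or below that value.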
Conclusion
With the prerequisites met and by following this clear integration process, organizations can seamlessly bridge their Databricks environment with OAC, unlocking significant value through efficient, scalable, and insightful analytics. The combination of Databricks’ powerful data processing capabilities with OAC’s intuitive visualization tools creates a comprehensive analytics ecosystem that serves both technical and business users.
Additional Resources
For comprehensive guidance and troubleshooting, consult these official Oracle and Databricks resources:
- Oracle Data Gateway Installation Guide
- Connecting Remote Databricks Data Source
- Databricks JDBC Driver Download
- Tutorial Video
- Databricks Source Support Documentation
If you encounter connection issues, try regenerating your Databricks access token. If you have any questions or need further assistance, feel free to reach out to us.
To learn more about Oracle Analytics Cloud, visit us at Oracle.com/analytics, try our free Analytics product, and follow and connect with us on Twitter (@OracleAnalytics) and LinkedIn.
