OCI Dataflow Private Endpoint Use Cases

January 7, 2025 | 15 minute read
Mario Miola
Principal Solutions Architect

Introduction

Data Flow applications can be configured to access data sources hosted within private networks, enabling secure connectivity without sending traffic over public networks. Limiting that exposure reduces the risk of data breaches and unauthorized access, a critical concern for industries handling sensitive information. For example, organizations subject to regulations like the General Data Protection Regulation (GDPR) in the EU can keep personal data protected within controlled environments, minimizing the risk of non-compliance. Similarly, healthcare providers bound by the Health Insurance Portability and Accountability Act (HIPAA) in the United States can securely process protected health information (PHI) through private network configurations, safeguarding patient confidentiality. This architecture also supports compliance with other regulatory frameworks, such as SOC 2 for data security and privacy, enabling organizations to meet their obligations while maintaining high-performance data processing.

Configuring a Data Flow application with private network access provides the following capabilities:

  • Access to Private Oracle Cloud Infrastructure (OCI) Data Sources: Connect to OCI data sources accessible only within private networks.
  • Integration with On-Premises Data Sources: Access on-premises data connected to an OCI Virtual Cloud Network (VCN) through Site-to-Site VPN or FastConnect.
  • Support for Oracle RAC Databases: Use the SCAN proxy functionality to access Oracle RAC databases.

Figure 1 depicts a simplified diagram of the Dataflow network configuration. Note that the serverless applications run inside the Dataflow Service Tenancy, so a few network components in this diagram need to be understood.

Like any other application running on a secure private network, the Dataflow application cluster connects to the Internet through a NAT (network address translation) Gateway and to the Oracle Services Network (OSN) through the Service Gateway (SGW), which blocks any access from the Internet to the application cluster. The same figure shows a Dataflow Private Access Gateway (Dataflow PE), which allows a Dataflow application to access OCI resources, such as ADB instances that reside in a private subnet, as well as customer on-premises resources connected to OCI through FastConnect or Site-to-Site VPN.

Dataflow PE General Schema

Figure 1 - Dataflow service tenancy network simplified design

We selected a few use cases below to analyze how the network settings and some additional configurations allow secure access through private subnets.

1- Connecting to an ADB instance configured with Private endpoint access only

This is perhaps the most common use case. A simple diagram of the configuration reveals two critical things to consider:

  • The ADB instance has its network access type set to "Virtual cloud network" with a private subnet selected, as Figure 2 shows:

        ADB DNS Zones to Resolve

Figure 2 - ADB Network Access

  • Because the private endpoint is already resolved within the VCN, we carry that configuration forward to the Dataflow PE configuration, as depicted in Figure 3:

        Create Private Endpoint

Figure 3 - Dataflow PE configuration for an ADB instance

In addition, if the ADB network configuration restricts application access further with Network Security Groups (NSGs) that allow only specific CIDR ranges, OCI services, or NSGs, ensure those rules are represented at both ends: in the ADB configuration and in the Dataflow PE configuration.

1-a Secure access from allowed IPs and VCNs

This is a variation of the ADB network settings in which the option "Secure access from allowed IPs and VCNs only" is selected and an access control list (ACL) is attached to the configuration. In this scenario, the Dataflow PE is not necessary. The network traffic travels through the NAT gateway in the Dataflow Service Tenancy, within the customer-allocated subnet (Tenant OKE Cluster Subnet in Figure 1). For the documented list of IPs to whitelist in the ACL, see the Dataflow IP Address Allowlist reference.

Figure 4 below shows an ADB configuration for the Phoenix region.

Note - The Dataflow PE is not necessary for this configuration.

ADB Security Access

Figure 4 - ADB secure access from allowed IPs and VCNs only
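As a rough illustration of how an ACL of this kind behaves, the sketch below checks a source address against a list of allowed IPs and CIDR blocks using Python's standard ipaddress module. The entries are documentation-only placeholder ranges, not the real Dataflow allowlist, and the function name is hypothetical.

```python
import ipaddress
from typing import Iterable

# Hypothetical ACL entries. Real values come from the Dataflow IP allowlist
# for the region; these are documentation-only ranges.
ACL_ENTRIES = ["192.0.2.0/24", "198.51.100.17"]


def is_allowed(source_ip: str, acl_entries: Iterable[str]) -> bool:
    """Return True if source_ip matches any ACL entry (single IP or CIDR)."""
    address = ipaddress.ip_address(source_ip)
    for entry in acl_entries:
        # strict=False tolerates entries written with host bits set,
        # such as "192.0.2.5/24". A bare IP is treated as a /32 network.
        network = ipaddress.ip_network(entry, strict=False)
        if address in network:
            return True
    return False


if __name__ == "__main__":
    for candidate in ("192.0.2.44", "198.51.100.17", "203.0.113.9"):
        verdict = "allowed" if is_allowed(candidate, ACL_ENTRIES) else "blocked"
        print(candidate, verdict)
```

A check like this can help confirm, before a run, that every NAT gateway address published for the region is covered by the ACL.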

1-b Moving a Dataflow application to use PE and regional restrictions on the Object Storage service

A Dataflow application running with the "Internet access" type may rely on access to buckets in different regions. For instance, an application running in Ashburn (IAD) can access objects stored in Phoenix (PHX), provided OCI IAM policies grant such access.

If this application later moves to a "Private access" run, it loses access to the public Object Storage service of other regions. As depicted in Figure 1, the Service Gateway carries the communication with the Object Storage service, so regional access is enforced at domain name (DNS) resolution.

For customers in this scenario, see Section 3, "Cross-regional access with OCI Dataflow Private Endpoints," below.
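The regional restriction can be pictured with a small sketch. Object Storage endpoints are region-specific hostnames, so a private-access run that reaches Object Storage only through its own region's Service Gateway cannot reach another region's endpoint. The helper names below are hypothetical, and the rule is deliberately simplified to a region comparison.

```python
def object_storage_endpoint(region: str) -> str:
    """Build the public Object Storage endpoint for an OCI region identifier."""
    return f"https://objectstorage.{region}.oraclecloud.com"


def reachable_from_private_run(run_region: str, bucket_region: str) -> bool:
    """Simplified rule from the text: a private-access run reaches Object
    Storage through the Service Gateway, which serves only the run's region."""
    return run_region == bucket_region


if __name__ == "__main__":
    run_region = "us-ashburn-1"  # IAD
    for bucket_region in ("us-ashburn-1", "us-phoenix-1"):  # IAD, PHX
        endpoint = object_storage_endpoint(bucket_region)
        ok = reachable_from_private_run(run_region, bucket_region)
        print(endpoint, "reachable" if ok else "not reachable")
```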

1-c Restrictions on access to other Oracle services

The "DNS Zones not allowed (Internal)" reference states the domain name restrictions you may encounter when attempting to use the Dataflow PE with other Oracle services. Currently, a request that includes any of the following DNS zones may be rejected:

Prohibited DNS entries
 
"oracle.com"
"oracle-ocna.com"
"oraclegoviaas.com"
"oraclegovcloud.com"
"oracleiaas.com"
"grungy.us"
"oraclecorp.com"
"oraclecloud.net"
"oraclegha.com"
"oc-test.com"
"oracleemaildelivery.com"
"ocir.io"
"oracledx.com"

Figure 5 - Prohibited DNS Zones references

Work with the OCI Dataflow and Network teams if you require further services that do not appear in the list above. In general, the following Oracle services are limited when using the Dataflow PE:

    • Object Storage buckets with a segregated IAM access policy
    • Cross-Region access of OCI resources
    • Direct use of IP addresses to access private resources in the customer tenancy
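Before submitting a PE configuration, a pre-check against the prohibited list above can save a rejected request. The following is a minimal sketch: the zone list is copied from this article, while the function name and the suffix-matching rule (treating subdomains of a prohibited zone as prohibited too) are assumptions, not documented service behavior.

```python
from typing import Optional

# Zones copied from the prohibited list in this article.
PROHIBITED_ZONES = (
    "oracle.com", "oracle-ocna.com", "oraclegoviaas.com", "oraclegovcloud.com",
    "oracleiaas.com", "grungy.us", "oraclecorp.com", "oraclecloud.net",
    "oraclegha.com", "oc-test.com", "oracleemaildelivery.com", "ocir.io",
    "oracledx.com",
)


def find_prohibited_zone(dns_zone: str) -> Optional[str]:
    """Return the prohibited zone that dns_zone falls under, or None.

    Assumption: a subdomain of a prohibited zone is also treated as prohibited.
    """
    normalized = dns_zone.strip().rstrip(".").lower()
    for prohibited in PROHIBITED_ZONES:
        if normalized == prohibited or normalized.endswith("." + prohibited):
            return prohibited
    return None


if __name__ == "__main__":
    for zone in ("industrial.com", "oraclevcn.com", "iad.ocir.io"):
        hit = find_prohibited_zone(zone)
        print(zone, "-> prohibited by " + hit if hit else "-> ok")
```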

2- Connecting to on-prem resources

Integrating OCI Data Flow with on-premises data sources through private endpoints ensures secure and efficient data processing across hybrid environments. By leveraging private network connectivity options like Site-to-Site VPN or FastConnect, you can seamlessly connect your Data Flow applications to data repositories hosted in your on-premises infrastructure. This setup enables robust, low-latency communication while maintaining strict security boundaries. It is ideal for use cases that require secure data access and processing across cloud and on-premises environments.

A key element of this setup is the DNS resolution, which will now take place within the Private Access Gateway. The configuration should provide a DNS name (fully qualified domain name, or FQDN) for the private endpoints, not the private IP address itself. If you've configured your network setup for DNS, your hosts can access the private endpoint using the FQDN. Dataflow supports network security groups (NSGs) with its resources. You can request that the Dataflow service set up the private endpoint in an NSG within your VCN. NSGs let you write security rules to control access to the private endpoint without knowing the private IP address assigned to the private endpoint.

Once connectivity between the OCI customer VCN and the on-premises network has been established, the Dataflow application can operate correctly only if the private IPs in the on-premises network are associated with a private DNS resolver in the OCI customer VCN.

To illustrate this, we use two connected networks, shown in Figure 6 with the OCI Network Visualizer. The hdi-dataflow-VCN private subnet is connected through an attachment to the disdemo-DRG dynamic routing gateway, resembling path (2) of Figure 1 above.

OCI Network Visualizer

Figure 6 - OCI Network Visualizer for the use-case demonstration

From this point onwards, the relevant decision is how DNS resolution will occur. In this example, we set up DNS resolution in two private views:

  • One attached to the hdi-dataflow-vcn standard view (industrial.com).
  • Another in a customer-private-view that resolves the domain oraclevcn.com, indicating a private subnet in an OCI VCN.

As depicted in Figure 7, the VCN resolver has multiple zones, some created to support routing through the PE:

Associated private views

Figure 7 - Private views of the VCN resolver

Private DNS Zone

Figure 8 - Adding the domain industrial.com as a Private DNS Zone

Customer Private View

Figure 9 - Customer private view for DNS resolution in the hdi-dataflow-vcn VCN

The hostnames in these zones must be resolved using record types appropriate to the downstream network. For these record types and their configuration, see the Managing DNS Zones documentation. Figures 10 and 11 show the records for each of the private views above:

DNS Records

Figure 10 - The new Records for the DNS resolver MySQL_Private_DB_Industrial.com and respective private IP address

DNS Records 2

Figure 11 - Similarly to Figure 10, the records for the Customer view of a MySQL instance in the OCI VCN Private Subnet and respective IP address 10.0.2.236
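The relationship between private views, zones, and records can be modeled as nested lookups. The sketch below is only a mental model of the setup in Figures 7 to 11, not the OCI DNS API: the view and zone names follow this article, the address 10.0.2.236 comes from Figure 11, and the on-premises IP and the oraclevcn.com hostname are hypothetical placeholders.

```python
from typing import Optional

# view name -> zone name -> {fully qualified name: IP address}
PRIVATE_VIEWS = {
    "hdi-dataflow-vcn": {
        "industrial.com": {
            # Hypothetical on-premises IP; the article does not state it.
            "mysql_private_db.industrial.com": "192.168.10.15",
        },
    },
    "customer-private-view": {
        "oraclevcn.com": {
            # Hypothetical hostname; the IP is the one shown in Figure 11.
            "mysqldb.privatesubnet.hdidataflowvcn.oraclevcn.com": "10.0.2.236",
        },
    },
}


def resolve(fqdn: str) -> Optional[str]:
    """Look up fqdn in every private view whose zone covers the name."""
    name = fqdn.rstrip(".").lower()
    for zones in PRIVATE_VIEWS.values():
        for zone, records in zones.items():
            covered = name == zone or name.endswith("." + zone)
            if covered and name in records:
                return records[name]
    return None


if __name__ == "__main__":
    print(resolve("MySQL_Private_DB.industrial.com"))
    print(resolve("unknown.industrial.com"))
```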

2-a Test the Dataflow-PE

To test and verify that this configuration works, the GitHub Dataflow Samples include code that first tests the DNS resolution and, if that succeeds, then attempts to establish connectivity with the configured record. More details can be found in the README.

A simple output of a successful test will show in the application driver's log as below:

FQDN 'MySQL_Private_DB.industrial.com' resolved to IP '255.33.36.2'. Testing connectivity...
Success: Able to connect to MySQL_Private_DB.industrial.com (255.33.36.2) on port 3306.
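The two-step check can be approximated with the Python standard library alone. The following is a minimal sketch written for this article, not the code from the Dataflow Samples repository: the function name and the failure messages are assumptions, and only the success messages mirror the log lines above.

```python
import socket


def check_private_endpoint(fqdn: str, port: int, timeout: float = 5.0) -> bool:
    """Resolve fqdn, then try a TCP connection to the resolved IP on port."""
    # Step 1: DNS resolution through whatever resolver the host is using.
    try:
        ip = socket.gethostbyname(fqdn)
    except socket.gaierror as error:
        print(f"Failure: could not resolve '{fqdn}': {error}")
        return False
    print(f"FQDN '{fqdn}' resolved to IP '{ip}'. Testing connectivity...")

    # Step 2: TCP connectivity to the resolved address.
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            print(f"Success: Able to connect to {fqdn} ({ip}) on port {port}.")
            return True
    except OSError as error:
        print(f"Failure: could not connect to {fqdn} ({ip}) on port {port}: {error}")
        return False

# Example call from inside a Dataflow run with the PE attached:
# check_private_endpoint("MySQL_Private_DB.industrial.com", 3306)
```

Separating the two steps makes the failure mode visible: a resolution failure points at the private views and zones, while a connection failure points at routing, security rules, or the target listener.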

3- Cross-regional access with OCI Dataflow Private Endpoints

OCI Data Flow supports private endpoint integration for seamless cross-regional data access using remote VCN peering through an upgraded Dynamic Routing Gateway (DRG). This configuration allows Data Flow applications in one region to securely connect to data sources hosted in another region's VCN, as depicted in Figure 12. You can ensure high-performance, low-latency data transfers while maintaining a robust security posture by leveraging the upgraded DRG's advanced capabilities, including transitive routing and centralized connectivity. 

Additionally, with OCI’s multi-cloud networking capabilities, you can extend this setup to connect with Microsoft Azure. By leveraging OCI-Azure Interconnect or a similar multi-cloud connectivity framework, you can enable Data Flow to process data stored in Azure resources such as Azure Blob Storage or Azure SQL Database, as shown in Figure 13. This architecture supports centralized connectivity, transitive routing, and low-latency data transfer between OCI and Azure while maintaining stringent security and compliance standards. 

Remote Peering

Figure 12 - OCI Regional Peering

Azure Peering

Figure 13 - OCI Azure Peering

For the Dataflow PE, what matters is how the DNS names are resolved before traffic reaches the DRG, using private resolvers as demonstrated in Section 2 above. A tangible benefit is that instances in the service VCN can reach a consumer-specified workload over private connectivity without traversing the Internet. Beyond that, the Dataflow PE can extend private connectivity from instances in the service VCN to the consumer's on-premises network and to other networks accessible through the consumer VCN. From a usability perspective, consumers continue to interact only with the service console (or API) and need no additional interface to enable private access.

Despite this flexibility, the Dataflow PE has some limitations worth mentioning:

  • The default limit is 5 Dataflow PEs per tenancy per region.
  • If a Spark application run with a PE enabled requires internet connectivity, the corresponding DNS zone (e.g., Google's APIs/google.zone) must be listed in the DNS zones parameter of the PE under Application. Traffic for a whitelisted zone is routed to the customer VCN for resolution, and the customer network is responsible for internet connectivity once the packet reaches the consumer gateway VCN. Network traffic for all zones not listed in the DNS zones parameter is dropped.
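The zone rule in the last bullet can be expressed as a short decision function. This is a sketch of the rule as described here, assuming suffix matching on the zone name; the zone list is a hypothetical PE configuration, and traffic to the Oracle Services Network, which is handled separately, is out of scope.

```python
from typing import Iterable


def in_zone(hostname: str, zone: str) -> bool:
    """True if hostname is the zone itself or a name underneath it."""
    host = hostname.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return host == zone or host.endswith("." + zone)


def route_for(hostname: str, pe_dns_zones: Iterable[str]) -> str:
    """Listed zones go to the customer VCN; everything else is dropped."""
    if any(in_zone(hostname, zone) for zone in pe_dns_zones):
        return "routed to customer VCN"
    return "dropped"


if __name__ == "__main__":
    zones = ["industrial.com", "googleapis.com"]  # hypothetical PE zone list
    names = ("MySQL_Private_DB.industrial.com", "storage.googleapis.com", "example.org")
    for name in names:
        print(f"{name}: {route_for(name, zones)}")
```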

4- Connecting to an Oracle Database Cluster (RAC or Exadata)

Dataflow can connect to an Oracle RAC (Real Application Clusters) database or an Exadata machine as a client application using the SCAN (Single Client Access Name). The SCAN is a virtual name similar to the names used for virtual IP addresses. However, unlike a virtual IP name, the SCAN is associated with the entire cluster rather than an individual node, and with multiple IP addresses rather than just one.

When the SCAN proxy feature is enabled, a reverse connection entry point (RCE) is established to handle IP-based redirects. As shown in Figure 14, a PE VNIC (private endpoint virtual network interface card) is created in the customer VCN. The RCE PE VNIC is unique for each Dataflow PE setup. One important consideration about TLS connections to a database cluster is that the database SCAN listeners should redirect the network traffic to an FQDN, not directly to an IP address. Only FQDN redirects from a SCAN listener enable TLS. Therefore, the database cluster should be configured to redirect to an FQDN if TLS is a requirement.

The following configuration steps happen behind the scenes to set up the SCAN proxy feature:

  • The user configures the SCAN proxy in the Dataflow PE configuration.
  • Dataflow updates the RCE to include the SCAN configuration (the SCAN listener DNS name and port), which provides a new IP (the SCAN proxy IP) in the service VCN bound to the same SCAN port.
  • Dataflow then uses the SCAN proxy IP to create a DNS mapping within the service network between the original SCAN listener DNS name and the SCAN proxy IP.

Figure 14 shows an example of an Oracle RAC database system within a private subnet in the customer VCN. The numbered flow in the figure gives the sequence used to establish connectivity:

  1. Dataflow initiates a connection to the SCAN proxy endpoint within the service VCN using the DNS name, selecting a specific port to define the customer RAC connection through the SCAN proxy. The RCE SCAN proxy forwards the request to the underlying PE VNIC listener in the customer network. It then inspects the SCAN listener response for the IP of the underlying database cluster instance, creates a Class E NAT IP, and replaces the cluster instance IP with the NAT IP in the SCAN proxy response.
  2. The Private Access Gateway receives the redirect request from the SCAN listener and automatically translates the local listener IP in the customer's VCN into a mapped IP address. It then returns this information to the Dataflow components, and a connection request is made to one of the local listeners.

SCAN Access

Figure 14 - SCAN proxy flow for RAC/Exadata
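The address rewriting described in step 1 can be pictured with a small table-driven model. This is a conceptual sketch only and makes no claim about how the RCE is implemented: the class and function names, the redirect dictionary shape, and the cluster node IPs are all hypothetical, and the NAT range reuses the 255.33.36.0/24 example from this article.

```python
import ipaddress


class NatTable:
    """Maps real cluster instance IPs to NAT IPs taken from a Class E range."""

    def __init__(self, nat_cidr: str) -> None:
        # Usable addresses of the range, handed out in order.
        self._free = iter(ipaddress.ip_network(nat_cidr).hosts())
        self._by_real = {}
        self._by_nat = {}

    def nat_ip_for(self, real_ip: str) -> str:
        """Return the NAT IP for real_ip, allocating one on first use."""
        if real_ip not in self._by_real:
            nat_ip = str(next(self._free))
            self._by_real[real_ip] = nat_ip
            self._by_nat[nat_ip] = real_ip
        return self._by_real[real_ip]

    def real_ip_for(self, nat_ip: str) -> str:
        """Translate a NAT IP back to the real cluster instance IP."""
        return self._by_nat[nat_ip]


def rewrite_redirect(redirect: dict, table: NatTable) -> dict:
    """Replace the cluster instance IP in a listener redirect with its NAT IP."""
    return {**redirect, "host": table.nat_ip_for(redirect["host"])}


if __name__ == "__main__":
    table = NatTable("255.33.36.0/24")
    redirect = {"host": "10.0.2.11", "port": 1521}  # hypothetical node address
    rewritten = rewrite_redirect(redirect, table)
    print(rewritten, "->", table.real_ip_for(rewritten["host"]))
```

Keeping the mapping in both directions reflects the two halves of the flow: the proxy rewrites the listener response on the way out, and the gateway translates the mapped address back when the client connects.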

Figure 15 depicts an entry for the RAC DB system implemented in an OCI Customer VCN. 

Scan DNS Entry

Figure 15 - SCAN DNS name details for an Oracle RAC DB system running in an OCI VCN private subnet

5- Considerations about network traffic and isolation

When running a Spark serverless execution, it is critical to account for network traffic patterns and isolation to ensure optimal performance, security, and compliance. Serverless Spark jobs execute within a managed environment where network traffic flows between your application, data sources, and external services. To minimize latency and control traffic, ensure data sources are colocated within the same region and Virtual Cloud Network (VCN) where possible. Here are additional considerations and information about the Dataflow Private Endpoint (PE) setup:

  • Once a Dataflow PE resource is attached to an application run, network traffic to the Internet is routed to the customer VCN subnet through the PE infrastructure, as long as the DNS zone was whitelisted during PE creation. The run will fail if the customer VCN does not have an Internet Gateway attached. If the DNS zone is not whitelisted, the network traffic is dropped. Network traffic to OCI services (e.g., Object Storage) in the Oracle Services Network is still routed through the Dataflow service VCN.
  • While a Dataflow run is executing, the Spark application, running on any of the nodes assigned to the tenant, initiates a network connection to the customer's private resource by DNS name (e.g., customer1.instance1.subnet.oraclevcn.com). This involves a DNS lookup through the DNS proxy IP assigned to this customer, which is created during PE/RCE (private endpoint/reverse connection endpoint) setup. In a reverse connection, a server initiates the connection to a client, allowing the Dataflow service to access the resource privately by connecting to a designated endpoint within the customer network.
  • For the customer's DNS zones in the private views, a proxy returns a Class E IP address (240.0.0.0-255.255.255.255) for customer.instance.subnet.oraclevcn.com, allocated from a specific CIDR range, for instance 255.33.36.2, as shown in the test output in Section 2-a above. In this example, Dataflow nodes running customer jobs can establish a network connection to 255.33.36.0/24, and a stateful egress rule is created with that destination CIDR range. This means that when a Dataflow instance initiates traffic to another host and that traffic is allowed by egress security rules, any traffic the instance subsequently receives from that host for a period is considered response (ingress) traffic and is permitted. A route table rule is also added to route traffic for the destination CIDR range 255.33.36.0/24, allocated to that customer, through the Dataflow PE.
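The address arithmetic in the last bullet is easy to verify with Python's ipaddress module. The sketch below uses the Class E block (240.0.0.0/4) and the example range 255.33.36.0/24 from this article; the function name and the constant names are hypothetical.

```python
import ipaddress

CLASS_E = ipaddress.ip_network("240.0.0.0/4")
# Example per-customer range from this article.
CUSTOMER_NAT_CIDR = ipaddress.ip_network("255.33.36.0/24")


def describe(ip_text: str) -> dict:
    """Report whether an address is Class E and whether it falls inside the
    customer range that the egress rule and route table rule target."""
    ip = ipaddress.ip_address(ip_text)
    return {
        "class_e": ip in CLASS_E,
        "in_customer_range": ip in CUSTOMER_NAT_CIDR,
    }


if __name__ == "__main__":
    print("Class E spans", CLASS_E[0], "to", CLASS_E[-1])
    for address in ("255.33.36.2", "255.33.37.2", "10.0.2.236"):
        print(address, describe(address))
```

A mapped address such as 255.33.36.2 is both Class E and inside the customer range, whereas the real private IP 10.0.2.236 is neither, which is why a resolved Class E address in the driver log indicates that the PE proxy handled the lookup.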

Summary

Oracle Cloud Infrastructure (OCI) Data Flow's private endpoint capabilities provide robust and secure connectivity for accessing diverse data sources and environments. With private endpoint access, you can seamlessly connect to Autonomous Database (ADB) instances configured for private access only, ensuring secure interactions without exposing the database to public networks. Similarly, private endpoints enable secure connectivity to on-premises resources through Site-to-Site VPN or FastConnect, facilitating hybrid cloud use cases. Cross-regional access is supported for distributed workloads using remote VCN peering and upgraded Dynamic Routing Gateways (DRGs), enabling low-latency data processing across regions. Additionally, OCI Data Flow supports connections to Oracle Database Clusters, such as RAC or Exadata, leveraging SCAN proxy functionality for efficient and high-availability access. These features are underpinned by stringent network isolation and traffic management practices, including private IPs, security rules, and DNS configuration, to ensure optimal performance, security, and compliance for serverless Spark executions.

References 

Dataflow - Configuring a Private Network

Oracle Cloud Infrastructure - Private Access

Enable multi-cloud cross-region interconnectivity between Microsoft Azure and Oracle Cloud Infrastructure

Managing DNS Zones

Dataflow - IP Address Allowlist

Oracle Grid Infrastructure - About the SCAN

Dataflow Samples - private endpoint test 

OCI Network Visualizer

OCI-Azure Interconnect

Mario Miola

Principal Solutions Architect

Analytics and data warehouse experienced leader, improving the design and enhancing the capabilities of commercial offerings for OCI.

