Data Flow applications can be configured to access data sources hosted within private networks, enabling secure connectivity without exposing sensitive data to public networks. Limiting that exposure reduces the risk of data breaches and unauthorized access, a critical concern for industries handling sensitive information. For example, organizations subject to regulations like the General Data Protection Regulation (GDPR) in the EU can keep personal data within controlled environments, minimizing the risk of non-compliance. Similarly, healthcare providers bound by the Health Insurance Portability and Accountability Act (HIPAA) in the United States can securely process protected health information (PHI) through private network configurations, safeguarding patient confidentiality. This architecture also supports compliance with other regulatory frameworks, such as SOC 2 for data security and privacy, enabling organizations to meet their obligations while maintaining high-performance data processing.
Configuring a Data Flow application with private network access provides the following capabilities:
Figure 1 depicts a simplified diagram of the Dataflow network configuration. Note that the serverless applications run inside the Dataflow Services Tenancy, so a few network components in this diagram need to be understood.
Like any other application running on a secure private network, the Dataflow application cluster connects to the Internet through a NAT (network address translation) Gateway and to the Oracle Services Network (OSN) through the Service Gateway (SGW), which blocks any access from the Internet to the application cluster. The same figure shows a Dataflow Private Access Gateway (Dataflow PE), which allows a Dataflow application to access OCI resources, such as ADB instances, that reside in a private subnet, as well as customer on-premises resources connected to OCI through FastConnect or Site-to-Site VPN.
Figure 1 - Dataflow service tenancy network simplified design
We selected a few use cases below to analyze how the network settings and some additional configurations allow secure access through private subnets.
This is perhaps the most common use case. A simple diagram of the configuration reveals two critical things to consider:
Figure-2 ADB Network Access
Figure-3 - Dataflow PE configuration for an ADB instance
As a bonus, if the ADB network configuration restricts application access even further by using Network Security Groups (NSGs) that allow only specific CIDR ranges, OCI Services, or NSGs, ensure the same rules are represented on both ends: the ADB configuration and the Dataflow configuration.
This is a variation of the Network settings in the ADB, where the option "Secure access from allowed IPs and VCNs only" is selected, and an access control list (ACL) is attached to the configuration. In this scenario, the Dataflow PE is not necessary. The network traffic travels through the NAT gateway in the Dataflow Service Tenancy, within the customer-allocated subnet (Tenant OKE Cluster Subnet in Figure 1). For the documented list of IPs to add to the ACL, see the IP Address Allowlist reference.
Figure 4 shows an ADB configuration for the Phoenix region below.
Note - Dataflow PE is not necessary for this setting.
Figure-4 ADB secure access from allowed IPs and VCNs only
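To reason about the ACL before running an application, it can help to check whether a given source address would match an entry. The following is a minimal sketch, not part of the Dataflow service or SDK: the function name `is_allowed` and the `EXAMPLE_ACL` entries are hypothetical, and the addresses are documentation-only ranges rather than the real Dataflow egress IPs, which should come from the IP Address Allowlist reference.

```python
import ipaddress


def is_allowed(source_ip: str, acl_entries: list[str]) -> bool:
    """Return True if source_ip matches any single IP or CIDR entry in the ACL."""
    addr = ipaddress.ip_address(source_ip)
    for entry in acl_entries:
        # strict=False lets a single host such as "192.0.2.10" parse as a /32 network.
        network = ipaddress.ip_network(entry, strict=False)
        if addr in network:
            return True
    return False


# Hypothetical ACL using documentation-only ranges (assumption, not real egress IPs).
EXAMPLE_ACL = ["192.0.2.10", "198.51.100.0/24"]
```

Calling `is_allowed("198.51.100.77", EXAMPLE_ACL)` checks an address against the CIDR entry, while an address outside both entries is reported as not allowed.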
A Dataflow application running with the "Internet access" type may rely on access to buckets in different regions. For instance, an application running in IAD can access objects stored in PHX, provided OCI IAM authorization grants such access.
If this application later moves to a "Private access" run, it loses access to the other region's public Object Store service. As depicted in Figure 1, the Service Gateway handles communication with the Object Store service, so regional access is enforced at DNS resolution.
For customers in this scenario, follow section 3, Cross-regional access with OCI Dataflow Private Endpoint, below.
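Before switching an application to "Private access", it may be useful to list which Object Storage endpoints it references outside its own region. The sketch below assumes the application refers to Object Storage through the public regional hostname pattern `objectstorage.<region>.oraclecloud.com`; other endpoint styles are ignored. The function names are hypothetical and exist only for illustration.

```python
import re
from typing import Optional

# Public regional Object Storage hostname pattern (other endpoint styles are not handled).
_ENDPOINT = re.compile(r"^objectstorage\.([a-z0-9-]+)\.oraclecloud\.com$")


def endpoint_region(hostname: str) -> Optional[str]:
    """Extract the region identifier from a regional Object Storage hostname."""
    match = _ENDPOINT.match(hostname.strip().lower())
    return match.group(1) if match else None


def cross_region_endpoints(app_region: str, hostnames: list[str]) -> list[str]:
    """Return the hostnames whose region differs from the application's region."""
    flagged = []
    for host in hostnames:
        region = endpoint_region(host)
        if region is not None and region != app_region:
            flagged.append(host)
    return flagged
```

For an application running in IAD (`us-ashburn-1`), passing both the Ashburn and Phoenix (`us-phoenix-1`) hostnames flags only the Phoenix one as cross-regional.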
The reference "DNS Zones not allowed (Internal)" states the domain name restrictions you may encounter when attempting to use Dataflow PE with other Oracle services. Currently, requests to the following DNS zones may be rejected:
"oracle.com" "oracle-ocna.com" "oraclegoviaas.com" "oraclegovcloud.com" "oracleiaas.com" "grungy.us" "oraclecorp.com" "oraclecloud.net" "oraclegha.com" "oc-test.com" "oracleemaildelivery.com" "ocir.io" "oracledx.com"
Figure 5 - Prohibited DNS Zones references
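A quick way to catch these restrictions early is to compare the FQDNs planned for the Dataflow PE against the zones listed above. This is a minimal sketch under one assumption: that a name is affected when it equals a listed zone or is a subdomain of it. The service's actual matching rule is not documented here, and `blocked_zone` is a hypothetical helper name.

```python
from typing import Optional

# Zones copied from the list above.
PROHIBITED_ZONES = (
    "oracle.com", "oracle-ocna.com", "oraclegoviaas.com", "oraclegovcloud.com",
    "oracleiaas.com", "grungy.us", "oraclecorp.com", "oraclecloud.net",
    "oraclegha.com", "oc-test.com", "oracleemaildelivery.com", "ocir.io",
    "oracledx.com",
)


def blocked_zone(fqdn: str) -> Optional[str]:
    """Return the prohibited zone that contains fqdn, or None if there is none."""
    name = fqdn.strip().rstrip(".").lower()
    for zone in PROHIBITED_ZONES:
        # Match the zone itself or any subdomain, on a label boundary only,
        # so "notoracle.com" is not confused with "oracle.com".
        if name == zone or name.endswith("." + zone):
            return zone
    return None
```

A name such as `MySQL_Private_DB.industrial.com` falls outside every listed zone, whereas any host under `ocir.io` is flagged.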
Work with the OCI Dataflow and Network teams if you require services that do not appear in the list above. In general, the following Oracle services are limited when using the Dataflow PE:
Integrating OCI Data Flow with on-premises data sources through private endpoints ensures secure and efficient data processing across hybrid environments. By leveraging private network connectivity options like Site-to-Site VPN or FastConnect, you can seamlessly connect your Data Flow applications to data repositories hosted in your on-premises infrastructure. This setup enables robust, low-latency communication while maintaining strict security boundaries. It is ideal for use cases that require secure data access and processing across cloud and on-premises environments.
A key element of this setup is the DNS resolution, which will now take place within the Private Access Gateway. The configuration should provide a DNS name (fully qualified domain name, or FQDN) for the private endpoints, not the private IP address itself. If you've configured your network setup for DNS, your hosts can access the private endpoint using the FQDN. Dataflow supports network security groups (NSGs) with its resources. You can request that the Dataflow service set up the private endpoint in an NSG within your VCN. NSGs let you write security rules to control access to the private endpoint without knowing the private IP address assigned to the private endpoint.
Once connectivity between the OCI customer VCN and the on-premises network has been established, the private IPs in the on-premises network must be associated with a private DNS resolver in the OCI customer VCN for the Dataflow application to operate correctly.
To illustrate this part, we use two connected networks, shown in Figure 6 with the OCI Network Visualizer. The hdi-dataflow-VCN private subnet is connected through an attachment to the disdemo-DRG dynamic routing gateway, resembling path (2) of Figure 1 above.
Figure 6 - OCI Network Visualizer for the use-case demonstration
From this point onwards, the relevant part is to decide how the DNS resolution will occur. In this example, we created two DNS resolvers:
For instance, as depicted in Figure 7, the VCN has multiple zones, some created to support the routing through the PE:
Figure 7 - Private views of the VCN resolver
Figure 8 - Adding the domain industrial.com as a Private DNS Zone
Figure 9 - Customer private view for DNS resolution in the hdi-dataflow-vcn VCN
These names must be resolved using record types appropriate to the downstream network. For these types and configurations, please follow the Managing DNS zones documentation. Figures 10 and 11 show the records for each of the Private Views above:
Figure 10 - The new Records for the DNS resolver MySQL_Private_DB_Industrial.com and respective private IP address
Figure 11 - Similarly to Figure 10, the records for the Customer view of a MySQL instance in the OCI VCN Private Subnet and respective IP address 10.0.2.236
You may wonder how to test and verify that this configuration is working. The GitHub Dataflow Samples include code that tests the DNS resolution in a first step and, if it succeeds, attempts to establish connectivity with the configured record in a second step. More details can be found in the README.
The output of a successful test appears in the application driver's log as follows:
FQDN 'MySQL_Private_DB.industrial.com' resolved to IP '255.33.36.2'.
Testing connectivity...
Success: Able to connect to MySQL_Private_DB.industrial.com (255.33.36.2) on port 3306.
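The same two-step idea (resolve first, then connect) can be sketched with the Python standard library alone. This is not the code from the GitHub Dataflow Samples; it is an illustrative sketch, and the function names `resolve`, `can_connect`, and `check_endpoint` are our own. It would need to run inside the Dataflow application (for example, in the driver) to exercise the private endpoint's DNS resolution.

```python
import socket


def resolve(fqdn: str) -> list[str]:
    """Return the distinct addresses the current resolver gives for fqdn."""
    infos = socket.getaddrinfo(fqdn, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoint(fqdn: str, port: int) -> bool:
    """Step 1: resolve the FQDN. Step 2: attempt a TCP connection to it."""
    try:
        addresses = resolve(fqdn)
    except socket.gaierror as exc:
        print(f"FQDN '{fqdn}' did not resolve: {exc}")
        return False
    print(f"FQDN '{fqdn}' resolved to {addresses}. Testing connectivity...")
    ok = can_connect(fqdn, port)
    status = "Success: able" if ok else "Failure: unable"
    print(f"{status} to connect to {fqdn} on port {port}.")
    return ok


# Example call for the record used in this post:
# check_endpoint("MySQL_Private_DB.industrial.com", 3306)
```

A TCP connection only proves that name resolution and routing work; it does not validate database credentials or TLS settings.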
OCI Data Flow supports private endpoint integration for seamless cross-regional data access using remote VCN peering through an upgraded Dynamic Routing Gateway (DRG). This configuration allows Data Flow applications in one region to securely connect to data sources hosted in another region's VCN, as depicted in Figure 12. You can ensure high-performance, low-latency data transfers while maintaining a robust security posture by leveraging the upgraded DRG's advanced capabilities, including transitive routing and centralized connectivity.
Additionally, with OCI’s multi-cloud networking capabilities, you can extend this setup to connect with Microsoft Azure. By leveraging OCI-Azure Interconnect or a similar multi-cloud connectivity framework, you can enable Data Flow to process data stored in Azure resources such as Azure Blob Storage or Azure SQL Database, as shown in Figure 13. This architecture supports centralized connectivity, transitive routing, and low-latency data transfer between OCI and Azure while maintaining stringent security and compliance standards.
Figure 12 - OCI Regional Peering
Figure 13 - OCI Azure Peering
For the Dataflow PE, what matters is that the DNS names are resolved before traffic reaches the DRG, using private resolvers, as demonstrated in Section 2 above. A tangible benefit is that the instances for private connectivity in the service VCN can access a consumer-specified workload without traversing the Internet. Beyond that, the Dataflow PE can extend private connectivity from instances in the service VCN to the consumer's on-premises network and other networks accessible via the consumer VCN. From the usability perspective, consumers continue interacting with just the service console (or API) and do not need any additional interface to enable private access. Despite the flexibility of operation using the Dataflow PE, some limitations are worth mentioning, for instance:
Dataflow can connect to a RAC (Real Application Clusters) database or an Exadata machine as a client application using SCAN (Single Client Access Name). The SCAN is a virtual name similar to those used for virtual IP addresses. However, unlike a virtual IP, the SCAN is associated with the entire cluster rather than an individual node, and with multiple IP addresses rather than just one.
When the SCAN proxy feature is enabled, a reverse connection entry point (RCE) is established to handle IP-based redirects. As shown in Figure 14, a PE VNIC (private endpoint virtual network interface card) is created in the Customer VCN. The RCE PE VNIC is unique for each Dataflow PE setup. One important consideration about the TLS connection to a database cluster is that the database SCAN listeners should redirect the network traffic to an FQDN, not the IP address directly. Only the FQDN redirects from a SCAN listener will enable TLS. Therefore, the database cluster should be configured to redirect to an FQDN if TLS is a requirement.
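On the client side, the FQDN-over-IP rule can be mirrored when building the connection string. The sketch below builds an Oracle JDBC thin URL in connect-descriptor form and refuses to combine TLS (`tcps`) with an IP literal. It does not configure the SCAN listener redirect, which is a database-side setting. The helper names, host name, ports, and service name used here are hypothetical.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """Return True if host is an IPv4 or IPv6 literal rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def scan_jdbc_url(scan_host: str, port: int, service_name: str,
                  use_tls: bool = False) -> str:
    """Build a JDBC thin URL that targets the cluster through its SCAN name."""
    if use_tls and is_ip_literal(scan_host):
        # Mirrors the rule in the text: TLS should go through an FQDN, not an IP.
        raise ValueError("TLS requires an FQDN, not an IP address: " + scan_host)
    protocol = "tcps" if use_tls else "tcp"
    return (
        "jdbc:oracle:thin:@(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL={protocol})(HOST={scan_host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


# Hypothetical values for illustration only.
EXAMPLE_URL = scan_jdbc_url("rac-scan.example.internal", 2484,
                            "sales.example.internal", use_tls=True)
```

The resulting string can be passed as the `url` option of a Spark JDBC read, assuming the Oracle JDBC driver is available to the application.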
The following configuration steps happen behind the scenes to create the SCAN proxy feature:
Figure 14 shows an example of an Oracle RAC database system within a private subnet in the customer VCN. The numbered flow in the figure shows the sequence used to establish connectivity:
Figure 14 - SCAN proxy flow for RAC/Exadata
Figure 15 depicts an entry for the RAC DB system implemented in an OCI Customer VCN.
Figure 15 - SCAN DNS name details for an Oracle RAC DB system running in an OCI VCN private subnet
When running a Spark serverless execution, it is critical to account for network traffic patterns and isolation to ensure optimal performance, security, and compliance. Serverless Spark jobs execute within a managed environment where network traffic flows between your application, data sources, and external services. To minimize latency and control traffic, ensure data sources are colocated within the same region and Virtual Cloud Network (VCN) where possible. Here are additional considerations and information about the Dataflow Private Endpoint (PE) setup:
Oracle Cloud Infrastructure (OCI) Data Flow's private endpoint capabilities provide robust and secure connectivity for accessing diverse data sources and environments. With private endpoint access, you can seamlessly connect to Autonomous Database (ADB) instances configured for private access only, ensuring secure interactions without exposing the database to public networks. Similarly, private endpoints enable secure connectivity to on-premises resources through Site-to-Site VPN or FastConnect, facilitating hybrid cloud use cases. Cross-regional access is supported for distributed workloads using remote VCN peering and upgraded Dynamic Routing Gateways (DRGs), enabling low-latency data processing across regions. Additionally, OCI Data Flow supports connections to Oracle Database Clusters, such as RAC or Exadata, leveraging SCAN proxy functionality for efficient and high-availability access. These features are underpinned by stringent network isolation and traffic management practices, including private IPs, security rules, and DNS configuration, to ensure optimal performance, security, and compliance for serverless Spark executions.
Dataflow - Configuring a Private Network
Oracle Cloud Infrastructure - Private Access
Dataflow - IP Address Allowlist
Oracle Grid Infrastructure - About the SCAN
Analytics and data warehouse experienced leader, improving the design and enhancing the capabilities of commercial offerings for OCI.