Using Oracle Database Service for Azure with a hub-and-spoke network requires extra setup in Azure so that Internet Control Message Protocol (ICMP) messages can reach the spoke networks. This setup is required for the Path MTU Discovery (PMTUD) protocol to work properly and for all network traffic to flow correctly.

Without these steps, you can experience hanging connections caused by a failure to correctly negotiate the maximum segment size (MSS) between Azure and Oracle Cloud Infrastructure (OCI) resources across the multicloud network link.

This blog post explains the background and shares the optimal setup.

Background and architecture

Figure 1 shows a network topology that requires traffic between the application and database to travel through a central hub VNet.

Architecture diagram showing Oracle Database Service for Azure hub-and-spoke network

Figure 1: High-level network flow diagram using Oracle Database for Azure in a hub-and-spoke network topology

You can find a detailed architecture showing how to set up a hub-and-spoke network with Oracle Database for Azure in our Architecture Center.

In the setup illustrated in figure 1, we see two spoke VNets (10.40.1.0/24 and 10.40.2.0/24) each with a simple virtual machine (VM)-based application. Both applications need to connect to the same OCI database (on 192.168.2.70) through Oracle Database for Azure.

This traffic is routed through a network virtual appliance (NVA) in the hub VNet (10.40.0.0/24). This NVA can be anything, such as a VM-based NVA, an Azure Firewall service, or any third-party VM-based firewall. The spoke VNets peer with the hub VNet.

On-premises connectivity might not traverse the firewall unless user-defined routes (UDRs) are manually added to the Oracle Database Service for Azure (ODSA) service network. To have them added, log a service request with Oracle Support that includes the UDR information for the ODSA service network.

As part of the Oracle Database for Azure setup, the hub VNet is peered with a service VNet, which exists in the Oracle Database for Azure control plane. The service VNet isn’t visible to customers and is managed by Oracle. Within this service VNet, a router forwards traffic to OCI through a private tunnel.

Negotiating the maximum segment size

The private tunnel between Azure and OCI adds a small overhead to each network packet. So, the maximum transmission unit (MTU) on this part of the network is smaller than it is for connections that don’t use the tunnel.

The PMTUD protocol lets the endpoints discover the smallest MTU along the path and adjust the maximum segment size (MSS) to fit. PMTUD is an industry-standard technique that allows network connections to be established regardless of differences in MTU size. For details on PMTUD, see the OCI documentation on Hanging Connection. This documentation explains that, for PMTUD to work, ICMP type 3 code 4 messages (Destination Unreachable: Fragmentation Needed) must be allowed to reach all components and resources in the connection.
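To make the relationship between MTU and MSS concrete, here is a minimal arithmetic sketch. The 20-byte IPv4 and TCP header sizes are the standard sizes without options. The tunnel overhead of 60 bytes is purely an assumed value for illustration, not a figure published for Oracle Database for Azure.

```python
# Illustrative arithmetic only. ASSUMED_TUNNEL_OVERHEAD is a hypothetical
# value chosen for this example, not a documented figure.
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def path_mtu(link_mtus: list[int]) -> int:
    """The usable MTU of a path is the smallest MTU of any link on it."""
    return min(link_mtus)


STANDARD_MTU = 1500
ASSUMED_TUNNEL_OVERHEAD = 60  # hypothetical encapsulation cost in bytes

# Client side, tunnel segment, database side.
links = [STANDARD_MTU, STANDARD_MTU - ASSUMED_TUNNEL_OVERHEAD, STANDARD_MTU]

print(mss_for_mtu(STANDARD_MTU))     # 1460: what each endpoint assumes locally
print(path_mtu(links))               # 1440: what the path can actually carry
print(mss_for_mtu(path_mtu(links)))  # 1400: the segment size that fits the path
```

Under these assumptions, each endpoint starts out believing it can send 1460-byte segments, but only 1400-byte segments fit through the tunnel. PMTUD is the mechanism that tells the sender about the difference, which is why blocking its ICMP messages leaves large packets silently dropped.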

Requirements to allow ICMP

To allow ICMP to reach the spoke VNets and for PMTUD to work, complete the following requirements:

  • On the network security group (NSG) that governs the VM from which the connection is made, allow ICMP traffic coming from the Oracle Database for Azure router.
  • Ensure that the ICMP traffic flows freely and correctly through the firewall (or your other NVA routing device) in the hub VNet.

The next section describes the details on how to implement this solution. However, this setup might not be feasible for the following reasons:

  • In Azure, you can’t allow only specific ICMP types, and you don’t want to allow all ICMP traffic for security reasons.
  • Not all multinode third-party firewalls allow session persistence. So, ICMP traffic can be misrouted between firewall nodes, and PMTUD packets are dropped.
  • In the Azure Firewall service, you can’t enable session persistence, so PMTUD packets can be dropped for the same reason.

If you’re affected by these limitations, manually lowering MTU settings is considered a workaround because then PMTUD isn’t required. Details of the various options for this workaround are explained in Addressing MTU issues in Oracle Database for Azure when PMTUD is blocked.

Implementation details

As discussed, you have two requirements to ensure that PMTUD works correctly when using Oracle Database for Azure with a hub-and-spoke network.

Allow ICMP traffic from the Oracle Database for Azure router to the client

In the Oracle Database for Azure console, find the router IP address. Select the relevant database, select the database system, then select Networking:

A screenshot of the network details in the OracleDB for Azure console with the router IP address circled in yellow

Figure 2: Oracle Database for Azure console showing network details, including router IP

When you find the IP address, you can add a rule to allow ICMP traffic in the Azure resource NSG.

Screenshot showing adding the NSG rule for ICMP

Figure 3: Adding NSG entry to allow ICMP traffic from the Oracle Database for Azure router address space

In Figure 3, we allow traffic from the whole 100.64.0.0/10 CIDR range, instead of from the router IP address only (which in our example is 100.66.10.73, shown in Figure 2). Oracle Database for Azure router IPs always come from the Carrier-Grade NAT (CGNAT) address space (100.64.0.0/10), so allowing the complete range simplifies this step.
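As a quick sanity check of the range described above, the following sketch uses Python’s standard ipaddress module to confirm that the example router IP falls inside the CGNAT range. It also writes out the rule from Figure 3 as a plain dictionary. The rule name and priority are hypothetical, and the property names are modeled on the Azure Resource Manager security rule schema as an assumption, so verify them against the Azure documentation before reusing them.

```python
import ipaddress

# CGNAT range and example router IP taken from the article.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")
router_ip = ipaddress.ip_address("100.66.10.73")


def is_cgnat(address) -> bool:
    """True if the address lies inside the 100.64.0.0/10 CGNAT range."""
    return address in CGNAT_RANGE


print(is_cgnat(router_ip))                           # True
print(is_cgnat(ipaddress.ip_address("10.40.1.4")))   # False: a spoke address

# Sketch of the inbound rule shown in Figure 3. The name and priority are
# hypothetical; the keys are assumed to mirror the ARM security rule schema.
icmp_rule = {
    "name": "Allow-ICMP-From-ODSA-Router",  # hypothetical name
    "priority": 200,                        # hypothetical priority
    "direction": "Inbound",
    "access": "Allow",
    "protocol": "Icmp",
    "sourceAddressPrefix": str(CGNAT_RANGE),
    "sourcePortRange": "*",
    "destinationAddressPrefix": "*",
    "destinationPortRange": "*",
}
```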

Ensure that ICMP traffic flows through the firewall in the Hub VNet

You must also ensure that the ICMP traffic flows freely and correctly through the firewall (or other NVA routing device) in the hub VNet.

The firewalls must allow the ICMP traffic from the Oracle Database for Azure router back to the original client. For firewalls or NVAs with multiple nodes, avoid asymmetric routing, which typically means configuring session persistence on the load balancer of the NVA or firewall nodes. Without session persistence, the ICMP packet can hit a different firewall node and be discarded by the firewall.

Setting session persistence works differently in different firewalls. For example, Fortinet’s FortiGate Next Generation Firewall (NGFW) from the Azure Marketplace uses the Azure Load Balancer service, so you can configure session persistence on the load balancer, as shown in figure 4:

A screenshot of the Load balancing rules page in Microsoft Azure with the session persistence highlighted in yellow.

Figure 4: Example of configuring session persistence on an Azure load balancer

When the firewall uses session persistence, the PMTUD communication between the client and the Oracle Database for Azure router flows without problems.

Other firewalls can work differently. For example, the Azure Firewall service, which always uses multiple nodes, doesn’t allow access to load balancer session persistence settings. So, PMTUD communication between resources in spoke networks and the Oracle Database for Azure router doesn’t consistently work because of this shortcoming in the Azure Firewall service. In these cases, manually lowering MTU settings is considered a workaround because then PMTUD isn’t required. Details of the various options for this workaround are explained in Addressing MTU issues in Oracle Database for Azure when PMTUD is blocked.

Background: Why allow ICMP?

When Oracle Database for Azure is configured against a hub VNet, ICMP traffic isn’t enabled by default between the spoke VNets and Oracle Database for Azure because the default NSG for resources in any VNet contains the following entry:

A screenshot of the default NSG entry.

Figure 5: Azure default NSG entry containing VirtualNetwork service tag

The entry shown in figure 5 allows any traffic, including ICMP, to travel to and from everything contained in the VirtualNetwork service tag. According to the Azure documentation on Virtual network service tags, this service tag contains any peered networks, among other things. But it doesn’t inherit any peering of the primary peered networks. In other words, it doesn’t contain peers of peers.

In the Oracle Database for Azure architecture, the PMTUD packets are sent by the router in the Oracle Database for Azure service VNet back to the spoke VNets. The service VNet is peered with the hub VNet but not directly with the spoke VNets. So, in the spoke VNets, the VirtualNetwork service tag doesn’t include the Oracle Database for Azure service VNet from which the PMTUD packets are sent. As a result, the PMTUD packets are blocked, and they won’t arrive on the clients where the connection attempt started.
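The peers-of-peers behavior can be modeled in a few lines. This is a deliberately simplified sketch: it treats the VirtualNetwork service tag as the VNet itself plus its directly peered VNets, and ignores the other things the real tag contains. The VNet names are labels invented for this example.

```python
# Direct peerings from figure 1. "service" stands for the Oracle-managed
# service VNet. All names are labels for this sketch only.
PEERINGS = {
    ("spoke1", "hub"),
    ("spoke2", "hub"),
    ("hub", "service"),
}


def direct_peers(vnet: str) -> set[str]:
    """VNets that have a direct peering with the given VNet."""
    peers = set()
    for a, b in PEERINGS:
        if a == vnet:
            peers.add(b)
        elif b == vnet:
            peers.add(a)
    return peers


def virtual_network_tag(vnet: str) -> set[str]:
    """Simplified model of the tag: the VNet plus direct peers only.

    Peering is not transitive, so peers of peers are not included.
    """
    return {vnet} | direct_peers(vnet)


print(sorted(virtual_network_tag("hub")))     # ['hub', 'service', 'spoke1', 'spoke2']
print(sorted(virtual_network_tag("spoke1")))  # ['hub', 'spoke1']
print("service" in virtual_network_tag("spoke1"))  # False: default rule does not match
```

In this model, the default rule on a spoke NSG matches traffic from the hub but not from the service VNet, which is why an explicit rule for the router address range is needed on the spoke side.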

In OCI, security lists always allow ICMP type 3 code 4 messages by default, so no PMTUD traffic is dropped in OCI.

Experienced behaviour with incorrect setup

PMTUD packets not reaching their destination results in hangs when trying to connect from a resource in a spoke VNet to the Oracle Database for Azure resource. The following issues are both caused by a failure in the PMTUD negotiation:

  • All connection attempts fail. The cause is typically that all ICMP traffic from the Oracle Database for Azure router is blocked, usually on the Azure client’s NSG, but it can also be blocked by the firewall.
  • Most connections succeed, but occasional hangs occur. This error is most likely to happen when the firewall setup is affected by asymmetric routing, such as when the Azure Firewall service is used. In this scenario, only ICMP packets that are handled by an ‘incorrect’ firewall node are blocked. Because of PMTUD caching, a successful connection attempt results in a period without any new PMTUD negotiations, and no failures occur during this time. So, you only see a small fraction of connection attempts fail in this scenario.

Depending on the tools and protocols used, the connection failures take various forms. For example, when using SQL*Plus the connection times out, as illustrated in figure 6:

A screenshot of the output for a timed-out SQL*Plus connection.

Figure 6: Example of a timed-out SQL*Plus connection

As another example, a direct SSH connection to the database host also times out.

A screenshot of a timed-out SSH connection.

Figure 7: Example of timed-out SSH connection

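However, we can confirm that basic connectivity exists by successfully initiating a telnet session to port 1521 on the database host. The TCP handshake uses only small packets, which fit within the reduced MTU, so it completes even when PMTUD is broken: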

A screenshot of a successful telnet connection.

Figure 8: Example of a successful telnet connection
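If telnet isn’t installed on the client, the same check can be sketched with Python’s standard socket module. The function name tcp_port_open is our own, and the address in the usage comment is the example database IP from figure 1.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout.

    Like the telnet test, this exchanges only small packets. A True result
    proves that routing and port access work, not that PMTUD is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage from a VM in a spoke VNet (database IP from figure 1):
# print(tcp_port_open("192.168.2.70", 1521))
```

A result of True combined with hanging SQL*Plus or SSH sessions points toward an MTU problem rather than a routing or port-filtering problem.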

Conclusion

Many enterprise customers seek multicloud split-stack solutions, which Oracle Database Service for Azure can deliver. These enterprises often use a hub-and-spoke network topology in their Azure environment. Because of the private tunnel created by Oracle Database for Azure, and the default blocking of any ICMP traffic between Azure VNets that aren’t directly peered to each other, this setup can result in hanging connections.

In this situation, the solution is to allow ICMP traffic to flow freely between all resources and network components. For cases where that isn’t feasible, we discuss various workarounds in Addressing MTU issues in Oracle Database for Azure when PMTUD is blocked.

For more information, see the following resources: