This blog is intended for Oracle WebLogic administrators familiar with high-availability concepts and the Enterprise Deployment Guides (EDG). It applies to any Oracle WebLogic domain with High Availability requirements.

 

The WebLogic Administration Server is a singleton component of an Oracle WebLogic domain: only one Administration Server can run in a domain. Other WebLogic servers (the managed servers) can be part of a cluster to ensure high availability, but the Administration Server can’t be clustered.

You can, however, configure the Administration Server with an active-passive high availability topology within the domain. This allows you to manually fail over the Administration Server to recover from failures that affect the host where it is running. The Enterprise Deployment Guides (1), also known as EDGs, are Oracle Maximum Availability Architecture (MAA) documents that provide high availability best practices for the Fusion Middleware products. Among other best practices, these guides describe how to configure high availability protection for the Administration Server by using shared storage and a floating IP. This blog post presents a simplification of the Administration Server failover that doesn’t require a floating IP, analyzes the advantages and disadvantages of this approach, and provides recommendations for its use.

WebLogic Administration Server failover (EDG traditional approach)

To support the manual failover of the Administration Server, the typical enterprise deployment topology has the following requirements:

  • Virtual hostname (VHN) and virtual IP address (VIP) for the Administration Server. The Administration Server listens on a virtual hostname (for example, adminvip.example.com) that resolves to a floating IP (for example, 100.100.100.100). This floating IP is not the physical IP of the host that runs the Administration Server; it is a virtual IP that can be attached to the network interface of any of the hosts in the WebLogic domain.
  • Shared storage device for the Administration Server domain directory. The domain folder used by the Administration Server is a shared folder that can be accessed by any of the hosts in the domain.

In the event of a system failure in the host that runs the Administration Server, you can manually reassign the Administration Server VIP address to another host in the domain, mount the Administration Server domain directory on the new host, and then start the Administration Server on the new host. 

Figure 1. WebLogic Administration Server Failover (Enterprise Deployment Guide approach)

 

WebLogic Administration Server failover without a floating IP

It is possible to provide high-availability protection to the Administration Server without a floating IP, as a simplification of the traditional approach. The Administration Server still listens on a virtual hostname, but this hostname resolves to the physical IP of the host that runs the Administration Server. In case of a failover, you update the virtual hostname record in the DNS to point to the physical IP of the new host that runs the Administration Server. The virtual hostname doesn’t change, but the IP it resolves to depends on the node where the Administration Server runs.

For example, the Administration Server listens on the virtual hostname “adminvip.example.com”. When it runs on host1, this name resolves to the IP of host1 (for example, 100.100.100.1). If the Administration Server is moved to host2, the virtual hostname must point to the IP of host2 (for example, 100.100.100.2).
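As a minimal sketch of the example above, the following Python snippet reports which host the virtual hostname currently points to. The host inventory and the function name are hypothetical, chosen only for illustration; the resolver is injectable so the logic can be tried without touching DNS.

```python
import socket

# Hypothetical host inventory: the physical IPs used in the example above.
CANDIDATE_HOSTS = {
    "host1": "100.100.100.1",
    "host2": "100.100.100.2",
}


def current_admin_host(vhn, candidates=CANDIDATE_HOSTS, resolve=socket.gethostbyname):
    """Return the name of the host that the virtual hostname points to.

    Returns None when the resolved IP is not one of the candidate hosts.
    """
    ip = resolve(vhn)
    for host, host_ip in candidates.items():
        if host_ip == ip:
            return host
    return None


if __name__ == "__main__":
    # Fake resolver standing in for DNS after a failover to host2.
    after_failover = {"adminvip.example.com": "100.100.100.2"}.get
    print(current_admin_host("adminvip.example.com", resolve=after_failover))  # prints: host2
```

Called without the resolve argument, the function uses the local resolver of the machine where it runs, so it reflects what that machine sees, including its /etc/hosts entries.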

In this approach, the listen ports used by the Administration Server must not be used by any managed server. Otherwise, the ports conflict, because the Administration Server and the managed servers on the same host listen on the same IP. This is not a requirement in the EDG traditional approach, where the Administration Server always listens on an IP different from the managed servers, so they do not conflict even if they use the same ports.
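The port requirement can be checked before adopting this approach. The sketch below is only an illustration: the server names and port numbers are hypothetical, and in practice the listen ports would come from your own domain configuration.

```python
def find_port_conflicts(admin_ports, managed_servers):
    """Return (server_name, port) pairs where a managed server reuses an
    Administration Server listen port.

    managed_servers maps a server name to the ports it listens on.
    """
    admin_ports = set(admin_ports)
    conflicts = []
    for name, ports in sorted(managed_servers.items()):
        for port in sorted(admin_ports & set(ports)):
            conflicts.append((name, port))
    return conflicts


if __name__ == "__main__":
    # Hypothetical ports: the Administration Server listens on 7001 and 9002.
    managed = {"WLS_SOA1": {7001, 8001}, "WLS_WSM1": {7010}}
    print(find_port_conflicts({7001, 9002}, managed))  # prints: [('WLS_SOA1', 7001)]
```

An empty result means that no managed server shares a listen port with the Administration Server.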

To support the manual failover of the Administration Server without a floating IP, these are the requirements:

  • Virtual hostname (VHN), without a VIP. The Administration Server listens on a virtual hostname (for example, adminvip.example.com) that resolves to the IP of the host running the server.
  • Shared storage device for the Administration Server domain directory. The domain folder used by the Administration Server is a shared folder that can be mounted by any of the hosts in the domain.
Figure 2. WebLogic Administration Server Failover without floating IP

Comparing the failover steps

The Administration Server failover procedure, based on a floating IP, is described in the FMW Enterprise Deployment Guides (2). This table compares the failover steps required for the method based on a floating IP with the method without a floating IP:

WebLogic Administration Server failover steps (from HOST1 to HOST2)

Step | Failover with a floating IP | Failover without a floating IP
1 | Stop the Administration Server on HOST1. | (same step)
2 | Stop the Node Manager on HOST1. | (same step)
3 | Migrate the VIP address to the other host (“ip addr del..” and “ip addr add..” commands). | Update the record of the Administration Server listen address to map to the IP of the other host. This change must be done in DNS or /etc/hosts, depending on what is used to resolve this name.
4 | Update the routing tables by using arping, for example. | n/a
5 | Edit the nodemanager.domains in HOST1 to remove ASERVER_HOME. | (same step)
6 | Edit the nodemanager.domains in HOST2 to add ASERVER_HOME. | (same step)
7 | Start the Node Manager on HOST1 and restart the Node Manager on HOST2. | (same step)
8 | n/a | Check that the change in the DNS is effective before starting the Administration Server.
9 | Start the Administration Server on HOST2. | (same step)
10 | Test that you can access the Administration Server on HOST2. | (same step)
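Step 8 of the method without a floating IP (checking that the DNS change is effective) can be scripted. The following is a minimal sketch, not part of the EDG procedure: the function names, timeout, and polling interval are assumptions, and the check only reflects what the local resolver of the machine running it returns (DNS or /etc/hosts).

```python
import socket
import time


def resolved_ips(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


def wait_for_dns(hostname, expected_ip, timeout_s=300, interval_s=10,
                 resolve=resolved_ips, sleep=time.sleep):
    """Poll until hostname resolves only to expected_ip, or the timeout expires.

    Returns True if the new record is visible from this host, False otherwise.
    """
    waited = 0
    while True:
        try:
            if resolve(hostname) == {expected_ip}:
                return True
        except socket.gaierror:
            pass  # Name not resolvable right now; keep polling.
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Simulated resolver: the old IP is returned twice before the change shows up.
    answers = iter([{"100.100.100.1"}, {"100.100.100.1"}, {"100.100.100.2"}])
    ok = wait_for_dns("adminvip.example.com", "100.100.100.2",
                      resolve=lambda name: next(answers), sleep=lambda s: None)
    print(ok)  # prints: True
```

Running the check on HOST2 before starting the Administration Server there confirms that HOST2 itself sees the new address. Other clients may still hold the old IP in their own caches.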

 

Advantages and Disadvantages of not using a Floating IP for Administration Server Failover

The following is a summary of the advantages and disadvantages of this Administration Server failover method compared to the traditional approach.

Advantages

  • You don’t need to reserve and manage an additional IP (the VIP) for the Administration Server. This simplifies the system’s network requirements and confines the networking tasks of a failover to the DNS.

Disadvantages

  • It can introduce complexity into the failover procedure: it requires a change in the DNS, which is normally performed by an IT team other than the WebLogic administrators. The real impact depends on each customer’s internal organization.
  • It can introduce delays in the failover procedure. A change in the DNS takes some time to take effect, depending on the DNS TTL value. If the TTL is not set appropriately, some components may require a restart to refresh their DNS cache and connect to the new IP.
  • Components that connect to the Administration Server’s IP directly instead of connecting to the virtual hostname (like a load balancer or SSH tunnels) require appropriate configuration or updates when the Administration Server fails over.
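To get a feel for the delay mentioned above, the sketch below estimates an upper bound on how long a client may keep using the old IP. It is a simplification under stated assumptions: a single caching resolver between the client and the authoritative DNS, plus the client’s own cache. The function name and the sample values are illustrative.

```python
def worst_case_stale_seconds(record_ttl_s, client_cache_ttl_s):
    """Upper bound on how long a client may keep using the old IP.

    Assumes one caching resolver (which may hold the old record for up to
    record_ttl_s) plus the client's own cache. Pass None as
    client_cache_ttl_s for a client that caches forever.
    """
    if client_cache_ttl_s is None:
        return None  # The client never refreshes without a restart.
    return record_ttl_s + client_cache_ttl_s


if __name__ == "__main__":
    print(worst_case_stale_seconds(120, 30))    # prints: 150
    print(worst_case_stale_seconds(3600, 30))   # prints: 3630
    print(worst_case_stale_seconds(120, None))  # prints: None
```

With more resolver layers in the path, the bound grows accordingly, so treat the result as an estimate rather than a guarantee.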

Conclusion and Recommendations

Administration Server failover without a floating IP is a valid failover method for the Administration Server. However, this method has some implications that don’t apply to the traditional floating IP approach. If you plan to use it, follow these recommendations:

  • Use this approach only if the changes in the DNS records are flexible in your organization.
  • Use a low TTL value (for example, 120 s) in the DNS entry of the virtual hostname used by the Administration Server so that the hosts pick up the change quickly. If the Administration Server failover is a planned activity, you can lower the TTL only for the failover: do it at least one full period of the previous TTL before updating the DNS entry, so that caches already hold the low value. Once the entry has been updated and the change has propagated, revert the TTL to a higher value to reduce the load that a low TTL places on the DNS.
  • Also use low TTL values in the DNS caches of the components that connect to the Administration Server:
    • Java DNS cache: in the WebLogic servers (or any Java process), customize the networkaddress.cache.ttl security property. This property is in the $JAVA_HOME/conf/security/java.security file ($JAVA_HOME/jre/lib/security/java.security in JDK 8). When a security manager is installed, the JVM caches successful lookups forever by default, so a restart is needed to refresh the DNS resolution cache unless you set a value. Without a security manager, the default is implementation-specific (30 seconds in the Oracle and OpenJDK implementations); setting the value explicitly avoids relying on that default.
    • OHS DNS cache: if the system uses OHS to connect to the Administration Server, customize the WLDNSRefreshInterval directive in mod_wl_ohs.conf (3). It is 0 by default, which means that the name is resolved only once, at startup: a restart of the OHS is needed to refresh the DNS resolution cache. Set it to a low nonzero value so that the resolution is refreshed periodically and requests are redirected to the new IP when the DNS entry changes.
      Note that there is a trade-off: reducing the TTL helps to consume the DNS updates more quickly, but it also may overload the DNS service with more frequent requests. Depending on the number of clients, this may cause disruptions to the DNS servers.
  • Load balancers normally use IPs to connect to the backends. If a load balancer routes directly to the Administration Server IP (without an OHS in the middle), add the IPs of all the hosts that can run the Administration Server to the backend definition. At any moment, only the IP where the Administration Server is currently listening accepts connections. Otherwise, you must update the backend definition after each Administration Server failover.
  • Don’t use a specific IP when defining the security network rules related to the Administration Server. Use a CIDR range that includes all the possible IPs that the Administration Server can use. Otherwise, the rules may not apply after a failover, and you will need to update them with the new IP. 
  • If you use SSH tunnels to connect to the Administration Server, point them to the IP of the host that currently runs it, and update them after a failover.
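The CIDR recommendation above can be verified with a few lines of Python. This is a minimal sketch using the standard ipaddress module; the function name, the CIDR ranges, and the host IPs are illustrative values taken from or modeled on the examples in this post.

```python
import ipaddress


def hosts_outside_rule(cidr, host_ips):
    """Return the candidate host IPs that the CIDR range does not cover."""
    network = ipaddress.ip_network(cidr)
    return [ip for ip in host_ips if ipaddress.ip_address(ip) not in network]


if __name__ == "__main__":
    candidates = ["100.100.100.1", "100.100.100.2"]
    # A range that covers both hosts: nothing is left out.
    print(hosts_outside_rule("100.100.100.0/29", candidates))  # prints: []
    # A rule tied to a single IP stops matching after a failover to host2.
    print(hosts_outside_rule("100.100.100.1/32", candidates))  # prints: ['100.100.100.2']
```

An empty list means that the rule keeps applying regardless of which candidate host runs the Administration Server.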

References

(1) Fusion Middleware Enterprise Deployment Guides:
SOA Enterprise Deployment Guide 12.2.1.4
SOA Enterprise Deployment Guide 14.1.2

(2) Administration Server Failover procedures in the Enterprise Deployment Guides:
SOA EDG 12.2.1.4 “Verifying Manual Failover of the Administration Server”  
SOA EDG 14.1.2    “Verifying Manual Failover of the Administration Server”  

(3) Oracle HTTP Server – Using Oracle WebLogic Server Proxy Plug-Ins – WLDNSRefreshInterval