Three New Features Improve Availability of Tuxedo Based Applications - by Todd Little, Oracle Tuxedo Chief Architect
By R A Sanyal on Apr 05, 2013
Tuxedo 12cR1 introduced several new features to help improve the availability of Tuxedo applications. While Tuxedo is known for providing extremely high reliability, availability, scalability, and performance (RASP), there are always things Oracle can do to improve the availability of an application. This post will cover three new features that help improve the availability of Tuxedo based applications.
Highly available systems try to avoid single points of failure to ensure the survivability of an application even in the midst of a failure. Tuxedo has provided means to avoid single points of failure in virtually all scenarios except one: when customers use data dependent routing (DDR). DDR allows an application to be partitioned based upon the values contained in a field of a request buffer. In the Tuxedo sample bankapp application, the ACCOUNT_ID field in a request message is used to determine which group of servers should handle the request. This is controlled by the *ROUTING section in the UBBCONFIG file. For each range of values, a server group can be specified to handle requests. The issue with regard to availability is that only a single server group can be specified per range in releases prior to Tuxedo 12cR1. While a server group can have multiple servers in it, such that the failure of a single server won't cause a problem, a server group can only reside on a single machine in a cluster. Thus if the machine that the server group is on fails, the partition of the application associated with that group of servers is unavailable for some period of time. Requests to the servers in that partition will fail until the machine is restarted or the server group is migrated to another machine.
Improved *ROUTING Section
With Tuxedo 12cR1 the *ROUTING section can now specify up to three server groups for each range of values. This allows an application partition to span up to three machines, so the partition remains available even if two of those machines fail completely. Besides improving the availability of a partition, it also increases the scalability of a partition, as the resources of up to three machines can now be used to process requests. The same improvement is included in the Tuxedo domain gateway, which can now specify up to three remote domains for each range of values in a field. When combined with multiple gateways, multiple domains, and multiple network links, applications can achieve very high levels of availability.
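The effect of allowing several server groups per range can be sketched in a few lines of Python. This is only a conceptual model, not Tuxedo's routing implementation or UBBCONFIG syntax: the RangeRule class, the route function, the account ranges, and the group names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

MAX_GROUPS_PER_RANGE = 3  # limit described in the post for Tuxedo 12cR1


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical model of one routing range and its candidate groups."""
    low: int
    high: int
    groups: tuple  # server group names, at most three


def route(account_id, rules, available_groups):
    """Return the candidate groups that can currently serve account_id."""
    for rule in rules:
        if len(rule.groups) > MAX_GROUPS_PER_RANGE:
            raise ValueError("at most three server groups per range")
        if rule.low <= account_id <= rule.high:
            # Keep only the groups whose machine is still up.
            return [g for g in rule.groups if g in available_groups]
    return []  # no range matched this value


# Illustrative ranges and group names (assumptions, not from the post).
rules = [
    RangeRule(10000, 59999, ("GROUP_A1", "GROUP_A2", "GROUP_A3")),
    RangeRule(60000, 109999, ("GROUP_B1", "GROUP_B2", "GROUP_B3")),
]

# Two of the three machines hosting the first partition have failed.
up = {"GROUP_A3", "GROUP_B1", "GROUP_B2", "GROUP_B3"}
print(route(10042, rules, up))
```

With a single group per range, the same failure would leave the list empty and every request in that partition would fail until migration completed.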
Automatic Migration of Machines and Server Groups
Another feature introduced in Tuxedo 12cR1 to increase the availability of Tuxedo applications is the automatic migration of machines and server groups. Since very early on, Tuxedo has had mechanisms to allow a machine to be migrated from one host to another, or for a server group to be migrated from one machine to another. This provides a recovery mechanism in the case of a machine or server group failure. Prior to Tuxedo 12cR1, migration required either manual intervention or the creation of scripts that could perform some level of automated migration.
While the failure of a machine or server group by itself doesn't typically affect the availability of a properly configured application, it may leave the application with one or more single points of failure. This can be mitigated by ensuring there are always at least three copies of servers or server groups such that if one fails, redundancy is still maintained. Even though it's not possible to define more than one BACKUP machine for the MASTER machine, and there is only one MASTER machine at any point in time, the failure of the MASTER machine doesn't necessarily impact application availability. This is one misconception many Tuxedo customers have about MP or clustered operations with Tuxedo. They see the MASTER machine as a single point of failure, but in fact normal application processing goes on even if the MASTER machine fails. This is because the DBBL process which runs on the MASTER machine isn't involved in normal request routing. All that happens if the MASTER fails or for some other reason the DBBL can't be reached is that configuration changes can't occur until the DBBL becomes available.
Under most failure scenarios, automatic migration migrates a machine to its backup, or a server group to its backup machine, without operator involvement. This eliminates the possibility of human error causing even more problems during a failure, and it minimizes the time needed to restore the system to normal operation, that is, it reduces the mean time to repair (MTTR). Reducing MTTR is one of the most effective ways of increasing overall system availability. Enabling these features is a simple matter of adding two new options to the *RESOURCES section of the UBBCONFIG file. For more details, see the Migrating Your Application [http://docs.oracle.com/cd/E35855_01/tuxedo/docs12c/ada/admigt.html] section of the Tuxedo 12cR1 documentation.
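To see why MTTR matters so much, consider the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below applies it with made-up numbers: the 1000-hour mean time between failures and the 30-minute and 1-minute repair times are assumptions for illustration, not measurements of Tuxedo.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


MTBF_HOURS = 1000.0  # assumed mean time between failures

# Assumed repair times: an operator-driven migration versus an automatic one.
manual = availability(MTBF_HOURS, 0.5)        # 30 minutes
automatic = availability(MTBF_HOURS, 1 / 60)  # 1 minute

print(f"manual migration:    {manual:.5%}")
print(f"automatic migration: {automatic:.5%}")
```

Under these assumptions the failure rate is unchanged, yet shortening the repair time alone moves the result noticeably closer to 100%.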
Service Versioning
The last availability related feature added in Tuxedo 12cR1 is service versioning. While that may not sound particularly related to high availability, it allows multiple versions of an application to be deployed concurrently. By running multiple versions of an application simultaneously, customers can gradually introduce new versions without having to shut down the application or impact existing users in any way.
Service versioning requires no changes to the application code, although presumably there are changes, probably even incompatible changes, which is why Oracle introduced service versioning. The only required changes are in the UBBCONFIG file. The APPVER option needs to be set in the *RESOURCES section, and then the REQUEST_VERSION, VERSION_RANGE, and VERSION_POLICY options added to the *RESOURCES section or to any server groups that need versioning support. REQUEST_VERSION indicates the version number that requests will carry. For native clients and servers it is the value specified in either the *RESOURCES section or the *GROUPS section, with the latter taking precedence. Subsequent calls in the call path will carry the request version associated with the server that made the request, unless VERSION_POLICY is set to PROPAGATE, which means the caller's request version should be used instead. VERSION_RANGE then indicates which request versions a server is able to process. When Tuxedo performs request routing, it determines the request version number and then selects only servers that support that version number. Thus when an incompatible change is made, you would associate a new request version with any updated callers of the service, and set the version range of servers appropriately to ensure that only updated servers handle those requests. This allows changes to be introduced gradually and lets the application developer decide which versions of a service interface any given server supports.
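The selection rule described above can be modeled roughly as follows. This is a conceptual sketch only, not Tuxedo code or UBBCONFIG syntax: the Server class, its field names, and the sample servers are hypothetical, and only the PROPAGATE policy value comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical model of a server's versioning settings."""
    name: str
    request_version: int  # version stamped on calls this server makes
    version_low: int      # lowest request version it accepts
    version_high: int     # highest request version it accepts
    policy: str = ""      # "PROPAGATE" forwards the caller's version


def eligible(servers, request_version):
    """Names of servers whose version range covers the request version."""
    return [
        s.name
        for s in servers
        if s.version_low <= request_version <= s.version_high
    ]


def outgoing_version(server, incoming_version):
    """Version carried by calls a server makes while handling a request."""
    if server.policy == "PROPAGATE":
        return incoming_version
    return server.request_version


old = Server("old_server", request_version=1, version_low=1, version_high=1)
new = Server("new_server", request_version=2, version_low=1, version_high=2)
relay = Server("relay", request_version=2, version_low=1, version_high=2,
               policy="PROPAGATE")

print(eligible([old, new], 1))     # callers still on version 1
print(eligible([old, new], 2))     # updated callers on version 2
print(outgoing_version(new, 1))    # uses its own request version
print(outgoing_version(relay, 1))  # forwards the caller's version
```

In this model, updated callers sending version 2 never reach the old server, while version 1 callers can be served by either one, which is the gradual rollout the post describes.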
These new features further enhance Tuxedo's capability to support highly available applications without requiring customers to build those capabilities into their application code. The result is that customers can deploy applications that provide 99.999% or better availability, while being able to scale those applications to hundreds of thousands of services executed per second.
Was this information helpful? Please share your comments and let us know if there are any Oracle Tuxedo topics you would like us to discuss.