Thursday Nov 05, 2015

Announcing the Tuxedo Advanced Performance Pack

Well now that the busyness of OpenWorld is behind us, it's time to spice things up here on the Tuxedo blog.  The Tuxedo team is pleased to announce the availability of the Tuxedo Advanced Performance Pack.  This add-on option for Tuxedo improves application performance, availability, and manageability.  I'll briefly describe the features and benefits of this new option, but for more details please see the newly published white paper.

The Tuxedo Advanced Performance Pack has three major categories of enhancements:

  • Core enhancements that all Tuxedo applications can leverage
  • Oracle Database related enhancements for those applications using Oracle Database
  • Oracle RAC using XA distributed transaction enhancements for customers using Tuxedo and RAC with XA transactions

Together these enhancements can improve application performance by a factor of 2x to 3x, without any changes to application code.  In fact, all of the features provided by the Tuxedo Advanced Performance Pack are enabled simply by making a small change to the Tuxedo configuration file. 

The core enhancements include:

  • Self-tuning Lock Mechanism to automatically and dynamically set the value of SPINCOUNT to optimize the access to Tuxedo semaphores such as the Bulletin Board lock. 
  • Shared Memory Interprocess Communication - Instead of using System V IPC message queues to communicate between processes, the Tuxedo Advanced Performance Pack provides a facility to use shared memory based messaging.  The result is a reduction of buffer copies normally needed with IPC queues to zero copy messaging, all in user mode.  For applications using large buffers, the performance improvement under heavy load should be dramatic.
  • Tightly Coupled Transaction Branches Spanning Domains - This feature allows Tuxedo to use the same GTRID for distributed transactions that cross domain boundaries, such as one Tuxedo domain calling another Tuxedo domain, or WTC calling a Tuxedo domain.  This allows locks and updates to be shared in the resource managers, preventing things like deadlocks that might otherwise have occurred.  In addition, when used with RAC, if the participating domains are connected to the same RAC database, RAC will respond with XA_RDONLY for all branches but the last, speeding up transaction commits.
  • Concurrent Global Transaction Table Lock - This feature moves the synchronization of access to the Tuxedo global transaction table (GTT) out from under the Bulletin Board semaphore to its own semaphore.  In addition, each transaction in the table has its own semaphore to allow concurrent updates to the entries.  This eliminates any lock contention on the Bulletin Board for applications making heavy use of XA transactions.
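
To make the idea behind the Concurrent Global Transaction Table Lock a bit more concrete, here is a minimal Java sketch.  It is not Tuxedo's implementation, and every class and method name in it is hypothetical.  A table-level lock guards only the lookup and creation of entries, while each entry carries its own lock, so updates to different transactions do not serialize on one global lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: hypothetical names, not Tuxedo's internal data structures.
public class TransactionTableSketch {
    // One entry per global transaction, each with its own lock.
    static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        int branchCount = 0;
    }

    // Table-level lock: guards the map itself, not the contents of the entries.
    private final ReentrantLock tableLock = new ReentrantLock();
    private final Map<String, Entry> entries = new HashMap<>();

    // Look up (or create) an entry while holding only the table lock.
    Entry entryFor(String gtrid) {
        tableLock.lock();
        try {
            return entries.computeIfAbsent(gtrid, k -> new Entry());
        } finally {
            tableLock.unlock();
        }
    }

    // Update one transaction while holding only that entry's lock.
    public int addBranch(String gtrid) {
        Entry e = entryFor(gtrid);
        e.lock.lock();
        try {
            return ++e.branchCount;
        } finally {
            e.lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TransactionTableSketch table = new TransactionTableSketch();
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) table.addBranch("gtrid-A"); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) table.addBranch("gtrid-B"); });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(table.entryFor("gtrid-A").branchCount);
        System.out.println(table.entryFor("gtrid-B").branchCount);
    }
}
```

The two threads in main only ever contend briefly on the table lock; the counting itself happens under separate per-entry locks.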

This next set of features benefits Tuxedo applications that use Oracle Database:

  • Instance Awareness - Although technically a feature of the Tuxedo Advanced Performance Pack, its real benefit is in what other features it allows.  Essentially this feature lets Tuxedo know the details of Oracle Database connections.  In other words, what database, service, and instance any given Tuxedo server is using.
  • FAN Support - The Tuxedo Advanced Performance Pack introduces a new Tuxedo server, TMFAN, that connects to Oracle Database ONS to receive FAN notifications.  With this information, Tuxedo can make intelligent decisions based upon the configuration and state of a RAC database.  This enables the following features:
    • Planned Maintenance - When TMFAN receives a notification of planned maintenance for a RAC instance, TMFAN directs the Tuxedo servers connected to that instance to switch their connections to another instance to provide uninterrupted availability.
    • Unplanned Outages - The TMFAN server will get an ONS notification that an instance has failed, and as a result will remove Tuxedo servers connected to that instance from service routing selection.  The affected Tuxedo servers will be told to switch their connection to another instance and resume processing.  In most scenarios this eliminates any outage seen by client applications.
    • RAC Load Balancing - FAN periodically notifies RAC clients about how their load should be distributed across RAC instances.  When TMFAN receives a load balancing advisory, it attempts to ensure that the Tuxedo load for that database adheres to the advisory by changing the routing of requests, and changing the number of connections to each instance.
    • End-to-End Application Tracing - One of the more difficult jobs for a database administrator is identifying the source of problems.  With this feature, Tuxedo will automatically tag Oracle Database sessions with the name of the Tuxedo server, service, and client information.  Thus a DBA can quickly locate Tuxedo applications, servers, services, or clients that are causing performance problems and work with the development team to remediate those problems.
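
As a rough illustration of what adhering to a load balancing advisory can mean for connection counts, the sketch below splits a fixed number of connections across instances in proportion to advisory percentages, using a largest-remainder rule for the leftovers.  This is not TMFAN's algorithm; the class name, method name, instance names, and percentages are all made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: not TMFAN's algorithm; names and numbers are hypothetical.
public class AdvisorySketch {
    // Split totalConnections across instances in proportion to the advisory percentages.
    public static Map<String, Integer> allocate(Map<String, Integer> percentByInstance,
                                                int totalConnections) {
        int percentSum = 0;
        for (int p : percentByInstance.values()) percentSum += p;
        if (percentSum <= 0) throw new IllegalArgumentException("advisory has no weight");

        Map<String, Integer> result = new LinkedHashMap<>();
        Map<String, Double> remainders = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Integer> e : percentByInstance.entrySet()) {
            double exact = (double) totalConnections * e.getValue() / percentSum;
            int whole = (int) Math.floor(exact);
            result.put(e.getKey(), whole);
            remainders.put(e.getKey(), exact - whole);
            assigned += whole;
        }
        // Hand the leftover connections to the largest fractional remainders.
        while (assigned < totalConnections) {
            String best = null;
            for (String k : remainders.keySet()) {
                if (best == null || remainders.get(k) > remainders.get(best)) best = k;
            }
            result.merge(best, 1, Integer::sum);
            remainders.put(best, -1.0); // each instance receives at most one extra
            assigned++;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> advisory = new LinkedHashMap<>();
        advisory.put("rac1", 60);
        advisory.put("rac2", 30);
        advisory.put("rac3", 10);
        System.out.println(allocate(advisory, 10));
    }
}
```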

Finally these features help applications that use Oracle RAC with Tuxedo distributed transactions:

  • Common XID - The tightly coupled transaction branch feature allows a GTRID to be shared across domains.  This feature extends that to allow the use of a single XID across Tuxedo servers and domains when requests end up using the same RAC instance.  The result is that in many cases, instead of having a number of separate but related GTRIDs, and potentially a number of different transaction branches for the same RAC database, all requests if possible use the same instance and the same XID.  Ultimately this could result in a single one phase commit instead of a protracted cross domain two phase commit.
  • Partial One Phase Read-Only Optimization for RAC - Distributed transactions that span RAC instances have separate transaction branches for each instance involved in an XA transaction.  When a transaction manager prepares these various branches, Oracle Database does an optimization where it reports XA_RDONLY for all branches but the last.  This feature leverages that capability by preparing all transaction branches but one.  If all other branches report read-only, then the final branch can be committed with a one-phase commit, bypassing the prepare step and the transaction log write.  This can substantially improve the performance of applications that primarily use RAC.
  • XA Transaction Affinity - This feature attempts to minimize the number of RAC instances involved in a distributed transaction.  During the routing of a request, Tuxedo will give preference to servers that are connected to a RAC instance that is already enlisted in the transaction.  The result is a smaller commit tree and the potential for a one phase commit if RAC is the only resource manager involved in the transaction.  As well, minimizing the number of RAC instances involved in the transaction minimizes the amount of cross instance traffic within RAC.
  • Single Group Multiple Branches - Normally Tuxedo associates a single transaction branch with a Tuxedo server group.  This presents difficulties if the server group is using a RAC based database service, as RAC does not allow a single transaction branch to span RAC instances.  This feature allows a Tuxedo server group to use multiple transaction branches if necessary.  The result is that Tuxedo servers using XA transactions with RAC can now use database services that are offered on multiple instances.  If the particular Tuxedo server a request is routed to is associated with a new instance for the transaction, the Tuxedo Advanced Performance Pack will create a new transaction branch for the newly enlisted instance.
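
The control flow of the partial one phase read-only optimization can be sketched in a few lines of Java.  This is only an illustration under simplifying assumptions: Branch and FakeBranch are hypothetical stand-ins for XA transaction branches, rollback votes and failures are ignored, and the transaction log write is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: Branch is a hypothetical stand-in for one XA transaction branch.
public class ReadOnlyOptimizationSketch {
    public enum Vote { OK, READ_ONLY }

    public interface Branch {
        Vote prepare();                 // phase one of two-phase commit
        void commit(boolean onePhase);  // phase two, or a one-phase commit
    }

    // Records the calls it receives so the sequence can be inspected.
    public static final class FakeBranch implements Branch {
        private final Vote vote;
        public final StringBuilder calls = new StringBuilder();
        public FakeBranch(Vote vote) { this.vote = vote; }
        public Vote prepare() { calls.append("prepare "); return vote; }
        public void commit(boolean onePhase) { calls.append(onePhase ? "commit-1pc" : "commit-2pc"); }
    }

    // Returns true when the transaction finished with a single one-phase commit.
    public static boolean commit(List<? extends Branch> branches) {
        if (branches.isEmpty()) throw new IllegalArgumentException("no branches");
        Branch last = branches.get(branches.size() - 1);
        List<Branch> toCommit = new ArrayList<>();
        // Prepare every branch except the last one.
        for (Branch b : branches.subList(0, branches.size() - 1)) {
            if (b.prepare() == Vote.OK) toCommit.add(b);
        }
        if (toCommit.isEmpty()) {
            // All other branches were read-only: no prepare and no log write for the last.
            last.commit(true);
            return true;
        }
        // Otherwise fall back to ordinary two-phase commit.
        if (last.prepare() == Vote.OK) toCommit.add(last);
        // (A real transaction manager would write its commit record to the log here.)
        for (Branch b : toCommit) b.commit(false);
        return false;
    }

    public static void main(String[] args) {
        FakeBranch a = new FakeBranch(Vote.READ_ONLY);
        FakeBranch b = new FakeBranch(Vote.READ_ONLY);
        FakeBranch c = new FakeBranch(Vote.OK);
        System.out.println(commit(Arrays.asList(a, b, c)) + " -> last branch saw: " + c.calls);
    }
}
```

In the main method the last branch is never prepared; it only sees a one-phase commit, which is the saving the feature is after.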

All of these features add up to huge improvements in application performance, application availability, and simplified management.  Tuxedo servers using XA can now fully utilize the features of Oracle RAC including non-singleton services to improve performance and availability.  The integration of FAN into Tuxedo means that Tuxedo can dynamically respond to changes in RAC load and configuration, without any operator intervention.   Together they provide an unbeatable combination of performance, availability, and manageability.

Tuesday Sep 08, 2015

User Group Meeting Invitation

North American Tuxedo User Group Meeting 

The Oracle Tuxedo product management team would like to invite you to this year's North American Tuxedo User Group Meeting being held in Chicago on Thursday October 8, 2015.  Your participation in this one day event will help shape the future of the Oracle Tuxedo product family in the direction our most important users, YOU, want it to go.  Not only that, we are confident that it will help you network with other Oracle Tuxedo users and learn from each other. 

The Oracle Tuxedo product family has had several major releases over the last few years, each release rich in new features and functionality. Tuxedo continues to be actively developed, with the Tuxedo 12.2.2 release planned for 2016.  We would like to provide you with product updates, share our roadmap, and enable you to share in the Tuxedo user community’s experience with the products.

Agenda Topics:

  • Introductions
  • Tuxedo Update from PM Team (recent releases, investments)
  • Customer Use Cases (hear from other customers)
  • Mainframe Modernization using Tuxedo/ART
  • Tuxedo Extreme Performance Pack (new Tuxedo option)
  • Tuxedo in Cloud (direction, use cases)
  • Product Demos (focused on recent updates)
  • Open Discussion (Q&A, requirements) 

Additional Information

Address: Two Pierce Place, 17th Floor, Itasca, IL 60143
Time: 10am – 4pm, lunch will be provided

RSVP: Please register by September 18th by sending an email to and include the following information: Full name, Company represented, Email address, and Mobile phone.  Registration and attendance are free of charge. 

Networking Dinner following the User Group meeting. Details to be provided prior to the event.

Monday Jul 13, 2015

Oracle Fusion Middleware Innovation Awards

If you have a cool application running on Tuxedo and think others would like to hear about it, submit a nomination for an Oracle Fusion Middleware Innovation Award.  These awards are given each year to customers that use FMW products in unique or interesting ways to increase business value.  If you have a Tuxedo based application that you believe is cutting edge, provides unique business value, or extreme performance, please submit a nomination.  Winners receive a free pass to Oracle Open World 2015, an Oracle Fusion Middleware Innovation Award trophy, priority consideration for placement in Profit magazine, Oracle Magazine, or other Oracle publications and press releases, and an Oracle Fusion Middleware Innovation logo for inclusion on your own website and/or press release.  All nominations must be received by 5:00 p.m. (PT), July 31, 2015.  If you need help in preparing your nomination, let me know.

Monday Jul 06, 2015

Tuxedo Sample Applications

Wow, it's been way too long since my last post.  I'll try to post more often.

I wanted to point out a little side project I've been working on that I thought others might be interested in.  It's essentially a github project to create samples and scripts to help get people started with Tuxedo.  None of it is meant for prime time, and it's all open source, meaning other contributors are welcome.  So far I have a few things working well.

Tuxedo and Docker

One of the subprojects within the github repository is a set of files to help with using Tuxedo in Docker.  If you're not familiar with Docker, I highly recommend checking out their website.  It is basically a set of tooling around Linux Containers that makes it easy to create a container on your machine.  There is a Dockerfile and associated scripts that allow one to download the Tuxedo installer and rolling patch kit, and create a Docker image with Tuxedo installed, patched, and ready to go.  The installation and patching of Tuxedo is all automated.  When you're done, you'll have a working Docker image.  To test it, there is a fully automated runme script for the Tuxedo simpapp application in the installers directory of the project.  Copy that into your Docker image and execute it.  It should copy, build, and run the simpapp application.  Check out the for more information.

Tuxedo and Vagrant

Vagrant is another approach to building test and development environments.  Instead of focusing on Linux Containers, Vagrant focuses on creating Virtual Machines, specifically VirtualBox virtual machines.  Although other runtime environments (providers) are supported by Vagrant, VirtualBox is supported out of the box, so to speak.  Inside the Vagrant subproject, there are files and scripts that will create a VirtualBox VM that has Tuxedo installed, patched, and ready to go.  Check out the for more information.

Tuxedo and RAC

For those wanting to play with Tuxedo and RAC, there is another subproject called RAC that you can use to create multiple VMs that support RAC and Tuxedo.  This project builds upon another github project that helps you create a RAC cluster from 1 to N nodes and as well creates application VMs that can be used to run other things such as Tuxedo.  This project contains the files and instructions to create a RAC cluster and multiple Tuxedo machines that have the Oracle Database Instant Client installed.  Like the other projects, the actual installers for the Oracle projects must be first downloaded from OTN.  Once that is done and a simple edit made to the Vagrantfile to define the number of each type of machine and their memory configurations, two commands get you a working set of interconnected VirtualBox VMs that are running RAC and Tuxedo.  A very quick way to get started with Tuxedo and RAC.   Again, please see the file for more information.

As these are open source projects, please feel free to contribute.  The easiest way to do that is create a github account and request to join the TuxedoUsers github organization.

Friday Oct 10, 2014

Tomcat and Tuxedo - Perfect together

Tuxedo provides numerous integration options to work with existing applications and other technologies.  For example, the most recent release of SALT provides RESTful Web services for your Tuxedo services without writing or changing a line of code.  At Oracle Open World last week we demonstrated how you can leverage the Spring Framework and Tuxedo transaction management capabilities inside the Tuxedo Java server.  In this post I'll show how one can run Tomcat inside the Tuxedo Java Server and utilize Tuxedo services directly from Tomcat.  Although the example is trivial, simply calling the Tuxedo TOUPPER service from the simpapp sample application, it illustrates the minimal amount of effort necessary to run Tomcat inside the Tuxedo Java server and call a Tuxedo service.

To use the Tuxedo Java server, the server needs to be configured and added to the Tuxedo UBBCONFIG file.  A new server entry needs to be added similar to:

         CLOPT="-- -c TJSconfig.xml"

For this example, the minimum and maximum dispatch threads could be set to 1 as we're not hosting any Tuxedo services in this instance of the Java Server.  These lines could be added directly to the UBBCONFIG file for the simpapp sample application.

Next a configuration file is needed for the Tuxedo Java server to set things like classes to load, classpaths, and the like.  Here is a copy of TJSconfig.xml as used by the above server definition:

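<?xml version="1.0" encoding="UTF-8"?>
<TJSconfig version="2.0">
    <!-- Classpath entries for the Tomcat jar files and the elements
         enclosing server-class are not shown in this listing -->
        <server-class name="MyTuxedoJavaServer"></server-class>
</TJSconfig>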

This adds the Tomcat jar files to the classpath and adds a class called MyTuxedoJavaServer to the configuration.

Let's look at the source for MyTuxedoJavaServer and see how this class allows us to start up Tomcat.

import com.oracle.tuxedo.tjatmi.*;

public class MyTuxedoJavaServer extends TuxedoJavaServer {
    private TomCatThread tc = null;
    public MyTuxedoJavaServer() {
        // Create the thread that will be used to bootstrap Tomcat
        tc = new TomCatThread();
    }
    public int tpsvrinit() throws TuxException {
        // Startup Tomcat
        tc.start();
        return 0;
    }
    public void tpsvrdone() {
        // Shutdown Tomcat
        tc.shutdown();
    }
}

The constructor for the class creates an instance of the thread that will be used to bootstrap Tomcat. The only other two methods are the standard startup and shutdown methods called by the Tuxedo Java server at server startup and shutdown.  These methods simply start and shutdown the Tomcat thread.

Next let's look at the TomCatThread class that is used to bootstrap Tomcat.  It looks like: 

import org.apache.catalina.startup.*;

public class TomCatThread extends Thread {
    private Bootstrap bs = null;

    public void run() {
        try {
            bs = new Bootstrap();
            // Set the home directory for Tomcat (installation-specific, not shown)
            bs.start();
            System.out.println("After starting Tomcat Thread.");
        } catch (Exception e) {
            System.out.println("Received exception: " + e);
        }
    }

    public void shutdown() {
        try {
            bs.stop();
        } catch (Exception e) {
            System.out.println("Received exception while shutting down Tomcat thread: " + e);
        }
    }
}

This class simply extends the Thread class and in its run method creates an instance of the Tomcat bootstrap, sets the home directory for Tomcat, and starts Tomcat running.  The shutdown method just stops the Tomcat embedded server.

Next we need a helper class to help keep track of the Tuxedo context that will be associated with each Tomcat thread.  To make a Tuxedo service call, or most other Tuxedo calls, a thread needs to be associated with an application context in Tuxedo.  This is done using the tpappthrinit() method that creates a new Tuxedo context and associates it with the thread.  We'll use thread local storage to store the created context to be used whenever the thread needs to make a Tuxedo call.  I've used a class called TuxedoThreads to maintain this association:

package todd;

import com.oracle.tuxedo.tjatmi.*;

public class TuxedoThreads {
    public static final ThreadLocal<TuxAppContext> userThreadLocal = new ThreadLocal<TuxAppContext>();
    public static void set(TuxAppContext tc) {
        userThreadLocal.set(tc);
    }
    public static TuxAppContext get() throws TuxATMITPException {
        TuxAppContext tc = userThreadLocal.get();
        if (null == tc) {
            // No Tuxedo context gotten for this thread yet, so get one
            tc = TuxAppContextUtil.getTuxAppContext();
            // Associate the thread with a new context (the null argument is an assumption)
            tc.tpappthrinit(null);
            userThreadLocal.set(tc);
        }
        return userThreadLocal.get();
    }
}

To access the thread's Tuxedo context, we'll simply call the get() class method and if the thread hasn't already created a Tuxedo application context, it will create one and store it in thread local storage.  Otherwise it just returns the previously created Tuxedo context.  One thing to keep in mind is that each thread Tomcat uses will eventually end up with a Tuxedo context.  This means that you need to allow for this in the MAXACCESSORS setting in the UBBCONFIG file.
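
If the thread local pattern is unfamiliar, here is a small self-contained sketch of the same lazy, per-thread initialization with no Tuxedo dependency.  The Context class is a hypothetical stand-in for a Tuxedo application context, and the counter exists only to show that each thread ends up with exactly one context.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: Context is a hypothetical stand-in for a Tuxedo application context.
public class PerThreadContextSketch {
    static final AtomicInteger created = new AtomicInteger();

    public static final class Context {
        final int id = created.incrementAndGet();
    }

    private static final ThreadLocal<Context> current = new ThreadLocal<>();

    // Return this thread's context, creating it on first use.
    public static Context get() {
        Context c = current.get();
        if (c == null) {
            c = new Context();
            current.set(c);
        }
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            // The second call reuses the context created by the first.
            workers[i] = new Thread(() -> { get(); get(); });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("contexts created: " + created.get());
    }
}
```

Four worker threads produce four contexts no matter how many calls each thread makes, which is why the number of Tomcat threads matters when sizing MAXACCESSORS.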

All that's left now is to either use the TuxedoThreads class in a JSP or servlet file or use it in another Java class that they call.  I'll opt for the latter as it makes the JSP easier to read.  So here is the MyTest Java class that makes the Tuxedo service call to the TOUPPER service:

package todd;

import com.oracle.tuxedo.tjatmi.*;
import weblogic.wtc.jatmi.TypedString;

public class MyTest {
    public String toUpper(String inStr) throws TuxATMITPReplyException,
            TuxATMITPException {
        // Get the thread's Tuxedo context
        TuxAppContext tc = TuxedoThreads.get();
        // Create the Tuxedo request buffer and call the Tuxedo service
        TuxATMIReply rply = null;
        TypedString req = new TypedString(inStr);
        TypedString rplyData = null;
        rply = tc.tpcall("TOUPPER", req, 0);
        rplyData = (TypedString) rply.getReplyBuffer();
        return rplyData.toString();
    }
}

The class has a single method that takes a Java String and it creates a Tuxedo TypedString buffer and calls the Tuxedo TOUPPER service.  It then gets the returned buffer and returns that to the caller.  Here is a trivial JSP using this class to uppercase a string: 

<%@ page import="todd.*"%>
<%@ page language="java" contentType="text/html; charset=UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Test of Tuxedo service call</title>
</head>
<body>
    <% MyTest foo = new MyTest(); %>
    Hello World uppercased is <%= foo.toUpper("Hello World") %>
</body>
</html>

The JSP simply creates an instance of the MyTest class and uses the toUpper method to get the upper case version of Hello World.

Wednesday Oct 01, 2014

Why is old bad?

I keep coming across situations where someone tells me that Tuxedo is old like that is a bad thing.  I'm old, well, maybe older, and it seems fine to me.  I like to think I'm like fine wines and single malt scotches that get better with age.  So why is old necessarily a bad thing?

I believe it comes from the notion that old technology has been replaced by something newer.  In a sense this is certainly true.  I don't see a lot of 8 track players these days because they were replaced by cassette tapes, which were replaced by CDs, which were replaced by digital audio players.  Yet there are purists who contend vinyl records still offer the best sound quality, although my ears certainly can't confirm that.

Most people would argue that Unix is modern technology, yet Unix was first developed in 1969, making it one of the oldest technologies currently in use.  So why is technology that is 45 years old considered modern?   I believe it is because Unix has kept up with the times.  It has adapted to changes in hardware, operating system design, programming styles and the like.  Tuxedo was first developed in 1982 and likewise has kept up with changes in hardware, application server design, programming languages, and communication techniques.  So where is the difference?

For example, in the 1990s messaging, CORBA, and multi-threaded programming were all the rage.  Tuxedo adopted messaging, CORBA, and multi-threading.  In the 2000s SOAP based web services, the Service Component Architecture, and dynamic programming languages such as Python, Ruby, and PHP became popular and Tuxedo adapted to support those.  More recently Oracle introduced Engineered Systems where the hardware and software were designed to work together to optimize performance.  Tuxedo has been updated to leverage these engineered systems to provide enhancements that can dramatically improve application performance.  Recent releases of Tuxedo now offer support for developing applications in Java, support for RESTful web services, and tight integration with Oracle Real Application Clusters.

The above may make it sound like Tuxedo lagged behind the technology curve, yet nothing could be farther from the truth.  The Open Group's standard for distributed transaction support known as XA primarily came from Tuxedo.  Service Oriented Architecture or SOA has also been the rage over the last decade, yet Tuxedo has been SOA based since its inception.  Everything in Tuxedo is a service.  This has allowed Tuxedo and Tuxedo applications to adopt newer technologies often with no application changes required.  A service written 20 years ago in C can transparently be used as a SOAP or REST web service with absolutely no changes to the service implementation.  That same service can also be called from Python and appears to the Python developer as just another method.  Want to re-implement that C service in Java without impacting any existing usage of that service?  No problem.

Another word for old that has mixed connotations is mature.  Like old, it can be a bad thing if it has remained unchanged.  However in the world of enterprise software maturity is usually a good thing as it brings with it stability.  Who puts a version 1.0 product into production?  I believe there are few products on the market that have as low a defect rate as Tuxedo has.  This is largely because the code has been tested extensively by Oracle and used by thousands of customers over the last 32 years.  One customer stated in an Oracle Open World presentation a few years ago that in 15 years of using Tuxedo, they never had a production outage.  That's stability.

So I'm left wondering why is old bad?

Friday Sep 19, 2014

Tuxedo at Oracle Open World 2014

The Tuxedo team will be present in force at Oracle Open World 2014.  We have 7 sessions directly related to Tuxedo presented by engineers, customers,  architects, and product management.  Please add them to your schedule builder so you don't miss them.  They are:

Monday 2:45-3:30PM Moscone South 304
Oracle Tuxedo 12.1.3: What’s New and Why You Should Care [CON8261]

Come hear Frank Xiong - VP for the Tuxedo product family, describe what we've been working on.  He'll be joined by Deepak Goel - Tuxedo Product Manager, and Anastasio Garcia - Middleware, SOA & Delivery Manager from Telefónica España.

Monday 4:15-5:15PM Hotel Nikko - Nikko Ballroom III
Simplify and Optimize Oracle Tuxedo Application Deployments and Management [HOL9448]

Jared Li - Tuxedo development manager, and Chris Guo - Principal Member of Technical Staff will conduct a hands-on lab using Oracle TSAM Plus, Oracle Enterprise Manager, and Oracle Business Transaction Management to monitor, manage, and administer Tuxedo applications. 

Monday 5:45-6:45PM Hotel Nikko - Nikko Ballroom III
Use Java, the Spring Framework, and Oracle Tuxedo to Extend Existing C/C++/COBOL Apps [HOL9447]

A cast of thousands (Maggie Li, Chris Guo, Maurice Gamanho, and myself) will conduct a hands-on lab demonstrating how easy it is to use Java and Spring to develop and extend Tuxedo applications. 

Tuesday 10:45-11:30AM Moscone South 200
Oracle Tuxedo Makes It Easy to Develop Composite Apps with Java, C, C++, and COBOL [CON8228]

Jared Li and Deepak Goel will present ways customers can develop composite applications that leverage the performance of C/C++/COBOL while still allowing the rapid creation of business logic in Java, all in a single environment.

Tuesday 7:00-7:45PM Moscone South 301
Oracle Tuxedo Monitoring and Management: Birds-of-a-Feather Meeting [BOF9641]

Come to a Birds of a Feather session where Deepak Goel, Mark Rakhmilevich, and I will present some best practices for monitoring and management, followed by an open discussion about how to best manage, monitor, and administer Tuxedo applications.

Wednesday 3:30-4:15PM Moscone South 200
The ART and Practice of Mainframe Migration and Modernization [CON8229]

Mark Rakhmilevich - Sr. Director Product Management/Strategy, Rui Pereira - Principal Sales Consultant, and Jeffrey Dolberg - Senior Principal Product Manager, will describe how customers are migrating their mainframe CICS, IMS and batch applications from costly IBM mainframe environments to open systems with Tuxedo and Tuxedo ART.

Thursday 9:30-10:15AM Marriott Marquis - Salon 14/15
Management and Monitoring of Oracle Tuxedo: Integrated, Automated [CON8273]

I will be presenting on the new features in TSAM Plus 12c that can be used to efficiently manage, monitor, and administer Tuxedo applications.  I will cover the recent integration of the Tuxedo observer for BTM, and how to decide on the right tool and strategy to address common performance and availability issues. 

And as always please stop by the Tuxedo booth in the Oracle DEMOgrounds in Moscone South.  We love to see customers and answer their questions.  This is your chance to meet product developers, product managers, engineering managers, and architects all focused on Tuxedo!

Hope to see you there!!

Todd Little
Oracle Tuxedo Chief Architect 

Saturday Mar 15, 2014

High Availability Part 6

This concludes for now my posts on improving the MTBF, just one of the two major factors in improving the availability of a system.  In my next post I’ll start to cover ways to decrease the MTTR, the other factor in determining availability.

Thursday Jan 23, 2014

High Availability Part 5

Tuxedo provides an extremely high availability platform for deploying the world's most critical applications.  This post covers some of the ways Tuxedo supports redundancy to improve overall application availability.  Upcoming posts will explain additional ways that Tuxedo ensures high availability as well as ways to improve the availability of a Tuxedo application.

Wednesday Jan 15, 2014

High Availability Part 4

To Err is Human; To Survive is High Availability

In this post I’d like to look at the various causes of unavailability or outages. The most obvious, although often overlooked, is scheduled system maintenance. Whether that is included in your measurement of availability depends upon the stakeholders for a system or application. The ideal systems have no scheduled maintenance that causes the system to be unavailable. That isn’t to say they don’t receive maintenance, but that the maintenance doesn’t cause the system to be unavailable. This can be done via rolling upgrades, site switchovers, etc. For now it suffices to say that this type of down time is intentional, known, and typically scheduled.

The interesting part comes in looking at other causes of unavailability, in particular those caused by failures. The most commonly thought of failure is that of a hardware failure such as a disk drive failing, or a server failing. These failures tend to be obvious and easily remedied. Most people then guess that software failures make up the next significant portion of failures. But as is all too often the case, the most common failures in highly available systems are those caused by people. Estimates place hardware failures at around 10% of the causes of an outage. This low percentage is largely due to the ever improving MTBF of hardware. Software is estimated to cause about 20% of outages for highly available systems. The remaining 70% of outages are attributable to human action, and increasingly these actions are intentional, i.e., purposeful interruptions of service for malicious intent such as denial of service attacks.

To give an example, a study was done on replacing a failed hard drive in a software RAID configuration. It is a seemingly simple task, yet the wrong drive was replaced in a surprising number of cases the first few times an engineer was asked to repair the systems. This indicates that putting procedures in place to repair a system isn’t adequate, and that actually performing the procedures several times is needed to eliminate human error. But more importantly it points out the need to eliminate human intervention as much as possible, as any human intervention, either for normal operation or for remediating a failure, has a significant possibility of being done incorrectly. That incorrect intervention can be catastrophic: replacing the wrong drive in the above study caused a complete loss of data in some instances.

So what is the takeaway from this information? Minimize or eliminate human intervention as much as possible in order to minimize outages attributable to human error. Typically this means automating as much as possible any necessary steps to resume normal operation after a failure or even during normal operation. Every manual step taken by an administrator has some probability of causing an outage. It also suggests that repair procedures be well tested, preferably in a test environment that duplicates the production environment.

More on how Tuxedo can help solve these problems in my next entry.

Saturday Jan 11, 2014

High Availability Part 3

In my previous posts on High Availability I looked at the definition of availability and ways to increase the availability of a system using redundant components.  In this post I'll look at another way to increase the availability of a system.  Let’s go back to the calculation of availability:

Availability = MTBF / (MTBF + MTTR)

Based upon this formula, we can see that if we can decrease the MTTR, we can increase the overall availability of the system. For a computer system, let’s look at what makes up the time to repair the system. It includes some time that may not be obvious, but in fact is extremely important. The timeline for a typical computer system failure might look like:

  1. Normal operation
  2. Failure or outage occurs
  3. Failure or outage detected
  4. Action taken to remediate the failure or outage
  5. System placed back into normal operation
  6. Normal operation

Most people only consider item (4) above, the time taken to remediate the outage. That might be something like replacing a failed hard drive or network controller. It could even be as simple as reconnecting an accidentally disconnected network cable, a 30 second repair. But the MTTR isn't 30 seconds. It’s the total time spent in (3), (4), and (5) above. For the network cable example, the time taken in (3) will depend upon network timers at multiple levels and could be many minutes if relying only on the operating system network stack. The time taken for (4) may be as low as the 30 seconds needed to reconnect the cable, although finding the cable might take a bit longer than that. The time for (5) again depends upon the service resumption steps, such as re-acquiring a DHCP address, reconnecting applications or servers, etc. So while on the surface the MTTR may be assumed to be 30 seconds, the actual time could be many minutes, especially in the extreme case where systems, servers, applications, etc., need to be restarted or rebooted manually to recover.

So how does this impact system design for highly available systems? It indicates that whatever can be done to shorten items (3), (4), and (5) above will improve overall system availability. The more of these steps that can be automated, the lower the MTTR one can achieve, and the higher the availability of the system. Too often the detection phase (3) is left up to someone calling a help desk to say they can’t access or use the system. Likewise, items (4) and (5) often require manual intervention or steps. When one wants to achieve 99.99% availability, relying on manual repair or remediation is going to make that very difficult to achieve.
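To make the effect concrete, here is a small Python sketch of the arithmetic. The phase durations and the 1000 hour MTBF are hypothetical numbers chosen purely for illustration, not measurements, and the function names are my own. The only point is that MTTR is the sum of phases (3), (4), and (5), so shrinking any of them raises availability.

```python
def mttr_hours(detect_min, remediate_min, restore_min):
    """MTTR is the sum of detection (3), remediation (4) and restoration (5)."""
    return (detect_min + remediate_min + restore_min) / 60.0


def availability(mtbf_hours, mttr):
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr)


MTBF_HOURS = 1000.0  # assumed value, for illustration only

# Hypothetical unplugged-cable outage handled by hand: slow detection,
# a 30 second fix, then manual restarts.
manual = mttr_hours(detect_min=10, remediate_min=0.5, restore_min=20)

# The same outage with automated detection and recovery (also hypothetical).
automated = mttr_hours(detect_min=0.5, remediate_min=0.5, restore_min=1)

for label, mttr in (("manual", manual), ("automated", automated)):
    print(f"{label:9s} MTTR = {mttr * 60:5.1f} min, "
          f"availability = {availability(MTBF_HOURS, mttr):.4%}")
```

Note that the 30 second repair itself is identical in both cases; all of the difference comes from detection and restoration.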

More on the causes of failures in my next post.

Monday Jan 06, 2014

High Availability Part 2

To compute the availability of a system, you need to examine the availability of the components that make up the system. To combine the availability of the components, you need to determine whether a component’s failure prevents the system from being usable, or whether the system can still be available regardless of the failure. Now that sounds strange until you consider redundancy. In a non-redundant subsystem, if it fails, the system is unavailable. So in a completely non-redundant system, the availability of the system is simply the product of each component’s availability:

     Availability = A1 × A2 × ... × An

A very simplified view of this might be:

     Client => LAN => Server => Disk

If we take the client out of the picture, as it really isn't part of the system, we need at least a network, a server, and a disk drive to be available in order for the system to be available. Let’s say each has an availability of 99.9%; then the system availability would be:

     0.999 × 0.999 × 0.999 = 0.997

or 99.7% available. That’s roughly equivalent to a day’s worth of outage a year. So although each subsystem is only unavailable about 9 hours a year, the three combined end up being unavailable for over a day. As the number of required subsystems or components grows, the availability of the overall system decreases. To alleviate this, one can use redundancy to help mask failures. With redundant components, where only one of them needs to be working, the availability is determined by the formula:

     Availability = 1 - (1 - A1) × (1 - A2) × ... × (1 - An)

Let’s look at just the server component. If instead of a single server with 99.9% availability, we have two servers each with 99.9% availability, but only one of them is needed for the system to be available, then the availability of the server component increases from 99.9% to 1 - (0.001 × 0.001) = 99.9999%, or 6 nines of availability, just by adding an additional server (assuming the two servers fail independently). As you can see, redundancy can dramatically increase the availability of a system. If we also have redundant LAN and disk subsystems in the example above, instead of 99.7% availability, we get 99.9997% availability, or about a minute and a half of down time a year instead of over a day of down time.
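For readers who want to check the arithmetic, here is a minimal Python sketch of the two combination rules. It assumes component failures are independent and uses the standard series and parallel formulas; the function names are mine, not part of any product.

```python
from math import prod


def series(*availabilities):
    """All components are required: multiply their availabilities."""
    return prod(availabilities)


def parallel(*availabilities):
    """Only one component is required: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def downtime_minutes_per_year(availability):
    """Expected minutes of outage in a 365 day year."""
    return (1.0 - availability) * 365 * 24 * 60


A = 0.999  # each component is 99.9% available, as in the example

single = series(A, A, A)                 # LAN, server, disk, no redundancy
redundant_server = parallel(A, A)        # two servers, only one needed
all_redundant = series(parallel(A, A), parallel(A, A), parallel(A, A))

for label, value in (("single path", single),
                     ("server pair", redundant_server),
                     ("all redundant", all_redundant)):
    print(f"{label:14s} {value:.6%}  "
          f"{downtime_minutes_per_year(value):8.1f} min/year")
```

Swapping in different values for `A`, or mixing redundant and non-redundant stages inside `series`, shows quickly which subsystem dominates the overall down time.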

OK, so what does all of this have to do with creating highly available systems? Everything! What it tells us is that, all things being equal, simpler systems have higher availability. In other words, the fewer required components you have, the more available your system will be. And it tells us that to improve availability we can either purchase components with higher availability, or we can add some redundancy into the system. Buying more reliable or available components is certainly an option, although generally that is a fairly costly option. Mainframe computers are an example of this option. They generally provide better availability than blade servers, but do so at a very high premium. Using redundant commodity components is typically much cheaper and can even yield better overall availability.

More on high availability in my next post. 

Thursday Jan 02, 2014

High Availability

As companies become more and more dependent upon their information systems just to be able to function, the availability of those systems becomes more and more important. Outages can cost millions of dollars an hour in lost revenue, let alone the potential damage done to a company’s image. To add to the problem, a number of natural disasters have shown that even the best data center designs can’t handle tsunamis and tidal waves, causing many companies to implement or re-evaluate their disaster recovery plans and systems. Practically every customer I talk to asks about disaster recovery (DR) and how to configure their systems to maximize availability and support DR. This series of articles will contain some of the information I share with these customers.

The first thing to do is define availability and how it is measured. The definition I prefer is that availability represents the percentage of time a system is able to correctly process requests within an acceptable time period during its normal operating period. I like this definition as it allows for times when a system isn’t expected to be available, such as during evening hours or a maintenance window. That being said, more and more systems are expected to be available 24x7, especially as more and more businesses operate globally and there are no common evening hours.

Measuring availability is pretty easy. Simply put, it is the ratio of the time a system is available to the time the system should be available. I know, not rocket science. While it’s good to measure availability, it’s usually better to be able to predict availability for a given system to determine if it will meet a company’s availability requirements. To predict availability for a system, one needs to know a few things, or at least have good guesses for them. The first is the mean time between failures or MTBF. For single components like a disk drive, these numbers are pretty well known. For a large computer system the computation gets much more difficult. More on MTBF of complex systems later. The next thing one needs to know is the mean time to repair or MTTR, which is simply how long it takes to put the system back into working order.

Obviously, the higher the MTBF of a system and the lower its MTTR, the higher its availability. In mathematical terms the system availability in percent is:

     Availability = MTBF / (MTBF + MTTR) × 100

So if the MTBF is 1000 hours and the MTTR is 1 hour, then the availability would be 99.9%, often called 3 nines. To give you an idea of how much down time in a year equates to various numbers of nines, here is a table showing various levels or classes of availability:


Total Down Time per Year    Class or # of 9s    Typical application or type of system
~36 days                    1
~4 days                     2
~9 hours                    3                   Commodity Servers
~1 hour                     4                   Clustered Systems
~5 minutes                  5                   Telephone Carrier Servers
~1/2 minute                 6                   Telephone Switches
~3 seconds                  7                   In-flight Aircraft Computers

As you can see, the amount of allowed downtime gets very small as the class of availability goes up. Note though that these times are assuming the system must be available 24x365, which isn’t always the case.
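The down time figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming a 365 day year and 24x365 operation as noted above; the function names are my own.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def availability_percent(mtbf_hours, mttr_hours):
    """Predicted availability in percent from MTBF and MTTR."""
    return 100.0 * mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_seconds_per_year(nines):
    """Allowed down time per year for a class with the given number of 9s."""
    unavailability = 10.0 ** -nines   # e.g. 3 nines -> 0.001
    return unavailability * SECONDS_PER_YEAR


print(f"MTBF 1000 h, MTTR 1 h: {availability_percent(1000, 1):.2f}%")
for nines in range(1, 8):
    seconds = downtime_seconds_per_year(nines)
    print(f"{nines} nines: {seconds / 3600:10.4f} hours/year "
          f"({seconds:12.1f} seconds)")
```

For a system that only needs to be up during business hours, replacing `SECONDS_PER_YEAR` with the scheduled operating time gives the corresponding allowance.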

More about high availability in my next entry. 

Thursday Sep 05, 2013

Tuxedo vs MQ Series or other MOMs - No comparison

Tuxedo is the only multi-language enterprise application server that can scale to hundreds of thousands of requests per second, deliver end-to-end service response times under 30 microseconds, and regularly provide 99.999% availability or better. Whether you want to build extreme performance applications in C or C++, migrate existing CICS or IMS COBOL applications to distributed systems, or build applications using a variety of implementation languages, each selected for its suitability to the programming problem at hand, Tuxedo offers by far the best option to do this in the least amount of time.

Friday Aug 02, 2013

Integrating Tuxedo Global Transactions across Web Services

Global Transactions

A global transaction is a series of service calls where the services involved write to a resource (typically updating or creating a record in a database), and where either all of the updates or creations complete or none of them do, so that no inconsistency exists.

For example, imagine performing a balance transfer from one account to another, and that the information pertaining to those accounts is stored in two different databases. The succession of service calls would be as follows:

  • withdraw amount from database 1,

  • deposit amount to database 2,

  • commit (withdrawal and deposit become effective and are reflected in future balance displays).

Applications running on Oracle Tuxedo, combined with a database resource such as Oracle Database, can guarantee the properties known in computer science as Atomicity, Consistency, Isolation and Durability (the ACID properties).
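The all-or-nothing behavior of the balance transfer can be sketched in a few lines of Python. This is a toy two-phase commit, not Tuxedo's API: the `Account` class and `transfer` function are hypothetical stand-ins for the two databases and the transaction coordinator, and the sketch ignores durability, isolation, and failure recovery.

```python
class Account:
    """Toy resource manager holding one balance (stands in for a database)."""

    def __init__(self, balance):
        self.balance = balance
        self._pending = None

    def prepare(self, delta):
        # Phase 1: vote. Refuse if the change would overdraw the account.
        if self.balance + delta < 0:
            return False
        self._pending = delta
        return True

    def commit(self):
        # Phase 2: make the prepared change effective.
        if self._pending is not None:
            self.balance += self._pending
            self._pending = None

    def rollback(self):
        # Discard a prepared change without applying it.
        self._pending = None


def transfer(source, target, amount):
    """Move amount from source to target; both change or neither does."""
    work = [(source, -amount), (target, amount)]
    prepared = []
    for account, delta in work:
        if account.prepare(delta):
            prepared.append(account)
        else:
            # One participant voted no: undo everyone who already prepared.
            for done in prepared:
                done.rollback()
            return False
    for account in prepared:
        account.commit()
    return True


checking = Account(100)
savings = Account(50)
print(transfer(checking, savings, 30), checking.balance, savings.balance)
print(transfer(checking, savings, 500), checking.balance, savings.balance)
```

The first transfer succeeds and the second is refused, leaving both balances untouched. In a real deployment the prepare and commit votes come from the resource managers themselves and the coordinator logs its decision, which is exactly the work Tuxedo and the database take off the application's hands.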

Web Services

In a more and more connected world, Web Services and SOAP standards have been developed to address the need to exchange information regardless of the system on which it is available. A Web Service is a “public” interface to a business operation that is exposed in a standardized way.

Other standards are developed as needs arise, such as WS-Addressing, WS-ReliableMessaging or WS-Security, and software vendors implement those in order to provide more features.

Such features are usually advertised in service interfaces so that provider and consumer can agree on levels of functionality and automatically adjust interactions. For instance, a service provider may offer a secure version of its services but still allow non-secure consumers to see and use a scaled-down version of the same services, even though they do not implement the full stack of security standards.

The standard that combines Global Transactions and Web Services is WS-AtomicTransaction or WS-AT. Consider the example below:

Each of the different actors in this use-case may be housed in completely different organizations, with their own software, networks and databases. Using Web Services standards ensures that the applications will communicate with each other despite potentially using different software vendors, having different software life-cycles and so on.

The SALT gateway is a Tuxedo system process that adds Web Services support to Tuxedo applications. Tuxedo services can be exposed as Web Services, and Tuxedo client programs can invoke Web Services seamlessly, that is, as if the Web Services were simply other Tuxedo services.

In that spirit, integrating Tuxedo services with Web Services Atomic Transactions is as simple as changing some elements of configuration:

  • Add a transaction log so that a record of prepared transactions is kept. In the case of a failure, those in-flight transactions can then be resolved: usually rolled back, but in some cases committed.

  • In the Tuxedo-to-external Web Service direction, associate a standard policy descriptor to instruct the SALT gateway on what to do when a transaction propagation is requested: mandatory or optional propagation, or no propagation at all (no policy present). This policy file will look as follows:

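<?xml version="1.0"?>
<wsp:Policy wsp:Name="TransactionalServicePolicy"
    xmlns:wsp="http://schemas.xmlsoap.org/ws/2004/09/policy"
    xmlns:wsat="http://docs.oasis-open.org/ws-tx/wsat/2006/06">
  <wsat:ATAssertion wsp:Optional="true"/>
</wsp:Policy>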
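As a rough illustration of the three propagation choices, the following Python sketch classifies a policy document as mandatory, optional, or none. It is not how the SALT gateway is implemented; the `propagation_mode` function is hypothetical, and the namespace URIs are assumptions (WS-Policy 1.2 and WS-AtomicTransaction 1.1) that may differ from the versions a given policy file declares.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: WS-Policy 1.2 and WS-AtomicTransaction 1.1.
# The policy file actually in use may declare different versions.
WSP = "http://schemas.xmlsoap.org/ws/2004/09/policy"
WSAT = "http://docs.oasis-open.org/ws-tx/wsat/2006/06"


def propagation_mode(policy_xml):
    """Classify a policy document as 'none', 'optional' or 'mandatory'."""
    if policy_xml is None:                      # no policy present
        return "none"
    root = ET.fromstring(policy_xml)
    assertion = root.find(f"{{{WSAT}}}ATAssertion")
    if assertion is None:                       # policy without WS-AT
        return "none"
    optional = assertion.get(f"{{{WSP}}}Optional", "false")
    return "optional" if optional == "true" else "mandatory"


OPTIONAL_POLICY = f"""<wsp:Policy wsp:Name="TransactionalServicePolicy"
    xmlns:wsp="{WSP}" xmlns:wsat="{WSAT}">
  <wsat:ATAssertion wsp:Optional="true"/>
</wsp:Policy>"""

# Dropping the wsp:Optional attribute makes propagation mandatory.
MANDATORY_POLICY = OPTIONAL_POLICY.replace(' wsp:Optional="true"', "")

print(propagation_mode(OPTIONAL_POLICY))
print(propagation_mode(MANDATORY_POLICY))
print(propagation_mode(None))
```

The only difference between the optional and mandatory cases is the `wsp:Optional` attribute on the assertion, which is why switching behavior is purely a configuration change.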


When exposing a Tuxedo service as a Web Service, the SALT gateway will generate the proper WSDL containing the WS-AT capabilities. A WS-AT transaction will propagate into Tuxedo and the remote side will coordinate it.

When invoking a Web Service, the assertion will be contained in the remote WSDL, and the SALT utilities used to import the Web Service configuration will process it automatically and generate a WS-AT policy file such as the one seen above. Then when a transaction is started on the Tuxedo side it can be propagated to the outside, and in this case it is coordinated by Tuxedo.

It is possible to expand existing applications to Web Services, and of course develop new ones, and take advantage of WS-AT by way of the SALT gateway.


For Oracle Tuxedo, Oracle SALT provides a native Web Services implementation that ties global transactions and Web Services together.

Oracle Tuxedo users are already used to the scalability and high availability of their applications. Oracle SALT brings Web Services interoperability to Oracle Tuxedo, and does so in a configuration-oriented manner; that is, it is not even necessary to modify existing applications or develop new ones in order for them to interoperate with Web Services.


This is the Tuxedo product team blog.

