- High Availability Part 6
- High Availability Part 5
- High Availability Part 4
- High Availability Part 3
- High Availability Part 2
- High Availability
- Tuxedo vs MQ Series or other MOMs - No comparison
- Integrating Tuxedo Global Transactions across Web Services
- Using Tuxedo application service version with Oracle SALT
- Sterci processes financial messages 7x faster while lowering TCO
Saturday Mar 15, 2014
Thursday Jan 23, 2014
By Todd Little on Jan 23, 2014
Wednesday Jan 15, 2014
By Todd Little on Jan 15, 2014
To Err is Human; To Survive is High Availability
In this post I’d like to look at the various causes of unavailability or outages. The most obvious, although often overlooked, is scheduled system maintenance. Whether that is included in your measurement of availability depends upon the stakeholders for a system or application. The ideal systems have no scheduled maintenance that causes the system to be unavailable. That isn’t to say they don’t receive maintenance, but that the maintenance doesn’t cause the system to be unavailable. This can be done via rolling upgrades, site switchovers, etc. For now it suffices to say that this type of down time is intentional, known, and typically scheduled.
The interesting part comes in looking at other causes of unavailability, in particular those caused by failures. The most commonly thought of failure is that of a hardware failure such as a disk drive failing, or a server failing. These failures tend to be obvious and easily remedied. Most people then guess that software failures make up the next significant portion of failures. But as is all too often the case, the most common failures in highly available systems are those caused by people. Estimates place hardware failures at around 10% of the causes of an outage. This low percentage is largely due to the ever improving MTBF of hardware. Software is estimated to cause about 20% of outages for highly available systems. The remaining 70% of outages are attributable to human action, and increasingly these actions are intentional, i.e., purposeful interruptions of service for malicious intent such as denial of service attacks.
To give an example, a study was done on replacing a failed hard drive in a software RAID configuration. It is a seemingly simple task, yet engineers replaced the wrong drive surprisingly often the first few times they were asked to repair the systems. This indicates that putting repair procedures in place isn’t adequate on its own; the procedures must actually be performed several times to eliminate human error. More importantly, it points out the need to eliminate human intervention as much as possible, because any human intervention, whether for normal operation or for remediating a failure, has a significant possibility of being done incorrectly. An incorrect intervention can be catastrophic: in the study above, replacing the wrong drive caused a complete loss of data in some instances.
So what is the takeaway from this information? Minimize or eliminate human intervention as much as possible in order to minimize outages attributable to human error. Typically this means automating as much as possible any necessary steps to resume normal operation after a failure or even during normal operation. Every manual step taken by an administrator has some probability of causing an outage. It also suggests that repair procedures be well tested, preferably in a test environment that duplicates the production environment.
More on how Tuxedo can help solve these problems in my next entry.
Saturday Jan 11, 2014
By Todd Little on Jan 11, 2014
In my previous posts on High Availability I looked at the definition of availability and ways to increase the availability of a system using redundant components. In this post I'll look at another way to increase the availability of a system. Let’s go back to the calculation of availability:
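Restated in its usual form, with both times measured in the same units (MTBF is the mean time between failures, MTTR the mean time to repair):

```latex
\text{Availability (\%)} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \times 100
```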
Based upon this formula, we can see that if we can decrease the MTTR, we can increase the overall availability of the system. For a computer system, let’s look at what makes up the time to repair the system. It includes some time that may not be obvious, but in fact is extremely important. The timeline for a typical computer system failure might look like:
1. Normal operation
2. Failure or outage occurs
3. Failure or outage detected
4. Action taken to remediate the failure or outage
5. System placed back into normal operation
6. Normal operation
Most people only consider item (4) above, the time taken to remediate the outage. That might be something like replacing a failed hard drive or network controller. It could even be as simple as reconnecting an accidentally disconnected network cable, a 30 second repair. But the MTTR isn't 30 seconds. It’s the total time spent in (3), (4), and (5) above. For the network cable example, the amount of time taken in (3) will depend upon network timers at multiple levels and could be many minutes if just relying on the operating system network stack. The time taken for (4) may be as low as the 30 seconds needed to reconnect the cable, although finding the cable might take a bit longer than 30 seconds. The time for (5) again depends upon the service resumption steps such as re-establishing a DHCP address, reconnection of applications or servers, etc. So although on the surface the MTTR may be assumed to be 30 seconds, the actual time could be many minutes, especially in the extreme case where systems, servers, applications, etc., need to be restarted or rebooted manually to recover.
So how does this impact system design for highly available systems? It indicates that whatever can be done to decrease items (3), (4), and (5) above will improve overall system availability. The more of these steps that can be automated, the lower the MTTR one can achieve, and the higher the availability of the system. Too often the detection phase (3) is left up to someone calling a help desk to say they can’t access or use the system. As well, items (4) and (5) often require manual intervention or steps. When one wants to achieve 99.99% availability, manual repair or remediation is going to make that very difficult to achieve.
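To make the effect of detection and resumption time concrete, here is a minimal Python sketch. The phase durations and the 1000-hour MTBF are made-up illustrative numbers, not measurements, and the `Outage` class is a hypothetical helper introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Durations, in seconds, of the three phases that make up time to repair."""
    detect_s: float  # phase (3): from failure until it is detected
    repair_s: float  # phase (4): the remediation action itself
    resume_s: float  # phase (5): getting back to normal operation

    @property
    def mttr_s(self) -> float:
        # Time to repair is the sum of all three phases, not just the fix.
        return self.detect_s + self.repair_s + self.resume_s


def availability(mtbf_s: float, mttr_s: float) -> float:
    """Fraction of time the system is available: MTBF / (MTBF + MTTR)."""
    return mtbf_s / (mtbf_s + mttr_s)


# Hypothetical numbers for the disconnected-cable example.
manual = Outage(detect_s=600, repair_s=30, resume_s=300)
automated = Outage(detect_s=5, repair_s=30, resume_s=10)

MTBF_S = 1000 * 3600  # assumed: one failure every 1000 hours

manual_availability = availability(MTBF_S, manual.mttr_s)
automated_availability = availability(MTBF_S, automated.mttr_s)

for label, outage, avail in (
    ("manual", manual, manual_availability),
    ("automated", automated, automated_availability),
):
    print(f"{label:>9}: MTTR {outage.mttr_s:.0f}s, availability {avail:.5%}")
```

The 30 second repair is identical in both cases; only detection and resumption differ, and they dominate the MTTR.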
More on the causes of failures in my next post.
Monday Jan 06, 2014
By Todd Little on Jan 06, 2014
To compute the availability of a system, you need to examine the availability of the components that make up the system. To combine the availability of the components, you need to determine if a component’s failure prevents the system from being usable, or if the system can still be available regardless of the failure. Now that sounds strange until you consider redundancy. In a non-redundant subsystem, if it fails, the system is unavailable. So in a completely non-redundant system, the availability of the system is simply the product of each component’s availability:
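In symbols, writing $A_i$ for the availability of the i-th required component (this notation is assumed here for illustration):

```latex
A_{\text{system}} = \prod_{i=1}^{n} A_i = A_1 \times A_2 \times \cdots \times A_n
```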
A very simplified view of this might be:
Client => LAN => Server => Disk
If we take the client out of the picture as it really isn't part of the system, we at least have a network, a server, and a disk drive to be available in order for the system to be available. Let’s say each has an availability of 99.9%, then the system availability would be:
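Written out for the three components, each at 0.999:

```latex
A_{\text{system}} = 0.999 \times 0.999 \times 0.999 \approx 0.997
```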
or 99.7% available. That’s roughly equivalent to a day’s worth of outage a year. So although each subsystem is only unavailable about 9 hours a year, the 3 combined ends up being unavailable for over a day. As the number of required subsystems or components grows the availability of the overall system decreases. To alleviate this, one can use redundancy to help mask failures. With redundant components, the availability is determined by the formula:
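The standard expression for $n$ redundant components, where the subsystem is up as long as at least one component is up and failures are assumed to be independent, is:

```latex
A_{\text{redundant}} = 1 - \prod_{i=1}^{n} \left(1 - A_i\right)
```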
Let’s look at just the server component. If instead of a single server with 99.9% availability, we have two servers each with 99.9% availability, but only one of them is needed to actually have the system be available, then the availability of the server component of the system increases from 99.9% to 99.9999%, or 6 nines of availability, just by adding an additional server. As you can see, redundancy can dramatically increase the availability of a system. If we also have redundant LAN and disk subsystems in the example above, instead of 99.7% availability, we get 99.9997% availability, or about a minute and a half of down time a year instead of over a day of down time.
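The series and redundant calculations are easy to check numerically. The sketch below is one way to do so in Python; the function names are chosen for this example, and it assumes component failures are independent, which real systems only approximate.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def series(availabilities):
    """Every component is required, so the availabilities multiply."""
    return prod(availabilities)


def parallel(availabilities):
    """Only one redundant component is required: the subsystem is down
    only when all of them are down (assumes independent failures)."""
    return 1 - prod(1 - a for a in availabilities)


def downtime_minutes_per_year(availability):
    """Expected down time for a system that must be up around the clock."""
    return (1 - availability) * MINUTES_PER_YEAR


single = 0.999                         # one component at three nines
chain = series([single] * 3)           # LAN, server, disk, no redundancy
pair = parallel([single, single])      # two redundant servers
redundant_chain = series([pair] * 3)   # every subsystem duplicated

for label, a in (
    ("non-redundant chain", chain),
    ("redundant pair", pair),
    ("redundant chain", redundant_chain),
):
    print(f"{label}: {a:.6%}, {downtime_minutes_per_year(a):.1f} min/year")
```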
OK, so what does all of this have to do with creating highly available systems? Everything! What it tells us is that all things being equal, simpler systems have higher availability. In other words, the fewer required components you have, the more available your system will be. And it tells us that to improve availability we can either purchase components with higher availability, or we can add some redundancy into the system. Buying more reliable or available components is certainly an option, although generally that is a fairly costly option. Mainframe computers are an example of this option. They generally provide better availability than blade servers, but do so at a very high premium. Using redundant commodity components is typically much cheaper and can even yield better overall availability.
More on high availability in my next post.
Thursday Jan 02, 2014
By Todd Little on Jan 02, 2014
As companies become more and more dependent upon their information systems just to be able to function, the availability of those systems becomes more and more important. Outages can cost millions of dollars an hour in lost revenue, let alone potential damage done to a company’s image. To add to the problem, a number of natural disasters have shown that even the best data center designs can’t handle tsunamis and tidal waves, causing many companies to implement or re-evaluate their disaster recovery plans and systems. Practically every customer I talk to asks about disaster recovery (DR) and how to configure their systems to maximize availability and support DR. This series of articles will contain some of the information I share with these customers.
The first thing to do is define availability and how it is measured. The definition I prefer is that availability represents the percentage of time a system is able to correctly process requests within an acceptable time period during its normal operating period. I like this definition as it allows for times when a system isn’t expected to be available, such as during evening hours or a maintenance window. However, that being said, more and more systems are being expected to be available 24x7, especially as more and more businesses operate globally and there are no common evening hours.
Measuring availability is pretty easy. Simply put, it is the ratio of the time a system is available to the time the system should be available. I know, not rocket science. While it’s good to measure availability, it’s usually better to be able to predict availability for a given system to be able to determine if it will meet a company’s availability requirements. To predict availability for a system, one needs to know a few things, or at least have good guesses for them. The first is the mean time between failures or MTBF. For single components like a disk drive, these numbers are pretty well known. For a large computer system the computation gets much more difficult. More on MTBF of complex systems later. The next thing one needs to know is the mean time to repair or MTTR, which is simply how long it takes to put the system back into working order.
Obviously the higher the MTBF of a system, the higher availability it will have and the lower the MTTR of a system the higher the availability of the system. In mathematical terms the system availability in percent is:
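The standard form of this relationship, which matches the worked numbers that follow (MTBF and MTTR must be expressed in the same units), is:

```latex
\text{Availability (\%)} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \times 100
```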
So if the MTBF is 1000 hours and the MTTR is 1 hour, then the availability would be 99.9%, often called 3 nines. To give you an idea of how much down time in a year equates to various numbers of nines, here is a table showing various levels or classes of availability:
[Table: Total Down Time per Year | Class or # of 9s | Typical application or type of system; surviving entries: Telephone Carrier Servers, In-flight Aircraft Computers]
As you can see, the amount of allowed downtime gets very small as the class of availability goes up. Note though that these times are assuming the system must be available 24x365, which isn’t always the case.
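The down time allowed by each class follows directly from the number of nines. As a rough sketch in Python, assuming a 365-day year with no scheduled maintenance window (the helper names are chosen for this example):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed yearly down time for a class of availability.

    Three nines means 99.9% available, i.e. unavailable 10**-3 of the time.
    """
    unavailability = 10 ** -nines
    return unavailability * SECONDS_PER_YEAR


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that keeps the value >= 1."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


for n in range(1, 8):
    print(f"{n} nines: {humanize(downtime_seconds_per_year(n))} per year")
```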
More about high availability in my next entry.
Thursday Sep 05, 2013
By Todd Little on Sep 05, 2013
Friday Aug 02, 2013
By Maurice G on Aug 02, 2013
A global transaction is a series of service calls where the services involved write to a resource (typically update or create a record in a database), and all updates or creations must be completed or none at all so that no inconsistency exists.
For example, imagine performing a balance transfer from one account to another, and that the information pertaining to those accounts is stored in two different databases. The succession of service calls would be as follows:
withdraw amount from database 1,
deposit amount to database 2,
commit (withdrawal and deposit become effective and are reflected in future balance displays).
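The all-or-nothing behavior of this sequence can be illustrated with a toy two-phase commit in Python. This is a teaching sketch only: the `Account` class and `transfer` function are hypothetical stand-ins and are not Tuxedo or database APIs.

```python
class Account:
    """A toy resource manager holding one balance (stands in for a database)."""

    def __init__(self, balance):
        self.balance = balance
        self._pending = None

    def prepare(self, delta):
        """Phase 1: vote on whether the change could be applied."""
        if self.balance + delta < 0:
            return False  # vote to abort: would overdraw the account
        self._pending = delta
        return True

    def commit(self):
        """Phase 2: make the prepared change visible."""
        self.balance += self._pending
        self._pending = None

    def rollback(self):
        """Phase 2 alternative: discard the prepared change."""
        self._pending = None


def transfer(source, target, amount):
    """Move `amount` so that both updates take effect or neither does."""
    participants = [(source, -amount), (target, amount)]
    votes = [account.prepare(delta) for account, delta in participants]
    if all(votes):
        for account, _ in participants:
            account.commit()
        return True
    for account, _ in participants:
        account.rollback()
    return False


db1 = Account(100)  # account stored in database 1
db2 = Account(0)    # account stored in database 2

first = transfer(db1, db2, 30)    # succeeds: both sides commit
second = transfer(db1, db2, 500)  # fails: neither balance changes
print(first, second, db1.balance, db2.balance)
```

A real transaction manager also logs prepared transactions so that the outcome survives a crash between the two phases; the sketch omits that.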
Applications running on Oracle Tuxedo, combined with a database resource such as Oracle Database can guarantee what is called in computer science Atomicity, Consistency, Isolation and Durability (or ACID properties).
In a world that is more and more connected, Web Services and SOAP standards have been developed to address the need to exchange information regardless of the system on which it is available. A Web Service is a “public” interface to a business operation that is exposed in a standardized way.
Other standards are developed as needs arise, such as WS-Addressing, WS-ReliableMessaging or WS-Security, and software vendors implement those in order to provide more features.
Such features are usually advertised in service interfaces so that provider and consumer can agree on levels of functionality and automatically adjust interactions. For instance, a service provider may offer a secure version of its services but still allow non-secure consumers to see and use a scaled-down version of the same services, even though they do not implement the full stack of security standards.
The standard that combines Global Transactions and Web Services is WS-AtomicTransaction or WS-AT. Consider the example below:
Each of the different actors in this use-case may be housed in completely different organizations, with their own software, networks and databases. Using Web Services standards ensures that the applications will communicate with each other despite potentially using different software vendors, having different software life-cycles and so on.
The SALT gateway is a Tuxedo system process that adds Web Services support to Tuxedo applications. Tuxedo services can be exposed as Web Services, or Tuxedo client programs can invoke Web Services seamlessly, that is by making it seem like the Web Services are simply other Tuxedo services.
In that spirit, integrating Tuxedo services with Web Services Atomic Transactions is as simple as changing some elements of configuration:
Add a transaction log so a record of prepared transactions is kept, so that in the case of a failure those in-flight transactions can be resolved, usually rolled back but in some cases committed.
In the Tuxedo-to-external Web Service direction, associate a standard policy descriptor to instruct the SALT gateway on what to do when a transaction propagation is requested: mandatory or optional propagation, or no propagation at all (no policy present). This policy file will look as follows:
When exposing a Tuxedo service as a Web Service, the SALT gateway will generate the proper WSDL containing the WS-AT capabilities. A WS-AT transaction will propagate into Tuxedo and the remote side will coordinate it.
When invoking a Web Service, the assertion will be contained in the remote WSDL, and the SALT utilities used to import the Web Service configuration will process those automatically and generate a WS-AT policy file such as seen above. Then when a transaction is started on the Tuxedo side it can be propagated to the outside, and in this case coordinated by Tuxedo.
It is possible to expand existing applications to Web Services, and of course develop new ones, and take advantage of WS-AT by way of the SALT gateway.
For Oracle Tuxedo, Oracle SALT provides a native Web Services implementation that ties global transactions and Web Services together.
Oracle Tuxedo users are already used to the scalability and high-availability of their applications. Oracle SALT brings Web Services interoperability to Oracle Tuxedo, and does so in a configuration-oriented manner, that is it is not even necessary to modify existing applications or develop new ones in order for them to inter-operate with Web Services.
Tuesday Jul 30, 2013
By Maurice G on Jul 30, 2013
To expand on this previous entry, here are some more details on how to use application service version with Web Services through the Oracle SALT gateway.
Using Tuxedo application service version in conjunction with Tuxedo services exposed as web services
- The GWWS gateway gets REQUEST_VERSION and VERSION_RANGE from UBBCONFIG,
- calls to actual Tuxedo service are made with REQUEST_VERSION inherited from configuration,
- if different settings are needed, such as specific traffic from specific gateway to be routed to specific services, another gateway instance can be configured in a group with different REQUEST_VERSION value and started
Example (UBBCONFIG excerpt):
*GROUPS
GROUP1 LMID=L1 GRPNO=2 VERSION_RANGE="1-2"
GROUP2 LMID=L1 GRPNO=2 VERSION_RANGE="3-4"
GWWS_GRPV1 LMID=L1 GRPNO=3 REQUEST_VERSION=1
GWWS_GRPV2 LMID=L1 GRPNO=3 REQUEST_VERSION=2
*SERVERS
mySERVER SRVGRP=GROUP2 SRVID=30
GWWS SRVGRP=GWWS_GRPV1 SRVID=30 CLOPT="-A -- -i GW1"
GWWS SRVGRP=GWWS_GRPV2 SRVID=30 CLOPT="-A -- -i GW2"
In the example above GWWS in group GWWS_GRPV1 inherits request version "1" from its UBBCONFIG settings, and therefore exposes services that are advertised by Tuxedo application servers which include "1" in their VERSION_RANGE settings, such as GROUP1 here. If a service exposed by GWWS is actually performed by a server in GROUP2 the result will be a TPENOENT error forwarded to the remote Web Services client.
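The matching rule described above can be modeled in a few lines of Python. This is only a simplified model of the behavior as described in this post, not how Tuxedo implements it; `NoEntry` is a hypothetical exception standing in for the TPENOENT error, and the group names and ranges are the ones used in the text.

```python
class NoEntry(Exception):
    """Stands in for TPENOENT: no server group offers a matching version."""


def parse_range(text):
    """Turn a VERSION_RANGE string such as "1-2" into an inclusive (low, high) pair."""
    low, high = text.split("-")
    return int(low), int(high)


# Server groups and their VERSION_RANGE values, as named in the text.
SERVER_GROUPS = {
    "GROUP1": parse_range("1-2"),
    "GROUP2": parse_range("3-4"),
}


def route(request_version, server_groups=SERVER_GROUPS):
    """Return the groups whose VERSION_RANGE includes the request version."""
    matches = [
        group
        for group, (low, high) in server_groups.items()
        if low <= request_version <= high
    ]
    if not matches:
        raise NoEntry(f"no group accepts request version {request_version}")
    return matches


print(route(1))  # a gateway with REQUEST_VERSION=1 reaches servers in GROUP1 only
print(route(3))  # a caller with request version 3 reaches servers in GROUP2 only
```

A request version outside every range, for example `route(5)`, raises `NoEntry`, which corresponds to the error the remote Web Services client would receive.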
Using this mechanism, it is possible to map different endpoints to services with different versions. Since versions are per-group, this is done by placing the GWWS servers in their own groups and then either using proxy mapping in front of GWWS (via Apache server or other) or accessing the Web Services endpoints directly. For example, these endpoint settings would accompany the UBBCONFIG above:
<wsdf:Endpoint address="http://my.server:3331/quote" id="http_port_v1"/>
<wsdf:Endpoint address="http://my.server:3332/quote" id="http_port_v2"/>
Using Tuxedo application service version in conjunction with External web services imported into Tuxedo using SALT
- Since one GWWS instance cannot advertise more than one service with the same name, each version of that service has to be exposed by a different instance,
- for that reason, the existing mechanism can simply be used: configure multiple GWWS instances, each with the appropriate VERSION_RANGE in its *GROUPS settings.
Example (UBBCONFIG excerpt):
*GROUPS
GROUP2 LMID=L1 GRPNO=2 VERSION_RANGE="1-2"
GROUP3 LMID=L1 GRPNO=3 REQUEST_VERSION=1 VERSION_RANGE="3-4"
*SERVERS
GWWS SRVGRP=GROUP2 SRVID=30
GWWS SRVGRP=GROUP3 SRVID=30
In the above example, Tuxedo programs (client or server) call an external Web Service exposed by both GWWS in groups GROUP2 and GROUP3. Programs using version 1 or 2 will be routed to the service exposed by GWWS in GROUP2 which may connect to endpoint 1, and programs using version 3 or 4 will be routed to the service exposed by GWWS in GROUP3 which may connect to a different endpoint than GWWS in GROUP2.
Thursday May 16, 2013
By R A Sanyal on May 16, 2013
Headquartered in Geneva, Sterci Group is a market-leading financial messaging solutions company with subsidiary divisions in London, Brussels, Toronto, New York, Paris, Riyadh, Singapore and Zurich. Sterci’s products and services provide banks, corporations and financial institutions with integrated business solutions for transactional banking, multi-bank connectivity, full data integration, reconciliation, cash management, zero balancing and market data management.
Sterci partners with Oracle to deliver mission critical and best-in-class solutions their clients can depend upon. Many of their customers were running old financial messaging switches like IBM mainframes and HP’s Tandem type platforms that are very expensive to support. Sterci’s view was to help those organizations lower their total cost of ownership. Sterci wanted an application server environment that had transactional monitoring capabilities that were robust, high performing, easy to distribute, and widely supported in the market.
Oracle Tuxedo was an obvious fit. Tuxedo is widely distributed, widely used, mature, highly available and highly performing. With Oracle Tuxedo and Exalogic, Sterci went from processing half a million to 3.5 million financial messages per hour while lowering the total cost of ownership. Watch the video, Sterci Clients up to 7x Faster with Oracle Tuxedo, with Rob Kotlarz, Business Development Director, of Sterci to learn more.
Thursday May 02, 2013
The Realities of Rehosting – Four Customer Stories, by Mark Rakhmilevich, Product Mgmt, Strategy Director Product Development
By R A Sanyal on May 02, 2013
Mainframe customers have options. Not that mainframe vendors, like IBM, would tell them. In fact, IBM recently put together a presentation on “The Reality of Rehosting” and, as you can guess, they weren’t enthusiastic about the notion of moving applications from the mainframe to open systems. On the other hand, some mainframe users who presented at a recent Oracle OpenWorld are way more enthusiastic about their options when leveraging Tuxedo to migrate applications off the mainframe. These are their stories on the “realities of rehosting.”
Banco Bilbao Vizcaya Argentaria (BBVA)
BBVA, a global bank with 100,000 MIPS of IBM mainframe capacity deployed across Europe and the Americas, has begun rehosting its core banking transactions and other mainframe applications to Oracle Tuxedo and Oracle Database. With over 3000 MIPS already rehosted in 2012 and another 12,000 under way in 2013, the bank has crunched the numbers and estimated $1M/year savings for every 1000 MIPS of rehosted workloads. They rely on a robust Tuxedo and Oracle Database foundation, coupled with Tuxedo Mainframe Adaptors for integration with IBM CICS and IMS, and GoldenGate for database synchronization, to run a hybrid core banking infrastructure with full security and global transaction coordination across rehosted transactions and those that are still on the mainframe.
Operating in many countries presents complex regulatory challenges for BBVA, including requirements for managing local customer data and transactions in-country. Using Tuxedo and Oracle Database, BBVA is able to deploy a small “bank-in-a-box” datacenter configuration in those countries that have this requirement. Speaking at Oracle Open World 2012, Antonio Gelart, head of BBVA’s modernization program, stated that the bank’s goal is zero MIPS – no mainframes – and with the success of early rehosting projects they can see a clear path towards this goal.
Mazda, long known as an innovator in the automobile industry, is right-sizing its IT infrastructure to meet the challenges of the current economic disruptions. Shedding the legacy mainframe environment, Mazda has chosen to migrate its cost accounting system off the mainframe to a Linux environment powered by Tuxedo and Tuxedo’s Application Runtime (ART) for Batch.
Accomplishing this migration in about one year, Mazda has revamped some of its application programs in Java and successfully married the traditional mainframe batch framework provided by Tuxedo ART with Java programs. Describing this migration project at Oracle Open World 2012, Masuhiro Yoshioka, Mazda’s IT infrastructure manager, said that they chose Oracle Tuxedo for its strong reliability and availability characteristics, which are critical for Mazda’s cost accounting system that feeds into quarterly and annual financial reporting. In addition to significant cost savings expected from decommissioning one of the two mainframes, Mazda has seen significant performance improvements from parallelizing the overnight batch processing across a distributed batch farm supported by Tuxedo’s distributed batch framework. Once acquired, the taste for rehosting stays strong. Mazda is now looking at rehosting an IBM IMS application from its last remaining mainframe to Tuxedo’s Application Runtime for IMS.
Caja de Valores (CdV)
CdV, an IT arm of Buenos Aires Stock Exchange, began its mainframe modernization quest in 2001 with a 3-yr project to re-write about 4M lines of COBOL code to Java. Fast forward to 2007, and, in the words of Alejandro Wyss, the CdV CIO, no critical subsystem had been migrated, the budget for the re-write had been overrun by 3X, and the re-write project had achieved less than 30% of the total scope, while the COBOL code base had grown to 6M LoC.
A new approach was required, one that moved the application functionality to more flexible open systems infrastructure in a short timeframe and with low risk. CdV determined that rehosting the applications to Oracle Tuxedo, where the application logic is preserved intact in COBOL and only technical APIs are adapted or emulated to run on a Linux or UNIX platform, was a more promising option. Starting with the Stock Trading application as a pilot, the entire migration was accomplished in 20 months. Deploying on Linux to achieve HW vendor independence, CdV was able to leverage Tuxedo’s built-in clustering capabilities to move to an application grid enabling Active/Active fault-tolerant services infrastructure, while increasing throughput by 200% at a fraction of a mainframe cost. Leveraging Tuxedo’s standards-based integration options, CdV is able to reduce overall risk and cut time to market for new capabilities by 30%.
A Top 5 Global Bank
It’s difficult to imagine a more critical banking system than a SWIFT Financial Messaging application. The lifeblood of any major bank is its connectivity into the global financial fabric managed by SWIFT that interconnects over 8000 banks and many of their corporate customers. For a bank that’s in the top 5 of SWIFT messaging volumes, an aging SWIFT Financial Messaging solution is a serious risk.
criteria for a mainframe replacement solution that could consolidate disparate SWIFT messaging systems and perform at 10x current system’s throughput, the Bank embarked on a series of performance benchmarks.
The clear winner – a Tuxedo-based GT Exchange application from Sterci, a long term Tuxedo ISV that specializes in SWIFT financial messaging market, deployed on Exalogic Elastic Cloud System. In fact, once the Bank has seen its 4x throughput advantage over IBM AIX/pSeries (2.58M complex SWIFT messages/hr on Exalogic compared to 620K on IBM p750 servers), they’ve decided to deploy it across 2 countries and 4 datacenters using 8 quarter-rack Exalogic systems. While this is a mainframe application replacement example rather than rehosting of an existing application, it underlines the performance advantages that can be achieved through Tuxedo optimizations on
Migrating off expensive, inflexible mainframe systems to Tuxedo
These four customers are not alone in migrating off expensive and inflexible mainframe systems to Tuxedo-powered open systems infrastructure. They are just the more recent examples of mainframe migrations leveraging Tuxedo that demonstrate significant cost and risk reduction benefits, and highlight the gains in performance, datacenter flexibility, and business agility customers can achieve using a Tuxedo-based migration approach. In subsequent posts we’ll highlight the technical details of these migrations, and share best practices for migrating mainframe applications to Tuxedo.
Press Release: Oracle Enhanced Mainframe Rehosting for Oracle Tuxedo 12c
Web Page: Tuxedo page on oracle.com
Wednesday Apr 17, 2013
Increase the Availability of Your Tuxedo Applications and Improve IT Productivity with TSAM Plus--By Deepak Goel, Senior Director, Software Development
By R A Sanyal on Apr 17, 2013
Find out how you can increase the productivity of your IT staff and the availability of your Tuxedo applications using Oracle Tuxedo System and Application Monitor Plus 12c (TSAM Plus 12c). Check out the YouTube video below by Todd Little, Managing Tuxedo Applications with TSAM Plus 12c and OEM CC12c.
TSAM Plus 12c is a management and monitoring solution for Tuxedo 12c applications. It helps improve performance and availability of Tuxedo applications and expedite problem resolution in both dev/test and production environments, while monitoring several domains at the same time. TSAM Plus 12c has many features that help automate day-to-day operations such as resource deployments, scaling application nodes up and out, and service level management. This increases the productivity of IT staff, who no longer need to worry about writing scripts, moving from one console to another, or correlating messages from one product to another in order to diagnose a critical production problem.
TSAM Plus 12c includes a plugin for Oracle Enterprise Manager Cloud Control 12c, which allows Tuxedo applications to be monitored and managed from the same console as other Oracle products, including WebLogic and
TSAM Plus 12c functionality can be broadly categorized as follows:
- Application Performance Management: TSAM Plus 12c greatly improves application performance by providing functionality to automatically detect performance bottlenecks, quickly diagnose these performance problems, and identify their root cause.
- Operations Automation: TSAM Plus 12c automates common manual and error-prone operations, allowing administrators to focus on more strategic initiatives. With TSAM Plus 12c, Tuxedo applications can be packaged in a self-contained application package along with required configuration artifacts and stored in a central repository, ready for deployment to an existing domain, to interactively create a new Tuxedo domain, or to add nodes to an existing domain. Both physical and virtual environments are supported. In addition, TSAM Plus 12c makes it much easier to change the configuration of Tuxedo applications in a production environment without having to restart the application, thus avoiding costly downtime. A Tuxedo domain can be changed dynamically, in addition to being created from scratch. TSAM Plus 12c also helps with day-to-day operational tasks, such as manually starting and stopping applications and starting new instances of an application server.
- Service Level Management: TSAM Plus 12c helps IT organizations to achieve high availability, performance, and optimized service levels for their business services.
Datasheet: Oracle Tuxedo System and Application Monitor
Web Page: Tuxedo page on oracle.com
Sunday Apr 14, 2013
Developing and Deploying Services in Java on Tuxedo 12c is Easy and Straightforward by Todd Little, Oracle Tuxedo Chief Architect
By R A Sanyal on Apr 14, 2013
One of the 187 new features in Tuxedo 12c is the ability to develop Tuxedo services in Java. Prior to Tuxedo 12c, to create a Tuxedo service in Java meant adding another application server such as WebLogic Server or IBM WebSphere to the environment and using either the WebLogic Tuxedo Connector (WTC) or the Tuxedo JCA Adapter. The service was then developed in Java, deployed to the Java EE application server, and then connected to existing Tuxedo applications via the Tuxedo domain gateway. This meant that every request from Tuxedo to these Java services entailed a network hop and any distributed transactions required a subordinate transaction to be started in the Java EE application server. As well, any native Tuxedo service called by the Java service now required another network hop--all in all usable, but requiring more administration, more resources, and
Java Server Support
The Java Server support in Tuxedo uses a POJO programming model based upon Java
SE. The programming environment and API used for service development is
JATMI, the same API used in WTC. JATMI is essentially an object oriented
version of the standard Tuxedo Application to Transaction Monitor Interface
(ATMI). It supports virtually all of the ATMI features and should be very
familiar to anyone that has developed Tuxedo services in another
language. Yet being Java, developers have access to the rich set of class
libraries that Java developers have come to know and love. Since the
environment is Java SE based, Java EE features such as transaction management
are provided by the JATMI classes instead of the Java Transaction API.
Developing & Deploying Java Service on Tuxedo is easy
Developing and deploying services in Java on Tuxedo is easy and straightforward. The basic steps are to create a Java class that extends the TuxedoJavaServer class provided by JATMI, and then to create one or more methods that will handle Tuxedo service requests. Each of these methods takes a TPSVCINFO instance as its only parameter, which contains such information as the name of the service called and the typed buffer the caller passed to the service. The method extracts whatever information it needs from the typed buffer, performs its business logic, and then creates a typed buffer to reply to the caller. Finally the method calls tpreturn() to return the reply buffer to the caller.
Configuring the Tuxedo Java Server
Once the server class or classes have been developed and compiled, the Tuxedo Java Server TMJAVASVR needs to be added to the Tuxedo UBBCONFIG file. This Tuxedo-provided server will load the JVM, load the server classes, and take care of dispatching incoming requests to the methods in the server classes. Which classes to load, and the mapping between the Tuxedo service names the server will offer and the methods that implement them, are defined in an XML-based configuration file. By default each public method in the server classes is advertised as a Tuxedo service under the method's name. This configuration file also specifies such things as the classpaths to be used, JDBC driver and connection information for accessing a database, and resources such as FML/FML32 field tables and VIEW/VIEW32 classes. After updating and loading the UBBCONFIG file, the application is ready to be booted and tested.
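The default mapping described above, where each public method of a server class is offered as a service of the same name, can be pictured with plain Java reflection. This is only an illustrative sketch and makes no claim about how TMJAVASVR works internally. The ServiceDispatcher and DemoServer classes, and the String-in/String-out method signature, are hypothetical stand-ins for the real JATMI types.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical stand-in for a server class; not part of JATMI. */
class DemoServer {
    public String JAVATOUPPER(String request) { return request.toUpperCase(); }
    public String JAVATOLOWER(String request) { return request.toLowerCase(); }
    String notAdvertised(String request) { return request; } // not public, so skipped
}

/** Illustrative dispatcher: maps service names to public methods by reflection. */
class ServiceDispatcher {
    private final Object server;
    private final Map<String, Method> services = new TreeMap<>();

    ServiceDispatcher(Object server) {
        this.server = server;
        // Advertise every public method that takes a single String argument.
        for (Method m : server.getClass().getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == String.class) {
                m.setAccessible(true);
                services.put(m.getName(), m);
            }
        }
    }

    /** Names of the services this dispatcher would offer, in sorted order. */
    Set<String> advertisedNames() {
        return services.keySet();
    }

    /** Looks up the method registered under the service name and invokes it. */
    String call(String serviceName, String request) {
        Method m = services.get(serviceName);
        if (m == null) {
            throw new IllegalArgumentException("No such service: " + serviceName);
        }
        try {
            return (String) m.invoke(server, request);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Service failed: " + serviceName, e);
        }
    }

    public static void main(String[] args) {
        ServiceDispatcher d = new ServiceDispatcher(new DemoServer());
        System.out.println(d.advertisedNames());
        System.out.println(d.call("JAVATOUPPER", "hello"));
    }
}
```

In a real deployment the XML configuration file, not code like this, decides which classes are loaded and which names are advertised.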
Sample Implementation and Configuration
Here is what a simple Java service implementation might look like:
public void JAVATOUPPER(TPSVCINFO rqst) throws
TuxAppContext myAppCtxt = getTuxAppContext(); /* Get the application context */
TypedBuffer svcData = rqst.getServiceData(); /* Get the callers data */
TypedString TbString = (TypedString)svcData; /* Assume it's a STRING buffer */
String newStr = TbString.toString().toUpperCase(); /* Get the string and upper case it */
TypedString replyTbString = new TypedString(newStr); /* Create the reply buffer */
myAppCtxt.tpreturn(TPSUCCESS, 0, replyTbString, 0); /* Return reply buffer to caller */
The entry in the UBBCONFIG for the Tuxedo Java Server might look like:
SRVGRP=GROUP1 SRVID=2 CLOPT="-A"
CLOPT="-- -c TJSconfig.xml"
which would start a single copy of the Tuxedo Java Server with 10 threads to handle requests. The configuration file TJSconfig.xml for this server might look something like:
where MyTuxedoJavaServer is the name of the Java class that extends the TuxedoJavaServer class. Multiple copies of the
Tuxedo Java Server can be run just as any other Tuxedo server using the same
configuration file or each using their own configuration file. All
standard Tuxedo buffer types are supported, so services can use STRING, CARRAY,
MBSTRING, FML/FML32, XML, or VIEW/VIEW32 buffers. As well, Java services
can call other Tuxedo services by using the tpcall() method on the
As the Tuxedo Java Server is a standard Tuxedo server, all of the monitoring, management, and administration capabilities that Tuxedo provides to C or other language servers are available to services written in Java. These services also benefit from the unmatched reliability, availability, scalability, and performance that Tuxedo has proven to provide at thousands of customer sites. By providing Java support in Tuxedo, customers are free to choose the language that best suits their application development needs, whether it is C, C++, COBOL, Python, Ruby, PHP, or now Java, and they all work together seamlessly to provide one integrated application.
Friday Apr 05, 2013
Three New Features Improve Availability of Tuxedo Based Applications- by Todd Little, Oracle Tuxedo Chief Architect
By R A Sanyal on Apr 05, 2013
Tuxedo 12cR1 introduced several new features to help improve the availability of Tuxedo applications. While Tuxedo is known for providing extremely high reliability, availability, scalability, and performance (RASP), there are always things Oracle can do to improve the availability of an application. This post will cover three new features that help improve the availability of Tuxedo based applications.
Highly available systems try to avoid single points of failure to ensure the survivability of an application even in the midst of a failure. Tuxedo has provided means to avoid single points of failure in virtually all scenarios except one, and that is when customers use data dependent routing, or DDR. DDR allows an application to be partitioned based upon the values contained in a field of a request buffer. In the Tuxedo sample bankapp application, the ACCOUNT_ID field in a request message is used to determine which group of servers should handle the request. This is controlled by the *ROUTING section in the UBBCONFIG file. For each range of values, a server group can be specified to handle requests. The issue with regard to availability is that only a single server group can be specified in releases prior to Tuxedo 12cR1. While a server group can have multiple servers in it, such that the failure of a single server won't cause a problem, a server group can only reside on a single machine in a cluster. Thus if the machine that the server group is on fails, there will be some period of time during which the partition of the application associated with that group of servers is unavailable. Requests to the servers in that partition will fail until the machine is restarted or the server group is migrated to another machine.
Improved *ROUTING Section
With Tuxedo 12cR1 the *ROUTING section can now specify up to three server groups that can be associated with a range of values. This now allows the application partition to span up to 3 machines allowing the partition to still be available even if two of the machines completely fail. Besides improving the availability of a partition, it also increases the scalability of a partition as now the resources of up to three machines can be utilized to process requests. This same improvement is included in the Tuxedo domain gateway as well. This allows the domain gateway to specify up to three remote domains that can be associated with a range of values in a field. When combined with multiple gateways, multiple domains, and multiple network links, applications can achieve unmatched levels of availability.
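To make the availability effect concrete, here is a small model of a routing range that lists up to three server groups. It is a sketch, not Tuxedo code: the RoutingSketch class, the group names, and the account range are hypothetical, and picking the first group that is still up is a simplification that models only the failover aspect, not how Tuxedo spreads load across the groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model of data dependent routing; not Tuxedo code. */
class RoutingSketch {
    /** One routing range: inclusive bounds plus one to three candidate groups. */
    static final class Range {
        final long low;
        final long high;
        final List<String> groups;

        Range(long low, long high, String... groups) {
            if (groups.length < 1 || groups.length > 3) {
                throw new IllegalArgumentException("1 to 3 groups per range");
            }
            this.low = low;
            this.high = high;
            this.groups = Arrays.asList(groups);
        }
    }

    private final List<Range> ranges = new ArrayList<>();

    void addRange(Range r) {
        ranges.add(r);
    }

    /** Returns the first group of the matching range that is still up, or null if none. */
    String route(long accountId, Set<String> failedGroups) {
        for (Range r : ranges) {
            if (accountId >= r.low && accountId <= r.high) {
                for (String g : r.groups) {
                    if (!failedGroups.contains(g)) {
                        return g;
                    }
                }
                return null; // every group for this partition is down
            }
        }
        return null; // no range covers this value
    }

    public static void main(String[] args) {
        RoutingSketch rs = new RoutingSketch();
        // Hypothetical range and group names, for illustration only.
        rs.addRange(new Range(10000, 59999, "GROUP_A", "GROUP_B", "GROUP_C"));
        Set<String> none = Collections.emptySet();
        Set<String> twoDown = new HashSet<>(Arrays.asList("GROUP_A", "GROUP_B"));
        System.out.println(rs.route(12345, none));
        System.out.println(rs.route(12345, twoDown));
    }
}
```

With a single group per range, as in releases before 12cR1, losing that group's machine leaves the route with no candidate at all. With three groups the partition stays reachable until the third one fails.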
Automatic Migration of Machines and Server Groups
Another feature increasing availability of Tuxedo applications introduced in Tuxedo 12cR1 is the automatic migration of machines and server groups. Since very early on, Tuxedo has had mechanisms to allow a machine to be migrated from one host to another, or for a server group to be migrated from one machine to another. This provides a recovery mechanism in the case of a machine or server group failure. Prior to Tuxedo 12cR1 the migration process was a manual one that required either manual intervention or the creation of scripts that could perform some level of automated migration.
While the failure of a machine or server group by itself doesn't typically affect the availability of a properly configured application, it may leave the application with one or more single points of failure. This can be mitigated by ensuring there are always at least three copies of servers or server groups such that if one fails, redundancy is still maintained. Even though it's not possible to define more than one BACKUP machine for the MASTER machine, and there is only one MASTER machine at any point in time, the failure of the MASTER machine doesn't necessarily impact application availability. This is one misconception many Tuxedo customers have about MP or clustered operations with Tuxedo. They see the MASTER machine as a single point of failure, but in fact normal application processing goes on even if the MASTER machine fails. This is because the DBBL process which runs on the MASTER machine isn't involved in normal request routing. All that happens if the MASTER fails or for some other reason the DBBL can't be reached is that configuration changes can't occur until the DBBL becomes available.
Under most failure scenarios, automatic migration automates the migration of a machine to its backup, or of a server group to its backup machine. This eliminates the possibility of human error causing even more problems during a failure, and it minimizes the time needed to restore the system to normal operation, that is, it reduces the mean time to repair (MTTR). Reducing MTTR is one of the most effective ways of increasing overall system availability. Enabling these features is a simple matter of adding two new options to the *RESOURCES section of the UBBCONFIG file. For more details, see the Migrating Your Application [http://docs.oracle.com/cd/E35855_01/tuxedo/docs12c/ada/admigt.html] section of the Tuxedo 12cR1 documentation.
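The link between MTTR and availability follows from the usual steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below just evaluates that formula. The failure interval and the two repair times are made-up figures chosen for illustration, not measurements of Tuxedo.

```java
/** Illustrative arithmetic only; the input figures are assumptions. */
class AvailabilitySketch {
    /** Steady-state availability = MTBF / (MTBF + MTTR), both in the same unit. */
    static double availability(double mtbfHours, double mttrHours) {
        return mtbfHours / (mtbfHours + mttrHours);
    }

    public static void main(String[] args) {
        double mtbf = 1000.0;           // assumed: one failure every 1000 hours
        double manual = 1.0;            // assumed: one hour to migrate by hand
        double automatic = 1.0 / 60.0;  // assumed: one minute when migration is automatic
        System.out.printf("manual:    %.5f%n", availability(mtbf, manual));
        System.out.printf("automatic: %.5f%n", availability(mtbf, automatic));
    }
}
```

Under these assumed numbers, cutting the repair time from an hour to a minute moves availability from roughly 99.9% to roughly 99.998% without changing how often failures occur.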
Finally the last availability related feature added in Tuxedo 12cR1 is service versioning. While that may not sound particularly related to high availability, what it allows is the concurrent deployment of multiple versions of an application. By being able to run multiple versions of an application simultaneously, customers can gradually introduce new versions of their application without having to shut down their application or impacting existing users in any way.
Service versioning requires no changes to the application code, although presumably there are changes, probably even incompatible changes, which is why Oracle introduced service versioning. The only required changes are in the UBBCONFIG file. The APPVER option needs to be set in the *RESOURCES section, and then the REQUEST_VERSION, VERSION_RANGE, and VERSION_POLICY options added to the *RESOURCES section or to any server groups that need versioning support. The REQUEST_VERSION indicates the version number requests will have. For native clients and servers it is either the value specified in the *RESOURCES section or in the *GROUPS section, with the latter having precedence. Subsequent calls in the call path will have the request version associated with the server that made the request, unless the VERSION_POLICY is set to PROPAGATE, which means the caller's service version should be used. The VERSION_RANGE then indicates what request versions a server is able to process. When Tuxedo performs request routing, it will determine the request version number and then only select servers that support that version number. Thus when an incompatible change is made, you would associate a new request version with any updated callers of the service, and set the version range of servers appropriately to ensure that only updated servers handle the requests. This allows for the introduction of gradual changes and lets the application developer decide what versions of a service interface any given server supports.
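The selection rule just described, where a request carries a version and only servers whose range covers it are candidates, can be modeled in a few lines. This is a sketch of the idea under simplifying assumptions, not the Tuxedo routing code: the VersionRoutingSketch class, the server names, and the integer version numbers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model of version-based server selection; not Tuxedo code. */
class VersionRoutingSketch {
    /** A server together with the inclusive range of request versions it accepts. */
    static final class Server {
        final String name;
        final int low;
        final int high;

        Server(String name, int low, int high) {
            this.name = name;
            this.low = low;
            this.high = high;
        }

        boolean accepts(int requestVersion) {
            return requestVersion >= low && requestVersion <= high;
        }
    }

    /** Names of the servers whose version range covers the request version. */
    static List<String> eligible(List<Server> servers, int requestVersion) {
        List<String> names = new ArrayList<>();
        for (Server s : servers) {
            if (s.accepts(requestVersion)) {
                names.add(s.name);
            }
        }
        return names;
    }

    /**
     * Version stamped on a call made from inside a service: the caller's
     * version when the policy is PROPAGATE, else the server's own version.
     */
    static int outgoingVersion(boolean propagate, int incomingVersion, int ownRequestVersion) {
        return propagate ? incomingVersion : ownRequestVersion;
    }

    public static void main(String[] args) {
        // Hypothetical servers: an old build and an updated build running side by side.
        List<Server> servers = Arrays.asList(
                new Server("oldServer", 1, 1),
                new Server("newServer", 2, 3));
        System.out.println(eligible(servers, 1));
        System.out.println(eligible(servers, 2));
    }
}
```

Running old and new servers side by side with disjoint ranges is what lets updated callers be switched over gradually while existing callers keep reaching the old implementation.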
These new features further enhance Tuxedo's capability to support highly available applications without requiring customers to build those capabilities into their application code. The result is that customers can deploy applications that provide 99.999% or better availability, while being able to scale those applications to hundreds of thousands of services executed per second.
Was this information helpful? Please share your comments and let us know if there are any Oracle Tuxedo topics you would like us to discuss.
Thursday Mar 28, 2013
Oracle Tuxedo Mainframe Adapters Provide High Availability, Failover and Load Balancing- See the Demo
By R A Sanyal on Mar 28, 2013
Oracle Tuxedo Mainframe Adapters provides bi-directional, fully transactional access to and from mainframe CICS, IMS and batch applications. Oracle Tuxedo applications can invoke CICS, IMS & Batch apps running on mainframes and vice-versa.
The Communications Resource Manager (CRM) can support multiple connections and multiple links, which makes high availability features such as load balancing and failover possible. CRM supports multiple connections, including:
- Several GWSNAXs connect to a single CRM. The connected Gateways share a common configuration offering a common set of services
- High availability: Supports inbound load balancing (round robin)
- High availability: Supports inbound failover
- High availability: Supports inbound transaction affinity.
- Interoperability: only supports GWSNAX/CRM of qwc
- Note: The GWSNAXs must be in different Tuxedo domains.
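The combination of round-robin load balancing and failover listed above can be sketched as a rotation that skips any gateway currently marked as down. This is a generic illustration, not the CRM's implementation: the RoundRobinSketch class and the gateway names are hypothetical, and transaction affinity is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Illustrative round-robin selection with failover; not CRM code. */
class RoundRobinSketch {
    private final List<String> gateways;
    private int next = 0;

    RoundRobinSketch(List<String> gateways) {
        this.gateways = new ArrayList<>(gateways);
    }

    /** Returns the next gateway in rotation that is not down, or null if all are down. */
    synchronized String pick(Set<String> down) {
        for (int i = 0; i < gateways.size(); i++) {
            String candidate = gateways.get(next);
            next = (next + 1) % gateways.size();
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical gateway names, for illustration only.
        RoundRobinSketch rr = new RoundRobinSketch(Arrays.asList("GW1", "GW2", "GW3"));
        Set<String> down = Collections.singleton("GW2");
        for (int i = 0; i < 4; i++) {
            System.out.println(rr.pick(down));
        }
    }
}
```

Requests are spread evenly while every gateway is healthy, and a failed gateway is simply passed over until it returns.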