Oracle Support Master Note for Troubleshooting Managed Distributed Transactions (Doc ID 100664.1)

Master Note for Troubleshooting Oracle Managed Distributed Transactions (Doc ID 100664.1)

Copyright (c) 2010, Oracle Corporation. All Rights Reserved.  

In this Document

What is being announced?
  What do you need to do?
     Oracle Distributed Transactions
     Two-Phase Commit (2PC)
     In-Doubt Transactions
     Recoverer Process (RECO)
     Database Parameters
     Database Views
     Testing Recovery of Failed Distributed Transactions
     Failures in Distributed Transactions
     Steps to Troubleshoot an In-Doubt Transaction
     Purging the Views
     SCN Recovery Steps
  Who to contact for more information?

Applies to:

Oracle Server - Enterprise Edition - Version: to - Release: 9.2 to 11.2
Information in this document applies to any platform.

What is being announced?

This master document discusses Oracle Managed Distributed Transactions; by this, we refer to transactions that span two or more databases and involve Oracle Databases only.

The information contained in this document targets DBAs involved in environments that use Oracle Distributed Transactions. Distributed database concepts and troubleshooting steps are included in the note and, where appropriate, references to additional notes and documentation which provide further information on the relevant topic. You can use this information when investigating and troubleshooting in-doubt transactions; a section on SCN based recovery steps is also included.

Note that within the scope of this document, for short, the term distributed transactions is used to refer to Oracle Managed Distributed Transactions(also known as homogeneous transactions).

The following Oracle Documentation references discuss Oracle Distributed Transactions:
Oracle9i Database Administrator's Guide
Release 2 (9.2)
Part Number A96521-01
Part VI: Distributed Database Management

Oracle10g Database Administrator's Guide
10g Release 2 (10.2)
Part Number B14231-02
Part VII: Distributed Database Management

Oracle11gR1 Database Administrator's Guide
11g Release 1 (11.1)
Part Number B28310-04
Part V: Distributed Database Management

Oracle11gR2 Database Administrator's Guide
11g Release 2 (11.2)
Part Number E17120-05
Part V: Distributed Database Management

What do you need to do?

Oracle Distributed Transactions

A distributed transaction modifies data related to two or more databases, it contains DML statements than span many nodes. For a distributed transaction to be succesful all or none of the database nodes involved in the transaction need to commit or rollback the whole transaction.

Note the difference between a distributed and a remote transaction; a remote transaction contains one or more DML statements that are executed on the SAME remote node, consider the examples below:

- Distributed transaction:
insert into table@remotesite;
insert into mytable: --local table
-Remote transaction:
insert into table@remotesite;

Database links are used to communicate between the databases performing distributed transactions. Location transparency can be achieved via synonyms, database views or PL/SQL procedures.

Oracle Distributed Transactions are Operating System and Database version independent; they can expand databases running different Operating Systems and RDBMS releases as long as their combination is certified by Oracle and supported. For more information, refer to: Note 207303.1 Client / Server / Interoperability Support Between Different Oracle Versions

Global Database Name

The global name of the database is composed of DB_NAME.DB_DOMAIN, where DB_DOMAIN specifies the network domain.

In a distributed environment, it is fundamental that the global name of a database is unique in the network so that each database can be unambiguously identified.

Note: The DB_DOMAIN parameter is only relevant at database creation time; once the database is created, changing this parameter in the database initialization file will not take effect in itself, though it is recommended to include it there for documentation purposes.

The following sql will allow to check the current name of the database and to change it if necessary:
sqlplus> select * from global_name;


sqlplus> select * from global_name;
Refer to Managing Global Names in a Distributed System from the Oracle Database Administrator's Guide for further information on the above.

Two-Phase Commit (2PC)

The two phase-commit mechanism is used to ensure data integrity in a distributed transaction. It is automatically used during all distributed transactions and coordinates either the commit or roll back of all the changes in the transaction as a single, self-contained unit.

In short, there are three phases for the 2PC:
  • PREPARE: The initiating node ask each of its referenced nodes to promise to perform a commit or rollback when told to do so. The preparing node will flush the redo log buffer to the online redo log. It converts locks to in-doubt transaction locks on the data blocks and passes its highest SCN value to the initiating node.
  • COMMIT: The initiating node commits and writes to its redo log the committed SCN. The Data Block locks are released.
  • FORGET:  Pending transactions tables are related database views are cleared (dba_2pc_pending/dba_2pc_neighbors)
All the above phases take place quickly and transparently to the application where the transaction originated.

A crash during the PREPARE Phase results in a ROLLBACK
A crash during the COMMIT Phase results in either COMMIT or ROLLBACK

Node Relations During 2-PC Operations

Each Database involved in a two-phase commit operation performs one or more of the following roles:
  • Client: Database node that requests information from another db.
  • Server: Node which receives a request for information from another database involved in a distributed transaction.
  • Global Coordinator (GC): Node where the distributed transaction originates. This node is responsible for sending messages to the other nodes to prepare and commit (or roll back).
  • Local Coordinator:  Node that needs to access data on other nodes in order to complete its part of the transaction. It is also responsible for coordinating the transaction among the nodes with which it communicates directly.
  • Commit Point Site: Node which commits or rolls back the transaction first, as instructed by the Global Coordinator (GC). This site determines the outcome of the transaction. This node never enters the prepare state. Note: The commit point site is chosen based on the highest COMMIT_POINT_STRENGTH value of all the nodes involved.
  • Session Tree: The communication topology for the Oracle server is fundamentally a tree structured topology. As the statements in a distributed transaction are issued, the nodes and edges of the session tree are defined. There is a certain amount of recursiveness inherent in the tree structure. A session tree is the hierarchical model that describes the relationships between sessions and their roles in the distributed transaction. The nodes which represent the sessions in the session tree have different roles in the 2PC protocol with regard to transaction management.
For more detailed information regarding distributed transactions and Oracle implementation of the 2PC mechanism, refer to Note 13229.1 Distributed Database, Transactions and Two Phase Commit


In-Doubt Transactions

If a distributed transaction fails, it may leave in-doubt transactions on one or more databases.
An in-doubt transaction is normally automatically resolved when the database or network is restored, this is done by the RECO database background process.

An in-doubt distributed transaction occurs when a two-phase commit was interrupted by any type of system or network failure. For example, two databases report to the coordinating database that they were prepared to commit, but the coordinating database instance fails immediately after receiving the messages. The two databases who are prepared to commit are now left awaiting notification out of the outcome.


Recoverer Process (RECO)

RECO is a mandatory background process that, in a distributed database, automatically resolves failures in distributed transactions. The RECO process of a node automatically connects to other databases involved in an in-doubt distributed transaction; it can use an existing connection or establish a new connection to other nodes involved in the failed transaction.. When RECO reestablishes a connection between the databases, it automatically resolves all in-doubt transactions, removing from each database's pending transaction table any rows that correspond to the resolved transactions.

At exponentially growing time intervals, the RECO background process of a node attempts to recover the local portion of an in-doubt distributed transaction.


Disabling and Enabling RECO

You can enable and disable RECO using the ALTER SYSTEM statement with the ENABLE/DISABLE DISTRIBUTED RECOVERY options. For example, you can temporarily disable RECO to force the failure of a two-phase commit and manually resolve the in-doubt transaction.
-To disable RECO:
-To enable RECO and let it automatically resolve indoubt transactions

Database Parameters


GLOBAL_NAMES: (default false): It is strongly recommended to set this parameter to TRUE, to enforce the database name uniqueness in the network.
Setting it to TRUE enforces that any database link has the same name than the global_name of the database that it connects to.

NOTE: changing this parameter from false to true may render already existing database links unusable.

OPEN_LINKS: (default 4) Specifies the maximum number of concurrent open connections to remote databases in one session. These connections include database links, as well as external procedures and cartridges, each of which uses a separate process.. It should at least be the same than the maximum number of databases that any transaction can reference to.

DISTRIBUTED_LOCK_TIMEOUT: (default 60 seconds): Defines the number of seconds that a distributed transaction waits for a lock. If a session waits longer than this for the lock then ORA-2049 is signalled.
note This parameter was hidden in 8i and 9.0 and then made available again in 9.2 onwards.

COMMIT_POINT_STRENGTH (default 1). Range of values is any integer from 0 to 255. Its value determines the commit point site in a distributed transaction. The node in the transaction with the highest value for COMMIT_POINT_STRENGTH will be the commit point site. The commit point strength should be set relative to the amount of critical shared data in the database. If no commit point site is set, Oracle will determine which site becomes the commit point site.

Database Views


DBA_2PC_PENDING Provides Information on distributed transactions awaiting recovery, ie in-doubt transactions. Relevant columns from this view are:
Local Id of the transaction,in the form of n.n.n; where n is a number
Global Transaction ID, unique to all sites
One of the following: Collecting/Prepared/Committed/Forced Commit/Forced Rollback
YES indicates that portions of the transaction have been committed and portions rolled back (forcibly)
C indicates Commit
R indicates Rollback
This field is populated only if the application has issued one of the following statements prior to beginning the distributed transaction:
Commit comment text. This column is populated only if the application has issued a COMMIT with a comment, for example:
  COMMIT COMMENT  'comment detail'
Time that the row has been inserted in the view
If the transaction has been forced the time will be displayed; NULL otherwise
Time that the RECO process last attempted to resolve the transaction
OS USerID of the local user that created the transaction
Terminal from where the local portion of the transaction originated
Name of the machine where the local transaction originated
Database username who originated the distributed transaction

If the transaction is committed this column represents the global commit number


DBA_2PC_NEIGHBORS  Displays information on incoming and outgoing connections for pending transactions within an Oracle distributed transaction. Relevant columns:

Local identifier of the transaction
Connection Type:
  • IN for incoming
  • OUT for outgoing
  • For incoming connections, this is the client db global_name
  • For outgoing connections, this value represents the database link
  • For incoming connections, the Oracle Username
  • For outgoing connections, the owner of the database link
Used to locate the Global Commit Point site:
  • For incoming links, C indicates that this site or one of the descendants on an outgoing link is the commit point site.
  • For outgoing links, C indicates that the destination database DBID is the commit point site.
  • If we are in-doubt, INTERFACE is N and then the top-level database either is the commit point site or can locate the commit point site.
The global name of the remote database
Local session number for the connection at this database. Sessions are numbered consequently, starting by 1
Transaction branch ID of the connection at this database. Branch IDs for incoming connections are two byte hexadecimal numbers; the first byte is the remote parent's session ID, and the second byte is its branch ID.


V$DBLINK Lists all open database links in your session, that is, all database links with the IN_TRANSACTION column set to YES.
GV$DBLINK Lists all open database links in your session along with their corresponding instances. This view is useful in an Oracle Real Application Clusters configuration.

It might be useful to determine which database link connections are currently open in your own session. Note that if you connect as SYSDBA, you cannot query a view to determine all the links open for all sessions; you can only access the link information in the session within which you are working.


V$GLOBAL_TRANSACTION provides information on the currently active distributed transactions.

Testing Recovery of Failed Distributed Transactions

Oracle provides a means to force failure of distributed transactions so that you can test and validate the distributed transaction recovery procedures according to the point of failure.
Issuing a COMMIT with a comment such as ORA-2PC-CRASH-TEST-n, -where n is a number from 1 to 10-.
You can test a variety of scenarios, according to the value of n, as shown below,
1   Crash commit point after collect
2   Crash non-commit-point site after collect
3   Crash before prepare (non-commit-point site)
4   Crash after prepare (non-commit-point site)
5   Crash commit point site before commit
6   Crash commit point site after commit
7   Crash non-commit-point site before commit
8   Crash non-commit-point site after commit
9   Crash commit point site before forget
10 Crash non-commit-point site before forget
For example, the following statement returns the following messages if the local commit point strength is greater than the remote commit point strength and both nodes are updated:

ORA-02054: transaction 1.93.29 in-doubt
ORA-02059: ORA_CRASH_TEST_7 in commit comment

At this point, the in-doubt distributed transaction appears in the DBA_2PC_PENDING view. If enabled, RECO automatically resolves the transaction.

Refer to Note 126069.1 Manually Resolving In-Doubt Transactions: Different Scenarios, for detailed test results regarding each of the situations above.

Failures in Distributed Transactions

Error messages related to distributed transactions fail in the range ORA-02040 - ORA-02099, refer to Oracle11g Database Error Messages for details on those.

Often, abnormal conditions that occur during 2PC are caused by either a network or a server failure whilst the transaction is in the prepare and commit phases.

Some errors that you might see in the alert.log when working in environments that use distributed transactions are shown below:

ORA-02053: transaction <txnId> committed, some remote DBs may be in-doubt
The transaction has been locally committed, however we have lost communication with one or more local coordinators.

ORA-02054: transaction <txnId> in-doubt
The transaction is neither committed or rolled back locally, and we have lost communication with the global coordinator.

ORA-02050: transaction <txnId> rolled back, some remote DBs may be in-doubt
Indicates that a communication error ocurred during the two-phase commit

ORA-01591: lock held by in-doubt distributed transaction <txnId>
Encountering the above error and users/applications unable to proceed with their work. In this case, Oracle automatically rolls back the user attempted transaction and the DBA has now to manually commit or rollback the in-doubt transaction.

NOTE: Reads are blocked because, until the transaction is resolved, Oracle does not assume which version of the data to display for a query user.

- Distributed transactions time out waiting to acquire locks or hold locks themselves for an excessive amount of time. If a distributed transaction cannot obtain a required lock after

DISTRIBUTED_LOCK_TIMEOUT seconds then the following error is returned:
ORA-02049 timeout: distributed transaction waiting for lock

Increasing the value of the distributed_lock_timeout/retry the issued SQL, are typical approaches when this error is encountered. If it becomes a recurrent problem and certain transactions appear to be hanging for ever or causing contention in the database, then further investigation is required, and there is a need determine what other transaction is holding the lock and what type of problem exists.

To further analyze the above, refer to Note 789517.1 ORA-02049: timeout: distributed transaction waiting for lock' Error: How to Obtain a System State Trace BEFORE the Error Occurs, While Still Experiencing the Contention

- Additionally , there are some other rare circumstances in which further action needs to be taken by the DBA in order to resolve the in-doubt transaction as there is underlying metadata corruption. Please refer to the following note for a discussion on how to investigate and overcome those: Note 401302.1 How To Resolve Stranded DBA_2PC_PENDING Entries

Steps to Troubleshoot an In-Doubt Transaction

See the example below of how to troubleshoot a sample in-doubt transaction:


1-  Problem reported:

Users querying a local table in database DB102C encounter the following errors and notify the DBA :
ORA-01591: lock held by in-doubt distributed transaction 10.24.340
In the alert.log :
DISTRIB TRAN DB102D.UK.ORACLE.COM.ea27958e.9.46.338
  is local tran 10.24.340 (hex=0a.18.154)
  insert pending prepared tran, scn=5017243 (hex=0.004c8e9b)
You note down the Local Transaction ID,10.24.340, and query DBA_2PC_PENDING:


select * from dba_2pc_pending where local_tran_id='10.24.340';
LOCAL_TRAN_ID                 : 10.24.340
GLOBAL_TRAN_ID                : DB102D.UK.ORACLE.COM.ea27958e.9.46.338
STATE                         : prepared
MIXED                         : no
ADVICE                        :
TRAN_COMMENT                  : 
FAIL_TIME                     : 02-nov-2010 02:06:02
FORCE_TIME                    :
RETRY_TIME                    : 02-nov-2010 02:06:02
OS_USER                       : NT AUTHORITY\SYSTEM
OS_TERMINAL                   : USER33-lap
HOST                          : USER33-lap
DB_USER                       : TESTUSER
COMMIT#                       : 5017243

From the above output the following can be determined:
  • This node is not the Global Coordinator as the local_tran_id is different than the last portion of the Global_Tran_ID; ie the distributed transaction did not originate at this node.
  • GLOBAL_TRANSACTION_ID represents the common transaction ID that will be the same on the every node for a distributed transaction. It is of the form global_database_name.hhhhhhhh.local_transaction_id where global_database_name is the database name of the global coordinator and hhhhhhhh represents in hexadecimal the internal database identifier of the global coordinator.
  • The transaction in this node is in PREPARED State
  • Look at the ADVICE and TRAN_COMMENT columns for information about this transaction, if any of those columns are populated, they could help you decide whether the local portion of the transaction should be rolled back or committed. (COMMIT COMMENT... or SET TRANSACTION... NAME populates tran_comment column)


We climb the session tree so that we find coordinators, until we eventually reach the global coordinator. Along the way, you might find a coordinator that has resolved the transaction. If not, you can eventually work your way to the commit point site, which will have always have resolved the in-doubt transaction.
To trace the session tree, query DBA_2PC_NEIGHBORS on each node
- On the local node:
select * from dba_2pc_neighbors;
LOCAL_TRAN_ID                 : 10.24.340
IN_OUT                        : in
DATABASE                      : DB102D.UK.ORACLE.COM
DBUSER_OWNER                  : TESTUSER
INTERFACE                     : N
DBID                          : ea27958e
SESS#                         : 1
BRANCH                        : 09002E00520100000104

  • As discussed, DBA_2PC_NEIGHBORS provides information about connections associated with an in-doubt transaction. Information for each connection is different, based on whether the connection is inbound (IN_OUT = in) or outbound (IN_OUT = out).
  • In our example we see that IN_OUT = in, indicating that our db (db102c) is a server of the DB102D client as specified in the database column.
  • DBUSER_OWNER shows the connecting user to our db102c database
  • INTERFACE = N indicates that neither DB102C nor any of its dependent sites is the commit point site.
- On the remote node referenced in the transaction:
select * from dba_2pc_neighbors;
LOCAL_TRAN_ID                 : 9.46.338
IN_OUT                        : in
DATABASE                      :
DBUSER_OWNER                  : TESTUSER
INTERFACE                     : N
DBID                          :
SESS#                         : 1
BRANCH                        : 0000
LOCAL_TRAN_ID                 : 9.46.338
IN_OUT                        : out
DATABASE                      : DB102C.UK.ORACLE.COM
DBUSER_OWNER                  : TESTUSER
INTERFACE                     : N
DBID                          : bae6b90f
SESS#                         : 1
BRANCH                        : 4

For the OUT connection :
  • column DBUSER_OWNER shows the owner of the dblink
  • column DATABASE shows the database link name that has accessed the remote server (db102c).
The local txn ID matches the end portion of the global_transaction_id indicating that this site is the global coordinator, where the transaction originated.

This site is the commit point site and has already committed the local portion of the transaction as seen by the state column:

Conclusion: This node already committed the local transaction and afterwards it crashed leaving the GC still waiting for the 'commit' response from  the commit point site.

4 - Resolve the in-doubt transaction

Given that the commit point site has already committed this transaction, then manually commit it at the DB102C site; to do so the following syntax can be used:
COMMIT FORCE 'DB102D.UK.ORACLE.COM.ea27958e.9.46.338','5017245';


COMMIT FORCE '10.24.340','5017245';

The COMMIT FORCE clause lets you manually commit an in-doubt distributed transaction. The transaction is identified by the 'string' containing its local or global transaction ID. You can use integer to specifically assign the transaction a system change number (SCN). If you omit integer, then the transaction is committed using the current SCN.
In order to ensure global integrity, we use the commit# of the commit point site (highest global commit#)

By specifying the SCN for the transaction when forcing a transaction to commit, the in-doubt transaction is committed with the SCN assigned when it was committed at other nodes.

In this way, we can maintain the synchronized commit time of the distributed transaction even if there is a failure.
We only specify an SCN only when we can determine the SCN of the same transaction already committed at another node.

In cases where the in-doubt transaction is to be ROLLBACK you would use syntax ROLLBACK FORCE <local_txn_ID> to set the state to forced rollback.


Purging the Views

DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY is used to manually purge the details from the views once the in-doubt transaction has been resolved:


In the case of a mixed transaction, where portions of the database have already committed and other portions have been rolled back forced, use

         execute DBMS_TRANSACTION.PURGE_MIXED ('txnID');

Refer to DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY  for further details  regarding purging Pending Rows from the Data Dictionary

ORA-30019: Illegal rollback Segment operation in Automatic Undo mode, use the
following workaround:

-- check the current value of  _smu_debug_mode (default 0):
SQL> show parameter debug   -- if default 0, it will show no entry
-- set it temporarily to 4:
SQL> alter system set "_smu_debug_mode" = 4;  -- in 9.2x alter session can be used instead.
SQL> commit;

SQL> commit;

SQL> alter system set "_smu_debug_mode" = <original value>;

SQL> commit;

SCN Recovery Steps


Complete Recovery:

1. At the down site recover completely if possible (treat as regular recovery).
2. SCN will appear in the alert.log after recover is done on the crashed Node.


Incomplete Recovery:

1. If time-based or cancel-based recovery was used on the crashed node, other sites must be placed back to the same point in time for global consistency. Get last SCN from the alert.log of the crashed node.

2. At each node perform a shutdown normal or immediate.

3. Take a cold backup.

4. Restore the control file if necessary.

5. Restore the last backup of all datafiles along with archived redo logs.

6. Choose which tool to use to perform SCN recovery - either SQLPLUS or RMAN.

7. Connect as sysdba, startup mount, check status from v$datafile to make sure all datafiles are online. Issue 'Alter database datafile '?/?/?' online;', for each datafile with status offline, to bring it online.

8. Issue the following command using the latest SCN from alert.log on the Node that had to be recovered:
NOTE: If for some reason (e.g.when issuing commit force command) automatic recovery (RECO process) needs to be disabled, we can disable it and reenable via alter system as mentioned before:
Use this command to wake RECO up after that:

Who to contact for more information?

For further diagnosis and help troubleshooting distributed transactions, contact Oracle Support or login to Streams and Distributed Database Community to share your experiences with fellow peers.



NOTE:1012842.102 - ORA-2019 ORA-2058 ORA-2068 ORA-2050: Failed Distributed Transactions
NOTE:126069.1 - Manually Resolving In-Doubt Transactions: Different Scenarios
NOTE:13229.1 - Distributed Database, Transactions and Two Phase Commit
NOTE:159377.1 - How to Purge a Distributed Transaction from a Database
NOTE:207303.1 - Client / Server / Interoperability Support Between Different Oracle Versions
NOTE:401302.1 - How To Resolve Stranded DBA_2PC_PENDING Entries
Oracle9i Database Administrator's Guide - Part VI: Distributed Database Management
Oracle10g Database Administrator's Guide - Part VII: Distributed Database Management
Oracle11g Database Administrator's GuidePart V: Distributed Database Management

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.