In my last post on MicroTx transaction protocols, I explained how the Oracle Transaction Manager for Microservices (MicroTx) can help applications adopt distributed transactions using Sagas with an eventual consistency protocol defined by the Eclipse MicroProfile Long Running Actions. In this post I’ll focus on how microservice-based applications can also implement Sagas using the Try-Confirm/Cancel (TCC) protocol.
The Try-Confirm/Cancel protocol is a simple protocol in which an initiator asks other microservices to reserve resources or place them in escrow. Once the initiator and all participants have acquired the required reservations, the initiator asks a transaction coordinator to confirm all those reservations. Should the initiator decide it doesn’t want or can’t use the reservations, it instead asks the transaction coordinator to cancel all of them. What constitutes a reservation is completely up to the application; however, the application is expected to follow a few conventions.
Reservations are made using an HTTP POST request that is expected to return a URI representing the reservation. The MicroTx client libraries ensure this URI is propagated up the call stack by placing the information in MicroTx specific headers. The URIs are expected to respond to HTTP PUT requests to confirm the reservation, and HTTP DELETE requests to cancel the reservation.
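To make the request conventions concrete, here is a minimal in-memory sketch of the three handlers a participant exposes. It is not MicroTx code and makes no network calls: the class name, the /reservations URI prefix, and the status codes (201, 204, 404, 409) are all assumptions chosen for illustration.

```python
import itertools


class ReservationResource:
    """In-memory stand-in for the HTTP handlers of a TCC participant."""

    def __init__(self, base_uri="/reservations"):  # hypothetical URI prefix
        self.base_uri = base_uri
        self._ids = itertools.count(1)
        self.reservations = {}  # reservation URI -> "RESERVED" or "CONFIRMED"

    def post(self):
        """Try: create a reservation and return (status, reservation URI)."""
        uri = f"{self.base_uri}/{next(self._ids)}"
        self.reservations[uri] = "RESERVED"
        return 201, uri

    def put(self, uri):
        """Confirm: make the reservation at this URI permanent."""
        if uri not in self.reservations:
            return 404  # unknown or already cancelled reservation
        self.reservations[uri] = "CONFIRMED"
        return 204

    def delete(self, uri):
        """Cancel: release the reservation at this URI."""
        if self.reservations.get(uri) == "CONFIRMED":
            return 409  # assumed policy: a confirmed reservation stays confirmed
        self.reservations.pop(uri, None)  # cancelling twice is harmless
        return 204
```

Making Confirm and Cancel safe to repeat is a deliberate choice in this sketch, since a coordinator may retry a request whose response was lost.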
The basic flow is as follows:
During the Try phase, the MicroTx library collects all accepted reservations in headers, including reservations made indirectly by microservices that the initiator calls. By the time the initiator (Microservice A in this flow) has finished making reservations with Microservice B and Microservice C, the library holds the complete set of reservations. At this point the initiator can Confirm the reservations, Cancel them, or ignore them and let timeouts eventually cancel them.
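The collection step can be pictured with a small simulation. In this sketch a plain Python list plays the role of the propagated headers, and the Participant and Coordinator classes are hypothetical stand-ins rather than MicroTx APIs. It shows one possible arrangement, in which Microservice B makes its own reservation and also reserves with Microservice C on the initiator's behalf.

```python
class Participant:
    """Stand-in for a microservice that hands out reservation URIs."""

    def __init__(self, name):
        self.name = name
        self.states = {}  # reservation URI -> state

    def try_reserve(self):
        uri = f"/{self.name}/reservations/{len(self.states) + 1}"
        self.states[uri] = "RESERVED"
        return uri

    def confirm(self, uri):
        self.states[uri] = "CONFIRMED"

    def cancel(self, uri):
        self.states[uri] = "CANCELLED"


class Coordinator:
    """Stand-in for the transaction coordinator."""

    def __init__(self, participants):
        self._by_name = {p.name: p for p in participants}

    def _owner(self, uri):
        # "/b/reservations/1" -> participant named "b"
        return self._by_name[uri.split("/")[1]]

    def confirm_all(self, uris):
        for uri in uris:
            self._owner(uri).confirm(uri)

    def cancel_all(self, uris):
        for uri in uris:
            self._owner(uri).cancel(uri)


def call_b(b, c, collected):
    """B reserves locally and also calls C; both URIs are collected."""
    collected.append(b.try_reserve())
    collected.append(c.try_reserve())  # indirect reservation


b, c = Participant("b"), Participant("c")
coordinator = Coordinator([b, c])

collected = []            # plays the role of the propagated headers
call_b(b, c, collected)   # Try phase, driven by initiator A
coordinator.confirm_all(collected)  # A decides to Confirm everything
```

The initiator never needs to know that C was involved; it only hands the collected URIs to the coordinator.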
The initiator then either Confirms all the reservations or Cancels all of them. How a reservation is represented is completely up to the application. Let’s look at a simple microservice that allows reserving and purchasing a seat for a performance. A seat has a state that is one of AVAILABLE, RESERVED, or SOLD. The Try phase changes the state of the seat from AVAILABLE to RESERVED. The Confirm phase changes the state from RESERVED to SOLD, assuming payment is successfully made. The Cancel phase changes the state from RESERVED back to AVAILABLE. To prevent the Confirm step from failing because a payment cannot be completed, the microservice should obtain payment authorization during the Try phase to ensure the payment can be made.
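The seat example maps naturally onto a small state machine. The sketch below is illustrative only: the Seat class and the authorize_payment and capture_payment callbacks are hypothetical names, and a real service would persist the state in a database rather than in memory.

```python
from enum import Enum


class SeatState(Enum):
    AVAILABLE = "AVAILABLE"
    RESERVED = "RESERVED"
    SOLD = "SOLD"


class Seat:
    def __init__(self):
        self.state = SeatState.AVAILABLE
        self.authorization = None  # payment authorization held while RESERVED

    def try_reserve(self, authorize_payment):
        """Try: AVAILABLE -> RESERVED, only if payment can be authorized."""
        if self.state is not SeatState.AVAILABLE:
            return False
        authorization = authorize_payment()  # hypothetical; None means declined
        if authorization is None:
            return False
        self.authorization = authorization
        self.state = SeatState.RESERVED
        return True

    def confirm(self, capture_payment):
        """Confirm: RESERVED -> SOLD, capturing the authorized payment."""
        if self.state is not SeatState.RESERVED:
            raise ValueError(f"cannot confirm a seat that is {self.state.value}")
        capture_payment(self.authorization)
        self.authorization = None
        self.state = SeatState.SOLD

    def cancel(self):
        """Cancel: RESERVED -> AVAILABLE, dropping the authorization."""
        if self.state is SeatState.RESERVED:
            self.authorization = None
            self.state = SeatState.AVAILABLE
```

Obtaining the authorization inside try_reserve is what moves the likely point of failure out of the Confirm phase and into the Try phase, where the initiator can still back out cleanly.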
Another example might be to reserve a certain quantity of something such as items in inventory, or funds from an account. In this case during the Try phase the application might deduct the reserved quantity from the available quantity and add a record of the reservation to the database. During the Confirm phase, the reservation record would simply be deleted. During the Cancel phase, the amount in the reservation record would be added back into the total inventory and the reservation record deleted.
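The quantity-based model can be sketched in a few lines. The Inventory class below is a hypothetical in-memory illustration; in a real microservice the change to the available quantity and the reservation record would be written together in a single local database transaction.

```python
class Inventory:
    def __init__(self, available):
        self.available = available
        self.reservations = {}  # reservation id -> reserved quantity
        self._next_id = 1

    def try_reserve(self, quantity):
        """Try: deduct from the available quantity and record the reservation."""
        if quantity <= 0 or quantity > self.available:
            raise ValueError("cannot reserve the requested quantity")
        self.available -= quantity
        reservation_id = self._next_id
        self._next_id += 1
        self.reservations[reservation_id] = quantity
        return reservation_id

    def confirm(self, reservation_id):
        """Confirm: the quantity is already deducted, so drop the record."""
        del self.reservations[reservation_id]

    def cancel(self, reservation_id):
        """Cancel: return the reserved quantity and drop the record."""
        self.available += self.reservations.pop(reservation_id)
```

Because the quantity is deducted up front, other callers only ever see stock that is genuinely free, whether or not outstanding reservations are later confirmed.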
In many ways TCC is very similar to LRA, although far simpler. The Try phase is much like the initial LRA methods that enlist themselves in the LRA. The Confirm phase is essentially the Complete phase in LRA, while the Cancel phase is much like the Compensate phase. The semantics of Try, Confirm, and Cancel are completely up to the application. LRA is much more general in that the initial LRA methods are basically free to do anything and return anything, and aren’t limited to a reservation model. In addition, LRA participants can start a nested LRA and be notified of the outcome of the transaction, neither of which is possible with TCC.
TCC fits very well for microservices that need to place something in reserve using a very simple model. All that’s required of a microservice using TCC is to:
1. Add a call to the MicroTx client library to indicate the URI representing the reserved resource.
2. Optionally, although strongly recommended, specify a timeout for the reservation.
3. Respond to PUT requests on the URI to confirm the reservation and DELETE requests on the URI to cancel the reservation.
One of the major advantages of TCC over LRA is that LRAs give up isolation allowing potentially dirty reads, while TCC doesn’t suffer from potential dirty reads. In effect, with TCC the state of the microservices involved is always consistent.
Another reason to use TCC is to avoid locks that are held for the entire duration of an XA distributed transaction. Lock contention can limit the throughput and increase the latency of applications using XA if attention isn’t paid to avoiding hotspots in the involved XA resources. TCC potentially allows more concurrency than XA can provide, because TCC uses only local transactions that last for the duration of each microservice call.
As with virtually all distributed transaction protocols, there is the possibility of heuristic outcomes. These are outcomes where only some of the updates performed during the transaction get committed, leading to inconsistencies. How costly these inconsistencies are to an application is very application dependent. If the cost is high, you’ll want to choose a protocol and application logic that minimizes the likelihood of a heuristic outcome.
In XA, the purpose of the prepare phase is to avoid heuristic outcomes. When a resource manager in an XA transaction indicates it has successfully prepared, it is indicating it will be able to commit the transaction barring some catastrophic failure. TCC and LRA don’t have this rigid requirement, so it is up to the microservice to ensure it can successfully Confirm/Complete or Cancel/Compensate its participation when asked.
Timeouts are another source of heuristic outcomes. In TCC, each reservation typically has a time limit associated with it. When that time limit is reached, the microservice can unilaterally cancel its reservation. If that happens during the Confirm phase of a TCC transaction, that microservice will be unable to Confirm its reservation while other microservices in the transaction Confirm successfully, yielding a heuristic outcome.
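A small simulation shows how an expired reservation produces a mixed result. Everything here is an assumption made for illustration: the TimedReservation class, the "HEURISTIC" label, and the use of plain numbers for time instead of a real clock.

```python
class TimedReservation:
    def __init__(self, name, created_at, time_limit):
        self.name = name
        self.expires_at = created_at + time_limit
        self.state = "RESERVED"

    def expire_if_due(self, now):
        """Unilaterally cancel once the time limit has been reached."""
        if self.state == "RESERVED" and now >= self.expires_at:
            self.state = "CANCELLED"

    def confirm(self, now):
        """Return True if the reservation is confirmed after this call."""
        self.expire_if_due(now)
        if self.state == "RESERVED":
            self.state = "CONFIRMED"
        return self.state == "CONFIRMED"


def confirm_all(reservations, now):
    """Ask every participant to confirm and classify the overall outcome."""
    results = [r.confirm(now) for r in reservations]
    if all(results):
        return "CONFIRMED"
    if not any(results):
        return "CANCELLED"
    return "HEURISTIC"  # some confirmed, some had already timed out


b = TimedReservation("b", created_at=0, time_limit=30)
c = TimedReservation("c", created_at=0, time_limit=5)
outcome = confirm_all([b, c], now=10)  # C expired at t=5, B is still valid
```

Here B confirms while C has already cancelled itself, so the outcome is mixed. Choosing time limits comfortably longer than the expected Try and Confirm duration reduces the likelihood of this case but cannot rule it out.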
As this series of blog posts illustrates, there is no one size fits all when it comes to data consistency. Which transaction protocol to use should be based upon the business requirements for consistency across microservices. For a large application, it is likely multiple transaction protocols will be needed. Each protocol has advantages and disadvantages when it comes to ease of use and implementation, level of isolation required, and performance. For more information on MicroTx and how you can leverage it to solve your data consistency issues, see the product landing page.
I'm currently the Chief Architect for a family of transaction processing products at Oracle including Oracle Tuxedo product family, Oracle Blockchain Platform, and the new Oracle Transaction Manager for Microservices. My main areas of focus are on security, privacy, confidentiality, performance, and scalability. My job is to provide the technical strategy for these products to ensure they meet customer requirements.
Before BEA Systems, Inc. was acquired by Oracle, I was Chief Architect for BEA Tuxedo. While at BEA Systems, I was responsible for defining the technical strategy and direction for the Tuxedo product family. I developed the Tuxedo Control for WebLogic Workshop, which greatly simplified the use of Tuxedo services from Workshop-based applications. I also received two patents for methods allowing design patterns in a UML modeling tool to control the generation of software artifacts.
During my more than 40 years of software architecture and development experience, I have worked on a wide range of software systems and technology. At Science Applications International I worked on microcoded plasma display systems and command, control, and communication systems for naval applications. As a senior software consultant at Digital Equipment Corporation, I was the New York Area Regional Tools Consultant and also helped develop a multi-language multi-threaded distributed object oriented runtime environment with concurrent garbage collection.