More on TCP Timeouts in CORBA
By kcavanaugh on Oct 23, 2007
I previously blogged about how we configure TCP timeouts in the GlassFish ORB. Scott Oaks recently discovered some cases where the default configuration needs to be changed. He also pointed out that the earlier entry was missing some details about exactly HOW to set the appropriate TCP timeouts.
My previous blog entry referred to several properties that are defined in the class com.sun.corba.ee.impl.orbutil.ORBConstants. In particular:
- TRANSPORT_TCP_TIMEOUTS_PROPERTY is com.sun.corba.ee.transport.ORBTCPTimeouts
- TRANSPORT_TCP_CONNECT_TIMEOUTS_PROPERTY is com.sun.corba.ee.transport.ORBTCPConnectTimeouts
- WAIT_FOR_RESPONSE_TIMEOUT is com.sun.corba.ee.transport.ORBWaitForResponseTimeout
Any of these can be set with the corresponding -D option on the JVM command line, for example -Dcom.sun.corba.ee.transport.ORBTCPTimeouts=500:30000:20.
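A quick way to confirm that a -D setting actually reached the JVM is to read it back as a system property. The sketch below is only a sanity check under that assumption: the class name OrbTimeoutProps and its configured() helper are hypothetical, and it does not touch the ORB itself. It simply reports what the three property names listed above are set to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OrbTimeoutProps {
    // Property names as listed in this post (values of the ORBConstants fields).
    static final String[] NAMES = {
        "com.sun.corba.ee.transport.ORBTCPTimeouts",
        "com.sun.corba.ee.transport.ORBTCPConnectTimeouts",
        "com.sun.corba.ee.transport.ORBWaitForResponseTimeout"
    };

    /** Returns each property name mapped to its -D value, or a marker if it is unset. */
    static Map<String, String> configured() {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : NAMES) {
            result.put(name, System.getProperty(name, "(not set)"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Run with e.g. -Dcom.sun.corba.ee.transport.ORBTCPTimeouts=500:30000:20
        configured().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running this with the same -D options as the server or client JVM shows whether the values were passed through as intended.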
This is particularly important when running a very busy app server on a large machine (like a T2000). The default 6-second timeout may be exceeded while the ORB is waiting for more data to be read on a large request. In this case, you may see errors logged like:
java.rmi.MarshalException: CORBA COMM_FAILURE 1398079696 Maybe; nested exception is: org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 208 completed: Maybe
java.rmi.MarshalException: CORBA MARSHAL 1398079699 Maybe; nested exception is org.omg.CORBA.MARSHAL: vmcid: SUN minor code: 211 completed: Maybe
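The large number in each message is the full CORBA minor code, and the "minor code: 208" and "minor code: 211" values are the same number with the vendor prefix removed. As a minimal sketch, assuming the usual CORBA convention that the high 20 bits hold the vendor minor code ID (VMCID) and the low 12 bits hold the vendor-specific code, and assuming 0x53550000 as the SUN VMCID (the arithmetic is consistent with the log lines above), the decoding looks like this. The class name MinorCodeDecoder is hypothetical.

```java
public class MinorCodeDecoder {
    // High 20 bits identify the vendor; low 12 bits are the vendor-specific minor code.
    static final int VMCID_MASK = 0xFFFFF000;
    // Assumed SUN VMCID: ASCII "SU" (0x53, 0x55) in the top two bytes.
    static final int SUN_VMCID = 0x53550000;

    static int vmcid(int fullCode) {
        return fullCode & VMCID_MASK;
    }

    static int minor(int fullCode) {
        return fullCode & ~VMCID_MASK;
    }

    public static void main(String[] args) {
        int[] codes = {1398079696, 1398079699};
        for (int code : codes) {
            System.out.printf("%d -> vmcid 0x%08X, minor %d%n", code, vmcid(code), minor(code));
        }
        // prints:
        // 1398079696 -> vmcid 0x53550000, minor 208
        // 1398079699 -> vmcid 0x53550000, minor 211
    }
}
```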
Errors with a completion status of Maybe (COMPLETED_MAYBE) cannot safely be retried, because the client ORB cannot tell whether the request has already executed on the server side.
In this case, the ORBTCPTimeouts value needs to be increased, say to something like 500:30000:20. The three colon-separated fields mean:
- 500: the first timeout is 0.5 seconds
- 30000: the maximum time we will wait is 30 seconds (actually, due to some implementation details, the maximum is closer to double the configured value, or 60 seconds in this case)
- 20: each subsequent retry increases the timeout by 20%
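To get a feel for what a setting like 500:30000:20 implies, the following sketch computes a sequence of waits from such a value. It is a simplified model, not the ORB's actual code: it assumes the three-field form used in this post, grows each wait by the given percentage, and stops once the accumulated time reaches the configured maximum. It does not reproduce the implementation detail that lets the real maximum approach double the configured value. The class name TcpTimeoutSchedule and its waits() method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TcpTimeoutSchedule {
    /**
     * Parses "initial:max:backoff" (milliseconds, milliseconds, percent)
     * and returns the successive waits of the simplified model.
     */
    static List<Long> waits(String spec) {
        String[] parts = spec.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected initial:max:backoff, got " + spec);
        }
        long initial = Long.parseLong(parts[0]);
        long max = Long.parseLong(parts[1]);
        long backoffPercent = Long.parseLong(parts[2]);
        if (initial <= 0 || max <= 0 || backoffPercent < 0) {
            throw new IllegalArgumentException("initial and max must be positive, backoff non-negative");
        }

        List<Long> result = new ArrayList<>();
        long wait = initial;
        long total = 0;
        // Keep waiting until the accumulated time reaches the configured maximum.
        while (total < max) {
            result.add(wait);
            total += wait;
            wait = wait * (100 + backoffPercent) / 100; // e.g. 500 -> 600 -> 720 ...
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> schedule = waits("500:30000:20");
        long total = 0;
        for (long w : schedule) {
            total += w;
            System.out.println("wait " + w + " ms (cumulative " + total + " ms)");
        }
    }
}
```

Comparing the output for the default and the increased setting gives a rough idea of how many read attempts fit inside the overall limit before a timeout error is raised.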