Solaris TCP Window Update

When people check out the TCP source code in OpenSolaris, they may find that some pieces of the code do not follow exactly what is specified in the various RFCs. Here is one example, and the reason why Solaris deviates from the RFCs in this case.

On page 72 of RFC 793, the criterion for updating the TCP send window is specified as follows:

    If SND.UNA < SEG.ACK =< SND.NXT, the send window should be
    updated. If (SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ and
    SND.WL2 =< SEG.ACK)), set SND.WND <- SEG.WND, set
    SND.WL1 <- SEG.SEQ, and set SND.WL2 <- SEG.ACK.

    Note that SND.WND is an offset from SND.UNA, that SND.WL1
    records the sequence number of the last segment used to update
    SND.WND, and that SND.WL2 records the acknowledgment number of
    the last segment used to update SND.WND. The check here
    prevents using old segments to update the window.
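The RFC 793 rule can be sketched in C roughly as below. This is an illustration under stated assumptions, not Solaris or RFC code: the struct snd_state type, its field names, the function name and the SEQ_LT/SEQ_LEQ macros are all hypothetical, and only the window-update step is modelled (advancing SND.UNA is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical serial-number comparisons on 32-bit sequence numbers.
 * The unsigned difference is reinterpreted as signed, which is how
 * mainstream two's-complement compilers behave.
 */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* Hypothetical sender state named after the RFC 793 variables. */
struct snd_state {
    uint32_t una;  /* SND.UNA */
    uint32_t nxt;  /* SND.NXT */
    uint32_t wnd;  /* SND.WND, an offset from SND.UNA */
    uint32_t wl1;  /* SND.WL1: SEG.SEQ of the last window update */
    uint32_t wl2;  /* SND.WL2: SEG.ACK of the last window update */
};

/*
 * RFC 793 window update. Returns true if SEG.WND was adopted.
 * Only the window step is modelled; SND.UNA is not advanced here.
 */
static bool rfc793_window_update(struct snd_state *s, uint32_t seg_seq,
                                 uint32_t seg_ack, uint32_t seg_wnd)
{
    assert(SEQ_LEQ(s->una, s->nxt));    /* sender invariant */

    /* SND.UNA < SEG.ACK =< SND.NXT */
    if (!(SEQ_LT(s->una, seg_ack) && SEQ_LEQ(seg_ack, s->nxt)))
        return false;

    /* SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ and SND.WL2 =< SEG.ACK) */
    if (SEQ_LT(s->wl1, seg_seq) ||
        (s->wl1 == seg_seq && SEQ_LEQ(s->wl2, seg_ack))) {
        s->wnd = seg_wnd;
        s->wl1 = seg_seq;
        s->wl2 = seg_ack;
        return true;
    }
    return false;                       /* older segment: window ignored */
}
```

Note that a segment whose SEG.SEQ is older than SND.WL1 is rejected by this check even when its ACK field acknowledges new data.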

On page 94 of RFC 1122, the first condition above is corrected to:

    Similarly, the window should be updated if: SND.UNA =<
    SEG.ACK =< SND.NXT.
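The correction matters for segments that carry only a window update. A pure window update, such as a receiver reopening a zero window without new data to acknowledge, carries SEG.ACK = SND.UNA, so the strict RFC 793 range test never lets it through. A minimal sketch of both range tests follows; the function names and the SEQ_LT/SEQ_LEQ macros are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical serial-number comparisons on 32-bit sequence numbers. */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* RFC 793: consider a window update only if SND.UNA < SEG.ACK =< SND.NXT */
static bool ack_range_rfc793(uint32_t snd_una, uint32_t seg_ack,
                             uint32_t snd_nxt)
{
    assert(SEQ_LEQ(snd_una, snd_nxt));  /* sender invariant */
    return SEQ_LT(snd_una, seg_ack) && SEQ_LEQ(seg_ack, snd_nxt);
}

/* RFC 1122 correction: SND.UNA =< SEG.ACK =< SND.NXT */
static bool ack_range_rfc1122(uint32_t snd_una, uint32_t seg_ack,
                              uint32_t snd_nxt)
{
    assert(SEQ_LEQ(snd_una, snd_nxt));  /* sender invariant */
    return SEQ_LEQ(snd_una, seg_ack) && SEQ_LEQ(seg_ack, snd_nxt);
}
```

With nothing outstanding (SND.UNA = SND.NXT = 5000) and a window update carrying SEG.ACK = 5000, the RFC 793 test rejects the segment and the RFC 1122 test accepts it.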

In Solaris, we use a different check.  See the following piece of code in usr/src/uts/common/inet/tcp/tcp.c

swnd_update:
    /*
     * The following check is different from most other implementations.
     * For bi-directional transfer, when segments are dropped, the
     * "normal" check will not accept a window update in those
     * retransmitted segemnts. Failing to do that, TCP may send out
     * segments which are outside receiver's window. As TCP accepts
     * the ack in those retransmitted segments, if the window update in
     * the same segment is not accepted, TCP will incorrectly calculates
     * that it can send more segments. This can create a deadlock
     * with the receiver if its window becomes zero.
     */
    if (SEQ_LT(tcp->tcp_swl2, seg_ack) ||
        SEQ_LT(tcp->tcp_swl1, seg_seq) ||
        (tcp->tcp_swl1 == seg_seq && new_swnd > tcp->tcp_swnd)) {
        /*
         * The criteria for update is:
         *
         * 1. the segment acknowledges some data. Or
         * 2. the segment is new, i.e. it has a higher seq num. Or
         * 3. the segment is not old and the advertised window is
         * larger than the previous advertised window.
         */
        /* ... */
    }
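The excerpt compares sequence numbers with SEQ_LT rather than a plain < because TCP sequence numbers wrap modulo 2^32. Below is a minimal stand-in assuming the usual serial-number definition; it is our own macro, not copied from the Solaris headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for SEQ_LT: true when b is ahead of a by
 * 1 to 2^31 in 32-bit sequence space, even if the counter wrapped
 * in between.
 */
#define SEQ_LT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
```

For example, SEQ_LT(0xFFFFFFF0, 0x10) is true because 0x10 lies 0x20 bytes after the wrap, while the plain unsigned comparison 0xFFFFFFF0u < 0x10u is false.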

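The check

SND.WL1 = SEG.SEQ and SND.WL2 =< SEG.ACK

is modified to be

SND.WL2 < SEG.ACK

so a segment that acknowledges new data can update the window even when its sequence number is older than SND.WL1.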
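The effect of the change on a retransmitted segment can be shown by evaluating the two guards side by side. This is a sketch under assumptions: struct last_update, both function names and the SEQ_LT/SEQ_LEQ macros are hypothetical, and only the "is this segment too old" part of each check is modelled (the ACK range test is assumed to have passed already).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical serial-number comparisons on 32-bit sequence numbers. */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* Hypothetical record of the segment that last updated the window. */
struct last_update {
    uint32_t wl1;   /* SND.WL1 (tcp_swl1 in tcp.c) */
    uint32_t wl2;   /* SND.WL2 (tcp_swl2 in tcp.c) */
    uint32_t wnd;   /* SND.WND (tcp_swnd in tcp.c) */
};

/* RFC 793: SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ and SND.WL2 =< SEG.ACK) */
static bool rfc793_takes_window(const struct last_update *u,
                                uint32_t seg_seq, uint32_t seg_ack)
{
    return SEQ_LT(u->wl1, seg_seq) ||
           (u->wl1 == seg_seq && SEQ_LEQ(u->wl2, seg_ack));
}

/* The three criteria from the tcp.c excerpt. */
static bool solaris_takes_window(const struct last_update *u,
                                 uint32_t seg_seq, uint32_t seg_ack,
                                 uint32_t seg_wnd)
{
    return SEQ_LT(u->wl2, seg_ack) ||                /* acks past SND.WL2 */
           SEQ_LT(u->wl1, seg_seq) ||                /* newer segment */
           (u->wl1 == seg_seq && seg_wnd > u->wnd);  /* larger window */
}
```

With SND.WL1 = 3000, SND.WL2 = 10000 and a retransmitted segment carrying SEG.SEQ = 2000, SEG.ACK = 12000 and a zero window, rfc793_takes_window returns false while solaris_takes_window returns true.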

Without this change, a zero window combined with segment drops can deadlock TCP. According to the RFCs, TCP does not use the window update in an out-of-order segment (a segment retransmitted after a drop is out of order), yet it still processes the ACK field in that segment. This can cause a sender A to send more data than the other side's (B's) receive window allows: the ACK field moves the left edge of the window forward, but because the window update in the same segment (a zero window) is ignored, A keeps using the old, larger send window. From A's perspective, the whole send window slides forward. B drops the segments that fall outside its window. Worse, once A has sent beyond B's receive window, B also drops every ACK from A, because those ACKs are out of window too (an ACK segment from A carries A's latest sequence number). In a bi-directional transfer, B therefore keeps retransmitting its data, since none of A's ACKs are acceptable, and the connection hangs. Note that this is not a problem in a uni-directional transfer.
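A's side of this can be sketched by computing how much A believes it may still send, SND.UNA + SND.WND - SND.NXT, after a retransmitted segment from B that acknowledges everything but advertises a zero window. The struct sender type, the function names and the numbers below are hypothetical; the window_accepted flag stands in for whichever window check is in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical serial-number comparisons on 32-bit sequence numbers. */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/* Hypothetical view of host A's send side. */
struct sender {
    uint32_t una;   /* SND.UNA: oldest unacknowledged byte */
    uint32_t nxt;   /* SND.NXT: next byte to send */
    uint32_t wnd;   /* SND.WND: send window, offset from SND.UNA */
};

/* Process the ACK field; adopt SEG.WND only if the window check passed. */
static void apply_segment(struct sender *a, uint32_t seg_ack,
                          uint32_t seg_wnd, bool window_accepted)
{
    if (SEQ_LT(a->una, seg_ack) && SEQ_LEQ(seg_ack, a->nxt))
        a->una = seg_ack;           /* left edge moves forward */
    if (window_accepted)
        a->wnd = seg_wnd;           /* otherwise the old, larger window stays */
}

/* Bytes A believes it may still send: SND.UNA + SND.WND - SND.NXT, or 0. */
static uint32_t usable_window(const struct sender *a)
{
    uint32_t right_edge = a->una + a->wnd;   /* wraps modulo 2^32 */

    assert(SEQ_LEQ(a->una, a->nxt));         /* sender invariant */
    return SEQ_LT(a->nxt, right_edge) ? right_edge - a->nxt : 0;
}
```

For example, starting from SND.UNA = 1000, SND.NXT = 5000 and SND.WND = 4000, applying SEG.ACK = 5000 with SEG.WND = 0 leaves a usable window of 4000 bytes when the window is ignored, all of it beyond B's right edge at 5000, and 0 when the window is taken.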

If a segment, even an out-of-order one, passes the normal TCP acceptance test and its ACK field acknowledges new data, then the window update in that segment must also be used. The window update and the ACK field are really tied together: one cannot use the ACK field without also using the window update. This issue was discussed on the now-closed tcp-impl mailing list several years ago, but AFAIK there is no write-up on it, so there may still be implementations that have this problem with bi-directional transfers.


Comments:

Thanks for the detailed explanation :-) BTW, there's a typo in the quoted part, and it is not a copy/paste error introduced when the entry was edited for the blog post. The same typo can be seen in line 14772 of tcp.c on cvs.opensolaris.org:
- 14772 	 * retransmitted segemnts.  Failing to do that, TCP may send out
+ 14772 	 * retransmitted segments.  Failing to do that, TCP may send out

Posted by Giorgos Keramidas on June 16, 2005 at 01:58 PM HKT

We have a Java issue that seems to correspond to this description.

We run a J2ME (CDC pprof1.1) project on a Motorola device; the JVM is IBM's J9.

The client app on the Motorola device communicates with the server using TCP/IP sockets, through which we transmit and receive XML. The socket is kept open continuously.

At apparently random intervals, the TCP/IP communication seems to fail: the socket connection is still active (no reset), but the client and server simply stop 'talking'.

Any ideas? The project is completely wrecked by this behaviour.

Posted by gordon wilcox on June 12, 2009 at 10:26 AM HKT

About

kcpoon
