4.3BSD TCP fixes


jbn@FORD-WDL1
13-Jan-86 14:22:13-PST


      In response to popular demand, I am sending out two fixes to 4.3BSD
(beta release). Fix #1 affects interoperability with non-4.xBSD systems,
apparently including TOPS-20 machines. Fix #2 reduces network congestion
on long-haul nets. (Yes, yet another of Nagle's continuing attempts to
get network congestion under control.) The effect of #2 is substantial;
in some situations, an order of magnitude improvement in file transfer
speeds will be observed.
      With these in, 4.3BSD TCP behaves quite well. In 4.3, all the right
machinery is there, but there are a few easily-fixed bugs.
      These fixes are going out via several routes (net.bugs.4bsd, the
Berkeley buglist, and to some key individuals) because they have a marked
effect on interoperability and Internet performance.

                                John Nagle
===============================================================================
Index: sys/netinet/tcp_input.c 4.3BSD-beta Fix

Description:
        TCP connections to some non-BSD systems open, but will not
        accept data from the remote system.

        The "advertised window", rcv_adv, was not initialized during
        connection synchronization. Also, one comparison on sequence
        numbers was made incorrectly, using a difference of unsigned
        values, which in C is never negative(!).

                                        John Nagle

Repeat-By:
        Try to establish a TCP connection with a system which sets
        the high bit in the TCP sequence number. (A 4.3BSD system
        which has been up for more than 195 days will do this, or
        you can change the initial value of tcp_iss to some value
        with the high bit set.)

Fix:
tcp_input.c
327a328,329
> * Be careful with arithmetic here; differences of sequence
> * numbers compare in unexpected ways. Hence the (int) cast.
329c331
<	tp->rcv_wnd = MAX(sbspace(&so->so_rcv), tp->rcv_adv - tp->rcv_nxt);
---
>	tp->rcv_wnd = MAX(sbspace(&so->so_rcv),(int)(tp->rcv_adv-tp->rcv_nxt));
tcp_seq.h:
22a23
>  * Note that our rcv_adv variable needs to be	initialized too.
25c26
<	(tp)->rcv_nxt =	(tp)->irs + 1
---
>	(tp)->rcv_adv =	(tp)->rcv_nxt =	(tp)->irs + 1
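
The effect of the (int) cast can be seen in isolation with a small
user-level sketch. This is an illustration only, not kernel code: seq_t,
wnd_unsigned and wnd_signed are hypothetical names, 32-bit fixed-width
types stand in for the kernel's tcp_seq, and the conversion to int32_t
assumes the usual two's-complement wraparound. It does not try to
reproduce the exact failure seen in the kernel.

```c
#include <stdint.h>

typedef uint32_t seq_t;   /* hypothetical stand-in for the kernel's tcp_seq */

/*
 * Pre-fix style: the difference stays unsigned, so an advertised edge
 * that is BEHIND rcv_nxt wraps around and looks like a huge window.
 */
uint32_t wnd_unsigned(int32_t space, seq_t rcv_adv, seq_t rcv_nxt)
{
    uint32_t diff = rcv_adv - rcv_nxt;        /* wraps modulo 2^32 */
    return (uint32_t)space > diff ? (uint32_t)space : diff;
}

/*
 * Post-fix style: cast the difference to a signed type first, so
 * "behind" becomes a negative number and the buffer space wins.
 */
int32_t wnd_signed(int32_t space, seq_t rcv_adv, seq_t rcv_nxt)
{
    int32_t diff = (int32_t)(rcv_adv - rcv_nxt);
    return space > diff ? space : diff;
}
```

With rcv_nxt = 0x80000065 and rcv_adv 100 bytes behind it (0x80000001),
wnd_unsigned(4096, rcv_adv, rcv_nxt) returns 4294967196, while
wnd_signed(4096, rcv_adv, rcv_nxt) returns 4096.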
===============================================================================
Index: ucb/netinet/tcp_timer.c 4.3BSD-beta Fix

Description:
        Excessive retransmissions on long-haul nets. Serious congestion
        in Internet gateways. File transfer speeds under 10% of expected
        values over 9600 baud point-to-point links. Angry network
        managers.

        The basic machinery is right but some of the special cases are
        wrong, resulting in bad host behavior on slow links. Several
        problems combine to result in very short retransmit intervals:

        1) The smoothed round-trip time is zero until the first
           successful round-trip without retransmission. If there is
           a retransmission of the first packet, the zero value is
           actually used to compute the round-trip time, resulting in
           a minimum retransmission time.

        2) The standard backoff algorithm not only backs off rather
           slowly, but due to an incorrect calculation, the first
           retransmit interval is 2.0*t_srtt, but the second is only
           1.0*t_srtt, and not until retransmit #4 or so does the
           retransmit time get back up to 2*t_srtt. The supplied
           "experimental" backoff algorithm backs off at rate 2**n,
           which reduces retransmits under overload conditions.

                                        John Nagle

Repeat-By:
        Connect two 4.3BSD systems via a 9600 baud DMR link. Try a
        big file transfer with ftp(I). Be prepared for a long wait.

Fix:
tcp_timer.c
112c112
< int tcpexprexmtbackoff = 0;
---
> int	tcpexprexmtbackoff = 1;		/* use exponential backoff if 1	*/
154a155,169
>		/*
>		 * Calculate retransmit	timer for non-first try.
>		 * Start with the same value used for the first	retransmit.
>		 * Then	use either the table tcp_backoff to scale this up
>		 * based on the	number of retransmits, or if the patchable
>		 * flag	tcpexprexmtbackoff is set, just	multiply it by
>		 * 2**number of	retransmits.
>		 * If t_srtt is	zero when we get here, we have never
>		 * had a successful round-trip and are already retransmitting,
>		 * which indicates trouble, so we apply	a larger initial guess
>		 * for the round-trip time.  This prevents serious network
>		 * overload when talking to faraway hosts, especially when
>		 * they	aren't answering.
>		*/
>		if (tp->t_srtt == 0) tp->t_srtt	= TCPTV_SRTTRTRAN;
156c171
<		    (int)tp->t_srtt, TCPTV_MIN,	TCPTV_MAX);
---
>		    (int)(tcp_beta * tp->t_srtt), TCPTV_MIN, TCPTV_MAX);
tcp_timer.h:
60a61,62
> #define TCPTV_SRTTRTRAN ( 10*PR_SLOWHZ)	/* base	roundtrip time if retran
>						   before 1st good roundtrip */
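
The retransmit-interval rule described in the patch comment can be
sketched as a stand-alone function. This is a simplified model, not the
patched kernel code: it implements only the 2**n branch
(tcpexprexmtbackoff = 1), works in slow-timer ticks, and the function
names and the values chosen for SLOWHZ, BETA, TV_MIN and TV_MAX are
assumptions made for illustration. Only the 10-second fallback mirrors
TCPTV_SRTTRTRAN from the patch.

```c
#define SLOWHZ       2               /* assumed slow-timer ticks per second */
#define TV_MIN       (1 * SLOWHZ)    /* assumed floor for the timer */
#define TV_MAX       (30 * SLOWHZ)   /* assumed ceiling for the timer */
#define SRTT_RETRAN  (10 * SLOWHZ)   /* fallback guess, as in TCPTV_SRTTRTRAN */
#define BETA         2.0             /* assumed value of tcp_beta */

/* If no round trip has ever completed, substitute the larger guess. */
double seed_srtt(double srtt)
{
    return srtt == 0.0 ? (double)SRTT_RETRAN : srtt;
}

/* Retransmit interval in ticks for the given number of retransmits. */
int rexmt_ticks(double srtt, int nrexmt)
{
    double t = BETA * seed_srtt(srtt);
    int i;

    /* Double once per retransmit; stop early once past the ceiling. */
    for (i = 0; i < nrexmt && t < TV_MAX; i++)
        t *= 2.0;

    /* Clamp into [TV_MIN, TV_MAX], as TCPT_RANGESET-style code would. */
    if (t < TV_MIN)
        t = TV_MIN;
    if (t > TV_MAX)
        t = TV_MAX;
    return (int)t;
}
```

Under these assumed constants, a connection that has never completed a
round trip starts at rexmt_ticks(0.0, 0) = 40 ticks (20 seconds) rather
than at the minimum, and an estimate of 3 ticks backs off through 6, 12,
24 and 48 ticks before reaching the 60-tick ceiling.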
===============================================================================


