Submission about TCP/IP under VMS


Pierre L. LAFORGUE (mcvax!imag!pierre@seismo.CSS.GOV)
Wed, 5 Feb 86 19:27:45 -0200


(or ...!seismo!mcvax!vmucnam!imag!pierre)

Received: from DCN8.ARPA by SRI-NIC.ARPA with TCP; Wed 5 Feb 86 12:50:34-PST
Date: 05-Feb-86 20:50:22-UT
From: mills@dcn6.arpa
Subject: Conversations with an Ethernet watcher
To: tcp-ip@sri-nic.arpa

Folks,

Further to my recent note and the previous notes from Mark and Hans-Werner, I
watched packets wade through our DCnet Etherswamp and found alligators still
munching. Briefly, that swamp includes gateways to ARPANET (56K bps), UMICHnet
(9.6K bps) and FORDnet (9.6K bps), as well as a raft of other swamp creatures.
Thus, I can see all packets flying between DCnet gateways and in some cases
the subnet gateways on FORDnet and UMICHnet. The hosts lurking on the subnets
include 3COM, 4.3bsd, Wollongong, Sun, fuzzball and even more bizarre
creatures scattered all over the country. The subnets are connected mostly by
multiple 9.6K bps lines and fuzzball gateways, which run a dynamic routing
algorithm that functions both at the net and subnet level.

As you might expect, we take moderate to severe congestion hits when things
break or when hosts on any of the nets misbehave, which seems fairly often
these days. Mark and Hans-Werner report only the tip of the swampberg, to
phrase a coin. Following is a quick summary of my Etherwatch, captured in the
interval between curiosity and eyestrain and intended not so much as a
specific problem report as a generic speculation on what might be happening in
other ponds.

1. Congestive collapse. When things get really bad the fuzzball routing
   algorithm will occasionally declare a line down, which can activate a
   secondary path, but only after a period of hold-down and possibly
   non-reciprocal connectivity (we call these "one-wire feeds" after a term
   used in the radio/TV broadcast community). When this happens, transient
   black holes and ICMP error messages can originate at the strangest places
   (a caricature of the hold-down behavior is sketched after the list).

2. Black holes. Not all subnet gateways subscribe to the fuzzball routing
   algorithm; in particular, some FORDnet subnet gateways. The fuzzballs
   thus cannot determine reachability and do not generate ICMP error messages.
   Unfortunately, this situation now holds (since Mark's complaint) for all
   FORDnet subnets except 128.5.0. Mark will henceforth get no error messages
   at all when Ford Aerospace gateways or lines west of Dearborn are blitzed.

3. Disregarding error reports. I see what appears to be almost universal
   disregard for ICMP error messages. Certainly Unix and TOPS-20 users
   remain blissfully ignorant of these things, which many gateways, including
   the core gateways and fuzzballs, take some pains to get right. Thus, the
   casual user has no hard evidence with which to beat up the system or net
   maintainers (a sketch of what could be reported appears after the list).

4. Mismatched routing dynamics. EGP dynamics are very slow compared with most
   IGP dynamics, including the fuzzballs. Thus, nets on the business side of
   our EGP gateway, for example, can flap up and down with the effect that EGP
   advertisements may disagree with the actual routes delivered. Our own
   warranty has a clause relieving liability during transient periods up to
   several minutes.

5. Hidden gateways. There is a lot of subnet plumbing in our waterworks, some
   of it rather leaky (in the best and most common tradition). Thus, ICMP
   error messages can originate at a subnet gateway or even a host without
   implying problems further up the pipe. Unless the recipient of such an
   error message is aware of the subnet configuration, it might mistakenly
   assume the primary (net) gateway is broken.

6. Protocol problems. Certain hosts tapping our plumbing make rather good
   random-noise/congestion generators (not our DCnet hosts, of course). I
   watched Wollongong 128.5.0.9 generate between 1.5 and 5 ACKs for every
   TCP data segment just now. I also saw 4.3bsd(?) 128.5.33.1 and 10.2.0.78
   get into ACK-ACK fights. Host 10.2.0.78 apparently has a bunch (3) of hung
   TCP connections sending something every few seconds but ignoring what
   appear to be valid reset segments from 128.5.33.1. The 128.5.32.1 and
   128.5.33.1 hosts advertise, probably unwisely, windows of 2048 and 4096
   octets respectively, which are much larger than necessary (two to four
   seconds at 9.6K bps) and almost guarantee gateway congestion. Almost every
   initial-connection attempt involves at least two and up to five
   retransmissions at intervals much less than the estimated roundtrip delay
   (a back-of-envelope check on these numbers follows the list).

7. Fairness abusers. Why do some hosts (10.2.0.78, among others) open multiple
   parallel SMTP connections to the same host? This might represent a
   misguided attempt to "optimize" the delivery delay, but certainly makes the
   congestion problems that much worse. There are two such connections right
   now between 10.2.0.78 and 128.5.32.1 and three between 10.2.0.78 and
   128.5.33.1. No wonder our poor fuzzthings get eaten by the alligators.

8. Tinygrams and jumbograms. Occasionally I see connections across the net
   with a probably unwise selection of the maximum segment size (MSS), a
   particularly uncomfortable choice being 1024. This guarantees fragmentation
   at least once somewhere on the path and usually twice, as well as clogging
   reassembly resources in congested weather. Values below the ARPANET
   maximum (906-odd) are much more appropriate. In fact, on many paths the
   efficiency gained with an MSS greater than 576 is lost to congestion, which
   in turn stems from poorer buffer-space utilization. At the other end of the
   spectrum, vast spasms of tinygrams (usually character-at-a-time TELNET)
   continue to flood the swamps, in spite of well-documented fixes for this
   (sketches of both the fragmentation arithmetic and the tinygram fix follow
   the list).
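
A few back-of-envelope sketches to go with the items above. None of this is
real code from any of the hosts or gateways involved; the constants and names
are invented for illustration only.

First, item 1 in caricature: a line is declared down only after several
consecutive failures, and even then the secondary path is withheld for a
hold-down interval, leaving a window in which traffic simply vanishes. This
is a generic hold-down sketch, not the actual DCnet/fuzzball routing code.

    #include <stdio.h>

    #define FAIL_THRESH 3   /* consecutive failures before declaring the line down */
    #define HOLD_DOWN 120   /* seconds before the secondary path may be used */

    struct line {
        int failures;       /* consecutive probe/update failures      */
        int down_since;     /* time the line was declared down, or -1 */
    };

    /* Returns 1 if the secondary route may be used at time 'now'. */
    static int use_secondary(struct line *l, int probe_ok, int now)
    {
        if (probe_ok) {
            l->failures = 0;
            l->down_since = -1;
            return 0;                    /* primary is healthy    */
        }
        if (++l->failures >= FAIL_THRESH && l->down_since < 0)
            l->down_since = now;         /* declare the line down */
        return l->down_since >= 0 && now - l->down_since >= HOLD_DOWN;
    }

    int main(void)
    {
        struct line l = { 0, -1 };
        int t;

        /* Probes fail from t=0 on; note the black-hole window between the
         * line going down (t=20) and the hold-down expiring (t=140).      */
        for (t = 0; t <= 150; t += 10)
            printf("t=%3d  secondary %s\n", t,
                   use_secondary(&l, 0, t) ? "active" : "withheld");
        return 0;
    }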
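
On item 3, the point is how much a host throws away when it drops an ICMP
error on the floor. The fragment below parses one Destination Unreachable
from a hand-built byte array (the addresses, ports and sequence number are
invented) and shows that the quoted IP header plus 64 bits of the original
datagram are enough to tell the user exactly which connection hit the black
hole.

    #include <stdio.h>

    static int get16(const unsigned char *p)
    {
        return (p[0] << 8) | p[1];
    }

    int main(void)
    {
        /* ICMP type 3 (unreachable), code 1 (host), then the original
         * datagram's IP header and its first 8 data octets.            */
        static const unsigned char icmp[] = {
            3, 1, 0, 0,  0, 0, 0, 0,                  /* ICMP header         */
            0x45, 0, 0, 40, 0, 0, 0, 0, 30, 6, 0, 0,  /* inner IP header     */
            128, 5, 32, 1,  10, 2, 0, 78,             /* source, destination */
            4, 0,  0, 25,  0, 0, 0, 1                 /* TCP ports + seq     */
        };
        const unsigned char *ip  = icmp + 8;
        const unsigned char *tcp = ip + (ip[0] & 0x0f) * 4;

        if (icmp[0] == 3)
            printf("host %d.%d.%d.%d unreachable (code %d): "
                   "connection %d.%d.%d.%d port %d -> port %d\n",
                   ip[16], ip[17], ip[18], ip[19], icmp[1],
                   ip[12], ip[13], ip[14], ip[15],
                   get16(tcp), get16(tcp + 2));
        return 0;
    }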
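
On item 6, the window and retransmission numbers: a 4096-octet window takes
well over three seconds to drain through a 9.6K bps line, so retransmitting
an initial connection request after much less than that is almost surely
premature. The program below does the arithmetic and applies the RFC 793
retransmission-timeout formula; the 3-second SRTT is an assumption, not a
measurement.

    #include <stdio.h>

    int main(void)
    {
        double line_bps = 9600.0;        /* slowest hop in the path (assumed)   */
        int windows[] = { 2048, 4096 };  /* windows advertised by the two hosts */
        double srtt = 3.0;               /* assumed smoothed round-trip time, s */
        double beta = 2.0;               /* RFC 793 suggests 1.3 to 2.0         */
        double lbound = 1.0, ubound = 60.0;
        double rto;
        int i;

        for (i = 0; i < 2; i++)
            printf("window %4d octets -> %.1f s to drain at %.0f bps\n",
                   windows[i], windows[i] * 8.0 / line_bps, line_bps);

        /* RTO = min[UBOUND, max[LBOUND, BETA * SRTT]] per RFC 793 */
        rto = beta * srtt;
        if (rto < lbound) rto = lbound;
        if (rto > ubound) rto = ubound;
        printf("SRTT %.1f s -> retransmit no sooner than %.1f s\n", srtt, rto);
        return 0;
    }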
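
On item 8, the jumbogram half: an MSS of 1024 plus 40 octets of TCP and IP
header gives a 1064-octet datagram, which cannot cross a small-MTU net such
as the ARPANET in one piece. The sketch below counts fragments for a few MSS
choices; the 1006-octet ARPANET MTU is assumed here for illustration.

    #include <stdio.h>

    /* Fragments needed to push one datagram through a net with the given
     * MTU; fragment data must be a multiple of 8 octets except the last. */
    static int fragments(int datagram_len, int mtu)
    {
        int ip_hdr = 20;                          /* no IP options assumed */
        int data = datagram_len - ip_hdr;
        int per_frag = ((mtu - ip_hdr) / 8) * 8;
        return (data + per_frag - 1) / per_frag;  /* ceiling division */
    }

    int main(void)
    {
        int mss[] = { 1024, 576, 536 };
        int arpanet_mtu = 1006;                   /* assumed for illustration */
        int i, dgram;

        for (i = 0; i < 3; i++) {
            dgram = mss[i] + 20 + 20;             /* TCP + IP headers, no options */
            printf("MSS %4d -> %4d-octet datagram -> %d fragment(s) over MTU %d\n",
                   mss[i], dgram, fragments(dgram, arpanet_mtu), arpanet_mtu);
        }
        return 0;
    }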
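
And the tinygram half: the well-documented fix is presumably the small-packet
coalescing rule of RFC 896, i.e. while earlier data is still unacknowledged,
queue sub-segment writes rather than firing off a tinygram per keystroke. The
toy sender below illustrates the rule; struct conn, tcp_output and tcp_ack are
hypothetical names, not any particular TCP's interfaces.

    #include <stdio.h>
    #include <string.h>

    struct conn {
        int  unacked;          /* octets sent but not yet acknowledged */
        int  mss;              /* maximum segment size                 */
        char pending[4096];    /* user data queued but not yet sent    */
        int  pending_len;
    };

    /* User writes data; send now only if a full segment is ready or
     * nothing is outstanding, otherwise hold it.                      */
    static void tcp_output(struct conn *c, const char *data, int len)
    {
        int n;

        memcpy(c->pending + c->pending_len, data, len);
        c->pending_len += len;

        if (c->pending_len >= c->mss || c->unacked == 0) {
            n = c->pending_len < c->mss ? c->pending_len : c->mss;
            printf("send %d octet(s)\n", n);
            c->unacked += n;
            c->pending_len -= n;
            memmove(c->pending, c->pending + n, c->pending_len);
        } else
            printf("hold %d octet(s) until ACK or a full segment\n",
                   c->pending_len);
    }

    /* An ACK arrives; anything held can now go out coalesced. */
    static void tcp_ack(struct conn *c, int acked)
    {
        c->unacked -= acked;
        if (c->unacked == 0 && c->pending_len > 0)
            tcp_output(c, "", 0);
    }

    int main(void)
    {
        struct conn c = { 0, 536, "", 0 };

        tcp_output(&c, "a", 1);  /* nothing outstanding: sent at once     */
        tcp_output(&c, "b", 1);  /* held: a tinygram is already in flight */
        tcp_output(&c, "c", 1);  /* held, coalesced with the previous one */
        tcp_ack(&c, 1);          /* ACK: the two held octets go as one    */
        return 0;
    }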

Dave
-------


