John B. Nagle (email@example.com)
Wed, 30 Jul 86 15:25:24 pdt
1. If you are losing packets due to having too few
receiving buffers in your Ethernet controller,
get a modern Ethernet controller. The worst known
offender is the old 3COM Multibus Ethernet controller
used in early SUN systems; not only does it have only
two receiving buffers, it has no overrun detection, and
thus the software never tallies the many packets it tends
to lose.
2. If you are losing packets due to congestion problems in a
TCP-based system, this can be fixed; see my various RFCs
on the subject. "Improving" the protocol by adding extra
acknowledgements or fancier retransmission schemes is
NOT the answer. I've developed some workable solutions
that are documented in RFCs and implemented in 4.3BSD.
3. The real need for link-level acknowledgements, or at least
some indication of non-delivery that works most of the
time, is for routing around faults. Ethernets transmit
happily into black holes; when the destination dies,
the source never knows.
When the destination Ethernet node is a gateway,
and said gateway goes down, there is no low-level way for
the sending Ethernet node to notice this and divert to an
alternate gateway. This is a serious problem in high-reliability
systems, because we have no standard way for a host on
a multi-gateway Ethernet to behave which will cause it
to divert from one gateway to another when one gateway
fails. There are a number of approaches to this
problem, all of them lousy:
- Ignore it and put up with downtime of at least minutes, and
perhaps indefinitely, when a supposedly redundant gateway fails.
(Considered unacceptable in military systems.)
- Shorten the ARP timeout to 10 seconds or so and spend
excessive resources sending ARPs.
(Tends to cause one retransmit every 10 seconds due
to non-clever ARP implementations.)
- Let the hosts participate in some kind of nonstandard
routing protocol so they can tell when a gateway dies.
(No good for off-the-shelf hosts).
- Let the transport layer inform the datagram layer when
a retransmit occurs, so that the datagram layer can trigger
the selection of a different gateway; if this causes
selection of an up but ill-chosen gateway, a redirect
from that gateway corrects the situation. (Some code
to do this is in 4.2BSD, but it wasn't fully implemented.)
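To make the last approach concrete, here is a minimal sketch of the
idea in Python. It is not the 4.2BSD code. The class name
GatewaySelector, its method names, and the retransmit threshold of 2
are all assumptions made up for illustration. The transport layer
reports each retransmit; after enough of them the datagram layer
rotates to the next configured gateway, and a redirect from the gateway
in use can override that choice.

```python
class GatewaySelector:
    """Hypothetical datagram-layer helper that picks a first-hop gateway
    per destination and switches when the transport layer reports
    retransmissions. Names and threshold are illustrative assumptions."""

    def __init__(self, gateways, retransmit_threshold=2):
        if not gateways:
            raise ValueError("need at least one gateway")
        self.gateways = list(gateways)
        self.retransmit_threshold = retransmit_threshold
        self.current = {}      # destination -> index into self.gateways
        self.retransmits = {}  # destination -> retransmits since last reset

    def gateway_for(self, dest):
        """Return the gateway currently chosen for this destination."""
        return self.gateways[self.current.get(dest, 0)]

    def on_retransmit(self, dest):
        """Called by the transport layer each time it retransmits to dest.
        Rotates to the next gateway once the threshold is reached."""
        count = self.retransmits.get(dest, 0) + 1
        if count >= self.retransmit_threshold:
            nxt = (self.current.get(dest, 0) + 1) % len(self.gateways)
            self.current[dest] = nxt
            count = 0
        self.retransmits[dest] = count
        return self.gateway_for(dest)

    def on_progress(self, dest):
        """Called when new data is acknowledged; the path evidently works."""
        self.retransmits[dest] = 0

    def on_redirect(self, dest, better_gateway):
        """Called when the gateway in use redirects us to a better one."""
        if better_gateway in self.gateways:
            self.current[dest] = self.gateways.index(better_gateway)
            self.retransmits[dest] = 0


if __name__ == "__main__":
    sel = GatewaySelector(["gw-a", "gw-b"])
    sel.on_retransmit("host-x")          # first retransmit: stay put
    print(sel.on_retransmit("host-x"))   # second retransmit: switch gateway
```

Note that a retransmit is only a hint: it may be caused by congestion
rather than a dead gateway, which is why the sketch waits for more than
one before switching and relies on redirects to undo a poor choice.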
It's all so much easier if you have link-level failure-to