Re: Re: RING vs. ETHER - Theory and practice.


John B. Nagle (jbn@glacier.stanford.edu)
Wed, 30 Jul 86 15:25:24 pdt


      1. If you are losing packets due to having too few
          receiving buffers in your Ethernet controller,
          get a modern Ethernet controller. The worst known
          offender is the old 3COM Multibus Ethernet controller
          used in early SUN systems; not only does it have only
          two receiving buffers, it has no overrun detection, and
          thus the software never tallies the many packets it tends
          to lose.
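
          As a rough illustration of why overrun detection matters,
          here is a minimal sketch of a receive-buffer pool that
          tallies the packets it has to drop. It models no real
          controller; the ReceiveRing class and its method names are
          hypothetical, and the buffer count of two is taken from the
          description above.

```python
from collections import deque


class ReceiveRing:
    """Hypothetical model of a controller's receive buffers (not a real device)."""

    def __init__(self, num_buffers):
        self.num_buffers = num_buffers
        self.buffers = deque()
        self.overruns = 0  # packets dropped because every buffer was full

    def on_packet(self, packet):
        """Called when a frame arrives from the wire."""
        if len(self.buffers) >= self.num_buffers:
            # Detect and tally the loss instead of dropping silently.
            self.overruns += 1
            return False
        self.buffers.append(packet)
        return True

    def read(self):
        """Called by host software to drain one buffer; None if empty."""
        return self.buffers.popleft() if self.buffers else None


# A burst of five back-to-back packets arrives before the host reads any.
ring = ReceiveRing(num_buffers=2)
accepted = [ring.on_packet(n) for n in range(5)]
```

          Without the overruns counter, the three packets lost in
          this burst would leave no trace for the software to report.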

      2. If you are losing packets due to congestion problems in a
          TCP-based system, this can be fixed; see my various RFCs
          on the subject. "Improving" the protocol by adding extra
          acknowledgements or fancier retransmission schemes is
          NOT the answer. I've developed some workable solutions
          that are documented in RFCs and implemented in 4.3BSD.

      3. The real need for link-level acknowledgements, or at least
          some indication of non-delivery that works most of the
          time, is for routing around faults. Ethernets transmit
          happily into black holes; when the destination dies,
          the source never knows.
          When the destination Ethernet node is a gateway,
          and said gateway goes down, there is no low-level way for
          the sending Ethernet node to notice this and divert to an
          alternate gateway. This is a serious problem in
          high-reliability systems, because there is no standard
          behavior for a host on a multi-gateway Ethernet that
          makes it divert from one gateway to another when one
          fails. There are a number of approaches to this
          problem, all of them lousy:

          - Ignore it and put up with minutes of downtime at least,
            and perhaps indefinite downtime, when a supposedly
            redundant gateway fails.
            (Considered unacceptable in military systems.)
          - Shorten the ARP timeout to 10 seconds or so and spend
            excessive resources sending ARPs.
            (Tends to cause one retransmit every 10 seconds due
            to non-clever ARP implementations.)
          - Let the hosts participate in some kind of nonstandard
            routing protocol so they can tell when a gateway dies.
            (No good for off-the-shelf hosts).
          - Let the transport layer inform the datagram layer when
            a retransmit occurs, so that the datagram layer can trigger
            the selection of a different gateway; if this causes
            selection of an up but ill-chosen gateway, a redirect
            from that gateway corrects the situation. (Some code
            to do this is in 4.2BSD, but it wasn't fully implemented.)
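
          The last approach above might be sketched roughly as
          follows. This is an assumption-laden illustration, not the
          4.2BSD code: the GatewaySelector class, its method names,
          the retransmit threshold, and the gateway names are all
          hypothetical. The transport layer reports retransmits, the
          datagram layer rotates to the next gateway, and a redirect
          corrects an ill-chosen one.

```python
class GatewaySelector:
    """Hypothetical datagram-layer gateway choice for one host."""

    def __init__(self, gateways, retransmit_threshold=1):
        if not gateways:
            raise ValueError("need at least one gateway")
        self.gateways = list(gateways)
        self.current = 0  # index of the gateway in use
        self.retransmit_threshold = retransmit_threshold
        self.retransmits = 0  # retransmits seen since last progress

    def gateway(self):
        return self.gateways[self.current]

    def on_retransmit(self):
        """Transport layer calls this each time it retransmits."""
        self.retransmits += 1
        if self.retransmits >= self.retransmit_threshold:
            # Assume the gateway may be dead and try the next one.
            self.current = (self.current + 1) % len(self.gateways)
            self.retransmits = 0

    def on_ack(self):
        """Forward progress: the current gateway is evidently working."""
        self.retransmits = 0

    def on_redirect(self, better_gateway):
        """An up but ill-chosen gateway sent a redirect; follow it."""
        if better_gateway in self.gateways:
            self.current = self.gateways.index(better_gateway)
            self.retransmits = 0


sel = GatewaySelector(["gw-a", "gw-b", "gw-c"], retransmit_threshold=2)
first = sel.gateway()
sel.on_retransmit()           # one retransmit: stay put
after_one = sel.gateway()
sel.on_retransmit()           # second retransmit: divert
after_two = sel.gateway()
sel.on_redirect("gw-c")       # gw-b is up but says gw-c is better
after_redirect = sel.gateway()
```

          A threshold above one trades slower failover for fewer
          needless switches when a retransmit was caused by ordinary
          packet loss rather than a dead gateway.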

          It's all so much easier if you have link-level
          failure-to-deliver indications.

                                        John Nagle


