WHY are those mailbridges dropping so many packets????


enger@bluto.scc.com
23 Apr 88 22:26:00 EST


Folks:

I must disagree with some of the remarks made about the recent mailbridge and
EGP core server upgrades. I think the efforts of the IETF and the
Adopt-A-Gateway donors have made a noticeable improvement.

I can recall a time when there were delays on the order of tens of seconds
just to get an interactive character echo when crossing from Arpanet to
Milnet. In some cases these were EGP/GGP extra hop delays; in others they
seemed to be entirely attributable to the mailbridges. I haven't encountered
anything like that lately, have you?

Recent "ping" testing from MY host to the Arpanet side of the mailbridges
doesn't reveal any dramatic delays or loss rates. I have run ping sessions
that have sent out 500 packets without seeing one dropped, and obtained
sub-second response in all cases. I'll admit sub-second response is not that
much to be proud of, but it beats the old 30 second delays.

It has been observed that the mailbridge gateways' packet drop rates are high.
Someone conjectured that the cause might STILL be insufficient CPU cycles.
The consistently "low" delays seen from my host would seem to indicate that
the units are NOT that short on CPU horsepower. Another person suggested that
the mailbridges may be I/O bound due to all the overhead-type traffic. Being
I/O bound should lead to queuing (delays), and if excessive, packet loss. I
don't seem to be seeing large amounts of this when MY host pings the net-10
side of the mailbridges. So why are the official loss figures so high?

In reviewing the graphs of the mailbridge long term packet drop rates, some
persons have indicated that it looked like the drop rate INCREASED after the
CPU upgrades! One explanation for this is as follows: the faster CPU now
allows the mailbridge to ACCEPT much more traffic from the subnets; but it is
still somehow limited in the rate at which it can SEND traffic into the
subnets. When the packets cannot be sent, they are dropped.

Why can't the packets be sent? Supposedly one of the major reasons is
complying with the 8-in-flight rule (RFNM blocking). Once a mailbridge has
taken a packet off of an input queue and "routed" it, it supposedly has no
place to keep the packet if it cannot immediately be placed in the output
queue. To make matters worse, a packet is considered "in-flight" when it is
placed into the output queue, NOT when it has finally made its way through the
output queue and the 1822 interface, and into the PSN. Thus, large numbers of
packets are being dropped even when there may be space in memory in which they
could be held, if either the 8-in-flight rule, or the design of the mailbridge
software were changed to allow it.
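
The effect of counting a packet as in flight at queueing time can be sketched
with a toy model. This is an illustration only, not the mailbridge code: the
function name, the tick-based clock, the fixed RFNM delay, and the single
destination are all assumptions made for the example. Only the limit of 8
comes from the rule described above.

```python
def simulate_rfnm_gateway(arrivals_per_tick, rfnm_delay_ticks, ticks,
                          max_in_flight=8):
    """Toy model of RFNM blocking toward ONE destination (hypothetical).

    A packet counts as in flight the moment it is placed on the output
    queue, and stops counting when its RFNM returns rfnm_delay_ticks
    later. A packet that arrives while the limit is reached is dropped,
    because the model has nowhere else to hold it.
    """
    rfnm_due = []          # tick at which each outstanding RFNM returns
    sent = dropped = 0
    for now in range(ticks):
        # Retire packets whose RFNM has come back by this tick.
        rfnm_due = [t for t in rfnm_due if t > now]
        for _ in range(arrivals_per_tick):
            if len(rfnm_due) < max_in_flight:
                rfnm_due.append(now + rfnm_delay_ticks)
                sent += 1
            else:
                dropped += 1
    return sent, dropped


if __name__ == "__main__":
    print(simulate_rfnm_gateway(1, 4, 100))    # fast RFNMs
    print(simulate_rfnm_gateway(1, 16, 160))   # slow RFNMs
```

With one arrival per tick and RFNMs returning after 4 ticks, the limit is never
reached and nothing is dropped. With a 16-tick RFNM delay the same arrival
stream loses half its packets, even though the model never runs out of memory.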

I am told that PSN 7 frees the Arpanet from the 8-in-flight limitation, instead
blocking the host at the link layer when it has exhausted its quota of PSN
buffering. The mailbridges, however, use revision controlled software which
cannot be tinkered with to remove the RFNM counting. If the counting were
removed, the mailbridge could better use what memory it has, as well as more
of its quota of the buffering available in the PSN. I am told the Arpanet EGP
servers have already been freed from the RFNM chains. Ah, but what about the
Milnet side? It still runs PSN 6 software. What can be done on gateways'
interfaces to the Milnet?

Since the mailbridges are dropping lots of packets due to their "rfnm
blocking", one might ask: why are they being blocked? What's holding up those
acknowledgements?

Regardless of how rapidly a mailbridge can place packets into a subnet, in the
long term a mailbridge can't successfully unload traffic at a rate faster than
the next "IP entity" can accept it. That rate will be affected by the entity's
horsepower and I/O limitations.

Many of us run down the mailbridges for being antiquated junk. However,
consider what the mailbridge may be sending to: a busy EGP core server (due to
GGP extra hop traffic), or some other busy or underpowered gateway or host,
such as a VAX 750 or worse (possibly with *USERS* to further degrade things).
Even a good "IP entity", though, can't accept data faster than its interface
rate (~56 Kbps at most, the same as the mailbridges). If the interface is
receiving traffic from another source besides the mailbridge (pretty likely
for EGP servers and gateways), then the mailbridge can't send to it at full
speed.

For whatever reason, if the mailbridge can't deliver a traffic flow at the
same rate as it is receiving it, it will eventually have to drop some of the
packets of that flow. Unless the traffic source is somehow flow controlled,
the source will continue to send at a rate faster than the rate at which the
mailbridge can unload it. Given sufficient time (to fill up available
queuing), this would seem to mandate dropping at the mailbridge.
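
The argument can be checked with a small sketch: a finite queue fed faster than
it drains must eventually drop. The function name, the per-tick rates, and the
queue limit below are hypothetical values chosen for illustration, not
measurements of any mailbridge.

```python
def simulate_overflow(arrival_rate, drain_rate, queue_limit, ticks):
    """Finite queue with a source that is NOT flow controlled (toy model).

    Each tick, arrival_rate packets show up and at most drain_rate packets
    are delivered downstream. Packets that find the queue full are dropped.
    Returns (delivered, dropped, still_queued).
    """
    queued = delivered = dropped = 0
    for _ in range(ticks):
        # Accept as many arrivals as fit in the remaining queue space.
        accepted = min(arrival_rate, queue_limit - queued)
        dropped += arrival_rate - accepted
        queued += accepted
        # Unload what the next "IP entity" can take this tick.
        out = min(drain_rate, queued)
        queued -= out
        delivered += out
    return delivered, dropped, queued


if __name__ == "__main__":
    print(simulate_overflow(10, 7, 30, 100))   # source outruns the drain
    print(simulate_overflow(5, 7, 30, 100))    # source within the drain rate
```

In the first case the queue absorbs the excess for only a few ticks, after
which the drop rate settles at the difference between the two rates. A larger
queue postpones the first drop but does not change that long-term rate.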

To obtain lower drop rates (which will conserve the bandwidth of the source
subnet) we must exert some form of flow control on the sender. Since the
mailbridge software is revision controlled (and destined for the Smithsonian
anyway) it isn't likely that it will be enhanced. So implementation of Dr.
Mills' sophisticated queue management/source quench systems will probably have
to wait for the next generation of mailbridges. What can be done in the
meantime?

It would seem that a good way to stamp out excess packet loss (and wasteful
retransmissions on the source subnet) is to stamp out old style TCP. Maybe we
should have a campaign to promote the use of Jacobson/Karels-TCP?
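
As a rough sketch of why a sender that backs off on loss wastes less: the toy
sender below grows its window by one packet per round trip and halves it when
a drop is seen. This is loosely in the spirit of that congestion avoidance
work, not the actual algorithm (no slow start, no timers, no retransmission).
The function name and the fixed per-round-trip bottleneck are assumptions made
for the example.

```python
def aimd_sender(bottleneck_per_rtt, rounds, start_window=1):
    """Additive-increase / multiplicative-decrease sender (toy model).

    Each round trip the sender emits `window` packets. Anything beyond
    bottleneck_per_rtt is dropped on the path. Returns the window used in
    each round and the total number of packets dropped.
    """
    window = start_window
    history = []
    total_dropped = 0
    for _ in range(rounds):
        history.append(window)
        dropped = max(0, window - bottleneck_per_rtt)
        total_dropped += dropped
        if dropped:
            window = max(1, window // 2)   # back off after a loss
        else:
            window += 1                    # probe for more bandwidth
    return history, total_dropped


if __name__ == "__main__":
    print(aimd_sender(bottleneck_per_rtt=4, rounds=8))
```

Over the same 8 round trips, a sender that holds a fixed window of 5 against
this bottleneck of 4 would lose one packet every round trip, 8 in all; the
adaptive sender loses 1.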

Bob Enger


