Re: Mail Bridge Performance

Van Jacobson (van@lbl-csam.ARPA)
07 Mar 86 06:10:21 PST (Fri)

With all due respect, gentlemen, the problem I was trying to describe is
with the current implementation of EGP, not GGP. I'm aware that GGP is
brain-damaged. Replacing GGP involves replacing at least the 7 LSI-11
mail bridges and probably all 38 core gateways. While this is clearly
necessary and should be done as quickly as possible, it's going to take
a while. I proposed a relatively simple, quick "patch" that might
improve things in the interim. (I.e., I wasn't going to complain about
GGP until EGP got fixed.) I can prove this patch would improve our
transit delay by a factor of ten. West coast sites similar to us (e.g.,
UCB) should see similar improvement. Dave posted some traffic data
last May that showed a 25%+ East/West imbalance through the mail
bridges. Things might improve nationally by whatever portion of this
was due to the current, lousy, EGP routes on the West coast.

Dave, I don't understand the statement that "EGP has nothing to do
with the routing". Say I'm trying to get from a Vax on the lbl-ether
to a Vax on the ucb-ether, e.g.,
   rtsg --> lbl --> mil??? --?> ucbvax --> monet
The first milnet hop (lbl to MIL-whatever) is determined by EGP,
subsequent hops up to ucbvax are determined by GGP. Lbl is a pure
gateway and doesn't get ICMP redirects, so the route advertised by our
EGP peer is all that determines the first hop. If our EGP peer says
"use MILBBN", even the most wondrous GGP-replacement won't prevent
packets from making two completely unnecessary trips across the country.
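
A minimal sketch of the point above, for illustration only. The table and
function names are hypothetical; the only facts taken from the message are
that a first hop of MILBBN costs a West coast to West coast packet two
cross-country trips, and that MILLBL is the local mail bridge.

```python
# Hypothetical model: which mail bridges sit on the far coast from lbl.
# MILBBN (far) and MILLBL (local) are the two bridges named in the message.
FAR_COAST_BRIDGE = {"MILBBN": True, "MILLBL": False}

def cross_country_trips(egp_first_hop: str) -> int:
    """Transcontinental legs for a West coast -> West coast packet.

    lbl gets no ICMP redirects, so the EGP-advertised first hop is final:
    a far-coast bridge costs one leg out and one leg back, whatever the
    gateways beyond it decide.
    """
    return 2 if FAR_COAST_BRIDGE[egp_first_hop] else 0

for bridge in ("MILBBN", "MILLBL"):
    print(bridge, cross_country_trips(bridge))
```

Nothing after the first hop appears in the function, which is the argument:
later routing decisions cannot undo a bad first hop.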

I must admit I've never been fond of EGP (the current implementation,
that is, I've got nothing against the protocol). About 60% of all our
Internet traffic and 90% of our "interactive" traffic is to "local"
UCB, Stanford or LLNL hosts. Because the traffic is well localized,
I've been making sporadic delay and throughput measurements to those
hosts since the '83 NCP/TCP switchover. Generally, the measurements
show a slow, roughly linear degradation up to Oct, '84 (with a factor
of two step due to the Arpa/Milnet split in late '83). With the EGP
switchover in late '84, things suddenly degraded by a factor of ten.
Since then, the data has been so "noisy" that it's difficult to
analyze. [There was a clear milestone in early '86 though when delays
went to infinity (the EGP space wars).]

I'll finish this epic with one measurement I didn't put in the last
message. You can estimate the damage that GGP is doing by using the
best first hop gateway and comparing the transit times to multi-homed
hosts. E.g., I measure
    lbl-csam --> MILLBL --?> ucb-arpa
using the local net addresses for csam & arpa and the milnet/arpanet
addresses. Any difference in the two measurements should be due to
GGP. The median time for the local net case is 200ms and for the
milnet/arpanet case it's 500ms so GGP hurts by a factor of ~2.5. The
ratio stays about 2.5 if I try su-score or sri-iu and/or MILSRI instead
of ucb-arpa/MILLBL. Compared to the 5 second times and factor of
20 that result from a bad EGP route, this is down in the mud.
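
The comparison above can be sketched roughly as follows. The sample values
are invented for illustration and are NOT the measured data; the function
name is likewise hypothetical. Only the method (ratio of median transit
times over the two paths) comes from the message.

```python
from statistics import median

# Hypothetical transit-time samples in milliseconds (not real measurements).
local_net_ms = [180, 200, 190, 240, 200]        # direct, local net addresses
milnet_arpanet_ms = [450, 500, 520, 500, 610]   # via the mail bridge

def slowdown(baseline_ms, via_gateways_ms):
    """Ratio of median transit times: gateway path over local-net baseline."""
    return median(via_gateways_ms) / median(baseline_ms)

ggp_penalty = slowdown(local_net_ms, milnet_arpanet_ms)
print(f"GGP penalty: ~{ggp_penalty:.1f}x")
```

Medians are used rather than means so that a few very slow samples do not
dominate the ratio.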

 - Van
