Re: NSFNET woe: causes and consequences


Ken Pogran (pogran@ccq.bbn.com)
Sun, 4 Oct 87 13:10:59 EDT


Dave,

The message you sent to the tcp-ip list the other day regarding
the NSFNET woes you observed caused us here at BBN to put on our
thinking caps. We worked to understand how what you saw relates
to what we know about what's happening in the ARPANET these days.
I think we already understood what was behind a good bit of what
you observed, and your message gave us the impetus to investigate
a few more things as well. This message describes the situation
as we understand it.

There are four separate underlying issues:

1. The number of "reachable networks" in the Internet has just
    nudged upwards of 300 for the first time. (The Internet used
    to be growing at a rate of about 10 networks/month; that rate
    has accelerated over the past few months.)

2. For the week ending Thursday, 1 October, the ARPANET handled
    a record 202 million packets. (Traffic over the past few
    months has been in the 180s -- itself a record over last
    spring.)

3. We've begun the "beta test" on the ARPANET of the new PSN
    software release, PSN 7.0, and -- sure enough -- there have
    been a few problems.

And, finally,

4. The limit, that you described in your message, of 64 virtual
    circuits in the ACC 5250 X.25 driver that is used by several
    X.25-connected gateways on the ARPANET.

The first two issues just demonstrate that things continue to get
busier and busier in the ARPANET and in the Internet. We've put
out a new version of LSI-11 "core gateway" software that allows
for 400, rather than 300, reachable networks to give the core
some breathing room again. And I shudder to think what ARPANET
(and, hence, Internet) performance would be like if we tried to
handle over 200 million packets per week without the so-called
"Routing Patch" that was installed late in the summer that
considerably improved the performance of the ARPANET routing
algorithm.

I think the third issue, the beginning of the PSN 7.0 beta test
on the ARPANET, contributed to some of what you saw and helped to
obscure some of the other causes of what you observed. As you
know, last weekend, we put PSN 7 into a portion of the ARPANET.
CMU was one of the nodes that got PSN 7.

PSN 7 contains a new "End-to-End" protocol for management of the
flow of data between source PSNs and destination PSNs. It's the
first re-do of the End-to-End protocol in the ARPANET EVER.
We're expecting a lot of improvement in efficiency within the PSN
and, hence, some network performance improvement.

To make a graceful, phased cutover to the New End-to-End
feasible, PSN 7.0 contains code for both the new and the old
End-to-End protocols. So as we've introduced PSN 7.0, it's been
with the OLD end-to-end protocol. Now unfortunately, having code
for two End-to-End protocols coresident takes up memory space
that would normally go to buffers, etc. for handling traffic.
So, yes -- during the 3-4 week phased cutover, the ARPANET PSNs
will be a little short on buffer space; there's not much that can
be done about that. But once ALL nodes are cut over to the New
End-to-End protocol, we will install PSN 7.1, which will remove
the old End-to-End, reclaim that memory space, and -- in the case
of the ARPANET nodes in which C/300 processors have replaced the
C/30s -- be able to use DOUBLE the main memory.

Back to the problem at hand: You mentioned the report of
"resource shortage"s in the PSNs. This happened with the CMU PSN
for reasons we still don't understand. However, this WASN'T "the
usual BBN euphemism for ... connection blocks which manage
ARPANET virtual circuits" that you suggested in your message --
we've usually got plenty of those these days. The resource
shortage the CMU PSN reported to the NOC had to do with the PSN's
X.25 interface. Since several higher-priority problems showed up
with PSN 7, we decided the best thing to do was to return the CMU
node to PSN 6 and work on this one later. We have some
preliminary ideas of what might have happened, and we'll be
investigating this week.

As for delays in the ARPANET: It turns out that the version of
PSN 7.0 that was deployed last weekend contained a bug in the
"Routing Patch" that worsened, instead of improved, the
performance of the routing algorithm. We are frankly embarrassed
about that. This problem was fixed Thursday night, 1 October --
about the time you sent your message. We'd be very interested in
hearing from you how things looked from the NSFNet side THIS
weekend.

From your description it certainly sounds like the 64 VC limit in
the ACC 5250 is the proximate cause of the problem at CMU last
weekend. We now count 83 gateways attached to the ARPANET. A
gateway on the ARPANET that's handling a lot of diverse traffic
to other gateways as well as to other ARPANET hosts is very
likely to need more than 64 VCs.

We think we can provide a work-around for this problem over the
short term. The PSN has an "idle timer" for each VC, and can
initiate a Close of the VC if it hasn't been used for a while. We
can configure that timer to be pretty short and thus recycle the
gateway's VCs. Of course, some overhead will be incurred to
re-establish a VC to send the next IP datagram to that
destination, but that's probably preferable to having things plug
up for lack of VCs. Note that by having the PSN reclaim idle
VCs, we shouldn't see much "loss of data" that you alluded to in
your message. We would be happy to work with administrators at
sites that have gateways with ACC 5250s who would like to try
this out.
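
To make the trade-off concrete, here is a rough sketch of the idle-timer
idea in Python. It is a toy model only, not PSN or ACC 5250 code: the
VCTable class, the run() helper, the timeout values, and the traffic
pattern (one datagram per second, cycling through 100 destinations) are
all assumptions made up for illustration. Only the 64 VC limit comes from
the discussion above.

```python
from dataclasses import dataclass, field

VC_LIMIT = 64  # the ACC 5250 driver limit cited in the message


@dataclass
class VCTable:
    """Toy model of a gateway's X.25 virtual circuits with idle reclaim."""
    idle_timeout: float          # seconds; hypothetical configurable value
    limit: int = VC_LIMIT
    last_used: dict = field(default_factory=dict)  # destination -> last-use time
    opens: int = 0               # number of VC (re-)establishments
    blocked: int = 0             # sends refused for lack of a free VC

    def reclaim_idle(self, now: float) -> None:
        """Close every VC that has been idle longer than the timeout."""
        stale = [d for d, t in self.last_used.items()
                 if now - t > self.idle_timeout]
        for dest in stale:
            del self.last_used[dest]

    def send(self, dest: str, now: float) -> bool:
        """Send one datagram to dest; return False if no VC is available."""
        self.reclaim_idle(now)
        if dest not in self.last_used:
            if len(self.last_used) >= self.limit:
                self.blocked += 1
                return False
            self.opens += 1      # overhead: a VC must be established first
        self.last_used[dest] = now
        return True


def run(idle_timeout: float, destinations: int = 100) -> VCTable:
    """Send one datagram per second, cycling through the destinations."""
    table = VCTable(idle_timeout=idle_timeout)
    for tick in range(destinations * 3):
        table.send(f"dest-{tick % destinations}", now=float(tick))
    return table


long_timer = run(idle_timeout=3600.0)   # VCs effectively never reclaimed
short_timer = run(idle_timeout=30.0)    # VCs recycled after 30 idle seconds
print(long_timer.blocked, long_timer.opens)    # 108 64
print(short_timer.blocked, short_timer.opens)  # 0 300
```

In this made-up workload the long timer leaves 36 of the 100 destinations
permanently locked out once the table fills, while the short timer never
blocks but pays for a fresh VC setup on every datagram. Real traffic is far
less regular, so the numbers mean nothing beyond showing the shape of the
trade-off.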

In closing, let me say that we at BBN share your concerns about
the issues to be faced as the ARPANET evolves toward a
gateway-to-gateway service from its traditional host-to-host or
host-to-gateway service. The way gateways are attached to the
network is one of a number of urgent architectural and
engineering issues that must be addressed.

Regards,
 Ken Pogran
 Manager, System Architecture
 BBN Communications Corporation

P.S. TO THE COMMUNITY: As the PSN 7.0 upgrade proceeds in the
ARPANET, we'll probably encounter a few more problems. As
described in the DDN Management Bulletin distributed earlier,
please send reports of problems to ARPAUPGRADE@BBN.COM. BBN will
respond.


