Congestion in the Arpanet


Dennis G. Perry (PERRY@VAX.DARPA.MIL)
Wed 1 Oct 86 12:41:40-EDT


There has been quite a bit of conjecture on what is happening
in the Internet and what are the reasons for the performance that
people are seeing. I have been trying to understand these issues
myself and asked BBN to provide me with some information. Attached
is some of that information. I hope it raises other questions and
answers a few.

dennis
                ---------------

----- Forwarded message # 1:

Received: from cc5.bbn.com by .J.BBN.COM id a021878; 30 Sep 86 23:35 EDT
To: jburke@cc5.bbn.com, mlevandowski@cc5.bbn.com, rgrenier@cc5.bbn.com,
    sblumenthal@cc5.bbn.com, mckenzie@cc5.bbn.com, pogran@cc5.bbn.com
To: prishivalko@cc5.bbn.com, dperry@vax.darpa.mil
cc: mayersoh@cc5.bbn.com, jwiggins@cc5.bbn.com, cgreenleaf@cc5.bbn.com,
    mprimak@cc5.bbn.com, fserr@cc5.bbn.com, scohn@cc5.bbn.com,
    hinden@cc5.bbn.com
Subject: arpanet congestion
Date: 30 Sep 86 23:16:58 EDT (Tue)
From: Jeff Mayersohn <mayersoh@cc5.bbn.com>

     For the last month, a large number of PSNs in the Arpanet have
been reporting symptoms of congestion to the network monitoring
center. These reports, or "traps," have been accompanied by an
increasing number of user complaints. In order to deal with the
problem of network congestion, we have been pursuing a number of
avenues at BBNCC. This note summarizes the current state of our
investigations and makes a number of specific recommendations.

     First, a little background. The Arpanet topology is largely
unchanged since the physical split of the Arpanet into the Arpanet and
Milnet in 1984. The topology of the post-physical-split Arpanet was
actually designed from data which was collected before the earlier
logical split of the two networks. In the past year, the network has
shown a significant increase in traffic. A five-day average of
network traffic showed an internode traffic rate of 140 Kbps in June
of 1985 and an internode traffic rate of 230 Kbps in March of 1986.
(The traffic growth had, in fact, leveled off over the summer of 1986
but we suspect that traffic has grown even more since the start of the
academic year.) The network has recently been redesigned to
accommodate NSF hosts, but these new resources have not yet been added
to the network.
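
     A minimal sketch of the growth arithmetic behind the two five-day
averages quoted above. The function and variable names are
illustrative only; the 140 and 230 Kbps figures are the measured ones.

```python
def growth_percent(before_kbps: float, after_kbps: float) -> float:
    """Return the percentage increase from before_kbps to after_kbps."""
    return (after_kbps - before_kbps) / before_kbps * 100.0


# Five-day average internode traffic rates quoted in the note.
june_1985_kbps = 140.0
march_1986_kbps = 230.0

growth = growth_percent(june_1985_kbps, march_1986_kbps)
print(f"{growth:.1f}% growth between June 1985 and March 1986")  # 64.3%
```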

     Marianne Gardner has observed some very interesting trends in the
statistics that we have collected recently. First, a very small
percentage of host pairs account for a very large percentage of the
network traffic. More than 80% of network traffic is contributed by
600 host pairs (out of 2596 communicating pairs). Some 60% of the
traffic is contributed by 100 pairs. Second, gateway traffic
dominates network traffic. 86% of Arpanet traffic has a gateway
as either the source or destination. 52% of network traffic is
between gateways.
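
     The concentration figures above can be restated as fractions of
the 2596 communicating pairs. The sketch below does that arithmetic
and shows one way a "share carried by the top N pairs" figure could be
computed. The helper name and the four-pair sample are hypothetical;
they are not the collected statistics.

```python
def top_pairs_share(pair_traffic: dict, top_n: int) -> float:
    """Fraction of total traffic carried by the top_n heaviest host pairs."""
    total = sum(pair_traffic.values())
    if total == 0:
        return 0.0
    heaviest = sorted(pair_traffic.values(), reverse=True)[:top_n]
    return sum(heaviest) / total


# Figures from the note: 600 and 100 pairs out of 2596 communicating pairs.
communicating_pairs = 2596
fraction_600 = 600 / communicating_pairs  # roughly 23% of pairs carry 80%+
fraction_100 = 100 / communicating_pairs  # roughly 4% of pairs carry ~60%

# Hypothetical sample data, for illustration only (arbitrary traffic units).
sample = {("A", "B"): 700, ("A", "C"): 200, ("B", "C"): 60, ("C", "D"): 40}
share = top_pairs_share(sample, 1)  # the single heaviest pair carries 0.7
```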

     Our immediate focus over the last few weeks has been to
concentrate on topological modelling in order to recommend a small
number of changes which would bring network resource usage to
acceptable levels. This modelling was based upon the peak hour
traffic in late June, the last month during which a global network
statistics collection was performed. The measured June traffic was
increased by 50%. This number was based upon the recent growth in
network traffic and the ratio of the peak hour traffic to the peak
minute traffic. The assumption is probably conservative, which is
good.

     The modelling work was done by Peg Primak, whose report is
contained in the following. As of June, 1986, the Arpanet contained
47 nodes and 63 links. Two of these nodes have since been retired
(SAC2 and USC) but were retained in the current model with all USC
traffic re-routed to node 121. Our routing model shows single hour
maximum link utilization of 75% (on UWisc-Roch) and maximum node
utilization of 69%. Even with the UWisc-Purdue link restored, the
maximum link utilization is still 72% and the maximum node utilization
is 69%. (The Wisconsin to Purdue link was temporarily removed from
the network a while ago.)

     To alleviate the worst of these problems, we considered adding a
link from MIT77 to SRI51. The addition of this link reduces maximum
link utilization to 58% (on the new link), with only two other links
having utilizations over 50% (53% and 51%). Node utilization remains
unchanged. The network diameter is reduced from 10 to 9 by the
addition of this link. As these results show, a link between MIT77
and SRI51 would substantially improve Arpanet performance, and would
become one of the most heavily utilized links in the network.
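
     Network diameter here is the largest hop count between any two
nodes. A minimal sketch of how one added link can shrink it, using
breadth-first search on a hypothetical six-node chain rather than the
real 47-node topology (which is not reproduced in this note):

```python
from collections import deque


def build_adjacency(links):
    """Build an undirected adjacency map from a list of (a, b) links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def hop_counts(adjacency, source):
    """Breadth-first search: minimum hop count from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adjacency):
    """Largest minimum hop count over all node pairs (connected graph assumed)."""
    return max(max(hop_counts(adjacency, n).values()) for n in adjacency)


# Hypothetical toy topology: a chain of six nodes.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
before = diameter(build_adjacency(chain))            # 5 hops end to end
after = diameter(build_adjacency(chain + [(0, 5)]))  # 3 once the ends are joined
```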

     Node utilization is quite heavy on several nodes. Normal
utilization over seven minute intervals seems to be between 30% and
60% for all of the following nodes: ISI27, UCLA, RCC5, and UWISC.
With the MIT-SRI link added, SRI51 will join this group. Measurement
data show that each of these nodes experiences times of very heavy
utilization (15 minute averages of 60% to 70%, 7 minute averages of
87%). Based on the June data, either nodes should be added at these
sites or the five nodes at these sites should be upgraded to C/300s.

     We assume that the addition of trunk bandwidth will take a while.
There are a number of other actions which we would like to take.
First, TAC 113 should be installed immediately in the Arpanet. This
provides for two changes that should reduce congestion. First, the
release bundles more characters into single packets, thereby reducing
the number of bits and packets required to send a given unit of Telnet
data. TAC 113 also modifies the TCP retransmission timers. We
probably get the wrong kind of feedback when the network slows down.
If data is delayed due to network congestion, we suspect that this
gives rise to TCP retransmissions which exacerbate the original
problem.
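
     To illustrate the suspected feedback, the sketch below uses the
example retransmission timeout calculation from the TCP specification
(RFC 793), not the actual TAC 113 timer change, whose details are not
given here. The constants are picked from the ranges RFC 793 suggests,
and the round-trip times are hypothetical.

```python
ALPHA = 0.9    # smoothing factor (RFC 793 suggests 0.8 to 0.9)
BETA = 2.0     # delay variance factor (RFC 793 suggests 1.3 to 2.0)
LBOUND = 1.0   # lower bound on the timeout, in seconds
UBOUND = 60.0  # upper bound on the timeout, in seconds


def update_srtt(srtt: float, rtt: float) -> float:
    """Fold a new round-trip sample into the smoothed round-trip time."""
    return ALPHA * srtt + (1 - ALPHA) * rtt


def retransmission_timeout(srtt: float) -> float:
    """Timeout derived from the smoothed round-trip time, clamped to bounds."""
    return min(UBOUND, max(LBOUND, BETA * srtt))


# Hypothetical case: the path has been taking about 1 second, then
# congestion pushes the real round-trip time to 5 seconds.
srtt = 1.0
timeout = retransmission_timeout(srtt)  # 2.0 seconds
congested_rtt = 5.0

# The segment is delayed, not lost, yet the timer fires and it is resent.
spurious_retransmission = congested_rtt > timeout

# One sample moves the smoothed estimate only slightly (to about 1.4 s),
# so the next timeout (about 2.8 s) is still shorter than the real delay.
srtt_after = update_srtt(srtt, congested_rtt)
timeout_after = retransmission_timeout(srtt_after)
```

     With a slowly adapting estimate, several segments in a row can be
retransmitted into an already congested network before the timer
catches up.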

     Bob Hinden of our gateway group tells me that, in the next two
weeks, we will conduct an experiment which will make the Wideband
Network look more favorable to the internet routing in the Butterfly
gateways. This will cause some gateway-to-gateway traffic to move
from the Arpanet to the Wideband Network.

     We have observed that, when network links get heavily saturated,
the network routing algorithm becomes a bit too dynamic, trying to
find excess capacity which does not exist. The effect of the
resulting oscillations in network routing sometimes works to the
detriment of network performance. There is a simple fix to this,
i.e., we can easily make all three cross-country paths look equivalent
to the routing algorithm. This results in the proper sort of
load-sharing.
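
     A deliberately simplified toy model of the oscillation, assuming
that all cross-country traffic moves each routing period to whichever
path looked least loaded in the previous period. This is not the PSN
routing algorithm; the function names, the three-path setup, and the
demand of 90 units are all hypothetical.

```python
def least_loaded_routing(num_paths: int, demand: float, rounds: int):
    """Each round, send all traffic to the path that was least loaded last round."""
    loads = [0.0] * num_paths
    loads[0] = demand
    history = [loads[:]]
    for _ in range(rounds):
        best = min(range(num_paths), key=lambda i: loads[i])
        loads = [0.0] * num_paths
        loads[best] = demand
        history.append(loads[:])
    return history


def equal_cost_routing(num_paths: int, demand: float):
    """Treat all paths as equivalent and split the traffic evenly."""
    return [demand / num_paths] * num_paths


history = least_loaded_routing(num_paths=3, demand=90.0, rounds=4)
peak_when_chasing = max(max(loads) for loads in history)  # full demand on one path
shared = equal_cost_routing(num_paths=3, demand=90.0)     # 30.0 on each path
```

     In the toy model the traffic keeps jumping between paths and one
path always carries the full load, whereas treating the paths as
equivalent keeps each at one third of it.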

     There is still the possibility that we are running short of
end-to-end resources. We are currently measuring the utilization of
these resources to see whether this is the case. If we are short of
these resources, there may be easy remedies to this in the PSN
software.

     Our efforts over the last few weeks have concentrated on the
modelling work. We have not had the opportunity to collect or study
global network statistics in order to understand what has changed in
the last month. John Wiggins and Clive Greenleaf have
begun this collection today. Simple questions which should be
answered are: 1) where has traffic increased? 2) are gateways using
the network differently? 3) are we seeing large amounts of internet
control traffic as we have in the past? 4) would the addition of
mailbridges improve the situation? 5) should homings of hosts to
mailbridges be changed?

     In summary, we should pursue the following:

1) A link from MIT77 to SRI51 should be added.

2) Node capacity should be added at ISI27, UWISC, RCC5, UCLA, SRI51.

3) The planned addition of resources should be accelerated.

4) TAC 113 should be installed.

5) The network parameters should be adjusted in order to result in more
even sharing of the cross-country bandwidth, should statistics confirm
that routing oscillations are occurring.

6) The Wideband Network experiment should be conducted as soon as
possible.

7) Additional statistics should be collected in order to shed light on
the underlying causes of the congestion. We will let results be known
as soon as we have them.

8) The Purdue to Wisconsin link should be restored asap.

----- End of forwarded messages
