Re: Maintaining Statistics for TCP/IP Implementations


John B. Nagle (jbn@glacier.stanford.edu)
Fri, 19 Dec 86 21:23:40 pst


     Much of what I learned about congestion in the Internet I learned by
instrumenting a TCP implementation. The information that you need is
not necessarily the information that a typical implementation keeps.
Yet as it turns out, collecting this information is quite inexpensive.
Management of the exceptional cases is the crucial issue.

     During the life of a TCP connection, it is useful to maintain some
event counts, and at the conclusion of the connection, it is useful to
generate a log entry of some form, at least for connections that meet
some criteria.

     When a packet is received, there are several possibilities as to
its disposition. The most useful (not, unfortunately, always the most
common) case is that it contains new and acceptable data, an ACK that
acknowledges previously unacknowledged data, or a window update that
advances the window. This case must of course be handled efficiently.
Packets which change the state of the connection are also useful, but
efficiency is less of an issue. But packets which do none of these
things are redundant; they represent an error somewhere in the system.
It is immensely useful to count the useful packets over the life of a
connection. My criterion was that if fewer than 95% of the packets
received over the life of a connection were useful (allowing for at
least 5 non-useful packets on short sessions to handle startup issues),
then a log entry should be generated to indicate trouble.
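
     One way to sketch the useful-packet criterion, in Python for
brevity. The class and method names are hypothetical, and the reading of
the allowance (never log a connection with 5 or fewer non-useful packets)
is an assumption about how the rule was meant to be applied, not a record
of the original implementation.

```python
from dataclasses import dataclass


@dataclass
class ConnStats:
    """Per-connection event counts (hypothetical names)."""
    received: int = 0  # every packet received on the connection
    useful: int = 0    # new data, new ACK, window advance, or state change

    def count(self, is_useful: bool) -> None:
        """Call once per received packet, after its disposition is known."""
        self.received += 1
        if is_useful:
            self.useful += 1

    def should_log(self, min_fraction: float = 0.95, grace: int = 5) -> bool:
        """Decide at connection close whether to generate a log entry."""
        wasted = self.received - self.useful
        if wasted <= grace:
            # Assumed reading: short sessions get a fixed allowance.
            return False
        return self.useful < min_fraction * self.received
```

     The per-packet cost is two increments, which fits the point that
collecting this information is inexpensive.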

     Reading such a log is an edifying experience. The most notable fact
about such a log is that certain machines are represented all out of
proportion to the amount of traffic they generate. One of course logs
the identities of the hosts involved in the connections. A log entry here
corresponds to "dropping a trouble ticket" in a telephone central office;
it indicates something to be fixed. Enough said.

     One also wants to keep a tally of retransmission attempts; again, if
the number of retransmitted packets is large relative to the traffic sent
over the life of the connection, something is wrong and this should be
noted. Of course, if a connection closes abnormally, one logs that fact
for later analysis.

     It is also useful to log rejected packets. Find all those places
in your TCP where you decide to drop a packet because it is "bad", and
make them calls to a routine that logs the packet with an error code.
One turns up all sorts of dirty laundry that way.
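
     A minimal sketch of such a routine, in Python. The error codes shown
are illustrative assumptions, not a list taken from any particular TCP,
and the in-memory log stands in for whatever logging facility the host
provides.

```python
from collections import Counter
from enum import Enum


class DropReason(Enum):
    """Hypothetical error codes; a real TCP would have its own list."""
    BAD_CHECKSUM = 1
    OUT_OF_WINDOW = 2
    NO_CONNECTION = 3


drop_counts = Counter()  # tally per error code
drop_log = []            # one entry per rejected packet


def drop_packet(src: str, dst: str, reason: DropReason, header: bytes = b"") -> None:
    """Record a rejected packet; call this wherever the TCP discards one."""
    drop_counts[reason] += 1
    # Keep only the first 40 bytes so the log stays small (assumed limit).
    drop_log.append((src, dst, reason.name, header[:40]))
```

     Routing every discard through one routine means no rejection path
goes unrecorded, and the tally per error code shows which kind of
"bad" packet dominates.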

     The number of ICMP Source Quenches received is also quite useful;
again, large values compared to the volume of data traffic are significant.
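
     The retransmission and Source Quench tallies can be checked the same
way at connection close, as a count compared with traffic volume. This is
a sketch only: the function name is hypothetical, and the limit is left
as a required parameter because no threshold is given here.

```python
def ratio_exceeds(count: int, volume: int, limit: float) -> bool:
    """True when count/volume is above limit; False if there was no traffic."""
    if volume <= 0:
        return False
    return count / volume > limit
```

     For example, ratio_exceeds(source_quenches, packets_sent, limit)
with a site-chosen limit would flag a connection for logging.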

     When I operated a VAX with such logging two years ago, there would
be five or six connections logged as bad when the network was operating
properly; there might be hundreds when something was wrong. That's how
I managed to make a large network based on slow links work properly.

     It is worth thinking about how one might report such data in a standard
way to a network monitor node. Something that generated one datagram per
"bad" TCP connection might be quite useful; some would of course get lost
but serialization would allow the network monitor to detect this, and
statistical techniques could be used to compensate for the lost data.
You do need to log a measure of the total data transmitted in each
direction on the connection, and log entries should also contain cumulative
information about the total amount of data and total number of connections
so that statistical computations can be made.
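
     A rough sketch of what such a report and the monitor's loss
detection might look like, in Python. The field names are assumptions,
no wire format is implied, and the tracker handles reports from a single
reporting host.

```python
from dataclasses import dataclass


@dataclass
class BadConnReport:
    """One report per "bad" connection (hypothetical field names)."""
    serial: int             # increases by one per report from this host
    local_host: str
    remote_host: str
    bytes_sent: int         # data sent on this connection
    bytes_received: int     # data received on this connection
    total_bytes: int        # cumulative data over all connections so far
    total_connections: int  # cumulative connection count so far


class ReportTracker:
    """Monitor-side bookkeeping for one reporting host."""

    def __init__(self) -> None:
        self.seen = set()  # serial numbers that actually arrived

    def receive(self, report: BadConnReport) -> None:
        self.seen.add(report.serial)

    def lost(self) -> int:
        """Gaps in the serial sequence between the lowest and highest seen.

        Reports lost before the first or after the last arrival are not
        detectable from serial numbers alone.
        """
        if not self.seen:
            return 0
        return max(self.seen) - min(self.seen) + 1 - len(self.seen)
```

     Tracking the set of serials rather than only the last one keeps the
count correct when datagrams arrive out of order or duplicated.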

     One needs this information to manage a network. With it, one can
manage one's network and make it perform well. Without it, one can just
grumble and make excuses.

                                John Nagle


