more (possibly too many) thoughts on systems of gateways

Charles Hedrick (
Tue, 14 Jan 86 00:52:23 est

Since my posting yesterday, I have given a bit more thought to the
issue of keeping track of network topology. I got several responses
acknowledging that the issue was an important and difficult one, but
none proposing any real solutions. So it seemed worth putting a bit
more thought into the issue. While I haven't come up with any
startling innovations, I think I see a couple of approaches that would
work. First, let me start by enumerating the possibilities that I
have seen. We have several issues. The first is how hosts keep track
of what gateways are up. The second is how hosts keep track of
changes in gateway status. The third is how hosts know what gateways
exist. Of course these are not orthogonal.

Keeping track of what gateways are up:
  pinging - every host sends an echo request to every gateway that
        it knows about every 30 sec. or so. Most people consider
        this unacceptable because it generates too much network
        traffic. TOPS-20 does this, though with an interval of
        several minutes. I believe it must be done every 30 sec.,
        because we have to be able to discover that a gateway is
        down in time to move to another one before connections start
        timing out or users start thinking that the system is down.
  gateway broadcast - every gateway sends a broadcast every 30 sec.
        For a network that supports broadcasts, this gives as good
        results as pinging, but the number of packets is far smaller.
        PUP and XNS gatewayinfo do this. So does Unix routed. The
        only disadvantage I can see is that it only works if the
        network supports broadcasts, and that it may not be so good
        for single-process systems (e.g. IBM PC). On an IBM PC, you
        can't just have a daemon sitting there keeping track of what
        networks are up. Telnet would have to wait a minute or two
        gathering gateway information before starting to make the
        connection.
  host broadcast - when a host wants to make a connection, it sends
        a broadcast asking for any gateways to a certain host to
        respond. This is effectively done now by ARP-hacking gateways.
        Since an ARP is needed anyway to initiate a connection, it
        adds no overhead. This strategy is appropriate for single-
        process systems. The only disadvantage I can think of is that
        it only works on media that support broadcast. Note that in
        a complex network, this strategy requires that the gateways
        have some other way to keep track of each other. They must
        arrange things so that only the preferred gateway will respond
        to an ARP.
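
As a rough illustration of the gateway broadcast approach, here is a
minimal Python sketch of the table a host daemon might keep. The class
name, the method names, and the rule of declaring a gateway down after
three missed broadcasts are all assumptions made for the example; they
are not taken from routed or any other implementation.

```python
import time

BROADCAST_INTERVAL = 30.0   # seconds between gateway broadcasts (from the text)
MISSED_BEFORE_DOWN = 3      # assumption: tolerate a few lost broadcasts


class GatewayTable:
    """Tracks which gateways are up, based only on broadcasts heard from them."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_heard = {}   # gateway address -> time of last broadcast

    def heard_broadcast(self, gateway):
        """Record that a broadcast from `gateway` has just arrived."""
        self._last_heard[gateway] = self._clock()

    def is_up(self, gateway):
        """A gateway counts as up if it has been heard from recently enough."""
        last = self._last_heard.get(gateway)
        if last is None:
            return False
        return self._clock() - last < BROADCAST_INTERVAL * MISSED_BEFORE_DOWN

    def live_gateways(self):
        """Return the gateways currently believed to be up, in sorted order."""
        return sorted(g for g in self._last_heard if self.is_up(g))
```

The host sends nothing at all here, which is why the packet count
grows with the number of gateways rather than with hosts times
gateways as it does for pinging.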

Keeping track of changes. These techniques would normally be combined
with those above.
  timeouts - when a connection times out, one has a good suspicion
        that some part of the current route is down. What to do
        about it depends upon which of the above strategies one is
        using. If you are using pinging or gateway broadcast,
        strictly speaking you don't need to do anything about timeouts.
        4.3 uses timeouts because 4.3 establishes a route when a
        connection is opened. Even if routed has figured out that
        the gateway involved is down, the connection will still try
        to use it. A timeout triggers the system to reexamine the
        route, using its latest gateway information. On TOPS-20,
        this is not needed, since the route is recomputed for each
        packet sent. If you are depending upon a host broadcast
        (e.g. ARP), a timeout should cause the current route (in this
        case ARP table entry) to be removed, so that the host sends
        another broadcast to look for a new route. Note that timeouts
        do not totally solve the problem of detecting down gateways,
        if we have traffic to some gateway (or for ARP-based schemes,
        host) that is not connection-oriented. That is, UDP-based
        protocols may not have a concept of timeout, or may find it
        hard to feed back information about timeouts to lower levels
        of the system.
  ICMP redirect - depending upon the design, the system may not know
        when a better route has become available. Again, TOPS-20
        always will, because it recomputes routes each time, and
        continually pings all gateways. But 4.2 will not change
        routes during a connection. And a system that depends upon
        the ARP hack probably doesn't have enough information to do
        so either. So one can arrange for gateways to keep track of
        each other, and to issue an ICMP redirect if a better route
        becomes available. Note that this does not necessarily
        require the host to keep track of gateway information. If
        all of the gateways do the ARP hack, a host can process an
        ICMP redirect simply by removing the ARP table entry for
        the destination host involved.
  ARP table expiration - Unix expires entries in the ARP table after
        N minutes of non-use. This is primarily intended to keep
        down the number of entries in the ARP table. However in
        theory this could be used to keep routing up to date. If
        we expired entries even when they are in use, it would
        force a new ARP request. This would (we hope) come back
        with the latest routing, taking into account any gateways
        that have come up or gone down. The problem with doing
        this is that it would increase the number of ARP requests.
        If we only use it to discover better routes, we could afford
        to do it fairly infrequently, say once every 30 min. If we
        depend upon it to discover gateways that are down, we probably
        have to do it every 30 sec. This is likely to cause results
        that are about as bad as pinging. It would also interfere
        with performance, since our experience shows that waiting
        for an ARP causes a noticeable pause in telnet. Doing this
        once every 30 min is not likely to cause a significant
        load. Suppose we have 256 hosts on a subnet, each talking
        to 4 other hosts at a time (this is probably a gross
        overestimate for any real network). That is about 1000 ARP
        requests in 1800 sec. This is a packet rate of roughly
        0.6 per second. That should be tolerable. However the
        requests are probably not going to be random. There may be
        a tendency for them to cluster, due to the fact that all of
        the systems will have been rebooted at the same time (the
        last power failure).
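
The load estimate for ARP table expiration can be checked with a few
lines of Python. This is only the average rate under the assumptions
stated in the text (256 hosts, 4 peers each); the function name is
made up for the example, and clustering after a simultaneous reboot
would make the peak rate higher than the average shown.

```python
def arp_refresh_rate(hosts, peers_per_host, expiry_seconds):
    """Average ARP requests per second if every in-use entry
    is re-requested once per expiry period."""
    return hosts * peers_per_host / expiry_seconds


rate_30min = arp_refresh_rate(256, 4, 30 * 60)
rate_30sec = arp_refresh_rate(256, 4, 30)

print(f"30 min expiry: {rate_30min:.2f} requests/sec")   # 0.57
print(f"30 sec expiry: {rate_30sec:.2f} requests/sec")   # 34.13
```

The factor of 60 between the two cases is the whole argument:
expiring entries often enough to detect dead gateways costs about as
much as pinging would.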

Knowledge of network topology.
  builtin tables - this is fairly common, but with a large network
        it becomes a pain to update all the tables.
  gateway broadcasts - the gateway broadcast strategy mentioned
        above also solves this problem, since it allows the host
        to discover what gateways exist simply by monitoring
        the broadcasts.
  host broadcasts - the host broadcast strategy mentioned above
        also solves this problem, since the host no longer has to
        know the network topology. When it needs to make a connection
        it broadcasts a request and the gateways have to figure out
        who should respond. To use changes in topology, this should be
        combined with ICMP redirects when a better route becomes
        available.
  try a random gateway - TOPS-20 keeps a table with a small number
        of "prime" gateways. When it wants to make a connection,
        and none of the currently known gateways is right for the
        job, it chooses a random prime gateway. This gateway is
        expected to know about all of the others, and to issue an
        ICMP redirect to the right one. However this only works
        if one knows which of the prime gateways are up. TOPS-20
        uses pinging. Any other solution to the problem of knowing
        which gateway is up will also solve the problem of knowing
        what gateways there are, so this strategy is probably not
        terribly useful.
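
The "try a random gateway" strategy can be sketched as follows. This
is an illustration of the idea as described above, not TOPS-20's
actual code; the function names, the `is_up` callback, and the use of
a dictionary for known routes are assumptions.

```python
import random


def choose_gateway(destination_net, known_routes, prime_gateways, is_up,
                   rng=random):
    """Pick a first-hop gateway for `destination_net`.

    Use a known route if its gateway is up; otherwise fall back to a
    random prime gateway that is up and rely on it to redirect us.
    Returns None if no usable gateway is known.
    """
    gateway = known_routes.get(destination_net)
    if gateway is not None and is_up(gateway):
        return gateway
    candidates = [g for g in prime_gateways if is_up(g)]
    if not candidates:
        return None
    return rng.choice(candidates)


def handle_redirect(destination_net, better_gateway, known_routes):
    """Remember the gateway named in an ICMP redirect for later packets."""
    known_routes[destination_net] = better_gateway
```

Note that `is_up` has to come from somewhere (pinging, in the TOPS-20
case), which is exactly the dependency that makes this strategy less
useful than it first appears.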

Some choices are clear:
  - we probably don't want pinging to be the primary method of
        keeping track of the network.
  - ARPs are probably the only reasonable way for single-process
        machines to find out about the network, since they can't
        be expected to have daemons that keep track of topology.
        This implies that all gateways should be expected to
        support the "ARP hack", even when subnetting is general

Now the question is whether we also want the gateways to broadcast,
a la routed. My initial reaction is that if we can come up with
a mechanism based on ARPs that will solve all of our problems, there
is no need to run routed or its equivalent on each host. So first,
let's look at a design based on ARPs, and no protocol like routed.
  - connections are initially established by issuing an ARP
        request. The gateways arrange to answer these in such a
        way as to give an optimal route.
  - when a connection times out, the ARP table entry for the host
        involved is removed. This forces a new ARP for the next
        packet to be sent.
  - when a better route becomes available, it would be helpful
        if the gateway currently being used issues an ICMP
        redirect. Because of the timing out of ARP table
        entries, this is not completely necessary.
  - if non-connection-oriented protocols are being used (so that
        timeouts are not possible), or if it is not practical
        for gateways to issue ICMP redirects when a better route
        becomes available, ARP table entries must expire after
        30 minutes.
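
The four rules above can be sketched as a small ARP cache in Python.
This is a minimal sketch under stated assumptions: `send_arp_request`
is a hypothetical callback standing in for the real broadcast (it
returns a hardware address, or None if nobody answers), and the class
and method names are invented for the example.

```python
import time

ENTRY_LIFETIME = 30 * 60.0   # seconds; the 30 minute expiry from the text


class ArpCache:
    """Host-side ARP table for the ARP-only design."""

    def __init__(self, send_arp_request, clock=time.monotonic):
        self._send_arp_request = send_arp_request   # hypothetical callback
        self._clock = clock
        self._entries = {}   # ip -> (hardware address, time learned)

    def resolve(self, ip):
        """Return the hardware address to use for `ip`, ARPing if needed."""
        entry = self._entries.get(ip)
        if entry is not None:
            hw, learned = entry
            if self._clock() - learned < ENTRY_LIFETIME:
                return hw
            del self._entries[ip]   # expired even though it was in use
        hw = self._send_arp_request(ip)   # whichever gateway is preferred answers
        if hw is not None:
            self._entries[ip] = (hw, self._clock())
        return hw

    def on_connection_timeout(self, ip):
        """A timeout suggests the route is dead: force a fresh ARP."""
        self._entries.pop(ip, None)

    def on_icmp_redirect(self, ip):
        """A redirect means a better route exists: force a fresh ARP."""
        self._entries.pop(ip, None)
```

All of the routing intelligence stays in the gateways; the host only
decides when to ask again.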

This mechanism is obviously not sufficient for hosts with more than
one Ethernet interface, since they have no way to choose which
interface to use. ARPs don't help, since in general some other
gateway will probably be able to find a route to any host on any
subnet, so there will be responses to ARP requests on both interfaces.
However a host with more than one interface is effectively a gateway.
It should participate in whatever protocol is used among the gateways,
probably EGP or routed.

There are several reasons why one might prefer some other mechanism
for hosts that are capable of running daemons:
  - if UDP-based protocols are in heavy use, it may be impractical to
        detect down gateways by depending upon timeouts. For Suns,
        the Network File System is critical, and that uses UDP.
        While NFS does have a concept of timeout, our experience
        shows that timeouts may indicate a number of conditions
        other than routing failures. It is not clear whether it
        would be appropriate to clear ARP table entries when there
        is an NFS timeout.
  - one may believe that it is not practical to implement ICMP
        redirect in the gateways when a better route becomes
        available, and that the overhead of expiring ARP entries
        is unacceptable.

If one decides that some scheme other than having the host broadcast
requests is needed, it seems clear that the best alternative is
to have the gateway broadcast the fact that it is up. In that case,
routed seems to make a lot of sense. It is widely implemented, and
seems to do what needs to be done. In a Unix implementation, one also
needs a way to force routes to be recomputed when there is a change in
the gateway table. The method in 4.3 seems to depend upon timeouts.
I suspect it might be better to have an IOCTL that routed could do to
invalidate routes (either all routes whenever a topology change
happens, or some slightly more selective method).
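
One way to picture the "invalidate all routes" variant is a
generation counter on the routing table. The Python sketch below is
purely illustrative: it is not how 4.3 works, the class names are
invented, and `update` stands in for whatever call (an IOCTL or
otherwise) the routing daemon would make.

```python
class RoutingTable:
    """Gateway table maintained by a routing daemon such as routed."""

    def __init__(self):
        self.generation = 0
        self._next_hop = {}   # destination network -> gateway

    def update(self, network, gateway):
        """Called by the daemon on a topology change."""
        self._next_hop[network] = gateway
        self.generation += 1   # invalidates every cached route at once

    def lookup(self, network):
        return self._next_hop.get(network)


class Connection:
    """A connection that caches its route, as 4.2 and 4.3 do."""

    def __init__(self, table, network):
        self._table = table
        self._network = network
        self._cached = None
        self._cached_generation = -1   # forces a lookup on first use

    def next_hop(self):
        """Reuse the cached route unless the table has changed since."""
        if self._cached_generation != self._table.generation:
            self._cached = self._table.lookup(self._network)
            self._cached_generation = self._table.generation
        return self._cached
```

With this arrangement an open connection picks up a new gateway on
its next packet instead of waiting for a timeout.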

Unfortunately, the problem I have to solve is not just picking the
combination of strategies that I like the best. I also have to be
able to live with existing TCP/IP implementations. Currently Rutgers
is using 4.2 (Sun, Pyramid, Celerity, Ultrix), TOPS-20, DG, Symbolics,
Bridge, ... We only have source to some of these, and even where we
do have source, it may not be desirable to do major network development
work. If we are unable to change the host implementation, then
the advice to a gateway designer is pretty much the obvious:

1) Do the best one can for hosts that will depend upon ARPs to
discover routing. This means trying to coordinate gateways so
that only the best one responds.

2) Enough systems use code based on 4.2, and routed is a reasonable
enough way of doing things, that it probably makes sense for the
gateways to implement routed.

3) One should probably try to get gateways to issue ICMP redirects
whenever appropriate. However it is not clear which existing
implementations this is going to help. Certainly it would help
TOPS-20. Existing systems that use ARP are pretending that all of
the hosts are directly connected, so an ICMP redirect is going to be
irrelevant to them. For Unix systems, ICMP redirect doesn't add much
to what routed already provides (and indeed may even confuse it, if
routed thinks it is managing the gateway tables). Circumstances where
ICMP redirects could be generated are when a packet is sent to a
gateway that knows it is not the best route. Len Bosack at Stanford
suggests that gateways should have a command that says we are about to
shut them down. In that case, they can start issuing ICMP redirects
to an alternate. (However one has to be careful to avoid loops. If
the alternate doesn't know you are shutting down, and it is a less
preferred route, it may issue a redirect right back to you.)
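
The redirect loop in the shutdown case is easy to see in a small
Python sketch. The function and the gateway names are hypothetical;
`redirect_target` simply maps each gateway to the gateway it would
redirect a host to, with gateways that forward the packet themselves
left out of the map.

```python
def follow_redirects(first_gateway, redirect_target):
    """Follow ICMP redirects from gateway to gateway.

    Returns (path, looped): the gateways visited in order, and whether
    a redirect pointed back at a gateway already tried.
    """
    path = [first_gateway]
    seen = {first_gateway}
    current = first_gateway
    while current in redirect_target:
        current = redirect_target[current]
        path.append(current)
        if current in seen:
            return path, True   # bounced back to a gateway already tried
        seen.add(current)
    return path, False


# Gateway A is shutting down and redirects to B, but B does not know
# that and still regards A as the better route.
print(follow_redirects("A", {"A": "B", "B": "A"}))   # (['A', 'B', 'A'], True)

# Once B has been told, it forwards the traffic itself.
print(follow_redirects("A", {"A": "B"}))             # (['A', 'B'], False)
```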
