Re: Response to anti-bridge comments


Drew Daniel Perkins (ddp#@andrew.cmu.edu)
Tue, 31 Mar 87 01:07:28 est


>At last count, the routing tables on the Translans showed something like
>600 active hosts.

I'm not sure this is quite a large enough net for my statements to apply.
It's probably the case that yours is a network where level 2 bridges are
appropriate for your current needs. I wouldn't agree that they will be
appropriate for future needs, though.

>We generally see 1-2 per second, which is expected and entirely acceptable.

The arp rate a particular net sees is probably going to be quite dependent on
the type of applications being used on it. Over 60% of the IP traffic on our
network is from the Andrew distributed file system (something like NFS in
functionality). Each workstation on the network regularly communicates with
quite a few different servers and other hosts. I would suspect that the ARP
rate from distributed applications like this is much higher than on networks
running only your basic telnet, ftp and smtp.

>If you see 20 per second, then you've got something wrong somewhere.

>I've found that bursts of ARP requests are usually caused by hosts who
>respond inappropriately to UDP broadcasts to bogus IP addresses.

Luckily I think we have the bad UDP broadcast problem under control. That
definitely is not what causes the majority of our ARPs. There's nothing
really wrong (that we've been able to find) besides the problems with 4.2
UNIX itself. Actually most of our ARPs seem to be from hosts that
relentlessly try to open connections to machines which are down. It
gets especially bad when an important server goes down. When 500 hosts are
trying to talk to a file server that has been down for a while... Luckily
the file system knows about exponential backoff.

One of the problems we found a while ago is that the default size of an ARP
cache in 4.2 is not at all appropriate for a server machine which
communicates with LOTS of machines. The default is for a cache of 5x19
entries, i.e. there are 19 possible hash values and only 5 hosts can hash to
each one. In the worst case, if you are trying to communicate sequentially
with 6 hosts which all hash to the same value, you can end up sending one ARP
request packet per IP packet/transaction. For our file servers, which
regularly communicate with 300 hosts in a 10-minute period, this is
definitely not appropriate... I think we made the cache 10x99 instead.
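
To make the worst case concrete, here is a rough sketch of a bucketed
cache. It is a toy model, not the 4.2 code: the hash (host number modulo
the bucket count), the oldest-entry replacement policy and the names
BucketedArpCache and arp_requests are all assumptions made for
illustration.

```python
from collections import OrderedDict


class BucketedArpCache:
    """Toy model of a hashed ARP cache: `buckets` chains of `slots` entries.

    Assumptions (not taken from the 4.2 source): the hash is simply
    host % buckets, and a full bucket evicts its least recently used entry.
    """

    def __init__(self, buckets=19, slots=5):
        self.buckets = buckets
        self.slots = slots
        self.table = [OrderedDict() for _ in range(buckets)]
        self.requests = 0  # number of ARP requests we had to send

    def send(self, host):
        """Send one packet to `host`, issuing an ARP request on a cache miss."""
        bucket = self.table[host % self.buckets]
        if host in bucket:
            bucket.move_to_end(host)  # hit: mark as most recently used
            return
        self.requests += 1            # miss: one ARP request goes out
        if len(bucket) >= self.slots:
            bucket.popitem(last=False)  # evict the least recently used entry
        bucket[host] = True


def arp_requests(hosts, rounds, buckets=19, slots=5):
    """Talk to each host in turn, `rounds` times; return ARP requests sent."""
    cache = BucketedArpCache(buckets, slots)
    for _ in range(rounds):
        for host in hosts:
            cache.send(host)
    return cache.requests


if __name__ == "__main__":
    colliding = [19 * i for i in range(6)]  # six hosts, all in bucket 0
    print("5x19 cache: ", arp_requests(colliding, rounds=10))
    print("10x99 cache:", arp_requests(colliding, rounds=10, buckets=99, slots=10))
```

Under these assumptions the six colliding hosts miss on every single
packet with the 5x19 table, while the 10x99 table only misses the first
time it sees each host.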

The crux of the problem though is that with level 2 bridges, your ARP rate
(or the rate of any kind of multicast/broadcast) rises LINEARLY with the
number of hosts connected on ANY subnet of the network. With level 3
routers, however, the rate only rises linearly with the number of hosts
connected to your particular subnet.
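
As a back-of-the-envelope illustration of that scaling, assume every host
generates broadcasts at the same average rate. The function name, the
per-host rate and the subnet sizes below are made up for the example and
are not measurements from either of our networks.

```python
def broadcast_rate_seen(per_host_rate, subnet_sizes, my_subnet, bridged):
    """Broadcasts per second arriving at a host on subnet `my_subnet`.

    per_host_rate: assumed average broadcasts/second generated by each host.
    subnet_sizes:  number of hosts on each subnet.
    bridged:       True for level 2 bridges (broadcasts cross every subnet),
                   False for level 3 routers (broadcasts stay on the subnet).
    """
    if bridged:
        return per_host_rate * sum(subnet_sizes)
    return per_host_rate * subnet_sizes[my_subnet]


if __name__ == "__main__":
    subnets = [100] * 6  # hypothetical: six subnets of 100 hosts each
    rate = 0.01          # hypothetical: one broadcast per host per 100 seconds
    print("bridged:", broadcast_rate_seen(rate, subnets, 0, bridged=True))
    print("routed: ", broadcast_rate_seen(rate, subnets, 0, bridged=False))
```

Adding a seventh subnet raises the bridged figure for everybody but
leaves the routed figure for the existing subnets unchanged.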

Drew


