Response to anti-bridge comments


Phil R. Karn (karn@flash.bellcore.com)
Thu, 26 Mar 87 15:20:26 est


As one who has just helped construct a "large" bridged network, I think a
few comments based on actual experience might be useful.

First, a description. Bellcore has five major locations in north central New
Jersey. We lease T1 circuits organized as a star with the hub at
Piscataway, the geographically central location. These circuits are divided
down with synchronous multiplexors into various sized channels for things
like Micom terminal switch trunks and IBM RJE links. At the moment, 256
kbps of this capacity connects a set of five Vitalink Translan IIIs as a
star with the hub at Piscataway. Each of these boxes also connects to the
building-wide backbone Ethernet at its location, thus bridging the locations
together at the link level. Within each location almost all of our Ethernets
are bridged with DEC LanBridge 100s, with the fiber version interconnecting
multiple buildings at a location. At last count, the routing tables on the
Translans showed something like 600 active hosts. Virtually all of these
hosts speak DoD IP; most are 4.2BSD derivatives. A few Symbolics machines
speak Chaosnet. As far as I'm aware we have no DECNET or XNS traffic, other
than that spoken by the Translans and LanBridges themselves.

And it all works, and works well! I hardly ever look at the boxes anymore.
We had one infant mortality after we installed them in late December: a
power supply died after 24 hours in operation. Vitalink immediately shipped
out a replacement which arrived a day later, and the boxes have all been
solid since. Two other outages were due to people kicking out Ethernet
cables, but this is a generic Ethernet problem and isn't Vitalink's fault (I
do hope they'll come out with screw-on connectors, though). With our switch
to bridging, the reliability and availability of intra-company networking
has improved enormously over what it was when we used general purpose UNIX
machines (VAXes and Sun fileservers) as IP gateways. True, it's a bit
unfair to compare standalone boxes with general purpose systems with disks
that must also do other things. But there were enough RIP/routed screwups
that I once seriously considered running everything with static routing.
Even now our remaining IP routers get screwed up occasionally and they have
to be restarted. But at least when this happens it doesn't affect our
intra-company communications, which are most important. And nobody has to
renumber their hosts when they move from one physical cable to another,
which is an ENORMOUS practical advantage in a place as big as this one.
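
The reason no renumbering is needed can be illustrated with a toy model of a learning bridge. This is only a sketch of the general technique, not the actual Translan or LanBridge logic; the class name, port names, and Ethernet addresses are all made up for the example.

```python
# Toy model of a transparent (learning) bridge. Port names and addresses are
# hypothetical; this is not how a Translan or LanBridge is implemented.

class LearningBridge:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}          # Ethernet address -> port it was last seen on

    def receive(self, src, dst, in_port):
        """Return the set of ports the frame should be sent out on."""
        # Learn (or re-learn) which port the sender is reachable through.
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown or broadcast destination: flood to every other port.
            return self.ports - {in_port}
        if out_port == in_port:
            # Destination is on the arrival side: filter, do not forward.
            return set()
        return {out_port}


bridge = LearningBridge(["piscataway", "site_a", "site_b"])
HOST = "08:00:20:00:00:01"
PEER = "08:00:2b:00:00:02"
BROADCAST = "ff:ff:ff:ff:ff:ff"

bridge.receive(HOST, BROADCAST, "site_a")            # host is heard at site_a
first = bridge.receive(PEER, HOST, "piscataway")     # forwarded toward site_a
bridge.receive(HOST, BROADCAST, "site_b")            # host moves to site_b
second = bridge.receive(PEER, HOST, "piscataway")    # now forwarded to site_b
```

The host keeps the same Ethernet and IP addresses across the move; only the bridge's table entry changes, and it changes as soon as the host transmits from its new location.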

All this is not to say we haven't had our problems. I do monitor the ARP
broadcast traffic from time to time. We generally see 1-2 requests per
second, which is expected and entirely acceptable. If you see 20 per second,
then you've got something wrong somewhere. I've found that bursts of ARP
requests are usually caused by hosts that respond inappropriately to UDP
broadcasts sent to
bogus IP addresses. The trigger is generally a Microvax, since Ultrix 1.2
allows you to set the net mask and the broadcast address, thereby allowing
you to get it wrong. (I just can't wait until Sun also supports subnetting
and broadcast address selection.) Although the problem clearly gets worse
as you build larger bridged networks, YOU CAN'T BLAME IT ON BRIDGING!!! If
there weren't so many broken TCP/IP implementations out there the problem
wouldn't exist in the first place. Nevertheless, my usual tactic has been
to place an entry in the appropriate Translan to isolate the offending host
until the user can fix it; this "twit listing" feature is very helpful.
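
The kind of monitoring described above can be sketched as a per-second count of ARP requests. This is a minimal sketch, assuming the requests have already been captured as (timestamp, sender) pairs by some other tool; the function name, the record format, and the sample data are hypothetical, and the default threshold of 20 is simply the figure quoted above.

```python
from collections import Counter

def arp_bursts(requests, threshold=20):
    """requests: iterable of (timestamp_seconds, sender_address) pairs.

    Returns {second: (count, busiest_sender)} for each one-second bucket
    whose ARP request count reaches the threshold.
    """
    per_second = Counter()
    senders = {}
    for ts, sender in requests:
        bucket = int(ts)                      # one-second bucket
        per_second[bucket] += 1
        senders.setdefault(bucket, Counter())[sender] += 1
    bursts = {}
    for bucket, count in per_second.items():
        if count >= threshold:
            busiest = senders[bucket].most_common(1)[0][0]
            bursts[bucket] = (count, busiest)
    return bursts

# Made-up sample: two quiet seconds, then a burst dominated by one sender.
sample = [(0.1, "hostA"), (0.7, "hostB"), (1.2, "hostA")]
sample += [(2.0 + i / 100, "hostC") for i in range(24)]
sample += [(2.5, "hostB")]
result = arp_bursts(sample)
```

The busiest sender in a burst is only a starting point for investigation. As noted above, the hosts sending the ARP requests may be reacting to a broadcast from some other, misconfigured machine.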

You discover other interesting things when you build a large bridged
network. For example:

1. It seems that every CCI Power 6 machine as shipped comes with the same
Ethernet address. We didn't notice until we started bridging networks
together, but you can't exactly blame it on the use of bridging.

2. Some older 4.2BSD machines seem to respond with wrong answers to RARP
requests from Sun workstations, keeping them from booting.

3. We made an aggressive effort to turn off rwho daemons, bringing UDP
broadcasting to an acceptable level. (Many people find this necessary even
when bridges aren't used.) With fewer IP gateways, the amount of RIP traffic
has stayed fairly modest.

4. Pyramids seem to respond to every ARP request they hear, regardless of
whether they were the target or not. Fortunately they respond with correct
(but irrelevant) information, so this is just a minor annoyance.
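
Two of the problems in this list (items 1 and 4) can be looked for mechanically once ARP traffic has been recorded. The following is a rough sketch under stated assumptions: the function names, record formats, and addresses are invented for the example, and an Ethernet address that legitimately answers for several IP addresses (a proxy ARP gateway, for instance) would show up as a false positive in the first check.

```python
from collections import defaultdict

def shared_ethernet_addresses(sightings):
    """sightings: iterable of (ethernet_address, ip_address) pairs, e.g.
    taken from the sender fields of observed ARP packets.

    Returns {ethernet_address: sorted list of IPs} for every Ethernet
    address that appears with more than one IP address.
    """
    ips_by_mac = defaultdict(set)
    for mac, ip in sightings:
        ips_by_mac[mac.lower()].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}

def off_target_repliers(target_ip, replies):
    """replies: iterable of (sender_ip, sender_ethernet) pairs received in
    answer to one ARP request for target_ip.

    Returns the repliers whose own IP address is not the one asked about.
    """
    return [(ip, mac) for ip, mac in replies if ip != target_ip]

# Hypothetical data: two machines sharing one Ethernet address.
sightings = [
    ("02:00:00:00:00:01", "10.0.0.5"),
    ("02:00:00:00:00:01", "10.0.0.9"),   # second machine, same address
    ("02:00:00:00:00:02", "10.0.0.7"),
    ("02:00:00:00:00:02", "10.0.0.7"),   # repeat sighting, not a conflict
]
dupes = shared_ethernet_addresses(sightings)

# Hypothetical data: one request for 10.0.0.7 draws two replies.
replies = [
    ("10.0.0.7", "02:00:00:00:00:02"),   # the real target
    ("10.0.0.9", "02:00:00:00:00:01"),   # answered although not asked
]
extra = off_target_repliers("10.0.0.7", replies)
```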

You can just as easily have antisocial machines with these problems on the
same physical cable; the solution is to FIX them, not throw up your hands
and say "bridges are terrible" because they force you to confront the
software vendors.

Overall, our experience with bridging has been quite positive. There are
some valid arguments against large-scale bridging, but they have to do more
with vulnerability to spoofing than with any inherent technical weaknesses
in a "friendly" environment such as ours. Even in a heterogeneous
environment, though, Vitalink boxes are useful as simple, fast packet
switches because they can be configured to filter out broadcasts and to use
static routing tables. I understand that NASA Ames uses them in this way.
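
As an illustration of that mode of operation, here is a minimal sketch of a forwarding decision that uses only a hand-entered table and drops broadcasts. It assumes nothing about how the Vitalink boxes are actually configured; the function name, port names, and addresses are hypothetical.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def forward(static_table, dst, in_port):
    """Return the output port for a frame, or None to drop it.

    static_table: dict mapping Ethernet address -> port, entered by hand.
    Nothing is learned from traffic in this model.
    """
    if dst == BROADCAST:
        return None                      # broadcasts are filtered
    out_port = static_table.get(dst)
    if out_port is None or out_port == in_port:
        return None                      # unknown or already local: drop
    return out_port

# Hypothetical two-port box with one known host on each side.
table = {"02:00:00:00:00:01": "lan", "02:00:00:00:00:02": "wan"}
decision = forward(table, "02:00:00:00:00:02", "lan")
```

With no learning and no flooding, a spoofed or misbehaving host cannot change where traffic goes, at the cost of maintaining the table by hand.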

I'm a big believer in TCP/IP. IP does the job of interconnecting dissimilar
networks so well that some people forget that there are easier ways to
connect networks of the same type. The Internet has grown so large that the
job needs to be broken down hierarchically into more manageable pieces; you
can't (and shouldn't try to) do EVERYTHING with IP gateways.

Phil


