Preston Mullen (mullen@nrl-css.ARPA)
Fri, 07 Aug 87 22:04:44 EDT
Here's a new one to add to the list of things that can induce broadcast
storms and other serious problems. It involves an unpleasant interaction
between SunOS 3.3 and 3.4 and Wollongong's WIN/VX software version 3.0.
(Ironically, this problem was noticed when the Suns and VAXes were
upgraded to what is generally considered to be better networking
software.)
When a diskless Sun 3 workstation running SunOS 3.3 or 3.4 boots, at
some point it sends out a broadcast ICMP Address Mask Request. This is
in accordance with RFC950; unfortunately, an incorrect reply from any
machine on the network can be accepted by the workstation, and some
incorrect masks can induce the workstation to start sending all packets
as Ethernet broadcasts, which instantly leads to a broadcast storm.
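Here is a rough Python sketch of the exchange RFC 950 defines. It builds an ICMP Address Mask Request (type 17) and parses an Address Mask Reply (type 18). The field layout follows RFC 950: type, code, checksum, 16-bit identifier, 16-bit sequence number, and a 32-bit mask. The checksum is the usual Internet checksum. The function names are hypothetical. Actually sending the packet would need a raw socket, so that part is left out.

```python
import struct

ICMP_MASKREQ = 17    # Address Mask Request (RFC 950)
ICMP_MASKREPLY = 18  # Address Mask Reply (RFC 950)
FMT = "!BBHHHI"      # type, code, checksum, identifier, sequence, mask (12 bytes)

def inet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_mask_request(ident: int, seq: int) -> bytes:
    """Address Mask Request: the mask field is sent as zero."""
    unsummed = struct.pack(FMT, ICMP_MASKREQ, 0, 0, ident, seq, 0)
    return struct.pack(FMT, ICMP_MASKREQ, 0, inet_checksum(unsummed), ident, seq, 0)

def parse_mask_reply(packet: bytes, ident: int, seq: int):
    """Return the mask from a well-formed reply matching (ident, seq), else None."""
    if len(packet) < 12 or inet_checksum(packet) != 0:   # bad length or checksum
        return None
    icmp_type, code, _csum, r_ident, r_seq, mask = struct.unpack(FMT, packet[:12])
    if icmp_type != ICMP_MASKREPLY or code != 0 or (r_ident, r_seq) != (ident, seq):
        return None
    return mask
```

Note that `parse_mask_reply` checks only that a reply is well formed and matches the request. Like the Suns described here, it would return a nonsensical mask without complaint.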
If this happens, the workstation will probably fail to finish
booting, usually stopping during the NFS mount with
"RPC: not registered" messages, but sometimes sooner. Also, the
workstation may itself then generate an incorrect reply (sent as an
Ethernet broadcast) to a subsequent ICMP Address Mask Request from
some other machine, thus spreading the virus. Other symptoms that
may be observed include extremely sluggish operation of diskless
machines and "no carrier" and "Ethernet jammed" notices from Suns
(aside from those generated during broadcast storms).
This happened here on a non-subnetted Class B network when the 3.0
release of Wollongong's IP/TCP software was installed on two VAXes
running VMS. Both VAXes replied to a Sun's ICMP Address Mask Request
with network masks of 0000FFFF instead of the correct FFFF0000. The
diskless Sun gullibly swallowed this and began to encapsulate every IP
packet in an Ethernet broadcast packet. Even after we isolated our
network from the offending VAXes, any Sun in this state would respond
with the bogus netmask to a broadcast from another Sun and thus
continue the problem.
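This report does not identify the actual Wollongong bug. The VAX owners only call it a holdover from "broken 4.3bsd code". The following is purely an illustration, not a diagnosis. 0000FFFF is exactly FFFF0000 with its four bytes reversed, and a missing host-to-network byte-order conversion on a little-endian VAX would produce exactly that. The Python sketch below models that hypothetical bug. `encode_mask` and `decode_mask` are made-up names.

```python
import struct

def encode_mask(mask: int, forget_htonl: bool = False) -> bytes:
    """Place a 32-bit mask in the 4-byte ICMP field.

    Network byte order is big-endian ("!I"). forget_htonl=True models a
    HYPOTHETICAL bug: a little-endian host copies out its native byte
    order ("<I") without converting.
    """
    return struct.pack("<I" if forget_htonl else "!I", mask)

def decode_mask(field: bytes) -> int:
    """What the receiving Sun sees: the field read in network byte order."""
    return struct.unpack("!I", field)[0]

print(f"{decode_mask(encode_mask(0xFFFF0000)):08X}")                     # FFFF0000
print(f"{decode_mask(encode_mask(0xFFFF0000, forget_htonl=True)):08X}")  # 0000FFFF
```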
The solution was to disconnect the offending VAXes from the network,
halt ALL the diskless Suns, then reboot them. After that, we could let
the VAXes back on the network, but it was not really safe since any
diskless Sun reboot would start the cycle over again.
The people who own the VAXes that started the problem have told me
that the problem is really in the Wollongong software (i.e., not a
configuration error) and is a holdover from some "broken 4.3bsd code".
Evidently there are also problems with setting the address mask.
They report that Wollongong has sent them a fix.
==> Potential users of Wollongong 3.0 software should get a fix from
Wollongong before running 3.0 on a network where a machine might
broadcast an ICMP address mask request.
==> Sun should make their networking code smarter. There is no way a
netmask of 0000FFFF could ever be valid, since it must include the
normal network part of the address as well as the subnet part. The
Suns should have ignored the faulty replies instead of being driven
berserk by them. I've reported this to Sun. The problem doesn't
affect Suns running 3.2 or earlier releases since those releases have
no subnetting support.
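The sanity check suggested above can be sketched in Python. It rejects any mask that does not keep every bit of the address's natural (classful) network part set. This is a minimal sketch that assumes classful addressing as it stood in 1987. `natural_mask` and `mask_is_plausible` are hypothetical names, not SunOS routines, and 172.16.5.9 is just an example Class B address.

```python
import ipaddress

def natural_mask(addr: int) -> int:
    """Classful network mask implied by the leading bits of an IPv4 address."""
    if addr >> 31 == 0b0:      # 0...: Class A
        return 0xFF000000
    if addr >> 30 == 0b10:     # 10..: Class B
        return 0xFFFF0000
    if addr >> 29 == 0b110:    # 110.: Class C
        return 0xFFFFFF00
    raise ValueError("not a Class A, B, or C address")

def mask_is_plausible(addr: int, mask: int) -> bool:
    """A subnet mask must cover the whole natural network part."""
    net = natural_mask(addr)
    return (mask & net) == net

# Hypothetical Class B host address, for illustration only.
host = int(ipaddress.IPv4Address("172.16.5.9"))
print(mask_is_plausible(host, 0xFFFF0000))  # True: the correct mask here
print(mask_is_plausible(host, 0x0000FFFF))  # False: the mask the VAXes sent
```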
It would probably be better if the diskless Sun did not broadcast the
Address Mask Request but instead sent it directly to the server.
I think that the request is broadcast after the ifconfig of the
diskless Sun's Ethernet interface (anyway, an "address mask set to
FFFF0000" report appears on the console quite a while before the
problem shows itself). If that is so, then it's not clear why an ICMP
Address Mask Request needs to be sent at all, since the network mask
can be specified in the ifconfig in the rc.boot file.
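The two alternatives in the paragraph above amount to a simple selection policy. A mask configured explicitly (for example by ifconfig in rc.boot) wins outright. Failing that, the client trusts only a reply that comes from its own boot server. The Python sketch below is a hypothetical helper, not Sun code. It represents replies as (source address, mask) pairs, and "server" and "vax" are placeholder host names.

```python
from typing import Iterable, Optional, Tuple

def choose_mask(configured: Optional[int],
                replies: Iterable[Tuple[str, int]],
                server: str) -> Optional[int]:
    """Pick a netmask: configured value first, else the boot server's reply."""
    # An explicitly configured mask needs no ICMP request at all.
    if configured is not None:
        return configured
    # Otherwise ignore answers volunteered by any other host on the wire.
    for src, mask in replies:
        if src == server:
            return mask
    return None  # no trustworthy answer: keep the default mask

replies = [("vax", 0x0000FFFF), ("server", 0xFFFF0000)]
print(hex(choose_mask(None, replies, "server")))  # 0xffff0000
```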
By the way, our class B network is partitioned by level 2 bridges
(Digital DEBETs). Needless to say, they passed every bad packet and
broadcast right through. (The VAXes were on the other side of a bridge
from my Suns.) Yep, I'll be moving my stuff behind an IP gateway now.
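To see why the bridges were no help, here is a toy Python model of a transparent learning bridge. It is a generic sketch, not a description of the DEBET's internals, and the class and method names are hypothetical. A bridge can filter a unicast frame once it has learned that the destination lives on the arrival side. Broadcast and multicast frames, however, must go to every other port, so a storm on one segment reaches them all. An IP gateway, by contrast, does not forward link-level broadcasts.

```python
def is_group_address(mac: str) -> bool:
    """Ethernet broadcast/multicast: low-order bit of the first octet is set."""
    return (int(mac.split(":")[0], 16) & 1) == 1

class LearningBridge:
    """Toy transparent bridge with a MAC-address-to-port table."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC address -> port it was last seen on

    def forward(self, in_port, src, dst):
        """Return the ports a frame arriving on in_port is sent out on."""
        self.table[src] = in_port  # learn which side the sender is on
        if is_group_address(dst) or dst not in self.table:
            # Broadcasts, multicasts and unknown unicasts are flooded.
            return [p for p in self.ports if p != in_port]
        out = self.table[dst]
        return [] if out == in_port else [out]  # filter same-side traffic

bridge = LearningBridge(["A", "B"])
print(bridge.forward("A", "08:00:20:00:00:01", "ff:ff:ff:ff:ff:ff"))  # ['B']
```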
Many, many thanks to Van Jacobson for 'tcpdump', which proved
instrumental in tracking this down. I hope Sun will let Van
release the source code for tcpdump.
Computer Science and Systems Branch
Information Technology Division
Naval Research Laboratory
Washington DC 20375-5000
P.S. Why do diskless Sun workstations running SunOS 3.4 broadcast an
ARP request for IP address 0.0.60.216 very early in the boot
sequence? (The ARP packet asks that replies go to 0.0.60.216.)
This appears to be wired into the Sun networking software.
It's harmless enough, but it should not be there. Maybe
someone forgot to take out some debugging code.