SUN 3.4 problems


Joanne Mannarino (pyramid!prls!philabs!jmr@lll-lcc.arpa)
26 Aug 87 15:07:07 GMT


In trying to upgrade our SUN 3/180 fileserver (named condor), along with 11
diskless clients, to SUN UNIX version 3.4, I ran into some problems. The
upgrade procedure on condor went fine. I reconfigured the kernel for 3.4 and
rebooted condor, still with no problem. Then I tried booting the diskless
clients (one at a time), and that is when the headaches began.

The boot began with "requesting internet address", and the host responded
with the correct information (so there is communication over our Ethernet).
The problem appeared when the boot reached this point:

        starting rpc and net services: portmap router biod

The boot process then halts with the following error messages:

        server not responding
        RPC: program not registered
        mount retrying
                /usr
                /usr/condor

The client stays at this point until you either abort manually or power down
the unit. While it is in this state, any active workstation on the network
(i.e., SUNs connected to our other fileserver, which still runs 3.2, or
diskful SUNs sitting on the net) displays a screenful of "ie0: no carrier"
and "Ethernet jammed" error messages.

I contacted SUN support immediately. After tests to see whether all of the
daemons that should be running were running, SUN concluded that the problem
is somewhere within our Ethernet structure. SUN said that 3.4 includes major
changes in the Ethernet drivers that don't correct for possible problems in
the network. SUN support then referred me to someone in their Data
Communications support department. After I ran some net stats and sent them
the data, I was told "your network looks ok". BUT we are still having
problems.

We've tried a few different things to isolate the problem (actually this was
done before the fileserver upgrade, but we wrote it off as a network problem
isolated to a particular laboratory). We tried running a diskful 3/160 as a
server for a diskless 3/160, both running 3.4, and ran into the same
problems. It was suggested that we take both units off of our main net and
hook them up directly to their own mini net. When this was done, the problem
went away, i.e., the client came up running 3.4.

We have also tried changing the /etc/fstab on a client to "background" the
mount process. This results in the client coming up in single user mode.
When I then try to mount a filesystem manually, I get the same "server not
responding" and "mount retrying" errors shown above.
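
For anyone unfamiliar with it, backgrounding an NFS mount is done with the
"bg" option in the client's /etc/fstab. A rough sketch of the kind of entries
involved is below. The mount points are the ones from the error output above;
the server-side paths and the rest of the option list are assumptions for
illustration, not copied from our clients.

```
# /etc/fstab on a diskless client (illustrative sketch only)
# filesystem          directory     type  options  freq  pass
# The server-side paths and "rw" below are assumed, not the real entries.
condor:/usr           /usr          nfs   rw,bg    0     0
condor:/usr/condor    /usr/condor   nfs   rw,bg    0     0
```

With "bg", a mount that fails on the first attempt is retried in the
background instead of holding up the boot. The comment lines are there for
the reader and may need to be dropped if your fstab does not accept them.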

As an interim solution, we have kept the 3.4 enhancements (I didn't back out
of the upgrade) and are running a 3.2 kernel. Everything seems fine, but this
still doesn't solve our problem.

Some SUN reps claim that the problem is definitely with our network; others
say it's in the 3.4 software. Has anyone else experienced these symptoms when
upgrading to 3.4? Any suggestions on what we should do now?

thanks in advance,
Joanne Mannarino

--
joanne mannarino				   seismo!philabs!jmr
philips	laboratories					   or
(914)945-6008					 jmr@philabs.philips.com


