4BSD TCP Ethernet Throughput


Van Jacobson (van@helios.ee.lbl.gov)
Mon, 24 Oct 88 13:33:13 PDT


Many people have asked for the Ethernet throughput data I
showed at Interop so it's probably easier to post it:

These are some throughput results for an experimental version of
the 4BSD (Berkeley Unix) network code running on a couple of
different MC68020-based systems: Sun 3/60s (20MHz 68020 with AMD
LANCE Ethernet chip) and Sun 3/280s (25MHz 68020 with Intel
82586 Ethernet chip) [note again the tests were done with Sun
hardware but not Sun software -- I'm running 4.?BSD, not Sun
OS]. There are lots and lots of interesting things in the data
but the one thing that seems to have attracted people's
attention is the big difference in performance between the two
Ethernet chips.

The test measured task-to-task data throughput over a TCP
connection from a source (e.g., chargen) to a sink (e.g.,
discard). The tests were done between 2am and 6am on a fairly
quiet Ethernet (~100Kb/s average background traffic). The
packets were all maximum size (1538 bytes on the wire or 1460
bytes of user data per packet). The free parameters for the
tests were the sender and receiver socket buffer sizes (which
control the amount of 'pipelining' possible between the sender,
wire and receiver). Each buffer size was independently varied
from 1 to 17 packets in 1 packet steps. Four tests were done at
each of the 289 combinations. Each test transferred 8MB of data
then recorded the total time for the transfer and the send and
receive socket buffer sizes (8MB was chosen so that the worst
case error due to the system clock resolution was ~.1% -- 10ms
in 10sec). The 1,156 tests per machine pair were done in random
order to prevent any effects from fixed patterns of resource
allocation.
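
The test plan described above can be sketched in a few lines. This is only an illustration of the design, not the harness actually used: the names are made up, and it assumes a buffer of N packets means N x 1460 bytes.

```python
import random

PACKET_DATA_BYTES = 1460      # user data per maximum-size packet (from the post)
BUFFER_PACKETS = range(1, 18) # each buffer varied from 1 to 17 packets
REPEATS = 4                   # four tests at each combination

def build_schedule(seed=None):
    """Return every (send_buf, recv_buf) test, in bytes, in random order.

    Assumption: an N-packet buffer is N * 1460 bytes. The post gives the
    sizes in packets only.
    """
    rng = random.Random(seed)
    schedule = [
        (snd * PACKET_DATA_BYTES, rcv * PACKET_DATA_BYTES)
        for snd in BUFFER_PACKETS
        for rcv in BUFFER_PACKETS
        for _ in range(REPEATS)
    ]
    # Randomize the order so fixed patterns of resource allocation
    # cannot bias any one buffer combination.
    rng.shuffle(schedule)
    return schedule

def worst_case_clock_error(resolution_s, duration_s):
    """Fractional timing error from one clock tick over a whole transfer."""
    return resolution_s / duration_s

schedule = build_schedule(seed=1988)
print(len(schedule), "tests over", len(set(schedule)), "buffer combinations")
print(f"clock error: {worst_case_clock_error(0.010, 10.0):.1%}")
```

Seeding the generator is a convenience for reproducing a particular ordering; leave the seed out for a fresh shuffle per machine pair.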

In general, the maximum throughput was observed when the sender
buffer equaled the receiver buffer (the reason why is complicated
but has to do with collisions). The following table gives the
task-to-task data throughput (in KBytes/sec) and throughput on
the wire (in MBits/sec) for (a) a 3/60 sending to a 3/60 and
(b) a 3/280 sending to a 3/60.

        __________________________________________________
        |              3/60 to 3/60   |  3/280 to 3/60   |
        |            (LANCE to LANCE) | (Intel to LANCE) |
        | socket                      |                  |
        | buffer     task to          |  task to         |
        |  size       task      wire  |   task     wire  |
        |(packets)   (KB/s)    (Mb/s) |  (KB/s)   (Mb/s) |
        |    1        384       3.4   |   337      3.0   |
        |    2        606       5.4   |   575      5.1   |
        |    3        690       6.1   |   595      5.3   |
        |    4        784       6.9   |   709      6.3   |
        |    5        866       7.7   |   712      6.3   |
        |    6        904       8.0   |   708      6.3   |
        |    7        946       8.4   |   710      6.3   |
        |    8        954       8.4   |   718      6.4   |
        |    9        974       8.6   |   715      6.3   |
        |   10        983       8.7   |   712      6.3   |
        |   11        995       8.8   |   714      6.3   |
        |   12       1001       8.9   |   715      6.3   |
        |_____________________________|__________________|

The theoretical maximum data throughput, after you take into
account all the protocol overheads, is 1,104 KB/s (this
task-to-task data rate would put 10Mb/s on the wire). You can
see that the 3/60s get 91% of the theoretical max. The
3/280, although a much faster processor (the CPU performance is
really dominated by the speed of the memory system, not the
processor clock rate, and the memory system in the 3/280 is
almost twice the speed of the 3/60), gets only 65% of
theoretical max.
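
The two percentages can be checked directly from the table. A minimal sketch follows; the names are illustrative, and only the numbers (the 1,104 KB/s ceiling and the best rate in each column) come from the text above.

```python
# Theoretical ceiling and best observed task-to-task rates, all in KB/s.
THEORETICAL_MAX_KBS = 1104
BEST_OBSERVED_KBS = {
    "3/60 to 3/60 (LANCE to LANCE)": 1001,   # reached at 12-packet buffers
    "3/280 to 3/60 (Intel to LANCE)": 718,   # reached at 8-packet buffers
}

def percent_of_max(rate_kbs, ceiling_kbs=THEORETICAL_MAX_KBS):
    """Return a rate as a whole-number percentage of the ceiling."""
    return round(100.0 * rate_kbs / ceiling_kbs)

for pair, rate in BEST_OBSERVED_KBS.items():
    print(f"{pair}: {percent_of_max(rate)}% of theoretical max")
```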

The low throughput of the 3/280 seems to be entirely due to the
Intel Ethernet chip: at around 6Mb/s, it saturates. (I put the
board on an extender and watched the bus handshake lines on the
82586 to see if the chip or the Sun interface logic was pooping
out. It was the chip -- it just stopped asking for data. The
CPU was loafing along with at least 35% idle time during all
these tests so it wasn't the limit.)

[Just so you don't get confused: Stuff above was measurements.
 Stuff below includes opinions and interpretation and should
 be viewed with appropriate suspicion.]

If you graph the above, you'll see a large notch in the Intel
data at 3 packets. This is probably a clue to why it's dying:
TCP delivers one ack for every two data packets. At a buffer
size of three packets, the collision rate increases dramatically
since the sender's third packet will collide with the receiver's
ack for the previous two packets (for buffer sizes of 1 and 2,
there are effectively no collisions). My suspicion is that the
Intel is taking a long time to recover from collisions (remember
that you're 64 bytes into the packet when you find out you've
collided so the chip bus logic has to back up 64 bytes -- Intel
spent their silicon making the chip "programmable"; I doubt they
invested as much as AMD in the bus interface). This may or may
not be what's going on: life is too short to spend debugging
Intel parts so I really don't care to investigate further.

The one annoyance in all this is that Sun puts the fast Ethernet
chip (the AMD LANCE) in their slow machines (3/50s and 3/60s)
and the slow Ethernet chip (Intel 82586) in their fast machines
(3/180s, 3/280s and Sun-4s, i.e., all their file servers).
[I've had to put delay loops in the Ethernet driver on the 3/50s
and 3/60s to slow them down enough for the 3/280 server to keep
up.] Sun's not to blame for anything here: It costs a lot
to design a new Ethernet interface; they had a design for the
3/180 board set (which was the basis of all the other VME
machines--the [34]/280 and [34]/110); and no market pressure to
change it. If they hadn't ventured out in a new direction with
the 3/[56]0 -- the LANCE -- I probably would have thought
700KB/s was great Ethernet throughput (at least until I saw
Dave Boggs' DEC-Titan/Seeq-chip throughput data).

But I think Sun is overdue in offering a high-performance VME
Ethernet interface. That may change though -- VME controllers
like the Interphase 4207 Eagle are starting to appear which
should put pressure on Sun and/or offer a high-performance
3rd-party alternative (I haven't actually tried an
Eagle yet but from the documentation it looks like they did a
lot of things right). I'd sure like to take the delay loops out
of my LANCE driver...

 - Van

ps: I have data for Intel-to-Intel and LANCE-to-Intel as well as
    the Intel-to-LANCE I listed above. Using an Intel chip on the
    receiver, the results are MUCH worse -- 420KB/s max. I chose
    the data that put the 82586 in its very best light.

    I also have scope pictures taken at the transceivers during all
    these tests. I'm sure there'll be a chorus of "so-and-so violates
    the Ethernet spec" but that's a lie -- NONE OF THESE CHIPS OR
    SYSTEMS VIOLATED THE ETHERNET SPEC IN ANY WAY, SHAPE OR FORM.
    I looked very carefully for violations and have the pictures to
    prove there were none.

    Finally, all of the above is Copyright (c) 1988 by Van Jacobson.
    If you want to reproduce any part of it in print, you damn well
    better ask me first -- I'm getting tired of being misquoted in
    trade rags.


