Summary of responses to my tcp-ip performance query


John Carter (titan!retrac@rice.edu)
16 Feb 88 17:17:00 GMT


Hello,

    Back in late November I posted the following request for information
to the network. I had intended to post this summary of the responses
sooner, but I forgot. What follows is my original posting followed by
the responses that I received. I hope people find this useful! Many
thanks to all of you who answered my query!

==============================================================================

> I'm a fairly new reader of this newsgroup, so I apologize if this has
> already been discussed. I would like to know what the best performance
> figures are for large memory to memory transfers using TCP-IP. More
> specifically, what are the fastest reported average transfer times for
> transferring 10 Mbytes over a 10 Mbit/sec ethernet? (or) What is the
> highest reported throughput of DATA across a 10 Mbit/sec ethernet using
> TCP-IP?
>
> Def.: Memory to memory above means, the client generates the data
> out of thin air and the server puts them all in one buffer
> (the "best case" situation). I'm interested in raw transfer
> rates and the cost of TCP-IP overhead on performance.
>
> I have seen performance figures for Van Jacobson's modifications to
> Berkeley 4.3 TCP-IP which gave measurements of 23.3 secs for 10 MB over
> a 10 Mbit/sec ethernet (effective throughput of 3.4 Mbit/sec). Are there
> any better?
>
> John Carter
> Dept. of Computer Science, Rice University
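
For reference, the effective-throughput arithmetic behind that figure
works out as follows. A short sketch (it assumes "Mbyte" here means
10^6 bytes, which is what reproduces the quoted 3.4 Mbit/sec):

```python
# Effective throughput for a transfer of nbytes bytes in `seconds` seconds,
# expressed in Mbit/sec (decimal megabits).
def effective_mbits(nbytes, seconds):
    return nbytes * 8 / seconds / 1e6

# The figure quoted above: 10 Mbytes in 23.3 seconds.
rate = effective_mbits(10e6, 23.3)
print(round(rate, 1))       # -> 3.4
print(round(rate / 10, 2))  # fraction of the raw 10 Mbit/sec Ethernet -> 0.34
```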

==============================================================================

From: David Robinson <david@elroy.jpl.nasa.gov>
To: retrac@rice.edu
Subject: TCP performance

The best I have personally seen is 3.2Mbits/sec between two
Sun-3/260's doing memory-memory transfers. Their UDP topped out
at 5Mbits/sec. This was on a fairly quiet net, with rwhod and routed
traffic only. In my experience Excelan boards have been the worst,
slower than my lowly Sun-2, but what can be expected from an 80186?

I do not know what the limiting factor of the Suns is, but I suspect
that they are CPU bound; the e-net controllers each have large
memory buffers (256K?).

        -David Robinson
        david@elroy.jpl.nasa.gov

[Wishing for a pure-hardware IP chip!]

---------------------------------------------------------------------

Subject: Re: What's the "best" TCP/IP throughput?
Date: 26 Oct 87 07:37:25 PST (Mon)
From: lekash@orville.nas.nasa.gov

Better performance for Ethernet is not that likely until someone
builds a better interface card. That's the current bottleneck.
If you go to other media, say Pronet-80 or Hyperchannel, you
can get much higher rates. We were seeing up to 17 Mbits/sec over
Hyperchannel, and Proteon claims over 7 Mbits/sec for their ring. I
would guess that with performance tuning those numbers will
increase. (We might even do some here.)

                                        john
---------------------------------------------------------------

From: nowicki%rose@sun.com (Bill Nowicki)
Message-Id: <8710272303.AA03617@rose.sun.com>
To: retrac@rice.edu
Subject: Re: What's the "best" TCP/IP throughput?

Disclaimer: this is NOT an official number, just my latest test in an
uncontrolled environment with an unannounced software configuration.
But between a pair of Sun-4/260s I am able to transfer with TCP over an
Ethernet at 5.0 Mbits/second.

        -- WIN

---------------------------------------------------------------------

From: Richard Fox <rfox@ames-nas.arpa>
Organization: NASA Ames Research Center, Mountain View, CA

I have just spent some time gathering stats on FTP over TCP/IP on an
Ethernet. As you are probably aware, the results at this point have
been pretty disappointing.

I would like to start investigating new and different protocols. So
any info you get could you please forward. Also, if you have other
protocols that need testing I would be glad to help. We have ethernets,
hyperchannels, satellites etc. and a strong interest in finding a more
efficient protocol than the current TCP-IP.

By the way, we have just received a new TCP implementation that is
supposed to have rate-control. If you are interested I will send you
the results after I have some time to play with it.

rich fox
(415)694-4358

-------------------------------------------------------------------------

From: erikn@sics.se (Erik Nordmark)
Summary: 1.8 Mbps between VAXstation II's, about 4 Mbps between Sun-3's

When I was at Stanford I was working on David Cheriton's VMTP protocol.
There was already an implementation in the V distributed system and I
did one in the 4.3BSD kernel.

These are the numbers we got:

Memory-to-memory bulk data transfer between 2 VAXstation II's on a 10 Mbps
Ethernet running 4.3BSD Unix:
        1.8 Mbps

Short request-response interaction: a 32 byte request and a 32 byte response
message (same system):
        send -----> recv
             <----- reply

        8.6 milliseconds

The implementation in V achieves about 4 Mbps and 2.3 ms between two
Sun-3's on a 10 Mbps Ethernet.

Note: The Unix implementation uses IP for datagram delivery whereas
the V implementation has its own mechanisms for delivery, routing
etc. on the local net. An optimized version of the Unix
implementation that uses "raw" ethernet for packets on the local net
(and IP for internet packets) achieves about 2.1 Mbps between the two
microVAXes.

About VMTP:

The Versatile Message Transaction Protocol is a reliable transport
protocol based on the transaction style of communication. A
transaction consists of a request and a response message which are
limited in size to 1 Mbyte. Current implementations limit the message
size to 16 kbytes, so there is room for performance improvements in
the implementations.

The protocol has:
        better naming than TCP/IP (stable, location independent identifiers)
        support for real-time communication
        support for security
        multicasting on LAN as well as WAN (latter relying on the Internet
                        Group Management Protocol and extensions to IP)
        solves the speed mismatch problem on the local net by using rate
                        based flow control
        etc.
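
The rate-based flow control mentioned in that list can be illustrated
with a generic pacing loop: the sender spaces packets by a fixed
interpacket gap derived from a target rate, instead of waiting for
window updates, so a fast sender cannot swamp a slow receiver. This
is only a sketch of the idea, not VMTP's actual algorithm; the
2 Mbit/sec target and 1024-byte packets are made-up parameters:

```python
# Generic rate-based pacing sketch (NOT VMTP's actual algorithm):
# sleep between packets so the average send rate holds a target.
import time

def paced_send(packets, target_bits_per_sec, send=lambda pkt: None):
    """Send packets, sleeping so the average rate holds the target."""
    for pkt in packets:
        start = time.monotonic()
        send(pkt)                                  # hand off to the network
        gap = len(pkt) * 8 / target_bits_per_sec   # seconds this packet "earns"
        remaining = gap - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)

# 16 packets of 1024 bytes at 2 Mbit/sec should take roughly 65 ms.
packets = [b"x" * 1024] * 16
t0 = time.monotonic()
paced_send(packets, 2e6)
print("%.3f seconds" % (time.monotonic() - t0))
```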

David Cheriton (cheriton@pescadero.stanford.edu)
"VMTP: a Transport Protocol for next generation ..."
Proc. SIGCOMM '86 (Communications Architectures & Protocols)
Aug. 1986 (ACM)

Steve Deering
"Host extensions for IP multicasting"
RFC 988

Karen C. Lam
"4.3bsd Internet Multicast {Implementation Notes,Installation and Usage Notes}"
BBN Laboratories Inc, 10 Moulton Street, Cambridge MA

** Erik Nordmark **
Swedish Institute of Computer Science, Box 1263, S-163 13 SPANGA, Sweden
Phone: +46 8 750 79 70 Ttx: 812 61 54 SICS S Fax: +46 8 751 72 30

UUCP: erikn@sics.UUCP or {seismo,mcvax}!enea!sics!erikn
Domain: erikn@sics.se

-------------------------------------------------------------------------

From: Jack Jansen <mcvax!cwi.nl!jack@uunet.uu.net>

The protocol used in the Amoeba distributed OS is probably the fastest
around currently (at least, that's what we like to think:-).

We do 420Kb/sec between two microvaxen, and 600+Kb/sec between two
68020 systems. This is all running the protocol between machines
running Amoeba. I got up to 250Kb/sec once between two uVaxen running
Ultrix 1.2.

References to amoeba should be easy to find, there's quite a bit
published, mainly written by Andrew S. Tanenbaum and Sape J. Mullender.
If you cannot find anything, drop me a line and I'll send you
a list of references.

Another protocol to look at might be the one used in David Cheriton's
V operating system. He does almost as good as we do.

--
	Jack Jansen, jack@cwi.nl (or jack@mcvax.uucp)
	The shell is my	oyster.

-------------------------------------------------------------------------

From: dmc%tv.tv.tek.com@relay.cs.net
Subject: Bulk data transfer protocol timings

We are running Stanford's V-system Version 6 kernels in Tektronix 4405
workstations, which have 16.6 MHz 68020 processors. The Ethernet
interface used is the AMD LANCE, with 64 receive packet buffers and 8
transmit packet buffers of 1518 bytes.

The Inter-Kernel measurement program `timeipc' gives us the following
figures for segment transfers between the user process memory of two
4405's. The program runs at a real-time scheduling priority, and
normal process execution is essentially suspended while the test is
in progress. The protocol is an early version of VMTP.

Send-Receive-ReplyWithSegment (5 trial average):

        Size (bytes)    elapsed time/100 transactions   effective bit rate
        0               .20 seconds
        1024            .34 seconds                     2.409 Mbit/sec.

Send-Receive-MoveTo-Reply (5 trial average):

        Size (bytes)    elapsed time/100 transactions   effective bit rate
        2048            .66 seconds                     2.482 Mbit/sec.
        4096            .99 seconds                     3.310 Mbit/sec.
        8192            1.40 seconds                    4.681 Mbit/sec.
        16384           2.32 seconds                    5.650 Mbit/sec.
        32768           4.18 seconds                    6.271 Mbit/sec.
        65536           7.91 seconds                    6.628 Mbit/sec.
        131072          15.30 seconds                   6.853 Mbit/sec.

Send-ReceiveWithSegment-Reply (5 trial average):

        Size (bytes)    elapsed time/100 transactions   effective bit rate
        1024            .35 seconds                     2.341 Mbit/sec.

Send-Receive-MoveFrom-Reply (5 trial average):

        Size (bytes)    elapsed time/100 transactions   effective bit rate
        2048            .78 seconds                     2.101 Mbit/sec.
        4096            1.01 seconds                    3.244 Mbit/sec.
        8192            1.58 seconds                    4.148 Mbit/sec.
        16384           2.62 seconds                    5.003 Mbit/sec.
        32768           4.57 seconds                    5.736 Mbit/sec.
        65536           7.87 seconds                    6.662 Mbit/sec.
        131072          15.2 seconds                    6.899 Mbit/sec.
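
The "effective bit rate" column in these tables is just the segment
size times 8 bits times 100 transactions, divided by the elapsed
time. A quick sketch to spot-check the rows:

```python
# Effective bit rate in Mbit/sec: 100 transactions each move size_bytes
# of segment data, taking elapsed_100 seconds in total.
def rate_mbits(size_bytes, elapsed_100):
    return size_bytes * 8 * 100 / elapsed_100 / 1e6

# Spot-check against the Send-Receive-MoveTo-Reply rows:
print(round(rate_mbits(2048, 0.66), 3))     # -> 2.482
print(round(rate_mbits(131072, 15.30), 3))  # -> 6.853
```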

Don Craig
Tektronix Television Systems

------------------------------------------------------------------------------

From: Mike Muuss <mike@brl.arpa>

A pair of Sun-3/50 machines running SunOS 3.3 with tcp_sndspace and
tcp_rcvspace (or whatever they are called) increased to 16K (i.e.,
increased offered windows). The test is typically 1 Mbyte memory to
memory using the TTCP program (copies on request). The typical data
rate is 3 Mbits/sec. With two pairs, we typically see 6 Mbits/sec
total for both connections. I never bothered to do three pairs.
Trailers were off.

6 Mbits/sec is fairly close to the maximum usable bandwidth of an Ethernet.
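
A memory-to-memory throughput test in the spirit of TTCP can be
sketched in a few lines: the sender fabricates data, the receiver
discards it, and the elapsed send time gives the data rate. This is
not the TTCP program itself, just an illustration run over loopback;
the 1 Mbyte transfer and 16K buffers echo the test described above:

```python
# Minimal TTCP-style memory-to-memory TCP throughput test over loopback.
# (A sketch, not the real TTCP; sizes are chosen to echo the text above.)
import socket
import threading
import time

TOTAL = 1 << 20          # 1 Mbyte of fabricated data
BUF = 16 << 10           # 16K buffers

def sink(srv):
    """Accept one connection and discard everything it sends."""
    conn, _ = srv.accept()
    while conn.recv(BUF):
        pass
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # let the OS pick a free port
srv.listen(1)
threading.Thread(target=sink, args=(srv,), daemon=True).start()

cli = socket.create_connection(srv.getsockname())
chunk = b"\0" * BUF          # data generated "out of thin air"
t0 = time.monotonic()
for _ in range(TOTAL // BUF):
    cli.sendall(chunk)
cli.close()
elapsed = time.monotonic() - t0
print("%.1f Mbit/sec" % (TOTAL * 8 / elapsed / 1e6))
```

Over loopback this measures mostly protocol and copy overhead rather
than the wire, which is exactly the "best case" the original query
asked about.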

On an NSC Hyperchannel, between a Gould PN9080 running UTX 2.0, using
a PI32 to access an A400, with an otherwise idle trunk to an A130
adaptor connected to a Cray XMP48 running UNICOS 2.0 (at the time), I
was able to achieve 11 Mbits/sec aggregate, using an MTU of 4144 and
Cray-IP encapsulation. This was not using TCP at all, but merely
IP/ICMP_Echo request/response packets, in a "flood ping" test.

Best, -Mike

-----------------------------------------------------------------------

From: aeh@j.cc.purdue.edu (Dale Talcott)

Re your query about high speed network protocols: several of our
mainframes here at Purdue are connected using Control Data's LCN
(loosely coupled network). This is not a 10Mbs-based network, but it
may provide some insight into limiting factors.

The LCN is somewhat Ethernet-like in that it is a tapped trunk using
coaxial cable, but it runs at 50Mbs instead of 10Mbs and uses 3/4
inch coax. There is carrier-sense, but collisions are avoided by
providing fixed time slots for each host to start a transmission. In
practice, there are occasional collisions. The maximum trunk length
is limited to about 2000 feet. The typical number of taps on a trunk
is small (5 - 10). The hardware level protocol is point-to-point,
with no broadcasts. The hardware level packet limit is 65535 16-bit
words. The software packet size is 4096 bytes of data with a 12 byte
software header and 21 bytes of hardware framing, addressing, crc,
etc. However, there is a mode called "streaming" in which a sender
can "grab" the trunk and keep it for as long as it wants by holding
the carrier asserted between packets.

The hosts do not connect directly to the LCN. Rather, there are
specialized minicomputers, called NADs (network access devices), to
make the connection. There are several kinds of NADs, according to
the device the NAD is connecting to the LCN. We have NADs for tape
controllers, disk controllers, VAXes, and various CDC Cyber
mainframes.

Simplified network:

+-----+   +-----+
|host |   |host |
+-----+   +-----+
   |         |
 +---+     +---+
 |nad|     |nad|
 +---+     +---+
   |         |
===============================================  trunk
   |
 +---+
 |nad|
 +---+
   |
+-----+   +---+    +---+   +----+
|host |---|nad|====|nad|---|disk|
+-----+   +---+    +---+   +----+

There are different protocols used, according to the devices being
connected. When a host talks to another host, the protocol is called
RHF (remote host facility) and is more-or-less ISO seven layer in
philosophy. In practice, there is much mingling of layers.

When a NAD pair with dedicated trunk is being used to connect a host
to its disks (as in the bottom example), the NADs use a simplified
protocol, idiosyncratic to the device and host operating system.

---
Now, for the numbers:

("Mbs" = "megabits per second", throughout.)

A Cyber 205 host to CDC 819 disk drives, over a dedicated trunk,
achieved a sustained transfer rate of 30Mbs. The instantaneous rate
at which the drives read is 36Mbs. The 205 usually reads/writes data
in 65536 byte chunks (memory small page size), and often reads
.5Mbyte chunks (memory large page). The disk NADs use streaming mode
between themselves. I do not remember the actual size of the file
used to determine this transfer rate, but it was at least 8 Mbytes.

--
Transfers among hosts using the RHF protocol, exclusive of the time
needed to build a connection:

CDC 6600 to CDC 6500, no other load on the network, 6 Mbit of
fabricated data at sending end, discarded at receiving end, packet
size of 3840 bytes: 6.5Mbs.

CDC 6600 to Cyber 205, disk to disk, both systems idle, 16 Mbytes of data: 2.6Mbs.

Same file, opposite direction: 2.0Mbs.

VAX 11/780 (dual processor) running 4.3BSD to Cyber 205. VAX
multiuser, but idle. Cyber running typical middle-of-the-night, CPU
intensive, low priority workload. Transfer rate for 1 Mbyte "holey"
file on VAX to 205 disk file: 1.3Mbs.

(A holey Unix file does not require disk accesses to read the holes:
the system just fabricates a chunk of zeros.) Same file, only real
and residing on Fuji Eagle disk: 1.1Mbs.

Statistics for transfers during normal production from 205 to 780
give numbers in the .5Mbs range for moderately large files (~1Mbyte)
when the 780 has to translate each '\037' character into '\n' (done
with the VAX movtc instruction). The highest rate noticed was .8Mbs
for a ~.25Mbyte transfer.
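
The translation described here is a per-byte table lookup, which is
what the VAX movtc (move translated characters) instruction does in
one shot. A table-driven sketch of the same '\037'-to-'\n' mapping
(the sample record is made up for illustration):

```python
# Build a 256-entry translation table mapping '\037' (0x1f) to '\n' (0x0a)
# and leaving every other byte alone, then apply it to a sample record.
table = bytes(0x0a if b == 0x1f else b for b in range(256))

record = b"line one\x1fline two\x1f"
print(record.translate(table))   # -> b'line one\nline two\n'
```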

For a transfer from CDC Cyber 720 to VAX 780, no character
translation, disk to disk, the transfer rate was .96Mbs. This was a
while ago, and I don't remember the loads on the two systems, but I
suspect both were idle.

For Cyber 205 to VAX 8600 running Ultrix 2.?, both systems in normal
daytime production (but the VAX still lightly loaded), a .5Mbyte file
transferred disk to disk with character translation at 1.05Mbs.
Normal rates are 20 - 30% better than the 780's. (Note: the 8600 is
at JVNC at Princeton, not at Purdue.)

---
Notice the huge disparity between the data rate the hardware (NADs
and LCN trunk) is capable of and that actually obtained once several
layers of host resident protocol are placed on top. In developing
the RHF implementation for the VAXes, we noticed that every
optimization in host code (placing data blocks on page boundaries,
using movtc instead of a C loop, etc.) showed up in the transfer
statistics. (Our first cut got only 24kbs!) We have an open project
to find out why it is still so slow.

Dale Talcott, Systems Programmer
Purdue University Computing Center
Mathematical Sciences Bldg.
West Lafayette, IN 47907
ARPANET: aeh@j.cc.Purdue.EDU
BITNET: AEH@PURCCVM
Phone: (317) 494-1787

------------------------------------------------------------------------------

From ogud@sdag.cs.umd.edu Tue Nov 24 00:05:11 1987

Sorry I did not read your mail earlier, but I am trying to finish my
thesis on the behavior of Ethernet here in the CS Dept. One part of
my study was to look at bandwidth between Suns and other machines;
another part was to examine the behavior of the net under overload
conditions.

In short, the results (all load figures include headers):

Sun-3/50 to Sun-3/50: transfer rate of 2 Mbits/sec for maximum-size
TCP packets; at most 500 packets per second of minimum-size TCP
packets.

Sun-3/160 to Sun-3/160: multiply by 1.2.

I had a Sun-3/160 and a Sun-3/50 listening to all the traffic using a
modified Etherspy program, with all unneeded UNIX processes killed
off. The Sun-3/50 started to report dropped packets at around 25%
load (2.5Mb). Both machines were shot down (dropping a lot of
packets) at 40% load and started showing erratic behavior.

How does this affect the transfer rate? Well, no protocol using TCP
will get any better rate, because the Suns are the bottleneck, not
the net. When running FTP here at night on large files I see at most
140 Kbytes per sec (140 * 1090 * 8 = 1.2Mb). The highest number I
have heard about is 190 Kbytes/sec (1.6Mb). This number is probably
for a Sun-3/260 and scales well down to the 140 for the 3/50.

If you want more info, send me questions and hopefully I will be able
to answer them for you.

Olafur Gudmundsson
Dept. of Computer Science, University of Maryland
UPS: College Park, MD 20742
ARPA: ogud@brillig.edu
UUCP: {...}!seismo!umcp-cs!brillig!ogud
Tel: (301)-454-6153 (w)
ATT: (301)-595-4154 (h)

------------------------------------------------------------------------------

From mangler@csvax.caltech.edu Tue Nov 24 04:22:59 1987

> From: David Robinson <david@elroy.jpl.nasa.gov>
>
> The best I have personally seen is between two Sun-3/260's doing
> memory-memory transfers is 3.2Mbits/sec.

What wasn't mentioned is that this was on SunOS 3.2. Prior and later versions of SunOS aren't nearly as fast at TCP. SunOS 3.4 is 2X slower!

Don Speck
speck@vlsi.caltech.edu
{amdahl,scgvaxd}!cit-vax!speck

==============================================================================

John Carter
Rice University, Houston, Texas
ARPA/CSNET: retrac@rice.edu
UUCP: {Backbone or Internet site}!rice!retrac

Rockets record with me at the game: 13 - 0
Rockets record w/o me at the game:  4 - 4 Home, 15 - 18 Total
ME, *superstitious*?!? No way! :-)



This archive was generated by hypermail 2.0b3 on Thu Mar 09 2000 - 14:40:42 GMT