Re: TCP performance limitations

David C. Plummer (DCP@QUABBIN.SCRC.Symbolics.COM)
Tue, 13 Oct 87 10:08 EDT

    Date: 8 Oct 87 16:05:47 GMT
    From: csustan!csun!psivax!nrcvax!ihm@LLL-WINKEN.ARPA (Ian H. Merritt)

>There is a fourth way that we (Symbolics) have done which you did not
>(a) Pick a compile-time unrolling factor, usually a power of 2, say 16 = 2^4.
>(b) Divide the data length by the unrolling factor, obtaining a quotient
> and remainder. When the unrolling factor is a power of two, the
> quotient is a shift and the remainder is a logical AND.
>(c) Write an unrolled loop whose length is the unrolling factor. Execute
> this loop <quotient> times.
>(d) Write an un-unrolled loop (whose length is therefore 1). Execute
> this loop <remainder> times.
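
Steps (a) through (d) can be sketched in C roughly as follows. This is an
illustration only, not the Symbolics code: the function name
sum_words_unrolled, the choice of summing 16-bit words, and the macro names
are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* (a) Compile-time unrolling factor: 16 = 2^4 (hypothetical macro names). */
#define UNROLL_SHIFT 4
#define UNROLL       (1u << UNROLL_SHIFT)

/* Hypothetical example: add up n 16-bit words into a 32-bit accumulator. */
uint32_t sum_words_unrolled(const uint16_t *p, size_t n)
{
    uint32_t sum = 0;

    /* (b) Power-of-two factor: quotient is a shift, remainder is an AND. */
    size_t quotient  = n >> UNROLL_SHIFT;
    size_t remainder = n & (UNROLL - 1);

    /* (c) Unrolled loop, 16 words per pass, executed <quotient> times. */
    while (quotient-- > 0) {
        sum += p[0];  sum += p[1];  sum += p[2];  sum += p[3];
        sum += p[4];  sum += p[5];  sum += p[6];  sum += p[7];
        sum += p[8];  sum += p[9];  sum += p[10]; sum += p[11];
        sum += p[12]; sum += p[13]; sum += p[14]; sum += p[15];
        p += UNROLL;
    }

    /* (d) Un-unrolled loop, one word per pass, executed <remainder> times. */
    while (remainder-- > 0)
        sum += *p++;

    return sum;
}
```

With n = 37, for instance, the unrolled loop runs twice (32 words) and the
one-word loop runs five times.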

    Or if you have memory to burn (which is fast becoming a common
    condition), just unroll the loop for the maximum condition and branch
    into it at the appropriate point to process the length of the actual

First of all, that's 65535 octets for TCP. Second, I believe that was
one of the three techniques mentioned by the person to whom I was
replying. Third, we (Symbolics) can't do that without playing some
really nasty games with the compiler. You see, we're of the opinion
that assembly language is a thing of the past, and there aren't any good
Lisp constructs for the kind of computed GO necessary to pull this trick
off. I can't think of any good tricks in FORTRAN, either. I'm not
familiar enough with Pascal, Ada or C to know if those higher-level languages
allow such things. Fourth, depending on your CPU architecture, there
may be ideal unrolling constants which would keep the unrolled loop
inside an instruction prefetch buffer; complete unrolling would actually
be a degradation.
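
On the question of whether C allows such a thing: a switch statement whose
cases fall through can serve as the computed GO, entering a fully unrolled
sequence at a point chosen by the length. The sketch below is an assumption
for illustration (the name sum_words_jump_in and the small maximum of 8 words
are made up to keep it short); it is not taken from any of the
implementations discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical maximum for the fully unrolled sequence. */
#define MAX_WORDS 8

/* Hypothetical example: add up n 16-bit words by jumping into an
 * unrolled sequence. Lengths above MAX_WORDS use a plain loop. */
uint32_t sum_words_jump_in(const uint16_t *p, size_t n)
{
    uint32_t sum = 0;

    if (n > MAX_WORDS) {
        for (size_t i = 0; i < n; i++)
            sum += p[i];
        return sum;
    }

    /* Point one past the last word so that p[-k] is the k-th word
     * counting back from the end. */
    p += n;

    /* The switch is the computed GO: control enters at case n and
     * every case deliberately falls through to the one below it. */
    switch (n) {
    case 8: sum += p[-8]; /* fall through */
    case 7: sum += p[-7]; /* fall through */
    case 6: sum += p[-6]; /* fall through */
    case 5: sum += p[-5]; /* fall through */
    case 4: sum += p[-4]; /* fall through */
    case 3: sum += p[-3]; /* fall through */
    case 2: sum += p[-2]; /* fall through */
    case 1: sum += p[-1]; /* fall through */
    case 0: break;
    }

    return sum;
}
```

Scaling this to the maximum TCP length would mean tens of thousands of
cases, which is where the memory cost and the prefetch-buffer concern above
come in.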
