Re: TCP performance limitations


decvax!utzoo!henry@ucbvax.Berkeley.EDU
Fri, 23 Oct 87 01:43:18 edt


> ... Fourth, depending on your CPU architecture, there
> may be ideal unrolling constants which would keep the unrolled loop
> inside an instruction prefetch buffer; complete unrolling would actually
> be a degradation.

In particular, almost any CPU with a cache -- which means most anything above
the PC level nowadays -- will have an optimum degree of unrolling for loops
that iterate a given number of times. It's not just a question of whether
the loop will fit; eventually the extra main-memory fetches needed to get
a larger loop into the cache wipe out the gains from reduced loop-control
overhead. For straightforward caches (with a loop that will *fit* in the
cache!), elapsed time versus degree of unrolling is a nice smooth curve with
a quite marked minimum. Based on the look I took at this, if the ratio of
your cache speed to memory speed isn't striking, and your loop control is
not grossly costly (due to e.g. pipeline breaks), the minimum has a good
chance of falling at a fairly modest unrolling factor, maybe 8 or 16.
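
A toy cost model makes the shape of that curve concrete. This is only a
sketch under stated assumptions: the function names and every cycle and
byte count below are invented for illustration, not measurements, and the
model assumes the unrolled loop still fits in the cache and is fetched from
main memory exactly once.

```python
def elapsed_cycles(unroll, iterations, body_cycles, overhead_cycles,
                   body_bytes, fetch_cycles_per_byte):
    """Estimated cycles for a loop whose body is replicated `unroll` times.

    Hypothetical model: the useful work is fixed, loop-control overhead is
    paid once per trip through the unrolled loop, and the instruction bytes
    are pulled from main memory into the cache once.
    """
    trips = -(-iterations // unroll)              # ceiling division
    work = iterations * body_cycles               # same for any unrolling
    control = trips * overhead_cycles             # shrinks as unroll grows
    fill = unroll * body_bytes * fetch_cycles_per_byte  # grows with unroll
    return work + control + fill


def best_unroll(candidates, **params):
    """Return the candidate unrolling factor with the lowest estimate."""
    return min(candidates, key=lambda u: elapsed_cycles(u, **params))


# Invented parameters: modest loop overhead, modest memory penalty.
PARAMS = dict(iterations=512, body_cycles=4, overhead_cycles=2,
              body_bytes=4, fetch_cycles_per_byte=1)
FACTORS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

if __name__ == "__main__":
    for u in FACTORS:
        print(u, elapsed_cycles(u, **PARAMS))
    print("best:", best_unroll(FACTORS, **PARAMS))
```

With these made-up numbers the estimate bottoms out at an unrolling factor
of 16 and climbs again on either side; complete unrolling (512) comes out
nearly as slow as no unrolling at all. Raising the overhead per trip or
lowering the fetch cost pushes the minimum toward larger factors.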

                                Henry Spencer @ U of Toronto Zoology
                                {allegra,ihnp4,decvax,pyramid}!utzoo!henry


