I'm interested in your statements that checksumming is a very small part of
protocol processing. The "common wisdom" I've always heard, as well as
Cabrera in "User-Process Communication Performance in Networks of
Computers" (IEEE Trans. on Software Eng., Jan. 1988), holds that data copying
and checksumming are the two biggest components of protocol processing in
BSD4.2 Unix implementations.

Have you instrumented your code to gather performance statistics showing
which parts of the protocol processing account for the bottlenecks?

