**Charley Kline** (*kline@uxc.cso.uiuc.edu*)

*Fri, 1 Apr 88 15:53:44 CST*

**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ]**Next message:**Roger Fajman: "Re: TCP/IP for IBM's MVS"**Previous message:**Francine Perillo: "RFCs, IENs and the DDN NIC"

When I first started the CTSS TCP/IP project, Dave Clark suggested that

I publish the checksum algorithm. His contention was that most checksummers

are dreadfully machine-dependent, incomprehensible, and artful things. Well,

I didn't put much effort into art, but I thought I'd oblige him.

What I do to compute checksums on a Cray is a vector operation

in assembler of all things. I can do up to 512 bytes of packet at a

time. The core of the algorithm looks as follows. A1 holds the address

of the 512-byte block of memory to checksum (I'm leaving out many

details having to do with short blocks, because they are ugly).

EBM

A0 A1

VL 64 use full vectors

S1 <32 form 32-bit mask from the right.

A2 32

V1 ,A0,1 load packet into V1

V2 S1&V1 Form right-hand 32-bits in V2.

V3 V1>A2 Form left-hand 32-bits in V3.

V1 V2+V3 Add the two together.

A2 63 Prepare to collapse into a scalar.

S1 0

S4 <16 Form 16-bit mask from the right.

A4 16

CK$LOOP S2 V1,A2

A2 A2-1

A0 A2

S1 S1+S2

JAN CK$LOOP

S2 S1&S4 Form right-hand 16-bits in S2

S1 S1>A4 Form left-hand 16-bits in S1

S1 S1+S2

S2 S1&S4 Form right-hand 16-bits in S2

S1 S1>A4 Form left-hand 16-bits in S1

S1 S1+S2

S1 #S1 Take one's complement

CMR At this point, S1 contains the checksum.

Note that this takes full advantage of the fact that the one's complement

sum is an Abelian Group--I can add 32-bit pieces instead of 16-bit pieces,

as has been mentioned. I can also take advantage of the fact that the sum of

the checksums of several blocks is the same as the checksum of the sum

by carrying out this checksum on several blocks and adding up the answers.

First I load two copies of the packet into two vector registers. One

gets vector-shifted right 32 bits; the other gets vector-ANDED with a

32 bit mask. Then the two are added together. Since all these operations

chain, I can produce one result per clock cycle. Then I collapse

the results in the vector by adding each element to a scalar register.

Finally, the end-around carry and conversion to 16-bits. It's all

terribly fast. Now if only context-switching and IPC under CTSS were

as fast...

-----

Charley Kline, University of Illinois Computing Services

kline@uxc.cso.uiuc.edu

{ihnp4,uunet,pur-ee,convex}!uiucuxc!kline

"Birth. School. Work. Death."

**Next message:**Roger Fajman: "Re: TCP/IP for IBM's MVS"**Previous message:**Francine Perillo: "RFCs, IENs and the DDN NIC"

*
This archive was generated by hypermail 2.0b3
on Thu Mar 09 2000 - 14:41:54 GMT
*