# Checksums (was Re: Ping, checksum algorithm?)

gordan (mnetor!spectrix!yunexus!maccs!gordan@uunet.uu.net)
23 Mar 88 01:03:47 GMT

them (ignoring overflows) _as if their bit patterns represented one's
complement numbers_. The trick, then, is doing one's complement
arithmetic on a two's complement machine.

Without going into any arithmetical justification, here's how to do a
one's complement sum on a two's complement machine, in pseudocode:

```
INT16 sum;
INT16 *word; /* pointer to start of 16-bit words to be summed */

sum = 0;

for (i = 0; i < `number of 16-bit words to be summed'; i++)
{
    `byte-swap word[i], if necessary (see comment on byte-order)'
    sum += word[i];   /* do NOT combine these two lines ... */
    sum += `CARRY';   /* ... into sum += word[i] + CARRY !!!! */
}
```

where CARRY is the value of the hardware carry bit (0 or 1), as set by
the addition in the previous line (note you mustn't do sum += word[i] +
CARRY as one line, since a high-level language could rearrange the order
of addition and add the value of the carry bit before it was set).
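For illustration only: since portable C cannot read the carry bit, the following sketch makes the carry explicit instead, assuming C99 `<stdint.h>` types. The function name `ones_sum_with_carry` is hypothetical. Each 16-bit addition is done in a 32-bit temporary, and the carry out of bit 15 is added straight back into the sum, which is what the two `sum +=` lines above do with the hardware carry bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one's complement sum of n 16-bit words, adding the
   carry back in after every addition (the "end-around carry").
   Words must already be in host byte order. */
uint16_t ones_sum_with_carry(const uint16_t *word, size_t n)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t t = (uint32_t)sum + word[i]; /* at most 0x1fffe */
        uint16_t carry = (uint16_t)(t >> 16); /* the CARRY bit: 0 or 1 */
        sum = (uint16_t)(t & 0xffff);         /* low 16 bits of the add */
        sum = (uint16_t)(sum + carry);        /* cannot carry a second time */
    }
    return sum;
}
```

For the ten IP header words of the first example packet below, this gives 0x9ece. Adding the carry back in cannot overflow again: whenever a carry occurs, the truncated 16-bit result is at most 0xfffe.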

Of course, the value of the carry bit is not accessible from a
higher-level language like C. A perfectly equivalent method (very
suitable if your machine has 32-bit integers) is:

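```c
#include <stddef.h>
#include <stdint.h>

/* One's complement sum of n 16-bit words, each already byte-swapped if
   necessary (see comments on byte-order). A 32-bit accumulator cannot
   overflow for up to 65537 words. */
uint16_t ones_complement_sum(const uint16_t *word, size_t n)
{
    uint32_t sum32 = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* uint16_t -> uint32_t zero-extends, never sign-extends:
           0xedcb -> 0x0000edcb, not 0xffffedcb */
        sum32 += word[i];
    }

    /* add the two 16-bit halves of sum32 to each other; repeat in case
       that addition itself carries out of the low 16 bits */
    while (sum32 >> 16)
        sum32 = (sum32 & 0xffff) + (sum32 >> 16);

    return (uint16_t)sum32;
}
```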

This works because the carries out of the least significant 16 bits
accumulate in the most significant 16 bits of the 32-bit sum, so adding
the two halves together adds all of those carries back in at once. If
that final addition itself carries out of 16 bits, the carry must be
added in once more. (This is probably what you would use on a 68020, and
you can forget about byte-swapping on a 68020 as well.)

After calculating a one's complement sum, you have to take its one's
complement (invert all the bits) to get the actual checksum used in IP,
TCP, and UDP. UDP treats a calculated checksum of 0x0000 as a special
case: a transmitted checksum of zero means that no checksum was
computed, so a calculated value of 0x0000 is sent as 0xffff instead
(see the RFC).
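A minimal sketch of this last step, assuming C99 `<stdint.h>`; both function names are hypothetical. The UDP variant applies the special case from RFC 768: because a transmitted zero means "no checksum", a calculated 0x0000 is sent as 0xffff, which is also zero in one's complement.

```c
#include <stdint.h>

/* Hypothetical helper: IP/TCP checksum from a one's complement sum. */
uint16_t ip_checksum_from_sum(uint16_t sum)
{
    return (uint16_t)~sum; /* invert all the bits */
}

/* Hypothetical helper: UDP checksum from a one's complement sum.
   A computed 0x0000 is transmitted as 0xffff (RFC 768). */
uint16_t udp_checksum_from_sum(uint16_t sum)
{
    uint16_t c = (uint16_t)~sum;
    return c == 0 ? 0xffff : c;
}
```

For the sums in the examples below, `ip_checksum_from_sum(0x9ece)` gives 0x6131 and `udp_checksum_from_sum(0x3635)` gives 0xc9ca.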

It is of course necessary to take byte-order into account.

Byte-order: suppose adjacent memory locations on a machine contain the
following bytes:

```
X   : 0x12
X+1 : 0x34
```

What is the value of the 16-bit word whose address is X? (This assumes
a byte-addressable machine on which X is suitably aligned for a 16-bit
access.)

If the 16-bit value is 0x1234, the machine is said to be "big-endian";
if it is 0x3412, the machine is "little-endian."

Some machine architectures (Motorola 680x0, etc.) are big-endian, others
(Intel 80x86, VAX) are little-endian. TCP/IP headers use big-endian
byte-order. Thus, life is easier on a Sun than on a VAX: on a
big-endian machine, no byte-swapping is needed.
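One way to sidestep byte-swapping entirely, sketched here under the assumption of C99 `<stdint.h>` types, is to assemble each header word from its two bytes explicitly, high-order byte first. `get_be16` and `host_is_little_endian` are hypothetical names; the second function only demonstrates the definition above by storing 0x1234 and looking at the byte at the lower address.

```c
#include <stdint.h>

/* Hypothetical helper: the big-endian (network order) 16-bit word at
   p[0], p[1]. Gives the same result on big- and little-endian hosts. */
uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: 1 on a little-endian host, 0 on a big-endian one
   (assumes the host is one of the two). */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x1234;
    const uint8_t *first = (const uint8_t *)&probe; /* byte at address X */
    return *first == 0x34;
}
```

With bytes 0x12, 0x34 at X and X+1, `get_be16` returns 0x1234 on any host, which is the value TCP/IP intends.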

Some examples follow, using actual packets (see the appropriate RFC docs
for IP, UDP, and TCP, and ignore the Ethernet stuff). In case anyone's
curious, the IP addresses here are used on a LAN that is not connected
to any outside network (they do not respect the class A/B/C Internet
addressing scheme).

An Ethernet UDP/IP packet
```
---------------------------------------------------
1: ff-ff-ff-ff-ff-ff 02-60-8c-09-58-97 08-00
2: 45 00
3: 00-24 00-01 00-00 ff 11 61-31 01-00-58-97 01-00-
4: -00-00
5: 09-46 00-2a 00-10 c9-ca
6: 01 06 4a 48 45 56
7: 41 58
8: 00 00 00 00 00 00 00 00 00 00
---------------------------------------------------
Lines 2-8: Ethernet data

Lines 5-7: IP data

Lines 6-7: UDP data

Line 8: Garbage padding to satisfy Ethernet minimum packet size
(Ethernet header + data >= 60 bytes).
---------------------------------------------------
```

An Ethernet TCP/IP packet
```
----------------------------------------------------
1: 08-00-2b-02-d2-67 08-00-02-00-51-23 08-00
2: 45 00
3: 00-4b 44-46 00-00 1e 06 56-3a 01-00-00-0b 01-00-
4: -00-23
5: 00-17 07-a8 06-14-56-f0 d3-1d-aa-a4 50 18
6: 00-68 b1-d0 00-00
7: 0d 0a 0d 0a 4d 63 4d 61 73 74
8: 65 72 20 55 6e 69 76 65 72 73 69 74 79 20 56 41
9: 58 20 38 36 30 30 0d 0a 0d
10: 00
---------------------------------------------------
Lines 2-10: Ethernet data

Lines 5-9: IP data

Lines 7-9: TCP data

Line 10: Garbage Ethernet padding (to send an even number of bytes)
----------------------------------------------------
```

In the first packet, the IP Checksum field is 0x6131 (in the middle of
line 3). The IP checksum is calculated over all 16-bit words in the
header, with the checksum field itself taken to be zero, since its value
is not yet known. Thus the 16-bit words that go into calculating the IP
checksum are (from lines 2, 3, and 4): 0x4500, 0x0024, 0x0001, 0x0000,
0xff11, 0x0000, 0x0100, 0x5897, 0x0100, 0x0000.

The 32-bit sum of zero-extended words is 0x 0001 9ecd; adding its two
halves (0x0001 + 0x9ecd) gives the one's complement sum, 0x9ece. The
one's complement of this is the checksum, 0x6131.
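The IP calculation above can be reproduced from the raw bytes of lines 2-4 of the first packet. This is a sketch, assuming C99 and a 20- to 60-byte IPv4 header; `ocsum_bytes` and `ip_header_checksum` are hypothetical names. It uses the 32-bit accumulator method, reads each word big-endian so no byte-swapping is needed on either kind of machine, and zeroes the checksum field in a local copy rather than in the caller's buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One's complement sum of the big-endian 16-bit words in buf[0..len-1];
   an odd final byte is zero-filled on the right. Carries are deferred in
   the 32-bit accumulator and folded back in at the end. */
uint16_t ocsum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t sum32 = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum32 += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum32 += (uint32_t)buf[len - 1] << 8;
    while (sum32 >> 16)
        sum32 = (sum32 & 0xffff) + (sum32 >> 16);
    return (uint16_t)sum32;
}

/* Hypothetical helper: IPv4 header checksum. The checksum field (bytes
   10 and 11) is taken to be zero; hdr_len is assumed to be 20..60. */
uint16_t ip_header_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint8_t copy[60]; /* an IPv4 header is at most 60 bytes */

    if (hdr_len > sizeof copy)
        hdr_len = sizeof copy;
    memcpy(copy, hdr, hdr_len);
    copy[10] = copy[11] = 0; /* zero the checksum field */
    return (uint16_t)~ocsum_bytes(copy, hdr_len);
}
```

With the twenty header bytes of the first packet, `ip_header_checksum` returns 0x6131, and `ocsum_bytes` over the unmodified header (stored checksum included) returns 0xffff.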

The UDP Checksum field in the same packet is 0xc9ca (at the end of line
5). Unlike IP, the UDP checksum is calculated not only over the UDP
header, but also over the UDP data, and over a pseudo-header consisting
of the IP source and destination addresses, the IP Protocol field
zero-extended to 16 bits, and a UDP length word. Again the checksum
field itself is taken to be zero during the actual calculation, since we
can't know its value before actually computing it.

Thus the 16-bit words that go into calculating the UDP checksum are
(from lines 5,6,7): 0x0946, 0x002a, 0x0010, 0x0000, 0x0106, 0x4a48,
0x4556, 0x4158; (and from the pseudo-header): 0x0100, 0x5897, 0x0100,
0x0000, 0x0011 (UDP protocol number = 0x11 or 17 decimal), and 0x0010
(the UDP length). The 32-bit sum of zero-extended words is 0x 0001
3634, so the 16-bit one's complement sum is 0x3635 and the checksum is
0xc9ca as required.
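A sketch of the UDP calculation including the pseudo-header, again assuming C99; the names `add_be_words`, `fold_carries`, and `udp_checksum` are hypothetical, and the caller is assumed to have zeroed the UDP checksum field (bytes 6 and 7) in the segment passed in. The pseudo-header words are added straight into the accumulator rather than built as a separate structure, which works because one's complement addition does not depend on the order of the words.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a 32-bit accumulator; an odd
   final byte is zero-filled on the right. */
static uint32_t add_be_words(uint32_t sum32, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum32 += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum32 += (uint32_t)buf[len - 1] << 8;
    return sum32;
}

/* Fold the deferred carries back into the low 16 bits. */
static uint16_t fold_carries(uint32_t sum32)
{
    while (sum32 >> 16)
        sum32 = (sum32 & 0xffff) + (sum32 >> 16);
    return (uint16_t)sum32;
}

/* Hypothetical helper: UDP checksum over pseudo-header, UDP header and
   data. seg holds udp_len bytes with the checksum field zeroed. */
uint16_t udp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *seg, uint16_t udp_len)
{
    uint32_t sum32 = 0;

    sum32 = add_be_words(sum32, src, 4); /* pseudo-header: source */
    sum32 = add_be_words(sum32, dst, 4); /* pseudo-header: destination */
    sum32 += 17;                         /* protocol (UDP), zero-extended */
    sum32 += udp_len;                    /* UDP length word */
    sum32 = add_be_words(sum32, seg, udp_len);

    uint16_t c = (uint16_t)~fold_carries(sum32);
    return c == 0 ? 0xffff : c;          /* RFC 768: zero is sent as 0xffff */
}
```

With source address 01-00-58-97, destination 01-00-00-00, and the sixteen bytes of lines 5-7 (checksum zeroed), this returns 0xc9ca.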

In the second packet shown, the IP checksum is 0x563a (in the middle of
line 3). The 16-bit words that go into calculating the IP checksum are
(from lines 2,3,4): 0x4500, 0x004b, 0x4446, 0x0000, 0x1e06, 0x0000,
0x0100, 0x000b, 0x0100, 0x0023.

The 32-bit sum of zero-extended words is 0x 0000 a9c5, so the one's
complement sum is 0xa9c5. The one's complement of this is the checksum,
0x563a.
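The same arithmetic also gives a cheap check on the receiving side: summing the whole header _including_ the stored checksum yields 0xffff when the checksum is right (for this packet, 0xa9c5 + 0x563a = 0xffff). A minimal sketch, assuming C99; `ip_header_checksum_ok` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the one's complement sum over the
   whole header, stored checksum included, is 0xffff (which is what a
   correctly checksummed header gives), else 0. */
int ip_header_checksum_ok(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum32 = 0;
    size_t i;

    for (i = 0; i + 1 < hdr_len; i += 2)
        sum32 += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum32 >> 16)
        sum32 = (sum32 & 0xffff) + (sum32 >> 16);
    return sum32 == 0xffff;
}
```

Changing a single header byte, for example the TTL from 0x1e to 0x1f, makes the check fail.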

The TCP Checksum field in the same packet is 0xb1d0 (the second word in
line 6). Just as for UDP, the TCP checksum is calculated over all 16-bit
words in the TCP header, data, and pseudo-header. The 16-bit words that
go into calculating the checksum are, from the TCP header (lines 5 and 6):

0x0017, 0x07a8, 0x0614, 0x56f0, 0xd31d, 0xaaa4,
0x5018, 0x0068, 0x0000 (the checksum field itself, taken to be zero),
0x0000 (the urgent pointer).

From the TCP data:

0x0d0a, 0x0d0a, 0x4d63, 0x4d61, 0x7374, 0x6572,
0x2055, 0x6e69, 0x7665, 0x7273, 0x6974, 0x7920, 0x5641, 0x5820,
0x3836, 0x3030, 0x0d0a, 0x0d00 (we have an odd number of data bytes,
so the last byte is zero-filled on the right to form a 16-bit word).
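As a cross-check on the TCP example, the following sketch computes the whole checksum for the second packet. It assumes C99, and the names `add_be_words` and `tcp_checksum` are hypothetical. Following RFC 793, it takes the TCP pseudo-header to have the same layout as UDP's: the source and destination IP addresses (0x0100 0x000b and 0x0100 0x0023, from lines 3-4), the protocol number 6 zero-extended to 16 bits, and the TCP length, here 55 (0x0037), which is the IP total length 0x004b (75) minus the 20-byte IP header. The length counts the actual 55 bytes, not the zero-filled 56.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a 32-bit accumulator; an odd
   final byte is zero-filled on the right to form a 16-bit word. */
static uint32_t add_be_words(uint32_t sum32, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum32 += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum32 += (uint32_t)buf[len - 1] << 8;
    return sum32;
}

/* Hypothetical helper: TCP checksum over pseudo-header, TCP header and
   data. seg holds tcp_len bytes of TCP header plus data, with the
   checksum field (bytes 16 and 17) zeroed. */
uint16_t tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *seg, uint16_t tcp_len)
{
    uint32_t sum32 = 0;

    sum32 = add_be_words(sum32, src, 4); /* pseudo-header: source */
    sum32 = add_be_words(sum32, dst, 4); /* pseudo-header: destination */
    sum32 += 6;                          /* protocol (TCP), zero-extended */
    sum32 += tcp_len;                    /* TCP length in bytes */
    sum32 = add_be_words(sum32, seg, tcp_len);

    /* fold the deferred carries back into the low 16 bits */
    while (sum32 >> 16)
        sum32 = (sum32 & 0xffff) + (sum32 >> 16);
    return (uint16_t)~sum32;
}
```

With the checksum field zeroed, the 32-bit sum works out to 0x 0007 4e28; adding the halves gives 0x4e2f, and its one's complement is 0xb1d0, matching the value in the packet.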