SLIP CRC's and other reliability


Rob Horn (necntc!adelie!infinet!rhorn@AMES.ARC.NASA.GOV)
15 Apr 88 03:07:54 GMT


The practical engineering decision that CRC's are needed is based
upon: assumptions about how SLIP will be used, assumptions about the
environment that it will be used in, and cost (time, money, etc.)
assumptions. To restate the often discovered and forgotten: IP
checksums are mostly a network performance optimization. Transmission
errors that escape IP checksums usually escape TCP and vice versa.
(Note: I am already assuming that those concerned with reliability use
TCP. This is only 95% correct.) So adding checksums to SLIP must be
justified on network performance, not error detection.

My concern is error detection. I think Rick Adams' major mistake
is in assuming that SLIP will only be used over voice channels,
and hence over modems. The errors characteristic of a voice channel
plus modem are dropouts and random error bursts. A checksum is almost
as good as a CRC at detecting these, so TCP works well.

My assumption is that SLIP will be used over not only voice
channels, but also muxes of many kinds, data PBX's, etc. The
errors characteristic of these are dropouts, random error bursts,
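duplications, and transpositions. Checksums are very vulnerable
to transpositions. With the 16-bit sums used by IP and TCP, swapping
two bytes an even number of positions apart leaves each byte in the
same half of its 16-bit word, so the sum is unchanged and the error
escapes detection. Much more serious, TCP will probably also fail
to detect it. I have experienced such errors on checksummed
links through failing multiplexors.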
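
To make the transposition problem concrete, here is a small sketch.
It assumes the usual 16-bit ones' complement sum used by IP and TCP,
and uses CRC-32 from Python's zlib purely as a convenient stand-in
for "a CRC"; neither the function names nor the choice of CRC-32
come from the discussion so far.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum in the style of IP/TCP."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def swap_bytes(data: bytes, i: int, j: int) -> bytes:
    """Return a copy of data with the bytes at offsets i and j exchanged."""
    buf = bytearray(data)
    buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)

packet = bytes(range(1, 21))                 # 20 distinct bytes
even_swap = swap_bytes(packet, 3, 5)         # stride 2: even
odd_swap = swap_bytes(packet, 3, 4)          # stride 1: odd

# The even-stride swap is invisible to the checksum but not to the CRC.
print(internet_checksum(packet) == internet_checksum(even_swap))
print(zlib.crc32(packet) == zlib.crc32(even_swap))
# The odd-stride swap moves a byte between word halves and is caught.
print(internet_checksum(packet) == internet_checksum(odd_swap))
```

The first comparison prints True and the other two print False: the
sum cannot see the even-stride swap, while the CRC can.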

I think that a failure which escapes TCP is very serious and an
option to detect it is very important. CRC's will catch
transpositions. I only ask that CRC be available as a checking
option. I see no need to burden a simple system with LAPB or
whatever. If the CRC fails, just trash the packet. The cost of
this is minor: a few days of programming and a few instructions per
byte.
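
A minimal sketch of that "check and trash" option follows. The
CRC-16-CCITT polynomial (via Python's binascii.crc_hqx), the 0xFFFF
initial value, the two-byte big-endian trailer, and the function
names are all assumptions made for illustration; nothing here is a
proposed wire format.

```python
import binascii
from typing import Optional

def add_crc(payload: bytes) -> bytes:
    """Append a 16-bit CRC trailer to an outgoing packet (assumed layout)."""
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + crc.to_bytes(2, "big")

def check_crc(frame: bytes) -> Optional[bytes]:
    """Return the payload if the CRC matches, else None (packet trashed)."""
    if len(frame) < 2:
        return None
    payload, trailer = frame[:-2], frame[-2:]
    if binascii.crc_hqx(payload, 0xFFFF) != int.from_bytes(trailer, "big"):
        return None                          # no retry logic: just drop it
    return payload

frame = add_crc(b"an IP datagram")
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]   # one flipped bit

print(check_crc(frame))      # payload comes back intact
print(check_crc(damaged))    # None: the packet is discarded
```

Recovery is left to TCP, which retransmits the missing segment as it
would for any other lost packet.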

Looking further at costs, a more cost-effective way to
enhance reliability is forward error correction (FEC). Instead
of retransmitting, spend cycles repairing damaged packets at the
receiver.
This is worthwhile whenever the incremental cost of FEC is less
than the cost of retransmission. For a small system, time is
what counts. Ten thousand instructions is nothing when compared
to a retransmission. For a multi-user host, the tradeoff is not
obvious.
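
As a rough illustration of the small-system case, the arithmetic can
be sketched as below. The 1 MIPS CPU, the 9600 bps line, the 256-byte
packet, and the 10 bits per byte of async framing are assumed figures
chosen only for the example, and timeout delays before a retry are
ignored.

```python
def fec_fix_seconds(instructions: float, instr_per_sec: float) -> float:
    """CPU time spent repairing one damaged packet."""
    return instructions / instr_per_sec

def retransmit_seconds(packet_bytes: int, line_bps: float,
                       bits_per_byte: int = 10) -> float:
    """Line time to send the packet again (start + 8 data + stop bits)."""
    return packet_bytes * bits_per_byte / line_bps

fix = fec_fix_seconds(10_000, 1_000_000)     # assumed 1 MIPS machine
resend = retransmit_seconds(256, 9600)       # assumed 9600 bps line

print(f"FEC fix: {fix * 1000:.1f} ms, retransmit: {resend * 1000:.1f} ms")
```

Under these assumptions the repair costs about 10 ms of CPU against
roughly 267 ms of line time for the resend.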

I have been contemplating a two-way interleaved (255,253)
Reed-Solomon FEC with assumed nulls for message length matching.
This code would fix:
  a) any single erroneous byte
  b) any two erroneous bytes with odd stride (including
     transposition)
Other errors would result in either incorrect fixes or error
detection. The incorrect fixes will trigger TCP checksum error
detection, including transposition cases. My first complexity
estimate is that this FEC would take about twice as much CPU as a


