Re: TCP maximum segment size determination

17 Nov 1987 11:38-EST

        If you want a way to do MTU probing that has low (as in zero)
gateway overhead, needs no change(1) to gateway code, is not
TCP-specific, works with non-symmetric routes, is easy to phase into
use (e.g., where you really need it), and is backward compatible,
read on.
        In this scheme, every packet is a "probe" (it costs the
originating system nothing to invoke).
        The gateways do fragmentation, as they are required to do.
I would suggest that the size of the first fragment be the MTU of the
next hop - the suggested fragmentation algorithm has this property,
and the last time the subject was discussed on this list, no vendors
replied saying that they had implemented an algorithm which didn't
have the property (but it isn't really required anyway).
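
        As a rough illustration of that property, here is a minimal
sketch of gateway fragmentation in Python. The function name and the
tuple layout are made up for this example and are not taken from any
gateway implementation. Because non-final fragments must carry a
multiple of 8 data bytes, the first fragment's length comes out as the
next-hop MTU rounded down to that boundary.

```python
def fragment(total_length, header_len, next_hop_mtu):
    """Split a datagram for a next hop with the given MTU.

    Returns a list of (offset_in_8_byte_units, fragment_total_length,
    more_fragments) tuples. Illustrative only: assumes a fixed header
    length that is copied unchanged into every fragment.
    """
    if total_length <= next_hop_mtu:
        return [(0, total_length, False)]

    # Non-final fragments must carry a multiple of 8 data bytes.
    max_data = (next_hop_mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")

    data_len = total_length - header_len
    fragments = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len
        fragments.append((offset // 8, header_len + size, more))
        offset += size
    return fragments


# A 1500-byte datagram crossing a hop with a 576-byte MTU.
frags = fragment(1500, 20, 576)
```

Here the first fragment is 572 bytes long (20 header bytes plus 552
data bytes), the largest size that fits in 576 bytes while keeping the
data on an 8-byte boundary.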
        If all the fragments arrive at the destination and get
reassembled(2), all is well and you didn't really need to probe.
In the no-problem case, your additional cost is zero - seems good.
        If not all the fragments made it, some will be left in the IP
reassembly queue and time out. This is where the ICMP Time Exceeded
message is used (the scheme relies on it being sent, whereas the spec
uses words like "may" and "need not"). Chances are pretty good
(engineering over mathematics) that the first fragment made it and can
be included in the message; its IP Total Length field encodes the
smallest MTU encountered along the path.
If "fragment zero" is not there, it should be easy to use a fragment
with "the largest size"; this might require a little extra code in the
hosts, but it only costs cycles when needed, and is in the hosts and not
the gateways.
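
        The choice of which fragment to quote in the Time Exceeded
message could be sketched as follows. This is only an illustration:
the dictionary keys "offset" and "total_length" and the function name
are assumptions made for the example, not part of any real reassembly
code.

```python
def pick_hint_fragment(fragments):
    """Choose the fragment to quote when reassembly times out.

    `fragments` is a non-empty list of dicts with hypothetical keys
    "offset" (fragment offset) and "total_length" (IP Total Length).
    Fragment zero is preferred; otherwise the largest fragment held.
    """
    for frag in fragments:
        if frag["offset"] == 0:
            return frag
    return max(fragments, key=lambda frag: frag["total_length"])


# Fragment zero was lost; fall back to the largest one in the queue.
queue = [
    {"offset": 69, "total_length": 572},
    {"offset": 138, "total_length": 396},
]
hint = pick_hint_fragment(queue)
```

The extra work is done only when a reassembly timer actually expires,
so the no-problem case pays nothing.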
        The ICMP message is returned to the originating host; the
route that it takes does not matter - symmetric or asymmetric. The
datagram may get lost(1), but then any IP based technique has the same
problem. If the problem is persistent, one of the messages should
eventually get through.
        The originating host receives the message, and its ICMP
routines use the very strong hint about the path MTU to update
whatever table the host uses to record such information (this may
require some additional code in the hosts) before passing the message
on to the higher level protocol involved. One can use any desired
"filtering"
algorithm to set thresholds for when to decide it is appropriate to
change the MTU vs letting higher level reliability mechanisms deal with
the problem.
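
        One possible "filtering" policy, sketched in Python under
stated assumptions: the class name, the default MTU of 1500, and the
rule of waiting for a fixed number of hints are all invented for this
example. The post leaves the choice of filter open.

```python
class PathMtuTable:
    """Per-destination path MTU table with a simple hint threshold."""

    def __init__(self, default_mtu=1500, threshold=3):
        self.default_mtu = default_mtu
        self.threshold = threshold
        self.mtu = {}      # destination -> MTU currently in use
        self.pending = {}  # destination -> (hint count, smallest hint)

    def lookup(self, dest):
        return self.mtu.get(dest, self.default_mtu)

    def record_hint(self, dest, hinted_mtu):
        """Record one MTU hint and return the MTU now in effect."""
        current = self.lookup(dest)
        if hinted_mtu >= current:
            return current  # the hint tells us nothing new
        count, smallest = self.pending.get(dest, (0, hinted_mtu))
        count += 1
        smallest = min(smallest, hinted_mtu)
        if count >= self.threshold:
            # Enough evidence: lower the MTU and start counting afresh.
            self.mtu[dest] = smallest
            self.pending.pop(dest, None)
            return smallest
        self.pending[dest] = (count, smallest)
        return current


table = PathMtuTable(default_mtu=1500, threshold=2)
first = table.record_hint("10.0.0.2", 572)   # below threshold, unchanged
second = table.record_hint("10.0.0.2", 572)  # threshold reached, lowered
```

With a threshold of 1 every hint takes effect immediately; a larger
threshold leaves isolated losses to the higher level reliability
mechanisms.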
        All of the mechanisms used in this technique are allowed by the
current specifications, so phase-in and backward compatibility should not
be a problem. It isn't perfect, but seems like an engineering compromise
that has several advantages over other schemes.

1. I would suggest that control traffic, e.g., ICMP, be given a little
    priority in gateways when deciding which packets to drop. Dropping
    control traffic doesn't sound like a good idea in general.
2. An IP entity can "force" a probe by forcing a reassembly timeout.
    It can construct, for example, a "large" ICMP echo request with the
    more fragments bit set (and probably a low TTL), and never send the
    "last" fragment. The datagram would pick up the path MTU and, by
    definition, time out and be returned.
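
    A minimal sketch of building such a forced probe in Python. It only
    constructs the bytes and does not send anything. The function names,
    addresses, identifier, and default TTL are assumptions chosen for
    the example.

```python
import socket
import struct


def checksum(data):
    """Internet checksum: one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_probe(src, dst, data_len, ttl=8, ident=0x1234):
    """Build an IPv4 datagram holding an ICMP echo request with MF set.

    The matching "last" fragment is never built, so the destination
    can never finish reassembly. The ICMP checksum here covers only
    this fragment, which is harmless because the echo request is never
    reassembled and processed.
    """
    if data_len % 8:
        raise ValueError("data length must be a multiple of 8 when MF is set")
    data = bytes(data_len)
    icmp = struct.pack("!BBHHH", 8, 0, 0, ident, 1) + data
    icmp = struct.pack("!BBHHH", 8, 0, checksum(icmp), ident, 1) + data

    flags_and_offset = 0x2000  # more fragments bit set, offset 0
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(icmp), ident, flags_and_offset,
        ttl, 1, 0,  # protocol 1 = ICMP, checksum filled in below
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + icmp


probe = build_probe("10.0.0.1", "10.0.0.2", 1472)
```

    Sending the result would need a raw socket and the privileges that
    go with it, which is left out here.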

This archive was generated by hypermail 2.0b3 on Thu Mar 09 2000 - 14:39:56 GMT