Re: Adaptive SMTP Timeouts

Tue, 3 Jun 86 10:40 EDT

    Date: Mon, 2 Jun 86 12:03:42 PDT

    An idea that we have found very helpful...

    Our mailer keeps outgoing mail sorted by host. Hosts are split into two
    categories: healthy and sick. While there is work to do on the healthy
    queue, the mailer ignores the sick hosts. Whenever the mailer empties
    the healthy queue, it tries the host on the front of the sick queue. (If
    that fails, it gets moved to the end of the sick queue.) The idea is to
    avoid having the mailer bang its head against hosts that are known to be
    causing trouble.
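
The healthy/sick scheme quoted above can be sketched roughly as follows. This is only an illustration under assumptions: the class name HostScheduler and its methods are hypothetical, and the original mailer's actual data structures are not described in the message.

```python
from collections import deque


class HostScheduler:
    """Pick the next host to try: healthy hosts first, then one sick host.

    Hypothetical sketch; names are not taken from any real mailer.
    """

    def __init__(self):
        self.healthy = deque()  # hosts with pending mail, believed reachable
        self.sick = deque()     # hosts whose last delivery attempt failed

    def add_host(self, host):
        """Queue a host that has mail waiting, unless it is already queued."""
        if host not in self.healthy and host not in self.sick:
            self.healthy.append(host)

    def next_host(self):
        """Return the next host to try, or None if nothing is queued."""
        if self.healthy:
            return self.healthy.popleft()
        if self.sick:
            # Healthy queue is empty: try the host at the front of the sick queue.
            return self.sick.popleft()
        return None

    def report(self, host, delivered, more_mail=False):
        """Record the outcome of an attempt on a host from next_host()."""
        if not delivered:
            self.sick.append(host)      # failed: goes to the end of the sick queue
        elif more_mail:
            self.healthy.append(host)   # succeeded, but mail is still pending
```

A caller would loop on next_host(), attempt delivery, and pass the result to report(); a sick host is only reached once the healthy queue has drained.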

We do something like this. We keep track of up and down hosts, and when
processing mail skip the hosts that are believed down. Therefore, we
don't concentrate on one particular host (which might drive that host
crazy). I think we do it this way because one message is often destined
for many hosts.

    Occasionally mail to a host that isn't really very sick takes much
    longer than we would like. This happens when the sick queue is very long
    and the mailer is busy so the sick queue doesn't turn over very fast. So
    far, this hasn't bothered us enough to do anything about it.

Obvious solution: Periodically declare sick hosts up, or slightly more
conservatively, declare the host suitable for a probe. If it really is
sick, you'll know soon enough. You only have to do this for one
message. If it isn't sick, you can requeue the tardy messages.
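
A minimal sketch of that probing idea, assuming a fixed probe interval and a caller-supplied send function. HostStatus, flush_host, and the 30-minute default are all hypothetical choices for illustration, not something either mailer is known to use.

```python
class HostStatus:
    """Remember which hosts are believed down and when a probe is due."""

    def __init__(self, probe_interval=1800.0):
        self.probe_interval = probe_interval  # seconds; assumed value
        self.down_since = {}                  # host -> time of last failure

    def mark_down(self, host, now):
        self.down_since[host] = now

    def mark_up(self, host):
        self.down_since.pop(host, None)

    def should_try(self, host, now):
        """True if the host is believed up, or is down but due for a probe."""
        failed_at = self.down_since.get(host)
        if failed_at is None:
            return True
        return now - failed_at >= self.probe_interval


def flush_host(status, host, pending, send, now):
    """Try the pending mail for one host; return what is still undelivered.

    The first message doubles as the probe: if it fails, the host is marked
    down again and no further messages are attempted.
    """
    remaining = list(pending)
    if not status.should_try(host, now):
        return remaining
    while remaining:
        if not send(host, remaining[0]):
            status.mark_down(host, now)
            break
        remaining.pop(0)
        status.mark_up(host)
    return remaining
```

If the host really is sick, the cost is a single failed attempt per probe interval; if it has recovered, the tardy messages go out in the same pass.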

    Along the same lines, we also keep mail to a host sorted, but not quite
    chronologically. Whenever the mailer tries to send a message and fails,
    that message gets moved to the end of the queue. Occasionally, this lets
    the rest of the mail get through when one particular message is
    having/causing troubles.
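
The "move failures to the end" ordering quoted above might look something like this. It is a sketch only: run_host_queue and the send callback are hypothetical names, and the original mailer may well handle retries differently.

```python
from collections import deque


def run_host_queue(queue, send):
    """Make one pass over a host's queue, trying each message once.

    A message that fails is moved to the end of the queue, so it cannot
    block the messages queued behind it.
    """
    for _ in range(len(queue)):   # length fixed up front: one try per message
        msg = queue.popleft()
        if not send(msg):
            queue.append(msg)     # failed: rotate to the back
    return queue
```

After one pass, the queue holds only the messages that failed, in the order they failed.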

When a message is causing trouble, how long does it take a human to
realize it and take corrective action? If it stayed at the head of the
queue, I can imagine a human would notice sooner, either because no
mail gets through at all or because the queue for the troublesome host
keeps growing instead of staying at some "respectable" number.
