NTP ticks getting louder

23-Nov-86 20:49:54-UT


I thought you might like an update on how clocks are ticking in the swamps.
After some rummaging around today I was surprised to learn that not only had
the GOES radio clock on FORD1.ARPA completely departed its interface, but the
WWVB clock on UMD1.ARPA had also departed its antenna, or something like that. Only
the WWVB clock on DCN1.ARPA, along with the scruffy WWV clocks on GW.UMICH.EDU
and UDEL2.UDEL.EDU (relocated from DCN6.ARPA), continued to tick. However,
since all the swamps involved use Network Time Protocol (NTP) peers as backup,
the hosts involved remained synchronized (to DCN1.ARPA) and the clockwatchers
scattered throughout the Internet scarcely knew anything was abnormal.

The control of time warps as synchronization switched between the local clocks
and NTP-derived time was not without mishap, however, and revealed some bugs.
Benign torpedoes sent by Rich Wales at UCLA exposed one bug that caused NTP
targets to vaporize and then recondense, although most of the time this did
not destabilize system synchronization. Some very subtle transients in the
recursive median filters used by the fuzzball NTP peers to deglitch neighbor
offsets proved very hard to catch, but caught they got. Several changes were
made to the fuzzware to improve accuracy and reduce vulnerability to glitches.

Here at U Delaware we are synchronizing clocks to DCN1.ARPA via ARPAnet paths
and can report satisfying results. With an eight-stage recursive median
filter, one-minute poll interval, 256-ms aperture and filter constants as
reported in previous RFCs, we can reliably deliver local time to within 10-20
ms or so of the DCN1.ARPA WWVB reference clock, which has previously been
calibrated to within a few milliseconds of NBS radio time. It turns out that
NTP is a useful diagnostic of network health as well, since wide delay
dispersions and offset glitches are sensitive indicators of path switching and
congestion. Milo Medin at NASA/AMES, Rich Wales at UCLA and Mike Petry at U
Maryland have NTP non-fuzzball peers running and, hopefully, can report how
well things work via other paths and using other systems.
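
For those who want to experiment with the deglitching idea, here is a rough
sketch in Python. It is not the fuzzball code and not the recursive median
filter of the earlier RFCs; it is a plain sliding-window median over the last
eight offset samples, offered only as a stand-in. The class name, the method
name and the use of the 256-ms aperture as a gate that rejects samples too far
from the current median are all assumptions made for illustration.

```python
from collections import deque
from statistics import median


class OffsetFilter:
    """Sliding-window median of clock offsets (illustrative stand-in only)."""

    def __init__(self, stages=8, aperture_ms=256.0):
        # Keep only the most recent `stages` accepted samples.
        self.samples = deque(maxlen=stages)
        # Assumption: the aperture gates out samples far from the median.
        self.aperture_ms = aperture_ms

    def update(self, offset_ms):
        """Offer one new offset sample; return the current filtered offset."""
        if self.samples:
            current = median(self.samples)
            if abs(offset_ms - current) > self.aperture_ms:
                # Sample falls outside the aperture: treat it as a glitch.
                return current
        self.samples.append(offset_ms)
        return median(self.samples)


if __name__ == "__main__":
    f = OffsetFilter()
    for sample in [10.0, 12.0, 11.0, 900.0, 13.0]:
        print(sample, f.update(sample))
```

Note that a gate this simple never recovers from a genuine step in the
reference, so a real peer would need some way to reset after repeated
rejections.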

Finding and fixing time warps during the shakedown of NTP in distributed-peer
mode (see RFC-958) has been surprisingly hard, since the system amounts to a
set of mutually coupled, nonlinear, phase-locked oscillators. As many know, the
theory of linear phase-locked oscillators is well trampled in Electrical
Engineering, as are models of mutual trust/distrust in Computer Science. The
present problems seem to lie more in the area of nonlinear statistics, for
which the technology of nonlinear filtering (e.g. order statistics, median
filters), clustering algorithms (e.g. RFC-956) and multivariate estimation are
proving excellent tools. These tools, incidentally, are excellent for the study
of large, ill-disciplined Internets in general, which suggests, of course,
further instrumentation of NTP peers as a network monitoring mechanism.
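
As a rough sketch of what such instrumentation might look like, the Python
fragment below summarizes a batch of round-trip delay and offset samples for
one peer. The function name and both thresholds are hypothetical and chosen
only for illustration; nothing here is taken from the fuzzball software.

```python
from statistics import median


def path_report(delays_ms, offsets_ms,
                max_dispersion_ms=100.0, max_glitch_ms=50.0):
    """Summarize one peer's recent samples (thresholds are assumptions)."""
    # Delay dispersion: spread between the slowest and fastest round trip.
    dispersion = max(delays_ms) - min(delays_ms)
    # Offset glitches: samples far from the median offset of the batch.
    mid = median(offsets_ms)
    glitches = [o for o in offsets_ms if abs(o - mid) > max_glitch_ms]
    return {
        "dispersion_ms": dispersion,
        "glitches": glitches,
        # Flag the path if either indicator exceeds its threshold.
        "suspect": dispersion > max_dispersion_ms or bool(glitches),
    }


if __name__ == "__main__":
    print(path_report([40.0, 42.0, 41.0, 44.0], [5.0, 6.0, 4.0, 5.0]))
    print(path_report([40.0, 42.0, 300.0, 44.0], [5.0, 6.0, 120.0, 5.0]))
```

A wide dispersion or a nonempty glitch list would then be a hint, not proof,
of path switching or congestion along the way to that peer.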

