Last update: 21-Sep-2010 20:15 UTC
NTP time synchronization services are widely available in the public Internet. The public NTP subnet in late 2010 includes several thousand servers in most countries and on every continent of the globe, including Antarctica, and sometimes in space and on the sea floor. These servers support a total population estimated at over 25 million computers in the global Internet.
The NTP subnet operates with a hierarchy of levels, where each level is assigned a number called the stratum. Stratum 1 (primary) servers at the lowest level are directly synchronized to national time services via satellite, radio and telephone modem. Stratum 2 (secondary) servers at the next higher level are synchronized to stratum 1 servers, and so on. Normally, NTP clients and servers with a relatively small number of clients do not synchronize to public primary servers. There are several hundred public secondary servers operating at higher strata; they are the preferred choice.
This page presents an overview of the NTP daemon included in this distribution. We refer to this as the reference implementation only because it was used to test and validate the NTPv4 specification RFC-5905. It is best read in conjunction with the briefings on the Network Time Synchronization Research Project page.
Figure 1. NTP Daemon Processes and Algorithms
The overall organization of the NTP daemon is shown in Figure 1. It is useful in this context to consider the daemon as both a client of upstream servers and as a server for downstream clients. It includes a pair of peer/poll processes for each reference clock or remote server used as a synchronization source. The poll process sends NTP packets at intervals ranging from 8 s to 36 hr. The peer process receives NTP packets and runs the on-wire protocol that collects four timestamps: the origin timestamp T1 upon departure of the client request, the receive timestamp T2 upon arrival at the server, the transmit timestamp T3 upon departure of the server reply, and the destination timestamp T4 upon arrival at the client. These timestamps are used to calculate the clock offset and roundtrip delay:
offset = [(T2 - T1) + (T3 - T4)] / 2
delay = (T4 - T1) - (T3 - T2).
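As a concrete illustration, the following sketch (not part of the reference implementation) computes the offset and delay from the four on-wire timestamps. For simplicity the timestamps are represented here as seconds in double precision rather than the 64-bit NTP timestamp format carried in the packet header, and the values are invented for the example.

    #include <stdio.h>

    int main(void)
    {
        double t1 = 1000.000;   /* origin: client request departs */
        double t2 = 1000.052;   /* receive: request arrives at server */
        double t3 = 1000.053;   /* transmit: server reply departs */
        double t4 = 1000.105;   /* destination: reply arrives at client */

        double offset = ((t2 - t1) + (t3 - t4)) / 2;   /* clock offset */
        double delay  = (t4 - t1) - (t3 - t2);         /* roundtrip delay */

        printf("offset %.3f s, delay %.3f s\n", offset, delay);
        return 0;
    }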
Those sources that have passed a number of sanity checks are declared selectable. From the selectable population the statistics are used by the select algorithm to determine a number of truechimers according to correctness principles. From the truechimer population a number of survivors are determined on the basis of statistical clustering principles. One of the survivors is declared the system peer, and the system statistics are inherited from it. The combine algorithm computes a weighted average of the survivor offsets and jitters to produce the final offset used by the clock discipline algorithm to adjust the system clock time and frequency.
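For illustration only, the following sketch shows one way such a weighted-average combine step might look. It assumes the weight of each survivor is the reciprocal of its maximum error (synchronization distance); the survivor values are invented, and the details of the reference implementation may differ.

    #include <stdio.h>

    int main(void)
    {
        /* Survivor offsets and synchronization distances, in seconds
         * (invented values for illustration). */
        double offset[]   = { 0.0021, -0.0007, 0.0012 };
        double distance[] = { 0.0150,  0.0450, 0.0300 };
        int n = 3;

        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            double w = 1.0 / distance[i];   /* assumed weight: 1 / distance */
            num += w * offset[i];
            den += w;
        }
        printf("combined offset %.6f s\n", num / den);
        return 0;
    }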
When started, the program requires several measurements for these algorithms to work reliably before setting the clock. As the default poll interval is 64 s, it can take several minutes to set the clock. This time can be reduced using the iburst option described on the Server Options page. For additional details about the clock filter, select, cluster and combine algorithms, see the Architecture Briefing on the NTP Project Page.
Each source is characterized by the offset and delay measured by the on-wire protocol and the dispersion and jitter calculated by the clock filter algorithm of the peer process. Each time an NTP packet is received from a source, the dispersion is initialized as the sum of the precisions of the server and client. Precision is defined by the latency to read the system clock and varies from 1000 ns to 100 ns in modern machines.
The offset, delay and dispersion samples are inserted as the youngest stage of an 8-stage shift register, thus discarding the oldest stage. Subsequently, the sample dispersion in each stage is increased at a fixed rate of 15 μs/s, representing the worst-case error due to skew between the server and client clock frequencies.
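The following small sketch illustrates the aging rule just described, under the assumption that the rate corresponds to the 15-ppm frequency tolerance (15 μs/s) used by the reference implementation; the initial dispersion value is invented.

    #include <stdio.h>

    int main(void)
    {
        double phi   = 15e-6;   /* assumed frequency tolerance, s/s (15 ppm) */
        double disp0 = 0.001;   /* dispersion when the sample was inserted, s */

        /* Dispersion of a stored sample as it ages in the shift register. */
        for (int age = 0; age <= 512; age += 128)
            printf("age %3d s  dispersion %.6f s\n", age, disp0 + phi * age);
        return 0;
    }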
In each peer process the clock filter algorithm selects the stage with the lowest delay sample, which generally represents the most accurate data; that delay sample and the associated offset sample become the peer variables of the same name. The peer dispersion is determined as a weighted average of the dispersion samples in the shift register and continues to grow at the same rate as the sample dispersion. Finally, the peer jitter is determined as the root-mean-square (RMS) average of all the offset samples in the shift register relative to the selected offset sample.
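The sketch below illustrates these three steps on an invented 8-stage register: select the lowest-delay stage, form the peer dispersion as a weighted average with the youngest stage weighted 1/2, the next 1/4 and so on (an assumption consistent with RFC 5905), and form the peer jitter as the RMS of the offsets relative to the selected one.

    #include <stdio.h>
    #include <math.h>

    #define NSTAGE 8

    struct sample {
        double offset;  /* s */
        double delay;   /* s */
        double disp;    /* s */
    };

    int main(void)
    {
        /* Youngest stage first; values invented for illustration. */
        struct sample f[NSTAGE] = {
            { 0.0012, 0.0480, 0.0010 }, { 0.0031, 0.0920, 0.0011 },
            { 0.0008, 0.0450, 0.0012 }, { 0.0044, 0.1300, 0.0013 },
            { 0.0015, 0.0610, 0.0014 }, { 0.0021, 0.0700, 0.0015 },
            { 0.0009, 0.0520, 0.0016 }, { 0.0027, 0.0880, 0.0017 },
        };

        /* Select the stage with the lowest delay sample. */
        int best = 0;
        for (int i = 1; i < NSTAGE; i++)
            if (f[i].delay < f[best].delay)
                best = i;

        /* Weighted-average dispersion and RMS jitter relative to the
         * selected offset sample. */
        double disp = 0, jitter = 0, weight = 0.5;
        for (int i = 0; i < NSTAGE; i++) {
            disp += weight * f[i].disp;
            weight /= 2;
            jitter += pow(f[i].offset - f[best].offset, 2);
        }
        jitter = sqrt(jitter / (NSTAGE - 1));

        printf("peer offset %.4f s, delay %.4f s, disp %.6f s, jitter %.6f s\n",
               f[best].offset, f[best].delay, disp, jitter);
        return 0;
    }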
The clock filter algorithm continues to process packets in this way until the source is no longer reachable. Reachability is determined by an 8-bit shift register, which is shifted left by one bit as each poll packet is sent, with zero replacing the vacated rightmost bit. Each time an update is received, the rightmost bit is set. The source is considered reachable if any bit is set in the register; otherwise, it is considered unreachable.
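The following sketch demonstrates the reachability register just described, with invented poll behavior in which replies stop arriving after the second poll.

    #include <stdio.h>

    int main(void)
    {
        unsigned char reach = 0;    /* 8-bit reachability register */

        for (int poll = 1; poll <= 10; poll++) {
            reach <<= 1;                        /* poll sent: shift in a zero */
            int update_received = (poll <= 2);  /* invented: replies stop after poll 2 */
            if (update_received)
                reach |= 1;                     /* update received: set rightmost bit */
            printf("poll %2d reach 0x%02x %s\n", poll, reach,
                   reach ? "reachable" : "unreachable");
        }
        return 0;
    }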
A server is considered selectable only if it is reachable and a timing loop would not be created. A timing loop occurs when the server is apparently synchronized to the client or when the server is synchronized to the same server as the client. When a source is unreachable, a dummy sample with "infinite" dispersion is inserted in the shift register, thus displacing old samples.
The composition of the survivor population and the system peer selection are redetermined as each update from each source is received. The system variables are copied from the peer variables of the same name, and the system stratum is set one greater than the system peer stratum. Like the peer dispersion, the system dispersion increases at the same rate, so, even if all sources have become unreachable, the daemon appears to its dependent (higher-stratum) clients at ever-increasing dispersion.
Of interest in this discussion is how the protocol determines the quality of service from a particular reference clock or remote server. It is determined from two statistics, expected error and maximum error. Expected error is determined from various jitter components; it represents the nominal error in determining the mean clock offset. However, it is not relevant to the discussion to follow. Maximum error is determined from delay and dispersion contributions and represents the worst-case error due to all causes. To simplify this presentation, certain minor contributions to the maximum error statistic are ignored. Elsewhere in the documentation the maximum error is called the synchronization distance.
The maximum error is computed as one-half the root delay to the primary source of time, i.e., the primary reference clock, plus the root dispersion. The root variables are included in the NTP packet header received from each server. When calculating the maximum error, the root delay is the sum of the root delay in the packet and the peer delay, while the root dispersion is the sum of the root dispersion in the packet and the peer dispersion.
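The sketch below follows this calculation with invented values, including the default 1.5-s select threshold test described in the next paragraph.

    #include <stdio.h>

    int main(void)
    {
        double pkt_rootdelay = 0.020, pkt_rootdisp = 0.015;  /* from packet header, s */
        double peer_delay    = 0.048, peer_disp    = 0.004;  /* from clock filter, s */

        double root_delay = pkt_rootdelay + peer_delay;
        double root_disp  = pkt_rootdisp + peer_disp;
        double max_error  = root_delay / 2 + root_disp;      /* synchronization distance */

        printf("maximum error %.3f s %s\n", max_error,
               max_error < 1.5 ? "(selectable)" : "(not selectable)");
        return 0;
    }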
A source is considered selectable only if its maximum error is less than the select threshold, by default 1.5 s, although this can be changed to suit client preference. A common scenario occurs when an upstream server loses all its sources and its maximum error, as apparent to clients, begins to increase. The clients are not aware of this condition and continue to accept synchronization as long as the maximum error is less than the select threshold.
Although it might seem counter-intuitive, a cardinal rule in the selection process is that, once a sample has been selected by the clock filter algorithm, that sample and any older samples are no longer selectable. This applies also to the select algorithm: once the peer variables for a source have been selected, older variables of the same or other sources are no longer selectable. This means that not every sample can be used to update the peer variables, and up to seven samples can be ignored between selected samples. This fact has been carefully accounted for in the discipline algorithm design, with due consideration of the feedback loop delay and minimum sampling rate. In engineering terms, even if only one sample in eight survives, the resulting sample rate is twice the Nyquist rate at any time constant and poll interval.
Some daemon configurations include a combination of reference clocks and remote servers in order to provide redundancy and backup. For example, a modem reference clock may serve as backup for a GPS reference clock, but be used only if the GPS clock fails. In addition, the local clock might be used if all sources fail, or orphan mode might be used instead. The mitigation algorithms provide an orderly selection in such cases. Another function of these algorithms comes into play when multiple sources of the same type are available, but for one reason or another one or more of them are preferred over the others. Finally, some reference clocks provide a pulse-per-second (PPS) signal to augment the serial timecode. The mitigation algorithms have to determine when the PPS signal is valid and which reference clock is to number the seconds. These intricate algorithms are described on the Mitigation Algorithms and the prefer Keyword page.
At the heart of the NTP specification and reference implementation is the clock discipline algorithm, which is best described as an adaptive parameter, hybrid phase/frequency-lock feedback loop. It is an intricately crafted algorithm that automatically adapts for optimum performance while minimizing network overhead. Further details are on the Clock Discipline page.
In the NTPv4 specification and reference implementation a state machine is used to manage the system clock under exceptional conditions, as when the daemon is first started or when encountering severe network congestion. When the frequency file is present at startup, the residual offset error is reduced to less than 0.5 ms within 300 s. When the frequency file is not present, this result is achieved within 600 s. Further details are on the Clock State Machine page.