1. Client using public servers

A common configuration of chronyd is a client using public servers from the pool.ntp.org project. It is the default configuration included in many packages of chrony.

The configuration file could be:

pool pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 0.1 3
rtcsync

The servers used by the client are selected randomly by the pool DNS servers from the country of the client (according to IP geolocation data, which are not always accurate). The polling interval is automatically adjusted between the default minimum of 64 and maximum of 1024 seconds. As the client is running, it should slowly increase its polling interval to the maximum and reduce the load on the servers.
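
The polling limits are expressed in chrony.conf as base-2 exponents of seconds (the minpoll and maxpoll options), so the default limits of 64 and 1024 seconds correspond to exponents 6 and 10. A minimal Python sketch of the conversion follows; the function name is only illustrative and is not part of chrony:

```python
def poll_interval_seconds(exponent: int) -> float:
    """Convert a poll exponent (as used by minpoll/maxpoll) to seconds."""
    return 2.0 ** exponent


# Exponents 6 and 10 are the defaults mentioned above; 2 and -4 are
# examples of shorter intervals that a local server can handle.
for exponent in (6, 10, 2, -4):
    print(f"poll {exponent:>2} -> {poll_interval_seconds(exponent)} s")
```

Negative exponents give sub-second intervals, e.g. -4 is one request every 1/16 of a second.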

Accuracy of the system clock is usually within a few milliseconds, but it can be significantly worse, depending on how symmetric the network routes between the servers and the client are, how stable the network delay and the client’s clock are, and how accurate the servers themselves are.

The set of servers can change on each restart of the client. There can be a significant offset between different clients in a local network using the same configuration.

Example reports from chronyc on a client using the configuration:

$ chronyc -n tracking
Reference ID    : D91E4B93 (217.30.75.147)
Stratum         : 2
Ref time (UTC)  : Fri Jan 21 12:41:47 2022
System time     : 0.000483869 seconds fast of NTP time
Last offset     : +0.000763419 seconds
RMS offset      : 0.000790034 seconds
Frequency       : 0.310 ppm fast
Residual freq   : +0.215 ppm
Skew            : 1.199 ppm
Root delay      : 0.012714397 seconds
Root dispersion : 0.001104208 seconds
Update interval : 522.2 seconds
Leap status     : Normal
$ chronyc -n sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 82.113.53.41                  3  10   377    92  -1044us[-1044us] +/-   12ms
^* 217.30.75.147                 1   9   377   347   +442us[+1206us] +/- 6357us
^- 89.221.218.101                2  10   377   683   +749us[+1232us] +/-   36ms
^+ 81.25.28.124                  1  10   377    57    -68us[  -68us] +/- 7681us
$ chronyc -n sourcestats
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
82.113.53.41                6   4   77m     +0.143      0.806   -829us   408us
217.30.75.147               7   7   51m     +0.422      0.966   +520us   334us
89.221.218.101              6   3   68m     +0.048      2.032   -323us   896us
81.25.28.124                6   4   77m     +0.226      1.850   +598us   764us
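
The Reference ID in the tracking report is a 32-bit value printed in hexadecimal. For an IPv4 source it holds the bytes of the address, and for a reference clock it holds a short ASCII name padded with zero bytes. The following Python sketch decodes the values that appear in the reports on this page; the helper names are hypothetical:

```python
import ipaddress


def refid_as_ipv4(refid_hex: str) -> str:
    """Interpret the four reference ID bytes as an IPv4 address."""
    return str(ipaddress.IPv4Address(bytes.fromhex(refid_hex)))


def refid_as_name(refid_hex: str) -> str:
    """Interpret the reference ID as a zero-padded ASCII name."""
    return bytes.fromhex(refid_hex).rstrip(b"\x00").decode("ascii")


print(refid_as_ipv4("D91E4B93"))  # 217.30.75.147
print(refid_as_ipv4("C0A87B01"))  # 192.168.123.1
print(refid_as_name("47505300"))  # GPS
```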

The offset from the client’s tracking log is shown in the graph below as dots. It is the error of the system clock that chronyd estimates from its measurements and corrects. The red line is the actual error of the clock measured by a separate monitoring instance of chronyd using a GPS reference clock, which was not adjusting the system clock.

[Graph: client swts public]

In this test the clock was off by about 2-3 milliseconds most of the time, but there were some large excursions in the error, one reaching about 8 milliseconds. They could be reduced by limiting the maximum polling interval (e.g. to 64 seconds), but that would increase the load on the public servers.

2. Client using local server and software timestamping

A local server with its own reference clock (e.g. a GPS receiver) is needed if better accuracy is required on clients. The clients should be configured with a shorter polling interval and have the interleaved mode enabled if the server supports it (e.g. if it runs chronyd). Multiple servers should be used for reliability.

In this example the client uses a single server with a polling interval of 4 seconds and software timestamping (the default).

server 192.168.123.1 iburst minpoll 2 maxpoll 2 xleave
driftfile /var/lib/chrony/drift
makestep 0.1 3
rtcsync

Reports from chronyc:

$ chronyc -n tracking
Reference ID    : C0A87B01 (192.168.123.1)
Stratum         : 2
Ref time (UTC)  : Wed Jan 19 16:59:22 2022
System time     : 0.000000075 seconds slow of NTP time
Last offset     : -0.000000146 seconds
RMS offset      : 0.000000970 seconds
Frequency       : 24.067 ppm fast
Residual freq   : -0.004 ppm
Skew            : 0.029 ppm
Root delay      : 0.000061748 seconds
Root dispersion : 0.000024459 seconds
Update interval : 4.0 seconds
Leap status     : Normal
$ chronyc -n sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.123.1                 1   2   377     8  +1002ns[ +864ns] +/-   47us
$ chronyc -n sourcestats
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.123.1               9   6    32     -0.004      0.045    -14ns   223ns

There were three network switches between the server and client in the test. The network was loaded with 20-second data transfers from the client to the server every minute.

The following graph shows the client’s tracking offset and the actual error of the system clock measured with a local reference clock with a sub-microsecond accuracy.

[Graph: client swts 3switch]

The clock was stable to about 2 microseconds. There was a constant error of about 5 microseconds, which was caused mainly by asymmetric errors in software timestamps used by the client. The server used hardware timestamps with insignificant errors when compared to the client. If the server used software timestamping and had the same hardware, OS, and configuration as the client, the asymmetry could cancel out, but the synchronisation would be less stable.

3. Client using local server and hardware timestamping

For best accuracy it is necessary to use a NIC which supports hardware timestamping. In this example the client and server both have the Intel I210 card. They also both run chrony version 4.2, which supports the experimental extension field F323 improving synchronisation stability. The client is configured to send 16 requests per second and filter 5 measurements at a time, making about 3 updates of the clock per second. The polling of the hardware clock matches the minimum polling interval of the NTP source.

server 192.168.123.1 minpoll -4 maxpoll -4 xleave extfield F323 filter 5
hwtimestamp * minpoll -4
driftfile /var/lib/chrony/drift
makestep 0.1 3
rtcsync
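
The request and update rates follow directly from the minpoll and filter values in this configuration. As a rough check, here is a small Python sketch of the arithmetic (the function name is illustrative only):

```python
def clock_updates_per_second(minpoll: int, filter_samples: int) -> float:
    """Estimate clock updates per second for a source polled at 2**minpoll
    seconds when every filter_samples measurements are reduced to one."""
    requests_per_second = 1.0 / (2.0 ** minpoll)
    return requests_per_second / filter_samples


# minpoll -4 and filter 5 are the values from the server directive above.
print(clock_updates_per_second(-4, 5))  # 3.2
```

An update rate of 3.2 per second corresponds to an update interval of about 0.3 seconds.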

The Intel I210 has timestamping errors compensated in the Linux igb driver (it is not necessary to compensate them with the rxcomp and txcomp options in the hwtimestamp directive). For better stability, Energy-Efficient Ethernet (EEE) was disabled in the network and the CPU on both server and client was set to a constant frequency.

Reports from chronyc:

$ chronyc -n tracking
Reference ID    : C0A87B01 (192.168.123.1)
Stratum         : 2
Ref time (UTC)  : Wed Jan 19 14:12:20 2022
System time     : 0.000000010 seconds fast of NTP time
Last offset     : -0.000000003 seconds
RMS offset      : 0.000000010 seconds
Frequency       : 24.096 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.004 ppm
Root delay      : 0.000015813 seconds
Root dispersion : 0.000003070 seconds
Update interval : 0.3 seconds
Leap status     : Normal
$ chronyc -n sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.123.1                 1  -4   377     1    +33ns[  +30ns] +/-   11us
$ chronyc -n sourcestats
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.123.1              37  13    12     +0.000      0.004     +0ns    24ns

There were three network switches between the server and client in the test. The network was loaded with 20-second data transfers from the client to the server every minute. Network load typically has only a small impact on accuracy of hardware timestamping, but it can cause an NTP packet to be queued in a switch and cause a large error in the NTP measurement due to asymmetric delay. As long as this does not happen for too many measurements in a row, the client should be able to ignore the impacted measurements and keep the clock well synchronised. Some switches can be configured to prioritise NTP packets (by port number or DSCP) to limit the queueing delays.
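
To illustrate why a queued packet causes a large error, the following Python sketch applies the basic NTP offset and delay calculation to simulated timestamps. It is a simplified model with made-up delays (8 microseconds each way, plus 100 microseconds of queueing on the request). It does not reproduce the interleaved mode or the filtering that chronyd performs.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Basic NTP calculation. t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in seconds)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset, delay_to_server, delay_to_client, processing=10e-6):
    """Generate timestamps for one exchange. true_offset is the server
    clock minus the client clock; the delay values are hypothetical."""
    t1 = 0.0                                  # read on the client clock
    t2 = t1 + delay_to_server + true_offset   # read on the server clock
    t3 = t2 + processing                      # read on the server clock
    t4 = t3 - true_offset + delay_to_client   # read on the client clock
    return ntp_offset_and_delay(t1, t2, t3, t4)


for label, to_server in (("symmetric", 8e-6), ("request queued", 108e-6)):
    offset, delay = simulate(0.0, to_server, 8e-6)
    print(f"{label}: offset {offset * 1e6:.1f} us, delay {delay * 1e6:.1f} us")
```

In this model the measured offset is wrong by half of the difference between the two one-way delays, while the measured delay grows by the full queueing time. The increased delay is what makes such a measurement recognisable as an outlier.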

The NTP measurements and the clock were stable to a few tens of nanoseconds. Measuring accuracy of the system clock at this level is difficult. The main problem is communication over the PCIe bus between the system clock (CPU) and the NIC, which can have an asymmetric latency causing errors in the readings of the hardware clock up to a few hundred nanoseconds.

The following graph shows the client’s tracking offset and an error of the clock measured with a PPS signal (shared with the server) connected to the NIC.

[Graph: client hwts 3switch f323]

The asymmetry of about 70 nanoseconds is caused by the network switches. It is common for switches to have a different forwarding delay from port A to port B than from port B to port A and different asymmetries on different pairs of ports.

Other asymmetries in this test should cancel out due to the server and client using the same model of the NIC for timestamping of NTP packets and timestamping of the shared PPS signal (connected with cables of equal length). If the error due to PCIe latency was not larger than 100 nanoseconds, the system clock would be accurate to about 250 nanoseconds relative to the reference clock of the server.

4. Server using reference clock on serial port

One of the easier ways to make a stratum-1 NTP server is to connect a GPS receiver to a serial port of the computer. The receiver needs to provide a pulse per second (PPS) signal to enable accuracy at the microsecond level. It is usually connected to the DCD pin of the port. The gpsd daemon can combine the serial data with PPS and provide a SHM or SOCK reference clock for chronyd.

The following example uses the SOCK refclock:

refclock SOCK /var/run/chrony.ttyS0.sock
makestep 0.1 3
allow
rtcsync
driftfile /var/lib/chrony/drift
leapsectz right/UTC

gpsd needs to be started after chronyd in order to connect to the socket, and it needs to be started with the -n option so that it does not wait for clients to connect before polling the receiver. For example:

# gpsd -n /dev/ttyS0
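
One way to enforce the start order in a custom startup script is to wait until chronyd has created the socket before launching gpsd. The following Python sketch shows such a check. The function name, timeout and polling step are assumptions made for illustration; a service manager with proper ordering dependencies would normally be preferable.

```python
import os
import stat
import time


def wait_for_socket(path: str, timeout: float = 10.0, step: float = 0.1) -> bool:
    """Return True once a Unix socket exists at path, or False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # the socket has not been created yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(step)
```

A wrapper could call wait_for_socket("/var/run/chrony.ttyS0.sock") and run gpsd -n /dev/ttyS0 only when it returns True.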

Reports from chronyc:

$ chronyc -n tracking
Reference ID    : 47505300 (GPS)
Stratum         : 1
Ref time (UTC)  : Mon Jan 24 12:42:11 2022
System time     : 0.000000043 seconds fast of NTP time
Last offset     : +0.000000046 seconds
RMS offset      : 0.000000489 seconds
Frequency       : 2.331 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.010 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000007050 seconds
Update interval : 4.0 seconds
Leap status     : Normal
$ chronyc -n sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* GPS                           0   2   377     6   +262ns[ +307ns] +/- 1246ns
$ chronyc -n sourcestats
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
GPS                        18  11    69     +0.000      0.010     +1ns   209ns

The following graph shows the tracking offset and the error of the system clock measured with a more accurate reference clock (PPS signal connected to a hardware clock on the NIC).

[Graph: server serial]

The clock was off by about 20 microseconds most of the time. Most of this error is caused by hardware and software delays in timestamping of the interrupt triggered by the serial port. The main issue is stability of the delay. There are periods where it is significantly shorter, which causes the offset to jump by about 12 microseconds. The CPU was set to a constant frequency in this test. The jumps were probably caused by changes in the CPU load or changes in timing of some processes, which prevented the CPU from entering a power-saving state before the interrupts and avoided the delay of waking up.

Disabling power-saving states (e.g. with the Linux kernel idle=poll option) would make the delay more stable, but it would increase the power consumption.

The server did not use hardware timestamping, which means a similar issue with interrupts impacted its software timestamps. The delay is sensitive to CPU load and also to network load, as NICs implement interrupt coalescing in order to reduce the interrupt rate. The following graph shows an example of errors in software receive and transmit timestamps.

[Graph: swts error]

On some NICs the coalescing can be limited or disabled with the ethtool -C command (on Linux) to improve the timestamping stability.

5. Server using reference clock on NIC

The best way to make a highly accurate stratum-1 NTP server is to connect the PPS signal to a software defined pin (SDP) on the NIC which is receiving requests and sending responses to NTP clients. This allows the PPS signal to be timestamped in hardware, avoiding the PCIe and interrupt delays. Because the same clock timestamps both the PPS signal and the NTP packets, any asymmetry between the system clock and hardware clock cancels out in the server’s timestamps of NTP packets.

In this example the server has the Intel I210 card, which has a 6-pin header on the board exposing two SDPs (3.3V level) with the following layout:

+-------------+
| GND  | SDP0 |
+-------------+
| GND  | SDP1 |
+-------------+
|  ?   |  ?   |
+-------------+

A 16 Hz PPS signal from a u-blox NEO-6M GPS receiver is connected to SDP0. The receiver is also connected to a USB port so that gpsd can process the serial data and provide the SHM 0 refclock needed for PPS locking. The timing stability of the received messages limits the maximum rate of PPS. At 16 Hz, the SHM 0 refclock needs to be accurate to 25 milliseconds in order for the PHC refclock to correctly and reliably lock to it.

The following command (executed when gpsd is running) configures the receiver to make 16 pulses per second with a 50% duty cycle and to compensate for a 20 ns antenna cable delay:

$ ubxtool -p CFG-TP5,0,20,0,1,16,0,2147483648,0,111
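
The value 2147483648 in the command equals 2^31. Assuming that this argument is the pulse length given as a fraction of the period in units of 2^-32 (an interpretation inferred from the 50% duty cycle stated above, which should be checked against the u-blox protocol description), the numbers can be verified with a short Python sketch:

```python
PULSE_RATE_HZ = 16           # pulses per second requested from the receiver
DUTY_RATIO_RAW = 2147483648  # value copied from the ubxtool command

# Assumption: the raw value is a fraction of the period scaled by 2**32.
duty_cycle = DUTY_RATIO_RAW / 2**32
period = 1.0 / PULSE_RATE_HZ
high_time = duty_cycle * period

print(f"duty cycle: {duty_cycle}")     # 0.5
print(f"period:     {period} s")       # 0.0625 s
print(f"high time:  {high_time} s")    # 0.03125 s
```

The resulting high time of 0.03125 seconds is the same value that is passed to the width option of the PHC refclock in the server configuration.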

To improve stability of reading of the hardware clock, the CPU is set to a constant frequency with disabled boosting:

# cpupower frequency-set -g userspace -d 3600000 -u 3600000
# echo 0 > /sys/devices/system/cpu/cpufreq/boost

The server has the following configuration:

refclock PHC /dev/ptp0:extpps:pin=0 dpoll -4 poll -2 rate 16 width 0.03125 refid GPS lock NMEA maxlockage 32
refclock SHM 0 refid NMEA noselect offset 0.120 poll 6 delay 0.010
hwtimestamp * minpoll -4
makestep 0.1 3
allow
rtcsync
driftfile /var/lib/chrony/drift
leapsectz right/UTC

The extpps option enables external PPS timestamping on the PHC. The pin=0 setting selects the SDP0 pin. The dpoll option configures the driver to poll 16 times per second and with the poll option it provides a median measurement 4 times per second. The rate option specifies the 16Hz PPS rate. The width option is needed to filter falling edges in the PPS signal as the hardware clock timestamps both edges. It specifies 50% of the 16Hz PPS interval, matching the receiver PPS configuration. The maxlockage option is needed to enable locking of the PPS to the SHM refclock providing only one sample per second.

The offset option of the SHM 0 refclock compensates for the delay of messages received on the USB port. It needs to be measured carefully, e.g. against a known good NTP server. A wrong offset could cause the server to be off by an integer multiple of 62.5 milliseconds (1/16s).
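
The ambiguity can be illustrated with a simplified model in which the pulses are exact and the coarse time from the SHM 0 refclock only selects the nearest pulse. The Python sketch below is not chronyd's locking algorithm and ignores the lock tolerance it applies. It only shows why the resulting error is quantised to the pulse interval.

```python
PULSE_INTERVAL = 1.0 / 16  # 62.5 ms between pulses at 16 Hz


def resolved_error(coarse_error: float) -> float:
    """Error of the resulting time (in seconds) when the coarse time is
    wrong by coarse_error seconds and the nearest pulse is selected."""
    return round(coarse_error / PULSE_INTERVAL) * PULSE_INTERVAL


# Hypothetical errors of the coarse (NMEA) time in milliseconds.
for coarse_error_ms in (10, 25, 40, 100):
    error_ms = resolved_error(coarse_error_ms / 1000) * 1000
    print(f"coarse error {coarse_error_ms:>3} ms -> clock error {error_ms} ms")
```

In this model a coarse error below half of the pulse interval has no effect, while larger errors shift the result by one or more whole pulse intervals.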

The hardware timestamping errors are already compensated in the kernel igb driver for the I210.

Reports from chronyc:

$ chronyc -n tracking
Reference ID    : 47505300 (GPS)
Stratum         : 1
Ref time (UTC)  : Mon Jan 24 15:38:25 2022
System time     : 0.000000008 seconds slow of NTP time
Last offset     : +0.000000000 seconds
RMS offset      : 0.000000004 seconds
Frequency       : 0.696 ppm slow
Residual freq   : -0.000 ppm
Skew            : 0.015 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000002471 seconds
Update interval : 0.3 seconds
Leap status     : Normal
$ chronyc -n sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* GPS                           0  -2   377     1     -1ns[   -1ns] +/- 1308ns
#? NMEA                          0   6   377    46  -8397us[-8392us] +/- 5176us
$ chronyc -n sourcestats
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
GPS                         9   5     2     -0.000      0.014     -0ns     5ns
NMEA                        8   5   446     -0.027      9.862  -8848us   627us

The following graph shows the tracking offset:

[Graph: server phc]

It shows that chronyd can track the reference clock to about 20 nanoseconds. A better reference clock would be needed to measure the accuracy and stability. In this case they are probably limited by the GPS receiver, which is a cheap non-timing-grade model without a stabilised oscillator.
