
Revisiting Microsecond Accurate NTP for Raspberry Pi with GPS PPS in 2025

Introduction

Almost four years ago, I wrote the original post in this series – Microsecond accurate NTP with a Raspberry Pi and PPS GPS. Some things have changed since then, some haven’t. PPS is still an excellent way to get a microsecond accurate Raspberry Pi. Some of the commands and configurations have changed, so let’s run through this again in 2025. This will basically be the same post but with updated commands and material suggestions, so you can use your Raspberry Pi 5 as a nanosecond accurate PTP (precision time protocol – IEEE 1588) grandmaster.

ublox LEA-M8T sending PPS signals to Raspberry Pi 5 for extremely precise timing
testing various Chrony/ublox configurations evening of Feb 18 2025 – achieved 1 nanosecond (1 nanosecond = 0.000000001 seconds) accuracy! this was a somewhat lucky capture but the Pi is routinely in single digits, and skew hovers around 0.001-0.002 ppm error, which is 1-2 ppb error on the Pi clock

Original Introduction

Lots of acronyms in that title. If I expand them out, it says – “microsecond accurate network time protocol with a Raspberry Pi and global positioning system pulse per second”. What it means is you can get super accurate timekeeping (1 microsecond = 0.000001 seconds) with a Raspberry Pi and a GPS receiver that spits out pulses every second. By following this guide, you will have your very own Stratum 1 NTP server at home!

Why would you need time this accurate at home?

You don’t. There aren’t many applications for this level of timekeeping in general, and even fewer at home. But this blog is called Austin’s Nerdy Things so here we are. Using standard, default internet NTP these days will get your computers to within 2-4 milliseconds of actual time (1 millisecond = 0.001 seconds). Pretty much every internet connected device these days has a way to get time from the internet. PPS gets you to the next SI prefix in terms of accuracy (milli -> micro), which means 1000x more accurate timekeeping. With some other tricks, you can get into the nanosecond range (also an upcoming post topic!).

Materials Needed

  • Raspberry Pi 5 – the 3’s ethernet is hung off a USB connection so while the 3 itself can get great time, it is severely limited in how accurate other machines can sync to it. Raspberry Pi 4 would work decently. But Raspberry Pi 5 supports Precision Time Protocol (PTP), which can get synchronizations down to double-digit nanoseconds. So get the 5. Ideally, your Pi isn’t doing much other than keeping time, so no need to get one with lots of memory.
  • A timing-specific GPS module – these have algorithms tuned to provide extremely precise PPS signals. For example, by default, they prefer satellites with higher elevations, and they have special fixed-position modes where they know they aren’t moving, so they can focus on providing the best time possible. u-blox devices, for instance, have a “survey-in” mode where positions are essentially averaged over a specified amount of time, to a target standard deviation, down to a singular, fixed location.
  • (slightly optional but much better results) – GPS antenna
  • Wires to connect it all up (5 wires needed – 5V/RX/TX/GND/PPS)
  • project box to stuff it all in – temperature stability is super important for accurate time. There is a reason some of the most accurate oscillators are called oven controlled crystal oscillators (OCXO) – they are extremely stable. This box keeps airflow from minutely cooling/heating the Pi.
raspberry pi for timing with PPS GPS NTP in project box with ds18b20 temperature sensor
I did say “stuffed” right? not joking here… I stuffed some newspaper on top to minimize airflow then closed it up.
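Side note: if you want to monitor the box temperature with that DS18B20, a minimal sketch (assuming the sensor’s data line is on GPIO 4, the default for the 1-Wire overlay):

sudo bash -c "echo 'dtoverlay=w1-gpio' >> /boot/firmware/config.txt"
sudo reboot
# after reboot, each DS18B20 shows up under /sys/bus/w1/devices as 28-xxxxxxxxxxxx
cat /sys/bus/w1/devices/28-*/w1_slave   # t=23125 on the second line means 23.125 C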

Steps

0 – Update your Pi and install packages

This NTP guide assumes you have a Raspberry Pi ready to go.

You should update your Pi to latest before basically any project. We will install some other packages as well. pps-tools helps us check that the Pi is receiving PPS signals from the GPS module. We also need GPSd for the GPS decoding of both time and position (and for ubxtool, which we will use to survey-in). I use chrony instead of NTPd because it seems to sync faster than NTPd in most instances and also handles PPS without compiling from source (the default Raspbian NTP doesn’t do PPS). Installing chrony will remove ntpd.

sudo apt update
sudo apt upgrade
# this isn't really necessary, maybe if you have a brand new pi
# sudo rpi-update
sudo apt install pps-tools gpsd gpsd-clients chrony

1 – Add GPIO and module info where needed

In /boot/firmware/config.txt (changed from last post), add ‘dtoverlay=pps-gpio,gpiopin=18’ to a new line. This is necessary for PPS. If you want to get the NMEA data from the serial line, you must also enable UART and set the initial baud rate.

########## NOTE: at some point, the config file changed from /boot/config.txt to /boot/firmware/config.txt

sudo bash -c "echo '# the next 3 lines are for GPS PPS signals' >> /boot/firmware/config.txt"
sudo bash -c "echo 'dtoverlay=pps-gpio,gpiopin=18' >> /boot/firmware/config.txt"
sudo bash -c "echo 'enable_uart=1' >> /boot/firmware/config.txt"
sudo bash -c "echo 'init_uart_baud=9600' >> /boot/firmware/config.txt"

In /etc/modules, add ‘pps-gpio’ to a new line.

sudo bash -c "echo 'pps-gpio' >> /etc/modules"

Reboot

sudo reboot


2 – wire up the GPS module to the Pi

Disclaimer – I am writing this guide with a combination of Raspberry Pi 4/5 and Adafruit Ultimate GPS module, but will swap out with the LEA-M8T when it arrives.

Pin connections:

  1. GPS PPS to RPi pin 12 (GPIO 18)
  2. GPS VIN to RPi pin 2 or 4
  3. GPS GND to RPi pin 6
  4. GPS RX to RPi pin 8
  5. GPS TX to RPi pin 10
  6. see 2nd picture for a visual
Adafruit Ultimate GPS Breakout V3

3 – enable serial hardware port

Run sudo raspi-config -> 3 – Interface options -> I6 – Serial Port -> Would you like a login shell to be available over serial -> No. -> Would you like the serial port hardware to be enabled -> Yes.

screenshot showing raspberry serial port (UART) enabled

4 – verify PPS

First, check that PPS is loaded. You should see a single line showing pps_gpio:

lsmod | grep pps
austin@raspberrypi4:~ $ lsmod | grep pps
pps_gpio               12288  0

Now check for the actual PPS pulses. NOTE: you need at least 4 satellites locked on for a PPS signal. The GPS module essentially has 4 unknowns – x, y, z, and time. You need three satellites minimum to solve x, y, and z, and a fourth for time. Exception for the timing modules – if they know their x, y, z via survey-in or a fixed set location, they only need a single satellite for time!

sudo ppstest /dev/pps0
austin@raspberrypi4:~ $ sudo ppstest /dev/pps0
trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1739485509.000083980, sequence: 100 - clear  0.000000000, sequence: 0
source 0 - assert 1739485510.000083988, sequence: 101 - clear  0.000000000, sequence: 0
source 0 - assert 1739485511.000083348, sequence: 102 - clear  0.000000000, sequence: 0
source 0 - assert 1739485512.000086343, sequence: 103 - clear  0.000000000, sequence: 0
source 0 - assert 1739485513.000086577, sequence: 104 - clear  0.000000000, sequence: 0
^C
austin@raspberrypi4:~ $

5 – change GPSd boot options to start immediately

There are a couple options we need to tweak with GPSd to ensure it is available upon boot. This isn’t strictly necessary for PPS only operation, but if you want the general NMEA time information (i.e. not just the exact second marker from PPS), this is necessary.

Edit /etc/default/gpsd:

# USB might be /dev/ttyACM0
# serial might be /dev/ttyS0

# on raspberry pi 5 with raspberry pi os based on debian 12 (bookworm)
DEVICES="/dev/ttyAMA0 /dev/pps0"

# -n means start without a client connection (i.e. at boot)
GPSD_OPTIONS="-n"

# also start in general
START_DAEMON="true"

# Automatically hot add/remove USB GPS devices via gpsdctl
USBAUTO="true"

I’m fairly competent at using systemd and such in a Debian-based system, but there’s something about GPSd that’s a bit odd and I haven’t taken the time to figure out yet. So instead of enabling/restarting the service, reboot the whole Raspberry Pi.

sudo reboot

6 – check GPS for good measure

To ensure your GPS has a valid position, you can run gpsmon or cgps to check satellites and such. This check also ensures GPSd is functioning as expected. If your GPS doesn’t have a position solution, you won’t get a good time signal. If GPSd isn’t working, you won’t get any updates on the screen. The top portion will show the analyzed GPS data and the bottom portion will scroll by with the raw GPS sentences from the GPS module.

gpsmon is a bit easier to read for timing info, cgps is a bit easier to read for satellite info (and TDOP, timing dilution of precision, a measure of how accurate the GPS’s internal time determination is).

Here’s a screenshot from cgps showing the current status of my Adafruit Ultimate GPS inside my basement. There are 10 PRNs (satellites) seen, 8 used. It is showing “3D DGPS FIX”, which is the highest accuracy this module offers. The various *DOPs show the estimated errors. Official guides/docs usually say anything < 2.0 is ideal, but lower is better. For reference, ArduPlane (autopilot software for RC drones and planes) has a limit of 1.4 for HDOP – it will not permit takeoff with a value greater than 1.4. DOP is sort of a measure of how spread out the satellites are for that given measure. Evenly distributed around the sky is better for location; closer together is better for timing.

cgps
cgps screenshot showing 10 sats seen, 8 used, with a TDOP of 1.3
cgps showing 8 satellites used for this position determination, and a TDOP (time dilution of precision) of 1.31, which is decent. notably, cgps does not show the PPS offset

And for gpsmon, it shows both the TOFF, which is the time offset from the NMEA $GPZDA sentence (which will always come in late due to how long it takes to transmit the dozens of bytes over serial – for example, a 79 byte sentence over a 9600 bit per second link, which is super common for GPS modules, takes 79*(8 bits per byte + 1 start bit + 1 stop bit)/9600 = 82.3 milliseconds), as well as the PPS offset. This particular setup is not actually using PPS at the moment. It also shows satellites and a few *DOPs but notably lacks TDOP.
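A quick shell sanity check of that latency math for other baud rates (using bc, with 10 bits on the wire per byte):

# bytes * 10 / baud = seconds of serial latency
echo 'scale=4; 79 * 10 / 9600' | bc    # .0822 seconds at 9600 baud
echo 'scale=4; 79 * 10 / 38400' | bc   # .0205 seconds at 38400 baud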

gpsmon
gpsmon screenshot showing the time offset (TOFF) as well as the PPS offset
gpsmon showing 8 satellites used for the position with HDOP of 1.27. This indicates a decent position solution, but doesn’t say anything about the time solution.

Both gpsmon and cgps will stream the sentences received from the GPS module.

7 – configure chrony to use both NMEA and PPS signals

Now that we know our Raspberry Pi is receiving both the precision second marker (via PPS), as well as the time of day (TOD) data (via the NMEA $GPRMC and $GPZDA sentences), let’s set up chrony to use both sources for accurate time.

This can be done as a one step process, but it is better to gather some statistics about the delay on your own NMEA sentences. So, let’s add our reference sources and also enable logging for chrony.

In the chrony configuration file (/etc/chrony/chrony.conf), add the following near the existing server directives:

# SHM refclock is shared memory driver, it is populated by GPSd and read by chrony
# it is SHM 0
# refid is what we want to call this source = NMEA
# offset = 0.000 means we do not yet know the delay
# precision is how precise this is. note 1e-3 = 1 millisecond, so not very precise
# poll 0 means poll every 2^0 seconds = 1 second poll interval
# filter 3 means take the median of the 3 most recent readings. NMEA can be jumpy so we're filtering here
refclock SHM 0 refid NMEA offset 0.000 precision 1e-3 poll 0 filter 3

# PPS refclock is PPS specific, with /dev/pps0 being the source
# refid PPS means call it the PPS source
# lock NMEA means this PPS source will also lock to the NMEA source for time of day info
# offset = 0.0 means no offset... this should probably always remain 0
# poll 3 = poll every 2^3=8 seconds. polling more frequently isn't necessarily better
# trust means we trust this time. the NMEA will be kicked out as false ticker eventually, so we need to trust the combo
refclock PPS /dev/pps0 refid PPS lock NMEA offset 0.0 poll 3 trust

# also enable logging by uncommenting the logging line
log tracking measurements statistics

Restart chrony

sudo systemctl restart chrony

Now let’s check to see what Chrony thinks is happening:

chronyc sources

This screenshot was taken seconds after restarting chrony. The * in front of NMEA means that’s the currently selected source. This makes sense since the PPS source hasn’t even been polled yet (see the 0 in the reach column). The ? in front of PPS means it isn’t sure about it yet.

Wait a minute or two and try again.

Now Chrony has selected PPS as the currently selected source, with the * in front. And the NMEA source has been marked as a “false ticker”, with the x in front. But since we trusted the PPS source, it’ll remain the preferred source. Having only two sources is usually not advisable with general internet NTP servers: if they disagree, Chrony can’t know which one is right, hence >2 sources are recommended.

The relatively huge estimated error is because Chrony used the NMEA source first, which was quite a bit off of the PPS precise second marker (i.e. >100 milliseconds off), and it takes time to average down to a more realistic number.

Since we turned on statistics, we can use that to set an exact offset for NMEA. After waiting a bit (an hour or so), you can cat /var/log/chrony/statistics.log:

austin@raspberrypi5:~ $ sudo cat /var/log/chrony/statistics.log
====================================================================================================================
   Date (UTC) Time     IP Address    Std dev'n Est offset  Offset sd  Diff freq   Est skew  Stress  Ns  Bs  Nr  Asym
====================================================================================================================
2025-02-14 17:38:01 NMEA             8.135e-02 -3.479e-01  3.529e-02 -5.572e-03  1.051e-02 7.6e-03  19   0   9  0.00
2025-02-14 17:38:03 NMEA             7.906e-02 -3.592e-01  3.480e-02 -5.584e-03  9.308e-03 1.1e-03  20   0   9  0.00
2025-02-14 17:38:04 NMEA             7.696e-02 -3.641e-01  3.252e-02 -5.550e-03  8.331e-03 3.6e-03  21   0  10  0.00
2025-02-14 17:38:05 NMEA             6.395e-02 -3.547e-01  2.809e-02 -4.309e-03  8.679e-03 1.5e-01  22   4   8  0.00
2025-02-14 17:38:02 PPS              7.704e-07 -1.301e-06  5.115e-07 -4.547e-08  2.001e-07 5.2e+00  15   9   5  0.00
2025-02-14 17:38:06 NMEA             3.150e-02 -2.932e-01  2.162e-02  1.269e-02  5.169e-02 2.0e+00  19  13   3  0.00
2025-02-14 17:38:08 NMEA             3.997e-02 -3.018e-01  3.025e-02  7.709e-03  6.287e-02 9.6e-02   7   1   4  0.00
2025-02-14 17:38:09 NMEA             4.024e-02 -3.000e-01  2.954e-02  6.534e-03  6.646e-02 1.9e-02   7   1   3  0.00
2025-02-14 17:38:10 NMEA             3.698e-02 -3.049e-01  2.471e-02  4.366e-03  3.917e-02 3.3e-02   7   0   5  0.00
2025-02-14 17:38:11 NMEA             3.695e-02 -3.189e-01  2.301e-02  1.258e-03  2.839e-02 7.9e-02   8   0   5  0.00
2025-02-14 17:38:13 NMEA             3.449e-02 -3.099e-01  2.239e-02  2.141e-03  1.940e-02 3.1e-02   9   0   6  0.00
2025-02-14 17:38:09 PPS              6.367e-07 -5.065e-07  3.744e-07 -6.500e-08  1.769e-07 9.9e-02   7   1   4  0.00

We are interested in the ‘Est offset’ (estimated offset) for the NMEA “IP Address”. Here’s a python script to run some numbers for you – just copy + paste the last 100 or so lines from the statistics.log file into a file named ‘chrony_statistics.log’ in the same directory as this python file:

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

def parse_chrony_stats(file_path):
    """
    Parse chrony statistics log file and return a pandas DataFrame
    """
    # read file contents first
    with open(file_path, 'r') as f:
        file_contents = f.readlines()

    # for each line, if it starts with '=' or ' ', skip it
    file_contents = [line for line in file_contents if not line.startswith('=') and not line.startswith(' ')]

    # exclude lines that include 'PPS'
    file_contents = [line for line in file_contents if 'PPS' not in line]

    # Use StringIO to create a file-like object from the filtered contents
    from io import StringIO
    csv_data = StringIO(''.join(file_contents))

    # Read the filtered data using pandas
    df = pd.read_csv(csv_data,
                     sep=r'\s+',
                     names=['Date', 'Time', 'IP_Address', 'Std_dev', 'Est_offset', 'Offset_sd', 
                           'Diff_freq', 'Est_skew', 'Stress', 'Ns', 'Bs', 'Nr', 'Asym'])
    

    # Combine Date and Time columns into a datetime column
    df['timestamp'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
    
    return df

def plot_est_offset(df):
    """
    Create a plot of Est_offset vs time for each IP address
    """
    plt.figure(figsize=(12, 6))
    
    # Plot each IP address as a separate series
    for ip in df['IP_Address'].unique():
        ip_data = df[df['IP_Address'] == ip]
        plt.plot(ip_data['timestamp'], ip_data['Est_offset'], 
                marker='o', label=ip, linestyle='-', markersize=4)
    
    plt.xlabel('Time')
    plt.ylabel('Estimated Offset (seconds)')
    plt.title('Chrony Estimated Offset Over Time by IP Address')
    plt.legend()
    plt.grid(True)
    
    # Rotate x-axis labels for better readability
    plt.xticks(rotation=45)
    
    # Adjust layout to prevent label cutoff
    plt.tight_layout()
    
    return plt

def analyze_chrony_stats(file_path):
    """
    Main function to analyze chrony statistics
    """
    # Parse the data
    df = parse_chrony_stats(file_path)
    
    # Create summary statistics
    summary = {
        'IP Addresses': df['IP_Address'].nunique(),
        'Time Range': f"{df['timestamp'].min()} to {df['timestamp'].max()}",
        'Average Est Offset by IP': df.groupby('IP_Address')['Est_offset'].mean().to_dict(),
        'Max Est Offset by IP': df.groupby('IP_Address')['Est_offset'].max().to_dict(),
        'Min Est Offset by IP': df.groupby('IP_Address')['Est_offset'].min().to_dict(),
        'Median Est Offset by IP': df.groupby('IP_Address')['Est_offset'].median().to_dict()
    }
    
    # Create the plot
    plot = plot_est_offset(df)
    
    return df, summary, plot

# Example usage
if __name__ == "__main__":
    file_path = "chrony_statistics.log"  # Replace with your file path
    df, summary, plot = analyze_chrony_stats(file_path)
    
    # Print summary statistics
    print("\nChrony Statistics Summary:")
    print("-" * 30)
    print(f"Number of IP Addresses: {summary['IP Addresses']}")
    print(f"Time Range: {summary['Time Range']}")
    print("\nAverage Estimated Offset by IP:")
    for ip, avg in summary['Average Est Offset by IP'].items():
        print(f"{ip}: {avg:.2e}")

    print("\nMedian Estimated Offset by IP:")
    for ip, median in summary['Median Est Offset by IP'].items():
        print(f"{ip}: {median:.2e}")
    
    # Show the plot
    plt.show()
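To run it, something like the below works (chrony_statistics.py is a hypothetical name for the script above; pandas and matplotlib need to be installed):

# grab the last 100 lines instead of copying/pasting by hand
sudo tail -100 /var/log/chrony/statistics.log > chrony_statistics.log
pip install pandas matplotlib
python3 chrony_statistics.py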

We get a pretty graph (and by pretty, I mean ugly – the offset is highly variable: at the slow default 9600 bits per second, the timing is actually influenced by the number of seen/tracked satellites, since we haven’t messed with which sentences are output) and some outputs.

matplotlib chart for chrony offset for NMEA source running at 9600 bps

And the avg/median offset:

Chrony Statistics Summary:
------------------------------
Number of IP Addresses: 1
Time Range: 2025-02-14 17:33:55 to 2025-02-14 17:38:26

Average Estimated Offset by IP:
NMEA: -2.71e-01

Median Estimated Offset by IP:
NMEA: -2.65e-01

So we need to pick a number here for the offset. They do not differ by much, 271 milliseconds vs 265. Let’s just split the difference at 268. Very scientific. With this number, we can change the offset in the chrony config for the NMEA source. Note the sign flip: the measured offset is negative, so we enter it as a positive offset in the config to cancel it out.

refclock SHM 0 refid NMEA offset 0.268 precision 1e-3 poll 0 filter 3

This usually works, but I wasn’t getting good results, so refer to the previous post for how this should look. Turns out that with the default sentences, some of the timing was attributed 900-1000 milliseconds late, meaning the Pi was synchronizing a full second later than actual. There are a couple options to resolve this: increase the baud rate, and/or reduce/eliminate unnecessary NMEA sentences. I increased the baud rate below, which won’t be necessary for any module whose default baud rate is higher than 9600. If you don’t care about monitoring the GPS status, disable all sentences except for ZDA (time info).
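On this MTK-based Adafruit module, the sentence selector is the PMTK314 command. As a hedged example, here is the widely published RMC+GGA-only preset from Adafruit’s GPS library (RMC still carries date/time, so gpsd stays happy; build your own mask and checksum if you want ZDA only):

sudo systemctl stop gpsd.service gpsd.socket
# RMC + GGA only; the *28 checksum is valid for this exact string
sudo gpsctl -f -x '$PMTK314,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0*28' /dev/ttyAMA0
sudo systemctl start gpsd.service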

I took an hour or so detour here to figure out how to change the baudrate on the MTK chip used in the Adafruit GPS module.

Long story short on the baudrate change:

austin@raspberrypi5:~ $ cat gps-baud-change.sh
#!/bin/bash

# Stop gpsd service and socket
sudo systemctl stop gpsd.service gpsd.socket

# Set the baud rate to 38400 (single quotes so the shell doesn't expand $PMTK251)
sudo gpsctl -f -x '$PMTK251,38400*27\r\n' /dev/ttyAMA0


# Start gpsd back up
sudo systemctl start gpsd.service
#gpsd -n -s 38400 /dev/ttyAMA0 /dev/pps0

sudo systemctl restart chrony

How to automate this via systemd or whatever is the topic for another post. The GPS module will keep the baudrate setting until it loses power (so it’ll persist through a Pi reboot!).
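If you do want to script it at boot in the meantime, here is a rough, untested sketch of a oneshot systemd unit (you’d want to pull the systemctl stop/start lines out of the script first so it doesn’t fight with unit ordering):

# /etc/systemd/system/gps-baud.service (hypothetical unit name)
[Unit]
Description=Set GPS module baud rate before gpsd starts
Before=gpsd.service
After=dev-ttyAMA0.device

[Service]
Type=oneshot
ExecStart=/home/austin/gps-baud-change.sh

[Install]
WantedBy=multi-user.target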

Turns out that the offset needs to be 511 milliseconds for my Pi/Adafruit GPS at 38400 bps:

austin@raspberrypi5:~ $ sudo cat /etc/chrony/chrony.conf
refclock SHM 0 refid NMEA offset 0.511 precision 1e-3 poll 2
refclock PPS /dev/pps0 refid PPS lock NMEA poll 2 trust

Reboot and wait a few minutes.

8 – verify Chrony is using PPS and NMEA

Now we can check what Chrony is using for sources with

chronyc sources
# or if you want to watch it change as it happens
watch -n 1 chronyc sources
chronyc sources showing a lock on PPS (denoted with *) and false ticker on NMEA (denoted with x), which is the expected and desired status after a couple minutes

Many people asked how to get both time/NMEA and PPS from a single GPS receiver (i.e. without a 3rd source) and this is how. The keys are the lock directive as well as the trust directive on the PPS source.

9 – results

Check what chrony thinks of the system clock with

chronyc tracking

Here we see a few key items:

  • System time – updated every second with the difference between what Chrony thinks the time should be and what the system clock says; we are showing 64 nanoseconds
  • Last offset – how far off the system clock was at last update (from whatever source is selected). I got lucky with this capture, which shows 0 nanoseconds off
  • RMS offset – a long term average of error. I expect this to get to low double-digit nanoseconds. Decreasing further is the topic of the next post.
  • Frequency – the drift of the system clock. This number can be almost anything, as long as it is stable (though the closer to zero, the better). There is always a correlation between oscillator temperature and frequency. This is what chrony is constantly correcting.
  • Residual frequency – difference from what the frequency is and what it should be (as determined by the selected source)
  • Skew – error in the frequency – lower is better. Less than 0.05 is very stable.
  • Root delay/dispersion – basically how far from the “source” of truth your chrony is
  • Update interval – self explanatory

10 – Grafana dashboard showing Chrony stats

And to track the results over time, I feed the Chrony data to InfluxDB via Telegraf. Another topic for a future post. The dashboard looks like this:

Here we can see a gradual increase in the frequency of the Raspberry Pi 5 system clock. The offsets are almost always within 1 microsecond, with an average of 16.7 nanoseconds. The spikes in skew correspond to the spikes in offsets. Something is happening on the Pi that probably spikes CPU load (even though I have the CPU throttled to powersave mode), which speeds things up and affects the timing via power-state transitions, oscillator temperature changes, or both.
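As a preview of that future post, the Telegraf side is small – it ships a chrony input plugin that wraps chronyc tracking. A minimal sketch, with the InfluxDB URL/token/org/bucket as placeholders for your own setup:

[[inputs.chrony]]
  # skip reverse DNS on source names
  dns_lookup = false

[[outputs.influxdb_v2]]
  urls = ["http://influxdb.example.com:8086"]   # placeholder
  token = "$INFLUX_TOKEN"
  organization = "home"
  bucket = "chrony"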

Conclusion

In 2025, a GPS sending PPS to Raspberry Pi is still a great way to get super accurate time. In this Chrony config, I showed how to get time of day, as well as precision seconds without an external source. Our offsets are well under one microsecond.

In the next post, we will examine how to maximize the performance (by minimizing the frequency skew!) of our Raspberry Pi/PPS combination.

And for the post after that – here’s a preview of using PTP from an Oscilloquartz OSA 5401 SyncPlug. Note the standard deviations and offsets. This device has an OCXO – oven controlled crystal oscillator – with frequency stability measured in ppb (parts per billion). It also has a NEO-M8T timing chip, from the same family as the LEA-M8T I mentioned in the beginning of this post.

screenshot showing three terminals up – 1) chrony sources, with a PHC (PTP hardware clock) in the NIC. error is shown as +/- 38 nanoseconds. 2) chrony source stats, showing a standard deviation of 2 nanoseconds for that same source and 3) linuxptp synchronizing the PHC from the OSA 5401 SyncPlug with all numbers shown in nanoseconds. rms is the error of the PHC vs the SyncPlug, max is max offset, freq is pretty bad for this oscillator at -49xxx ppb, delay is ethernet delay (3.1 microseconds)

The OSA 5401 SyncPlug is quite difficult to come by (I scored mine for $20 – shoutout to the servethehome.com forums! this device likely had a list price in the thousands) so I’ll also show how to just set up a PTP grandmaster (that’s the official term) on your Raspberry Pi.
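As a teaser for that post, a rough sketch with linuxptp on the Pi 5 (standard ptp4l/phc2sys flags; double-check the -O sign convention against the phc2sys man page – PTP runs on TAI, which is UTC + 37 seconds as of 2025):

sudo apt install linuxptp
# serve PTP on eth0 with hardware timestamping, printing stats to stdout
sudo ptp4l -i eth0 -m
# in a second terminal: steer the NIC's PTP hardware clock (PHC) from the
# GPS-disciplined system clock so ptp4l hands out good time
sudo phc2sys -s CLOCK_REALTIME -c eth0 -O 37 -m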

Next Steps

  • Document commands to set ublox module to 16 Hz timepulses
  • Document commands to set ublox to survey-in upon power on
  • Document commands to set ublox to use GPS + Beidou + Galileo
  • Document Chrony config to use 16 Hz timepulses
  • Configure Pi to use performance CPU governor to eliminate CPU state switch latency
  • Telegraf/InfluxDB/Grafana configuration for monitoring
  • Temperature-controlled enclosure

Proxmox Ubuntu 22.04 Jammy LTS Cloud-init Image Script

Been a while since I last posted. For basically everything I do Linux-wise, I use Ubuntu. Specifically the Long Term Support versions, which are released on even years in April (April 2020, April 2022, etc.), hence the 22.04 designation of the latest LTS, Jammy. I had standardized on 20.04 (Focal) for everything, but am finding that various packages I use that I expect to be available on 20.04, aren’t. But they are present in standard, default package lists for Jammy 22.04. Last night I was creating a new VM to transfer this very blog to my shiny, new co-located server. I was going to use the Ubuntu 20.04 Proxmox Cloud Init Image Script template I wrote up a bit ago. Then I stopped and realized I should update it. So I changed 2004 everywhere in the script to 2204 and ‘focal’ to ‘jammy’, reran it, and now I have a fancy new cloud init script for Ubuntu 22.04. Note that this script does require you to install libguestfs-tools, as indicated in the previous cloud-init post.

ChatGPT enhanced script as of 2023:

#!/bin/bash

##############################################################################
# things to double-check:
# 1. user directory
# 2. your SSH key location
# 3. which bridge you assign with the create line (currently set to vmbr100)
# 4. which storage is being utilized (script uses local-zfs)
##############################################################################

DISK_IMAGE="jammy-server-cloudimg-amd64.img"
IMAGE_URL="https://cloud-images.ubuntu.com/jammy/current/$DISK_IMAGE"

# Function to check if a file was modified in the last 24 hours or doesn't exist
should_download_image() {
    local file="$1"
    # If file doesn't exist, return true (i.e., should download)
    [[ ! -f "$file" ]] && return 0

    local current_time=$(date +%s)
    local file_mod_time=$(stat --format="%Y" "$file")
    local difference=$(( current_time - file_mod_time ))
    # If older than 24 hours, return true
    (( difference >= 86400 ))
}

# Download the disk image if it doesn't exist or if it was modified more than 24 hours ago
if should_download_image "$DISK_IMAGE"; then
    rm -f "$DISK_IMAGE"
    wget -q "$IMAGE_URL"
fi

sudo virt-customize -a "$DISK_IMAGE" --install qemu-guest-agent
sudo virt-customize -a "$DISK_IMAGE" --ssh-inject root:file:/home/austin/id_rsa.pub

if sudo qm list | grep -qw "9022"; then
    sudo qm destroy 9022
fi

sudo qm create 9022 --name "ubuntu-2204-cloudinit-template" --memory 2048 --cores 2 --net0 virtio,bridge=vmbr100
sudo qm importdisk 9022 "$DISK_IMAGE" local-zfs
sudo qm set 9022 --scsihw virtio-scsi-pci --scsi0 local-zfs:vm-9022-disk-0
sudo qm set 9022 --boot c --bootdisk scsi0
sudo qm set 9022 --ide2 local-zfs:cloudinit
sudo qm set 9022 --serial0 socket --vga serial0
sudo qm set 9022 --agent enabled=1
sudo qm template 9022

echo "Next up, clone VM, then expand the disk"
echo "You also still need to copy ssh keys to the newly cloned VM"

Original, non-ChatGPT script:

###############
# things to double-check:
# 1. user directory
# 2. your SSH key location
# 3. which bridge you assign with the create line (currently set to vmbr100)
# 4. which storage is being utilized (script uses local-zfs)
###############

rm -f jammy-server-cloudimg-amd64.img
wget -q https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
sudo virt-customize -a jammy-server-cloudimg-amd64.img --install qemu-guest-agent
sudo virt-customize -a jammy-server-cloudimg-amd64.img --ssh-inject root:file:/home/austin/id_rsa.pub
sudo qm destroy 9022
sudo qm create 9022 --name "ubuntu-2204-cloudinit-template" --memory 2048 --cores 2 --net0 virtio,bridge=vmbr100
sudo qm importdisk 9022 jammy-server-cloudimg-amd64.img local-zfs
sudo qm set 9022 --scsihw virtio-scsi-pci --scsi0 local-zfs:vm-9022-disk-0
sudo qm set 9022 --boot c --bootdisk scsi0
sudo qm set 9022 --ide2 local-zfs:cloudinit
sudo qm set 9022 --serial0 socket --vga serial0
sudo qm set 9022 --agent enabled=1
sudo qm template 9022
rm -f jammy-server-cloudimg-amd64.img
echo "next up, clone VM, then expand the disk"
echo "you also still need to copy ssh keys to the newly cloned VM"
Proxmox Ubuntu 22.04 cloud-init image script

I suppose for full automation I should get the user with a whoami command, assign it to a variable, and use that throughout the script. That’s a task for another day.
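A minimal sketch of that change (untested), reusing the variable everywhere a home directory is hard-coded:

CURRENT_USER=$(whoami)
SSH_KEY_PATH="/home/${CURRENT_USER}/id_rsa.pub"
sudo virt-customize -a "$DISK_IMAGE" --ssh-inject "root:file:${SSH_KEY_PATH}"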

I also added a line in my crontab to have this script run weekly, so the image is essentially always up to date. I’ll run both 20.04 and 22.04 in parallel for a bit but anticipate that I can turn off the 20.04 job fairly soon.

# m h  dom mon dow   command
52 19 * * TUE sleep $(( RANDOM \% 60 )); /usr/bin/bash /home/austin/ubuntu-2004-cloud-init-create.sh >> /home/austin/ubuntu-template.log 2>&1
52 20 * * TUE sleep $(( RANDOM \% 60 )); /usr/bin/bash /home/austin/ubuntu-2204-cloud-init-create.sh >> /home/austin/ubuntu-2204-template.log 2>&1

Deploying a Kubernetes Cluster within Proxmox using Ansible

Introduction / Background

This post has been a long time coming. I apologize for how long it’s taken. I noticed that many other blogs left off at a similar position as I did. Get the VMs created then…. nothing. Creating a Kubernetes cluster locally is a much cheaper (read: basically free) option to learn how Kubes works compared to a cloud-hosted solution or a full-blown Kubernetes engine/solution, such as AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE).

Anyways, I finally had some time to complete the tutorial series so here we are. Since the last post, my wife and I are now expecting our 2nd kid, I put up a new solar panel array, built our 1st kid a new bed, messed around with MacOS Monterey on Proxmox, built garden boxes, and a bunch of other stuff. Life happens. So without much more delay let’s jump back in.

Here’s a screenshot of the end state Kubernetes Dashboard showing our nodes:

Kubernetes Dashboard showing our Proxmox VM nodes deployed via Terraform

Current State

If you’ve followed the blog series so far, you should have four VMs in your Proxmox cluster ready to go with SSH keys set, the hard drive expanded, and the right amount of vCPUs and memory allocated. If you don’t have those ready to go, take a step back (Deploying Kubernetes VMs in Proxmox with Terraform) and get caught up. We’re not going to use the storage VM. Some guides I followed had one but I haven’t found a need for it yet so we’ll skip it.

VMs in Proxmox ready for Kubernetes installation

Ansible

What is Ansible

If you ask DuckDuckGo to define ansible, it will tell you the following: “A hypothetical device that enables users to communicate instantaneously across great distances; that is, a faster-than-light communication device.”

In our case, it is “an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code.”

We will thus be using Ansible to run the initial Kubernetes set up steps on every machine, initialize the cluster on the master, and join the cluster on the workers/agents.

Initial Ansible Housekeeping

First we need to specify some variables similar to how we did it with Terraform. Create a file in your working directory called ansible-vars.yml and put the following into it:

# specifying a CIDR for our cluster to use.
# can be basically any private range except for ranges already in use.
# apparently it isn't too hard to run out of IPs in a /24, so we're using a /22
pod_cidr: "10.16.0.0/22"

# this defines what the join command filename will be
join_command_location: "join_command.out"

# setting the home directory for retrieving, saving, and executing files
home_dir: "/home/ubuntu"

Equally as important (and potentially a better starting point than the variables) is defining the hosts. In ansible-hosts.txt:

# this is a basic file putting different hosts into categories
# used by ansible to determine which actions to run on which hosts

[all]
10.98.1.41
10.98.1.51
10.98.1.52

[kube_server]
10.98.1.41

[kube_agents]
10.98.1.51
10.98.1.52

[kube_storage]
#10.98.1.61

Checking Ansible can communicate with our hosts

Let’s pause here and make sure Ansible can communicate with our VMs. We will use a simple built-in module named ‘ping’ to do so. The below command broken down:

  • -i ansible-hosts.txt – use the ansible-hosts.txt file
  • all – run the command against the [all] block from the ansible-hosts.txt file
  • -u ubuntu – log in with user ubuntu (since that’s what we set up with the Ubuntu 20.04 Cloud Init template). If you don’t use -u [user], it will use your currently logged-in user to attempt to SSH.
  • -m ping – run the ping module
ansible -i ansible-hosts.txt all -u ubuntu -m ping

If all goes well, you will receive “ping”: “pong” for each of the VMs you have listed in the [all] block of the ansible-hosts.txt file.

Using Ansible’s ping to check communications with each of the VMs for deployment

Potential SSH errors

If you’ve previously SSH’d to these IPs and have subsequently destroyed/re-created the VMs, you will get scary-sounding SSH errors saying the remote host identification has changed. Run the suggested ssh-keygen -f command for each of the IPs to fix it.

You might also have to SSH into each of the hosts to accept the host key. I’ve done this whole procedure a couple times so I don’t recall what will pop up first attempt.

SSH remote host identification has changed error. Run suggested ssh-keygen -f command to resolve.
ssh-keygen -f "/home/<username_here>/.ssh/known_hosts" -R "10.98.1.41"
ssh-keygen -f "/home/<username_here>/.ssh/known_hosts" -R "10.98.1.51"
ssh-keygen -f "/home/<username_here>/.ssh/known_hosts" -R "10.98.1.52"
ssh-keygen -f "/home/<username_here>/.ssh/known_hosts" -R "10.98.1.61"
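You can also pre-accept the fresh host keys in one shot with ssh-keyscan (standard OpenSSH; the IPs are the ones from ansible-hosts.txt):

for ip in 10.98.1.41 10.98.1.51 10.98.1.52; do
  ssh-keyscan -H "$ip" >> ~/.ssh/known_hosts
done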

Installing Kubernetes dependencies with Ansible

Then we need a script to install the dependencies and the Kubernetes utilities themselves. This script does quite a few things: it gets apt ready to install packages, adds the Docker & Kubernetes signing keys, installs Docker and Kubernetes, disables swap, and adds the ubuntu user to the Docker group.

ansible-install-kubernetes-dependencies.yml:

# https://kubernetes.io/blog/2019/03/15/kubernetes-setup-using-ansible-and-vagrant/
# https://github.com/virtualelephant/vsphere-kubernetes/blob/master/ansible/cilium-install.yml#L57

# ansible .yml files define what tasks/operations to run

---
- hosts: all # run on the "all" hosts category from ansible-hosts.txt
  # become means be superuser
  become: true
  remote_user: ubuntu
  tasks:
  - name: Install packages that allow apt to be used over HTTPS
    apt:
      name: "{{ packages }}"
      state: present
      update_cache: yes
    vars:
      packages:
      - apt-transport-https
      - ca-certificates
      - curl
      - gnupg-agent
      - software-properties-common

  - name: Add an apt signing key for Docker
    apt_key:
      url: https://download.docker.com/linux/ubuntu/gpg
      state: present

  - name: Add apt repository for stable version
    apt_repository:
      repo: deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable
      state: present

  - name: Install docker and its dependencies
    apt: 
      name: "{{ packages }}"
      state: present
      update_cache: yes
    vars:
      packages:
      - docker-ce 
      - docker-ce-cli 
      - containerd.io
      
  - name: verify docker installed, enabled, and started
    service:
      name: docker
      state: started
      enabled: yes
      
  - name: Remove swapfile from /etc/fstab
    mount:
      name: "{{ item }}"
      fstype: swap
      state: absent
    with_items:
      - swap
      - none

  - name: Disable swap
    command: swapoff -a
    when: ansible_swaptotal_mb > 0
    
  - name: Add an apt signing key for Kubernetes
    apt_key:
      url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
      state: present

  - name: Adding apt repository for Kubernetes
    apt_repository:
      repo: deb https://apt.kubernetes.io/ kubernetes-xenial main
      state: present
      filename: kubernetes.list

  - name: Install Kubernetes binaries
    apt: 
      name: "{{ packages }}"
      state: present
      update_cache: yes
    vars:
      packages:
        # it is usually recommended to specify which version you want to install
        - kubelet=1.23.6-00
        - kubeadm=1.23.6-00
        - kubectl=1.23.6-00
        
  - name: hold kubernetes binary versions (prevent from being updated)
    dpkg_selections:
      name: "{{ item }}"
      selection: hold
    loop:
      - kubelet
      - kubeadm
      - kubectl
        
# this has to do with nodes having different internal/external/mgmt IPs
# {{ node_ip }} comes from vagrant, which I'm not using yet
#  - name: Configure node ip - 
#    lineinfile:
#      path: /etc/default/kubelet
#      line: KUBELET_EXTRA_ARGS=--node-ip={{ node_ip }}

  - name: Restart kubelet
    service:
      name: kubelet
      daemon_reload: yes
      state: restarted
      
  - name: add ubuntu user to docker
    user:
      name: ubuntu
      group: docker
  
  - name: reboot to apply swap disable
    reboot:
      reboot_timeout: 180 #allow 3 minutes for reboot to happen

With our fresh VMs straight outta Terraform, let’s now run the Ansible script to install the dependencies.

Ansible command to run the Kubernetes dependency playbook (pretty straight-forward: the -i is to input the hosts file, then the next argument is the playbook file itself):

ansible-playbook -i ansible-hosts.txt ansible-install-kubernetes-dependencies.yml

It’ll take a bit of time to run (1m26s in my case). If all goes well, you will be presented with a summary screen (called PLAY RECAP) showing some items in green with status ok and some items in orange with status changed. I got 13 ok’s, 10 changed’s, and 1 skipped.

Ansible play recap showing successful Kubernetes dependencies installation

Initialize the Kubernetes cluster on the master

With the dependencies installed, we can now proceed to initialize the Kubernetes cluster itself on the server/master machine. This script sets docker to use the systemd cgroups driver (rather than its default, cgroupfs), initializes the cluster, copies the cluster config files to the ubuntu user’s home directory, and installs the Calico networking plugin and the standard Kubernetes dashboard.

ansible-init-cluster.yml:

- hosts: kube_server
  become: true
  remote_user: ubuntu
  
  vars_files:
    - ansible-vars.yml
    
  tasks:
  - name: set docker to use systemd cgroups driver
    copy:
      dest: "/etc/docker/daemon.json"
      content: |
        {
          "exec-opts": ["native.cgroupdriver=systemd"]
        }
  - name: restart docker
    service:
      name: docker
      state: restarted
    
  - name: Initialize Kubernetes cluster
    command: "kubeadm init --pod-network-cidr {{ pod_cidr }}"
    args:
      creates: /etc/kubernetes/admin.conf # skip this task if the file already exists
    register: kube_init
    
  - name: show kube init info
    debug:
      var: kube_init
      
  - name: Create .kube directory in user home
    file:
      path: "{{ home_dir }}/.kube"
      state: directory
      owner: 1000
      group: 1000

  - name: Configure .kube/config files in user home
    copy:
      src: /etc/kubernetes/admin.conf
      dest: "{{ home_dir }}/.kube/config"
      remote_src: yes
      owner: 1000
      group: 1000
      
  - name: restart kubelet for config changes
    service:
      name: kubelet
      state: restarted
      
  - name: get calico networking
    get_url:
      url: https://projectcalico.docs.tigera.io/manifests/calico.yaml
      dest: "{{ home_dir }}/calico.yaml"
      
  - name: apply calico networking
    become: no
    command: kubectl apply -f "{{ home_dir }}/calico.yaml"
    
  - name: get dashboard
    get_url:
      url: https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.0/aio/deploy/recommended.yaml
      dest: "{{ home_dir }}/dashboard.yaml"
    
  - name: apply dashboard
    become: no
    command: kubectl apply -f "{{ home_dir }}/dashboard.yaml"

Initializing the cluster took 53s on my machine. One of the first tasks is to download the images which takes the majority of the duration. You should get 13 ok and 10 changed with the init. I had two extra user check tasks because I was fighting some issues with applying the Calico networking.

ansible-playbook -i ansible-hosts.txt ansible-init-cluster.yml
Successful Kubernetes init execution showing join token at the bottom

Getting the join command and joining worker nodes

With the master up and running, we need to retrieve the join command. I chose to save the command locally and read the file in a subsequent Ansible playbook. This could certainly be combined into a single playbook.

ansible-get-join-command.yml –

- hosts: kube_server
  become: false
  remote_user: ubuntu
  
  vars_files:
    - ansible-vars.yml
    
  tasks:
  - name: Extract the join command
    become: true
    command: "kubeadm token create --print-join-command"
    register: join_command
    
  - name: show join command
    debug:
      var: join_command
      
  - name: Save kubeadm join command for cluster
    local_action: copy content={{ join_command.stdout_lines | last | trim }} dest={{ join_command_location }} # defaults to your local cwd/join_command.out

And for the command:

ansible-playbook -i ansible-hosts.txt ansible-get-join-command.yml
Successfully retrieved the join command and saved it to the local machine

Now to join the workers/agents, our Ansible playbook will read that join_command.out file and use it to join the cluster.

ansible-join-workers.yml –

- hosts: kube_agents
  become: true
  remote_user: ubuntu
  
  vars_files:
    - ansible-vars.yml
    
  tasks:
  - name: set docker to use systemd cgroups driver
    copy:
      dest: "/etc/docker/daemon.json"
      content: |
        {
          "exec-opts": ["native.cgroupdriver=systemd"]
        }
  - name: restart docker
    service:
      name: docker
      state: restarted
    
  - name: read join command
    debug: msg={{ lookup('file', join_command_location) }}
    register: join_command_local
    
  - name: show join command
    debug:
      var: join_command_local.msg
      
  - name: join agents to cluster
    command: "{{ join_command_local.msg }}"

And to actually join:

ansible-playbook -i ansible-hosts.txt ansible-join-workers.yml
Two worker agents successfully joined to the cluster

With the two worker nodes/agents joined up to the cluster, you now have a full-on Kubernetes cluster up and running! Wait a few minutes, then log into the server and run kubectl get nodes to verify they are present and active (status = Ready):

kubectl get nodes
‘kubectl get nodes’ showing our nodes as ready

Kubernetes Dashboard

Everyone likes a dashboard. Kubernetes has a good one for poking/prodding around. It appears to basically be a visual representation of most (all?) of the “get information” types of command you can run with kubectl (kubectl get nodes, get pods, describe stuff, etc.).

The dashboard was installed with the cluster init script but we still need to create a service account and cluster role binding for the dashboard. These steps are from https://github.com/kubernetes/dashboard/blob/master/docs/user/access-control/creating-sample-user.md. NOTE: the docs state it is not recommended to give admin privileges to this service account. I’m still figuring out Kubernetes privileges so I’m going to proceed anyways.

Dashboard user/role creation

On the master machine, create a file called sa.yaml with the following contents:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

And another file called clusterrole.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

Apply both, then get the token to be used for logging in. The last command will spit out a long string. Copy it starting at ‘ey’ and ending before the username (ubuntu). In the screenshot, I have highlighted which part is the token.

kubectl apply -f sa.yaml
kubectl apply -f clusterrole.yaml
kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"
Applying both templates and getting the user’s token
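Side note: the secret-based retrieval above matches the Kubernetes 1.23 we pinned earlier. On 1.24 and newer, service account token secrets are no longer auto-created, and the equivalent is:

kubectl -n kubernetes-dashboard create token admin-user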

SSH Tunnel & kubectl proxy

At this point, the dashboard has been running for a while. We just can’t get to it yet. There are two distinct steps that need to happen. The first is to create a SSH tunnel between your local machine and a machine in the cluster (we will be using the master). Then, from within that SSH session, we will run kubectl proxy to expose the web services.

SSH command – the master’s IP is 10.98.1.41 in this example:

ssh -L 8001:127.0.0.1:8001 ubuntu@10.98.1.41

The above command will open what appears to be a standard SSH session but the tunnel is running as well. Now execute kubectl proxy:
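kubectl proxy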

Kubernetes SSH tunnel & kubectl proxy output

The Kubernetes Dashboard

At this point, you should be able to navigate to the dashboard page from a web browser on your local machine (http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/) and you’ll be prompted for a log in. Make sure the token radio button is selected and paste in that long token from earlier. It expires relatively quickly (couple hours I think) so be ready to run the token retrieval command again.

Kubernetes dashboard login with token

The default view is for the “default” namespace which has nothing in it at this point. Change it to All namespaces for more details:

Kubernetes dashboard all namespaces

From here you can see information about everything in the cluster:

Kubernetes dashboard showing relatively default workloads

Conclusion

With this last post, we have concluded the journey from creating an Ubuntu cloud-init image in Proxmox, to using Terraform to deploy Kubernetes VMs in Proxmox, all the way through deploying an actual Kubernetes cluster in Proxmox using Ansible. Hope you found this useful!

Video link coming soon.

Discussion

For discussion, either leave a comment here or if you’re a Reddit user, head on over to https://www.reddit.com/r/austinsnerdythings/comments/ubsk1i/i_made_a_tutorial_showing_how_to_deploy_a/.

References

https://kubernetes.io/blog/2019/03/15/kubernetes-setup-using-ansible-and-vagrant/

https://github.com/virtualelephant/vsphere-kubernetes


Millisecond accurate Chrony NTP with a USB GPS for $12 USD

Introduction

Building off my last NTP post (Microsecond accurate NTP with a Raspberry Pi and PPS GPS), which required a $50-60 GPS device and a Raspberry Pi (also $40+), I have successfully tested something much cheaper, which is good enough, especially for initial PPS synchronization. Good enough, in this case, is defined as +/- 10 milliseconds, which can easily be achieved using a basic USB GPS device: the GT-U7. Read on for instructions on how to set up the USB GPS as a Stratum 1 NTP time server.

YouTube Video Link

https://www.youtube.com/watch?v=DVtmDFpWkEs

Microsecond PPS time vs millisecond USB time

How accurate of time do you really need? The last post showed how to get all devices on a local area network (LAN) within 0.1 milliseconds of “real” time. Do you need your equipment to be that accurate to official atomic clock time (12:03:05.0001)? Didn’t think so. Do you care if every device is on the correct second compared to official/accurate time (12:03:05)? That’s a lot more reasonable. Using a u-blox USB GPS can get you to within 0.01 seconds of official time. The best part? The required USB GPS units are almost always less than $15, and you don’t need a Raspberry Pi.

Overview

This post will show how to add a u-blox USB GPS module to NTPd as a driver, or to chrony (an alternative timekeeping daemon) as a reference clock (using GPSd shared memory for both), and verify the accuracy is within +/- 10 milliseconds.

Materials needed

  • USB u-blox GPS (VK-172 or GT-U7), GT-U7 preferred because it has a micro-USB plug to connect to your computer. It is important to note that both of these are u-blox modules, which has a binary data format as well as a high default baudrate (57600). These two properties allow for quick transmission of each GPS message from GPS to computer.
  • 15-30 minutes

Steps

1 – Update your host machine and install packages

This tutorial is for Linux. I use Ubuntu, so we utilize apt for package management:

sudo apt update
sudo apt upgrade
# rpi-update only applies to Raspberry Pi hardware and is generally unnecessary
# sudo rpi-update
sudo apt install gpsd gpsd-clients python-gps chrony

2 – Modify GPSd default startup settings

In /etc/default/gpsd, change the settings to the following:

# Start the gpsd daemon automatically at boot time
START_DAEMON="true"

# Use USB hotplugging to add new USB devices automatically to the daemon
USBAUTO="true"

# Devices gpsd should collect to at boot time.
# this could also be /dev/ttyUSB0, it is ACM0 on raspberry pi
DEVICES="/dev/ttyACM0"

# -n means start listening to GPS data without a specific listener
GPSD_OPTIONS="-n"

Reboot with sudo reboot.

3a – Chrony configuration (if using NTPd, go to 3b)

I took the default configuration, added my 10.98 servers, and more importantly, added a reference clock (refclock). Link to chrony documentation here. Arguments/parameters of this configuration file:

  • 10.98.1.198 is my microsecond accurate PPS NTP server
  • iburst means send a bunch of synchronization packets upon service start so accurate time can be determined much faster (usually a couple seconds)
  • maxpoll (and minpoll, which isn’t used in this config) is how many seconds to wait between polls, defined by 2^x where x is the number in the config. maxpoll 6 means don’t wait more than 2^6=64 seconds between polls
  • refclock is reference clock, and is the USB GPS source we are adding
    • ‘SHM 0’ means shared memory reference 0, which means it is checking with GPSd using shared memory to see what time the GPS is reporting
    • ‘refid NMEA’ means name this reference ‘NMEA’
    • ‘offset 0.000’ means don’t offset this clock source at all. We will change this later
    • ‘precision 1e-3’ means this reference is only accurate to 1e-3 (0.001) seconds, or 1 millisecond
    • ‘poll 3’ means poll this reference every 2^3 = 8 seconds
    • ‘noselect’ means don’t actually use this clock as a source. We will be measuring the delta to other known times to set the offset and make the source selectable.
pi@raspberrypi:~ $ sudo cat /etc/chrony/chrony.conf
# Welcome to the chrony configuration file. See chrony.conf(5) for more
# information about usable directives.
#pool 2.debian.pool.ntp.org iburst
server 10.98.1.1 iburst maxpoll 6
server 10.98.1.198 iburst maxpoll 6
server 10.98.1.15 iburst

refclock SHM 0 refid NMEA offset 0.000 precision 1e-3 poll 3 noselect

Restart chrony with sudo systemctl restart chrony.

3b – NTP config

Similar to the chrony config, we need to add a reference clock (called a driver in NTP). For NTP, drivers are “servers” that start with an address of 127.127. The third octet tells what kind of driver it is, and the fourth is the unit number. The .28 driver is the shared memory driver, same theory as for chrony. For a full list of drivers, see the official NTP docs. To break down the server:

  • ‘server 127.127.28.0’ means use the .28 (SHM) driver
    • minpoll 4 maxpoll 4 means poll every 2^4=16 seconds
    • noselect means don’t use this for time. Similar to chrony, we will be measuring the offset to determine this value.
  • ‘fudge 127.127.28.0’ means we are going to change some properties of the listed driver
    • ‘time1 0.000’is the time offset calibration factor, in seconds
    • ‘stratum 2’ means list this source as a stratum 2 source (has to do with how close the source is to “true” time), listing it as 2 means other, higher stratum sources will be selected before this one will (assuming equal time quality)
    • ‘refid GPS’ means rename this source as ‘GPS’
server 127.127.28.0 minpoll 4 maxpoll 4 noselect
fudge 127.127.28.0 time1 0.000 stratum 2 refid GPS

Restart NTPd with sudo systemctl restart ntp.

4 – check time offset via gpsmon

Running gpsmon shows us general information about the GPS, including time offset. The output looks like the below screenshot. Of importance is the satellite count (on right, more is better, >5 is good enough for time), HDOP (horizontal dilution of precision) is a measure of how well the satellites can determine your position (lower is better, <2 works for basically all navigation purposes), and TOFF (time offset).

gpsmon showing time offset for a USB GPS

In this screenshot the TOFF is 0.081862027, which is 81.8 milliseconds off the host computer’s time. Watch this for a bit – it should hover pretty close to a certain value +/- 10ms. In my case, I’ve noticed that if there are 10 or fewer satellites locked on, it is around 77ms. If there are 11 or more, it is around 91ms (presumably due to more satellite information that needs to be transmitted).

5 – record statistics for a data-driven offset

If you are looking for a better offset value to put in the configuration file, we can turn on logging from either chrony or NTPd to record source information.

For chrony:

Edit /etc/chrony/chrony.conf and uncomment the line for which kinds of logging to turn on:

# Uncomment the following line to turn logging on.
log tracking measurements statistics

Then restart chrony (sudo systemctl restart chrony) and logs will start writing to /var/log/chrony (this location is defined a couple lines below the log line in chrony.conf):

pi@raspberrypi:~ $ ls /var/log/chrony
measurements.log  statistics.log  tracking.log

For NTPd (be sure to restart it after making any configuration changes):

austin@prox-3070 ~ % cat /etc/ntp.conf
# Enable this if you want statistics to be logged.
statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

Wait a few minutes for some data to record (chrony synchronizes pretty quickly compared to NTPd) and check the statistics file, filtered to our NMEA refid:

cat /var/log/chrony/statistics.log | grep NMEA

This spits out the lines that have NMEA present (the ones of interest for our USB GPS). To include the header row that labels each column, we can run

# chrony
cat /var/log/chrony/statistics.log | head -2; cat /var/log/chrony/statistics.log | grep NMEA

# ntp, there is no header info so we can omit that part of the command
cat /var/log/ntpstats/peerstats | grep 127.127.28.0
Screenshot showing chrony statistics for our NMEA USB GPS refclock

NTP stats don’t include header information. The column of interest is the one right after the status word (9014): the fields are MJD day, seconds past midnight, source address, status word, offset, delay, dispersion, and RMS jitter (the last four in seconds). We can see the offset for this VK-172 USB GPS is somewhere around 0.075-0.080 seconds, which we can put in place of the 0.000 time1 value for the .28 driver and remove noselect.

austin@prox-3070 ~ % cat /var/log/ntpstats/peerstats | grep 127.127.28.0
59487 49648.536 127.127.28.0 9014 -0.078425007 0.000000000 7.938064614 0.000000060
59487 49664.536 127.127.28.0 9014 -0.079488544 0.000000000 3.938033388 0.001063537
59487 49680.536 127.127.28.0 9014 -0.079514781 0.000000000 1.938035682 0.000770810
59487 49696.536 127.127.28.0 9014 -0.079772284 0.000000000 0.938092429 0.000808697
59487 49712.536 127.127.28.0 9014 -0.079711708 0.000000000 0.438080791 0.000661032
59487 49728.536 127.127.28.0 9014 -0.075098563 0.000000000 0.188028843 0.004311958

So now we have some data showing the statistics of our NMEA USB GPS NTP source. We can copy and paste this into Excel, run Text to Columns, and graph the result and/or average the offset column – or skip the spreadsheet entirely, as shown below.
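The same average falls straight out of awk. The field positions assume the default log layouts shown above (the estimated offset is the 5th column in both files):

# chrony: average the 'Est offset' column for the NMEA refclock
grep NMEA /var/log/chrony/statistics.log | awk '{ s += $5; n++ } END { printf "%.4f s over %d samples\n", s/n, n }'

# ntp: same idea against peerstats
grep 127.127.28.0 /var/log/ntpstats/peerstats | awk '{ s += $5; n++ } END { printf "%.4f s over %d samples\n", s/n, n }'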

screenshot showing chrony/NTP statistics to determine offset

This graph is certainly suspicious (that sine wave pattern) and if I weren’t writing this blog post, I’d let data collect overnight to determine an offset. Since time is always of the essence, I will just take the average of the ‘est offset’ column (E), which comes out to roughly 0.0763 seconds. Let’s pop this into the chrony.conf file and remove noselect:

refclock SHM 0 refid NMEA offset 0.0763 precision 1e-3 poll 3

Restart chrony again for the config file changes to take effect – sudo systemctl restart chrony.

For NTP, the procedure is the same idea: put the measured offset in the time1 fudge value, remove noselect from the server line, and restart NTPd with sudo systemctl restart ntp. With the value above, the lines would look like this (your own measured offset will differ):
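server 127.127.28.0 minpoll 4 maxpoll 4
fudge 127.127.28.0 time1 0.0763 stratum 2 refid GPS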

6 – watch ‘chronyc sources’ or ‘ntpq -pn’ to see if the USB GPS gets selected as the main time source

If you aren’t aware, Ubuntu/Debian/most Linux distributions include a utility called watch that reruns a command every x seconds. We can use it to check how chrony is interpreting each time source, refreshed every second:

# for chrony
watch -n 1 chronyc sources
watching chrony sources

In the above screenshot, we can see that chrony actually has the NMEA source selected as the primary source (denoted with the *). It has the Raspberry Pi PPS NTP GPS ready to take over as the new primary (denoted with the +). All of the sources match quite closely (from +4749us to -505us is a spread of about 5.3 milliseconds). The source “offset” is in the square brackets ([ and ]).

# for ntp
watch -n 1 ntpq -pn
NTP showing (via ntpq -pn) that the GPS source is 0.738 milliseconds off the host clock. The ‘-‘ in front of the remote means this will not be selected as a valid time source (presumably due to the high jitter compared to the other sources, probably also due to manually setting it to stratum 2).

7 – is +/- five milliseconds good enough?

For 99% of use cases, yes. You can stop here and your home network will be plenty accurate. If you want additional accuracy, you are in luck: this GPS module also outputs a PPS (pulse per second) signal! We can use it to get within 0.05 milliseconds (0.00005 seconds) of official/atomic clock time.

Conclusion

In this post, we got a u-blox USB GPS set up and added it as a reference clock (refclock) to chrony, and demonstrated it is clearly within 10 milliseconds of official GPS time.

You could write a script to do all this for you! I should probably try this myself…
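A rough sketch of what that might look like (untested – it assumes the chrony statistics log format above and the exact refclock line used earlier):

#!/bin/bash
# sketch: compute the mean NMEA offset and write it into chrony.conf
set -e
offset=$(grep NMEA /var/log/chrony/statistics.log | awk '{ s += $5; n++ } END { printf "%.4f", s/n }')
echo "measured mean offset: ${offset} s"
# swap the measured value into the refclock line and drop noselect
sudo sed -i "s|^refclock SHM 0 .*|refclock SHM 0 refid NMEA offset ${offset} precision 1e-3 poll 3|" /etc/chrony/chrony.conf
sudo systemctl restart chrony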

In the next post, we can add PPS signals from the GPS module to increase our time accuracy by 1000x (into the microsecond range).

A note on why having faster message transmission is better for timing

My current PPS NTP server uses chrony with NMEA messages transmitted over serial and the PPS signal fed into a GPIO pin. GPSd, as a rule, does minimal configuration of GPS devices. It typically defaults to 9600 baud for serial devices. A typical GPS message looks like this:

$GPGGA,161229.487,3723.2475,N,12158.3416,W,1,07,1.0,9.0,M,,,,0000*18

A $GPGGA sentence like that runs about 83 bytes (the gpsmon capture below shows the exact counts). Serial NMEA is sent 8N1, so each byte takes 10 bit times on the wire; at 9600 baud (9600 bits per second), that works out to roughly 1.04 milliseconds per byte, or about 86 milliseconds for the whole message. GPS messages do vary in length, sometimes significantly, depending on what is being sent (the satellite information in $GPGSV sentences, for example, is only transmitted every 5-10 seconds), so the transmission time varies too – and that variation shows up as jitter.
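Spelled out in awk, for reference (the second and third numbers come up again below):

awk 'BEGIN {
  printf "83-byte NMEA sentence @ 9600 baud: %.1f ms\n", 83*10/9600*1000
  printf "48-byte length spread @ 9600 baud: %.1f ms\n", 48*10/9600*1000
  printf "60-byte UBX message @ 57600 baud:  %.1f ms\n", 60*10/57600*1000
}'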

I opened gpsmon to get a sample of sentences – I had not noticed this until now, but it shows how many bytes each sentence is at the front of the line:

(35) $GPZDA,144410.000,30,09,2021,,*59
------------------- PPS offset: -0.000001297 ------
(83) $GPGGA,144411.000,3953.xxxx,N,10504.xxxx,W,2,6,1.19,1637.8,M,-20.9,M,0000,0000*5A
(54) $GPGSA,A,3,26,25,29,18,05,02,,,,,,,1.46,1.19,0.84*02
(71) $GPRMC,144411.000,A,3953.xxxx,N,10504.xxxx,W,2.80,39.98,300921,,,D*44
(35) $GPZDA,144411.000,30,09,2021,,*58
------------------- PPS offset: -0.000000883 ------
(83) $GPGGA,144412.000,3953.xxxx,N,10504.xxxx,W,2,7,1.11,1637.7,M,-20.9,M,0000,0000*52
(56) $GPGSA,A,3,20,26,25,29,18,05,02,,,,,,1.39,1.11,0.84*00
(70) $GPGSV,3,1,12,29,81,325,27,05,68,056,21,20,35,050,17,18,34,283,24*76
(66) $GPGSV,3,2,12,25,27,210,14,15,27,153,,13,25,117,,02,23,080,19*78
(59) $GPGSV,3,3,12,26,17,311,22,23,16,222,,12,11,184,,47,,,*42
------------------- PPS offset: -0.000000833 ------
(71) $GPRMC,144412.000,A,3953.xxxx,N,10504.xxxx,W,2.57,38.19,300921,,,D*48
(35) $GPZDA,144412.000,30,09,2021,,*5B
(83) $GPGGA,144413.000,3953.xxxx,N,10504.xxxx,W,2,7,1.11,1637.6,M,-20.9,M,0000,0000*52
(56) $GPGSA,A,3,20,26,25,29,18,05,02,,,,,,1.39,1.11,0.84*00
(71) $GPRMC,144413.000,A,3953.xxxx,N,10504.xxxx,W,2.60,36.39,300921,,,D*41
(35) $GPZDA,144413.000,30,09,2021,,*5A

These sentences range from 35 to 83 bytes, a variation of (83 bytes - 35 bytes) × ~1.04 milliseconds per byte ≈ 50 milliseconds.

Compare that to the u-blox binary UBX messages, which seem to always be 60 bytes and are transmitted at 57600 baud – about 10.4 milliseconds to transmit the entire message.

UBX protocol messages (blanked out lines). I have no idea what part of the message is location; hopefully I blanked out the right part.

The variance (jitter) is thus much lower, which makes for a much more accurate NTP source. GPSd has no problem leaving u-blox modules at 57600 baud. This is why the u-blox USB modules perform much better for timekeeping than NMEA-over-serial devices when using GPSd.

For basically every GPS module/chipset, it is possible to send commands to enable/disable individual sentences (as well as increase the serial baud rate). In an ideal world for timekeeping, GPSd would disable every sentence except time ($GPZDA) and bump the baud rate to the highest supported level (115200, 230400, etc.). Unfortunately for us, GPSd’s default behavior is to just work with every GPS, which essentially means it does not configure the device at all.
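If you want to experiment anyway, gpsd ships ubxtool, which can do exactly this for u-blox modules. A sketch – it assumes gpsd is running, and the flags are worth double-checking against your gpsd version’s man page:

ubxtool -d NMEA      # disable NMEA sentence output
ubxtool -e BINARY    # enable UBX binary messages
ubxtool -S 115200    # bump the serial speed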

Update 2024-01-19: RIP Dave Mills, inventor/creator of NTP – https://arstechnica.com/gadgets/2024/01/inventor-of-ntp-protocol-that-keeps-time-on-billions-of-devices-dies-at-age-8

Categories
homelab Kubernetes Linux proxmox Terraform

Deploying Kubernetes VMs in Proxmox with Terraform

Background

The last post covered how to deploy virtual machines in Proxmox with Terraform. This post shows the template for deploying 4 Kubernetes virtual machines in Proxmox using Terraform.

Youtube Video Link

https://youtu.be/UXXIl421W8g

Kubernetes Proxmox Terraform Template

Without further ado, below is the template I used to create my virtual machines. The main LAN network is 10.98.1.0/24, and the Kube internal network (on its own bridge) is 10.17.0.0/24.

This template creates a Kube server, two agents, and a storage server.

Update 2022-04-26: bumped Telmate provider version to 2.9.8 from 2.7.4

terraform {
  required_providers {
    proxmox = {
      source = "telmate/proxmox"
      version = "2.9.8"
    }
  }
}

provider "proxmox" {
  pm_api_url = "https://prox-1u.home.fluffnet.net:8006/api2/json" # change this to match your own proxmox
  pm_api_token_id = [secret]
  pm_api_token_secret = [secret]
  pm_tls_insecure = true
}

resource "proxmox_vm_qemu" "kube-server" {
  count = 1
  name = "kube-server-0${count.index + 1}"
  target_node = "prox-1u"
  # thanks to Brian on YouTube for the vmid tip
  # http://www.youtube.com/channel/UCTbqi6o_0lwdekcp-D6xmWw
  vmid = "40${count.index + 1}"

  clone = "ubuntu-2004-cloudinit-template"

  agent = 1
  os_type = "cloud-init"
  cores = 2
  sockets = 1
  cpu = "host"
  memory = 4096
  scsihw = "virtio-scsi-pci"
  bootdisk = "scsi0"

  disk {
    slot = 0
    size = "10G"
    type = "scsi"
    storage = "local-zfs"
    #storage_type = "zfspool"
    iothread = 1
  }

  network {
    model = "virtio"
    bridge = "vmbr0"
  }
  
  network {
    model = "virtio"
    bridge = "vmbr17"
  }

  lifecycle {
    ignore_changes = [
      network,
    ]
  }

  ipconfig0 = "ip=10.98.1.4${count.index + 1}/24,gw=10.98.1.1"
  ipconfig1 = "ip=10.17.0.4${count.index + 1}/24"
  sshkeys = <<EOF
  ${var.ssh_key}
  EOF
}

resource "proxmox_vm_qemu" "kube-agent" {
  count = 2
  name = "kube-agent-0${count.index + 1}"
  target_node = "prox-1u"
  vmid = "50${count.index + 1}"

  clone = "ubuntu-2004-cloudinit-template"

  agent = 1
  os_type = "cloud-init"
  cores = 2
  sockets = 1
  cpu = "host"
  memory = 4096
  scsihw = "virtio-scsi-pci"
  bootdisk = "scsi0"

  disk {
    slot = 0
    size = "10G"
    type = "scsi"
    storage = "local-zfs"
    #storage_type = "zfspool"
    iothread = 1
  }

  network {
    model = "virtio"
    bridge = "vmbr0"
  }
  
  network {
    model = "virtio"
    bridge = "vmbr17"
  }

  lifecycle {
    ignore_changes = [
      network,
    ]
  }

  ipconfig0 = "ip=10.98.1.5${count.index + 1}/24,gw=10.98.1.1"
  ipconfig1 = "ip=10.17.0.5${count.index + 1}/24"
  sshkeys = <<EOF
  ${var.ssh_key}
  EOF
}

resource "proxmox_vm_qemu" "kube-storage" {
  count = 1
  name = "kube-storage-0${count.index + 1}"
  target_node = "prox-1u"
  vmid = "60${count.index + 1}"

  clone = "ubuntu-2004-cloudinit-template"

  agent = 1
  os_type = "cloud-init"
  cores = 2
  sockets = 1
  cpu = "host"
  memory = 4096
  scsihw = "virtio-scsi-pci"
  bootdisk = "scsi0"

  disk {
    slot = 0
    size = "20G"
    type = "scsi"
    storage = "local-zfs"
    #storage_type = "zfspool"
    iothread = 1
  }

  network {
    model = "virtio"
    bridge = "vmbr0"
  }
  
  network {
    model = "virtio"
    bridge = "vmbr17"
  }

  lifecycle {
    ignore_changes = [
      network,
    ]
  }

  ipconfig0 = "ip=10.98.1.6${count.index + 1}/24,gw=10.98.1.1"
  ipconfig1 = "ip=10.17.0.6${count.index + 1}/24"
  sshkeys = <<EOF
  ${var.ssh_key}
  EOF
}
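One note: the template references a var.ssh_key variable. If you don’t already have it defined, a minimal definition (in a variables.tf, for example) looks like:

variable "ssh_key" {
  type        = string
  description = "public SSH key injected into each VM via cloud-init"
}

From there it is the standard Terraform workflow:

terraform init    # downloads the telmate/proxmox provider
terraform plan    # preview the 4 VMs to be created
terraform apply   # create them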

After running Terraform plan and apply, you should have 4 new VMs in your Proxmox cluster:

Proxmox showing 4 virtual machines ready for Kubernetes

Conclusion

You now have 4 VMs ready for Kubernetes installation. The next post shows how to deploy a Kubernetes cluster with Ansible.