Category: Linux

Millisecond accurate Chrony NTP with a USB GPS for $12 USD

Post author By Austin
Post date September 29, 2021
21 Comments on Millisecond accurate Chrony NTP with a USB GPS for $12 USD

Introduction

Building off my last NTP post (Microsecond accurate NTP with a Raspberry Pi and PPS GPS), which required a $50-60 GPS device and a Raspberry Pi (also $40+), I have successfully tested something much cheaper, that is good enough, especially for initial PPS synchronization. Good enough, in this case, is defined as +/- 10 milliseconds, which can easily be achieved using a basic USB GPS device: GT-U7. Read on for instructions on how to set up the USB GPS as a Stratum 1 NTP time server.

YouTube Video Link

https://www.youtube.com/watch?v=DVtmDFpWkEs

Microsecond PPS time vs millisecond USB time

How accurate of time do you really need? The last post showed how to get all devices on a local area network (LAN) within 0.1 milliseconds of “real” time. Do you need you equipment to be that accurate to official atomic clock time (12:03:05.0001)? Didn’t think so. Do you care if every device is on the correct second compared to official/accurate time (12:03:05)? That’s a lot more reasonable. Using a u-blox USB GPS can get you to 0.01 seconds of official. The best part about this? The required USB GPS units are almost always less than $15 and you don’t need a Raspberry Pi.

Overview

This post will show how to add a u-blox USB GPS module to NTP as a driver or chrony (timekeeping daemon) as a reference clock (using GPSd shared memory for both) and verify the accuracy is within +/- 10 milliseconds.

Materials needed

USB u-blox GPS (VK-172 or GT-U7), GT-U7 preferred because it has a micro-USB plug to connect to your computer. It is important to note that both of these are u-blox modules, which has a binary data format as well as a high default baudrate (57600). These two properties allow for quick transmission of each GPS message from GPS to computer.
15-30 minutes

Steps

1 – Update your host machine and install packages

This tutorial is for Linux. I use Ubuntu so we utilize Aptitude (apt) for package management:

sudo apt update
sudo apt upgrade
sudo rpi-update
sudo apt install gpsd gpsd-clients python-gps chrony

2 – Modify GPSd default startup settings

In /etc/default/gpsd, change the settings to the following:

# Start the gpsd daemon automatically at boot time
START_DAEMON="true"

# Use USB hotplugging to add new USB devices automatically to the daemon
USBAUTO="true"

# Devices gpsd should collect to at boot time.
# this could also be /dev/ttyUSB0, it is ACM0 on raspberry pi
DEVICES="/dev/ttyACM0"

# -n means start listening to GPS data without a specific listener
GPSD_OPTIONS="-n"

Reboot with sudo reboot.

3a – Chrony configuration (if using NTPd, go to 3b)

I took the default configuration, added my 10.98 servers, and more importantly, added a reference clock (refclock). Link to chrony documentation here. Arguments/parameters of this configuration file:

10.98.1.198 is my microsecond accurate PPS NTP server
iburst means send a bunch of synchronization packets upon service start so accurate time can be determined much faster (usually a couple seconds)
maxpoll (and minpoll, which isn’t used in this config) is how many seconds to wait between polls, defined by 2^x where x is the number in the config. maxpoll 6 means don’t wait more than 2^6=64 seconds between polls
refclock is reference clock, and is the USB GPS source we are adding
- ‘SHM 0’ means shared memory reference 0, which means it is checking with GPSd using shared memory to see what time the GPS is reporting
- ‘refid NMEA’ means name this reference ‘NMEA’
- ‘offset 0.000’ means don’t offset this clock source at all. We will change this later
- ‘precision 1e-3’ means this reference is only accurate to 1e-3 (0.001) seconds, or 1 millisecond
- ‘poll 3’ means poll this reference every 2^3 = 8 seconds
- ‘noselect’ means don’t actually use this clock as a source. We will be measuring the delta to other known times to set the offset and make the source selectable.

pi@raspberrypi:~ $ sudo cat /etc/chrony/chrony.conf
# Welcome to the chrony configuration file. See chrony.conf(5) for more
# information about usuable directives.
#pool 2.debian.pool.ntp.org iburst
server 10.98.1.1 iburst maxpoll 6
server 10.98.1.198 iburst maxpoll 6
server 10.98.1.15 iburst

refclock SHM 0 refid NMEA offset 0.000 precision 1e-3 poll 3 noselect

Restart chrony with sudo systemctl restart chrony.

3b – NTP config

Similar to the chrony config, we need to add a reference clock (called a driver in NTP). For NTP, drivers are “servers” that start with an address of 127.127. The next two octets tell what kind of driver it is. The .28 driver is the shared memory driver, same theory as for chrony. For a full list of drivers, see the official NTP docs. To break down the server:

‘server 127.127.28.0’ means use the .28 (SHM) driver
- minpoll 4 maxpoll 4 means poll every 2^4=16 seconds
- noselect means don’t use this for time. Similar to chrony, we will be measuring the offset to determine this value.
‘fudge 127.127.28.0’ means we are going to change some properties of the listed driver
- ‘time1 0.000’is the time offset calibration factor, in seconds
- ‘stratum 2’ means list this source as a stratum 2 source (has to do with how close the source is to “true” time), listing it as 2 means other, higher stratum sources will be selected before this one will (assuming equal time quality)
- ‘refid GPS’ means rename this source as ‘GPS’

server 127.127.28.0 minpoll 4 maxpoll 4 noselect
fudge 127.127.28.0 time1 0.000 stratum 2 refid GPS

Restart NTPd with sudo systemctl restart ntp.

4 – check time offset via gpsmon

Running gpsmon shows us general information about the GPS, including time offset. The output looks like the below screenshot. Of importance is the satellite count (on right, more is better, >5 is good enough for time), HDOP (horizontal dilution of precision) is a measure of how well the satellites can determine your position (lower is better, <2 works for basically all navigation purposes), and TOFF (time offset).

gpsmon showing time offset for a USB GPS

In this screenshot the TOFF is 0.081862027, which is 81.8 milliseconds off the host computer’s time. Watch this for a bit – it should hover pretty close to a certain value +/- 10ms. In my case, I’ve noticed that if there are 10 or less satellites locked on, it is around 77ms. If there are 11 or more, it is around 91ms (presumably due to more satellite information that needs to be transmitted).

5 – record statistics for a data-driven offset

If you are looking for a better offset value to put in the configuration file, we can turn on logging from either chrony or NTPd to record source information.

For chrony:

Edit /etc/chrony/chrony.conf and uncomment the line for which kinds of logging to turn on:

# Uncomment the following line to turn logging on.
log tracking measurements statistics

Then restart chrony (sudo systemctl restart chrony) and logs will start writing to /var/log/chrony (this location is defined a couple lines below the log line in chrony.conf):

pi@raspberrypi:~ $ ls /var/log/chrony
measurements.log  statistics.log  tracking.log

For NTPd (be sure to restart it after making any configuration changes):

austin@prox-3070 ~ % cat /etc/ntp.conf
# Enable this if you want statistics to be logged.
statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

Wait a few minutes for some data to record (chrony synchronizes pretty quick compared to NTPd) and check the statistics file, filtered to our NMEA refid:

cat /var/log/chrony/statistics.log | grep NMEA

This spits out the lines that have NMEA present (the ones of interest for our USB GPS). To include the headers to show what each column is we can run

# chrony
cat /var/log/chrony/statistics.log | head -2; cat /var/log/chrony/statistics.log | grep NMEA

# ntp, there is no header info so we can omit that part of the command
cat /var/log/peerstats | grep 127.127.28.0

Screenshot showing chrony statistics for our NMEA USB GPS refclock

NTP stats don’t include header information. The column of interest is the one after the 9014 column. The columns are day, seconds past midnight, source, something, estimated offset, something, something, something. We can see the offset for this VK-172 USB GPS is somewhere around 76-77 milliseconds (0.076-0.077 seconds), which we can put in place of the 0.000 for the .28 driver for NTP and remove noselect.

austin@prox-3070 ~ % cat /var/log/ntpstats/peerstats | grep 127.127.28.0
59487 49648.536 127.127.28.0 9014 -0.078425007 0.000000000 7.938064614 0.000000060
59487 49664.536 127.127.28.0 9014 -0.079488544 0.000000000 3.938033388 0.001063537
59487 49680.536 127.127.28.0 9014 -0.079514781 0.000000000 1.938035682 0.000770810
59487 49696.536 127.127.28.0 9014 -0.079772284 0.000000000 0.938092429 0.000808697
59487 49712.536 127.127.28.0 9014 -0.079711708 0.000000000 0.438080791 0.000661032
59487 49728.536 127.127.28.0 9014 -0.075098563 0.000000000 0.188028843 0.004311958

So now we have some data showing the statistics of our NMEA USB GPS NTP source. We can copy and paste this into Excel, run data to columns, and graph the result and/or get the average to set the offset.

screenshot showing chrony/NTP statistics to determine offset

This graph is certainly suspicious (sine wave pattern and such) and if I wasn’t writing this blog post, I’d let data collect overnight to determine an offset. Since time is always of the essence, I will just take the average of the ‘est offset’ column (E), which is 7.64E-2, or 0.0763 seconds. Let’s pop this into the chrony.conf file and remove noselect:

refclock SHM 0 refid NMEA offset 0.0763 precision 1e-3 poll 3

For NTP:

Restart chrony again for the config file changes to take effect – sudo systemctl restart chrony.

6 – watch ‘chrony sources’ or ‘ntpq -pn’ to see if the USB GPS gets selected as the main time source

If you aren’t aware, Ubuntu/Debian/most Linux includes a utility to rerun a command every x seconds called watch. We can use this to watch chrony to see how it is interpreting each time source every 1 second:

# for chrony
watch -n 1 chronyc sources

In the above screenshot, we can see that chrony actually has the NMEA source selected as the primary source (denoted with the *). It has the Raspberry Pi PPS NTP GPS ready to takeover as the new primary (denoted with the +). All of the sources match quite closely (from +4749us to – 505us is around 5.2 milliseconds). The source “offset” is in the square brackets ([ and ]).

# for ntp
watch -n 1 ntpq -pn

NTP showing (via ntpq -pn) that the GPS source is 0.738 milliseconds off of the host clock. The ‘-‘ in front of the remote means this will not be selected as a valid time (presumably due to the high jitter compared to the other sources, probably also due to manually setting it to stratum 2).

7- is +/- five millseconds good enough?

For 99% of use cases, yes. You can stop here and your home network will be plenty accurate. If you want additional accuracy, you are in luck. This GPS module also outputs a PPS (pulse per second) signal! We can use this to get within 0.05 millseconds (0.00005 seconds) from official/atomic clock time.

Conclusion

In this post, we got a u-blox USB GPS set up and added it as a reference clock (refclock) to chrony and demonstrated it is clearly within 10 millisecond of official GPS time.

You could write a script to do all this for you! I should probably try this myself…

In the next post, we can add PPS signals from the GPS module to increase our time accuracy by 1000x (into the microsecond range).

A note on why having faster message transmission is better for timing

My current PPS NTP server uses chrony with NMEA messages transmitted over serial and the PPS signal fed into a GPIO pin. GPSd as a rule does minimum configuration of GPS devices. It typically defaults to 9600 baud for serial devices. A typical GPS message looks like this:

$GPGGA, 161229.487, 3723.2475, N, 12158.3416, W, 1, 07, 1.0, 9.0, M, , , , 0000*18

That message is 83 bytes long. At 9600 baud (9600 bits per second), that message takes 69.1 milliseconds to transmit. Each character/byte takes 0.833 milliseconds to transmit. That means that as the message length varies, the jitter will increase. GPS messages do vary in length, sometimes significantly, depending on what is being sent (i.e. the satellite information, $GPGSV sentences, is only transmitted every 5-10 seconds).

I opened gpsmon to get a sample of sentences – I did not notice this until now but it shows how many bytes each sentence is at the front of the sentence:

(35) $GPZDA,144410.000,30,09,2021,,*59
------------------- PPS offset: -0.000001297 ------
(83) $GPGGA,144411.000,3953.xxxx,N,10504.xxxx,W,2,6,1.19,1637.8,M,-20.9,M,0000,0000*5A
(54) $GPGSA,A,3,26,25,29,18,05,02,,,,,,,1.46,1.19,0.84*02
(71) $GPRMC,144411.000,A,3953.xxxx,N,10504.xxxx,W,2.80,39.98,300921,,,D*44
(35) $GPZDA,144411.000,30,09,2021,,*58
------------------- PPS offset: -0.000000883 ------
(83) $GPGGA,144412.000,3953.xxxx,N,10504.xxxx,W,2,7,1.11,1637.7,M,-20.9,M,0000,0000*52
(56) $GPGSA,A,3,20,26,25,29,18,05,02,,,,,,1.39,1.11,0.84*00
(70) $GPGSV,3,1,12,29,81,325,27,05,68,056,21,20,35,050,17,18,34,283,24*76
(66) $GPGSV,3,2,12,25,27,210,14,15,27,153,,13,25,117,,02,23,080,19*78
(59) $GPGSV,3,3,12,26,17,311,22,23,16,222,,12,11,184,,47,,,*42
------------------- PPS offset: -0.000000833 ------
(71) $GPRMC,144412.000,A,3953.xxxx,N,10504.xxxx,W,2.57,38.19,300921,,,D*48
(35) $GPZDA,144412.000,30,09,2021,,*5B
(83) $GPGGA,144413.000,3953.xxxx,N,10504.xxxx,W,2,7,1.11,1637.6,M,-20.9,M,0000,0000*52
(56) $GPGSA,A,3,20,26,25,29,18,05,02,,,,,,1.39,1.11,0.84*00
(71) $GPRMC,144413.000,A,3953.xxxx,N,10504.xxxx,W,2.60,36.39,300921,,,D*41
(35) $GPZDA,144413.000,30,09,2021,,*5A

These sentences range from 83 bytes to 35 bytes, a variation of (83 bytes -35 bytes)*0.833 milliseconds per byte = 39.984 milliseconds.

Compare to the u-blox binary UBX messages which seem to always be 60 bytes and transmitted at 57600 baud, which is 8.33 milliseconds to transmit the entire message.

UBX protocol messages (blanked out lines). I have no idea what part of the message is location, hopefully I got the right part blanked out.

The variance (jitter) is thus much lower and can be much more accurate as a NTP source. GPSd has no problem leaving u-blox modules at 57600 baud. This is why the USB GPS modules perform much more accurate for timekeeping than NMEA-based devices when using GPSd.

For basically every GPS module/chipset, it is possible to send it commands to enable/disable sentences (as well as increase the serial baud rate). In an ideal world for timekeeping, GPSd would disable every sentence except for time ($GPZDA), and bump up the baud rate to the highest supported level (115200, 230400, etc.). Unfortunately for us, GPSd’s default behavior is to just work with every GPS, which essentially means no configuring the GPS device.

Update 2024-01-19: RIP Dave Mills, inventor/creator of NTP – https://arstechnica.com/gadgets/2024/01/inventor-of-ntp-protocol-that-keeps-time-on-billions-of-devices-dies-at-age-8

Tags chrony, gps, gpsmon, linux, ntp, ntpd, Raspberry Pi

homelab Kubernetes Linux proxmox Terraform

Deploying Kubernetes VMs in Proxmox with Terraform

Post author By Austin
Post date September 23, 2021
9 Comments on Deploying Kubernetes VMs in Proxmox with Terraform

Background

The last post covered how to deploy virtual machines in Proxmox with Terraform. This post shows the template for deploying 4 Kubernetes virtual machines in Proxmox using Terraform.

Youtube Video Link

https://youtu.be/UXXIl421W8g

Kubernetes Proxmox Terraform Template

Without further ado, below is the template I used to create my virtual machines. The main LAN network is 10.98.1.0/24, and the Kube internal network (on its own bridge) is 10.17.0.0/24.

This template creates a Kube server, two agents, and a storage server.

Update 2022-04-26: bumped Telmate provider version to 2.9.8 from 2.7.4

terraform {
  required_providers {
    proxmox = {
      source = "telmate/proxmox"
      version = "2.9.8"
    }
  }
}

provider "proxmox" {
  pm_api_url = "https://prox-1u.home.fluffnet.net:8006/api2/json" # change this to match your own proxmox
  pm_api_token_id = [secret]
  pm_api_token_secret = [secret]
  pm_tls_insecure = true
}

resource "proxmox_vm_qemu" "kube-server" {
  count = 1
  name = "kube-server-0${count.index + 1}"
  target_node = "prox-1u"
  # thanks to Brian on YouTube for the vmid tip
  # http://www.youtube.com/channel/UCTbqi6o_0lwdekcp-D6xmWw
  vmid = "40${count.index + 1}"

  clone = "ubuntu-2004-cloudinit-template"

  agent = 1
  os_type = "cloud-init"
  cores = 2
  sockets = 1
  cpu = "host"
  memory = 4096
  scsihw = "virtio-scsi-pci"
  bootdisk = "scsi0"

  disk {
    slot = 0
    size = "10G"
    type = "scsi"
    storage = "local-zfs"
    #storage_type = "zfspool"
    iothread = 1
  }

  network {
    model = "virtio"
    bridge = "vmbr0"
  }
  
  network {
    model = "virtio"
    bridge = "vmbr17"
  }

  lifecycle {
    ignore_changes = [
      network,
    ]
  }

  ipconfig0 = "ip=10.98.1.4${count.index + 1}/24,gw=10.98.1.1"
  ipconfig1 = "ip=10.17.0.4${count.index + 1}/24"
  sshkeys = <<EOF
  ${var.ssh_key}
  EOF
}

resource "proxmox_vm_qemu" "kube-agent" {
  count = 2
  name = "kube-agent-0${count.index + 1}"
  target_node = "prox-1u"
  vmid = "50${count.index + 1}"

  clone = "ubuntu-2004-cloudinit-template"

  agent = 1
  os_type = "cloud-init"
  cores = 2
  sockets = 1
  cpu = "host"
  memory = 4096
  scsihw = "virtio-scsi-pci"
  bootdisk = "scsi0"

  disk {
    slot = 0
    size = "10G"
    type = "scsi"
    storage = "local-zfs"
    #storage_type = "zfspool"
    iothread = 1
  }

  network {
    model = "virtio"
    bridge = "vmbr0"
  }
  
  network {
    model = "virtio"
    bridge = "vmbr17"
  }

  lifecycle {
    ignore_changes = [
      network,
    ]
  }

  ipconfig0 = "ip=10.98.1.5${count.index + 1}/24,gw=10.98.1.1"
  ipconfig1 = "ip=10.17.0.5${count.index + 1}/24"
  sshkeys = <<EOF
  ${var.ssh_key}
  EOF
}

resource "proxmox_vm_qemu" "kube-storage" {
  count = 1
  name = "kube-storage-0${count.index + 1}"
  target_node = "prox-1u"
  vmid = "60${count.index + 1}"

  clone = "ubuntu-2004-cloudinit-template"

  agent = 1
  os_type = "cloud-init"
  cores = 2
  sockets = 1
  cpu = "host"
  memory = 4096
  scsihw = "virtio-scsi-pci"
  bootdisk = "scsi0"

  disk {
    slot = 0
    size = "20G"
    type = "scsi"
    storage = "local-zfs"
    #storage_type = "zfspool"
    iothread = 1
  }

  network {
    model = "virtio"
    bridge = "vmbr0"
  }
  
  network {
    model = "virtio"
    bridge = "vmbr17"
  }

  lifecycle {
    ignore_changes = [
      network,
    ]
  }

  ipconfig0 = "ip=10.98.1.6${count.index + 1}/24,gw=10.98.1.1"
  ipconfig1 = "ip=10.17.0.6${count.index + 1}/24"
  sshkeys = <<EOF
  ${var.ssh_key}
  EOF
}

After running Terraform plan and apply, you should have 4 new VMs in your Proxmox cluster:

Proxmox showing 4 virtual machines ready for Kubernetes

Conclusion

You now have 4 VMs ready for Kubernetes installation. The next post shows how to deploy a Kubernetes cluster with Ansible.

Tags cloud-init, kubernetes, linux, proxmox, terraform, ubuntu

Blog Admin Linux

UFW – add IPv6 rule to top of chain

Post author By Austin
Post date September 13, 2021
No Comments on UFW – add IPv6 rule to top of chain

Brief Introduction

This WordPress blog is decently protected from bots/hackers (read more at Securing this WordPress blog from evil hackers!) but I still get a ton of attempts on the site. Wordfence can block requests at the application layer but as I grow in traffic, I want to make sure CPU cycles aren’t wasted. Thus, I want to block some of these bots/hackers from even connecting to the server. I use Ubuntu, and UFW (Uncomplicated FireWall) is included. It’s pretty simple so I’ve stuck with it.

Blocking IPv4 is easy:

sudo ufw insert 1 deny from 1.2.3.4 comment "repeated unwanted hits on sdrforums.com"

The command broken down:

sudo – run as root since firewall modification requires root access
ufw – run the uncomplicated firewall program
insert – add a rule
1 – insert at the top of the rule list (firewalls evaluate rules from the top down – putting a deny after an allow would mean the traffic wouldn’t be blocked)
deny – deny the request
from – from the following IP
1.2.3.4 – IP address
comment – so you can leave a comment to remind yourself why the rule is in place (“unwanted hits on site.com”, “change request #123456”, “incident remedy #44444”, etc.)

Blocking IPv6 with UFW

So I tried the same command format with an IPv6 address and got an error message – “ERROR: Invalid position ‘1’”. I’ve never got that message before. Also, I do realize I need to widen this IPv6 subnet and block a much larger range of IPs but that’s a topic for a different day.

sudo ufw insert 1 deny from 2400:adc5:11f:5600:e468:9464:e881:b1c0 comment "repeated unwanted hits on sdrforums.com"
ERROR: Invalid position '1'

ufw ERROR: Invalid position '1' screenshot — ufw ERROR: Invalid position ‘1’ screenshot

My rule list at the time looked like this:

austin@rn-nyc-01:~$ sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   DENY IN     51.195.216.255             # repeated unwanted hits on sdrforums.com
[ 2] Anywhere                   DENY IN     51.195.90.229              # repeated unwanted hits on sdrforums.com
[ 3] Anywhere                   DENY IN     206.217.139.28             # excessive hits to wp-login and xmlrpc
[ 4] 22/tcp                     ALLOW IN    Anywhere
[ 5] 80/tcp                     ALLOW IN    Anywhere
[ 6] 443/tcp                    ALLOW IN    Anywhere
[ 7] 22/tcp (v6)                ALLOW IN    Anywhere (v6)
[ 8] 80/tcp (v6)                ALLOW IN    Anywhere (v6)
[ 9] 443/tcp (v6)               ALLOW IN    Anywhere (v6)

Pretty easy to understand what’s going on here. I have a few IPv4 addresses blocked, but nothing specific to IPv6. A bit of searching later and I learned that the first IPv6 rule needs to come after the last IPv4 rule. So in this case I needed to add the rule to position #7, since that is where the first IPv6 rule current is located:

austin@rn-nyc-dotnet-01:~$ sudo ufw insert 7 deny from 2400:adc5:11f:5600:e468:9464:e881:b1c0 comment "repeated unwanted hits on
 sdrforums.com"
Rule inserted (v6)

And it worked!

My UFW status is now this:

References

The post that started me in the right direction is here – https://joshtronic.com/2015/09/06/error-invalid-position-1/. Thank you Josh for posting about this! Stackoverflow wasn’t actually helpful for once.

Tags iptables, ipv6, linux, ufw, WordPress

homelab Kubernetes Linux proxmox Terraform

How to deploy VMs in Proxmox with Terraform

Post author By Austin
Post date September 1, 2021
50 Comments on How to deploy VMs in Proxmox with Terraform

Background

I’d like to learn Kubernetes and DevOps. A Kubernetes cluster requires at least 3 VMs/bare metal machines. In my last post, I wrote about how to create a Ubuntu cloud-init template for Proxmox. In this post, we’ll take that template and use it to deploy a couple VMs via automation using Terraform. If you don’t have a template, you need one before proceeding.

Overview

Install Terraform
Determine authentication method for Terraform to interact with Proxmox (user/pass vs API keys)
Terraform basic initialization and provider installation
Develop Terraform plan
Terraform plan
Run Terraform plan and watch the VMs appear!

Youtube Video Link

If you prefer video versions to follow along, please head on over to https://youtu.be/UXXIl421W8g for a live-action video of me deploying virtual machines in Proxmox using Terraform and why we’re running each command.

#1 – Install Terraform

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=$(dpkg --print-architecture)] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt update
sudo apt install terraform

#2 – Determine Authentication Method (use API keys)

You have two options here:

Username/password – you can use the existing default root user and root password here to make things easy… or
API keys – this involves setting up a new user, giving that new user the required permissions, and then setting up API keys so that user doesn’t have to type in a password to perform actions

I went with the API key method since it is not desirable to have your root password sitting in Terraform files (even as an environment variable isn’t a great idea). I didn’t really know what I was doing and I basically gave the new user full admin permissions anyways. Should I lock it down? Surely. Do I know what the minimum required permissions are to do so? Nope. If someone in the comments or on Reddit could enlighten me, I’d really appreciate it!

So we need to create a new user. We’ll name it ‘blog_example’. To add a new user go to Datacenter in the left tab, then Permissions -> Users -> Click add, name the user and click add.

screenshot showing how to add a user in proxmox — Adding ‘blog_example’ user to my proxmox datacenter (cluster)

Next, we need to add API tokens. Click API tokens below users in the permissions category and click add. Select the user you just created and give the token an ID, and uncheck privilege separation (which means we want the token to have the same permissions as the user):

Adding a new API token for user ‘blog_example’

When you click Add it will show you the key. Save this key. It will never be displayed again!

Next we need to add a role to the new user. Permissions -> Add -> Path = ‘/’, User is the one you just made, role = ‘PVEVMAdmin’. This gives the user (and associated API token!) rights to all nodes (the / for path) to do VMAdmin activities:

You also need to add permissions to the storage used by the VMs you want to deploy (both from and to), for me this is /storage/local-zfs (might be /storage/local-lvm for you). Add that too in the path section. Use Admin for the role here because the user also needs the ability to allocate space in the datastore (you could use PVEVMAdmin + a datastore role but I haven’t dove into which one yet):

At this point we are done with the permissions:

It is time to turn to Terraform.

3 – Terraform basic information and provider installation

Terraform has three main stages: init, plan, and apply. We will start with describing the plans, which can be thought of a a type of configuration file for what you want to do. Plans are files stored in directories. Make a new directory (terraform-blog), and create two files: main.tf and vars.tf:

cd ~
mkdir terraform-blog && cd terraform-blog
touch main.tf vars.tf

The two files are hopefully reasonably named. The main content will be in main.tf and we will put a few variables in vars.tf. Everything could go in main.tf but it is a good practice to start splitting things out early. I actually don’t have as much in vars.tf as I should but we all gotta start somewhere

Ok so in main.tf let’s add the bare minimum. We need to tell Terraform to use a provider, which is the term they use for the connector to the entity Terraform will be interacting with. Since we are using Proxmox, we need to use a Proxmox provider. This is actually super easy – we just need to specify the name and version and Terraform goes out and grabs it from github and installs it. I used the Telmate Proxmox provider.

main.tf:

terraform {
  required_providers {
    proxmox = {
      source = "telmate/proxmox"
      version = "2.7.4"
    }
  }
}

Save the file. Now we’ll initialize Terraform with our barebones plan (terraform init), which will force it to go out and grab the provider. If all goes well, we will be informed that the provider was installed and that Terraform has been initialized. Terraform is also really nice in that it tells you the next step towards the bottom of the output (“try running ‘terraform plan’ next”).

austin@EARTH:/mnt/c/Users/Austin/terraform-blog$ terraform init

Initializing the backend...

Initializing provider plugins...
- Finding telmate/proxmox versions matching "2.7.4"...
- Installing telmate/proxmox v2.7.4...
- Installed telmate/proxmox v2.7.4 (self-signed, key ID A9EBBE091B35AFCE)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

4 – Develop Terraform plan

Alright with the provider installed, it is time to use it to deploy a VM. We will use the template we created in the last post (How to create a Proxmox Ubuntu cloud-init image). Alter your main.tf file to be the following. I break it down inside the file with comments

terraform {
  required_providers {
    proxmox = {
      source = "telmate/proxmox"
      version = "2.7.4"
    }
  }
}

provider "proxmox" {
  # url is the hostname (FQDN if you have one) for the proxmox host you'd like to connect to to issue the commands. my proxmox host is 'prox-1u'. Add /api2/json at the end for the API
  pm_api_url = "https://prox-1u:8006/api2/json"

  # api token id is in the form of: <username>@pam!<tokenId>
  pm_api_token_id = "blog_example@pam!new_token_id"

  # this is the full secret wrapped in quotes. don't worry, I've already deleted this from my proxmox cluster by the time you read this post
  pm_api_token_secret = "9ec8e608-d834-4ce5-91d2-15dd59f9a8c1"

  # leave tls_insecure set to true unless you have your proxmox SSL certificate situation fully sorted out (if you do, you will know)
  pm_tls_insecure = true
}

# resource is formatted to be "[type]" "[entity_name]" so in this case
# we are looking to create a proxmox_vm_qemu entity named test_server
resource "proxmox_vm_qemu" "test_server" {
  count = 1 # just want 1 for now, set to 0 and apply to destroy VM
  name = "test-vm-${count.index + 1}" #count.index starts at 0, so + 1 means this VM will be named test-vm-1 in proxmox

  # this now reaches out to the vars file. I could've also used this var above in the pm_api_url setting but wanted to spell it out up there. target_node is different than api_url. target_node is which node hosts the template and thus also which node will host the new VM. it can be different than the host you use to communicate with the API. the variable contains the contents "prox-1u"
  target_node = var.proxmox_host

  # another variable with contents "ubuntu-2004-cloudinit-template"
  clone = var.template_name

  # basic VM settings here. agent refers to guest agent
  agent = 1
  os_type = "cloud-init"
  cores = 2
  sockets = 1
  cpu = "host"
  memory = 2048
  scsihw = "virtio-scsi-pci"
  bootdisk = "scsi0"

  disk {
    slot = 0
    # set disk size here. leave it small for testing because expanding the disk takes time.
    size = "10G"
    type = "scsi"
    storage = "local-zfs"
    iothread = 1
  }
  
  # if you want two NICs, just copy this whole network section and duplicate it
  network {
    model = "virtio"
    bridge = "vmbr0"
  }

  # not sure exactly what this is for. presumably something about MAC addresses and ignore network changes during the life of the VM
  lifecycle {
    ignore_changes = [
      network,
    ]
  }
  
  # the ${count.index + 1} thing appends text to the end of the ip address
  # in this case, since we are only adding a single VM, the IP will
  # be 10.98.1.91 since count.index starts at 0. this is how you can create
  # multiple VMs and have an IP assigned to each (.91, .92, .93, etc.)

  ipconfig0 = "ip=10.98.1.9${count.index + 1}/24,gw=10.98.1.1"
  
  # sshkeys set using variables. the variable contains the text of the key.
  sshkeys = <<EOF
  ${var.ssh_key}
  EOF
}

There is a good amount going on in here. Hopefully the embedded comments explain everything. If not, let me know in the comments or on Reddit (u/Nerdy-Austin).

Now for the vars.tf file. This is a bit easier to understand. Just declare a variable, give it a name, and a default value. That’s all I know at this point and it works.

variable "ssh_key" {
  default = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcwZAOfqf6E6p8IkrurF2vR3NccPbMlXFPaFe2+Eh/8QnQCJVTL6PKduXjXynuLziC9cubXIDzQA+4OpFYUV2u0fAkXLOXRIwgEmOrnsGAqJTqIsMC3XwGRhR9M84c4XPAX5sYpOsvZX/qwFE95GAdExCUkS3H39rpmSCnZG9AY4nPsVRlIIDP+/6YSy9KWp2YVYe5bDaMKRtwKSq3EOUhl3Mm8Ykzd35Z0Cysgm2hR2poN+EB7GD67fyi+6ohpdJHVhinHi7cQI4DUp+37nVZG4ofYFL9yRdULlHcFa9MocESvFVlVW0FCvwFKXDty6askpg9yf4FnM0OSbhgqXzD austin@EARTH"
}

variable "proxmox_host" {
	default = "prox-1u"
}

variable "template_name" {
	default = "ubuntu-2004-cloudinit-template"
}

5 – Terraform plan (official term for “what will Terraform do next”)

Now with the .tf files completed, we can run the plan (terraform plan). We defined a count=1 resource, so we would expect Terraform to create a single VM. Let’s have Terraform run through the plan and tell us what it intends to do. It tells us a lot.

austin@EARTH:/mnt/c/Users/Austin/terraform-blog$ terraform plan

Terraform used the selected providers to generate the following execution plan. Resource actions
are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # proxmox_vm_qemu.test_server[0] will be created
  + resource "proxmox_vm_qemu" "test_server" {
      + additional_wait           = 15
      + agent                     = 1
      + balloon                   = 0
      + bios                      = "seabios"
      + boot                      = "cdn"
      + bootdisk                  = "scsi0"
      + clone                     = "ubuntu-2004-cloudinit-template"
      + clone_wait                = 15
      + cores                     = 2
      + cpu                       = "host"
      + default_ipv4_address      = (known after apply)
      + define_connection_info    = true
      + force_create              = false
      + full_clone                = true
      + guest_agent_ready_timeout = 600
      + hotplug                   = "network,disk,usb"
      + id                        = (known after apply)
      + ipconfig0                 = "ip=10.98.1.91/24,gw=10.98.1.1"
      + kvm                       = true
      + memory                    = 2048
      + name                      = "test-vm-1"
      + nameserver                = (known after apply)
      + numa                      = false
      + onboot                    = true
      + os_type                   = "cloud-init"
      + preprovision              = true
      + reboot_required           = (known after apply)
      + scsihw                    = "virtio-scsi-pci"
      + searchdomain              = (known after apply)
      + sockets                   = 1
      + ssh_host                  = (known after apply)
      + ssh_port                  = (known after apply)
      + sshkeys                   = <<-EOT
              ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcwZAOfqf6E6p8IkrurF2vR3NccPbMlXFPaFe2+Eh/8QnQCJVTL6PKduXjXynuLziC9cubXIDzQA+4OpFYUV2u0fAkXLOXRIwgEmOrnsGAqJTqIsMC3XwGRhR9M84c4XPAX5sYpOsvZX/qwFE95GAdExCUkS3H39rpmSCnZG9AY4nPsVRlIIDP+/6YSy9KWp2YVYe5bDaMKRtwKSq3EOUhl3Mm8Ykzd35Z0Cysgm2hR2poN+EB7GD67fyi+6ohpdJHVhinHi7cQI4DUp+37nVZG4ofYFL9yRdULlHcFa9MocESvFVlVW0FCvwFKXDty6askpg9yf4FnM0OSbhgqXzD austin@EARTH
        EOT
      + target_node               = "prox-1u"
      + unused_disk               = (known after apply)
      + vcpus                     = 0
      + vlan                      = -1
      + vmid                      = (known after apply)

      + disk {
          + backup       = 0
          + cache        = "none"
          + file         = (known after apply)
          + format       = (known after apply)
          + iothread     = 1
          + mbps         = 0
          + mbps_rd      = 0
          + mbps_rd_max  = 0
          + mbps_wr      = 0
          + mbps_wr_max  = 0
          + media        = (known after apply)
          + replicate    = 0
          + size         = "10G"
          + slot         = 0
          + ssd          = 0
          + storage      = "local-zfs"
          + storage_type = (known after apply)
          + type         = "scsi"
          + volume       = (known after apply)
        }

      + network {
          + bridge    = "vmbr0"
          + firewall  = false
          + link_down = false
          + macaddr   = (known after apply)
          + model     = "virtio"
          + queues    = (known after apply)
          + rate      = (known after apply)
          + tag       = -1
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take
exactly these actions if you run "terraform apply" now.

You can see the output of the planning phase of Terraform. It is telling us it will create proxmox_vm_qemu.test_server[0] with a list of parameters. You can double-check the IP address here, as well as the rest of the basic settings. At the bottom is the summary – “Plan: 1 to add, 0 to change, 0 to destroy.” Also note that it tells us again what step to run next – “terraform apply”.

6 – Execute the Terraform plan and watch the VMs appear!

With the summary stating what we want, we can now apply the plan (terraform apply). Note that it prompts you to type in ‘yes’ to apply the changes after it determines what the changes are. It typically takes 1m15s +/- 15s for my VMs to get created.

If all goes well, you will be informed that 1 resource was added!

Command and full output:

austin@EARTH:/mnt/c/Users/Austin/terraform-blog$ terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # proxmox_vm_qemu.test_server[0] will be created
  + resource "proxmox_vm_qemu" "test_server" {
      + additional_wait           = 15
      + agent                     = 1
      + balloon                   = 0
      + bios                      = "seabios"
      + boot                      = "cdn"
      + bootdisk                  = "scsi0"
      + clone                     = "ubuntu-2004-cloudinit-template"
      + clone_wait                = 15
      + cores                     = 2
      + cpu                       = "host"
      + default_ipv4_address      = (known after apply)
      + define_connection_info    = true
      + force_create              = false
      + full_clone                = true
      + guest_agent_ready_timeout = 600
      + hotplug                   = "network,disk,usb"
      + id                        = (known after apply)
      + ipconfig0                 = "ip=10.98.1.91/24,gw=10.98.1.1"
      + kvm                       = true
      + memory                    = 2048
      + name                      = "test-vm-1"
      + nameserver                = (known after apply)
      + numa                      = false
      + onboot                    = true
      + os_type                   = "cloud-init"
      + preprovision              = true
      + reboot_required           = (known after apply)
      + scsihw                    = "virtio-scsi-pci"
      + searchdomain              = (known after apply)
      + sockets                   = 1
      + ssh_host                  = (known after apply)
      + ssh_port                  = (known after apply)
      + sshkeys                   = <<-EOT
              ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcwZAOfqf6E6p8IkrurF2vR3NccPbMlXFPaFe2+Eh/8QnQCJVTL6PKduXjXynuLziC9cubXIDzQA+4OpFYUV2u0fAkXLOXRIwgEmOrnsGAqJTqIsMC3XwGRhR9M84c4XPAX5sYpOsvZX/qwFE95GAdExCUkS3H39rpmSCnZG9AY4nPsVRlIIDP+/6YSy9KWp2YVYe5bDaMKRtwKSq3EOUhl3Mm8Ykzd35Z0Cysgm2hR2poN+EB7GD67fyi+6ohpdJHVhinHi7cQI4DUp+37nVZG4ofYFL9yRdULlHcFa9MocESvFVlVW0FCvwFKXDty6askpg9yf4FnM0OSbhgqXzD austin@EARTH
        EOT
      + target_node               = "prox-1u"
      + unused_disk               = (known after apply)
      + vcpus                     = 0
      + vlan                      = -1
      + vmid                      = (known after apply)

      + disk {
          + backup       = 0
          + cache        = "none"
          + file         = (known after apply)
          + format       = (known after apply)
          + iothread     = 1
          + mbps         = 0
          + mbps_rd      = 0
          + mbps_rd_max  = 0
          + mbps_wr      = 0
          + mbps_wr_max  = 0
          + media        = (known after apply)
          + replicate    = 0
          + size         = "10G"
          + slot         = 0
          + ssd          = 0
          + storage      = "local-zfs"
          + storage_type = (known after apply)
          + type         = "scsi"
          + volume       = (known after apply)
        }

      + network {
          + bridge    = "vmbr0"
          + firewall  = false
          + link_down = false
          + macaddr   = (known after apply)
          + model     = "virtio"
          + queues    = (known after apply)
          + rate      = (known after apply)
          + tag       = -1
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

proxmox_vm_qemu.test_server[0]: Creating...
proxmox_vm_qemu.test_server[0]: Still creating... [10s elapsed]
proxmox_vm_qemu.test_server[0]: Still creating... [20s elapsed]
proxmox_vm_qemu.test_server[0]: Still creating... [30s elapsed]
proxmox_vm_qemu.test_server[0]: Still creating... [40s elapsed]
proxmox_vm_qemu.test_server[0]: Still creating... [50s elapsed]
proxmox_vm_qemu.test_server[0]: Still creating... [1m0s elapsed]
proxmox_vm_qemu.test_server[0]: Creation complete after 1m9s [id=prox-1u/qemu/142]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Now go check Proxmox and see if your VM was created:

Successfully added a virtual machine (VM) to Proxmox with Terraform

Success! You should now be able to SSH into the new VM with the key you already provided (note: the username will be ‘ubuntu’, not whatever you had set in your key).

Last – Removing the test VM

I just set the count to 0 for the resource in the main.tf file and apply and the VM is stopped and destroyed.

Conclusion

This felt like a quick-n-dirty tutorial for how to use Terraform to deploy virtual machines in Proxmox but looking back, there is a decent amount of detail. It took me quite a while to work through permission issues, hostnames being invalid (turns out you can’t have underscores (_) in hostnames, duh, that took an hour to find), assigning roles to users vs the associated API keys, etc. but I’m glad I worked through everything and can pass it along. Check back soon for my next post on using Terraform to deploy a full set of Kubernetes machines to a Proxmox cluster (and thrilling sequel to that post, Using Ansible to bootstrap a Kubernetes Cluster)!

References

Tags cloud-init, kubernetes, linux, proxmox, terraform, ubuntu

homelab Kubernetes Linux proxmox

How to create a Proxmox Ubuntu cloud-init image

Post author By Austin
Post date August 30, 2021
35 Comments on How to create a Proxmox Ubuntu cloud-init image

Background for why I wanted to make a Proxmox Ubuntu cloud-init image

I have recently ventured down the path of attempting to learn CI/CD concepts. I have tried docker multiple times and haven’t really enjoyed the nuances any of the times. To me, LXC/LXD containers are far easier to understand than Docker when coming from a ‘one VM per service’ background. LXC/LXD containers can be assigned IP addresses (or get them from DHCP) and otherwise behave basically exactly like a VM from a networking perspective. Docker’s networking model is quite a bit more nuanced. Lots of people say it’s easier, but having everything run on ‘localhost:[high number port]’ doesn’t work well when you’ve got lots of services, unless you do some reverse proxying, like with Traefik or similar. Which is another configuration step.

It is so much easier to just have a LXC get an IP via DHCP and then it’s accessible from hostname right off the bat (I use pfSense for DHCP/DNS – all DHCP leases are entered right into DNS). Regardless, I know Kubernetes is the new hotness so I figured I need to learn it. Every tutorial says you need a master and at least two worker nodes. No sense making three separate virtual machines – let’s use the magic of virtualization and clone some images! I plan on using Terraform to deploy the virtual machines for my Kubernetes cluster (as in, I’ve already used this Proxmox Ubuntu cloud-init image to make my own Kubernetes nodes but haven’t documented it yet).

Overview

The quick summary for this tutorial is:

Download a base Ubuntu cloud image
Install some packages into the image
Create a Proxmox VM using the image
Convert it to a template
Clone the template into a full VM and set some parameters
Automate it so it runs on a regular basis (extra credit)?
???
Profit!

Youtube Video Link

If you prefer video versions to follow along, please head on over to https://youtu.be/1sPG3mFVafE for a live action video of me creating the Proxmox Ubuntu cloud-init image and why we’re running each command.

#1 – Downloading the base Ubuntu image

Luckily, Ubuntu (my preferred distro, guessing others do the same) provides base images that are updated on a regular basis – https://cloud-images.ubuntu.com/. We are interested in the “current” release of Ubuntu 20.04 Focal, which is the current Long Term Support version. Further, since Proxmox uses KVM, we will be pulling that image:

wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img

#2 – Install packages

It took me quite a while into my Terraform debugging process to determine that qemu-guest-agent wasn’t included in the cloud-init image. Why it isn’t, I have no idea. Luckily there is a very cool tool that I just learned about that enables installing packages directly into a image. The tool is called virt-customize and it comes in the libguestfs-tools package (“libguestfs is a set of tools for accessing and modifying virtual machine (VM) disk images” – https://www.libguestfs.org/).

Install the tools:

sudo apt update -y && sudo apt install libguestfs-tools -y

Then install qemu-guest-agent into the newly downloaded image:

sudo virt-customize -a focal-server-cloudimg-amd64.img --install qemu-guest-agent

At this point you can presumably install whatever else you want into the image but I haven’t tested installing other packages. qemu-guest-agent was what I needed to get the VM recognized by Terraform and accessible.

Update 2021-12-30 – it is possible to inject the SSH keys into the cloud image itself before turning it into a template and VM. You need to create a user first and the necessary folders:

# not quite working yet. skip this and continue
#sudo virt-customize -a focal-server-cloudimg-amd64.img --run-command 'useradd austin'
#sudo virt-customize -a focal-server-cloudimg-amd64.img --run-command 'mkdir -p /home/austin/.ssh'
#sudo virt-customize -a focal-server-cloudimg-amd64.img --ssh-inject austin:file:/home/austin/.ssh/id_rsa.pub
#sudo virt-customize -a focal-server-cloudimg-amd64.img --run-command 'chown -R austin:austin /home/austin'

#3 – Create a Proxmox virtual machine using the newly modified image

The commands here should be relatively self explanatory but in general we are creating a VM (VMID=9000, basically every other resource I saw used this ID so we will too) with basic resources (2 cores, 2048MB), assigning networking to a virtio adapter on vmbr0, importing the image to storage (your storage here will be different if you’re not using ZFS, probably either ‘local’ or ‘local-lvm’), setting disk 0 to use the image, setting boot drive to disk, setting the cloud init stuff to ide2 (which is apparently appears as a CD-ROM to the VM, at least upon inital boot), and adding a virtual serial port. I had only used qm to force stop VMs before this but it’s pretty useful.

sudo qm create 9000 --name "ubuntu-2004-cloudinit-template" --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
sudo qm importdisk 9000 focal-server-cloudimg-amd64.img local-zfs
sudo qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-zfs:vm-9000-disk-0
sudo qm set 9000 --boot c --bootdisk scsi0
sudo qm set 9000 --ide2 local-zfs:cloudinit
sudo qm set 9000 --serial0 socket --vga serial0
sudo qm set 9000 --agent enabled=1

You can start the VM up at this point if you’d like and make any other changes you want because the next step is converting it to a template. If you do boot it, I will be completely honest I have no idea how to log into it. I actually just googled this because I don’t want to leave you without an answer – looks like you can use the same virt-customize we used before to set a root password according to stackoverflow (https://stackoverflow.com/questions/29137679/login-credentials-of-ubuntu-cloud-server-image). Not going to put that into a command window here because cloud-init is really meant for public/private key authentication (see post here for a quick SSH tutorial).

#4 – Convert VM to a template

Ok if you made any changes, shut down the VM. If you didn’t boot the VM, that’s perfectly fine also. We need to convert it to a template:

sudo qm template 9000

And now we have a functioning template!

screenshot of proxmox ui showing ubuntu 20.04 cloud-init template

#5 – Clone the template into a full VM and set some parameters

From this point you can clone the template as much as you want. But, each time you do so it makes sense to set some parameters, namely the SSH keys present in the VM as well as the IP address for the main interface. You could also add the SSH keys with virt-customize but I like doing it here.

First, clone the VM (here we are cloning the template with ID 9000 to a new VM with ID 999):

sudo qm clone 9000 999 --name test-clone-cloud-init

Next, set the SSH keys and IP address:

sudo qm set 999 --sshkey ~/.ssh/id_rsa.pub
sudo qm set 999 --ipconfig0 ip=10.98.1.96/24,gw=10.98.1.1

It’s now ready to start up!

sudo qm start 999

You should be able to log in without any problems (after trusting the SSH fingerprint). Note that the username is ‘ubuntu’, not whatever the username is for the key you provided.

ssh [email protected]

Once you’re happy with how things worked, you can stop the VM and clean up the resources:

sudo qm stop 999 && sudo qm destroy 999
rm focal-server-cloudimg-amd64.img

#6 – automating the process

I have not done so yet, but if you create VMs on a somewhat regular basis, it wouldn’t be hard to stick all of the above into a simple shell script (update 2022-04-19: simple shell script below) and run it via cron on a weekly basis or whatever frequency you prefer. I can’t tell you how many times I make a new VM from whatever .iso I downloaded and the first task is apt upgrade taking forever to run (‘sudo apt update’ –> “176 packages can be upgraded”). Having a nice template always ready to go would solve that issue and would frankly save me a ton of time.

#6.5 – Shell script to create template

# installing libguestfs-tools only required once, prior to first run
sudo apt update -y
sudo apt install libguestfs-tools -y

# remove existing image in case last execution did not complete successfully
rm focal-server-cloudimg-amd64.img
wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img
sudo virt-customize -a focal-server-cloudimg-amd64.img --install qemu-guest-agent
sudo qm create 9000 --name "ubuntu-2004-cloudinit-template" --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
sudo qm importdisk 9000 focal-server-cloudimg-amd64.img local-zfs
sudo qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-zfs:vm-9000-disk-0
sudo qm set 9000 --boot c --bootdisk scsi0
sudo qm set 9000 --ide2 local-zfs:cloudinit
sudo qm set 9000 --serial0 socket --vga serial0
sudo qm set 9000 --agent enabled=1
sudo qm template 9000
rm focal-server-cloudimg-amd64.img
echo "next up, clone VM, then expand the disk"
echo "you also still need to copy ssh keys to the newly cloned VM"

#7-8 – Using this template with Terraform to automate VM creation

Next post – How to deploy VMs in Proxmox with Terraform

References

https://matthewkalnins.com/posts/home-lab-setup-part-1-proxmox-cloud-init/
https://registry.terraform.io/modules/sdhibit/cloud-init-vm/proxmox/latest/examples/ubuntu_single_vm

My original notes

https://matthewkalnins.com/posts/home-lab-setup-part-1-proxmox-cloud-init/
https://registry.terraform.io/modules/sdhibit/cloud-init-vm/proxmox/latest/examples/ubuntu_single_vm

# create cloud image VM
wget https://cloud-images.ubuntu.com/focal/20210824/focal-server-cloudimg-amd64.img
sudo qm create 9000 --name "ubuntu-2004-cloudinit-template" --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0

# to install qemu-guest-agent or whatever into the guest image
#sudo apt-get install libguestfs-tools
#virt-customize -a focal-server-cloudimg-amd64.img --install qemu-guest-agent
sudo qm importdisk 9000 focal-server-cloudimg-amd64.img local-zfs
sudo qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-zfs:vm-9000-disk-0
sudo qm set 9000 --boot c --bootdisk scsi0
sudo qm set 9000 --ide2 local-zfs:cloudinit
sudo qm set 9000 --serial0 socket --vga serial0
sudo qm template 9000

# clone cloud image to new VM
sudo qm clone 9000 999 --name test-clone-cloud-init
sudo qm set 999 --sshkey ~/.ssh/id_rsa.pub
sudo qm set 999 --ipconfig0 ip=10.98.1.96/24,gw=10.98.1.1
sudo qm start 999
  
# remove known host because SSH key changed
ssh-keygen -f "/home/austin/.ssh/known_hosts" -R "10.98.1.96"

# ssh in
ssh -i ~/.ssh/id_rsa [email protected]

# stop and destroy VM
sudo qm stop 999 && sudo qm destroy 999

Tags cloud-init, kubernetes, linux, proxmox, ubuntu