I recently ran into an issue where I added a node to my Proxmox cluster while another node (prox) was disconnected and powered off. Because prox was missing from the cluster during the change, it caused the other nodes to become unresponsive for a number of Proxmox operations once it booted back up.
Set node to local mode
The solution was to put the node that had been offline (called prox) into “local” mode. Thanks to Nicholas of Technicus / techblog.jeppson.org for the commands to do so:
sudo systemctl stop pve-cluster
sudo /usr/bin/pmxcfs -l
This allows editing of the all-important /etc/pve/corosync.conf file.
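The full recovery sequence looks roughly like this. The editor choice and the steps for leaving local mode are my additions, not from the original commands; running pmxcfs with -l mounts /etc/pve writable without quorum:

```shell
# Stop the cluster filesystem service and restart pmxcfs in local mode
sudo systemctl stop pve-cluster
sudo /usr/bin/pmxcfs -l

# /etc/pve is now editable without quorum
sudo nano /etc/pve/corosync.conf

# Leave local mode and return to normal cluster operation
sudo killall pmxcfs
sudo systemctl start pve-cluster
```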
Manually update corosync.conf
I basically just had to copy over the config present on the two synchronized nodes to node prox, then reboot. This allowed node prox to join the cluster again and things started working fine.
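That copy step can be sketched as follows, run from node prox while it is in local mode. The IP comes from the nodelist below; scp and root SSH access between the nodes are assumptions on my part:

```shell
# Pull the known-good corosync.conf from a healthy node (prox-1u, 10.98.1.15)
scp root@10.98.1.15:/etc/pve/corosync.conf /etc/pve/corosync.conf

# Reboot so prox rejoins the cluster with the updated config
sudo reboot
```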
Problem corosync.conf on node prox:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: prox
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.98.1.14
  }
  node {
    name: prox-1u
    nodeid: 2
    quorum_votes: 3
    ring0_addr: 10.98.1.15
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: prox-cluster
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
Fancy new corosync.conf on nodes prox-1u and prox-m92p:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: prox
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.98.1.14
  }
  node {
    name: prox-1u
    nodeid: 2
    quorum_votes: 3
    ring0_addr: 10.98.1.15
  }
  node {
    name: prox-m92p
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.98.1.92
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: prox-cluster
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
The only differences are the third node entry and the config_version, incremented from 4 to 5. Corosync uses config_version to decide which copy of the config is newest, so any hand edit needs to bump it. After I made those changes on node prox and rebooted, things worked fine.
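If you'd rather not bump the version number by hand, it can be scripted. A minimal sketch (the awk invocation is my own, demonstrated on an inline sample; in practice you would point it at /etc/pve/corosync.conf and review the output before replacing the file):

```shell
# Increment whatever number follows config_version: and print the result
printf 'totem {\n  config_version: 4\n}\n' |
  awk '/config_version:/ { $2 += 1 } { print }'
```

Note that awk rebuilds matched lines with single-space separators, so the original indentation of the config_version line is not preserved.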