This guide explains how to reinstall a compute node from CentOS 8 Stream (Yoga) to AlmaLinux9 (Yoga).
In the following instructions, the node to be reinstalled is cld-np-19, an INFN node.
First of all, take note of the IP addresses used by the node on the management and data networks (192.168.60.x and 192.168.61.x) and of the names of the relevant interfaces:
[root@cld-np-19 ~]# nmcli
eno1: connected to System eno1
        "Intel X710"
        ethernet (i40e), 34:E6:D7:F5:53:0D, hw, port e4434b3a8890, mtu 1500
        ip4 default
        inet4 192.168.60.129/24
        route4 192.168.60.0/24 metric 100
        route4 default via 192.168.60.254 metric 100
        inet6 fe80::36e6:d7ff:fef5:530d/64
        route6 fe80::/64 metric 1024

eno3: connected to eno3
        "Intel X710"
        ethernet (i40e), 34:E6:D7:F5:53:0F, hw, port e4434b3a8892, mtu 9000
        inet4 192.168.61.129/24
        route4 192.168.61.0/24 metric 101
        inet6 fe80::65:ec7c:7325:3bec/64
        route6 fe80::/64 metric 1024
...
...
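A minimal sketch for recording these values in a compact form, assuming the management and data networks are exactly 192.168.60.0/24 and 192.168.61.0/24 as above. The function name net_summary is hypothetical; it reads the one-line-per-address output of "ip -o -4 addr show" and prints the network role, the interface and the address:

```shell
# Hypothetical helper (not part of the original procedure): summarise the
# management (192.168.60.x) and data (192.168.61.x) addresses of this node.
# Reads "ip -o -4 addr show" output on stdin, where each line looks like
#   "<idx>: <iface>    inet <addr>/<prefix> brd ... scope global ..."
net_summary() {
  awk '$3 == "inet" && $4 ~ /^192\.168\.6[01]\./ {
         sub(/@.*/, "", $2)   # drop a "@parent" suffix, if any
         role = ($4 ~ /^192\.168\.60\./) ? "management" : "data"
         print role, $2, $4
       }'
}
```

Usage: "ip -o -4 addr show | net_summary". On cld-np-19 this should report eno1 as the management interface and eno3 as the data interface; keep the output somewhere off the node, since it is needed again after the reinstallation.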
Check whether the host has one disk with several partitions or two disks (usually the first disk, sda, is used for the operating system and the second, sdb, for /var/lib/nova/instances). Check also whether there is a BIOS boot partition used for EFI. This information is needed to choose the right partition table in the Foreman web interface.
Use the fdisk and df commands. Here are some examples:
1) In this first example, the cld-nl-24 host has two disks: sda, used for the OS, and sdb, mounted on /var/lib/nova/instances. There is no BIOS boot partition on sda (so no EFI):
[root@cld-nl-24 ~]# fdisk -l
Disk /dev/sda: 893.8 GiB, 959656755200 bytes, 1874329600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
Disklabel type: dos
Disk identifier: 0x1d6bbd3d

Device     Boot     Start        End    Sectors   Size Id Type
/dev/sda1  *         2048    2099199    2097152     1G 83 Linux
/dev/sda2         2099200  264243199  262144000   125G 82 Linux swap / Solaris
/dev/sda3       264243200 1874329599 1610086400 767.8G 83 Linux

Disk /dev/sdb: 2.2 TiB, 2399276105728 bytes, 4686086144 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes
[root@cld-nl-24 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        252G     0  252G   0% /dev
tmpfs           252G  4.0K  252G   1% /dev/shm
tmpfs           252G  4.0G  248G   2% /run
tmpfs           252G     0  252G   0% /sys/fs/cgroup
/dev/sda3       755G  5.2G  712G   1% /
/dev/sda1       976M  269M  641M  30% /boot
/dev/sdb        2.2T  281G  2.0T  13% /var/lib/nova/instances
tmpfs            51G     0   51G   0% /run/user/0

2) In this second example, cld-dfa-gpu-01 has two disks: sda, used for the OS, with a BIOS boot partition (EFI), and nvme0n1, whose partition nvme0n1p1 is mounted on /var/lib/nova/instances:
[root@cld-dfa-gpu-01 ~]# fdisk -l
Disk /dev/nvme0n1: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc6f9ace9

Device         Boot Start        End    Sectors Size Id Type
/dev/nvme0n1p1       2048 3907029167 3907027120 1.8T 83 Linux

Disk /dev/sda: 3.7 TiB, 4000225165312 bytes, 7812939776 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: gpt
Disk identifier: 0A02235E-3DA4-4F30-942E-EE3AC02107C4

Device         Start        End    Sectors  Size Type
/dev/sda1       2048       4095       2048    1M BIOS boot
/dev/sda2       4096    2101247    2097152    1G Linux filesystem
/dev/sda3    2101248  264245247  262144000  125G Linux swap
/dev/sda4  264245248 7812937727 7548692480  3.5T Linux filesystem
[root@cld-dfa-gpu-01 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         79G     0   79G   0% /dev
tmpfs            79G     0   79G   0% /dev/shm
tmpfs            79G  4.1G   75G   6% /run
tmpfs            79G     0   79G   0% /sys/fs/cgroup
/dev/sda4       3.5T  5.1G  3.3T   1% /
/dev/nvme0n1p1  1.8T  422G  1.3T  25% /var/lib/nova/instances
/dev/sda2       976M  231M  679M  26% /boot
tmpfs            16G     0   16G   0% /run/user/0

3) In this third example, cld-np-15 has just one disk with several partitions and no BIOS boot partition (so no EFI):
[root@cld-np-15 ~]# fdisk -l
Disk /dev/sda: 1.1 TiB, 1199638052864 bytes, 2343043072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc572f99b

Device     Boot     Start        End    Sectors   Size Id Type
/dev/sda1  *         2048    2099199    2097152     1G 83 Linux
/dev/sda2         2099200  264243199  262144000   125G 82 Linux swap / Solaris
/dev/sda3       264243200  327157759   62914560    30G 83 Linux
/dev/sda4       327157760 2343043071 2015885312 961.3G  5 Extended
/dev/sda5       327159808 2343043071 2015883264 961.3G 83 Linux
[root@cld-np-15 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         63G     0   63G   0% /dev
tmpfs            63G  4.0K   63G   1% /dev/shm
tmpfs            63G  4.0G   59G   7% /run
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/sda3        30G  4.9G   24G  18% /
/dev/sda1       976M  253M  657M  28% /boot
/dev/sda5       946G  175G  723G  20% /var/lib/nova/instances
tmpfs            13G     0   13G   0% /run/user/0
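When many nodes have to be checked, the relevant facts can be pulled out of the fdisk output automatically. This is only a sketch; the function name disk_summary is hypothetical. For each disk it prints the device, the disklabel type ("none" if fdisk reports no label, as for sdb in the first example) and whether a "BIOS boot" partition is present, which is the case this guide treats as EFI:

```shell
# Hypothetical helper: one line per disk with "<device> <label> <biosboot?>".
# Reads "fdisk -l" output on stdin.
disk_summary() {
  awk '
    /^Disk \/dev\// {                 # e.g. "Disk /dev/sda: 893.8 GiB, ..."
      if (disk != "") print disk, label, bios
      disk = $2; sub(/:$/, "", disk)
      label = "none"; bios = "no-biosboot"
    }
    /^Disklabel type:/ { label = $3 } # "dos" or "gpt"
    /BIOS boot/        { bios = "biosboot" }
    END { if (disk != "") print disk, label, bios }
  '
}
```

Usage: "fdisk -l | disk_summary". To see which device backs the instances directory, "findmnt -n -o SOURCE /var/lib/nova/instances" prints just the source device (for example /dev/sdb on cld-nl-24).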
Then, from one of the controller nodes (cld-ctrl-01 or cld-ctrl-02), disable the compute node, so that no new VMs will be instantiated on it:
[root@cld-ctrl-01 ~]# source admin-openrc.sh
[root@cld-ctrl-01 ~]# openstack compute service set --disable cld-np-19.cloud.pd.infn.it nova-compute
To check that the compute node was actually disabled:
[root@cld-ctrl-01 ~]# openstack compute service list
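To check a single node from a script instead of reading the whole table, the --host and --service filters of "openstack compute service list" can be combined with value output. A sketch, assuming admin credentials are already loaded (source admin-openrc.sh); the helper names are hypothetical:

```shell
# Hypothetical helper: print the nova-compute Status ("enabled"/"disabled")
# of one host, using the openstack client's value formatter.
nova_compute_status() {
  local host="$1"
  openstack compute service list --host "$host" --service nova-compute \
    -f value -c Status
}

# Hypothetical helper: return non-zero unless the status is the expected one.
expect_nova_compute_status() {
  local host="$1" expected="$2" actual
  actual=$(nova_compute_status "$host") || return 1
  if [ "$actual" != "$expected" ]; then
    echo "nova-compute on $host is '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

Usage: "expect_nova_compute_status cld-np-19.cloud.pd.infn.it disabled" right after disabling; the same helper with "enabled" can confirm the final re-enable step.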
Then find all the virtual machines instantiated on this compute node:
openstack server list --all-projects --host cld-np-19.cloud.pd.infn.it
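For bookkeeping during the migrations, the same query can produce plain "ID Name Status" lines and a count that should eventually reach zero. A sketch with hypothetical helper names, assuming admin credentials (the --host filter of "openstack server list" is admin-only):

```shell
# Hypothetical helper: "ID Name Status" for every VM (all projects) on a host.
vms_on_host() {
  openstack server list --all-projects --host "$1" \
    -f value -c ID -c Name -c Status
}

# Hypothetical helper: number of VMs still on the host (prints 0 when empty).
count_vms_on_host() {
  vms_on_host "$1" | { grep -c . || true; }
}
```

Usage: "count_vms_on_host cld-np-19.cloud.pd.infn.it" can be rerun after each migration.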
Then migrate each VM instantiated on this compute node to another hypervisor.
To avoid migrating the same virtual machine multiple times, migrate each VM to a compute node that has already been migrated to AlmaLinux9.
Migrating a virtual machine usually requires a downtime of that VM (about 15 minutes), unless the VM was created using a volume. So before migrating a VM you must agree on this operation with its owner.
E.g. to find the owner of a VM with UUID be832766-7996-4ccf-a9c1-5eec2e5f8fd3:
[root@cld-ctrl-01 ~]# openstack server show be832766-7996-4ccf-a9c1-5eec2e5f8fd3 | grep user_id
| user_id | 7681252c430040e7bb96c7c4f7c88464 |
[root@cld-ctrl-01 ~]# openstack user show 7681252c430040e7bb96c7c4f7c88464
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| email               | Ysabella.Ong@lnl.infn.it         |
| enabled             | True                             |
| id                  | 7681252c430040e7bb96c7c4f7c88464 |
| name                | ysaong@infn.it                   |
| options             | {}                               |
| password_expires_at | None                             |
+---------------------+----------------------------------+
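The two lookups can be chained in one step. This sketch (the function name vm_owner is hypothetical) uses the "-f value -c <column>" output options of the openstack client instead of grep, and prints the owner's user name followed by the e-mail address:

```shell
# Hypothetical helper: "<user name> <e-mail>" of the user who created a VM.
vm_owner() {
  local uuid="$1" uid
  uid=$(openstack server show "$uuid" -f value -c user_id) || return 1
  printf '%s %s\n' \
    "$(openstack user show "$uid" -f value -c name)" \
    "$(openstack user show "$uid" -f value -c email)"
}
```

Usage: "vm_owner be832766-7996-4ccf-a9c1-5eec2e5f8fd3".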
To migrate a virtual machine to another compute node, there are two possible procedures:
Using the libvirt client, check whether any VMs are still defined on the node, for example domains that are no longer known to nova:
virsh list --all
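The same check as a pass/fail step, sketched with a hypothetical function name. It relies on "virsh list --all --name", which prints only domain names (running or shut off), one per line:

```shell
# Hypothetical helper: succeed only if libvirt has no domains left on this
# host; otherwise list the leftover domains on stderr.
no_libvirt_domains() {
  local domains
  domains=$(virsh list --all --name | grep -v '^[[:space:]]*$')
  if [ -n "$domains" ]; then
    echo "libvirt domains still present:" >&2
    printf '%s\n' "$domains" >&2
    return 1
  fi
}
```

Usage: "no_libvirt_domains && echo ready for reinstall".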
Once there are no more VMs on the compute node, reinstall it with AlmaLinux9 using Foreman (https://cld-config.cloud.pd.infn.it/users/login).
Go to Hosts → All hosts → cld-np-19.cloud.pd.infn.it
Edit the host in this way:
If a suitable partition table is missing, a new one can be created in the Foreman web interface: clone the most similar one and adapt it (e.g. change the disk name or size, or add the EFI biosboot partition).
Save the changes and then build the node. Open a remote console (via https://blade-cld-rmc.lan/cgi-bin/webcgi/login) to reboot the compute node.
When the node restarts after the reinstallation, make sure that SELinux is disabled:
[root@cld-np-19 ~]# getenforce
Disabled
Then update the packages (probably only puppet will be updated):
[root@cld-np-19 ~]# yum clean all
[root@cld-np-19 ~]# yum update -y
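The two steps above can be guarded so that the update only runs when SELinux is really disabled. A minimal sketch; the function names are hypothetical:

```shell
# Hypothetical helper: true only if getenforce reports "Disabled".
selinux_disabled() {
  [ "$(getenforce)" = "Disabled" ]
}

# Hypothetical helper: refuse to update unless SELinux is disabled.
update_packages() {
  if ! selinux_disabled; then
    echo "SELinux is $(getenforce), expected Disabled" >&2
    return 1
  fi
  yum clean all && yum update -y
}
```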
Once the node has been reinstalled with AlmaLinux9, configure the data network.
If the compute node has a dedicated interface for the data LAN (like eno3 on cld-np-19), use the commands below, replacing the interface name and IP address ("eno3" and "192.168.61.129") with your own:
(If "nmcli con show" already lists a connection profile named eno3, skip the "nmcli con add" command.)
[root@cld-np-19 ~]# nmcli con add type ethernet ifname eno3 con-name eno3
[root@cld-np-19 ~]# nmcli con mod eno3 ipv4.method manual ipv4.addresses "192.168.61.129/24"
[root@cld-np-19 ~]# nmcli con mod eno3 connection.autoconnect true
[root@cld-np-19 ~]# nmcli con mod eno3 802-3-ethernet.mtu 9000
[root@cld-np-19 ~]# nmcli con up eno3
[root@cld-np-19 ~]# ip link set eno3 mtu 9000
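The commands above can be wrapped in a function so that only the interface and address change from node to node. This is a sketch under two assumptions: the profile is named after the interface (con-name), and no profile with that name exists yet. The names configure_data_iface, run and DRY_RUN are hypothetical; with DRY_RUN=1 the commands are only printed, which is a cheap way to review them first:

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: static address and MTU 9000 on a dedicated data NIC.
configure_data_iface() {
  local iface="$1" cidr="$2"     # e.g. eno3 192.168.61.129/24
  run nmcli con add type ethernet ifname "$iface" con-name "$iface" || return 1
  run nmcli con mod "$iface" ipv4.method manual ipv4.addresses "$cidr" || return 1
  run nmcli con mod "$iface" connection.autoconnect yes || return 1
  # Set the MTU before activating, so the profile comes up with 9000.
  run nmcli con mod "$iface" 802-3-ethernet.mtu 9000 || return 1
  run nmcli con up "$iface"
}
```

Usage: "DRY_RUN=1 configure_data_iface eno3 192.168.61.129/24" to review, then the same call without DRY_RUN to apply.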
If the data network instead uses tagged VLANs (VLAN 301 and VLAN 302 on the same interface), use these commands, replacing the interface name and IP address ("enp2s0f0.302" and "192.168.61.129") with your own:
[root@cld-nl-24 ~]# nmcli con add type vlan ifname enp2s0f0.302 dev enp2s0f0 id 302
[root@cld-nl-24 ~]# nmcli con mod vlan-enp2s0f0.302 ipv4.method manual ipv4.addresses "192.168.61.129/24"
[root@cld-nl-24 ~]# nmcli con mod vlan-enp2s0f0.302 802-3-ethernet.mtu 9000
[root@cld-nl-24 ~]# nmcli con up vlan-enp2s0f0.302
[root@cld-nl-24 ~]# ip link set enp2s0f0.302 mtu 9000
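One detail worth checking in the tagged case: Linux does not allow a VLAN device to have a larger MTU than its parent, so enp2s0f0 itself must already be at 9000. The sketch below checks that first and then applies the commands above; it assumes the profile name vlan-<parent>.<id> used above. The function name and the SYSFS and DRY_RUN variables are hypothetical (SYSFS only exists to make the check testable):

```shell
# Hypothetical helper: configure a tagged data VLAN with MTU 9000.
# DRY_RUN=1 only prints the nmcli commands; SYSFS overrides /sys for tests.
configure_data_vlan() {
  local parent="$1" vlan_id="$2" cidr="$3"   # e.g. enp2s0f0 302 192.168.61.129/24
  local dev="${parent}.${vlan_id}" con="vlan-${parent}.${vlan_id}"
  local parent_mtu prefix=""
  parent_mtu=$(cat "${SYSFS:-/sys}/class/net/${parent}/mtu") || return 1
  if [ "$parent_mtu" -lt 9000 ]; then
    echo "parent $parent has MTU $parent_mtu; raise it to 9000 first" >&2
    return 1
  fi
  [ "${DRY_RUN:-0}" = 1 ] && prefix="echo"
  $prefix nmcli con add type vlan con-name "$con" ifname "$dev" dev "$parent" id "$vlan_id" || return 1
  $prefix nmcli con mod "$con" ipv4.method manual ipv4.addresses "$cidr" || return 1
  $prefix nmcli con mod "$con" 802-3-ethernet.mtu 9000 || return 1
  $prefix nmcli con up "$con"
}
```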
The address to be used must be the original one (the one noted before the reinstallation).
The MTU must be 9000.
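Both requirements can be verified after the configuration. A sketch with a hypothetical function name; it reads the MTU from sysfs and the address from "ip -o -4 addr show dev <iface>" (SYSFS is a hypothetical override used only for testing):

```shell
# Hypothetical post-check: expected IPv4 address and MTU 9000 on an interface.
check_iface() {
  local iface="$1" cidr="$2" mtu      # e.g. eno3 192.168.61.129/24
  mtu=$(cat "${SYSFS:-/sys}/class/net/${iface}/mtu") || return 1
  if [ "$mtu" != 9000 ]; then
    echo "$iface: MTU is $mtu, expected 9000" >&2
    return 1
  fi
  if ! ip -o -4 addr show dev "$iface" | awk '{print $4}' | grep -qx "$cidr"; then
    echo "$iface: $cidr is not configured" >&2
    return 1
  fi
}
```

A local MTU of 9000 does not prove that jumbo frames pass end to end. With iputils ping, "ping -M do -s 8972 -c 3 <data address of another node>" sends unfragmentable packets of exactly 9000 bytes (8972 bytes of payload plus 8 bytes of ICMP header and 20 bytes of IP header); the peer address is yours to choose.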
If this is a DELL host, reinstall DELL OpenManage, as explained here
Check on Nagios that the relevant checks turn green.
Then configure the node as an OpenStack compute node using puppet.
Stop puppet:
systemctl stop puppet
In Foreman, move the host to the ComputeNode-Prod_Yoga.el9 hostgroup.
Run puppet manually:
puppet agent -t
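Note that "puppet agent -t" implies --detailed-exitcodes, so a non-zero exit code does not by itself mean failure: 0 means no changes, 2 means changes were applied, 4 means some resources failed, 6 means changes plus failures, and 1 means the run failed or was not attempted. A small sketch (the function name is hypothetical) that turns this into a plain success or failure:

```shell
# Hypothetical wrapper: run the agent once; succeed only on exit code 0 or 2.
puppet_run_ok() {
  local rc
  puppet agent -t
  rc=$?
  case "$rc" in
    0|2) return 0 ;;   # no changes / changes applied without failures
    *)   echo "puppet agent -t exited with $rc" >&2; return 1 ;;
  esac
}
```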
If the configuration fails, reporting:
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
then load the br_netfilter kernel module:
modprobe br_netfilter
and rerun puppet.
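The errors appear because the net.bridge.bridge-nf-call-* sysctl keys only exist while the br_netfilter module is loaded. A sketch that loads the module only when it is missing, by looking it up in /proc/modules; the function name and the MODULES_FILE and DRY_RUN variables are hypothetical:

```shell
# Hypothetical helper: load br_netfilter unless /proc/modules already lists it.
ensure_br_netfilter() {
  if grep -q '^br_netfilter ' "${MODULES_FILE:-/proc/modules}"; then
    return 0
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "modprobe br_netfilter"
  else
    modprobe br_netfilter
  fi
}
```

If the module also had to be loaded at boot independently of puppet, listing br_netfilter in a file under /etc/modules-load.d/ is the standard systemd mechanism; this guide does not say whether puppet already takes care of it.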
If the procedure completes without errors, enable puppet and reboot the host:
systemctl enable puppet
shutdown -r now
Enable the Nagios check for lv:
Log on to cld-nagios.cloud.pd.infn.it and go to the Nagios objects directory:
cd /etc/nagios/objects/
Edit cloudcomputenodes.cfg (or whichever file defines this compute node) and add a passive check:
define service{
        use                     LVS     ; Name of service template to use
        host_name               cld-np-19
        service_description     LVS
        freshness_threshold     28800
        }
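When several nodes are reinstalled, the stanza can be generated per host instead of copied by hand. A sketch; the function name lvs_service_stanza is hypothetical and the template values are the ones above (freshness_threshold 28800 seconds is 8 hours):

```shell
# Hypothetical helper: print the passive LVS service definition for one host.
lvs_service_stanza() {
  local host="$1"
  cat <<EOF
define service{
        use                     LVS     ; Name of service template to use
        host_name               ${host}
        service_description     LVS
        freshness_threshold     28800
        }
EOF
}
```

Usage: "lvs_service_stanza cld-np-19 >> cloudcomputenodes.cfg", after checking with grep that the host does not already have an LVS service in that file.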
Make sure there are no problems in the configuration:
[root@cld-nagios objects]# nagios -v /etc/nagios/nagios.cfg
and restart Nagios to apply the new check:
[root@cld-nagios objects]# systemctl restart nagios
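The verification and the restart can be chained so that Nagios is never restarted with a broken configuration. A sketch with a hypothetical function name; it relies on "nagios -v" exiting non-zero when the configuration has errors:

```shell
# Hypothetical helper: restart Nagios only if the configuration validates.
reload_nagios_safely() {
  local out
  if out=$(nagios -v /etc/nagios/nagios.cfg 2>&1); then
    systemctl restart nagios
  else
    printf '%s\n' "$out" >&2
    echo "Nagios configuration check failed; not restarting" >&2
    return 1
  fi
}
```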
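Finally, from a controller node, re-enable the compute node so that new VMs can be scheduled on it:
[root@cld-ctrl-01 ~]# openstack compute service set --enable cld-np-19.cloud.pd.infn.it nova-compute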
Within 24 hours, the "VM network" Nagios check should turn green.