This guide explains how to reinstall a compute node from CentOS 8 Stream-Yoga to AlmaLinux9-Yoga.
In the following instructions, the node to be reinstalled is cld-np-19, an INFN node.
Take note of the IP addresses and the interfaces used by the node
First of all, take note of the IP addresses used by the node for the management and data networks (192.168.60.x and 192.168.61.x) and of the relevant interfaces:
[root@cld-np-19 ~]# nmcli
eno1: connected to System eno1
        "Intel X710"
        ethernet (i40e), 34:E6:D7:F5:53:0D, hw, port e4434b3a8890, mtu 1500
        ip4 default
        inet4 192.168.60.129/24
        route4 192.168.60.0/24 metric 100
        route4 default via 192.168.60.254 metric 100
        inet6 fe80::36e6:d7ff:fef5:530d/64
        route6 fe80::/64 metric 1024

eno3: connected to eno3
        "Intel X710"
        ethernet (i40e), 34:E6:D7:F5:53:0F, hw, port e4434b3a8892, mtu 9000
        inet4 192.168.61.129/24
        route4 192.168.61.0/24 metric 101
        inet6 fe80::65:ec7c:7325:3bec/64
        route6 fe80::/64 metric 1024
...
...
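Since the reinstallation wipes the node, it can help to capture these values in a compact form that is easy to save elsewhere. The following is only a sketch and not part of the original procedure: the helper name list_cloud_addrs is hypothetical, and it assumes that the iproute2 `ip` command is available on the node.

```shell
# Hypothetical helper: reads the output of `ip -o -4 addr show` on stdin and
# prints "<interface> <address/prefix>" for the management (192.168.60.x)
# and data (192.168.61.x) networks only.
list_cloud_addrs() {
    awk '$3 == "inet" && $4 ~ /^192\.168\.6[01]\./ { print $2, $4 }'
}

# Example, to be run on the compute node before the reinstallation:
#   ip -o -4 addr show | list_cloud_addrs
```

The output should be copied to another host or to the ticket tracking the reinstallation, because the local disk of the node will not survive.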
Disable the compute node
Then disable the compute node, so that no new VMs will be instantiated on this compute node:
openstack compute service set --disable cld-np-19.cloud.pd.infn.it nova-compute
To check if the compute node was actually disabled:
openstack compute service list
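The full listing covers every service on every host. As a possible shortcut, the sketch below restricts the query to the node being reinstalled and prints only the status column. The helper name compute_service_status is hypothetical; the `--host`, `--service`, `-f value` and `-c` options are standard options of the OpenStack client, but their behaviour with the client version installed here has not been verified.

```shell
# Hypothetical helper: prints the status ("enabled" or "disabled") of the
# nova-compute service running on the given host.
compute_service_status() {
    openstack compute service list --host "$1" --service nova-compute \
        -f value -c Status
}

# Example:
#   compute_service_status cld-np-19.cloud.pd.infn.it
```

After the previous step the helper is expected to print "disabled".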
Find the VMs instantiated on this compute node
Then find all the virtual machines instantiated on this compute node:
openstack server list --all --host cld-np-19.cloud.pd.infn.it
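When the list has to be processed by a script (for example to loop over the VMs to be migrated), a plain, table-free output is easier to handle. The sketch below is one way to obtain it; the helper name vms_on_host is hypothetical, and `--all-projects` is used here as the long form of the `--all` option shown above.

```shell
# Hypothetical helper: prints "<UUID> <name> <status>" for every VM hosted
# on the given compute node, across all projects, one VM per line.
vms_on_host() {
    openstack server list --all-projects --host "$1" \
        -f value -c ID -c Name -c Status
}

# Example:
#   vms_on_host cld-np-19.cloud.pd.infn.it
```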
Migrate the VMs instantiated on this compute node
Then migrate each VM instantiated on this compute node to other hypervisors.
To avoid migrating the same virtual machine multiple times, migrate the VM to a compute node that has already been reinstalled with AlmaLinux9.
The migration of a virtual machine usually requires a downtime of about 15 minutes for that VM, unless the VM was created using a volume. So, before migrating the VM, you must agree on this operation with its owner.
E.g. to find the owner of a VM with UUID be832766-7996-4ccf-a9c1-5eec2e5f8fd3:
[root@cld-ctrl-01 ~]# openstack server show be832766-7996-4ccf-a9c1-5eec2e5f8fd3 | grep user_id
| user_id | 7681252c430040e7bb96c7c4f7c88464 |
[root@cld-ctrl-01 ~]# openstack user show 7681252c430040e7bb96c7c4f7c88464
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| email               | Ysabella.Ong@lnl.infn.it         |
| enabled             | True                             |
| id                  | 7681252c430040e7bb96c7c4f7c88464 |
| name                | ysaong@infn.it                   |
| options             | {}                               |
| password_expires_at | None                             |
+---------------------+----------------------------------+
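The two lookups can be chained so that the e-mail address of the owner is obtained directly from the UUID of the VM. This is a minimal sketch: the helper name vm_owner_email is hypothetical, and it assumes that the `user_id` and `email` fields are returned as in the output above.

```shell
# Hypothetical helper: prints the e-mail address of the owner of a VM.
# Argument: the UUID of the VM.
vm_owner_email() {
    # First lookup: the user_id field of the server.
    owner_id=$(openstack server show "$1" -f value -c user_id) || return 1
    # Second lookup: the email field of that user.
    openstack user show "$owner_id" -f value -c email
}

# Example:
#   vm_owner_email be832766-7996-4ccf-a9c1-5eec2e5f8fd3
```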
To migrate a virtual machine to another compute node, there are two possible procedures:
- Procedure to be used if the VM was created using a volume (the migration procedure doesn't require a downtime)
- Procedure to be used if the VM was created using ephemeral storage (the migration procedure requires a downtime)
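Whichever procedure is used, it may be worth confirming that the node is really empty before it is reinstalled. The check below is only a sketch: the helper name host_is_empty is hypothetical, and `--all-projects` is assumed to be accepted by the installed OpenStack client.

```shell
# Hypothetical helper: succeeds (exit status 0) only when no VM is left on
# the given compute node. Returns 2 if the query itself fails.
host_is_empty() {
    remaining=$(openstack server list --all-projects --host "$1" \
        -f value -c ID) || return 2
    [ -z "$remaining" ]
}

# Example:
#   host_is_empty cld-np-19.cloud.pd.infn.it && echo "ready for reinstallation"
```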
Reinstall the node with AlmaLinux9
Once there are no more VMs on that compute node, reinstall the node with AlmaLinux9 using Foreman:
- The hostgroup must be hosts_all
- The kickstart to be used must be TBC
- TBC
When the node restarts after the update, make sure that SELinux is disabled:
[root@cld-np-19 ~]# getenforce
Disabled
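If the check is run as part of a script, it can be turned into a test with an explicit exit status. This is a small sketch, with the hypothetical helper name check_selinux_disabled:

```shell
# Hypothetical helper: succeeds only if getenforce reports "Disabled".
check_selinux_disabled() {
    mode=$(getenforce) || return 2
    if [ "$mode" = "Disabled" ]; then
        echo "OK: SELinux is disabled"
    else
        echo "WARNING: SELinux mode is $mode" >&2
        return 1
    fi
}
```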
Then update the packages (probably only puppet will be updated):
[root@cld-np-19 ~]# yum clean all
[root@cld-np-19 ~]# yum update -y
Configure the data network
Once the node has been reinstalled with AlmaLinux9, configure the data network
TBC
The address to be used must be the original one
MTU must be 9000
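The configuration step itself is still to be documented, but its result can be verified against the two requirements above. The following is a sketch under stated assumptions: the helper name check_data_iface and the SYSFS_NET variable are hypothetical, the interface and address used in the example (eno3, 192.168.61.129/24) are the ones recorded for cld-np-19 before the reinstallation, and the interface name is assumed to be unchanged after it.

```shell
# Hypothetical helper: check_data_iface <interface> <address/prefix>
# Succeeds only if the interface has MTU 9000 and carries the given address.
# SYSFS_NET can be overridden for testing; it defaults to /sys/class/net.
check_data_iface() {
    mtu=$(cat "${SYSFS_NET:-/sys/class/net}/$1/mtu") || return 2
    if [ "$mtu" != "9000" ]; then
        echo "MTU of $1 is $mtu, expected 9000" >&2
        return 1
    fi
    # Field 4 of `ip -o -4 addr show` is the address with its prefix length.
    if ! ip -o -4 addr show dev "$1" |
        awk -v want="$2" '$4 == want { found = 1 } END { exit !found }'; then
        echo "$2 is not configured on $1" >&2
        return 1
    fi
}

# Example:
#   check_data_iface eno3 192.168.61.129/24 && echo "data network OK"
```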
Reinstall OpenManage
If this is a DELL host, reinstall DELL OpenManage, as explained here
Check on Nagios that the relevant checks turn green
Configure the node as OpenStack compute node using puppet
Then configure the node as an OpenStack compute node using puppet
Stop puppet:
systemctl stop puppet
In foreman move the host under the ComputeNode-Prod_Yoga hostgroup
Run puppet manually:
puppet agent -t
If the configuration fails reporting
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
then please issue:
modprobe br_netfilter
and then rerun puppet
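The "unknown key" errors appear because the net.bridge.* sysctl keys exist only while the br_netfilter kernel module is loaded. The sketch below checks for one of the keys under /proc/sys and loads the module only when it is missing. The helper name ensure_br_netfilter and the PROC_SYS variable are hypothetical and are not part of the original procedure.

```shell
# Hypothetical helper: loads br_netfilter only when the bridge sysctl keys
# are missing. PROC_SYS can be overridden for testing; it defaults to
# /proc/sys.
ensure_br_netfilter() {
    key="${PROC_SYS:-/proc/sys}/net/bridge/bridge-nf-call-iptables"
    if [ -e "$key" ]; then
        echo "br_netfilter keys already present"
    else
        modprobe br_netfilter
    fi
}
```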
If the procedure terminates without errors, enable puppet and reboot the host:
systemctl enable puppet
shutdown -r now
Enable the nagios check for lv
To enable the Nagios check for lv, log on to cld-nagios.cloud.pd.infn.it and move to the Nagios objects directory:
cd /etc/nagios/objects/
Edit cloudcomputenodes.cfg (or the other file where this compute node is defined) and add a passive check:
define service{
        use                     LVS       ; Name of service template to use
        host_name               cld-np-19
        service_description     LVS
        freshness_threshold     28800
        }
Make sure there are no problems in the configuration:
[root@cld-nagios objects]# nagios -v /etc/nagios/nagios.cfg
and restart Nagios to apply the new check:
[root@cld-nagios objects]# systemctl restart nagios
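The two steps can be combined so that Nagios is restarted only if the configuration check succeeds, which avoids taking the monitoring down because of a typo in the new service definition. This is a minimal sketch, with the hypothetical helper name reload_nagios_if_valid:

```shell
# Hypothetical helper: restarts Nagios only if `nagios -v` accepts the
# configuration; otherwise leaves the running instance untouched.
reload_nagios_if_valid() {
    if nagios -v /etc/nagios/nagios.cfg; then
        systemctl restart nagios
    else
        echo "Nagios configuration check failed, not restarting" >&2
        return 1
    fi
}
```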
Some Checks
- Verify that the checks on Nagios turn green (only the "VM network" check should stay in error)
- Verify that an ssh from root@cld-log to root@cld-np-19 and to root@cld-np-19.cloud.pd.infn.it works without requiring a password
- Verify that an ssh from nagios@cld-nagios to nagios@cld-np-19 and to nagios@cld-np-19.cloud.pd.infn.it works without requiring a password
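The ssh checks can be made non-interactive with the OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password. The sketch below uses the hypothetical helper name check_passwordless_ssh; the 10-second timeout is an arbitrary choice. Note that in batch mode a host key that changed with the reinstallation is also reported as a failure.

```shell
# Hypothetical helper: succeeds only if ssh to the given user@host works
# without any interactive prompt (password or host key confirmation).
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Example, to be run as root on cld-log:
#   for target in root@cld-np-19 root@cld-np-19.cloud.pd.infn.it; do
#       if check_passwordless_ssh "$target"; then
#           echo "OK $target"
#       else
#           echo "FAILED $target"
#       fi
#   done
```

The same loop, with the nagios@ targets, can be run as nagios on cld-nagios.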
Re-enable the compute node
openstack compute service set --enable cld-np-19.cloud.pd.infn.it nova-compute
Within 24 hours, the "VM network" Nagios check should turn green.