...
In the following instructions the node cld-np-19, an INFN node, is the one to be reinstalled
Take note of the IP addresses and the interfaces used by the host
First of all, take note of the IP addresses and the interface names used by the host for the management and data networks (192.168.60.x and 192.168.61.x):
...
Code Block |
---|
[root@cld-np-19 ~]# nmcli
eno1: connected to System eno1
"Intel X710"
ethernet (i40e), 34:E6:D7:F5:53:0D, hw, port e4434b3a8890, mtu 1500
ip4 default
inet4 192.168.60.129/24
route4 192.168.60.0/24 metric 100
route4 default via 192.168.60.254 metric 100
inet6 fe80::36e6:d7ff:fef5:530d/64
route6 fe80::/64 metric 1024
eno3: connected to eno3
"Intel X710"
ethernet (i40e), 34:E6:D7:F5:53:0F, hw, port e4434b3a8892, mtu 9000
inet4 192.168.61.129/24
route4 192.168.61.0/24 metric 101
inet6 fe80::65:ec7c:7325:3bec/64
route6 fe80::/64 metric 1024
...
... |
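It can help to save these values in a compact, machine-readable form before wiping the node. A minimal sketch (the awk logic and the saved-values approach are my own, not part of the official procedure) that pulls interface name, IPv4 address and MTU out of `nmcli` output like the one above:

```shell
# Sketch: extract interface, IPv4 address and MTU from nmcli output so the
# values can be restored after the reinstall. Fed here from a here-doc with
# the cld-np-19 lines shown above; on the real node use: nmcli > saved.txt
summary=$(awk '
  /^[a-z]/ { iface = $1; sub(":", "", iface) }   # device header, e.g. "eno3: connected ..."
  / mtu /  { mtu = $NF }                         # "..., hw, port ..., mtu 9000"
  /inet4/  { print iface, $2, "mtu", mtu }       # "inet4 192.168.61.129/24"
' <<'EOF'
eno1: connected to System eno1
        ethernet (i40e), 34:E6:D7:F5:53:0D, hw, port e4434b3a8890, mtu 1500
        inet4 192.168.60.129/24
eno3: connected to eno3
        ethernet (i40e), 34:E6:D7:F5:53:0F, hw, port e4434b3a8892, mtu 9000
        inet4 192.168.61.129/24
EOF
)
echo "$summary"
```

One line per interface (device, address, MTU) is exactly what is needed later in the "Configure the data network" step.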
...
Enable the checks in Nagios for this host
...
Wait until all checks are OK (in particular the VM network and volume checks: remember to enable the node just long enough to force the run from Nagios; otherwise the check will fail).
...
Stop puppet on the 2 controller nodes:
Code Block | ||
---|---|---|
| ||
systemctl stop puppet |
...
Add the new compute node in cld-config:/var/puppet/puppet_yoga/controller_yoga/templates/aai_settings.py.erb
Run puppet on the first controller node (this will trigger a restart of httpd):
Code Block | ||
---|---|---|
| ||
puppet agent -t |
Run puppet on the second controller node (this will trigger a restart of httpd):
Code Block | ||
---|---|---|
| ||
puppet agent -t
|
Start puppet on the two controller nodes:
Code Block | ||
---|---|---|
| ||
systemctl start puppet
|
Enable the host:
Code Block | ||
---|---|---|
| ||
openstack compute service set --enable cld-np-19.cloud.pd.infn.it nova-compute
|
...
Take note of the disk configuration
Check whether the host has one disk with multiple partitions or two disks (usually the first disk, sda, is used for the Operating System and the second, sdb, for /var/lib/nova/instances). Check also whether there is a BIOS boot partition used for EFI. This will be useful for choosing a partition table in the Foreman web interface.
Use the fdisk and df commands; here are some examples:
Code Block |
---|
1) In this first example the cld-nl-24 host has two disks: sda with the OS and sdb with /var/lib/nova/instances; on sda there isn't a BIOS Boot partition (so no EFI);
[root@cld-nl-24 ~]# fdisk -l
Disk /dev/sda: 893.8 GiB, 959656755200 bytes, 1874329600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
Disklabel type: dos
Disk identifier: 0x1d6bbd3d
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 2099199 2097152 1G 83 Linux
/dev/sda2 2099200 264243199 262144000 125G 82 Linux swap / Solaris
/dev/sda3 264243200 1874329599 1610086400 767.8G 83 Linux
Disk /dev/sdb: 2.2 TiB, 2399276105728 bytes, 4686086144 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes
[root@cld-nl-24 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 252G 0 252G 0% /dev
tmpfs 252G 4.0K 252G 1% /dev/shm
tmpfs 252G 4.0G 248G 2% /run
tmpfs 252G 0 252G 0% /sys/fs/cgroup
/dev/sda3 755G 5.2G 712G 1% /
/dev/sda1 976M 269M 641M 30% /boot
/dev/sdb 2.2T 281G 2.0T 13% /var/lib/nova/instances
tmpfs 51G 0 51G 0% /run/user/0
2) In this second example cld-dfa-gpu-01 has 2 disks: sda used for the OS with a BIOS boot partition (EFI), and nvme0n1p1 with /var/lib/nova/instances mounted on it;
[root@cld-dfa-gpu-01 ~]# fdisk -l
Disk /dev/nvme0n1: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc6f9ace9
Device Boot Start End Sectors Size Id Type
/dev/nvme0n1p1 2048 3907029167 3907027120 1.8T 83 Linux
Disk /dev/sda: 3.7 TiB, 4000225165312 bytes, 7812939776 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: gpt
Disk identifier: 0A02235E-3DA4-4F30-942E-EE3AC02107C4
Device Start End Sectors Size Type
/dev/sda1 2048 4095 2048 1M BIOS boot
/dev/sda2 4096 2101247 2097152 1G Linux filesystem
/dev/sda3 2101248 264245247 262144000 125G Linux swap
/dev/sda4 264245248 7812937727 7548692480 3.5T Linux filesystem
[root@cld-dfa-gpu-01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 79G 0 79G 0% /dev
tmpfs 79G 0 79G 0% /dev/shm
tmpfs 79G 4.1G 75G 6% /run
tmpfs 79G 0 79G 0% /sys/fs/cgroup
/dev/sda4 3.5T 5.1G 3.3T 1% /
/dev/nvme0n1p1 1.8T 422G 1.3T 25% /var/lib/nova/instances
/dev/sda2 976M 231M 679M 26% /boot
tmpfs 16G 0 16G 0% /run/user/0
3) In this third example cld-np-15 has just one disk with multiple partitions, without a BIOS boot partition (so no EFI);
[root@cld-np-15 ~]# fdisk -l
Disk /dev/sda: 1.1 TiB, 1199638052864 bytes, 2343043072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc572f99b
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 2099199 2097152 1G 83 Linux
/dev/sda2 2099200 264243199 262144000 125G 82 Linux swap / Solaris
/dev/sda3 264243200 327157759 62914560 30G 83 Linux
/dev/sda4 327157760 2343043071 2015885312 961.3G 5 Extended
/dev/sda5 327159808 2343043071 2015883264 961.3G 83 Linux
[root@cld-np-15 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 4.0K 63G 1% /dev/shm
tmpfs 63G 4.0G 59G 7% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/sda3 30G 4.9G 24G 18% /
/dev/sda1 976M 253M 657M 28% /boot
/dev/sda5 946G 175G 723G 20% /var/lib/nova/instances
tmpfs 13G 0 13G 0% /run/user/0
|
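The three questions above (which device backs /var/lib/nova/instances, is there a BIOS boot partition, EFI or legacy boot) can be answered in one go. A small helper sketch; the function name and approach are assumptions of mine, not part of the official procedure:

```shell
# Hypothetical helper: summarize the disk-layout facts needed to pick a
# partition table in Foreman. $1 is the instances path.
disk_summary() {
  local path="${1:-/var/lib/nova/instances}"
  # which filesystem/device backs the instances directory?
  df --output=source "$path" | tail -1
  # is there a "BIOS boot" partition in any partition table?
  fdisk -l 2>/dev/null | grep -i 'BIOS boot' || echo "no BIOS boot partition found"
  # did the running system boot via EFI?
  [ -d /sys/firmware/efi ] && echo "booted via EFI" || echo "legacy BIOS boot"
}
# usage: disk_summary /var/lib/nova/instances
```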
Disable the compute node
Then from one controller node (cld-ctrl-01 or cld-ctrl-02) disable the compute node, so that no new VMs will be instantiated on this compute node:
Code Block |
---|
[root@cld-ctrl-01 ~]# source admin-openrc.sh
[root@cld-ctrl-01 ~]# openstack compute service set --disable cld-np-19.cloud.pd.infn.it nova-compute |
To check if the compute node was actually disabled:
Code Block |
---|
[root@cld-ctrl-01 ~]# openstack compute service list |
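The full service list is long; the single node can also be checked directly with the `--host` and `--service` filters of `openstack compute service list`. A sketch (the helper name is mine; it assumes admin credentials are sourced):

```shell
# Hypothetical helper: succeed only when the node's nova-compute service
# is reported as disabled.
compute_disabled() {
  openstack compute service list --host "$1" --service nova-compute -f value -c Status \
    | grep -qx disabled
}
# usage: compute_disabled cld-np-19.cloud.pd.infn.it && echo "safe to proceed"
```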
Find the VMs instantiated on this compute node
Then find all the virtual machines instantiated on this compute node:
Code Block |
---|
openstack server list --all --host cld-np-19.cloud.pd.infn.it |
Migrate the VMs instantiated on this compute node
Then migrate each VM instantiated on this compute node to other hypervisors.
To avoid migrating the same virtual machine multiple times, please migrate each VM to a compute node that has already been migrated to AlmaLinux9
The migration of a virtual machine usually requires a downtime (~ 15') of that VM, unless the VM was created using a volume. So before migrating the VM you must agree on this operation with the relevant owner.
E.g. to find the owner of a VM with UUID be832766-7996-4ccf-a9c1-5eec2e5f8fd3:
Code Block |
---|
[root@cld-ctrl-01 ~]# openstack server show be832766-7996-4ccf-a9c1-5eec2e5f8fd3 | grep user_id
| user_id | 7681252c430040e7bb96c7c4f7c88464 |
[root@cld-ctrl-01 ~]# openstack user show 7681252c430040e7bb96c7c4f7c88464
+---------------------+----------------------------------+
| Field | Value |
+---------------------+----------------------------------+
| domain_id | default |
| email | Ysabella.Ong@lnl.infn.it |
| enabled | True |
| id | 7681252c430040e7bb96c7c4f7c88464 |
| name | ysaong@infn.it |
| options | {} |
| password_expires_at | None |
+---------------------+----------------------------------+
|
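The two lookups above can be combined into one step. A sketch (the function name is mine; it assumes admin credentials are sourced, and uses `-f value -c` to select single fields) that prints just the owner's e-mail for a given VM UUID:

```shell
# Hypothetical helper: print the e-mail of the owner of a VM, given its UUID.
vm_owner_email() {
  local user_id
  user_id=$(openstack server show "$1" -f value -c user_id)
  openstack user show "$user_id" -f value -c email
}
# usage: vm_owner_email be832766-7996-4ccf-a9c1-5eec2e5f8fd3
```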
To migrate a virtual machine to another compute node, there are two possible procedures:
- Procedure to be used if the VM was created using a volume (the migration procedure doesn't require a downtime)
- Procedure to be used if the VM was created using ephemeral storage (the migration procedure requires a downtime)
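As a sketch only, the per-VM loop could look like the helper below. The name and flow are assumptions of mine: `openstack server migrate --wait` performs a cold migration, so it maps to the ephemeral-storage case; volume-backed VMs would instead follow the no-downtime procedure above.

```shell
# Hypothetical helper: cold-migrate every VM still on a host, one at a time.
# Run it only after agreeing on the downtime with each VM owner.
drain_host() {
  local host="$1"
  for vm in $(openstack server list --all --host "$host" -f value -c ID); do
    echo "migrating $vm"
    openstack server migrate --wait "$vm"   # blocks until the migration completes
  done
}
# usage: drain_host cld-np-19.cloud.pd.infn.it
```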
Check the compute node for orphan VMs
Check whether there are any VMs no longer known to Nova, using the libvirt client:
Code Block |
---|
virsh list --all |
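The comparison can also be done mechanically: for Nova-created domains the libvirt domain UUID coincides with the Nova server UUID, so any domain whose UUID Nova does not list is an orphan. A sketch (the function name and file handling are my own):

```shell
# Hypothetical helper: list libvirt domains whose UUID Nova no longer knows.
# $1 is a file with the Nova instance UUIDs for this host, produced e.g. with:
#   openstack server list --all --host cld-np-19.cloud.pd.infn.it -f value -c ID
find_orphans() {
  local nova_list="$1"
  virsh list --all --name | while read -r dom; do
    [ -n "$dom" ] || continue                        # virsh prints a trailing blank line
    uuid=$(virsh domuuid "$dom" | tr -d '[:space:]')
    grep -q "$uuid" "$nova_list" || echo "orphan: $dom ($uuid)"
  done
}
```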
Reinstall the node with AlmaLinux9
Once there are no more VMs on that compute node, reinstall the node with AlmaLinux9 using foreman (https://cld-config.cloud.pd.infn.it/users/login)
Go to Hosts → All hosts → cld-np-19.cloud.pd.infn.it
Edit the host in this way:
- The hostgroup must be hosts_all
- The Operating system has to be set to AlmaLinux 9.2 (if the Operating system form doesn't appear, check that the "manage host" option is selected. If not, select it)
- The media (dispositivo) has to be set to AlmaLinux
- The Partition table to be used must be one with LVM thin matching the disk structure you noted above. It could be:
- Kickstart default - swap 128GB - LVM thin (one disk with more partitions)
- Kickstart default - swap 128GB - 2 DISKS - LVM thin (two disks: sda with OS and sdb with /var/lib/nova/instances)
- Kickstart default - swap 128GB - 2 DISKS NVME - LVM thin (two disks sda with OS and nvme0n1p1 with /var/lib/nova/instances)
- Kickstart default - swap 128GB - 2 DISKS EFI NVME - LVM thin (two disks sda with OS and a biosboot partition for EFI and nvme0n1p1 with /var/lib/nova/instances)
- Kickstart default - swap 128GB - EFI- LVM thin (one disk with more partitions and a biosboot partition for EFI)
- ....
If the required partition table is missing, a new one can be created in the Foreman web interface: clone the most similar one and change the disk names or sizes, or add an EFI biosboot partition, ....
- Check whether the root password is already set; otherwise set it.
- Check that under "Interfaces" the interfaces eno1 and eno2 have the IPs (management and data)
Save the changes and then build the node. Open a remote console (via https://blade-cld-rmc.lan/cgi-bin/webcgi/login) to reboot the compute node.
When the node restarts after the update, make sure that SELinux is disabled:
Code Block |
---|
[root@cld-np-19 ~]# getenforce
Disabled |
Then do an update of the packages (probably only puppet will be updated):
Code Block |
---|
[root@cld-np-19 ~]# yum clean all
[root@cld-np-19 ~]# yum update -y |
Configure the data network
Once the node has been reinstalled with AlmaLinux9, configure the data network
If the compute node has a dedicated interface for the data LAN (like eno3 on cld-np-19)
Use the correct IP and interface name in the commands below, replacing "eno3" and "192.168.61.129" with yours:
Code Block |
---|
[root@cld-np-19 ~]# nmcli con add type ethernet ifname eno3   # replace eno3 with your interface name
[root@cld-np-19 ~]# nmcli con mod eno3 ipv4.method manual ipv4.addr "192.168.61.129/24"   # use the correct data-network IP
[root@cld-np-19 ~]# nmcli con mod eno3 connection.autoconnect true
[root@cld-np-19 ~]# nmcli con mod eno3 802-3-ethernet.mtu 9000
[root@cld-np-19 ~]# nmcli con up eno3
[root@cld-np-19 ~]# ip link set eno3 mtu 9000 |
If the interface uses tagged networks (VLAN 302 and VLAN 301 on the same interface), use these commands.
Use the correct IP and interface name in the commands below (replace "enp2s0f0.302" and "192.168.61.129"):
Code Block |
---|
[root@cld-nl-24 ~]# nmcli con add type vlan ifname enp2s0f0.302 dev enp2s0f0 id 302
[root@cld-nl-24 ~]# nmcli con mod vlan-enp2s0f0.302 ipv4.method manual ipv4.addr "192.168.61.129/24"
[root@cld-nl-24 ~]# nmcli con up vlan-enp2s0f0.302
[root@cld-nl-24 ~]# nmcli con mod vlan-enp2s0f0.302 802-3-ethernet.mtu 9000
[root@cld-nl-24 ~]# ip link set enp2s0f0.302 mtu 9000 |
The address to be used must be the original one
MTU must be 9000
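To verify that jumbo frames really work end to end on the data network, ping with the don't-fragment flag and a maximum-size payload. The sketch below only prints the command to run (the peer 192.168.61.254 is a placeholder of mine: use another host on the data network); the payload is the MTU minus the 20-byte IP header and 8-byte ICMP header:

```shell
# Largest ICMP payload that fits in a 9000-byte MTU without fragmentation.
mtu=9000
payload=$((mtu - 20 - 8))    # 8972 = 9000 - IP header (20) - ICMP header (8)
echo "ping -M do -s $payload -c 3 192.168.61.254"
```

If the ping fails with "message too long", the MTU is not 9000 somewhere along the path.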
Reinstall OpenManage
If this is a DELL host, reinstall DELL OpenManage, as explained here
Check on Nagios that the relevant checks get green
Configure the node as Openstack compute node using puppet
Then configure the node as Openstack compute node using puppet
Stop puppet:
Code Block | ||
---|---|---|
| ||
systemctl stop puppet |
In foreman move the host under the ComputeNode-Prod_Yoga.el9 hostgroup
Run puppet manually:
Code Block | ||
---|---|---|
| ||
puppet agent -t
|
If the configuration fails reporting
Code Block | ||
---|---|---|
| ||
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key |
then please issue:
Code Block | ||
---|---|---|
| ||
modprobe br_netfilter |
and then rerun puppet
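Note that a manual modprobe does not survive a reboot; if puppet does not already manage the module, it can be made persistent via systemd's modules-load.d mechanism. A sketch (the helper name and config path parameter are mine; the default path follows the modules-load.d convention):

```shell
# Hypothetical helper: load br_netfilter now and at every boot
# (systemd reads /etc/modules-load.d/ at startup).
persist_brnf() {
  local conf="${1:-/etc/modules-load.d/br_netfilter.conf}"
  modprobe br_netfilter || true        # may already be loaded
  echo br_netfilter > "$conf"
}
```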
If the procedure terminated without error, enable puppet and reboot the host
Code Block | ||
---|---|---|
| ||
systemctl enable puppet
shutdown -r now |
Enable the nagios check for lv
Log on cld-nagios.cloud.pd.infn.it
Code Block |
---|
cd /etc/nagios/objects/ |
Edit cloudcomputenodes.cfg (or the other file where this compute node is defined) and add a passive check:
Code Block |
---|
define service{
use LVS ; Name of service template to use
host_name cld-np-19
service_description LVS
freshness_threshold 28800
} |
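A passive check like the one above receives its results through Nagios' external command file. As a sketch of the expected format (only printed here; the command-file path /var/spool/nagios/cmd/nagios.cmd is the usual default, verify it in your nagios.cfg):

```shell
# Build a passive service-check result for the LVS service on cld-np-19
# (return code 0 = OK). On the Nagios host you would append it to the
# external command file instead of just printing it.
now=$(date +%s)
result="[$now] PROCESS_SERVICE_CHECK_RESULT;cld-np-19;LVS;0;LVS OK"
echo "$result"       # send with: echo "$result" > /var/spool/nagios/cmd/nagios.cmd
```

The freshness_threshold of 28800 seconds means Nagios expects such a result at least every 8 hours.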
Make sure there are no problems in the configuration:
Code Block |
---|
[root@cld-nagios objects]# nagios -v /etc/nagios/nagios.cfg |
and restart nagios to apply the new check:
Code Block |
---|
[root@cld-nagios objects]# systemctl restart nagios |
Some Checks
- Verify that the checks on Nagios get green (only the "VM network" one should stay in error)
- Verify that ssh from root@cld-log to root@cld-np-19 and to root@cld-np-19.cloud.pd.infn.it works without requiring a password
- Verify that ssh from nagios@cld-nagios to nagios@cld-np-19 and to nagios@cld-np-19.cloud.pd.infn.it works without requiring a password
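The ssh checks above can be run non-interactively: with BatchMode, ssh fails instead of prompting for a password. A sketch (the function name is mine; run it as root on cld-log and as nagios on cld-nagios):

```shell
# Hypothetical helper: verify passwordless ssh to each given host.
check_ssh() {
  for h in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true; then
      echo "$h: ok"
    else
      echo "$h: FAILED"
    fi
  done
}
# usage: check_ssh cld-np-19 cld-np-19.cloud.pd.infn.it
```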
Re-enable the compute node
Code Block | ||
---|---|---|
| ||
openstack compute service set --enable cld-np-19.cloud.pd.infn.it nova-compute
|
Within 24 hours, the "VM network" nagios check should get green
Run the aggregate_manage.sh script to add the node in the relevant hostgroups, e.g.:
Code Block | ||
---|---|---|
| ||
/usr/local/bin/aggregate_manage.sh add cld-np-19.cloud.pd.infn.it INFN |
...
...