This guide explains how to reinstall a compute node, moving it from CentOS 8 Stream-Yoga to AlmaLinux9-Yoga.


The following instructions use cld-np-19, an INFN node, as the node to be reinstalled.


First of all, take note of the IP addresses used by the node on the management and data networks (192.168.60.x and 192.168.61.x) and of the relevant interfaces:



[root@cld-np-19 ~]# nmcli 





eno1: connected to System eno1
        "Intel X710"
        ethernet (i40e), 34:E6:D7:F5:53:0D, hw, port e4434b3a8890, mtu 1500
        ip4 default
        inet4 192.168.60.129/24
        route4 192.168.60.0/24 metric 100
        route4 default via 192.168.60.254 metric 100
        inet6 fe80::36e6:d7ff:fef5:530d/64
        route6 fe80::/64 metric 1024

eno3: connected to eno3
        "Intel X710"
        ethernet (i40e), 34:E6:D7:F5:53:0F, hw, port e4434b3a8892, mtu 9000
        inet4 192.168.61.129/24
        route4 192.168.61.0/24 metric 101
        inet6 fe80::65:ec7c:7325:3bec/64
        route6 fe80::/64 metric 1024


...

...
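
To keep a record of these values before the reinstallation, something like the following could be used. This is a minimal sketch: the function name list_cloud_addrs and the output file name are hypothetical, and it assumes the one-line output format of `ip -o -4 addr show` from iproute2.

```shell
# Print "interface address/prefix" for every IPv4 address on the
# management (192.168.60.x) or data (192.168.61.x) network.
# Reads `ip -o -4 addr show` output on stdin, so it can be tried offline.
list_cloud_addrs() {
    awk '$3 == "inet" && $4 ~ /^192\.168\.6[01]\./ { print $2, $4 }'
}

# Usage on the node (the output file name is just an example):
#   ip -o -4 addr show | list_cloud_addrs > /root/cld-np-19-addresses.txt
```

The MTU of each interface is not part of this output; `ip -o link show` prints it in the same one-line style.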




Then disable the compute node, so that no new VMs are instantiated on it:


openstack compute service set --disable cld-np-19.cloud.pd.infn.it nova-compute

To check if the compute node was actually disabled:

openstack compute service list
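
The check can also be narrowed to this one node. The following is a sketch, not part of the original procedure: the function name compute_service_disabled is hypothetical, and it assumes that the Status column of the listing reads "disabled" for a disabled service.

```shell
# Return 0 if nova-compute on the given host is reported as disabled.
compute_service_disabled() {
    local host="$1" status
    status=$(openstack compute service list --host "$host" \
                 --service nova-compute -f value -c Status)
    [ "$status" = "disabled" ]
}

# Usage:
#   compute_service_disabled cld-np-19.cloud.pd.infn.it && echo "disabled"
```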




Then find all the virtual machines instantiated on this compute node:


openstack server list --all --host cld-np-19.cloud.pd.infn.it 



Then migrate each VM instantiated on this compute node to other hypervisors.

To avoid migrating the same virtual machine multiple times, migrate each VM to a compute node that has already been reinstalled with AlmaLinux9.

The migration of a virtual machine usually requires about 15 minutes of downtime for that VM, unless the VM was created using a volume. Before migrating a VM you must therefore agree on the operation with its owner.


E.g. to find the owner of a VM with UUID be832766-7996-4ccf-a9c1-5eec2e5f8fd3:



[root@cld-ctrl-01 ~]# openstack server show be832766-7996-4ccf-a9c1-5eec2e5f8fd3 | grep user_id
| user_id                             | 7681252c430040e7bb96c7c4f7c88464                            |
[root@cld-ctrl-01 ~]# openstack user show 7681252c430040e7bb96c7c4f7c88464
+---------------------+----------------------------------+
| Field               | Value                            |
+---------------------+----------------------------------+
| domain_id           | default                          |
| email               | Ysabella.Ong@lnl.infn.it         |
| enabled             | True                             |
| id                  | 7681252c430040e7bb96c7c4f7c88464 |
| name                | ysaong@infn.it                   |
| options             | {}                               |
| password_expires_at | None                             |
+---------------------+----------------------------------+
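
When the node hosts many VMs, the two lookups above can be chained for every server. This is a sketch under assumptions: the function name list_vm_owners is hypothetical, and it relies on the user_id and email fields shown in the output above.

```shell
# Print "<server id> <owner email>" for every VM hosted on the given node.
list_vm_owners() {
    local host="$1" id uid email
    openstack server list --all-projects --host "$host" -f value -c ID |
    while read -r id; do
        # </dev/null keeps the inner commands from consuming the ID list.
        uid=$(openstack server show "$id" -f value -c user_id < /dev/null)
        email=$(openstack user show "$uid" -f value -c email < /dev/null)
        printf '%s %s\n' "$id" "$email"
    done
}

# Usage:
#   list_vm_owners cld-np-19.cloud.pd.infn.it
```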




To migrate a virtual machine to another compute node, there are two possible procedures:



Once there are no more VMs on that compute node, reinstall the node with AlmaLinux9 using Foreman:


  • The hostgroup must be hosts_all
  • The kickstart to be used must be TBC
  • TBC


When the node restarts after the reinstallation, make sure that SELinux is disabled:


[root@cld-np-19 ~]# getenforce 
Disabled



Then update the packages (probably only puppet will be updated):



[root@cld-np-19 ~]#  yum clean all
[root@cld-np-19 ~]#  yum update -y






Once the node has been reinstalled with AlmaLinux9, configure the data network


TBC


The address to be used must be the original one

MTU must be 9000
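
Once the data network is configured, the MTU requirement can be verified with a small check. This is only a verification sketch, not the configuration procedure itself: the function name check_mtu is hypothetical, and eno3 is assumed to still be the data interface, as in the nmcli output taken before the reinstallation.

```shell
# Return 0 if the interface has the expected MTU.
# The third argument (sysfs root) exists only to make the check testable.
check_mtu() {
    local iface="$1" expected="$2" root="${3:-/sys/class/net}" actual
    actual=$(cat "$root/$iface/mtu") || return 1
    [ "$actual" -eq "$expected" ]
}

# Usage:
#   check_mtu eno3 9000 && echo "MTU OK"
```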



If this is a DELL host, reinstall DELL OpenManage, as explained here


Check in Nagios that the relevant checks turn green.



Then configure the node as an OpenStack compute node using puppet.



Stop puppet:

systemctl stop puppet

In Foreman, move the host to the ComputeNode-Prod_Yoga hostgroup.

Run puppet manually:

puppet agent -t
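
Whether the run succeeded can be read from its exit code. The following is a sketch with a hypothetical function name, run_puppet_once. It relies on the detailed exit codes that -t (--test) turns on: 0 means no changes, 2 means changes were applied, 4 and 6 mean failures, and 1 means the run itself failed.

```shell
# Run puppet once and return 0 only if the run had no failures.
run_puppet_once() {
    local rc=0
    puppet agent -t || rc=$?
    case "$rc" in
        0|2) return 0 ;;
        *)   echo "puppet run failed (exit code $rc)" >&2
             return 1 ;;
    esac
}
```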

If the configuration fails with the following errors:


error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key  
error: "net.bridge.bridge-nf-call-iptables" is an unknown key  
error: "net.bridge.bridge-nf-call-arptables" is an unknown key


then load the br_netfilter kernel module:


modprobe br_netfilter

and then rerun puppet
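
The module load can be made conditional, so that it is safe to repeat. This is a sketch: the function name ensure_br_netfilter is hypothetical, and it assumes that /proc/sys/net/bridge/bridge-nf-call-iptables exists only while br_netfilter is loaded.

```shell
# Load br_netfilter only if the bridge sysctl keys are missing.
# The optional argument (path of the key) exists only to make this testable.
ensure_br_netfilter() {
    local key="${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}"
    [ -e "$key" ] || modprobe br_netfilter
}
```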


If the procedure completes without errors, enable puppet and reboot the host:

systemctl enable puppet
shutdown -r now

Enable the nagios check for lv

Log in to cld-nagios.cloud.pd.infn.it:

 cd /etc/nagios/objects/

Edit cloudcomputenodes.cfg (or the other file where this compute node is defined) and add a passive check:

define service{
        use                             LVS       ; Name of service template to use
        host_name                       cld-np-19
        service_description             LVS
        freshness_threshold             28800
        }

Make sure there are no problems in the configuration:

[root@cld-nagios objects]# nagios -v /etc/nagios/nagios.cfg

and restart nagios to apply the new check:

 [root@cld-nagios objects]# systemctl restart nagios
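
The two commands above can be tied together so that nagios is never restarted with a broken configuration. This is a sketch with a hypothetical function name, safe_restart_nagios; it assumes that `nagios -v` returns a non-zero exit code when the configuration check fails.

```shell
# Restart nagios only if the configuration check passes.
safe_restart_nagios() {
    local cfg="${1:-/etc/nagios/nagios.cfg}"
    if nagios -v "$cfg" > /dev/null; then
        systemctl restart nagios
    else
        echo "nagios configuration check failed; not restarting" >&2
        return 1
    fi
}
```
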
  • Enable the checks in Nagios for this host

  • Wait until all checks are OK, in particular the VM network and volume check. Remember to enable the node temporarily, just long enough to force a run of that check from Nagios; otherwise the check will fail.

  • Add this node in the following scripts on cld-ctrl-01: /usr/local/bin/display_usage_of_hypervisors.sh, /usr/local/bin/host_aggregate.sh, /usr/local/bin/new_project.sh, /usr/local/bin/free_resources_compute_nodes.sh
  • Add this node in the /etc/cron.daily/vm-log.sh script on cld-log
  • Create the directory /var/disk-iscsi/qemu/cld-np-19 on cld-log
  • Verify that ssh from root@cld-log to root@cld-np-19 works without requiring a password
  • Stop puppet on the 2 controller nodes:

    systemctl stop puppet
  • Add the new compute node in cld-config:/var/puppet/puppet_yoga/controller_yoga/templates/aai_settings.py.erb

  • Run puppet on the first controller node (this will trigger a restart of httpd):

    puppet agent -t
  • Run puppet on the second controller node (this will trigger a restart of httpd):

    puppet agent -t
    
  • Start puppet on the two controller nodes:

    systemctl start puppet 
    
  • Enable the host:

    openstack compute service set --enable cld-np-19.cloud.pd.infn.it nova-compute
    
  • Add the host to the 'admin' aggregate


  • Run the aggregate_manage.sh script to add the node to the relevant hostgroups, e.g.:

    /usr/local/bin/aggregate_manage.sh add cld-np-19.cloud.pd.infn.it INFN
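
Two of the steps in the list above can be sketched as shell functions. Both function names are hypothetical. The first is meant to run on cld-log and uses ssh BatchMode, which makes ssh fail instead of prompting for a password. The second is meant to run on a controller node and assumes that `openstack aggregate add host` is the way the host is added to the 'admin' aggregate at this site.

```shell
# Return 0 if root on the given node can be reached without a password.
check_passwordless_ssh() {
    ssh -o BatchMode=yes "root@$1" true
}

# Enable nova-compute on the host, add it to the 'admin' aggregate,
# then print the resulting service status.
enable_compute_node() {
    local host="$1"
    openstack compute service set --enable "$host" nova-compute &&
    openstack aggregate add host admin "$host" > /dev/null &&
    openstack compute service list --host "$host" \
        --service nova-compute -f value -c Status
}

# Usage:
#   check_passwordless_ssh cld-np-19            # on cld-log
#   enable_compute_node cld-np-19.cloud.pd.infn.it   # on a controller
```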


