Live migration for instances created on ephemeral storage


This procedure must be used only when both the source and the target hypervisor use LVM thin provisioning.

Let's suppose that we want to migrate a VM with ID dad2d364-3432-454d-a72d-910067e0187d from compute-02 to compute-04.

It's good practice to check that the thin pool has sufficient free space before taking snapshots. In any case, run "fstrim" on the LVM thin volume that holds /var/lib/nova/instances before taking snapshots (see the Data% and Meta% columns):

[root@compute-02 instances]# lvs
  LV      VG    Attr       LSize  Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvmnova lvmvg Vwi-aotz-- <1.82t pool00        99.48
  pool00  lvmvg twi-aotz-- <1.82t               99.48  24.80

[root@compute-02 instances]# fstrim /var/lib/nova/instances/

[root@compute-02 instances]# lvs
  LV      VG    Attr       LSize  Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvmnova lvmvg Vwi-aotz-- <1.82t pool00        0.16
  pool00  lvmvg twi-aotz-- <1.82t               0.16   11.04
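
The space check above can be scripted. The following is a minimal sketch, not part of the original procedure: the function names and the 80% threshold are assumptions chosen for illustration, and the pool name lvmvg/pool00 is taken from the listing above. It reads Data% through the "data_percent" report field of lvs and compares it with the threshold.

```shell
# Succeed when the used percentage is strictly below the limit.
# awk is used because Data% is a decimal number (e.g. 99.48).
pool_has_room() {
    local used="$1" limit="$2"
    awk -v u="$used" -v l="$limit" 'BEGIN { exit !(u + 0 < l + 0) }'
}

# Print the Data% value of a thin pool (needs root and the LVM tools).
pool_data_percent() {
    lvs --noheadings -o data_percent "$1" | tr -d ' '
}

# Usage on the source hypervisor (80 is an arbitrary example threshold):
#   pool_has_room "$(pool_data_percent lvmvg/pool00)" 80 \
#       || echo "thin pool nearly full: run fstrim before taking snapshots"
```

With the values shown above, 99.48 would fail the check and 0.16 (after fstrim) would pass.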


First of all, on the source hypervisor, let's take an LV snapshot:


[root@compute-02 ~]# lvcreate -s --name lvmnova.dad2d364-3432-454d-a72d-910067e0187d lvmvg/lvmnova
  WARNING: Sum of all thin volume sizes (335.75 GiB) exceeds the size of thin pool lvmvg/pool00 and the size of whole volume group (167.88 GiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "lvmnova.dad2d364-3432-454d-a72d-910067e0187d" created.
[root@compute-02 ~]# lvs
  LV                                           VG    Attr       LSize    Pool   Origin  Data%  Meta%  Move Log Cpy%Sync Convert
  lvmnova                                      lvmvg Vwi-aotz-- <167.88g pool00         6.11                                   
  lvmnova.dad2d364-3432-454d-a72d-910067e0187d lvmvg Vwi---tz-k <167.88g pool00 lvmnova                                        
  pool00                                       lvmvg twi-aotz-- <167.88g                6.14   11.04                           
[root@compute-02 ~]# 


Let's set the snapshot to read-only mode:

[root@compute-02 ~]# lvchange -p r lvmvg/lvmnova.dad2d364-3432-454d-a72d-910067e0187d
  Logical volume lvmvg/lvmnova.dad2d364-3432-454d-a72d-910067e0187d changed.
[root@compute-02 ~]# 
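
The two snapshot steps can be wrapped in a small helper. This is only a sketch under stated assumptions: the names run, snapshot_instance and DRY_RUN are hypothetical, and the snapshot naming scheme (LV name, a dot, then the instance UUID) simply mirrors the commands above. By default it only prints the commands so they can be reviewed first.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a thin snapshot named <lv>.<uuid> and set it read-only.
snapshot_instance() {
    local vg="$1" lv="$2" uuid="$3"
    local snap="${lv}.${uuid}"
    run lvcreate -s --name "$snap" "${vg}/${lv}" &&
    run lvchange -p r "${vg}/${snap}"
}

# Usage (prints the two commands; set DRY_RUN=0 to really run them as root):
#   snapshot_instance lvmvg lvmnova dad2d364-3432-454d-a72d-910067e0187d
```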


Let's migrate the instance using the "openstack server migrate" command with the '--block-migration' option, after having sourced the admin openrc script, e.g.:

[root@controller-01 ~]# openstack server migrate --os-compute-api-version 2.30 --live --host compute-04.cloud.pd.infn.it --block-migration dad2d364-3432-454d-a72d-910067e0187d 
[root@controller-01 ~]# 
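
The migrate command returns immediately, so it can help to poll the instance until it is running on the target. The sketch below is an assumption-laden example, not part of the original procedure: the function names, the 10 second interval and the default of 60 attempts are made up, and it assumes that a successful migration ends with status ACTIVE and with the admin-only field OS-EXT-SRV-ATTR:host equal to the target host name as Nova reports it.

```shell
# True when the instance is ACTIVE and already on the target host.
migration_finished() {
    local status="$1" host="$2" target="$3"
    [ "$status" = "ACTIVE" ] && [ "$host" = "$target" ]
}

# Poll the instance until it is ACTIVE on the target, in ERROR, or we give up.
wait_for_migration() {
    local uuid="$1" target="$2" tries="${3:-60}" status host
    while [ "$tries" -gt 0 ]; do
        status=$(openstack server show -f value -c status "$uuid")
        host=$(openstack server show -f value -c OS-EXT-SRV-ATTR:host "$uuid")
        if migration_finished "$status" "$host" "$target"; then
            echo "instance $uuid is ACTIVE on $host"
            return 0
        fi
        if [ "$status" = "ERROR" ]; then
            echo "instance $uuid is in ERROR state" >&2
            return 1
        fi
        tries=$((tries - 1))
        sleep 10
    done
    echo "gave up waiting for instance $uuid" >&2
    return 1
}

# Usage, after sourcing the admin openrc script:
#   wait_for_migration dad2d364-3432-454d-a72d-910067e0187d compute-04.cloud.pd.infn.it
```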




The command fails if the source and target hypervisors aren't compatible. The error would be something like:

Unacceptable CPU info: CPU doesn't have compatibility

In this case, please find another compatible hypervisor, or use cold migration (i.e. the VM will be powered off during the migration).


If the block live migration fails for some reason and leaves you with a "corrupted VM", stop the instance and then recover its disk from the snapshot, following the steps below:

[root@controller-01 ~]# nova stop dad2d364-3432-454d-a72d-910067e0187d 
[root@controller-01 ~]# 



## Activate snapshot

[root@compute-02 ~]# lvchange -ay -K lvmvg/lvmnova.dad2d364-3432-454d-a72d-910067e0187d

## Mount snapshot

[root@compute-02 ~]# mkdir /mnt/snap
[root@compute-02 ~]# mount /dev/lvmvg/lvmnova.dad2d364-3432-454d-a72d-910067e0187d /mnt/snap -o ro,nouuid,norecovery 
   
[root@compute-02 ~]# ll /mnt/snap/dad2d364-3432-454d-a72d-910067e0187d/
console.log  disk         disk.info    


## Copy disk image to target hypervisor

[root@compute-02 ~]# scp /mnt/snap/dad2d364-3432-454d-a72d-910067e0187d/disk compute-04.cloud.pd.infn.it:/var/lib/nova/instances/dad2d364-3432-454d-a72d-910067e0187d
The authenticity of host 'compute-04.cloud.pd.infn.it (192.168.60.89)' can't be established.
ED25519 key fingerprint is SHA256:Fqq7jDeH98KxJ3J9QCFnBaWdiESy7gZ9+toHWU8i1jA.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:1: compute-04
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'compute-04.cloud.pd.infn.it' (ED25519) to the list of known hosts.
root@compute-04.cloud.pd.infn.it's password: 
disk  
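
Before unmounting the snapshot, it may be worth verifying that the copy is identical to the source. The following sketch is an optional extra, not part of the original procedure: the function names are hypothetical, and the remote path in the usage comment assumes that scp placed the file as "disk" inside the instance directory on the target.

```shell
# Print only the SHA-256 digest of a file.
file_sha256() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Succeed when two local files have the same SHA-256 digest.
same_content() {
    [ "$(file_sha256 "$1")" = "$(file_sha256 "$2")" ]
}

# Usage sketch for the remote copy (the paths are assumptions, adapt them):
#   uuid=dad2d364-3432-454d-a72d-910067e0187d
#   src=$(file_sha256 "/mnt/snap/$uuid/disk")
#   dst=$(ssh compute-04.cloud.pd.infn.it sha256sum "/var/lib/nova/instances/$uuid/disk" | awk '{ print $1 }')
#   [ "$src" = "$dst" ] || echo "checksum mismatch, copy the disk again"
```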

[root@compute-02 ~]# umount /mnt/snap
[root@compute-02 ~]#  

## Deactivate snapshot

[root@compute-02 ~]# lvchange -an -K lvmvg/lvmnova.dad2d364-3432-454d-a72d-910067e0187d
[root@compute-02 ~]# lvscan 
  ACTIVE            '/dev/lvmvg/pool00' [<167.88 GiB] inherit
  ACTIVE            '/dev/lvmvg/lvmnova' [<167.88 GiB] inherit
  inactive          '/dev/lvmvg/lvmnova.dad2d364-3432-454d-a72d-910067e0187d' [<167.88 GiB] inherit
[root@compute-02 ~]# 
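
The lvscan check can also be done by a script. This is a minimal sketch: lv_state is a hypothetical helper, and it assumes the lvscan output format shown above (state in the first column, quoted device path in the second).

```shell
# Read lvscan output on stdin and print the state (ACTIVE or inactive)
# reported for the given device path.
lv_state() {
    local path="$1"
    awk -v p="'$path'" '$2 == p { print $1 }'
}

# Usage on the source hypervisor (as root):
#   lvscan | lv_state /dev/lvmvg/lvmnova.dad2d364-3432-454d-a72d-910067e0187d
```

For the listing above, this prints "inactive" for the snapshot and "ACTIVE" for /dev/lvmvg/lvmnova.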
    
                                                                                                                                                                           




## Start the instance again

[root@controller-01 ~]# nova start dad2d364-3432-454d-a72d-910067e0187d
Request to start server dad2d364-3432-454d-a72d-910067e0187d has been accepted.
[root@controller-01 ~]# 



Cold migration for instances created on ephemeral storage

With cold migration, the instance will be rebooted.

E.g. to migrate the instance with UUID e1914906-7e33-4d30-88f0-785130fdd85d to the compute node cld-np-14:

nova migrate e1914906-7e33-4d30-88f0-785130fdd85d --host cld-np-14.cloud.pd.infn.it

When the migration has been completed, the status of the server is "VERIFY_RESIZE". You will then need to confirm the operation:

nova resize-confirm e1914906-7e33-4d30-88f0-785130fdd85d
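
The wait-then-confirm logic can be expressed as a small decision helper. This is only a sketch: cold_migration_action is a hypothetical name, and treating every status other than VERIFY_RESIZE and ERROR as "keep waiting" is a simplifying assumption.

```shell
# Map the current server status to the next action of a cold migration.
cold_migration_action() {
    case "$1" in
        VERIFY_RESIZE) echo confirm ;;   # migration done, confirmation needed
        ERROR)         echo fail ;;      # something went wrong, check the logs
        *)             echo wait ;;      # still in progress (assumption)
    esac
}

# Usage, after sourcing the admin openrc script:
#   uuid=e1914906-7e33-4d30-88f0-785130fdd85d
#   status=$(openstack server show -f value -c status "$uuid")
#   [ "$(cold_migration_action "$status")" = confirm ] && nova resize-confirm "$uuid"
```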

For instances booted from volumes


To migrate an instance from a compute node to another one, you can use the "openstack server migrate" command, after having sourced the admin openrc script. E.g.:

openstack server migrate --os-compute-api-version 2.30 --live --host cld-nl-19.cloud.pd.infn.it 298c94e2-e4ec-478c-8347-c77f7e1cc8df


If the source and target hypervisors aren't compatible, you will get an error message. In this case, find another compatible target hypervisor or use cold migration (see above).


Troubleshooting


If the source and target compute nodes have different SELinux settings (e.g. 'disabled' vs 'permissive'), live migration will fail. No errors will be reported when issuing the command, and no errors will be reported in the output of the "openstack server event list <uuid>" command. In the nova log of the compute node you will see something like:

libvirtError: unsupported configuration: Unable to find security driver for model selinux
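
Since this failure is silent, it can be worth comparing the SELinux mode of the two nodes before migrating. The sketch below is an illustration under stated assumptions: selinux_modes_match is a hypothetical helper, the host names are the ones used in the examples above, and flagging any difference in the "getenforce" output is a deliberately conservative choice (the text above only documents the 'disabled' vs 'permissive' case).

```shell
# Succeed when both nodes report the same, non-empty SELinux mode
# (the comparison ignores upper/lower case).
selinux_modes_match() {
    local a b
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage from a host that can ssh to both compute nodes:
#   src=$(ssh compute-02 getenforce)
#   dst=$(ssh compute-04 getenforce)
#   selinux_modes_match "$src" "$dst" || echo "SELinux mode differs: $src vs $dst"
```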