Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
[root@cld-dfa-gpu-04 Samples]# nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      SYS     0,2,4,6,8,10    0
GPU1    SYS      X      1,3,5,7,9,11    1

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

the CPU affinity shown above is truncated, another way to show it is looking at the file below:

Code Block
[root@cld-dfa-gpu-04 ~]# cat /sys/bus/pci/devices/0000:17:00.0/local_cpulist
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110

[root@cld-dfa-gpu-04 ~]#  cat /sys/bus/pci/devices/0000:ca:00.0/local_cpulist
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111

where 0000:17:00.0 and 0000:ca:00.0 identify the GPUs through their Bus ID (found e.g. with "lspci | grep NVI" command).

  • to verify which GPU is attached to a VM and its associated NUMA node, execute from the hypervisor the following commands:

    Code Block
    [root@cld-dfa-gpu-04 ~]# virsh dumpxml instance-001f81d1
    ...
       <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x17' slot='0x00' function='0x0'/>
          </source> 
    
    [root@cld-dfa-gpu-04 ~]# virsh nodedev-dumpxml pci_0000_17_00_0
    ...
        <product id='0x25b6'>GA107GL [A2 / A16]</product>
        <vendor id='0x10de'>NVIDIA Corporation</vendor>
        <capability type='virt_functions' maxCount='16'/>
        <iommuGroup number='20'>
          <address domain='0x0000' bus='0x17' slot='0x00' function='0x0'/>
        </iommuGroup>
        <numa node='0'/>

    i.e. look at the hostdev with the driver vfio to find the source PCI bus ID, and then look in more details at the pci_0000_17_00_0 device to find the GPU model and the NUMA node associated to it.

  • when we use a flavor created e.g. with 8 VCPUs and the properties 

    Code Block
    --property aggregate_instance_extra_specs:size='cld-dfa-gpu-04' --property hw:numa_nodes='1' --property pci_passthrough:alias='A2:1'


    we can verify that the CPU affinity correspond to the right NUMA node by issuing the command below. This will ensure that VCPUs pin the physical CPUs of the socket directly connected with the GPU, optimizing CPU-GPU communication.

    Code Block
    [root@cld-dfa-gpu-04 ~]# virsh vcpupin instance-002013e4 (VM with GPU on NUMA node 0)
     VCPU   CPU Affinity
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     0      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
     1      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
     2      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
     3      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
     4      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
     5      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
     6      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
     7      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
     
    [root@cld-dfa-gpu-04 ~]# virsh vcpupin instance-002013f0 (VM with GPU on NUMA node 1)
     VCPU   CPU Affinity
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     0      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111
     1      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111
     2      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111
     3      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111
     4      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111
     5      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111
     6      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111
     7      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111