  • Make sure the nouveau driver is blacklisted in one of the /etc/modprobe.d/*.conf files (if it is not, create /etc/modprobe.d/gpu-blacklist.conf with the two lines below):
blacklist nouveau
options nouveau modeset=0

Then rebuild the initramfs and reboot:

$ dracut --force; reboot
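The blacklist step above can be scripted. The following is a minimal sketch, not a definitive procedure: the helper names write_nouveau_blacklist and nouveau_loaded are hypothetical, and the default path is the gpu-blacklist.conf file suggested above. It has to be run as root to write under /etc/modprobe.d.

```shell
#!/bin/sh
# Hypothetical helper: write the two blacklist lines to the given file
# (default: the gpu-blacklist.conf file suggested in the text).
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/gpu-blacklist.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Hypothetical helper: return 0 if nouveau appears in a module list
# in /proc/modules format (default: /proc/modules itself).
nouveau_loaded() {
    grep -q '^nouveau ' "${1:-/proc/modules}"
}

# Usage (as root), kept commented out so that sourcing this file changes nothing:
#   write_nouveau_blacklist && dracut --force && reboot
# After the reboot:
#   nouveau_loaded && echo "nouveau is still loaded" || echo "nouveau is not loaded"
```

Reading /proc/modules is equivalent to checking the output of lsmod and avoids depending on its column layout.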
  • Identify the GPU model(s) with the lspci command, e.g.:
[root@cld-np-gpu-03 ~]# lspci -nn | grep NVI
ca:00.0 3D controller [0302]: NVIDIA Corporation GA100GL [A30 PCIe] [10de:20b7] (rev a1)

[root@cld-dfa-gpu-04 ~]# lspci -nn | grep NVI
17:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
ca:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)

The first server has one A30 GPU card; the second has two A2 GPU cards.
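On hosts with several cards it can be handy to summarise the lspci output per model. The sketch below assumes the line format shown above, with the NVIDIA PCI vendor ID 10de in the last bracket pair; the function name count_nvidia_gpus is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nn` output on stdin and print
# "<count> <model>" for each distinct NVIDIA device model.
count_nvidia_gpus() {
    grep -i 'NVIDIA' \
        | sed -n 's/.*NVIDIA Corporation \(.*\) \[10de:[0-9a-f]*\].*/\1/p' \
        | sort | uniq -c | sed 's/^ *//'
}

# Usage on a GPU host:
#   lspci -nn | count_nvidia_gpus
```

For the second server above this would print `2 GA107GL [A2 / A16]`. Note that lspci cannot tell an A2 from an A16 here; nvidia-smi reports the exact model once the driver is installed.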

Look up the driver available for the given GPU card at https://www.nvidia.com/Download/index.aspx?lang=en-us. Once you have selected the driver version for your OS, follow the instructions there and install the rpm. If something goes wrong with the rpm, try the .run script instead, as described in https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#runfile, e.g.:

# install the build prerequisites, if not already installed
yum install gcc kernel-devel
BASE_URL=https://us.download.nvidia.com/tesla
DRIVER_VERSION=515.65.01
curl -fSsl -O $BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run

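# if something goes wrong, try building a precompiled kernel interface for the
# running kernel (this creates a new "-custom" .run file), then run that file: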
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run --add-this-kernel
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION-custom.run
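The .run installer builds a kernel module, so it needs kernel-devel for the kernel that is actually running. The page above does not say that a mismatch is what goes wrong; treating it as one thing worth ruling out is an assumption. A minimal sketch of such a check follows, with a hypothetical function name devel_matches_kernel.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the running kernel release ($1) is among
# the installed kernel-devel releases, given one per line on stdin.
devel_matches_kernel() {
    grep -Fxq -- "$1"
}

# Usage on an RPM-based host (commented out so sourcing has no side effects):
#   rpm -q kernel-devel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' \
#       | devel_matches_kernel "$(uname -r)" \
#       || echo "no kernel-devel package matches the running kernel $(uname -r)"
```

If the check fails, updating the kernel and kernel-devel together and rebooting before rerunning the installer is one way to bring them back in line.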
  • Check that the driver works (no reboot should be needed), e.g.:
[root@cld-dfa-gpu-04 ~]# nvidia-smi
Fri Sep  2 17:37:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A2           Off  | 00000000:17:00.0 Off |                    0 |
|  0%   33C    P0    19W /  60W |      0MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A2           Off  | 00000000:CA:00.0 Off |                    0 |
|  0%   35C    P0    18W /  60W |      0MiB / 15356MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
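For a scripted check, nvidia-smi can also print one CSV line per GPU through its --query-gpu and --format options. The sketch below compares the number of GPUs reported with the number expected from lspci; the function name expect_gpu_count is hypothetical, and the expected count of 2 in the usage line is just the value for the second server above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin contains exactly $1 non-empty
# lines, i.e. one line per GPU reported by nvidia-smi.
expect_gpu_count() {
    found=$(grep -c . || true)   # grep exits 1 on zero matches but still prints 0
    [ "$found" -eq "$1" ]
}

# Usage on a GPU host (commented out so sourcing has no side effects):
#   nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader \
#       | expect_gpu_count 2 && echo "all GPUs visible"
```

A check like this fits easily into a provisioning or monitoring script, where parsing the full table output would be fragile.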




