blacklist nouveau
options nouveau modeset=0
and then execute:
$ dracut --force; reboot
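After the reboot it may be worth confirming that the nouveau module is no longer loaded. The following is a minimal sketch; the helper name `nouveau_loaded` is hypothetical and simply scans `lsmod`-style output for a module called `nouveau`.

```shell
# Hypothetical helper: reads lsmod-style output on stdin and
# returns 0 if a module named "nouveau" is listed, 1 otherwise.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

A possible use after the reboot: `if lsmod | nouveau_loaded; then echo "nouveau is still loaded"; else echo "nouveau is disabled"; fi`.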
[root@cld-np-gpu-03 ~]# lspci -nn | grep NVI
ca:00.0 3D controller [0302]: NVIDIA Corporation GA100GL [A30 PCIe] [10de:20b7] (rev a1)
[root@cld-dfa-gpu-04 ~]# lspci -nn | grep NVI
17:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
ca:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
The first server has one A30 GPU card; the second has two A2 GPU cards.
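When many servers have to be checked, the same information can be extracted in a script-friendly form. This is only a sketch: the function name `list_nvidia_devices` is an assumption, and it relies on the NVIDIA PCI vendor ID `10de` that appears in the `lspci -nn` output above.

```shell
# Hypothetical helper: reads `lspci -nn` output on stdin and prints
# "<pci-slot> <device-id>" for every device with NVIDIA vendor ID 10de.
list_nvidia_devices() {
    sed -n 's/^\([0-9a-f:.]*\) .*\[10de:\([0-9a-f]\{4\}\)\].*$/\1 \2/p'
}
```

For example, `lspci -nn | list_nvidia_devices | wc -l` would give the number of NVIDIA devices on the host. `lspci -nn -d 10de:` applies the same vendor filter directly in `lspci`.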
Find the driver for your GPU card at https://www.nvidia.com/Download/index.aspx?lang=en-us. Once you have selected the driver version for your OS, follow the instructions there and install the rpm. If something goes wrong with the rpm, try the .run script as described in https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#runfile, e.g.:
# yum install gcc kernel-devel   (if not already installed)
BASE_URL=https://us.download.nvidia.com/tesla
DRIVER_VERSION=515.65.01
curl -fSsl -O $BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
# if something goes wrong, try:
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run --add-this-kernel
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION-custom.run
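A frequent reason for the .run installer to fail is that the installed kernel-devel package does not match the running kernel, so the kernel module cannot be built. The sketch below checks for this before running the installer; the helper name `have_kernel_headers` is hypothetical, and the default path `/usr/src/kernels` assumes a RHEL-like distribution.

```shell
# Hypothetical helper: succeeds if a kernel source/header directory for the
# given kernel release exists under the given root.
# $1 = kernel release (e.g. the output of `uname -r`)
# $2 = source root (optional; /usr/src/kernels is assumed, as on RHEL-like systems)
have_kernel_headers() {
    release=$1
    src_root=${2:-/usr/src/kernels}
    [ -d "$src_root/$release" ]
}
```

A possible use: `have_kernel_headers "$(uname -r)" || yum install "kernel-devel-$(uname -r)"`.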
[root@cld-dfa-gpu-04 ~]# nvidia-smi
Fri Sep  2 17:37:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A2           Off  | 00000000:17:00.0 Off |                    0 |
|  0%   33C    P0    19W /  60W |      0MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A2           Off  | 00000000:CA:00.0 Off |                    0 |
|  0%   35C    P0    18W /  60W |      0MiB / 15356MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
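The same check can be scripted using the CSV query mode of `nvidia-smi`. The sketch below is an illustration only: the helper name `check_driver_version` is hypothetical, and it expects lines in the form produced by `nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader`.

```shell
# Hypothetical helper: reads "index, name, driver_version" CSV lines on stdin
# and succeeds only if at least one GPU is listed and every GPU reports the
# expected driver version given as $1.
check_driver_version() {
    awk -F', ' -v want="$1" '
        { seen++; if ($3 != want) bad++ }
        END { exit !(seen > 0 && bad == 0) }
    '
}
```

A possible use: `nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader | check_driver_version 515.65.01 && echo "driver OK"`.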