- Be sure to blacklist the nouveau driver in one of the /etc/modprobe.d/*.conf files (if none of them does so already, create the file /etc/modprobe.d/gpu-blacklist.conf with the two lines below):
blacklist nouveau
options nouveau modeset=0
Then regenerate the initramfs and reboot:
$ dracut --force; reboot
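After the reboot it can be useful to confirm that the blacklist took effect. The following is a minimal sketch, not part of the original procedure: the helper name nouveau_loaded is hypothetical, and it assumes the loaded modules are listed in /proc/modules (the optional argument only exists so the check can be pointed at another file).

```shell
# Hypothetical helper: returns 0 if nouveau appears in the module list.
# The argument defaults to /proc/modules, where the kernel lists loaded modules.
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules_file" 2>/dev/null
}

if nouveau_loaded; then
    echo "nouveau is still loaded: check /etc/modprobe.d and rerun dracut --force"
else
    echo "nouveau is not loaded"
fi
```

If nouveau is still reported as loaded, the usual suspects are a typo in the .conf file or an initramfs that was not regenerated.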
- Check the model of the GPU(s) with the lspci command, e.g.:
[root@cld-np-gpu-03 ~]# lspci -nn | grep NVI
ca:00.0 3D controller [0302]: NVIDIA Corporation GA100GL [A30 PCIe] [10de:20b7] (rev a1)

[root@cld-dfa-gpu-04 ~]# lspci -nn | grep NVI
17:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
ca:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
The first server has one A30 GPU card; the second has two A2 GPU cards.
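On hosts with several cards, the lspci output can be condensed into a count per model. This is only a sketch: the function name summarize_nvidia_gpus is hypothetical, and it assumes the lspci -nn line format shown above (vendor string "NVIDIA Corporation" followed by the model and an optional "(rev xx)" suffix).

```shell
# Hypothetical helper: reads `lspci -nn` output on stdin and prints
# one line per NVIDIA model, prefixed by the number of cards found.
summarize_nvidia_gpus() {
    grep 'NVIDIA Corporation' \
        | sed -e 's/^.*NVIDIA Corporation //' -e 's/ (rev [0-9a-f]*)$//' \
        | sort | uniq -c
}

# Run against the local host only if lspci is available.
if command -v lspci >/dev/null 2>&1; then
    lspci -nn | summarize_nvidia_gpus
fi
```

The PCI ID in the last brackets (e.g. 10de:20b7) is kept in the summary, since it identifies the card unambiguously when looking up a driver.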
Look up the driver available for the given GPU card at https://www.nvidia.com/Download/index.aspx?lang=en-us. Once you have selected the driver version for the proper OS, follow the instructions there and install the rpm. If something goes wrong with the rpm, try the .run script instead, as described in https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#runfile, e.g.:
BASE_URL=https://us.download.nvidia.com/tesla
DRIVER_VERSION=515.65.01
curl -fSsl -O $BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
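Once the installer has finished, a quick check with nvidia-smi can confirm that the expected driver is the one actually in use. The sketch below is an illustration rather than part of the NVIDIA instructions: the helper all_versions_match is hypothetical, and DRIVER_VERSION is assumed to be the same example version used above.

```shell
DRIVER_VERSION=515.65.01   # assumed: the example version installed above

# Hypothetical helper: succeeds only if stdin is non-empty and every
# line (one per GPU) equals the expected driver version.
all_versions_match() {
    awk -v want="$1" '
        { gsub(/[ \t\r]/, "") }          # drop stray whitespace
        $0 != want { bad = 1 }
        END { exit (NR == 0 || bad) }
    '
}

if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=driver_version --format=csv,noheader \
            | all_versions_match "$DRIVER_VERSION"; then
        echo "driver $DRIVER_VERSION is active on all GPUs"
    else
        echo "driver version mismatch, or no GPU reported by nvidia-smi"
    fi
else
    echo "nvidia-smi not found: the driver installation probably did not complete"
fi
```

Running plain nvidia-smi with no arguments gives the same information in a human-readable table, together with the list of detected cards.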