- be sure to blacklist the nouveau driver in one of the /etc/modprobe.d/*.conf files (if none of them does so already, create /etc/modprobe.d/gpu-blacklist.conf with the two lines below):
blacklist nouveau
options nouveau modeset=0
and then execute:
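To confirm afterwards that the blacklist took effect, a check along the following lines may help. It is only a sketch: the function name `nouveau_loaded` is made up for illustration, and it assumes the usual `lsmod` layout with the module name in the first column.

```shell
# Succeeds (exit 0) if a module named "nouveau" appears in `lsmod`
# output read on stdin, fails (exit 1) otherwise.
# nouveau_loaded is a hypothetical helper name, not an existing tool.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Typical use on the GPU host:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded, the blacklist is not active yet"
#   fi
```

If the module is still listed, the blacklist has not been applied to the running system yet.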
- check the GPU model(s) with the lspci command, e.g.:
[root@cld-np-gpu-03 ~]# lspci -nn | grep NVI
ca:00.0 3D controller [0302]: NVIDIA Corporation GA100GL [A30 PCIe] [10de:20b7] (rev a1)
[root@cld-dfa-gpu-04 ~]# lspci -nn | grep NVI
17:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
ca:00.0 3D controller [0302]: NVIDIA Corporation GA107GL [A2 / A16] [10de:25b6] (rev a1)
The first server has one A30 GPU card; the second has two A2 GPU cards.
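On hosts with several cards, the same lspci output can be condensed into a per-model count. The sketch below assumes the line format shown in the examples above (`NVIDIA Corporation <model> (rev xx)`); the function name `count_nvidia_gpus` is hypothetical.

```shell
# Summarise NVIDIA devices from `lspci -nn` output read on stdin.
# Prints one line per distinct model: "<count> x <model> [vendor:device]".
# count_nvidia_gpus is a hypothetical helper name.
count_nvidia_gpus() {
    grep -i 'NVIDIA' \
        | sed -E 's/^.*NVIDIA Corporation (.*) \(rev [0-9a-f]+\)$/\1/' \
        | sort | uniq -c \
        | awk '{ n = $1; $1 = ""; sub(/^ +/, ""); print n " x " $0 }'
}

# Typical use on the GPU host (needs pciutils):
#   lspci -nn | count_nvidia_gpus
```

The bracketed vendor:device ID (e.g. 10de:25b6) is kept in the output because it is the least ambiguous way to identify the card when looking up a driver.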
Look up the driver available for the given GPU card at https://www.nvidia.com/Download/index.aspx?lang=en-us. Once you have selected the driver version for your OS, follow the instructions there and install the rpm. If the rpm installation fails, try the .run script as described in https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#runfile, e.g.:
yum install gcc kernel-devel   # if not already installed
BASE_URL=https://us.download.nvidia.com/tesla
DRIVER_VERSION=515.65.01
curl -fSsL -O $BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
# if something goes wrong, try:
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run --add-this-kernel
sh ./NVIDIA-Linux-x86_64-$DRIVER_VERSION-custom.run
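A frequent reason for the .run installer to fail is that the installed kernel-devel package does not match the running kernel, so the kernel module cannot be built. The following sketch checks for that before running the installer; it assumes an rpm-based system where `rpm -q kernel-devel` prints names of the form `kernel-devel-<release>`, and `has_matching_kernel_devel` is a hypothetical helper name.

```shell
# Succeeds if a kernel-devel package matching the given kernel release
# appears in the package list read on stdin (one package per line,
# as printed by `rpm -q kernel-devel`).
# has_matching_kernel_devel is a hypothetical helper name.
has_matching_kernel_devel() {
    release=$1
    grep -qxF "kernel-devel-$release"
}

# Typical use on the GPU host:
#   rpm -q kernel-devel | has_matching_kernel_devel "$(uname -r)" \
#       || echo "no kernel-devel for $(uname -r): install it or reboot into the matching kernel"
```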
- check if things work (no reboot should be needed), e.g.:
[root@cld-dfa-gpu-04 ~]# nvidia-smi
Fri Sep  2 17:37:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A2           Off  | 00000000:17:00.0 Off |                    0 |
|  0%   33C    P0    19W /  60W |      0MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A2           Off  | 00000000:CA:00.0 Off |                    0 |
|  0%   35C    P0    18W /  60W |      0MiB / 15356MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
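For a scripted check, nvidia-smi can also print one CSV line per GPU via `--query-gpu=... --format=csv,noheader`, which is easier to test than the table. The sketch below compares the number of reported GPUs with the number expected from lspci; `check_gpu_count` is a hypothetical helper name and the chosen query fields are just an example.

```shell
# Compare the number of GPUs the driver reports with the number expected.
# Reads `nvidia-smi --query-gpu=... --format=csv,noheader` output on stdin,
# i.e. one non-empty line per GPU.
# check_gpu_count is a hypothetical helper name.
check_gpu_count() {
    expected=$1
    found=$(grep -c . || true)   # grep -c prints 0 but exits 1 on no match
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found GPU(s) visible"
    else
        echo "MISMATCH: expected $expected, found $found" >&2
        return 1
    fi
}

# Typical use on the second server above (two A2 cards):
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | check_gpu_count 2
```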
- install the CUDA toolkit from its runfile, e.g.:
# if needed, yum install make freeglut-devel libX11-devel libXi-devel libXmu-devel mesa-libGLU-devel freeimage-devel
CUDA_VERSION=11.7.1
DRIVER_VERSION=515.65.01
wget https://developer.download.nvidia.com/compute/cuda/$CUDA_VERSION/local_installers/cuda_${CUDA_VERSION}_${DRIVER_VERSION}_linux.run
sh cuda_${CUDA_VERSION}_${DRIVER_VERSION}_linux.run
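The runfile name combines the CUDA version with the driver version bundled in it, so the two variables must correspond to an existing installer. As a sketch, the URL construction can be wrapped in a small function (`cuda_runfile_url` is a hypothetical name), followed by an assumed non-interactive variant that installs only the toolkit and leaves the already installed driver alone. The `/usr/local/cuda` path assumes the installer's default prefix.

```shell
# Build the CUDA runfile URL from the CUDA version and the bundled
# driver version, following the same pattern as the wget command above.
# cuda_runfile_url is a hypothetical helper name.
cuda_runfile_url() {
    cuda_version=$1
    driver_version=$2
    echo "https://developer.download.nvidia.com/compute/cuda/${cuda_version}/local_installers/cuda_${cuda_version}_${driver_version}_linux.run"
}

# Assumed non-interactive variant (toolkit only, driver untouched):
#   wget "$(cuda_runfile_url 11.7.1 515.65.01)"
#   sh cuda_11.7.1_515.65.01_linux.run --silent --toolkit
#
# Afterwards, assuming the default install prefix:
#   export PATH=/usr/local/cuda/bin:$PATH
#   nvcc --version
```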