100
Docs/Linux下载并安装GPU驱动.md
Normal file
@@ -0,0 +1,100 @@
> Author: 丁辉

# Downloading and Installing the GPU Driver on Linux

[NVIDIA official driver download page (Chinese)](https://www.nvidia.cn/Download/index.aspx?lang=cn)

## Downloading the GPU driver

1. Check the GPU model

```bash
lspci | grep -i nvidia
# or
lspci | grep -i vga
```

Output:

```bash
[root@offends ~]# lspci | grep -i nvidia
00:08.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
```
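
For scripting, the model name can be pulled out of the `lspci` line. The following is a minimal sketch, assuming the device description contains a single bracketed name as in the sample output above; `gpu_model` is a hypothetical helper name, not part of any tool.

```bash
# gpu_model: hypothetical helper. Reads lspci-style lines on stdin and prints
# the text inside the square brackets of each NVIDIA line, e.g. "Tesla T4".
# Assumption: each line carries one bracketed name (plain `lspci`, not `lspci -nn`).
gpu_model() {
    grep -i nvidia | sed -n 's/.*\[\(.*\)\].*/\1/p'
}
```

On a real host this would be used as `lspci | gpu_model`; the printed name is what you look up on the download page.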

2. Download the driver that matches your GPU model. For the Tesla T4 above, select the following values on the download page:

- Product Type: Tesla
- Product Series: T
- Product: T4
- Operating System: Linux 64-bit
- CUDA Toolkit: Any
- Language: Chinese (Traditional)

```bash
wget https://cn.download.nvidia.com/tesla/440.95.01/NVIDIA-Linux-x86_64-440.95.01.run
```
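
The download link above embeds the driver version twice. As a rough sketch, the URL can be built from a version string; this assumes other Tesla driver releases use the same path layout as the 440.95.01 link, which is not guaranteed, and `driver_url` is a hypothetical helper name.

```bash
# driver_url: hypothetical helper. Prints a download URL for the given driver
# version. Assumption: the URL layout matches the 440.95.01 link in this
# document; always confirm the real link on the NVIDIA download page.
driver_url() {
    local version="$1"
    printf 'https://cn.download.nvidia.com/tesla/%s/NVIDIA-Linux-x86_64-%s.run\n' \
        "$version" "$version"
}
```

Usage would look like `wget "$(driver_url 440.95.01)"`. If the download fails, copy the link from the download page instead.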

3. Install the driver (run as root)

```bash
bash NVIDIA-Linux-x86_64-*.run
```
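
The glob in the install command matches every installer in the current directory, which is ambiguous if several versions have been downloaded. A minimal sketch of a guard, assuming the installers sit in one directory; `find_installer` is a hypothetical helper name.

```bash
# find_installer: hypothetical helper. Prints the path of the installer in the
# given directory (default: current directory) and fails unless exactly one
# NVIDIA-Linux-x86_64-*.run file is present.
find_installer() {
    local dir="${1:-.}"
    set -- "$dir"/NVIDIA-Linux-x86_64-*.run
    # An unmatched glob is left as a literal string, so also check the file exists.
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one installer in $dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

For example, `installer=$(find_installer .) && bash "$installer"` runs the installer only when the choice is unambiguous.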

4. Verify the installation

```bash
nvidia-smi
```

Output (this sample was captured with driver version 460.106.00 installed):

```bash
[root@offends ~]# nvidia-smi
Mon Oct  2 16:22:37 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:08.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
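
The table above is meant for people. For scripted checks, `nvidia-smi` can also emit comma-separated values via its `--query-gpu` and `--format=csv,noheader` options. The sketch below is a small, hypothetical helper (`csv_field`) that picks one field out of such a line; the sample line in the comment is illustrative, built from the values shown above.

```bash
# csv_field: hypothetical helper. Prints field N (1-based) of each
# comma-separated input line with surrounding whitespace trimmed.
# Illustrative input: "Tesla T4, 460.106.00, 15109 MiB"
csv_field() {
    awk -F',' -v n="$1" '{
        f = $n
        gsub(/^[ \t]+|[ \t]+$/, "", f)   # strip leading/trailing blanks
        print f
    }'
}
```

On a host with the driver installed, `nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | csv_field 2` would print just the driver version, which is handy for confirming that the expected version is the one actually loaded.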

## Uninstalling the NVIDIA driver on Linux

- If you still have the installer file

```bash
bash NVIDIA-Linux-x86_64-*.run --uninstall
```

- If the original installer file is no longer available

```bash
/usr/bin/nvidia-uninstall
```
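
The two cases above can be folded into one decision. This is a minimal sketch that only prints the command it would choose rather than running it; `uninstall_command` is a hypothetical helper name, and it assumes the installer, if kept, is in the directory passed as the first argument.

```bash
# uninstall_command: hypothetical helper. Prints the uninstall command to use:
# the installer with --uninstall if one is found in the given directory
# (default: current directory), otherwise the nvidia-uninstall binary.
uninstall_command() {
    local dir="${1:-.}" f
    for f in "$dir"/NVIDIA-Linux-x86_64-*.run; do
        if [ -f "$f" ]; then
            printf 'bash %s --uninstall\n' "$f"
            return 0
        fi
    done
    printf '%s\n' '/usr/bin/nvidia-uninstall'
}
```

Printing instead of executing keeps the helper safe to try; review the printed command and run it as root once it looks right.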

## Troubleshooting notes

- After uninstalling and reinstalling the driver several times, the installer reported this error:

```bash
ERROR: An NVIDIA kernel module 'nvidia' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen
if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you
know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot
your computer.
```

**Fix: simply `reboot` the server. In my experience this works about 99% of the time.**
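
Before rerunning the installer, it can help to check whether the `nvidia` kernel module named in the error is still loaded. A minimal sketch, assuming `lsmod`-style input where the module name is the first column; `nvidia_loaded` is a hypothetical helper name.

```bash
# nvidia_loaded: hypothetical helper. Reads lsmod-style lines on stdin and
# succeeds (exit status 0) only if a module named exactly "nvidia" is listed.
nvidia_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}
```

For example, `lsmod | nvidia_loaded && echo "nvidia module still loaded"` tells you whether to stop the programs using the GPU, or reboot, before installing again.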