> Author: 丁辉

# Deploying the NVIDIA k8s-device-plugin with Helm

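## Introduction

**The NVIDIA k8s-device-plugin is a Kubernetes plugin that exposes the NVIDIA GPUs on each node to the cluster.** It advertises the GPUs as the schedulable resource `nvidia.com/gpu`, so containers that request this resource can use the GPU for high-performance computing workloads.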

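## Prepare the GPU container environment (required)

The GPU nodes must already be able to run GPU containers before the plugin is installed. [See this document](https://gitee.com/offends/Kubernetes/blob/main/GPU/%E5%AE%B9%E5%99%A8%E4%BD%BF%E7%94%A8GPU.md) for the setup steps.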

## Deployment

[GitHub repository](https://github.com/NVIDIA/k8s-device-plugin)

1. Add the Helm repository

   ```bash
   helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
   helm repo update
   ```

2. Label the GPU nodes (replace `${node}` with the node name)

   ```bash
   kubectl label nodes ${node} nvidia.com/gpu.present=true
   ```

3. Deploy the plugin

   ```bash
   helm install nvidia-device-plugin nvdp/nvidia-device-plugin \
     --namespace nvidia-device-plugin \
     --create-namespace
   ```

4. Check that the node now reports NVIDIA resources

   ```bash
   kubectl describe node ${node} | grep nvidia
   ```
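
The `grep` in step 4 prints every line that mentions `nvidia`. As a narrower check, the sketch below reads only the allocatable `nvidia.com/gpu` count. The function name `gpu_allocatable` and the node name `gpu-node-1` are hypothetical, and the JSONPath assumes the plugin advertises the default resource name.

```bash
# Hypothetical helper: print how many nvidia.com/gpu resources a node
# reports as allocatable (empty output means the resource is not advertised).
gpu_allocatable() {
  # Dots inside the resource name must be escaped in a kubectl JSONPath.
  kubectl get node "$1" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}

# Usage (replace gpu-node-1 with a real node name):
#   gpu_allocatable gpu-node-1
```

An empty result usually means the plugin Pod is not running on that node yet, so checking the Pods in the `nvidia-device-plugin` namespace is a reasonable next step.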


## Uninstall

Uninstall the nvidia-device-plugin release:

```bash
helm uninstall nvidia-device-plugin -n nvidia-device-plugin
```

## Testing the result

1. Deploy a test Pod

   ```bash
   cat <<EOF | kubectl apply -f -
   apiVersion: v1
   kind: Pod
   metadata:
     name: gpu-pod
   spec:
     restartPolicy: Never
     containers:
       - name: cuda-container
         image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
         resources:
           limits:
             nvidia.com/gpu: 1 # requesting 1 GPU
     tolerations:
     - key: nvidia.com/gpu
       operator: Exists
       effect: NoSchedule
   EOF
   ```

2. Check the logs

   ```bash
   kubectl logs gpu-pod
   ```

   > Output like the following means the Pod can use GPU resources:
   >
   > ```text
   > [Vector addition of 50000 elements]
   > Copy input data from the host memory to the CUDA device
   > CUDA kernel launch with 196 blocks of 256 threads
   > Copy output data from the CUDA device to the host memory
   > Test PASSED
   > Done
   > ```

3. Clean up the test Pod

   ```bash
   kubectl delete pod gpu-pod
   ```
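
The log check in step 2 can also be scripted and run before the cleanup in step 3. This is a minimal sketch: the function name `check_gpu_pod` is hypothetical, the 120-second timeout is an arbitrary choice, and `kubectl wait --for=jsonpath=...` is assumed to be available in the installed kubectl version.

```bash
# Hypothetical helper: succeed only if the test Pod ran to completion
# and its logs contain the "Test PASSED" line.
check_gpu_pod() {
  pod="${1:-gpu-pod}"
  # Block until the Pod reaches the Succeeded phase, or give up after 120s.
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
    "pod/${pod}" --timeout=120s || return 1
  # grep -q exits 0 only when the expected line is present.
  kubectl logs "${pod}" | grep -q "Test PASSED"
}

# Usage:
#   check_gpu_pod gpu-pod && echo "GPU test passed"
```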

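# Shared access to GPUs

[Official documentation](https://github.com/NVIDIA/k8s-device-plugin?tab=readme-ov-file#shared-access-to-gpus)

The NVIDIA device plugin allows GPUs to be oversubscribed through a set of extended options in its configuration file. Two sharing modes are available: time-slicing and MPS.

Note: time-slicing and MPS are mutually exclusive.

- With time-slicing, CUDA time-slicing lets workloads that share a GPU interleave with each other. Nothing special is done to isolate workloads that are granted replicas of the same underlying GPU: each workload has access to the GPU memory and runs in the same fault domain as all the others (if one workload crashes, they all do).

- With MPS, a control daemon manages access to the shared GPU. In contrast to time-slicing, MPS does space partitioning: memory and compute resources can be explicitly partitioned, and these limits are enforced per workload.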

## Using CUDA time-slicing

1. Create the configuration file

   ```bash
   cat << EOF > /tmp/dp-config.yaml
   version: v1
   sharing:
     timeSlicing:
       resources:
       - name: nvidia.com/gpu
         replicas: 10
   EOF
   ```

   > If this configuration is applied to a node with 8 GPUs, the plugin advertises 80 `nvidia.com/gpu` resources to Kubernetes instead of 8.

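2. Update the plugin with the new configuration (`helm upgrade --install` also works when the release from the earlier deployment already exists, where a plain `helm install` would fail)

   ```bash
   helm upgrade --install nvidia-device-plugin nvdp/nvidia-device-plugin \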
     --namespace nvidia-device-plugin \
     --create-namespace \
     --set-file config.map.config=/tmp/dp-config.yaml
   ```
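
The replica arithmetic from step 1 can be sketched as a one-line calculation: advertised resources = physical GPUs x `replicas`. The function name `expected_gpu_resources` is hypothetical and only illustrates the multiplication.

```bash
# Hypothetical helper: expected number of advertised nvidia.com/gpu
# resources = physical GPUs on the node x replicas from the config file.
expected_gpu_resources() {
  echo $(( $1 * $2 ))
}

expected_gpu_resources 8 10   # prints 80

# To compare with what a node actually reports (requires cluster access):
#   kubectl get node "${node}" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```

If the node still reports the physical GPU count after the update, the plugin Pod has probably not picked up the new configuration yet.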


## Using CUDA MPS

> Sharing with MPS is currently not supported on devices with MIG enabled.

1. Create the configuration file

   ```bash
   cat << EOF > /tmp/dp-config.yaml
   version: v1
   sharing:
     mps:
       resources:
       - name: nvidia.com/gpu
         replicas: 10
   EOF
   ```

   > If this configuration is applied to a node with 8 GPUs, the plugin advertises 80 `nvidia.com/gpu` resources to Kubernetes instead of 8. Each `nvidia.com/gpu: 1` request then gets one tenth of a card's resources.

2. Label the node

   ```bash
   kubectl label nodes ${node} nvidia.com/mps.capable=true
   ```

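3. Update the plugin with the new configuration (`helm upgrade --install` also works when the release already exists, where a plain `helm install` would fail)

   ```bash
   helm upgrade --install nvidia-device-plugin nvdp/nvidia-device-plugin \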
     --namespace nvidia-device-plugin \
     --create-namespace \
     --set-file config.map.config=/tmp/dp-config.yaml
   ```
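
After the update, a quick look at the labeled nodes and the Pods of the release can confirm that the MPS setup was picked up. This is a sketch under assumptions: the function name `show_mps_state` is hypothetical, and the exact Pod names depend on the chart version, so the command simply lists everything in the namespace.

```bash
# Hypothetical helper: show the cluster state relevant to MPS sharing.
show_mps_state() {
  # Nodes that carry the label added in step 2.
  kubectl get nodes -l nvidia.com/mps.capable=true
  # All Pods in the release namespace, with the node each one runs on.
  kubectl get pods -n nvidia-device-plugin -o wide
}

# Usage:
#   show_mps_state
```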