Setting up K8s to use NVIDIA drivers
- Setting up K8s to use NVIDIA drivers
- Prerequisites
- Quick Start
- Preparing your GPU Nodes
- Enabling GPU Support in Kubernetes
- Checks
- Sample yaml file
- References
- Destroy
Prerequisites
The list of prerequisites for running the NVIDIA device plugin is described below:
- NVIDIA drivers ~= 410.48
- nvidia-docker version > 2.0 (see how to install and its prerequisites)
- Docker configured with nvidia as the default runtime
- Kubernetes version >= 1.10, installed with kubeadm
- After installation, use flannel as the network plugin:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
$ kubectl taint nodes --all node-role.kubernetes.io/master-
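As a quick sanity check (a suggested verification, not one of the original prerequisites), confirm that the node reports Ready and that the flannel and other kube-system pods are running before installing the device plugin:
$ kubectl get nodes
$ kubectl get pods -n kube-system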
Quick Start
Preparing your GPU Nodes
The following steps need to be executed on all your GPU nodes. This guide assumes that the NVIDIA drivers and nvidia-docker have been installed. Note that you need to install the nvidia-docker2 package and not nvidia-container-toolkit, because the new --gpus option hasn't reached Kubernetes yet. Example:
# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
Then configure docker to use nvidia as the default runtime by editing /etc/docker/daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
If runtimes is not already present, head to the install page of nvidia-docker.
Note
Don't forget to restart docker after changing daemon.json for your changes to take effect:
$ sudo systemctl restart docker
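As an optional check (not part of the original steps), you can verify that docker picked up nvidia as the default runtime and that a plain container can see the GPU; this uses the same nvidia/cuda:10.0-base image as the sample pod later in this guide:
$ docker info | grep -i "default runtime"
$ docker run --rm nvidia/cuda:10.0-base nvidia-smi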
Enabling GPU Support in Kubernetes
Once you have enabled this option on all the GPU nodes you wish to use, you can then enable GPU support in your cluster by deploying the following DaemonSet:
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
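To confirm the plugin came up, you can look for its pod in the kube-system namespace (the exact pod name is generated by the DaemonSet, so it will differ on your cluster):
$ kubectl get pods -n kube-system | grep nvidia-device-plugin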
Checks
docker run --rm nvidia/k8s-device-plugin:1.0.0-beta nvidia-device-plugin
Should render output as follows:
(base) ubuntu@ip-172-31-42-125:~$ docker run --rm nvidia/k8s-device-plugin:1.0.0-beta nvidia-device-plugin
Unable to find image 'nvidia/k8s-device-plugin:1.0.0-beta' locally
1.0.0-beta: Pulling from nvidia/k8s-device-plugin
743f2d6c1f65: Pull complete
fcd797589536: Pull complete
Digest: sha256:f284efc70d5b4b4760cd7b60280e7e9370f64fca0b15f5e73d2742f4cfe7169f
Status: Downloaded newer image for nvidia/k8s-device-plugin:1.0.0-beta
2020/02/24 09:39:53 Loading NVML
2020/02/24 09:39:56 Fetching devices.
2020/02/24 09:39:56 Starting FS watcher.
2020/02/24 09:39:56 Failed to created FS watcher.
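The "Failed to created FS watcher." line at the end is expected for this check, since the plugin is run directly under docker without the kubelet's device-plugin directory mounted; what matters is that NVML loads and the devices are fetched. To verify the cluster side as well (a suggested check, not from the original post), confirm that the node now advertises nvidia.com/gpu:
$ kubectl describe nodes | grep -i nvidia.com/gpu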
Now we are good to go to launch a pod and access GPUs from it.
Sample yaml file
(base) ubuntu@ip-172-31-42-125:~$ cat test_gpu_pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvidia/cuda:10.0-base
      command: ["sh", "-c", "tail -f /dev/null"]
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
Go ahead and create the pod.
(base) ubuntu@ip-172-31-42-125:~$ kubectl create -f test_gpu_pod.yml
pod/gpu-pod created
(base) ubuntu@ip-172-31-42-125:~$ kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
gpu-pod   1/1     Running   0          6s
(base) ubuntu@ip-172-31-42-125:~$ kubectl exec -it gpu-pod /bin/bash
root@gpu-pod:/# nvidia-smi
Mon Feb 24 12:27:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 33C P8 31W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@gpu-pod:/# cd
root@gpu-pod:~# exit
Destroy
If you want to destroy the setup:
$ sudo kubeadm reset
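Note that kubeadm reset does not clean up the CNI configuration or your kubectl config by itself; if you want a fully clean node, you can also remove those manually (a suggested cleanup, not part of the original steps):
$ sudo rm -rf /etc/cni/net.d
$ rm -rf $HOME/.kube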
References
- Source:
- Troubleshooting: