The AMD GPU device plugin for Kubernetes is AMD's plugin for exposing AMD GPUs to Kubernetes (k8s) workloads.
Environment:
+ Ubuntu 22.04
+ Kubernetes 1.23.9
Install the AMD kernel driver
wget https://repo.radeon.com/amdgpu-install/22.20.3/ubuntu/jammy/amdgpu-install_22.20.50203-1_all.deb
sudo apt install ./amdgpu-install_22.20.50203-1_all.deb
sudo apt-get update
sudo amdgpu-install --usecase=dkms
# A reboot is required once the installation has finished
sudo reboot
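After the reboot it can help to confirm that the driver actually loaded before touching the cluster. The following is a minimal sketch, not part of the original steps: `amdgpu_module_loaded` is a hypothetical helper name, and the assumption that `/dev/kfd` and `/dev/dri` are the device nodes to look for should be checked against your driver version.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod`-style output on stdin and succeeds
# only if a module named exactly "amdgpu" appears in the first column.
amdgpu_module_loaded() {
    awk '$1 == "amdgpu" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage on the GPU node, after the reboot:
#   lsmod | amdgpu_module_loaded && echo "amdgpu kernel module is loaded"
#   ls -l /dev/kfd /dev/dri   # assumed device nodes; adjust if yours differ
```

Matching on the first column avoids false positives from modules such as `drm` that merely list `amdgpu` as a dependent.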
Deploy the k8s plugin
kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml
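Once the DaemonSet is running, each GPU node should advertise the `amd.com/gpu` resource. A rough way to check this, sketched under the assumption that the resource shows up under `.status.allocatable` of the node object (`nodes_with_amd_gpu` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: reads "NODE COUNT" lines on stdin and prints the
# names of nodes whose count is a positive integer. Nodes without the
# resource show up as "<none>" in kubectl output and are skipped.
nodes_with_amd_gpu() {
    awk '$2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.amd\.com/gpu' \
#     | nodes_with_amd_gpu
```

If no node is printed, the plugin pod's logs on that node are usually the first place to look.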
Demo: using the AMD GPU from a workload pod
Just add the corresponding resource key (amd.com/gpu) under resources.limits:
apiVersion: v1
kind: Pod
metadata:
  name: alexnet-tf-gpu-pod
  labels:
    purpose: demo-tf-amdgpu
spec:
  containers:
    - name: alexnet-tf-gpu-container
      image: rocm/tensorflow:latest
      workingDir: /root
      env:
        - name: HIP_VISIBLE_DEVICES
          value: "0" # 0,1,2,...,n for running on GPU and select the GPUs, -1 for running on CPU
      command: ["/bin/bash", "-c", "--"]
      args: ["python3 benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=alexnet; trap : TERM INT; sleep infinity & wait"]
      resources:
        limits:
          amd.com/gpu: 1 # requesting a GPU
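If the pod requests more than one GPU, the `HIP_VISIBLE_DEVICES` value has to list the matching indices. The sketch below builds that string from a GPU count, following the convention in the manifest comment (`0,1,2,...` for GPUs, `-1` for CPU). `hip_device_list` and the manifest filename in the usage comment are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints a comma-separated index list "0,1,...,n-1"
# for a given GPU count, or "-1" (CPU only) when the count is 0 or less.
hip_device_list() {
    count=$1
    if [ "$count" -le 0 ]; then
        echo "-1"
        return 0
    fi
    i=0
    list=""
    while [ "$i" -lt "$count" ]; do
        list="${list:+$list,}$i"   # prepend a comma only after the first index
        i=$((i + 1))
    done
    echo "$list"
}

# Usage: print the value to put into the manifest for a 2-GPU request,
# then create the pod and follow the benchmark output.
#   hip_device_list 2
#   kubectl create -f alexnet-tf-gpu-pod.yaml   # hypothetical filename
#   kubectl logs -f alexnet-tf-gpu-pod
```

The count passed to the helper should match the `amd.com/gpu` value under `resources.limits`.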