GPU Kernel Manager is a software stack that aims to deploy, manage and monitor GPU Kernels in a Kubernetes cluster. It will use the utilities developed in MCU to accomplish these goals.
- go version v1.22.0+
- podman version 5.3.1+
- kubectl version v1.11.3+
- Access to a Kubernetes v1.11.3+ cluster
To simulate a GPU in kind, GKM leverages the scripts in kind-gpu-sim. These scripts require podman and some environment variables to be defined. If kind is not being used, then docker can be used to build images. The Makefile will use podman if found, and fall back to docker if not (but make run-on-kind will fail).
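The fallback described above can be sketched in shell. This is a minimal illustration of the stated behaviour, not the Makefile's actual logic; the function names pick_container_tool and can_run_on_kind are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the first container tool found on PATH,
# preferring podman over docker as described in the text.
pick_container_tool() {
    for tool in podman docker; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    echo "neither podman nor docker found on PATH" >&2
    return 1
}

# Hypothetical helper: succeeds only when podman is the selected tool,
# because the kind-gpu-sim scripts require podman.
can_run_on_kind() {
    [ "$(pick_container_tool 2>/dev/null)" = "podman" ]
}
```

Calling can_run_on_kind before make run-on-kind gives an early, explicit failure on a docker-only machine instead of a failure partway through cluster creation.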
To create a kind cluster with a simulated GPU and the latest GKM running:
export KIND_EXPERIMENTAL_PROVIDER=podman
export DOCKER_HOST=unix:///run/user/$UID/podman/podman.sock
make run-on-kind
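Before running make run-on-kind, it can help to confirm that the two variables are set and that the podman socket actually exists. The following is a minimal sketch of such a check; the function names socket_path_from_host and check_podman_env are hypothetical and are not part of GKM or kind-gpu-sim.

```shell
#!/bin/sh
# Hypothetical helper: strips the unix:// scheme from a DOCKER_HOST value
# and prints the remaining filesystem path.
socket_path_from_host() {
    case "$1" in
        unix://*) printf '%s\n' "${1#unix://}" ;;
        *) return 1 ;;
    esac
}

# Hypothetical pre-flight check for the two variables exported above.
check_podman_env() {
    if [ "${KIND_EXPERIMENTAL_PROVIDER:-}" != "podman" ]; then
        echo "KIND_EXPERIMENTAL_PROVIDER is not set to podman" >&2
        return 1
    fi
    sock=$(socket_path_from_host "${DOCKER_HOST:-}") || {
        echo "DOCKER_HOST is not a unix:// socket URL" >&2
        return 1
    }
    # -S is true only if the path exists and is a socket.
    if [ ! -S "$sock" ]; then
        echo "no socket at $sock" >&2
        return 1
    fi
}
```

A possible use is check_podman_env && make run-on-kind, so that a missing socket is reported before any cluster setup starts.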
Check the GKM installed pods:
$ kubectl get pods -n gkm-system
NAME READY STATUS RESTARTS AGE
gkm-agent-7ggr2 1/1 Running 0 74m
gkm-agent-mc9h6 1/1 Running 0 74m
gkm-controller-manager-c7b6f4f87-9zgns 3/3 Running 0 74m
gkm-csi-node-nd6qn 2/2 Running 0 74m
gkm-csi-node-tkkc8 2/2 Running 0 74m
gkm-test-pod 1/1 Running 0 64m
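Reading the READY and STATUS columns by eye works for a handful of pods; for scripting, the same check can be sketched as a small filter over the table. The function name not_ready_pods is hypothetical, and the sketch assumes the default column layout shown above (NAME, READY, STATUS as the first three columns).

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the name of every pod whose READY column is not n/n or whose STATUS is
# not Running. The header row is skipped.
not_ready_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")
        if (ready[1] != ready[2] || $3 != "Running") print $1
    }'
}
```

For example, kubectl get pods -n gkm-system | not_ready_pods prints nothing once every pod is up. kubectl also provides a built-in kubectl wait --for=condition=Ready subcommand, which may be the simpler choice when blocking until readiness is all that is needed.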
To delete a kind cluster with a simulated GPU:
make destroy-kind
There is an example yaml that creates a GKMCache custom resource (CR) instance, which points to an OCI image containing a GPU Kernel Cache. Example:
apiVersion: gkm.io/v1alpha1
kind: GKMCache
metadata:
  name: flash-attention-rocm
spec:
  image: quay.io/mtahhan/flash-attention-rocm:latest
The example yaml also includes a test pod that references the GKMCache CR instance. Example:
kind: Pod
apiVersion: v1
metadata:
  name: gkm-test-pod
  namespace: gkm-system
spec:
  tolerations:
    - key: gpu
      operator: Equal
      effect: NoSchedule
      value: "true"
  nodeSelector:
    gkm-test-node: "true"
  containers:
    - name: alpine
      :
      volumeMounts:
        - name: kernel-volume
          mountPath: "/cache"
  volumes:
    - name: kernel-volume
      csi:
        driver: csi.gkm.io
        volumeAttributes:
          csi.gkm.io/GKMCache: flash-attention-rocm
Pod Spec Highlights:
- The volumes: entry named kernel-volume references the GKM CSI driver via driver: csi.gkm.io and references the GKMCache CR via csi.gkm.io/GKMCache: flash-attention-rocm.
- The volumeMounts: entry named kernel-volume maps the GPU Kernel Cache to the directory /cache within the pod.
- There is a Node Selector gkm-test-node: "true". The make run-on-kind command adds this label to node kind-gpu-sim-worker. This makes it easier to monitor logs while applying the pod.
Because of the Node Selector, the test pod will be launched on node kind-gpu-sim-worker. Determine the CSI Plugin instance running on this node:
$ kubectl get pods -n gkm-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
gkm-agent-7ggr2 1/1 Running 0 102m 10.244.1.6 kind-gpu-sim-worker
gkm-agent-mc9h6 1/1 Running 0 102m 10.244.2.3 kind-gpu-sim-worker2
gkm-controller-manager-c7b6f4f87-9zgns 3/3 Running 0 102m 10.244.0.5 kind-gpu-sim-control-plane
gkm-csi-node-nd6qn 2/2 Running 0 102m 10.89.0.67 kind-gpu-sim-worker2
gkm-csi-node-tkkc8 2/2 Running 0 102m 10.89.0.66 kind-gpu-sim-worker <-- HERE
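Picking the right row out of the wide listing can also be scripted. The following is a minimal sketch, assuming the CSI pods keep the gkm-csi-node- name prefix shown above; the function name csi_pod_on_node is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods -o wide` output on stdin and
# prints the gkm-csi-node pod scheduled on the node named in $1.
# Any column may hold the node name, so the match does not depend on the
# exact position of the NODE column.
csi_pod_on_node() {
    awk -v node="$1" '
        NR > 1 && $1 ~ /^gkm-csi-node-/ {
            for (i = 2; i <= NF; i++) {
                if ($i == node) { print $1; break }
            }
        }'
}
```

For example, pod=$(kubectl get pods -n gkm-system -o wide | csi_pod_on_node kind-gpu-sim-worker) captures the pod name, which can then be passed to kubectl logs -n gkm-system "$pod" --all-containers to follow what the plugin does while the example is applied. The comparison is exact, so kind-gpu-sim-worker does not also match kind-gpu-sim-worker2.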
Now the example yaml can be applied:
kubectl apply -f examples/flash-attention-rocm.yaml
The gkm-test-pod should be running and the cache should be volume mounted in the pod:
$ kubectl get pods -n gkm-system
NAME READY STATUS RESTARTS AGE
gkm-agent-7ggr2 1/1 Running 0 74m
gkm-agent-mc9h6 1/1 Running 0 74m
gkm-controller-manager-c7b6f4f87-9zgns 3/3 Running 0 74m
gkm-csi-node-nd6qn 2/2 Running 0 74m
gkm-csi-node-tkkc8 2/2 Running 0 74m
gkm-test-pod 1/1 Running 0 64m
$ kubectl exec -it -n gkm-system gkm-test-pod -- sh
sh-5.2# ls /cache/
c4d45c651d6ac181a78d8d2f3ead424b8b8f07dd23dc3de0a99f425d8a633fc6 c880dcbe2ffa9f4c96a3c5ce87fbf0b61a04ee4c46f96ee728d2d1efb65133f6 e0a7f37fbe7bb678faad9ffe683ba5d53d92645aefa5b62195bc2683b9971485
By default, the Makefile uses quay.io/gkm/* for pushing and pulling images.
For building private images and testing, set the environment variable QUAY_USER to override the image repository.
Note: Make sure not to check in kustomization.yaml files with an overridden quay.io user account.
Start by building and pushing the GKM images, then start the kind cluster:
export QUAY_USER=<UserName>
make build-images
make push-images
make run-on-kind
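The effect of QUAY_USER, and a guard for the kustomization.yaml note, can be sketched as follows. This is an illustration under assumptions: the quay.io/<user>/<name>:<tag> layout is inferred from the quay.io/gkm/operator:latest reference used in this document, and the function names gkm_image and kustomization_changes are hypothetical, not part of the Makefile.

```shell
#!/bin/sh
# Hypothetical helper: prints the image reference for a component ($1) and
# optional tag ($2), using QUAY_USER when set and "gkm" otherwise.
gkm_image() {
    printf 'quay.io/%s/%s:%s\n' "${QUAY_USER:-gkm}" "$1" "${2:-latest}"
}

# Hypothetical helper: reads a list of file paths on stdin and prints those
# named kustomization.yaml. Exits non-zero when none are found.
kustomization_changes() {
    grep -E '(^|/)kustomization\.yaml$'
}
```

One possible use before committing is git diff --name-only | kustomization_changes, which lists any modified kustomization.yaml files so they can be reviewed for a private quay.io account.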
The following steps build the installer and distribute this project to users.
- Build the installer for the image built and published in the registry:
make build-installer IMG=quay.io/gkm/operator:latest
Note: The makefile target mentioned above generates an 'install.yaml' file in the dist directory. This file contains all the resources built with Kustomize, which are necessary to install this project without its dependencies.
- Using the installer
Users can just run kubectl apply -f to install the project, i.e.:
kubectl apply -f https://raw.githubusercontent.com/<org>/GPU-kernel-manager-operator/<tag or branch>/dist/install.yaml
// TODO(user): Add detailed information on how you would like others to contribute to this project
Note: Run make help for more information on all potential make targets.
More information can be found via the Kubebuilder Documentation