Skip to content

Update gpu chart, pull #29

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 5, 2025
Merged

Update gpu chart, pull #29

merged 10 commits into from
May 5, 2025

Conversation

byako
Copy link
Contributor

@byako byako commented Apr 14, 2025

This is mostly from intel/helm-charts#76 , but with a slight change @eero-t :

@rouke-broersma FYI, I changed the namespace to only use .Values.namespace, seems --namespaceHelm parameter is forcing Helm to ensure namespace exists before applying templates - this way it should be simpler - just one source of truth.
Doc is updated.

@byako byako requested a review from eero-t April 14, 2025 11:13
@rouke-broersma
Copy link
Contributor

rouke-broersma commented Apr 14, 2025

@byako in that case I think it makes more sense to only use .Release.Namespace since this is the official value for this purpose. .Release.Namespace is always filled. A separate namespace variable is then not needed. If not specified I believe the current namespace (from kubecontext) is used. When deploying from Argocd or Flux the target namespace is used. I myself never use namespaceOverride, I agree it's superfluous. .Release.Namespace is sufficient in all cases except when a resource needs to be deployed in a specific alternate namespace but this would of course be on a resource by resource case.

With helm if --create-namespace is skipped, helm expects the namespace to already exist. This is a fair assumption. The fact that we manage the namespace from the Chart should not matter from the helm perspective. We only manage the namespace to add the label, not to actually create it before starting deployment. This is how helm works so no one using helm will be surprised by this.

The chart creates a namespace with a security label that will not be created when using --create-namespace parameter

Yes but after the namespace is created helm will still apply the Namespace object as defined in the Chart, which will update the namespace with the correct label. So this it not a problem. On updates the namespace will already exist, so Helm will also not remove the label after that. We could add a helm weight to the namespace to make sure it's the first object that's applied during install, if that is a concern. However I don't think that's neccesary as the Chart is not that large, applying the namespace label will be near-instant. Before any pod has been scheduled. And if a pod does manage to schedule before the label is applied, on retry it will be allowed to deploy.

All in all using .Release.Namespace will allow normal Helm operations on the Chart including all assumptions and rules that already apply to Helm Charts.

@byako
Copy link
Contributor Author

byako commented Apr 14, 2025

@byako in that case I think it makes more sense to only use .Release.Namespace since this is the official value for this purpose. .Release.Namespace is always filled. A separate namespace variable is then not needed. If not specified I believe the current namespace is used. When deploying from Argocd or Flux the target namespace is used. I myself never use namespaceOverride, I agree it's superfluous. .Release.Namespace is sufficient in all cases except when a resource needs to be deployed in a specific alternate namespace but this would of course be on a resource by resource case.

The chart creates a namespace with a security label that will not be created when using --create-namespace parameter

Yes but after the namespace is created helm will still apply the Namespace object as defined in the Chart, which will update the namespace with the correct label. So this it not a problem. On updates the namespace will already exist, so Helm will also not remove the label after that.

I'm afraid that is not the case - when the namespace template is applied - it's applied not as a patch, but as a create:

$ helm list -A
NAME    NAMESPACE       REVISION        UPDATED STATUS  CHART   APP VERSION

$ kubectl get ns
NAME                     STATUS   AGE
default                  Active   3d16h
kube-node-lease          Active   3d16h
kube-public              Active   3d16h
kube-system              Active   3d16h
node-feature-discovery   Active   3d16h

$ helm install --namespace intel-gpu-resource-driver intel-gpu-resource-driver charts/intel-gpu-resource-driver/                                 
Error: INSTALLATION FAILED: create: failed to create: namespaces "intel-gpu-resource-driver" not found

$ helm install --namespace intel-gpu-resource-driver --create-namespace intel-gpu-resource-driver charts/intel-gpu-resource-driver/              
Error: INSTALLATION FAILED: namespaces "intel-gpu-resource-driver" already exists

Do you know if there is a way to make it applied as a patch - then it would work like you suggest on one hand, and we can prevent default being used as a value in .Release.Namespace through macro / helper.

@rouke-broersma
Copy link
Contributor

rouke-broersma commented Apr 14, 2025

That is.. Strange of Helm to do.. I must admit I never use Helm directly, only through other orchestration tools where this works as expected. How about if you do helm upgrade --install, does it then still complain?

If that doesn't work I think the best option is to not include the namespace in the Chart at all, and documenting that the namespace should be pre-created with the correct security level set if you use a cluster with pod security enabled. Fwiw I already do that for almost all Charts anyway, because most Charts do not include a Namespace object and I don't like letting Helm create an 'unmanaged' Namespace. A quick search suggests that Helm simply does not appreciate when a namespace is part of the Chart..

By the way just wanted to say great work on the DRA implementation :) Worked straight away with my integrated graphics cluster.

@byako
Copy link
Contributor Author

byako commented Apr 14, 2025

That is.. Strange of Helm to do.. I must admit I never use Helm directly, only through other orchestration tools where this works as expected. How about if you do helm upgrade --install, does it then still complain?

Yep

$ helm upgrade --install --namespace intel-gpu-resource-driver --create-namespace intel-gpu-resource-driver charts/intel-gpu-resource-driver/
Release "intel-gpu-resource-driver" does not exist. Installing it now.
Error: namespaces "intel-gpu-resource-driver" already exists

If that doesn't work I think the best option is to not include the namespace in the Chart at all, and documenting that the namespace should be pre-created with the correct security level set if you use a cluster with pod security enabled. Fwiw I already do that for almost all Charts anyway, because most Charts do not include a Namespace object and I don't like letting Helm create an 'unmanaged' Namespace. A quick search suggests that Helm simply does not appreciate when a namespace is part of the Chart..

I got that impression as well, but thought namespace template would make it easier for user to install the namespace labels since Helm doesn't support user-specified metadata for freshly created release namespace.. let's get rid of the namespace template then.

By the way just wanted to say great work on the DRA implementation :) Worked straight away with my integrated graphics cluster.

Great to hear, thanks for the feedback!

Copy link

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, looks fine to me.

(I forgot about this until you mentioned it in another PR...)

@byako byako merged commit f45d451 into main May 5, 2025
@rouke-broersma
Copy link
Contributor

🚀 🚀 🚀 When can this be released? 😁

@byako
Copy link
Contributor Author

byako commented May 6, 2025

Soon, I'll need to change the artifact name to have -chart suffix, and update the doc, so we could publish both the Helm chart and the container image into the same ghcr.io registry in addition to Docker Hub. WIP.

@byako
Copy link
Contributor Author

byako commented May 9, 2025

@rouke-broersma FYI, this was published couple of days back - v0.7.1, with the old name for now.

@rouke-broersma
Copy link
Contributor

rouke-broersma commented May 9, 2025

Update works perfectly, thanks!

> k get resourceslice      
NAME                                                    NODE                 DRIVER             POOL                 AGE
talos-worker-1-gpu.intel.com-rt8zm     talos-worker-1   gpu.intel.com   talos-worker-1   46s
talos-worker-2-gpu.intel.com-47r6b     talos-worker-2   gpu.intel.com   talos-worker-2   46s
talos-worker-3-gpu.intel.com-5lvcn      talos-worker-3   gpu.intel.com   talos-worker-3   46s
talos-worker-4-gpu.intel.com-g58mz   talos-worker-4   gpu.intel.com   talos-worker-4   46s
talos-worker-5-gpu.intel.com-k2qzk     talos-worker-5   gpu.intel.com   talos-worker-5   46s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants