
Conversation

@rhdedgar rhdedgar commented Sep 3, 2025

With this PR, users no longer need to create separate ConfigMaps for CA bundle and run.yaml configuration data.

  • No longer relies on FieldSelectors / Field Indexers for Custom Resources, improving support for older versions of Kubernetes (<1.31) and OpenShift (<4.17)
  • Improved performance on clusters that have many ConfigMaps, especially on older clusters.

Closes #134

Closes #135

Closes RHAIENG-662


rhdedgar commented Sep 4, 2025

After rebasing from main and force-pushing the rebased branch, it now says that I requested a review from everyone.

Is that a bug or a feature?

@derekhiggins

Before taking an in-depth look, I've noticed that this is a breaking change: existing users will need to migrate their configurations from ConfigMaps. Do we want to support both methods for a short deprecation period?


@leseb leseb left a comment


I understand the rationale for this change and the downside of watching the CM but this feels like a bit of a regression. Now users have to edit the CR with the rather tedious PEM content. Some clusters (like OpenShift) automatically generate the CA bundles in a ConfigMap, which makes it much more convenient to just reference them directly.

// ConfigMapName is the name of the ConfigMap containing user configuration
ConfigMapName string `json:"configMapName"`
// ConfigMapNamespace is the namespace of the ConfigMap (defaults to the same namespace as the CR)
// CustomConfig contains arbitrary text data that represents a user-provided run.yamlconfiguration file

Suggested change
// CustomConfig contains arbitrary text data that represents a user-provided run.yamlconfiguration file
// CustomConfig contains arbitrary text data that represents a user-provided run.yaml configuration file


@rhuss rhuss left a comment


@rhdedgar thanks for the PR! I think it goes in the right direction, but the original intention for the configuration override with a custom run.yaml was:

Add a subschema of run.yaml directly in the resource file. It should be reflected in the CRD's schema. It was not meant to include the run.yaml as a single opaque string, but as a more typed approach that helps users create the fields. This is also a means to protect users from fast schema changes in run.yaml, which currently still happen quite a lot. By keeping a sub-schema we can easily add an additional mapping layer for newer versions of the run.yaml schema, so that we keep the operator config stable over a longer time.

For a full overwrite, I would still keep the configMap option, but maybe only locally in the same namespace and picking it up only during startup (no watch needed).

Just moving the full run.yaml into the CR makes it even harder to maintain (updating inlined YAML inside a string is even more error-prone with the formatting, especially when it has been changed by some k8s mangling).

Not sure if this makes sense, I would love to hear some more opinions on this (and especially what subschema from run.yaml should we allow).


rhdedgar commented Sep 9, 2025

I understand the rationale for this change and the downside of watching the CM but this feels like a bit of a regression. Now users have to edit the CR with the rather tedious PEM content. Some clusters (like OpenShift) automatically generate the CA bundles in a ConfigMap, which makes it much more convenient to just reference them directly.

Hi, in this case, the operator is still creating and managing an ephemeral ConfigMap to hold the run.yaml and CA Bundle data, so that ConfigMap can be annotated with service.beta.openshift.io/inject-cabundle: "true" to pull in the cluster certificates and combine the two sources via the init container.
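As a sketch, the operator-managed ConfigMap described above might look like the following (the resource name and data key are illustrative; the annotation is OpenShift's standard service CA injection annotation):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llama-stack-operator-config   # illustrative name
  annotations:
    # OpenShift's service CA operator injects the cluster CA bundle
    # into this ConfigMap under the key service-ca.crt
    service.beta.openshift.io/inject-cabundle: "true"
data:
  run.yaml: |
    # user-provided run.yaml content, combined with the
    # injected CA bundle by the init container
```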

I intended to keep that logic separate from this PR, and re-introduce it in the midstream repo to keep this upstream repo free of RHOAI-specific or OpenShift-specific features (as briefly discussed with @rhuss).

@rhdedgar thanks for the PR! I think it goes in the right direction, but the original intention for the configuration override with a custom run.yaml was:

Add a subschema of run.yaml directly in the resource file. It should be reflected in the CRD's schema. It was not meant to include the run.yaml as a single opaque string, but as a more typed approach that helps users create the fields. This is also a means to protect users from fast schema changes in run.yaml, which currently still happen quite a lot. By keeping a sub-schema we can easily add an additional mapping layer for newer versions of the run.yaml schema, so that we keep the operator config stable over a longer time.

For a full overwrite, I would still keep the configMap option, but maybe only locally in the same namespace and picking it up only during startup (no watch needed).

Just moving the full run.yaml into the CR makes it even harder to maintain (updating inlined YAML inside a string is even more error-prone with the formatting, especially when it has been changed by some k8s mangling).

Not sure if this makes sense, I would love to hear some more opinions on this (and especially what subschema from run.yaml should we allow).

Hi, there was some discussion in #117 around keeping the option to provide a custom UserConfig, which has an arbitrary schema not defined by the operator. I originally intended for this PR to not overlap with the sub-schema feature outlined in #117, and instead provide a new way of providing the run.yaml data that's currently provided by a ConfigMap. It can be removed if the sub-schema method is the only one that should be supported though.

I'm a little concerned about keeping the existing ConfigMap method, while only removing the watch for the ConfigMap. That would make it so that a user has to manually ensure the existing pod is deleted and re-created to pull in the new run.yaml data. I think it will get confusing if someone sees a config that doesn't match what the container is actually using, so maybe there's another way to avoid that path.
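One common pattern for keeping pods in sync with externally stored config (not part of this PR; the annotation key and function names are hypothetical) is to stamp a hash of the run.yaml content into the pod template's annotations during reconciliation. Any config change then alters the Deployment spec itself, which triggers a normal rolling restart without manual pod deletion. A minimal sketch:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// configHash returns a short, deterministic fingerprint of the run.yaml
// content. Stamping it on the pod template means a config change modifies
// the DeploymentSpec, so the Deployment controller rolls out new pods.
func configHash(runYAML string) string {
	sum := sha256.Sum256([]byte(runYAML))
	return fmt.Sprintf("%x", sum[:8]) // first 8 bytes -> 16 hex chars
}

func main() {
	// Hypothetical annotation key; a reconciler would set this on
	// deployment.Spec.Template.Annotations before creating/updating it.
	annotations := map[string]string{
		"llamastack.example.com/config-hash": configHash("version: 2\nproviders: []\n"),
	}
	fmt.Println(annotations["llamastack.example.com/config-hash"])
}
```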

@VaishnaviHire

Instead of introducing CRD changes as part of this fix, should this PR include updates to the watches? I agree with @rhdedgar: the run.yaml ConfigMap migration to the CRD should be covered under #117.

It should be reflected in the CRD's schema. It was not meant to include the run.yaml as a single opaque string, but as a more typed approach that helps users create the fields.

@rhuss +1 on this.

Given #117, we will eventually only have to watch the ca-bundle ConfigMap. We should keep the ca-bundle ConfigMap reference so as to support auto-generated/existing cert ConfigMaps.

For this PR, instead of removing the watch completely, can we have opinionated watches?

  • Watch for the run.yaml ConfigMap only in the LlamaStackDistribution CR namespace?
  • For the ca-bundle, since it's common across CR instances, watch for the ca-bundle ConfigMap in the operator namespace?


rhuss commented Sep 10, 2025

Yeah, for the configuration I think we should allow references only to ConfigMaps in the same namespace as the CR. This should fix the performance issue without being too restrictive.

For the caBundle, I'm unsure how it works in a multi-tenant setup, where the operator runs completely outside any tenant and different tenants have different CAs they trust. (but then, I'm not deep in how the ca bundle is currently used, so there might be an easy solution, too)

@VaishnaviHire

For the caBundle, I'm unsure how it works in a multi-tenant setup, where the operator runs completely outside any tenant and different tenants have different CAs they trust. (but then, I'm not deep in how the ca bundle is currently used, so there might be an easy solution, too)

We currently reference the ca-bundle ConfigMap in our CR. For multi-tenancy, I think we can just watch the tenant namespace(s), and require users to have the ca-bundle ConfigMap within those namespaces?


rhuss commented Sep 10, 2025

@rhdedgar I fully agree that we should keep the same semantics about restarting, regardless of how the configuration is stored.

For the external, fully-overridable run.yaml case, what do you think about the idea that the ConfigMaps themselves should be immutable (ConfigMaps can be created with the immutable flag set), and then requiring the user to update the reference field in the LLSD pointing to the ConfigMap? (probably something like customConfig: run-yaml-config-asdff1234, changing the suffix for every update). This is what kustomize does, too.

The benefit of this approach is that the reconciled Deployment resource differs from the previous one (the ConfigMap name changes), which triggers a restart of the deployment. If you only change the ConfigMap but not the DeploymentSpec, usually nothing happens. Of course, any ConfigMap mounted into a Pod reflects the new content, but the process within the Pod's container needs to detect that (via a file watch) and do a hot-reload (without restart). That would be the best solution anyway, but LLS is not capable of doing this.

Using immutable ConfigMaps also has the benefit of providing a history and allowing rollbacks, too.

tl;dr - I probably would favor such a solution:

  • Keep the ConfigMap reference and don't include a full run.yaml in the CR (that really is a bad and fragile user experience, especially when run.yaml can get large). Additionally, you need to perform special escaping, such as $ -> $$, due to the special YAML semantics. It's really not trivial.
  • Encourage people who use an external configMap to use either kustomize to manage their LLSDs or recommend immutable configmaps to trigger restarts.
  • Focus on "Proposal: Add run.yaml directly into LLSD as a sub-schema" (#117) in addition, to allow a smoother and more stable user experience for simpler use cases, like adding additional models or tools.
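The immutable-ConfigMap approach sketched above might look like this (the suffix is the example from the comment; the field name customConfig and other names are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: run-yaml-config-asdff1234   # suffix changes on every update
immutable: true                     # field GA since Kubernetes 1.21
data:
  run.yaml: |
    # full run.yaml override
---
# The LlamaStackDistribution would then reference the current revision;
# updating this field changes the DeploymentSpec and triggers a rollout:
# spec:
#   customConfig: run-yaml-config-asdff1234
```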

@rhdedgar

Closing this PR, as I was able to obtain a support exception for the older versions of OpenShift.

@rhdedgar rhdedgar closed this Sep 16, 2025
Successfully merging this pull request may close these issues.

Switch from ConfigMaps to CRs for CA bundles and custom run.yaml data