Add disableWorkloadRBAC flag to skip per-workload RBAC creation #4030
3fe7ae2 to 57e9006
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4030      +/-   ##
==========================================
+ Coverage   68.64%   68.67%   +0.03%
==========================================
  Files         444      445       +1
  Lines       45222    45392     +170
==========================================
+ Hits        31041    31175     +134
- Misses      11782    11814      +32
- Partials     2399     2403       +4
57e9006 to 45382c3
Some K8s platform teams reject the operator because its ClusterRole includes roles/rolebindings permissions for dynamic RBAC creation. This adds an opt-in DISABLE_WORKLOAD_RBAC env var (exposed via operator.rbac.disableWorkloadRBAC Helm value) so the operator skips per-workload ServiceAccount, Role, and RoleBinding creation.

When enabled:
- All controller ensureRBACResources() methods return nil immediately
- ClusterRole omits roles/rolebindings rules and SA write verbs
- Registry API ClusterRole/ClusterRoleBinding are not rendered
- Users must pre-create RBAC resources externally

Default behavior (false) is unchanged.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The ClusterRole is generated by controller-gen and CI verifies it matches. Helm conditionals cannot be used in generated files. The operator code guards are the enforcement mechanism — the ClusterRole permissions are a ceiling, not a guarantee. The registry-api ClusterRole/ClusterRoleBinding (hand-managed) retain their conditionals. Co-Authored-By: Claude Opus 4.6 <[email protected]>
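For illustration, the guard described in these commit messages could look roughly like the sketch below. Only DISABLE_WORKLOAD_RBAC, DisableWorkloadRBAC, and ensureRBACResources come from the PR; the reconciler type, the workloadRBACDisabled helper, and the parsing rule (unset or unparsable means false) are assumptions made for the example, not the operator's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envDisableWorkloadRBAC is the environment variable named in the PR.
const envDisableWorkloadRBAC = "DISABLE_WORKLOAD_RBAC"

// workloadRBACDisabled reports whether the flag is set to a true value.
// Assumption: unset or unparsable values fall back to false, matching the
// documented default; the PR's real parsing may differ.
func workloadRBACDisabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(envDisableWorkloadRBAC)
	if !ok {
		return false
	}
	disabled, err := strconv.ParseBool(raw)
	return err == nil && disabled
}

// reconciler is a hypothetical stand-in for the operator's controllers.
type reconciler struct {
	DisableWorkloadRBAC bool
	created             []string // records what would have been created
}

// ensureRBACResources returns early when workload RBAC is disabled,
// so nothing is created and externally managed resources are left alone.
func (r *reconciler) ensureRBACResources(name string) error {
	if r.DisableWorkloadRBAC {
		return nil
	}
	// Placeholder for the real ServiceAccount/Role/RoleBinding creation.
	r.created = append(r.created, name+"-proxy-runner")
	return nil
}

func main() {
	r := &reconciler{DisableWorkloadRBAC: workloadRBACDisabled(os.LookupEnv)}
	if err := r.ensureRBACResources("example"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("disabled=%t created=%v\n", r.DisableWorkloadRBAC, r.created)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the parsing testable without mutating the process environment.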
65f0915 to 08bfe6d
featureVMCP = "ENABLE_VMCP"
// disableWorkloadRBAC disables per-workload RBAC management (ServiceAccount, Role, RoleBinding).
// When enabled, the operator will not create RBAC resources for workloads,
// allowing them to be managed externally (e.g., via per-workload Helm charts).
I'm not sure if we want to create Helm Charts for this - Seems a bit overkill? It will only be 3 resources?
@@ -1,3 +1,4 @@
{{- if not .Values.operator.rbac.disableWorkloadRBAC }}
I don't actually think this is used anyways. This was added previously by RedHat but I don't think it's used.
@@ -1,3 +1,4 @@
{{- if not .Values.operator.rbac.disableWorkloadRBAC }}
ChrisJBurns left a comment
So, I don't actually know how this disables the workload RBAC permissions? The Operator ClusterRole itself will still have these permissions, as they are hardcoded into the kubebuilder annotations for task operator-manifests, right?
The generated ClusterRole (from controller-gen) cannot use Helm conditionals since CI verifies it matches the kubebuilder annotations. Move serviceaccounts, roles, and rolebindings permissions out of the kubebuilder annotations and into a hand-managed ClusterRole (toolhive-operator-workload-rbac-role) that is conditionally created based on the disableWorkloadRBAC Helm value. This ensures that when workload RBAC is disabled, the operator genuinely does not have permissions to create per-workload RBAC resources. Co-Authored-By: Claude Opus 4.6 <[email protected]>
fc259c3 to 7825661
@@ -0,0 +1,35 @@
{{- if not .Values.operator.rbac.disableWorkloadRBAC }}
---
apiVersion: rbac.authorization.k8s.io/v1
How is this auto generated? I'm not able to see the kubebuilder annotations
This isn't. I had to do this manually because kubebuilder kept removing my templating.
Summary
- … DISABLE_WORKLOAD_RBAC env var and operator.rbac.disableWorkloadRBAC Helm value (default: false)
- … roles/rolebindings permissions and ServiceAccount write verbs from the operator's ClusterRole

Motivation
Some Kubernetes platform teams enforce strict policies on which cluster-scoped resources an operator's ClusterRole may reference. In environments managed by GitOps tools like ArgoCD, app-projects must whitelist every cluster-scoped resource the operator needs, including roles and rolebindings permissions used for dynamic RBAC creation at runtime. This creates friction for adoption in security-conscious environments.

By allowing the operator to opt out of per-workload RBAC management, platform teams can:
- … roles/rolebindings permissions, eliminating the need to whitelist those resources in app-project definitions
- … get/list/watch) and loses all write permissions for RBAC resources

This is PR1 of a two-part effort. A follow-up PR will add per-workload Helm charts that bundle SA + Role + RoleBinding + CR for each workload type, providing a turnkey solution for externally-managed RBAC.
Changes
Go code:
- DISABLE_WORKLOAD_RBAC constant and flag reading in main.go, passed to all controller setup functions
- DisableWorkloadRBAC bool field added to MCPServerReconciler, MCPRemoteProxyReconciler, and VirtualMCPServerReconciler; each guards ensureRBACResources() with an early return
- registryapi.manager accepts the flag and guards its own ensureRBACResources()
- NewMCPRegistryReconciler forwards the flag to the registry API manager

Helm charts:
- operator.rbac.disableWorkloadRBAC value in values.yaml and values-openshift.yaml
- DISABLE_WORKLOAD_RBAC env var in deployment.yaml
- serviceaccounts split into its own rule block with conditional write verbs; roles/rolebindings block wrapped in conditional
- registry-api-clusterrole.yaml and registry-api-clusterrolebinding.yaml wrapped in conditional

Tests:
- … ensureRBACResources returns nil and creates no resources when disabled
- … externalRBAC-values.yaml) for Helm template linting

How to use
When this flag is set, users must pre-create the following resources for each workload:
- <name>-proxy-runner ServiceAccount/Role/RoleBinding + <name>-mcp-server ServiceAccount
- <name>-remote-proxy-runner ServiceAccount/Role/RoleBinding
- <name>-vmcp ServiceAccount/Role/RoleBinding
- <name>-registry-api ServiceAccount/Role/RoleBinding

Deployments will fail to schedule if the required ServiceAccounts are not present; this is a safe fail-closed behavior.
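As a rough aid for pre-creating these resources, the naming patterns above can be expanded for a given workload name. This is a minimal sketch: the suffixes and kinds are taken from the list above, while the requiredResources helper, the suffixKinds table, and the "Kind/name" output format are hypothetical and not part of the operator. Which workload type each suffix belongs to is not encoded here.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// fullSet is the ServiceAccount/Role/RoleBinding triple from the list above.
var fullSet = []string{"ServiceAccount", "Role", "RoleBinding"}

// suffixKinds maps each name suffix from the PR description to the kinds
// that must exist under that name. Hypothetical table for illustration.
var suffixKinds = map[string][]string{
	"proxy-runner":        fullSet,
	"mcp-server":          {"ServiceAccount"},
	"remote-proxy-runner": fullSet,
	"vmcp":                fullSet,
	"registry-api":        fullSet,
}

// requiredResources expands one suffix into "Kind/<name>-<suffix>" entries.
func requiredResources(name, suffix string) ([]string, error) {
	kinds, ok := suffixKinds[suffix]
	if !ok {
		return nil, fmt.Errorf("unknown suffix %q", suffix)
	}
	out := make([]string, 0, len(kinds))
	for _, kind := range kinds {
		out = append(out, fmt.Sprintf("%s/%s-%s", kind, name, suffix))
	}
	return out, nil
}

func main() {
	// Sort the suffixes so the listing is stable across runs.
	suffixes := make([]string, 0, len(suffixKinds))
	for s := range suffixKinds {
		suffixes = append(suffixes, s)
	}
	sort.Strings(suffixes)

	for _, s := range suffixes {
		res, err := requiredResources("example", s)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(res)
	}
}
```

A list like this could feed a pre-flight check that confirms the ServiceAccounts exist before the workload CR is applied.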
Test plan
task lint-fix— 0 issuestask test— all unit tests pass (exit 0)helm templatewithexternalRBAC-values.yaml— ClusterRole has no roles/rolebindings, SA is read-only, registry-api resources absenthelm templatewith defaults — output unchanged from maintask helm-docs— README regenerated🤖 Generated with Claude Code