Skip to content

Add disableWorkloadRBAC flag to skip per-workload RBAC creation#4030

Open
JAORMX wants to merge 3 commits intomainfrom
feat/disable-workload-rbac
Open

Add disableWorkloadRBAC flag to skip per-workload RBAC creation#4030
JAORMX wants to merge 3 commits intomainfrom
feat/disable-workload-rbac

Conversation

@JAORMX
Copy link
Collaborator

@JAORMX JAORMX commented Mar 6, 2026

Summary

  • Adds DISABLE_WORKLOAD_RBAC env var and operator.rbac.disableWorkloadRBAC Helm value (default: false)
  • When enabled, the operator skips creating per-workload ServiceAccount, Role, and RoleBinding resources
  • Conditionally removes roles/rolebindings permissions and ServiceAccount write verbs from the operator's ClusterRole
  • Conditionally excludes the registry-api ClusterRole and ClusterRoleBinding from the chart

Motivation

Some Kubernetes platform teams enforce strict policies on which cluster-scoped resources an operator's ClusterRole may reference. In environments managed by GitOps tools like ArgoCD, app-projects must whitelist every cluster-scoped resource the operator needs — including roles and rolebindings permissions used for dynamic RBAC creation at runtime. This creates friction for adoption in security-conscious environments.

By allowing the operator to opt out of per-workload RBAC management, platform teams can:

  • Pre-provision RBAC via GitOps: Ship ServiceAccount, Role, and RoleBinding alongside each workload CR in dedicated per-workload Helm charts or Kustomize overlays, keeping RBAC under full version control and audit
  • Satisfy ArgoCD app-project constraints: The operator's ClusterRole no longer needs roles/rolebindings permissions, eliminating the need to whitelist those resources in app-project definitions
  • Enforce least-privilege at the operator level: The operator only retains read access to ServiceAccounts (get/list/watch) and loses all write permissions for RBAC resources
  • Support policy engines: Works cleanly with OPA/Gatekeeper or Kyverno policies that restrict which controllers may create RBAC resources

This is PR1 of a two-part effort. A follow-up PR will add per-workload Helm charts that bundle SA + Role + RoleBinding + CR for each workload type, providing a turnkey solution for externally-managed RBAC.

Changes

Go code:

  • New DISABLE_WORKLOAD_RBAC constant and flag reading in main.go, passed to all controller setup functions
  • DisableWorkloadRBAC bool field added to MCPServerReconciler, MCPRemoteProxyReconciler, and VirtualMCPServerReconciler — each guards ensureRBACResources() with an early return
  • registryapi.manager accepts the flag and guards its own ensureRBACResources()
  • NewMCPRegistryReconciler forwards the flag to the registry API manager

Helm charts:

  • operator.rbac.disableWorkloadRBAC value in values.yaml and values-openshift.yaml
  • DISABLE_WORKLOAD_RBAC env var in deployment.yaml
  • ClusterRole: serviceaccounts split into its own rule block with conditional write verbs; roles/rolebindings block wrapped in conditional
  • registry-api-clusterrole.yaml and registry-api-clusterrolebinding.yaml wrapped in conditional

Tests:

  • Unit test per controller verifying ensureRBACResources returns nil and creates no resources when disabled
  • CI values file (externalRBAC-values.yaml) for Helm template linting

How to use

# values.yaml
operator:
  rbac:
    disableWorkloadRBAC: true

When this flag is set, users must pre-create the following resources for each workload:

  • MCPServer: <name>-proxy-runner ServiceAccount/Role/RoleBinding + <name>-mcp-server ServiceAccount
  • MCPRemoteProxy: <name>-remote-proxy-runner ServiceAccount/Role/RoleBinding
  • VirtualMCPServer: <name>-vmcp ServiceAccount/Role/RoleBinding
  • MCPRegistry: <name>-registry-api ServiceAccount/Role/RoleBinding

Deployments will fail to schedule if the required ServiceAccounts are not present — this is a safe fail-closed behavior.

Test plan

  • task lint-fix — 0 issues
  • task test — all unit tests pass (exit 0)
  • helm template with externalRBAC-values.yaml — ClusterRole has no roles/rolebindings, SA is read-only, registry-api resources absent
  • helm template with defaults — output unchanged from main
  • task helm-docs — README regenerated

🤖 Generated with Claude Code

@github-actions github-actions bot added the size/S Small PR: 100-299 lines changed label Mar 6, 2026
@JAORMX JAORMX force-pushed the feat/disable-workload-rbac branch from 3fe7ae2 to 57e9006 Compare March 6, 2026 11:38
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 6, 2026
@codecov
Copy link

codecov bot commented Mar 6, 2026

Codecov Report

❌ Patch coverage is 27.90698% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.67%. Comparing base (9089ac7) to head (7825661).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
cmd/thv-operator/main.go 0.00% 29 Missing ⚠️
...thv-operator/controllers/mcpregistry_controller.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4030      +/-   ##
==========================================
+ Coverage   68.64%   68.67%   +0.03%     
==========================================
  Files         444      445       +1     
  Lines       45222    45392     +170     
==========================================
+ Hits        31041    31175     +134     
- Misses      11782    11814      +32     
- Partials     2399     2403       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JAORMX JAORMX force-pushed the feat/disable-workload-rbac branch from 57e9006 to 45382c3 Compare March 6, 2026 11:52
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 6, 2026
JAORMX and others added 2 commits March 6, 2026 15:10
Some K8s platform teams reject the operator because its ClusterRole
includes roles/rolebindings permissions for dynamic RBAC creation.
This adds an opt-in DISABLE_WORKLOAD_RBAC env var (exposed via
operator.rbac.disableWorkloadRBAC Helm value) so the operator skips
per-workload ServiceAccount, Role, and RoleBinding creation.

When enabled:
- All controller ensureRBACResources() methods return nil immediately
- ClusterRole omits roles/rolebindings rules and SA write verbs
- Registry API ClusterRole/ClusterRoleBinding are not rendered
- Users must pre-create RBAC resources externally

Default behavior (false) is unchanged.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The ClusterRole is generated by controller-gen and CI verifies it
matches. Helm conditionals cannot be used in generated files.

The operator code guards are the enforcement mechanism — the
ClusterRole permissions are a ceiling, not a guarantee. The registry-api
ClusterRole/ClusterRoleBinding (hand-managed) retain their conditionals.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@JAORMX JAORMX force-pushed the feat/disable-workload-rbac branch from 65f0915 to 08bfe6d Compare March 6, 2026 13:11
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 6, 2026
yrobla
yrobla previously approved these changes Mar 6, 2026
featureVMCP = "ENABLE_VMCP"
// disableWorkloadRBAC disables per-workload RBAC management (ServiceAccount, Role, RoleBinding).
// When enabled, the operator will not create RBAC resources for workloads,
// allowing them to be managed externally (e.g., via per-workload Helm charts).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want to create Helm Charts for this - Seems a bit overkill? It will only be 3 resources?

@@ -1,3 +1,4 @@
{{- if not .Values.operator.rbac.disableWorkloadRBAC }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually think this is used anyways. This was added previously by RedHat but I don't think it's used.

@@ -1,3 +1,4 @@
{{- if not .Values.operator.rbac.disableWorkloadRBAC }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Collaborator

@ChrisJBurns ChrisJBurns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I don't actually know how this disables the workload RBAC permissions? The Operator ClusterRole itself will still have these permissions? As they are hardcoded into the kubebuilder annotations for task operator-manifests right?

The generated ClusterRole (from controller-gen) cannot use Helm
conditionals since CI verifies it matches the kubebuilder annotations.

Move serviceaccounts, roles, and rolebindings permissions out of the
kubebuilder annotations and into a hand-managed ClusterRole
(toolhive-operator-workload-rbac-role) that is conditionally created
based on the disableWorkloadRBAC Helm value. This ensures that when
workload RBAC is disabled, the operator genuinely does not have
permissions to create per-workload RBAC resources.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@JAORMX JAORMX force-pushed the feat/disable-workload-rbac branch from fc259c3 to 7825661 Compare March 9, 2026 12:18
@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 9, 2026
@@ -0,0 +1,35 @@
{{- if not .Values.operator.rbac.disableWorkloadRBAC }}
---
apiVersion: rbac.authorization.k8s.io/v1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this auto generated? I'm not able to see the kubebuilder annotations

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't. I had to do this manually because kubebuilder kept removing my templating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Medium PR: 300-599 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants