
Control plane deadlock via Gatekeeper-Generated ValidatingAdmissionPolicies #4530

@shikanime


What steps did you take and what happened:
I am running k0s with Gatekeeper and the Gatekeeper Library, in particular these constraint templates:

  • K8sPSPPrivilegedContainer
  • K8sPSPHostProcess
  • K8sPSPAllowPrivilegeEscalationContainer

Source of my Gatekeeper configuration: https://github.com/shikanime/manifests/tree/6aaa5a792a96f68abb4ad9741af3af37b78242fa/configs/gatekeeper

After a cluster restart, the control plane failed to bootstrap. The kube-apiserver became active but immediately began denying critical internal k0s and Kubernetes management requests. Specifically, it blocked the creation of EtcdMember objects and the update of Leases in kube-node-lease.

Because the Gatekeeper-generated ValidatingAdmissionPolicy (VAP) uses a Gatekeeper Constraint kind as its paramKind (constraints.gatekeeper.sh/v1beta1), and those constraints were not yet initialized or reachable during the early boot phase, the API server "failed closed" and denied the requests.

This created a circular dependency:

  1. k0s needs to create an EtcdMember to start the cluster database.
  2. The API server intercepts this request to check a VAP.
  3. The VAP fails to find its parameter definition (Constraint).
  4. The API server denies the EtcdMember creation.
  5. The cluster database never starts, keeping the cluster in a permanent "Connection Refused" death loop.
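
To check which policies are exposed to this loop, one option is to list every ValidatingAdmissionPolicy together with the API group of its paramKind and keep those that point at constraints.gatekeeper.sh. This is a minimal sketch, not a Gatekeeper tool: the function names and the KUBECTL variable are hypothetical helpers, and it assumes the API server is reachable enough to answer read requests.

```shell
# Hypothetical helper: keep policies whose paramKind is in the Gatekeeper
# constraints API group. Input on stdin, one policy per line:
#   "<policy-name> <paramKind.apiVersion>"
filter_gatekeeper_param_policies() {
  awk '$2 ~ /^constraints\.gatekeeper\.sh\// { print $1 }'
}

# Hypothetical helper: print "<policy-name> <paramKind.apiVersion>" for every
# ValidatingAdmissionPolicy. Policies without a paramKind print only the name.
# KUBECTL is an assumed override, e.g. KUBECTL="sudo k0s kubectl".
list_policy_param_kinds() {
  ${KUBECTL:-kubectl} get validatingadmissionpolicy \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.paramKind.apiVersion}{"\n"}{end}'
}
```

On a k0s controller this could be run as `KUBECTL="sudo k0s kubectl"; list_policy_param_kinds | filter_gatekeeper_param_policies`.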

What did you expect to happen:

Perhaps the documentation should include a note for distribution-specific installations. I don't remember whether RKE2 or K3s similarly manage cluster state through objects stored inside the cluster.

Anything else you would like to add:

The logs show the API server explicitly forbidding the k0s supervisor from managing the cluster:

E0422 22:11:26.971959 1804 leaderelection.go:488] "Failed to update lease" err="leases.coordination.k8s.io \"k0s-endpoint-reconciler\" is forbidden: ValidatingAdmissionPolicy 'gatekeeper-k8spspprivilegedcontainer' denied request: failed to configure policy: failed to find resource referenced by paramKind: 'constraints.gatekeeper.sh/v1beta1, Kind=K8sPSPPrivilegedContainer'"
Error: failed to start cluster components: failed to create EtcdMember object for this controller: etcdmembers.etcd.k0sproject.io "nishir" is forbidden: ValidatingAdmissionPolicy 'gatekeeper-k8spspprivilegedcontainer' denied request...
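
When triaging logs like these, it can help to extract the names of the policies that issued the denials. The sketch below is a hypothetical helper that assumes the log lines arrive on stdin and follow the "ValidatingAdmissionPolicy '<name>' denied request" wording shown above.

```shell
# Hypothetical helper: print the unique names of ValidatingAdmissionPolicies
# that denied a request, based on the message format seen in the logs above.
denying_policies() {
  sed -n "s/.*ValidatingAdmissionPolicy '\([^']*\)' denied request.*/\1/p" | sort -u
}
```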

Temporary mitigation:

sudo k0s kubectl delete validatingadmissionpolicy --all
sudo k0s kubectl delete validatingwebhookconfiguration -l gatekeeper.sh/system=yes
sudo k0s kubectl delete mutatingwebhookconfiguration -l gatekeeper.sh/system=yes
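
A narrower variant of the first command would delete only the Gatekeeper-generated policies instead of every ValidatingAdmissionPolicy in the cluster. This sketch assumes that generated policies carry the `gatekeeper-` name prefix seen in the logs; the function names and the KUBECTL variable are hypothetical.

```shell
# Hypothetical helper: keep only resources whose object name starts with
# "gatekeeper-". Input is `kubectl get ... -o name` output, one
# "<resource>.<group>/<name>" per line. Prints nothing if none match.
select_gatekeeper_generated() {
  grep -E '/gatekeeper-' || true
}

# Hypothetical helper: delete only the policies selected above.
# KUBECTL is an assumed override, e.g. KUBECTL="sudo k0s kubectl".
delete_gatekeeper_policies() {
  kubectl_cmd=${KUBECTL:-kubectl}
  $kubectl_cmd get validatingadmissionpolicy -o name \
    | select_gatekeeper_generated \
    | xargs -r $kubectl_cmd delete
}
```

Running `KUBECTL="sudo k0s kubectl" delete_gatekeeper_policies` would then leave any hand-written policies in place.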

Environment:

  • Gatekeeper version: 3.22.0
  • Kubernetes version:
    • Client Version: v1.35.2
    • Kustomize Version: v5.7.1
    • Server Version: v1.35.2+k0s

Labels: bug (Something isn't working)
