From 176c84b09a9b1e7ecf04189ae805d2f1abe660a7 Mon Sep 17 00:00:00 2001 From: Krzysztof Ostrowski Date: Fri, 31 Jan 2025 20:25:18 +0100 Subject: [PATCH] second iteration - first iteration --- .../pod-security-admission-enforcement.md | 136 +++++++++--------- 1 file changed, 72 insertions(+), 64 deletions(-) diff --git a/enhancements/authentication/pod-security-admission-enforcement.md b/enhancements/authentication/pod-security-admission-enforcement.md index e2356ec78c..3fc64ca3ba 100644 --- a/enhancements/authentication/pod-security-admission-enforcement.md +++ b/enhancements/authentication/pod-security-admission-enforcement.md @@ -25,34 +25,35 @@ superseded-by: [] ## Summary -This enhancement introduces a **new cluster-scoped API** and changes to involved controllers and to the `OpenShiftPodSecurityAdmission` `FeatureGate` to roll out [Pod Security Admission (PSA)](https://kubernetes.io/docs/concepts/security/pod-security-admission/) enforcement [in OpenShift](https://www.redhat.com/en/blog/pod-security-admission-in-openshift-4.11) gradually. -Enforcement means that the `PodSecurityAdmissionLabelSynchronizationController` is setting the `pod-security.kubernetes.io/enforce` label on Namespaces, and the PodSecurityAdmission plugin is enforcing the `Restricted` [Pod Security Standard (PSS)](https://kubernetes.io/docs/concepts/security/pod-security-standards/). -Gradually means that both changes happen in separate steps. +This enhancement introduces a **new cluster-scoped API**, changes to the relevant controllers and to the `OpenShiftPodSecurityAdmission` `FeatureGate` to gradually roll out [Pod Security Admission (PSA)](https://kubernetes.io/docs/concepts/security/pod-security-admission/) enforcement [in OpenShift](https://www.redhat.com/en/blog/pod-security-admission-in-openshift-4.11). 
+Enforcement means that the `PodSecurityAdmissionLabelSynchronizationController` sets the `pod-security.kubernetes.io/enforce` label on Namespaces, and the PodSecurityAdmission plugin enforces the `Restricted` [Pod Security Standard (PSS)](https://kubernetes.io/docs/concepts/security/pod-security-standards/).
+“Gradually” means that these changes happen in separate steps.

The new API offers users the option to manipulate the outcome by enforcing the `Privileged` or `Restricted` PSS directly.
-The suggested default decision is `Conditional`, which will only progress if no potentially failing workloads are found.
-The progression will start with the `PodSecurityAdmissionLabelSynchronizationController` labeling **Namespaces** for enforcement and finish with the **Global Configuration** of `PodSecurity` being set to `Restricted` by default.
+The suggested default decision is `Conditional`, which only progresses if no potentially failing workloads are found.
+The progression starts with the `PodSecurityAdmissionLabelSynchronizationController` labeling **Namespaces** for enforcement and finishes with the **Global Configuration** of `PodSecurity` being set to `Restricted` by default.

This enhancement expands the ["PodSecurity admission in OpenShift"](https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/enhancements/authentication/pod-security-admission.md) and ["Pod Security Admission Autolabeling"](https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/enhancements/authentication/pod-security-admission-autolabeling.md) enhancements.

## Motivation

-After introducing Pod Security Admission and the Autolabeling based on SCCs, clusters were identified that had Namespaces with Pod Security violations.
-Over the last few releases, the number of clusters with failing workloads has been reduced significantly.
-Although these numbers are now very low, we still must avoid any scenario where users end up with failing workloads. -To enable a smooth and safe transition, this proposal uses a gradual and conditional rollout based on the new API, which also provides an overview of which Namespaces could contain failing workloads. +After introducing Pod Security Admission and Autolabeling based on SCCs, some clusters were found to have Namespaces with Pod Security violations. +Over the last few releases, the number of clusters with failing workloads has dropped significantly. +Although these numbers are now quite low, it is essential to avoid any scenario where users end up with failing workloads. +To ensure a smooth and safe transition, this proposal uses a gradual, conditional rollout based on the new API. +This approach also provides an overview of which Namespaces could contain failing workloads. ### Goals -1. Start the process of rolling out PodSecurityAdmission enforcement. -1. Minimize the risk of breakage for existing workloads. -1. Enable users to remain in "privileged" mode for a couple of releases. +1. Start the process of rolling out Pod Security Admission enforcement. +2. Minimize the risk of breakage for existing workloads. +3. Allow users to remain in “privileged” mode for a couple of releases. ### Non-Goals 1. Enabling the PSA label-syncer to evaluate workloads with user-based SCC decisions. -1. Providing a detailed list of every Pod Security violation in a Namespace. -3. Enable the user to move from enforcing the Global config to just the PSA label syncer setting the enforce label. +2. Providing a detailed list of every Pod Security violation in a Namespace. +3. Allowing the user to move from enforcing the global config to relying solely on the PSA label syncer to set the `enforce` label. ## Proposal @@ -83,19 +84,23 @@ To allow a safer rollout of enforcement, the following steps are proposed: 2. 
**Global Config Enforce** Once all viable Namespaces are labeled successfully (or satisfy PSS `restricted`), the cluster will set the `PodSecurity` configuration for the kube-apiserver to `Restricted`, again only if there would be no failing workloads. -The feature flag `OpenShiftPodSecurityAdmission` being enabled is a pre-condition for this process to start. It will also serve as a break-glass option. If unexpected failures occur, the rollout will be reverted by removing the `FeatureGate` from the default `FeatureSet`. +The feature flag `OpenShiftPodSecurityAdmission` being enabled is a pre-condition for this process to start. +It will also serve as a break-glass option. +If unexpected failures occur, the rollout will be reverted by removing the `FeatureGate` from the default `FeatureSet`. #### Examples - **Category 1**: Namespaces with workloads that use user-bound SCCs (workloads created directly by a user) without meeting the `Restricted` PSS. -- **Category 2**: Namespaces that do not have the `pod-security.kubernetes.io/enforce` label and whose workloads would not satisfy the `Restricted` PSS. Possible cases include: +- **Category 2**: Namespaces that do not have the `pod-security.kubernetes.io/enforce` label and whose workloads would not satisfy the `Restricted` PSS. + + Possible cases include: 1. Namespaces with `security.openshift.io/scc.podSecurityLabelSync: "false"` and no `pod-security.kubernetes.io/enforce` label set. 2. `openshift-` prefixed Namespaces (not necessarily created or managed by OpenShift teams). ### User Control and Insights -To allow users influence over this gradual transition, a new API called `PSAEnforcementConfig` is introduced. It will let administrators: - +To allow user influence over this gradual transition, a new API called `PSAEnforcementConfig` is introduced. +It will let administrators: - Force `Restricted` enforcement, ignoring potential violations. 
- Remain in `Privileged` mode, regardless of whether violations exist or not. - Let the cluster evaluate the state and automatically enforce `Restricted` if no workloads would fail. @@ -104,8 +109,7 @@ To allow users influence over this gradual transition, a new API called `PSAEnfo ### Release Timing The gradual process will span three releases: - -- **Release `n-1`**: Introduce the new API, diagnostics for identifying violating Namespaces enable the PSA label syncer to remove enforce labels from release `n`. +- **Release `n-1`**: Introduce the new API, diagnostics for identifying violating Namespaces, and enable the PSA label syncer to remove enforce labels from release `n`. - **Release `n`**: Permit the `PodSecurityAdmissionLabelSynchronizationController` to set enforce labels if there are no workloads that would fail. - **Release `n+2`**: Enable the PodSecurity configuration to enforce `restricted` if there are no workloads that would fail. @@ -115,18 +119,18 @@ Here, `n` could be OCP `4.19`, assuming it is feasible to backport the API and d ### Improved Diagnostics -The [ClusterFleetEvaluation](https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/dev-guide/cluster-fleet-evaluation.md) revealed that some clusters would fail enforcement, and it was unclear why certain Namespaces in those clusters were failing. -The assumed root-cause is that [`PodSecurityAdmissionLabelSynchronizationController`](https://github.com/openshift/cluster-policy-controller/blob/master/pkg/psalabelsyncer/podsecurity_label_sync_controller.go) (PSA label syncer) does not label Namespaces that rely on user-based SCCs. -In some cases, it couldn't be evaluated at all as the PSA labels have been overwritten by the user. -To confirm this and identify all potential causes, we need improved diagnostics. 
+The [ClusterFleetEvaluation](https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/dev-guide/cluster-fleet-evaluation.md) revealed that certain clusters would fail enforcement without clear explanations. +A likely root cause is that the [`PodSecurityAdmissionLabelSynchronizationController`](https://github.com/openshift/cluster-policy-controller/blob/master/pkg/psalabelsyncer/podsecurity_label_sync_controller.go) (PSA label syncer) does not label Namespaces that rely on user-based SCCs. +In some cases, the evaluation was impossible because PSA labels had been overwritten by users. +Additional diagnostics are required to confirm the full set of potential causes. #### New SCC Annotation: `security.openshift.io/ValidatedSCCSubjectType` -Currently, the annotation `openshift.io/scc` indicates which SCC admitted a workload, but it does not distinguish **how** that SCC was granted—**via a user** or **via the Pod’s service account**. -A new annotation will help determine if a ServiceAccount with the required SCCs was used or if a user created the workload out of band. -Since the PSA label syncer does not track user-based SCCs, it cannot fully assess labeling in those cases. +The annotation `openshift.io/scc` currently indicates which SCC admitted a workload, but it does not distinguish **how** the SCC was granted—whether through a user or a Pod’s ServiceAccount. +A new annotation will help determine if a ServiceAccount with the required SCCs was used, or if a user created the workload out of band. +Because the PSA label syncer does not track user-based SCCs itself, it cannot fully assess labeling under those circumstances. 
-Based on this reasoning, the proposal introduces: +To address this, the proposal introduces: ```go // ValidatedSCCSubjectTypeAnnotation indicates the subject type that allowed the @@ -141,25 +145,26 @@ This annotation will be set by the [`SecurityContextConstraint` admission](https The PSA label syncer must set the `security.openshift.io/MinimallySufficientPodSecurityStandard` annotation. Because users can modify `pod-security.kubernetes.io/warn` and `pod-security.kubernetes.io/audit`, these labels do not reliably indicate the minimal standard. -The new annotation ensures a clear record of the minimal Pod Security Standard that would be enforced if the `pod-security.kubernetes.io/enforce` label would be set. +The new annotation ensures a clear record of the minimal PSS that would be enforced if the `pod-security.kubernetes.io/enforce` label were set. #### Update the PodSecurityReadinessController -By adding these annotations, the [`PodSecurityReadinessController`](https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/pkg/operator/podsecurityreadinesscontroller/podsecurityreadinesscontroller.go) PodSecurityReadinessController can more accurately identify potentially failing Namespaces and understand their root causes: +By adding these annotations, the [`PodSecurityReadinessController`](https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/pkg/operator/podsecurityreadinesscontroller/podsecurityreadinesscontroller.go) can more accurately identify potentially failing Namespaces and understand their root causes: -- With the `security.openshift.io/MinimallySufficientPodSecurityStandard` annotation, it can evaluate Namespaces that lack the `pod-security.kubernetes.io/enforce` label yet have user-overridden warn or audit labels. -- With `ValidatedSCCSubjectType`, the controller can classify issues arising from user-based SCC workloads separately. 
We strongly suspect that many of the remaining clusters with violations involve workloads admitted by user SCCs instead of ServiceAccounts. +- With the `security.openshift.io/MinimallySufficientPodSecurityStandard` annotation, it can evaluate Namespaces that lack the `pod-security.kubernetes.io/enforce` label but have user-overridden warn or audit labels. +- With `ValidatedSCCSubjectType`, the controller can classify issues arising from user-based SCC workloads separately. + Many of the remaining clusters with violations appear to involve workloads admitted by user SCCs. ### Secure Rollout -As noted in the Proposal, we will introduce these changes gradually: first enforcing Pod Security Admission at the Namespace level, then at the global (cluster-wide) level. -In addition to adjusting how the `OpenShiftPodSecurityAdmission` `FeatureGate` behaves, we must give administrators both visibility and control over this transition. -Therefore, introducing a new API becomes necessary. +The Proposal section indicates that enforcement will be introduced first at the Namespace level and later at the global (cluster-wide) level. +In addition to adjusting how the `OpenShiftPodSecurityAdmission` `FeatureGate` behaves, administrators need visibility and control throughout this transition. +A new API is necessary to provide this flexibility. #### New API -This API provides the flexibility to roll out Pod Security Admission enforcement to clusters in a gradual way. -It gives users the ability to control the process and see feedback about the cluster’s state regarding Pod Security violations. +This API offers a gradual way to roll out Pod Security Admission enforcement to clusters. +It gives users the ability to influence the rollout and see feedback on which Namespaces might violate Pod Security standards. 
```go
package v1alpha1
@@ -296,9 +301,9 @@ type ViolatingNamespace struct {
When `Conditional` is selected, enforcement depends on whether there are violating Namespaces and on the current release.

- In `n` and `n+1`: It only progresses from `Privileged` to `LabelEnforcement`, if there would be no PSA label syncer violations.
-- In `n+1`: It only progresses from `LabelEnforcement` to`FullEnforcement`, if there would be no PodSecurity Config violations.
+- In `n+1`: It only progresses from `LabelEnforcement` to `FullEnforcement`, if there would be no PodSecurity config violations.

-Below is a table illustrating the expected behavior, that would occur if the `FeatureGate` `OpenShiftPodSecurityAdmission` is enabled:
+Below is a table illustrating the expected behavior when the `FeatureGate` `OpenShiftPodSecurityAdmission` is enabled:

| spec.targetMode | violations found | release | status.enforcementMode |
| ----------------- | ---------------- | ------- | ---------------------- |
@@ -321,24 +326,24 @@ Below is a table illustrating the expected behavior, that would occur if the `Fe
| Conditional | none | n + 1 | FullEnforcement |
| Conditional | found | n + 1 | Privileged |

-Once a cluster with `spec.targetMode = Conditional` settles on a stricter enforcement, it can revert to `Privileged` only if the user explicitly sets `spec.targetMode = Privileged`.
-If a cluster in `spec.mode = Conditional` mode begins with `status.EnforcmentMode = Privileged`, it may immediately flip to a more restricted `status.EnforcmentMode` when no violations remain.
-To manage rollout timing, a user could set `spec.mode = Privileged` and then switch to `Conditional` back at a suitable time.
+A cluster that uses `spec.targetMode = Conditional` can revert to `Privileged` only if the user explicitly sets `spec.targetMode = Privileged`.
+A cluster in `spec.targetMode = Conditional` that starts with `status.enforcementMode = Privileged` may switch to a more restrictive enforcement mode as soon as there are no violations.
+To manage the timing of this rollout, an administrator can set `spec.targetMode = Privileged` and later switch it to `Conditional` when ready.

-`status.violatingNamespaces` lists the Namespaces that would have failing workloads, if a `status.enforcementMode = LabelEnforcement` or `status.enforcementMode = FullEnforcement` is being enforced.
-The reason gives the user the ability to identify which Namespaces are failing and in which which type of failure it is: based on the PSA label syncer or the PodSecurity configuration.
-To identify specific workloads, an administrator must query the kube-apiserver or optionally use the [cluster debugging tool](https://github.com/openshift/cluster-debug-tools), which can pinpoint the problematic workloads.
+`status.violatingNamespaces` lists the Namespaces that would fail if `status.enforcementMode` were `LabelEnforcement` or `FullEnforcement`.
+The reason field helps identify whether the PSA label syncer or the PodSecurity config is the root cause.
+Administrators must query the kube-apiserver (or use the [cluster debugging tool](https://github.com/openshift/cluster-debug-tools)) to pinpoint specific workloads.

### Implementation Details

- The `PodSecurityReadinessController` in the `cluster-kube-apiserver-operator` will manage the new API.
- If the `FeatureGate` is removed, the cluster must revert to its previous state.
-- The [`Config Oberserver controller`](https://github.com/openshift/cluster-kube-apiserver-operator/blob/218530fdea4e89b93bc6e136d8b5d8c3beacdd51/pkg/operator/configobservation/configobservercontroller/observe_config_controller.go#L131) needs to be updated for the `PodSecurity` configuration behavior.
+- The [`Config Observer controller`](https://github.com/openshift/cluster-kube-apiserver-operator/blob/218530fdea4e89b93bc6e136d8b5d8c3beacdd51/pkg/operator/configobservation/configobservercontroller/observe_config_controller.go#L131) must be updated for the `PodSecurity` configuration. #### PodSecurity Configuration -Currently, a Config Observer in the `cluster-kube-apiserver-operator` manages the Global Config for the kube-apiserver, adjusting behavior based on the feature flag. -It will need to watch both the `status.enforcementMode` and the `FeatureGate` to make its decisions. +A Config Observer in the `cluster-kube-apiserver-operator` manages the Global Config for the kube-apiserver, adjusting behavior based on the feature flag. +It must watch both the `status.enforcementMode` and the `FeatureGate` to make decisions. #### PSA Label Syncer @@ -346,8 +351,8 @@ The PSA label syncer will watch the `status.enforcementMode` and the `OpenShiftP If `status.enforcementMode` is `LabelEnforcement` or `FullEnforcement` and `OpenShiftPodSecurityAdmission` is enabled, the syncer will set the `pod-security.kubernetes.io/enforce` label. Otherwise, it will refrain from setting that label and remove any enforce labels it owns. -Because the ability to set `pod-security.kubernetes.io/enforce` will be introduced in release `n`, the ability to remove that label must exist in release `n-1`. -Otherwise, the cluster won't be able to revert to its previous state. +Because the ability to set `pod-security.kubernetes.io/enforce` is introduced in release `n`, the ability to remove that label must exist in release `n-1`. +Otherwise, the cluster will be unable to revert to its previous state. 
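The syncer's decision described above can be sketched as follows. This is an illustrative sketch only, not the controller's actual code: the `EnforcementMode` type and the `shouldSetEnforceLabel` helper are names introduced here for the example.

```go
package main

import "fmt"

// EnforcementMode mirrors the status.enforcementMode values from the proposed API.
type EnforcementMode string

const (
	Privileged       EnforcementMode = "Privileged"
	LabelEnforcement EnforcementMode = "LabelEnforcement"
	FullEnforcement  EnforcementMode = "FullEnforcement"
)

// shouldSetEnforceLabel sketches the PSA label syncer decision: the
// pod-security.kubernetes.io/enforce label is set only when the
// OpenShiftPodSecurityAdmission FeatureGate is enabled AND the cluster has
// progressed at least to LabelEnforcement. In every other case the syncer
// removes any enforce label it owns, which is what allows the revert.
func shouldSetEnforceLabel(gateEnabled bool, mode EnforcementMode) bool {
	if !gateEnabled {
		// Break-glass: disabling the FeatureGate always reverts labeling.
		return false
	}
	return mode == LabelEnforcement || mode == FullEnforcement
}

func main() {
	fmt.Println(shouldSetEnforceLabel(true, LabelEnforcement)) // true
	fmt.Println(shouldSetEnforceLabel(true, Privileged))       // false
	fmt.Println(shouldSetEnforceLabel(false, FullEnforcement)) // false
}
```

Note that the gate check comes first: removing the `FeatureGate` overrides any `status.enforcementMode`, which is why the label-removal path must ship one release before the label-setting path.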
## Open Questions @@ -358,20 +363,20 @@ It may be possible to introduce an intermediate step and set the cluster to `bas ### Fresh Installs -The System Administrator needs to pre-configure the new API’s `spec.targetMode`, choosing whether the cluster will be `privileged`, `restricted`, or `conditional` upon a fresh install. +The System Administrator needs to pre-configure the new API’s `spec.targetMode`, choosing whether the cluster will be `privileged`, `restricted`, or `conditional` during a fresh install. ## Test Plan The PSA label syncer currently maps SCCs to PSS through a hard-coded rule set, and the PSA version is set to `latest`. -This arrangement risks becoming outdated if the mapping logic changes over time upstream. +This setup risks becoming outdated if the mapping logic changes upstream. To protect user workloads, an end-to-end test should fail if the mapping logic no longer behaves as expected. Ideally, the PSA label syncer would use the `podsecurityadmission` package directly. -Otherwise it can't be guaranteed that all possible SCCs are mapped correctly. +Otherwise, it can't be guaranteed that all possible SCCs are mapped correctly. ## Graduation Criteria -- If `status.enforcementMode = LabelEnforcement` rolls out on most clusters without negative feedback, `status.enforcementMode = FullEnforcement` can be enforced in the next release. -- If the majority of users have `status.enforcementMode = FullEnforcement`, upgrades can be blocked on clusters that are still not in that state. +- If `status.enforcementMode = LabelEnforcement` rolls out on most clusters with no adverse effects, `status.enforcementMode = FullEnforcement` can be enabled in the subsequent release. +- If the majority of users have `status.enforcementMode = FullEnforcement`, then upgrades can be blocked on clusters that do not reach that state. 
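The end-to-end guard described in the Test Plan could be sketched as below. The `sccToPSS` table and the `verifyMappingCovers` helper are purely illustrative assumptions introduced for this example, not the syncer's real rule set; the real syncer derives the minimally sufficient PSS from SCC fields.

```go
package main

import "fmt"

// Illustrative only: two well-known SCC-to-PSS pairs stand in for the
// syncer's hard-coded mapping. The real table is larger and derived.
var sccToPSS = map[string]string{
	"restricted-v2": "restricted",
	"privileged":    "privileged",
}

// verifyMappingCovers sketches the e2e guard from the Test Plan: the test
// fails if any SCC under test has no Pod Security Standard mapping, which
// would indicate the hard-coded rules drifted from upstream behavior.
func verifyMappingCovers(sccs []string) error {
	for _, scc := range sccs {
		if _, ok := sccToPSS[scc]; !ok {
			return fmt.Errorf("SCC %q has no Pod Security Standard mapping", scc)
		}
	}
	return nil
}

func main() {
	fmt.Println(verifyMappingCovers([]string{"restricted-v2", "privileged"})) // <nil>
}
```

Using the `podsecurityadmission` package directly, as the Test Plan suggests, would make such a guard unnecessary, since the evaluation could not drift from upstream in the first place.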
## Upgrade / Downgrade Strategy

@@ -385,20 +390,23 @@ See the earlier references, including the [PSA Label Syncer](#psa-label-syncer)

## New Installation

-Because a fresh install should not have any violating Namespaces, the cluster would move to `status.enforcementMode = LabelEnforcement` or `status.enforcementMode = FullEnforcement`, if `spec.mode = Privileged` is not set.
-As described earlier, the default for new installs is `Conditional`, prompting users to adopt `Restricted`.
+The default for new installs is `Conditional`, to prompt administrators toward adopting `Restricted`.
+
+A fresh install should not have any violating Namespaces.
+Therefore, provided `spec.targetMode` is not set to `Privileged`, the cluster would move to `status.enforcementMode = LabelEnforcement` or `status.enforcementMode = FullEnforcement`.
An administrator can also configure the cluster to start in `Privileged` if desired.

## Operational Aspects

-- If a cluster is set to `Conditional` and has violations initially, those violations might be resolved one-by-one.
- Once all violations are resolved, the cluster may suddenly switch to `Restricted`, which some admins may prefer to manage manually.
+- If a cluster is set to `Conditional` and has initial violations, those may be resolved one by one.
+ Once all violations are resolved, the cluster may immediately transition to `Restricted`.
+ Some administrators may prefer managing this switch manually.
- After a cluster switches to a stricter `status`, no violating workloads should be possible.
- If a violating workload appears, there is no automatic fallback to a more privileged state, preventing unnecessary kube-apiserver restarts.
-If users identify issues in a cluster that has already switched to a stricter enforcement than `Privileged`, they can proactively set `spec.mode = Privileged` to halt enforcement on other clusters.
-- ClusterAdmins must ensure the proper `securityContext` is set for directly created workloads (user-based SCCs).
- Modifying default workload templates may simplify this process. -- To determine specific problems in a violating Namespace, admins can query the kube-apiserver: + If a violating workload appears, there is no automatic fallback to a more privileged state, thus avoiding additional kube-apiserver restarts. +- Administrators facing issues in a cluster already set to a stricter enforcement can change `spec.targetMode` to `Privileged` to halt enforcement for other clusters. +- ClusterAdmins must ensure that directly created workloads (user-based SCCs) have correct `securityContext` settings. + Updating default workload templates can help. +- To identify specific problems in a violating Namespace, administrators can query the kube-apiserver: ```bash kubectl label --dry-run=server --overwrite $NAMESPACE --all \