@fedepaol
Member

  • Adds a status cleaner job that acts as the webhook backend and removes frrstatus instances for nodes where an frrk8s instance is no longer running
  • Adds readOnlyRootFilesystem=true and allowPrivilegeEscalation=false to the frrk8s controller container

@fedepaol
Member Author

/testwith openshift/frr#107

@coderabbitai

coderabbitai bot commented Nov 10, 2025

Walkthrough

Added two emptyDir volumes and mounts to the FRR DaemonSet and hardened the frr and controller containers. Introduced a new Deployment frr-k8s-statuscleaner (runs /statuscleaner, TLS from secret, probes, hostNetwork). Removed the webhook Deployment and repointed the webhook Service to the statuscleaner.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| FRR DaemonSet: volumes & security (bindata/network/frr-k8s/frr-k8s.yaml) | Added two emptyDir volumes (frr-lib, frr-tmp) and mounted them into the frr and controller containers at /var/lib/frr and /var/tmp/frr. Set allowPrivilegeEscalation: false and readOnlyRootFilesystem: true on both containers. |
| New status cleaner Deployment (bindata/network/frr-k8s/node-status-cleaner.yaml) | Added Deployment frr-k8s-statuscleaner in openshift-frr-k8s running /statuscleaner with args including --disable-cert-rotation=true, --namespace (from metadata via env), --webhook-port=9123, and --frrk8s-selector=component=frr-k8s. Uses image {{.FRRK8sImage}}, sets NAMESPACE via downward API, mounts secret frr-k8s-webhook-server-cert at /tmp/k8s-webhook-server/serving-certs, exposes port 9123, includes HTTPS liveness/readiness probes at /healthz, sets hostNetwork: true, adds tolerations for control-plane/master, priorityClassName: system-cluster-critical, serviceAccountName: frr-k8s-daemon, and terminationGracePeriodSeconds: 10. |
| Webhook service & Deployment removal (bindata/network/frr-k8s/webhook.yaml) | Removed apps/v1 Deployment frr-k8s-webhook-server. Updated Service frr-k8s-webhook-service selector from component: frr-k8s-webhook-server to component: frr-k8s-statuscleaner. |
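
The status cleaner Deployment summarized above can be pictured roughly as follows. This is a minimal sketch reconstructed from the walkthrough only, not the contents of node-status-cleaner.yaml: the container name is an assumption, and the `$(LOG_LEVEL)` argument, probes, volumes, and tolerations are left out for brevity.

```yaml
# Sketch only: values come from the walkthrough summary unless marked "assumed".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frr-k8s-statuscleaner
  namespace: openshift-frr-k8s
spec:
  selector:
    matchLabels:
      component: frr-k8s-statuscleaner
  template:
    metadata:
      labels:
        component: frr-k8s-statuscleaner
    spec:
      serviceAccountName: frr-k8s-daemon
      priorityClassName: system-cluster-critical
      hostNetwork: true
      terminationGracePeriodSeconds: 10
      containers:
      - name: frr-k8s-statuscleaner      # assumed container name
        # Go template placeholder; the manifest is rendered before it is parsed as YAML.
        image: {{.FRRK8sImage}}
        command:
        - /statuscleaner
        args:
        - --disable-cert-rotation=true
        - --namespace=$(NAMESPACE)
        - --webhook-port=9123
        - --frrk8s-selector=component=frr-k8s
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace   # downward API
        ports:
        - containerPort: 9123
```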

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify the Service selector update and removed Deployment do not leave dangling references (RBAC, ServiceAccount, ClusterRoleBindings).
  • Confirm secret mount path and HTTPS liveness/readiness probe settings (port 9123, cert files) match the statuscleaner expectations.
  • Ensure FRR and controller containers operate correctly with readOnlyRootFilesystem: true and that the new emptyDir mounts provide required writable paths.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 62f6e7f and fdff55f.

📒 Files selected for processing (3)
  • bindata/network/frr-k8s/frr-k8s.yaml (3 hunks)
  • bindata/network/frr-k8s/node-status-cleaner.yaml (1 hunks)
  • bindata/network/frr-k8s/webhook.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bindata/network/frr-k8s/frr-k8s.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • bindata/network/frr-k8s/webhook.yaml
  • bindata/network/frr-k8s/node-status-cleaner.yaml
🔇 Additional comments (1)
bindata/network/frr-k8s/webhook.yaml (1)

14-14: Service selector properly aligned to new statuscleaner component.

The selector update routes the webhook service to the new frr-k8s-statuscleaner Deployment (which handles webhook interception), replacing the removed frr-k8s-webhook-server. The webhook configuration and cert secret remain consistent.
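
A hedged sketch of the Service after the selector change is shown below. Only the Service name, the selector values, and port 9123 come from this thread; the namespace and the service-side port are assumptions made for illustration.

```yaml
# Sketch only: not the actual contents of webhook.yaml.
apiVersion: v1
kind: Service
metadata:
  name: frr-k8s-webhook-service
  namespace: openshift-frr-k8s          # assumed to match the Deployment's namespace
spec:
  selector:
    component: frr-k8s-statuscleaner    # previously component: frr-k8s-webhook-server
  ports:
  - port: 443                           # assumed service port
    targetPort: 9123                    # webhook port passed via --webhook-port
```

Because a Service selects pods purely by label, the routing only works if the statuscleaner pod template carries the same `component: frr-k8s-statuscleaner` label.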


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci bot commented Nov 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fedepaol

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 10, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 05d6f46 and 292eb0c.

📒 Files selected for processing (3)
  • bindata/network/frr-k8s/frr-k8s.yaml (3 hunks)
  • bindata/network/frr-k8s/node-status-cleaner.yaml (1 hunks)
  • bindata/network/frr-k8s/webhook.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • bindata/network/frr-k8s/node-status-cleaner.yaml
  • bindata/network/frr-k8s/webhook.yaml
  • bindata/network/frr-k8s/frr-k8s.yaml
🪛 YAMLlint (1.37.1)
bindata/network/frr-k8s/frr-k8s.yaml

[error] 179-179: syntax error: expected <block end>, but found '-'

(syntax)

🔇 Additional comments (4)
bindata/network/frr-k8s/frr-k8s.yaml (2)

162-163: Verify readOnlyRootFilesystem=true compatibility with FRR runtime behavior.

The frr container now has readOnlyRootFilesystem: true added alongside the new emptyDir volumes for /var/lib/frr and /var/tmp/frr. Confirm that FRR daemon can successfully initialize and write all necessary state/logs to these temporary volumes and doesn't require write access to other root filesystem paths.

If the frr daemon requires write access to paths outside the mounted emptyDirs (e.g., /etc/frr), the readOnlyRootFilesystem setting may need adjustment or additional volume mounts.
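
The hardening pattern under discussion, a read-only root filesystem combined with emptyDir mounts for the paths that must stay writable, might look like the fragment below. It is a sketch under stated assumptions: the pairing of each volume name with a mount path is inferred from the order in the walkthrough, and only the frr container is shown.

```yaml
# Sketch only: fragment of a DaemonSet pod template, not the actual frr-k8s.yaml.
spec:
  template:
    spec:
      containers:
      - name: frr
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
        volumeMounts:
        - name: frr-lib
          mountPath: /var/lib/frr    # assumed pairing of volume and path
        - name: frr-tmp
          mountPath: /var/tmp/frr    # assumed pairing of volume and path
      volumes:
      - name: frr-lib
        emptyDir: {}
      - name: frr-tmp
        emptyDir: {}
```

With `readOnlyRootFilesystem: true`, any write outside a mounted volume fails, so each additional path the daemon writes to needs its own writable mount.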


48-51: Security hardening additions approved for frr container.

The addition of allowPrivilegeEscalation: false and readOnlyRootFilesystem: true, combined with emptyDir volumes for scratch space at /var/lib/frr and /var/tmp/frr, strengthens the security posture of the FRR daemon container. This aligns well with the PR's security hardening objectives.

Also applies to: 162-163

bindata/network/frr-k8s/webhook.yaml (1)

14-14: Service selector correctly routes to status cleaner deployment.

The selector change from frr-k8s-webhook-server to frr-k8s-statuscleaner aligns with the architectural refactoring to consolidate webhook serving into the status cleaner component. Ensure that the status cleaner deployment in node-status-cleaner.yaml includes the label component: frr-k8s-statuscleaner in its pod template to match this selector.

bindata/network/frr-k8s/node-status-cleaner.yaml (1)

1-82: Webhook configuration and security posture review.

The status cleaner deployment is appropriately configured for webhook serving:

  • HTTPS probes at /healthz with proper initial delays and timeouts (lines 48-62)
  • TLS certificate mounting from secret at /tmp/k8s-webhook-server/serving-certs (lines 75-78)
  • --disable-cert-rotation=true flag (line 29) is appropriate since certificate lifecycle is managed externally via the secret
  • Tolerations for control-plane nodes (lines 68-73) enable webhook availability on cluster-critical nodes
  • privileged SCC annotation (line 21) matches the frr-k8s DaemonSet requirement

Resource requests are minimal but appropriate for a webhook component (10m CPU, 50Mi memory).
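
The items in this review can be sketched as a single pod template fragment. Probe timings are omitted because the thread does not state them, and the toleration keys, the probe port, the container name, and the volume name are assumptions rather than values confirmed by the PR.

```yaml
# Sketch only: fragment of the statuscleaner Deployment's pod template.
spec:
  template:
    metadata:
      annotations:
        openshift.io/required-scc: privileged
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane   # assumed key
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master          # assumed key
        operator: Exists
        effect: NoSchedule
      containers:
      - name: frr-k8s-statuscleaner                  # assumed container name
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9123                               # assumed to be the webhook port
            scheme: HTTPS
        readinessProbe:
          httpGet:
            path: /healthz
            port: 9123
            scheme: HTTPS
        volumeMounts:
        - name: cert                                 # assumed volume name
          mountPath: /tmp/k8s-webhook-server/serving-certs
          readOnly: true
      volumes:
      - name: cert
        secret:
          secretName: frr-k8s-webhook-server-cert
```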

@fedepaol fedepaol force-pushed the alignfrrk8supstream branch 4 times, most recently from 5872e43 to 62f6e7f, on November 14, 2025 09:38
@fedepaol
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5872e43 and 62f6e7f.

📒 Files selected for processing (3)
  • bindata/network/frr-k8s/frr-k8s.yaml (3 hunks)
  • bindata/network/frr-k8s/node-status-cleaner.yaml (1 hunks)
  • bindata/network/frr-k8s/webhook.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • bindata/network/frr-k8s/webhook.yaml
  • bindata/network/frr-k8s/frr-k8s.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • bindata/network/frr-k8s/node-status-cleaner.yaml
🔇 Additional comments (3)
bindata/network/frr-k8s/node-status-cleaner.yaml (3)

53-68: Probe configuration is sound.

Liveness and readiness probes are properly configured with HTTPS and reasonable timing. ✓


12-18: Selector and template labels are now aligned.

The Deployment selector (line 14) and pod template labels (line 18) both correctly use component: frr-k8s-statuscleaner. This resolves the label mismatch flagged in the prior review.


73-88: Security configuration appropriate for privileged workload.

The pod uses hostNetwork: true with openshift.io/required-scc: privileged, which is necessary for network webhook functionality. The certificate volume is mounted read-only (line 72), and the service account is appropriately scoped. The security settings align with the workload's networking requirements.

Comment on lines +26 to +31
args:
- $(LOG_LEVEL)
- --disable-cert-rotation=true
- --namespace=$(NAMESPACE)
- --webhook-port=9123
- --frrk8s-selector=component=frr-k8s

⚠️ Potential issue | 🔴 Critical

CRITICAL: Environment variables won't expand in args array.

Kubernetes does not perform shell-style variable expansion on args arrays. Lines 27 and 29 use $(LOG_LEVEL) and $(NAMESPACE) syntax, which will be passed literally to the container as the strings "$(LOG_LEVEL)" and "--namespace=$(NAMESPACE)", not their resolved values. This will cause the application to receive malformed arguments and fail.

To fix this, use a shell wrapper to enable variable expansion:

      containers:
      - command:
-       - /statuscleaner
+       - sh
+       - -c
+       - /statuscleaner $(LOG_LEVEL) --disable-cert-rotation=true --namespace=$(NAMESPACE) --webhook-port=9123 --frrk8s-selector=component=frr-k8s
-       args:
-       - $(LOG_LEVEL)
-       - --disable-cert-rotation=true
-       - --namespace=$(NAMESPACE)
-       - --webhook-port=9123
-       - --frrk8s-selector=component=frr-k8s

Alternatively, if the application supports reading from environment variables directly, refactor it to read LOG_LEVEL and NAMESPACE from the environment instead of command-line arguments.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In bindata/network/frr-k8s/node-status-cleaner.yaml around lines 26 to 31 the
args array uses $(LOG_LEVEL) and $(NAMESPACE) which Kubernetes does not expand,
so those tokens will be passed literally; fix by either (A) switching to a shell
wrapper command that performs expansion (e.g. set command to run /bin/sh -c and
build the full CLI string using $LOG_LEVEL and $NAMESPACE so the shell expands
them at container start — ensure the image contains a shell), or (B) remove
those args and let the app read LOG_LEVEL and NAMESPACE from environment
variables (export them in env: and update the app startup code to read process
env instead); pick one approach and update the manifest accordingly.
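
For reference, the Kubernetes Container API describes `$(VAR_NAME)` references in `command` and `args` as being expanded from the container's environment by Kubernetes itself, with no shell involved, and a reference that cannot be resolved is left unchanged. On that reading, the original args form may already work as long as both variables are defined in `env`. A minimal sketch follows; the container name and the LOG_LEVEL value are hypothetical, since this thread does not show them.

```yaml
# Sketch only: fragment of a container spec illustrating $(VAR_NAME) expansion.
containers:
- name: frr-k8s-statuscleaner        # hypothetical container name
  command:
  - /statuscleaner
  args:
  - $(LOG_LEVEL)                     # replaced with the LOG_LEVEL value defined below
  - --namespace=$(NAMESPACE)         # replaced with the pod's namespace
  env:
  - name: LOG_LEVEL
    value: "--log-level=debug"       # hypothetical value, not taken from the PR
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
```

If either variable were missing from `env`, the literal text `$(LOG_LEVEL)` or `$(NAMESPACE)` would reach the binary, which is the failure mode the comment above describes.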

@fedepaol
Member Author

/testwith openshift/cluster-network-operator/master/frrk8s-e2e openshift/frr#107

@fedepaol
Member Author

/testwith openshift/cluster-network-operator/master/e2e-metal-ipi-ovn-dualstack-bgp openshift/frr#107

@fedepaol
Member Author

/testwith openshift/cluster-network-operator/master/e2e-metal-ipi-ovn-dualstack-bgp openshift/frr#107 openshift/release#71624

@fedepaol
Member Author

/testwith openshift/cluster-network-operator/master/e2e-metal-ipi-ovn-dualstack-bgp openshift/frr#107

@fedepaol
Member Author

/testwith openshift/cluster-network-operator/master/frrk8s-e2e openshift/frr#107

@fedepaol
Member Author

/testwith openshift/cluster-network-operator/master/frrk8s-e2e openshift/frr#107

@fedepaol
Member Author

/testwith openshift/cluster-network-operator/master/e2e-metal-ipi-ovn-dualstack-bgp openshift/frr#107

@fedepaol fedepaol changed the title from "Align frrk8s to upstream" to "OCPBUGS-56173: Align frrk8s to upstream" on Nov 28, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 28, 2025
@openshift-ci-robot
Contributor

@fedepaol: This pull request references Jira Issue OCPBUGS-56173, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

  • Adds a status cleaner job that acts as the webhook backend and removes frrstatus instances for nodes where an frrk8s instance is no longer running
  • Adds readOnlyRootFilesystem=true and allowPrivilegeEscalation=false to the frrk8s controller container

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@fedepaol
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 28, 2025
@fedepaol
Member Author

/jira refresh

@openshift-ci-robot
Contributor

@fedepaol: This pull request references Jira Issue OCPBUGS-56173, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@fedepaol
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Nov 28, 2025
@openshift-ci-robot
Contributor

@fedepaol: This pull request references Jira Issue OCPBUGS-56173, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @asood-rh

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Nov 28, 2025
@openshift-ci openshift-ci bot requested a review from asood-rh November 28, 2025 09:27
@jcaamano
Contributor

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

override temporarily

@openshift-ci
Contributor

openshift-ci bot commented Nov 28, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp, ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

override temporarily

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@fedepaol
Member Author

fedepaol commented Dec 1, 2025

/retest

@jcaamano
Contributor

jcaamano commented Dec 1, 2025

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw
/test e2e-metal-ipi-ovn-dualstack-bgp

The status cleaner pod is in charge of removing stale frrnodestates
resources from those nodes where frrk8s is not running anymore.

At the same time, it acts as the backend for the validation webhook, so
the configuration of the webhook now refers to it.

Signed-off-by: Federico Paolinelli <[email protected]>
readOnlyRootFilesystem prevents containers from writing to the root filesystem,
reducing attack surface and improving security posture by limiting potential
malicious file modifications and ensuring immutable container runtime.

allowPrivilegeEscalation=false prevents containers from gaining additional
privileges beyond those initially granted, further hardening the security
posture by blocking privilege escalation attacks.

Signed-off-by: Federico Paolinelli <[email protected]>
@fedepaol fedepaol force-pushed the alignfrrk8supstream branch from 62f6e7f to fdff55f on December 1, 2025 10:58
@jcaamano
Contributor

jcaamano commented Dec 1, 2025

/override ci/prow/security

@openshift-ci
Contributor

openshift-ci bot commented Dec 1, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/security

In response to this:

/override ci/prow/security

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jcaamano
Contributor

jcaamano commented Dec 1, 2025

/retest

@openshift-ci
Contributor

openshift-ci bot commented Dec 1, 2025

@fedepaol: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-windows | fdff55f | link | true | /test e2e-aws-ovn-windows |
| ci/prow/e2e-ovn-ipsec-step-registry | fdff55f | link | true | /test e2e-ovn-ipsec-step-registry |
| ci/prow/frrk8s-e2e | fdff55f | link | false | /test frrk8s-e2e |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw | fdff55f | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

3 participants