danwinship
Contributor

We keep getting reports of iptables-alerter causing CPU usage alerts. We debugged this at one point and found that crictl was suddenly using a ton of CPU and RAM for no apparent reason. We weren't able to reproduce it beyond that, and it's not really worth spending a lot of time trying to fix, since crictl is not in the critical path of any normal pods anyway. I had tried improving things by calling crictl less often (#2404), but we're still getting reports. This change avoids calling crictl at all until we've determined that some pod somewhere on the node is using iptables, which should hopefully be "never".
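
For readers following along, here is a minimal sketch of the short-circuit idea described above. It is not the code from this PR: the function names `list_netns_pids`, `dump_rules`, and `any_netns_uses_iptables` are hypothetical, and it assumes `nsenter` and `iptables-save` are available on the node and that a rule line in `iptables-save` output starts with `-A`. Only the `lsns` pipeline is taken from the diff.

```shell
# Sketch only; helper names are hypothetical and not taken from the PR.

# Print one PID per network namespace, skipping PID 1 (pipeline from the diff).
list_netns_pids() {
    lsns -t net -o pid -nr | sort -u | grep -v '^1$'
}

# Dump iptables rules as seen from the network namespace of PID $1.
# Assumption: nsenter and iptables-save exist on the node.
dump_rules() {
    nsenter -n -t "$1" iptables-save 2>/dev/null
}

# Succeed as soon as any namespace contains at least one rule line ("-A ...").
any_netns_uses_iptables() {
    local pid
    for pid in $(list_netns_pids); do
        if dump_rules "$pid" | grep -q '^-A'; then
            return 0
        fi
    done
    return 1
}

# Intended use: only reach for crictl when the cheap pre-scan finds something.
#   if any_netns_uses_iptables; then
#       ... map namespaces back to pods (the expensive part) ...
#   fi
```

The point of the design is that the common case (no pod uses iptables) costs only one `lsns` call plus one cheap rule dump per namespace, and the expensive pod lookup is skipped entirely.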

@openshift-ci-robot added the jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/valid-bug (indicates that a referenced Jira bug is valid for the branch this PR is targeting) labels on Sep 17, 2025
@openshift-ci-robot
Contributor

@danwinship: This pull request references Jira Issue OCPBUGS-61215, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Sep 17, 2025
@danwinship
Contributor Author

/retest-required

@danwinship force-pushed the iptables-alerter-short-circuit branch from faace85 to 1bf470b on September 19, 2025 17:59
@danwinship
Contributor Author

/verified by @danwinship

no e2e test, tested by hand

@danwinship
Contributor Author

/retest-required

@openshift-ci-robot added the verified label (signifies that the PR passed pre-merge verification criteria) on Sep 19, 2025
@openshift-ci-robot
Contributor

@danwinship: This PR has been marked as verified by @danwinship.

@martinkennelly
Contributor

/lgtm

# have any iptables-using pods anyway, do a pre-scan of all (non-hostnetwork)
# namespaces without using crictl, and bail out early if we don't find anything
iptables_output=""
for netns_pid in $(lsns -t net -o pid -nr | sort -u | grep -v '^1$'); do

Note for others: '^1$' excludes PID 1 :)
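
To illustrate the point about the anchored pattern, here is a small sketch with made-up PIDs. The helper name `filter_host_pid` is hypothetical; it only wraps the `sort -u | grep -v '^1$'` part of the line under review. Because of the `^` and `$` anchors, only a line that is exactly `1` is dropped, while PIDs such as 10, 11, or 21 are kept. An unanchored `grep -v 1` would throw those away too.

```shell
# Hypothetical helper wrapping the filter from the reviewed line.
# Reads PIDs on stdin, de-duplicates them, and drops only the exact line "1".
filter_host_pid() {
    sort -u | grep -v '^1$'
}

# Made-up input: PID 1 appears twice, plus PIDs that merely contain a "1".
printf '%s\n' 1 10 11 21 1 | filter_host_pid
```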

@openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Sep 30, 2025
Contributor

openshift-ci bot commented Sep 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, martinkennelly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 6f6d0ee and 2 for PR HEAD 1bf470b in total

@martinkennelly
Contributor

/test e2e-aws-ovn-serial-2of2

Unrelated disruption

@martinkennelly
Contributor

/test e2e-aws-ovn-upgrade

Unrelated - job reached timeout limit.

Contributor

openshift-ci bot commented Oct 2, 2025

@danwinship: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-upgrade | 1bf470b | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-upgrade |
| ci/prow/e2e-vsphere-ovn | 1bf470b | link | false | /test e2e-vsphere-ovn |
| ci/prow/e2e-aws-hypershift-ovn-kubevirt | 1bf470b | link | false | /test e2e-aws-hypershift-ovn-kubevirt |
| ci/prow/e2e-vsphere-ovn-dualstack | 1bf470b | link | false | /test e2e-vsphere-ovn-dualstack |
| ci/prow/e2e-network-mtu-migration-ovn-ipv4 | 1bf470b | link | false | /test e2e-network-mtu-migration-ovn-ipv4 |
| ci/prow/okd-scos-e2e-aws-ovn | 1bf470b | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade | 1bf470b | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade |
| ci/prow/e2e-openstack-ovn | 1bf470b | link | false | /test e2e-openstack-ovn |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade | 1bf470b | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade |
| ci/prow/security | 1bf470b | link | false | /test security |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@danwinship
Contributor Author

/override ci/prow/e2e-ovn-ipsec-step-registry
/override ci/prow/e2e-aws-ovn-serial-2of2

unrelated, very failure-prone jobs

Contributor

openshift-ci bot commented Oct 2, 2025

@danwinship: Overrode contexts on behalf of danwinship: ci/prow/e2e-aws-ovn-serial-2of2, ci/prow/e2e-ovn-ipsec-step-registry

@openshift-merge-bot bot merged commit 339bfc9 into openshift:master on Oct 2, 2025
32 of 42 checks passed
@openshift-ci-robot
Contributor

@danwinship: Jira Issue Verification Checks: Jira Issue OCPBUGS-61215
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-61215 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

@danwinship deleted the iptables-alerter-short-circuit branch on October 2, 2025 16:01
@danwinship
Contributor Author

/cherry-pick release-4.20

@openshift-cherrypick-robot

@danwinship: new pull request created: #2811

@openshift-merge-robot
Contributor

Fix included in accepted release 4.21.0-0.nightly-2025-10-02-215712

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
verified: Signifies that the PR passed pre-merge verification criteria.

5 participants