Conversation

martinkennelly
Contributor

because as per Adrian Moreno:

"This alarm is great and we need visibility
into these packet drops. Actually, it's already
surfacing some customer issues that would
otherwise stay undetected.
The mild problem, however, is the naming.
Technically, there are many possible reasons
for the `ovs_vswitchd_dp_flows_lookup_lost` metric
to increase, not just an overflow in the netlink
socket (as the name of the alarm suggests).
In fact, I have written a KB article listing some
of them: https://access.redhat.com/articles/7115263.
I'm opening this bug for us to consider renaming it
as something more accurate (and less scary),
e.g: OVNKubernetesNodeOVSDpLostPacket."

The current alert name is misleading and may suggest a bug
when, in reality, we simply ran out of space to
process new flows and therefore dropped packets.

Signed-off-by: Martin Kennelly <[email protected]>
@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) and the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) on Aug 19, 2025.
@openshift-ci-robot
Contributor

@martinkennelly: This pull request references Jira Issue OCPBUGS-54766, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@martinkennelly
Contributor Author

/jira refresh

@openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) on Aug 19, 2025.
@openshift-ci-robot
Contributor

@martinkennelly: This pull request references Jira Issue OCPBUGS-54766, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

@martinkennelly
Contributor Author

/retest

@martinkennelly
Contributor Author

Simple change for you, @kyrtapz, to close a normal-priority bug we have.

@martinkennelly
Contributor Author

@ahardin-rh do we have docs that reference this alert? They may need updating.

We also have to search for any KCS articles on this alert and update them. I'll do this.

@martinkennelly
Contributor Author

/retest

1 similar comment
@martinkennelly
Contributor Author

/retest

Contributor

openshift-ci bot commented Aug 23, 2025

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-serial | 85915e8 | link | false | /test e2e-aws-ovn-serial |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade | 85915e8 | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade | 85915e8 | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade |
| ci/prow/security | 85915e8 | link | false | /test security |
| ci/prow/e2e-aws-hypershift-ovn-kubevirt | 85915e8 | link | false | /test e2e-aws-hypershift-ovn-kubevirt |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@martinkennelly
Contributor Author

Asking Adrian for +1

@martinkennelly
Contributor Author

Pinged Adrian, but he's on PTO. Waiting.

@amorenoz

amorenoz commented Sep 1, 2025

New name looks good to me, thanks.

@martinkennelly
Contributor Author

/assign @kyrtapz

@kyrtapz
Contributor

kyrtapz commented Sep 1, 2025

/lgtm

@openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Sep 1, 2025.
Contributor

openshift-ci bot commented Sep 1, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kyrtapz, martinkennelly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Sep 1, 2025.
@openshift-bot
Contributor

/jira refresh

The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity.

@openshift-ci-robot added the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) and removed the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) on Sep 2, 2025.
@openshift-ci-robot
Contributor

@openshift-bot: This pull request references Jira Issue OCPBUGS-54766, which is invalid:

  • expected the bug to target either version "4.21." or "openshift-4.21.", but it targets "4.20.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@martinkennelly
Contributor Author

@kyrtapz can you override the BGP job? It's unrelated. Thanks.

@martinkennelly
Contributor Author

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

unrelated

Contributor

openshift-ci bot commented Sep 11, 2025

@martinkennelly: martinkennelly unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers openshift-sustaining-engineers.

@martinkennelly
Contributor Author

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

Looks like it's passing again.

@martinkennelly
Contributor Author

/tide refresh

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.

5 participants