Making FDP IPv4 QE jobs default #69292

anuragthehatter · 2025-09-16T02:59:09Z

Make the FDP IPv4 QE jobs run by default and raise the test timeout to 60 min (agreed with the OCP QE infra maintainers) to accommodate the IPsec enable/disable/test use case, which takes ~50 min and is currently causing interruptions in recent FDP workflows.

#68237 got messed up by a bulk merge and force pushes, sorry. Opened this new, clean PR instead.

cc @martinkennelly @jluhrsen

anuragthehatter · 2025-09-16T02:59:29Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-09-16T02:59:31Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-09-16T02:59:37Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.19-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-09-16T02:59:39Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-09-16T02:59:53Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.20-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-09-16T02:59:56Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

deepsm007 · 2025-09-16T18:08:39Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.20-e2e-aws-ovn-fdp-qe pull-ci-openshift-ovn-kubernetes-release-4.19-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-09-16T18:08:42Z

@deepsm007: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-09-16T18:12:55Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.19-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-09-16T18:12:57Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-09-16T18:13:04Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.20-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-09-16T18:13:08Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

jluhrsen · 2025-09-19T16:27:39Z

/lgtm

asood-rh · 2025-09-22T14:07:22Z

@anuragthehatter Do we need @martinkennelly's approval for this PR to merge? There is not much code change apart from the flags and the timeout.

martinkennelly · 2025-09-22T14:11:33Z

@anuragthehatter how many times have you executed this job and what's the pass rate? Thanks

anuragthehatter · 2025-09-22T14:23:28Z

> @anuragthehatter how many times have you executed this job and what's the pass rate? Thanks

Executed 3 times in the rehearsals above, on different releases. It's 100% on all 3 of those jobs.
The historical pass rate for this trigger is almost 100%; I would say 9/10 based on our past QE triggers. Hence we want to make it default, as discussed in past team meetings; it helps catch issues early in FDP merges and default OVN dev PRs.

martinkennelly · 2025-09-22T15:41:46Z

/lgtm

martinkennelly · 2025-09-22T15:42:59Z

> > @anuragthehatter how many times have you executed this job and what's the pass rate? Thanks
>
> Executed 3 times in the rehearsals above, on different releases. It's 100% on all 3 of those jobs. The historical pass rate for this trigger is almost 100%; I would say 9/10 based on our past QE triggers. Hence we want to make it default, as discussed in past team meetings; it helps catch issues early in FDP merges and default OVN dev PRs.

I think there's a command to run it many times to look for flakes. The reason I ask is that I saw it fail recently on a d/s merge, and it was a flake. I trust your opinion here, and if we are wrong, we can revert this PR.

anuragthehatter · 2025-09-22T15:52:05Z

@martinkennelly Yes. That d/s merge hit the IPsec use case timeout issue: the use case usually takes 45-52 minutes across various platforms, so this PR sets TEST_TIMEOUT: "60", which has been tested and accommodates it.

asood-rh · 2025-10-17T16:24:43Z

On 4.21, a node went missing. I think the AWS cluster could be using spot instances.

SDN: OCP-40908:SDN Allow from hostnetwork policy to allow traffic from hostnetwork pods on LoadBalancerService endpoint strategy cluster 10s
{failed OCP-40908:SDN Allow from hostnetwork policy to allow traffic from hostnetwork pods on LoadBalancerService endpoint strategy cluster failed 
        Scenario: OCP-40908:SDN Allow from hostnetwork policy to allow traffic from hostnetwork pods on LoadBalancerService endpoint strategy cluster

Given I store the ready and schedulable nodes in the :nodes clipboard

Message:

        nodes 'ip-10-0-25-232.us-east-2.compute.internal' not found (BushSlicer::ResourceNotFoundError)
./lib/openshift/resource.rb:66:in `get_checked'
./lib/openshift/resource.rb:130:in `get_cached_prop'
./lib/openshift/node.rb:110:in `ready?'
./features/step_definitions/node.rb:76:in `block (2 levels) in '
./features/step_definitions/node.rb:76:in `select'
./features/step_definitions/node.rb:76:in `/^I store the( schedulable| ready and schedulable)?( windows)? (node|master|worker)s in the(?: :(\S+))? clipboard(?: excluding "(.+?)")?$/'
features/tierN/networking/network-policy.feature:2578:in `I store the ready and schedulable nodes in the :nodes clipboard'

anuragthehatter · 2025-10-17T17:23:21Z

> On 4.21, a node went missing. I think the AWS cluster could be using spot instances.
>
> SDN: OCP-40908:SDN Allow from hostnetwork policy to allow traffic from hostnetwork pods on LoadBalancerService endpoint strategy cluster 10s
> {failed OCP-40908:SDN Allow from hostnetwork policy to allow traffic from hostnetwork pods on LoadBalancerService endpoint strategy cluster failed
> Scenario: OCP-40908:SDN Allow from hostnetwork policy to allow traffic from hostnetwork pods on LoadBalancerService endpoint strategy cluster
>
> Given I store the ready and schedulable nodes in the :nodes clipboard
>
> Message:
>
> nodes 'ip-10-0-25-232.us-east-2.compute.internal' not found (BushSlicer::ResourceNotFoundError)
> ./lib/openshift/resource.rb:66:in `get_checked'
> ./lib/openshift/resource.rb:130:in `get_cached_prop'
> ./lib/openshift/node.rb:110:in `ready?'
> ./features/step_definitions/node.rb:76:in `block (2 levels) in '
> ./features/step_definitions/node.rb:76:in `select'
> ./features/step_definitions/node.rb:76:in `/^I store the( schedulable| ready and schedulable)?( windows)? (node|master|worker)s in the(?: :(\S+))? clipboard(?: excluding "(.+?)")?$/'
> features/tierN/networking/network-policy.feature:2578:in `I store the ready and schedulable nodes in the :nodes clipboard'

Hmm, the hostnetwork pod use cases are also failing due to the OTP migration; those 3 failed cases need compat_otp.SetNamespacePrivileged(oc, ns).

martinkennelly · 2025-10-17T18:34:30Z

I'll be away for a week, but can you folks handle any fixes and retry here? Since I found issues on the first run, you may have to run this job many times to shake out flakes. Unfortunately, there seems to be no aggregate for rehearsals... I don't know why; it's needed. I looked at the docs for rehearsal commands and didn't find any aggregate :/ @jluhrsen do you know any good command to run the jobs at each release (main-3) 10 times?

anuragthehatter · 2025-10-17T20:19:45Z

> I'll be away for a week, but can you folks handle any fixes and retry here? Since I found issues on the first run, you may have to run this job many times to shake out flakes. Unfortunately, there seems to be no aggregate for rehearsals... I don't know why; it's needed. I looked at the docs for rehearsal commands and didn't find any aggregate :/ @jluhrsen do you know any good command to run the jobs at each release (main-3) 10 times?

Sure @martinkennelly, yep. I figured out that our OTE migration has again impacted a few more cases. I will fix them; the PR is https://github.com/openshift/openshift-tests-private/pull/27809. Once that merges, we can rehearse again.

A few points to note:

- We can ignore the 4.22 errors, as that branch is not yet ready on the QE side with the right configs or agents. QE infra will fix that when we get closer to 4.22.
- For 4.21, I have seen a few failures due to the OTE migration in QE; there is a test PR for it: https://github.com/openshift/openshift-tests-private/pull/27809
- For 4.17, cluster creation failed at the payload image step, so that is not a worry.
- All the remaining versions passed.

anuragthehatter · 2025-10-22T17:52:55Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.21-e2e-aws-ovn-fdp-qe
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.17-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-10-22T17:52:57Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci-robot · 2025-10-22T17:57:00Z

@anuragthehatter: requesting more than one rehearsal in one comment is not supported. If you would like to rehearse multiple specific jobs, please separate the job names by a space in a single command.
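
For illustration, the single-command form the bot asks for can be sketched in Python. This is a minimal sketch, not part of the PR: the helper name `rehearse_command` and its parameters are hypothetical, and the job-name pattern is only inferred from the job names used in this thread.

```python
def rehearse_command(branches, repo="openshift-ovn-kubernetes", test="e2e-aws-ovn-fdp-qe"):
    """Build one /pj-rehearse comment that covers several branches.

    Hypothetical helper: the name pattern pull-ci-<repo>-<branch>-<test>
    is an assumption inferred from the job names quoted in this thread.
    """
    if not branches:
        raise ValueError("at least one branch is required")
    jobs = [f"pull-ci-{repo}-{branch}-{test}" for branch in branches]
    # The bot expects a single command with job names separated by spaces.
    return "/pj-rehearse " + " ".join(jobs)


command = rehearse_command(["release-4.21", "release-4.17"])
print(command)
```

Posting the printed line as one comment requests both rehearsals at once, instead of two `/pj-rehearse` lines in the same comment.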

anuragthehatter · 2025-10-22T18:58:02Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.17-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-10-22T18:58:05Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-10-23T03:31:51Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.17-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-10-23T03:31:54Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-10-23T12:48:21Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.21-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-10-23T12:48:25Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-10-24T15:03:09Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-10-24T15:03:13Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-10-24T17:47:03Z

/test generated-config

anuragthehatter · 2025-10-24T20:26:43Z

/pj-rehearse more

openshift-ci-robot · 2025-10-24T20:26:47Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-10-29T00:43:37Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-10-29T00:43:40Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-10-30T01:33:02Z

/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.21-e2e-aws-ovn-fdp-qe

openshift-ci-robot · 2025-10-30T01:33:04Z

@anuragthehatter: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

anuragthehatter · 2025-11-02T16:53:32Z

@martinkennelly Based on various changes and commits over the last several weeks, I am confident that we have achieved almost 100% stability across releases. Unfortunately, we had OTE library migrations that caused flakes, and an upstream migration will also happen in the near future, which might cause flakes again; the ERT team, along with QE, will step in to fix those if needed.

From a stability point of view, we have achieved consistent stability as tested here. Env install failures will always be out of our control :)

Please note: we can ignore the 4.22 runs at the moment. QE sets the Polarion tags and creates the agents needed for a future release once we officially enter that release, so 4.22 doesn't yet have the backend QE infra ready to run the tests properly.

Let me know if you have any comments; otherwise it should be good to merge now. Also, if we see any flake in the future, it will be tracked via Tracker.

Resolved conflicts in ovn-kubernetes config files by accepting PR changes to make FDP IPv4 QE jobs default (always_run: true) with 60 min timeout.

openshift-ci-robot · 2025-11-28T20:21:24Z

[REHEARSALNOTIFIER]
@anuragthehatter: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name	Repo	Type	Reason
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.14-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.15-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.16-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.17-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.18-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.19-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.20-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.21-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-aws-ovn-fdp-qe	openshift/ovn-kubernetes	presubmit	Presubmit changed
pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-aws-ovn	openshift/ovn-kubernetes	presubmit	Ci-operator config changed
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn	openshift/ovn-kubernetes	presubmit	Ci-operator config changed
pull-ci-openshift-ovn-kubernetes-release-4.20-e2e-aws-ovn	openshift/ovn-kubernetes	presubmit	Ci-operator config changed
pull-ci-openshift-ovn-kubernetes-release-4.18-e2e-aws-ovn	openshift/ovn-kubernetes	presubmit	Ci-operator config changed
pull-ci-openshift-ovn-kubernetes-release-4.15-e2e-aws-live-migration-sdn-ovn-rollback	openshift/ovn-kubernetes	presubmit	Ci-operator config changed
pull-ci-openshift-ovn-kubernetes-release-4.21-e2e-aws-ovn	openshift/ovn-kubernetes	presubmit	Ci-operator config changed
pull-ci-openshift-ovn-kubernetes-release-4.19-e2e-aws-ovn	openshift/ovn-kubernetes	presubmit	Ci-operator config changed
pull-ci-openshift-ovn-kubernetes-release-4.16-e2e-aws-ovn-local-gateway	openshift/ovn-kubernetes	presubmit	Ci-operator config changed
pull-ci-openshift-ovn-kubernetes-release-4.17-e2e-aws-ovn	openshift/ovn-kubernetes	presubmit	Ci-operator config changed

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

openshift-ci · 2025-11-28T20:35:43Z

@anuragthehatter: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/rehearse/openshift/ovn-kubernetes/release-4.18/e2e-aws-ovn-fdp-qe	`11f503d`	link	unknown	`/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.18-e2e-aws-ovn-fdp-qe`
ci/rehearse/openshift/ovn-kubernetes/release-4.22/e2e-aws-ovn-fdp-qe	`11f503d`	link	unknown	`/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-aws-ovn-fdp-qe`
ci/prow/generated-config	`135cd26`	link	true	`/test generated-config`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

anuragthehatter · 2025-11-29T13:39:30Z

Opened #71975

making FDP IPv4 QE jobs default with Test Timeout 60

8378b32

openshift-ci bot requested review from jcaamano and kyrtapz September 16, 2025 02:59

anuragthehatter mentioned this pull request Sep 16, 2025

Making FDP QE jobs IPv4 Default #68237

Closed

This was referenced Sep 18, 2025

[DNM] [FDP] [release-4.17] Pre-release testing for OVN 24.03. openshift/ovn-kubernetes#2754

Closed

[DNM] [FDP] [master] Pre-release testing for OVN 25.03. openshift/ovn-kubernetes#2755

Closed

FDP IPv6 QE jobs #69126

Closed

openshift-ci bot assigned jluhrsen Sep 19, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 19, 2025

openshift-ci bot assigned martinkennelly Sep 22, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2025

fixing generated config error

11f503d

openshift-ci-robot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Oct 24, 2025

Merge master into def4 and resolve conflicts

135cd26

Resolved conflicts in ovn-kubernetes config files by accepting PR changes to make FDP IPv4 QE jobs default (always_run: true) with 60 min timeout.

anuragthehatter closed this Nov 29, 2025
