[release-4.19] MCO-1720: Promote MachineConfigNode feature gate to default #2376

isabella-janssen · 2025-06-20T12:42:18Z

This promotes the MachineConfigNode feature gate to Default in the 4.19 branch.

openshift-ci-robot · 2025-06-20T12:42:22Z

@isabella-janssen: This pull request references MCO-1720 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

This promotes the MachineConfigNode feature gate to Default in the 4.19 branch.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-06-20T12:42:29Z

Hello @isabella-janssen! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

openshift-ci · 2025-06-20T12:43:40Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: isabella-janssen
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

isabella-janssen · 2025-06-20T19:30:11Z

/retest-required

JoelSpeed · 2025-06-23T09:35:54Z

@isabella-janssen We don't generally backport feature promotions post GA unless there is a very very good reason to do so, what's the motivation here?

oourfali · 2025-06-23T13:31:15Z

@JoelSpeed Hi! Because it was finalized at the last minute, the agreement was to push it via a 4.19 z-stream.
Motivation-wise we have several aspects:

Appliance - we have a customer interested in getting a support exception for the appliance, which leverages the pinned image set functionality. In order to allow the cluster to be upgraded, we'll need this API GA'ed.
OVE - it consumes the appliance as well.
There are additional use cases being discussed at the moment.

Overall, the API has existed for a while and is mature. I hope the above clarifies things; when we saw that we had missed 4.19, the agreed plan was to backport.

isabella-janssen · 2025-06-23T13:41:02Z

/retest-required

JoelSpeed · 2025-06-24T06:57:55Z

Do we have the associated SBAR for this feature backport that we can link here?

isabella-janssen · 2025-06-24T13:46:20Z

/hold

Holding while we create an SBAR for this feature & PinnedImage sets.

isabella-janssen · 2025-06-24T13:46:27Z

/retest-required

isabella-janssen · 2025-06-25T10:48:25Z

/retest-required

isabella-janssen · 2025-06-27T12:45:13Z

/retest-required

isabella-janssen · 2025-07-01T01:31:12Z

/retest-required

isabella-janssen · 2025-07-01T12:21:06Z

/test verify-feature-promotion

isabella-janssen · 2025-07-01T20:00:14Z

/retest-required

isabella-janssen · 2025-07-02T01:38:35Z

/retest-required

isabella-janssen · 2025-07-02T12:07:21Z

/retest-required

isabella-janssen · 2025-07-02T17:39:54Z

/test verify-feature-promotion

isabella-janssen · 2025-07-03T01:14:11Z

/retest-required

JoelSpeed · 2025-07-03T14:50:56Z

As far as I can tell, some tests have been renamed recently and those haven't timed out yet.

The only test I can see which doesn't pass enough is [Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:PinnedImages][OCPFeatureGate:MachineConfigNodes][Serial] Invalid PIS leads to degraded MCN in a standard Pool [apigroup:machineconfiguration.openshift.io] on ha metal amd64 ipv4.

If we can get a few more runs there to see if it comes above 95% I think we are good to promote this
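To make "a few more runs" concrete, here is a minimal sketch of the arithmetic, assuming the pass rate is simply passes divided by total runs and that every additional run passes. The function name and the sample counts are illustrative and not taken from the CI tooling; only the 95% figure comes from the comment above.

```python
def extra_passes_needed(passes: int, runs: int, threshold_pct: int = 95) -> int:
    """Smallest number of additional consecutive passing runs k such that
    (passes + k) / (runs + k) >= threshold_pct / 100.

    Integer arithmetic is used to avoid floating-point rounding at the boundary.
    """
    if runs < 1 or not 0 <= passes <= runs:
        raise ValueError("need runs >= 1 and 0 <= passes <= runs")
    if not 0 < threshold_pct < 100:
        raise ValueError("threshold_pct must be strictly between 0 and 100")

    # Rearranging 100 * (passes + k) >= threshold_pct * (runs + k) gives
    # k >= (threshold_pct * runs - 100 * passes) / (100 - threshold_pct).
    deficit = threshold_pct * runs - 100 * passes
    if deficit <= 0:
        return 0  # already at or above the threshold
    # Ceiling division without floats.
    return -(-deficit // (100 - threshold_pct))


# Hypothetical example: a test at 8 passes out of 10 runs (80%).
print(extra_passes_needed(8, 10))  # 30, since 38 / 40 = 95%
```

The takeaway is that a single early failure is expensive at a 95% bar: under these assumptions each failed run has to be offset by 19 passing runs.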

isabella-janssen · 2025-07-03T17:13:52Z

/retest-required

isabella-janssen · 2025-07-03T17:31:22Z

As far as I can tell, some tests have been renamed recently and those haven't timed out yet.

That's correct, they were updated June 26th (openshift/origin#29918).

The only test I can see which doesn't pass enough is [Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:PinnedImages][OCPFeatureGate:MachineConfigNodes][Serial] Invalid PIS leads to degraded MCN in a standard Pool [apigroup:machineconfiguration.openshift.io] on ha metal amd64 ipv4.

I have some manually triggered runs for that running now, hopefully we can get the passing rate up soon.

Related SBAR can be found here.

Thanks for the continued patience and help throughout this effort @JoelSpeed!

JoelSpeed · 2025-07-07T09:25:22Z

/test verify-feature-promotion

isabella-janssen · 2025-07-07T18:42:26Z

/test verify-feature-promotion

isabella-janssen · 2025-07-08T12:21:28Z

/test verify-feature-promotion

isabella-janssen · 2025-07-09T13:14:54Z

/test verify-feature-promotion

isabella-janssen · 2025-07-09T13:40:44Z

/unhold

@JoelSpeed It looks like the [Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:PinnedImages][OCPFeatureGate:MachineConfigNodes][Serial] Invalid PIS leads to degraded MCN in a standard Pool [apigroup:machineconfiguration.openshift.io] test has a good pass rate now.

Two tests, [Suite:openshift/machine-config-operator/disruptive][sig-mco][OCPFeatureGate:MachineConfigNodes] [Serial][Slow]Should properly report MCN conditions on node degrade [apigroup:machineconfiguration.openshift.io] and [Suite:openshift/machine-config-operator/disruptive][sig-mco][OCPFeatureGate:MachineConfigNodes] [Serial][Slow]Should properly create and remove MCN on node creation and deletion [apigroup:machineconfiguration.openshift.io] are new and are not being used to prove component readiness for the MCN feature gate. cc @yuqi-zhang

JoelSpeed · 2025-07-09T14:35:57Z

are not being used to prove component readiness for the MCN feature gate

How did we come to this conclusion? I would have expected all tests to contribute, whether they are new or not? Or are these tests specifically marked up in a way that they always pass? We have a method for marking tests as flakes so that they do not impact the initial signal, are you doing that?

isabella-janssen · 2025-07-09T15:47:17Z

How did we come to this conclusion? I would have expected all tests to contribute, whether they are new or not?

When originally working to GA MachineConfigNodes & PinnedImageSets in 4.19, we decided to use 11 tests to prove component readiness for the features, 6 specific to MCN & 5 for PIS, and the tests being used for MCN's component readiness were documented in Jira. We used those 11 tests to prove component readiness in lifting the feature gates in 4.20 and planned to use the same tests to support the 4.19 backport. Those tests now seem to have reached the required number of runs and pass rates.

The Should properly report MCN conditions on node degrade and Should properly create and remove MCN on node creation and deletion tests started running recently as part of separate MCO work (MCO-1652) and will contribute to component readiness going forward, but are not being used for that purpose at this time. This decision was discussed with @yuqi-zhang & @craychee.

Or are these tests specifically marked up in a way that they always pass?

No, the tests pass / fail based on whether they perform as expected or not.

We have a method for marking tests as flakes so that they do not impact the initial signal, are you doing that?

We are not currently doing this and I am not aware of any plans to do this.

yuqi-zhang · 2025-07-09T17:22:48Z

How did we come to this conclusion? I would have expected all tests to contribute, whether they are new or not? Or are these tests specifically marked up in a way that they always pass? We have a method for marking tests as flakes so that they do not impact the initial signal, are you doing that?

Perhaps a better way to frame it is that these tests were introduced after the fact as additional checks for the features, since no test suite exercised them until recently, so we would like to not consider them as part of the requirement yet (the original 5-tests-per-FG requirement is still being met; with these tests we would have 8 for the MCN FG).

Put another way, if we were to for some reason add a new test for a feature every week as we expand on testing, does that mean that we'd never satisfy the requirements as there would always be a test not reaching enough # of runs to meet criteria?

As for the test pass rate, both tests also have a 100% pass rate on 4.20, but with only around 10 runs each (as opposed to a few hundred for all the other tests). On 4.19 they have only 4 and 7 runs, and thus do not meet the graduation criteria by themselves, so we would like to use them to track component readiness, but not to lift the FG if possible.
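The two positions in this thread (count every test tagged with the feature gate, or count only the originally designated set) can be compared with a small sketch. This is not the actual verify-feature-promotion logic: the `Result` type, the function names, the `min_runs=14` value, and all pass counts are hypothetical. Only the 95% bar, the 5-tests-per-FG minimum, and the 4 and 7 run counts on 4.19 come from the discussion.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Result:
    name: str
    runs: int
    passes: int


def meets_criteria(r: Result, min_runs: int, min_pass_pct: int = 95) -> bool:
    """A test qualifies if it has enough runs and a high enough pass rate."""
    return r.runs >= min_runs and 100 * r.passes >= min_pass_pct * r.runs


def gate_ready(
    results: Iterable[Result],
    min_runs: int,
    min_tests: int = 5,
    min_pass_pct: int = 95,
    designated: Optional[Set[str]] = None,
) -> bool:
    """Return True if the considered tests are numerous enough and all qualify.

    With designated=None every test counts; otherwise only the named tests do.
    """
    considered = [r for r in results if designated is None or r.name in designated]
    return len(considered) >= min_tests and all(
        meets_criteria(r, min_runs, min_pass_pct) for r in considered
    )


# Illustrative data: six established tests with many runs (made-up counts),
# plus two new tests with the 4 and 7 runs mentioned for 4.19 (made-up passes).
established = [Result(f"established-{i}", runs=200, passes=198) for i in range(6)]
new_tests = [Result("new-a", runs=4, passes=4), Result("new-b", runs=7, passes=6)]
all_results = established + new_tests

MIN_RUNS = 14  # hypothetical minimum run count, an assumption for this sketch

print(gate_ready(all_results, MIN_RUNS))  # False: the new tests lack runs
print(gate_ready(all_results, MIN_RUNS, designated={r.name for r in established}))  # True
```

Under the "all tests count" policy, any newly added test blocks promotion until it accumulates enough runs, which is the concern raised above. Under the "designated set" policy, the gate can be promoted while the new tests are still tracked separately.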

isabella-janssen · 2025-07-14T13:23:32Z

@JoelSpeed do you have any followup questions or concerns based on the justification from myself or @yuqi-zhang?

JoelSpeed · 2025-07-14T13:58:40Z

/test verify-feature-promotion

openshift-ci · 2025-07-14T14:07:17Z

@isabella-janssen: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/verify-feature-promotion	`75435dd`	link	true	`/test verify-feature-promotion`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

JoelSpeed · 2025-07-14T14:14:49Z

Put another way, if we were to for some reason add a new test for a feature every week as we expand on testing, does that mean that we'd never satisfy the requirements as there would always be a test not reaching enough # of runs to meet criteria?

I'll be honest, this is possibly the first time this has come up. Generally by the time feature promotion happens, the feature is complete, and the testing is complete (or mostly complete) at this time. Given that these features were promoted through the SBAR process, this is a bit of a wrinkle that we haven't come across before.

Looking at the results as of today, the new tests do have some failures. Promoting the feature with tests failing will contribute to red component readiness, which will trigger TRT to act. Have we looked at the failures on the newer tests to see what those are? (Since we are currently at 80%, I think it's at least worth looking into why those runs failed before we assess an override.)

promote MachineConfigNode feature gate to default

75435dd

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 20, 2025

openshift-ci bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 20, 2025

openshift-ci bot requested review from JoelSpeed and sinnykumari June 20, 2025 12:43

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 24, 2025

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 9, 2025
