on-cluster builds enhancement #1515

Open
wants to merge 8 commits into base: master

cheesesashimi
Member

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 6, 2023
Contributor

openshift-ci bot commented Nov 6, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@cheesesashimi cheesesashimi force-pushed the zzlotnik/on-cluster-builds-enhancement branch 3 times, most recently from 56bfc1d to 01cb9d4 Compare November 7, 2023 19:13

#### Device Drivers For Specialty Hardware

As an OpenShift cluster administrator, I want to install a device driver
Member

Could mention the day 0 possibility here.

content to the OS on each of their cluster nodes.

2. The cluster administrator edits the MachineConfigPool, adding a
Dockerfile to the appropriate field, image registry details such as
Member

Clarify the purpose of the image registry and where it could be located.

3. The BuildController detects that these details are provided and
begins a build.

4. The built image is pushed to the cluster administrators’ image
Member

I see it covered here.

registry of choice.

5. NodeController updates the desiredImage annotation on each node
within its pool according to its own rules.
Member

Need to clarify what we mean by rules here.

or annotation to signal to the MCO that it should consider these
ConfigMaps as it prepares the rendered MachineConfig.

- What annotations will bootc support for ConfigMaps?
Member

I think this stuff is best tracked in upstream discussions in bootc. It's really up to us! In other words, we need to drive the design there so that it meets these use cases.

Member Author

Fine with me! I added some strawmans to this enhancement to get us started but I'm happy to add them someplace else.

Member Author

I put a slightly reduced set of my ConfigMap strawmans from this enhancement in containers/bootc#22.

As an OpenShift cluster administrator, I want to prepare my customized
OS content and configuration changes ahead of time so that I can reduce
or eliminate issues with deploying them.

Member

Suggested change
#### Managed OpenShift Service Provider
As a Managed OpenShift SRE, I want to install cloud specific tooling to ensure SRE can triage issues with all nodes involved in the lifecycle of a Managed OpenShift cluster, including bootstrap, without requiring any reboot/pivot.
#### Managed OpenShift Service Consumer
As a Managed OpenShift service consumer, I want to know any changes to node configuration can be tested and validated against a known set of criteria so I know my nodes will not break support agreements with Managed OpenShift Service Providers.

Member Author

Thanks for these additional stories! I'll get them added.

upgraded to a new OpenShift release, any custom OS content will be
upgraded as well (subject to update availability).

- A cluster administrator will have the option to build their custom OS
Member

Suggested change
- A cluster administrator will have the option to build their custom OS
- A cluster administrator will have the option to build and test their custom OS

Member Author

Thanks for catching that!

orchestrate more than one build for each target architecture as well as
manifestlisting the final image. This consideration also extends to the
image registry the final image will be pushed to. To my understanding,
the default on-cluster image registry does not support manifestlisted
Contributor

The internal image registry does support pushing and pulling manifest lists now. ImageStream imports previously did not work with manifest lists; that support has been available since 4.13.

Member Author

That's great to know! I'll adjust the enhancement accordingly. I'll also do some separate experimentation around this once we begin looking at multiarch support.


BuildController should be as modular as possible with how it performs
the build to allow it to efficiently operate in as many different
environments as possible. For example, if the OpenShift Image Builds
Contributor

Unfortunately, the Builds v1 API does not support building manifest-listed images, so this will have to be homegrown, as is suggested.

Member Author

@cheesesashimi Dec 6, 2023

I figured that would be the case. I've done some experimentation around doing something like that in the past, so it was definitely a useful exercise. I'll add some context around that being a requirement.
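To make the "homegrown" multi-arch idea above a little more concrete, here is a minimal Go sketch of the planning step only: one build per target architecture, plus a final manifest list that references them. The type `archBuild`, the function `planMultiArchBuild`, the `<tag>-<arch>` naming scheme, and the example registry are all hypothetical and are not taken from the enhancement or the MCO codebase.

```go
package main

import "fmt"

// archBuild describes a single per-architecture build (hypothetical type).
type archBuild struct {
	Arch string // target architecture, e.g. "amd64"
	Tag  string // pullspec the per-arch image would be pushed to
}

// planMultiArchBuild returns one build per architecture and the pullspec of
// the manifest list that would reference all of them. The "<tag>-<arch>"
// naming scheme is an assumption made for illustration only.
func planMultiArchBuild(repo, tag string, arches []string) ([]archBuild, string) {
	builds := make([]archBuild, 0, len(arches))
	for _, arch := range arches {
		builds = append(builds, archBuild{
			Arch: arch,
			Tag:  fmt.Sprintf("%s:%s-%s", repo, tag, arch),
		})
	}
	return builds, fmt.Sprintf("%s:%s", repo, tag)
}

func main() {
	builds, manifestList := planMultiArchBuild("registry.example.com/os", "v1", []string{"amd64", "arm64"})
	for _, b := range builds {
		fmt.Printf("build %s -> %s\n", b.Arch, b.Tag)
	}
	fmt.Println("manifest list ->", manifestList)
}
```

The sketch deliberately stops before running any build or pushing anything; how the per-arch builds are executed and how the manifest list is assembled would be decided by the enhancement itself.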

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 13, 2024
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jan 21, 2024
Contributor

openshift-ci bot commented Jan 21, 2024

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dhellmann
Contributor

(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, MCO-834, has status "Code Review". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.

@dhellmann dhellmann removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 26, 2024
@cheesesashimi
Member Author

/reopen
/remove-lifecycle rotten

@openshift-ci openshift-ci bot reopened this Jan 31, 2024
Contributor

openshift-ci bot commented Jan 31, 2024

@cheesesashimi: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten

@cheesesashimi cheesesashimi marked this pull request as ready for review January 31, 2024 18:13
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 31, 2024
@openshift-ci openshift-ci bot requested a review from ashcrow January 31, 2024 18:27
Contributor

openshift-ci bot commented Dec 13, 2024

@jlebon: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 13, 2024
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2025
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 18, 2025
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jan 26, 2025
Contributor

openshift-ci bot commented Jan 26, 2025

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@JoelSpeed
Contributor

/reopen

@openshift-ci openshift-ci bot reopened this Jan 27, 2025
Contributor

openshift-ci bot commented Jan 27, 2025

@JoelSpeed: Reopened this PR.

In response to this:

/reopen

Contributor

openshift-ci bot commented Jan 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign miciah for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuqi-zhang
Contributor

/remove-lifecycle rotten

Will look to update this this week

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 28, 2025
@yuqi-zhang yuqi-zhang force-pushed the zzlotnik/on-cluster-builds-enhancement branch from b9cce87 to ebb5968 Compare February 4, 2025 00:00
@yuqi-zhang
Contributor

Enhancement should now be updated for the latest state of APIs

Contributor

@djoshy djoshy left a comment

Left a few questions/updates. I'm guessing we will want to update this again based on how the removal of injecting MachineConfig contents into the image changes things.

ReleaseOSImageChange--> |OpenShift upgrade changed \n the OS base image| IncomingConfigChange
MachineConfigChange--> |How MachineConfigs are updated now| IncomingConfigChange
CustomOSContentChange--> |The Containerfile has changed| IncomingConfigChange
CustomBaseOSImageChange--> |The cluster admin changed the base image pullspec| IncomingConfigChange
Contributor

I don't think v1 MachineOSConfig allows a change to the base OS image, so this branch can probably go away? Or are we saying a user could update OSImageURL via MC, hence overriding the base image for the build?

Contributor

I think in the flowchart this no longer makes sense, removed!

@@ -274,7 +268,6 @@ MachineOSConfigs can also be augmented with the following annotations:
| `machineconfiguration.openshift.io/rebuildImage` | When present, this will clear any failed image builds (and any supporting transitory objects created) and retry the build. |
Contributor

I think this annotation has since been renamed to `machineconfiguration.openshift.io/rebuild`.
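For illustration, a presence check on that annotation might look like the Go sketch below. The table row describes the behavior as "when present", so the sketch looks only at whether the key exists and ignores its value. The key name is taken from the review comment above and is an assumption rather than a confirmed constant; `hasRebuildAnnotation` is a hypothetical helper, not an MCO function.

```go
package main

import "fmt"

// rebuildAnnotationKey is the annotation name mentioned in the review
// comment; treat it as an assumption, not a confirmed API constant.
const rebuildAnnotationKey = "machineconfiguration.openshift.io/rebuild"

// hasRebuildAnnotation reports whether the rebuild annotation is present.
// Only the presence of the key matters; the value is ignored.
func hasRebuildAnnotation(annotations map[string]string) bool {
	_, ok := annotations[rebuildAnnotationKey]
	return ok
}

func main() {
	annotated := map[string]string{rebuildAnnotationKey: ""}
	fmt.Println(hasRebuildAnnotation(annotated)) // true
	fmt.Println(hasRebuildAnnotation(nil))       // false
}
```

Reading from a nil map is safe in Go, so an object with no annotations at all needs no special handling.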

@@ -274,7 +268,6 @@ MachineOSConfigs can also be augmented with the following annotations:
| `machineconfiguration.openshift.io/rebuildImage` | When present, this will clear any failed image builds (and any supporting transitory objects created) and retry the build. |
| `machineconfiguration.openshift.io/noRollout` | When present, this will perform the image build, but will not roll out the built image to the nodes within the targeted MachineConfigPool. |
Contributor

I don't think this annotation is used anymore, probably because it mirrored the pool pausing behavior.

@@ -476,23 +446,23 @@ flowchart TB
BootcConfigChange("bootc ConfigMap Change")
MachineConfigChange("MachineConfig Change")
CustomOSContentChange("Custom OS Content Change")
CustomBaseOSImageChange("Admin-Defined Base OS \n Image Change")
ReleaseOSImageChange("OpenShift Release \n Base OS Image Change")
CustomBaseOSImageChange("Admin-Defined Base OS Image Change")
Contributor

Same question as for the previous flowchart.

Updated content based on latest state:
 - Updated MachineOSConfig and MachineOSBuild references and charts
 - Removed reference to MachineOSImage
 - Removed reference to MCP extensions
 - Updated pod reference to Jobs
 - Added some snippets to graduation plans
@yuqi-zhang yuqi-zhang force-pushed the zzlotnik/on-cluster-builds-enhancement branch from ebb5968 to 0ba7877 Compare February 14, 2025 22:40
Contributor

openshift-ci bot commented Feb 14, 2025

@cheesesashimi: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
