OCPBUGS-62790: Use separate tmpfs for ostree checkout on live ISO #10133

zaneb · 2025-11-28T01:36:49Z

Installations using ABI/assisted with 16GiB of RAM on the bootstrap node
were failing with "no space left on device" during bootstrapping. The
live ISO environment uses a tmpfs mounted at /var that is sized at 50%
of available RAM. On systems with 16GiB of RAM, this provides only 8GiB
of tmpfs space.

At the beginning of the bootstrap process, node-image-pull.sh creates an
ostree checkout underneath /var/ostree-container. When this is added to
the regular disk space usage of the later parts of the bootstrap, the
peak tmpfs usage hits around 9.4GiB.
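
As a rough check of the numbers above, here is a minimal sketch of the arithmetic. It assumes the tmpfs limit is exactly 50% of RAM, as the description states; the function names are hypothetical and do not come from the installer.

```python
GIB = 1024 ** 3


def tmpfs_limit_bytes(ram_bytes: int, fraction: float = 0.5) -> int:
    """Size of a tmpfs capped at a fraction of RAM (50% per the description)."""
    return int(ram_bytes * fraction)


def shortfall_gib(ram_gib: float, peak_gib: float) -> float:
    """How far peak usage exceeds the tmpfs limit, in GiB (<= 0 means it fits)."""
    limit = tmpfs_limit_bytes(int(ram_gib * GIB))
    return (peak_gib * GIB - limit) / GIB


if __name__ == "__main__":
    # Figures quoted in the description: 16GiB of RAM, ~9.4GiB peak usage.
    print(f"limit: {tmpfs_limit_bytes(16 * GIB) / GIB:.1f} GiB")
    print(f"shortfall: {shortfall_gib(16, 9.4):.1f} GiB")
```

With the quoted figures, the peak exceeds the 8GiB limit by about 1.4GiB, which is consistent with the "no space left on device" failure.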

This fix creates a separate 5GiB tmpfs to house /var/ostree-container,
so that it is not subject to the limits on the size of /var.
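
For illustration only, the idea of a dedicated, size-limited tmpfs can be sketched as below. This is not the change made in this PR (which may use a different mechanism, such as a script or a mount unit); the helper names are hypothetical. The only external command assumed is the standard `mount -t tmpfs -o size=...` invocation, and the sketch defaults to a dry run because a real mount needs root.

```python
import os
import subprocess


def tmpfs_mount_cmd(target: str, size: str) -> list[str]:
    """Build the argv for mounting a size-limited tmpfs at target."""
    return ["mount", "-t", "tmpfs", "-o", f"size={size}", "tmpfs", target]


def mount_tmpfs(target: str, size: str, dry_run: bool = True) -> list[str]:
    """Return the mount command; run it only when dry_run is False (needs root)."""
    cmd = tmpfs_mount_cmd(target, size)
    if not dry_run:
        os.makedirs(target, exist_ok=True)  # mount point must exist
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Path and size are taken from the description; dry run only prints.
    print(" ".join(mount_tmpfs("/var/ostree-container", "5G")))
```

Because the new tmpfs has its own size limit, the checkout no longer counts against the space available under /var.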

openshift-ci-robot · 2025-11-28T01:36:56Z

@zaneb: This pull request references Jira Issue OCPBUGS-62790, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.21.0) matches configured target version for branch (4.21.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @mhanss

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

zaneb · 2025-11-28T10:51:39Z

This works and is independent of coreos internals, unlike #10055. Plus no resizing of a mounted filesystem. I think this is the way to go.

/verified by @zaneb
/cc @jlebon @andfasano

openshift-ci-robot · 2025-11-28T10:51:51Z

@zaneb: This PR has been marked as verified by @zaneb.

zaneb · 2025-11-28T11:03:42Z

/cherry-pick release-4.20

openshift-cherrypick-robot · 2025-11-28T11:03:45Z

@zaneb: once the present PR merges, I will cherry-pick it on top of release-4.20 in a new PR and assign it to you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Installations using ABI/assisted with 16GiB of RAM on the bootstrap node were failing with "no space left on device" during bootstrapping. The live ISO environment uses a tmpfs mounted at /var that is sized at 50% of available RAM. On systems with 16GiB of RAM, this provides only 8GiB of tmpfs space. At the beginning of the bootstrap process, node-image-pull.sh creates an ostree checkout underneath /var/ostree-container. When this is added to the regular disk space usage of the later parts of the bootstrap, the peak tmpfs usage hits around 9.4GiB. This fix creates a separate 4GiB tmpfs for /var/ostree-container, so that it is not subject to the limits on the size of /var.

openshift-ci · 2025-12-01T03:25:25Z

@zaneb: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-agent-two-node-fencing-ipv4 | `027899c` | link | false | `/test e2e-agent-two-node-fencing-ipv4` |
| ci/prow/e2e-agent-compact-ipv4-iso-no-registry | `027899c` | link | false | `/test e2e-agent-compact-ipv4-iso-no-registry` |
| ci/prow/okd-scos-e2e-vsphere-ovn | `027899c` | link | false | `/test okd-scos-e2e-vsphere-ovn` |

zaneb · 2025-12-01T03:36:03Z

coreos/fedora-coreos-config#499 describes the original reason for using an xfs filesystem on top of tmpfs, which was related to an SELinux issue at startup. Since that isn't a concern here, we can greatly simplify by dropping the xfs part.

/verified by @zaneb

openshift-ci-robot · 2025-12-01T03:36:16Z

@zaneb: This PR has been marked as verified by @zaneb.

mhanss · 2025-12-01T08:31:40Z

/verified by @mhanss
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/test-platform-results/pr-logs/pull/openshift_release/68119/rehearse-68119-periodic-ci-openshift-eng-agent-qe-infra-release-4.21-amd64-nightly-vsphere-agent-compact-fips-f7/1994493157932273664

openshift-ci-robot · 2025-12-01T08:31:52Z

@mhanss: This PR has been marked as verified by @mhanss.

pawanpinjarkar · 2025-12-01T21:09:55Z

/approve

openshift-ci · 2025-12-01T21:11:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pawanpinjarkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [pawanpinjarkar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

bfournie · 2025-12-01T21:23:54Z

/lgtm

openshift-ci-robot · 2025-12-02T01:53:03Z

@zaneb: Jira Issue Verification Checks: Jira Issue OCPBUGS-62790
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-62790 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

openshift-cherrypick-robot · 2025-12-02T01:54:07Z

@zaneb: new pull request created: #10140

Report file system space usage in agent-gather

131985a

openshift-ci-robot added the labels jira/valid-reference ("Indicates that this PR references a valid Jira ticket of any type.") and jira/valid-bug ("Indicates that a referenced Jira bug is valid for the branch this PR is targeting.") Nov 28, 2025

openshift-ci bot requested review from andfasano, barbacbd and mhanss November 28, 2025 01:37

zaneb force-pushed the ostree-tmpfs branch 3 times, most recently from a658402 to 92c5f9b Compare November 28, 2025 08:34

zaneb mentioned this pull request Nov 28, 2025

OCPBUGS-62790: Resize /var fs to 10GiB for ABI installations #10055

Closed

openshift-ci bot requested a review from jlebon November 28, 2025 10:51

openshift-ci-robot added the verified label ("Signifies that the PR passed pre-merge verification criteria") Nov 28, 2025

zaneb added 2 commits December 1, 2025 13:10

Log peak ramdisk usage of node-image-pull

027899c

zaneb force-pushed the ostree-tmpfs branch from 92c5f9b to 027899c Compare December 1, 2025 00:12

openshift-ci-robot removed the verified label ("Signifies that the PR passed pre-merge verification criteria") Dec 1, 2025

zaneb mentioned this pull request Dec 1, 2025

OCPBUGS-62790: Restore agent MASTER_MEMORY default to 16GiB openshift-metal3/dev-scripts#1819

Draft

openshift-ci-robot added the verified label ("Signifies that the PR passed pre-merge verification criteria") Dec 1, 2025

zaneb changed the title ~~OCPBUGS-62790: Use separate fs for ostree checkout on live ISO~~ OCPBUGS-62790: Use separate tmpfs for ostree checkout on live ISO Dec 1, 2025

openshift-ci bot added the approved label ("Indicates a PR has been approved by an approver from all required OWNERS files.") Dec 1, 2025

openshift-ci bot assigned bfournie Dec 1, 2025

openshift-ci bot added the lgtm label ("Indicates that a PR is ready to be merged.") Dec 1, 2025

openshift-merge-bot bot merged commit 00584fe into openshift:main Dec 2, 2025
25 of 28 checks passed

openshift-cherrypick-robot mentioned this pull request Dec 2, 2025

[release-4.20] OCPBUGS-66231: Use separate tmpfs for ostree checkout on live ISO #10140

Open
