Skip to content

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Jul 28, 2025

The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add exec PID to a cgroup even when cgroup itself is
is not writable to us (rootless).

The implementation requires opencontainers/cgroups#26 (in oc/[email protected])
(which requires coreos/go-systemd#458 (in coreos/[email protected])).

This PR is a prerequisite for #4812 ("runc exec: use CLONE_INTO_CGROUP")

@kolyshkin kolyshkin changed the title runc exec: use manager.AddPid when possible runc exec: use manager.AddPid Jul 29, 2025
@kolyshkin kolyshkin requested a review from rata September 8, 2025 19:08
@kolyshkin kolyshkin requested review from lifubang and removed request for rata September 8, 2025 19:08
@kolyshkin kolyshkin marked this pull request as ready for review September 8, 2025 19:08
@kolyshkin
Copy link
Contributor Author

@cyphar do you think we can have this (and #4812) in 1.4?

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@cyphar
Copy link
Member

cyphar commented Sep 10, 2025

@kolyshkin I think so, let me review this tomorrow (I mean to review the cgroups PR but didn't have time, sorry about that).

This fixes the following warning (seen on Fedora 42 and Ubuntu 24.04):

	+ sudo chown -R rootless.rootless /home/rootless
	chown: warning: '.' should be ':': ‘rootless.rootless’

Signed-off-by: Kir Kolyshkin <[email protected]>
The main idea is to maintain the code separately (and eventually kill V1
implementation).

Signed-off-by: Kir Kolyshkin <[email protected]>
Remove cgroupPaths field from struct setnsProcess, because:
 - we can get base cgroup paths from p.manager.GetPaths();
 - we can get sub-cgroup paths from p.process.SubCgroupPaths.

But mostly because we are going to need separate cgroup paths when
adopting cgroups.AddPid.

Signed-off-by: Kir Kolyshkin <[email protected]>
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add rootless exec PID to a cgroup.

This requires newer opencontainers/cgroups and coreos/go-systemd.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Copy link
Contributor Author

@kolyshkin I think so, let me review this tomorrow (I mean to review the cgroups PR but didn't have time, sorry about that).

@cyphar please

Comment on lines +288 to +302
// On cgroup v2 + nesting + domain controllers, adding to initial cgroup may fail with EBUSY.
// https://github.com/opencontainers/runc/issues/2356#issuecomment-621277643
// Try to join the cgroup of InitProcessPid, unless sub-cgroup is explicitly set.
if p.initProcessPid != 0 && sub == "" {
initProcCgroupFile := fmt.Sprintf("/proc/%d/cgroup", p.initProcessPid)
initCg, initCgErr := cgroups.ParseCgroupFile(initProcCgroupFile)
if initCgErr == nil {
if initCgPath, ok := initCg[""]; ok {
initCgDirpath := filepath.Join(fs2.UnifiedMountpoint, initCgPath)
logrus.Debugf("adding pid %d to cgroup failed (%v), attempting to join %s",
p.pid(), err, initCgDirpath)
// NOTE: initCgDirPath is not guaranteed to exist because we didn't pause the container.
err = cgroups.WriteCgroupProc(initCgDirpath, p.pid())
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing you don't want to remove this despite your comment in #2416 (comment) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I wish this to be removed, but not in this PR, as it will break the test case added in PR #2416 (and may also break some funny users' workloads). I would like to hear from @AkihiroSuda first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it compatible with the existing releases of runc.
I wish we could simplify the implementation though.

@cyphar
Copy link
Member

cyphar commented Sep 17, 2025

I had one question, but feel free to merge anyway.

@kolyshkin kolyshkin merged commit 77ead42 into opencontainers:main Sep 18, 2025
36 checks passed
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 24, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 24, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 24, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 26, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Regarding E2BIG check in shouldRetryWithoutCgroupFD. The clone3 syscall
first appeared in kernel v5.3 via commit [1], which added a check that
if the size of clone_args structure passed from the userspace is larger
than known to kernel, and the "unknown" part contains non-zero values,
E2BIG is returned. A similar check was already used in other similar
scenarios at the time, and later in kernel v5.4, this was generalized by
patch series [2].

[1]: torvalds/linux@7f192e3
[2]: https://lore.kernel.org/all/[email protected]/#r

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 26, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Regarding E2BIG check in shouldRetryWithoutCgroupFD. The clone3 syscall
first appeared in kernel v5.3 via commit [1], which added a check that
if the size of clone_args structure passed from the userspace is larger
than known to kernel, and the "unknown" part contains non-zero values,
E2BIG is returned. A similar check was already used in other similar
scenarios at the time, and later in kernel v5.4, this was generalized by
patch series [2].

[1]: torvalds/linux@7f192e3
[2]: https://lore.kernel.org/all/[email protected]/#r

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 26, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on:
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants