runc exec: use manager.AddPid #4822

kolyshkin · 2025-07-28T22:55:47Z

The main benefit here is that when we are using the systemd cgroup driver,
we ask systemd to add the PID rather than doing it ourselves.
This way, we can add the exec PID to a cgroup even when the cgroup itself
is not writable by us (rootless).

The implementation requires opencontainers/cgroups#26 (in oc/[email protected]),
which in turn requires coreos/go-systemd#458 (in coreos/[email protected]).

This PR is a prerequisite for #4812 ("runc exec: use CLONE_INTO_CGROUP").

kolyshkin · 2025-09-08T19:11:31Z

@cyphar do you think we can have this (and #4812) in 1.4?

rata · 2025-09-09

LGTM, thanks!

tests/integration/exec.bats

cyphar · 2025-09-10T18:15:04Z

@kolyshkin I think so, let me review this tomorrow (I meant to review the cgroups PR but didn't have time, sorry about that).

This fixes the following warning (seen on Fedora 42 and Ubuntu 24.04):

    + sudo chown -R rootless.rootless /home/rootless
    chown: warning: '.' should be ':': ‘rootless.rootless’

Signed-off-by: Kir Kolyshkin <[email protected]>

Signed-off-by: Kir Kolyshkin <[email protected]>

The main idea is to maintain the two code paths separately (and eventually kill the V1 implementation).

Signed-off-by: Kir Kolyshkin <[email protected]>

Remove cgroupPaths field from struct setnsProcess, because:
- we can get base cgroup paths from p.manager.GetPaths();
- we can get sub-cgroup paths from p.process.SubCgroupPaths.

But mostly because we are going to need separate cgroup paths when adopting cgroups.AddPid.

Signed-off-by: Kir Kolyshkin <[email protected]>

The main benefit here is that when we are using the systemd cgroup driver, we ask systemd to add the PID rather than doing it ourselves. This way, we can add a rootless exec PID to a cgroup. This requires newer opencontainers/cgroups and coreos/go-systemd.

Signed-off-by: Kir Kolyshkin <[email protected]>

kolyshkin · 2025-09-17T02:10:54Z

> @kolyshkin I think so, let me review this tomorrow (I meant to review the cgroups PR but didn't have time, sorry about that).

@cyphar please

cyphar · 2025-09-17T13:46:44Z

libcontainer/process_linux.go

```diff
+		// On cgroup v2 + nesting + domain controllers, adding to initial cgroup may fail with EBUSY.
+		// https://github.com/opencontainers/runc/issues/2356#issuecomment-621277643
+		// Try to join the cgroup of InitProcessPid, unless sub-cgroup is explicitly set.
+		if p.initProcessPid != 0 && sub == "" {
+			initProcCgroupFile := fmt.Sprintf("/proc/%d/cgroup", p.initProcessPid)
+			initCg, initCgErr := cgroups.ParseCgroupFile(initProcCgroupFile)
+			if initCgErr == nil {
+				if initCgPath, ok := initCg[""]; ok {
+					initCgDirpath := filepath.Join(fs2.UnifiedMountpoint, initCgPath)
+					logrus.Debugf("adding pid %d to cgroup failed (%v), attempting to join %s",
+						p.pid(), err, initCgDirpath)
+					// NOTE: initCgDirPath is not guaranteed to exist because we didn't pause the container.
+					err = cgroups.WriteCgroupProc(initCgDirpath, p.pid())
+				}
+			}
```


I am guessing you don't want to remove this, despite your comment in #2416?

kolyshkin · 2025-09-18

Ideally, I'd like this to be removed, but not in this PR, as that would break the test case added in PR #2416 (and may also break some unusual user workloads). I would like to hear from @AkihiroSuda first.

AkihiroSuda · 2025-09-18

Let's keep it compatible with existing runc releases.
I wish we could simplify the implementation, though.

cyphar · 2025-09-17T13:48:03Z

I had one question, but feel free to merge anyway.

It makes sense to make runc exec benefit from clone3(CLONE_INTO_CGROUP), if it is available. Since it requires a recent kernel and might not work, implement a fallback to the older way of joining the cgroup.

Based on work done in:
- https://go-review.googlesource.com/c/go/+/417695
- coreos/go-systemd#458
- opencontainers/cgroups#26
- opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>

It makes sense to make runc exec benefit from clone3(CLONE_INTO_CGROUP), if it is available. Since it requires a recent kernel and might not work, implement a fallback to the older way of joining the cgroup.

Based on work done in:
- https://go-review.googlesource.com/c/go/+/417695
- coreos/go-systemd#458
- opencontainers/cgroups#26
- opencontainers#4822

Regarding the E2BIG check in shouldRetryWithoutCgroupFD: the clone3 syscall first appeared in kernel v5.3 via commit [1], which added a check: if the clone_args structure passed from userspace is larger than the kernel knows about, and the "unknown" part contains non-zero values, E2BIG is returned. A similar check was already used elsewhere in the kernel at the time, and later, in kernel v5.4, this was generalized by patch series [2].

[1]: torvalds/linux@7f192e3
[2]: https://lore.kernel.org/all/[email protected]/#r

Signed-off-by: Kir Kolyshkin <[email protected]>


kolyshkin added area/cgroupv2 area/cgroupv1 labels Jul 28, 2025

This was referenced Jul 28, 2025

Implement AddPid for cgroup managers opencontainers/cgroups#26

Merged

dbus: add AttachProcessesToUnit coreos/go-systemd#458

Merged

kolyshkin force-pushed the add-pid branch from a82146f to fec5ff1 July 28, 2025 23:24

kolyshkin changed the title from "runc exec: use manager.AddPid when possible" to "runc exec: use manager.AddPid" Jul 29, 2025

kolyshkin mentioned this pull request Jul 29, 2025

runc exec: use CLONE_INTO_CGROUP #4812

Merged

kolyshkin mentioned this pull request Sep 8, 2025

build(deps): bump github.com/opencontainers/cgroups from 0.0.4 to 0.0.5 #4884

Closed

kolyshkin requested a review from rata September 8, 2025 19:08

kolyshkin force-pushed the add-pid branch from fec5ff1 to a742237 September 8, 2025 19:08

kolyshkin requested review from lifubang and removed request for rata September 8, 2025 19:08

kolyshkin marked this pull request as ready for review September 8, 2025 19:08

kolyshkin requested review from rata, cyphar and AkihiroSuda September 8, 2025 19:09

rata approved these changes Sep 9, 2025


tests/integration/exec.bats (resolved)

kolyshkin force-pushed the add-pid branch from a742237 to 01c93bc September 10, 2025 17:08

kolyshkin added 5 commits September 16, 2025 13:27

script/setup_rootless.sh: chown nit

7d6848f


libct: factor out addIntoCgroup from setnsProcess.start

b39e0d6


libct: split addIntoCgroup into V1 and V2

5560020


kolyshkin force-pushed the add-pid branch from 01c93bc to 37b5acc September 16, 2025 20:32

kolyshkin added this to the 1.4.0-rc.2 milestone Sep 16, 2025

cyphar approved these changes Sep 17, 2025


kolyshkin merged commit 77ead42 into opencontainers:main Sep 18, 2025
36 checks passed

kolyshkin added the impact/changelog label Sep 18, 2025

kolyshkin mentioned this pull request Sep 24, 2025

Usage of systemd-cgroup flag produces substantial load on DBus #4853

Open
