-
Notifications
You must be signed in to change notification settings - Fork 2.2k
runc exec: use manager.AddPid #4822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@kolyshkin I think so, let me review this tomorrow (I mean to review the cgroups PR but didn't have time, sorry about that). |
This fixes the following warning (seen on Fedora 42 and Ubuntu 24.04): + sudo chown -R rootless.rootless /home/rootless chown: warning: '.' should be ':': ‘rootless.rootless’ Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
The main idea is to maintain the code separately (and eventually kill V1 implementation). Signed-off-by: Kir Kolyshkin <[email protected]>
Remove cgroupPaths field from struct setnsProcess, because: - we can get base cgroup paths from p.manager.GetPaths(); - we can get sub-cgroup paths from p.process.SubCgroupPaths. But mostly because we are going to need separate cgroup paths when adopting cgroups.AddPid. Signed-off-by: Kir Kolyshkin <[email protected]>
The main benefit here is when we are using a systemd cgroup driver, we actually ask systemd to add a PID, rather than doing it ourselves. This way, we can add rootless exec PID to a cgroup. This requires newer opencontainers/cgroups and coreos/go-systemd. Signed-off-by: Kir Kolyshkin <[email protected]>
@cyphar please |
// On cgroup v2 + nesting + domain controllers, adding to initial cgroup may fail with EBUSY. | ||
// https://github.com/opencontainers/runc/issues/2356#issuecomment-621277643 | ||
// Try to join the cgroup of InitProcessPid, unless sub-cgroup is explicitly set. | ||
if p.initProcessPid != 0 && sub == "" { | ||
initProcCgroupFile := fmt.Sprintf("/proc/%d/cgroup", p.initProcessPid) | ||
initCg, initCgErr := cgroups.ParseCgroupFile(initProcCgroupFile) | ||
if initCgErr == nil { | ||
if initCgPath, ok := initCg[""]; ok { | ||
initCgDirpath := filepath.Join(fs2.UnifiedMountpoint, initCgPath) | ||
logrus.Debugf("adding pid %d to cgroup failed (%v), attempting to join %s", | ||
p.pid(), err, initCgDirpath) | ||
// NOTE: initCgDirPath is not guaranteed to exist because we didn't pause the container. | ||
err = cgroups.WriteCgroupProc(initCgDirpath, p.pid()) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am guessing you don't want to remove this despite your comment in #2416 (comment) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I wish this to be removed, but not in this PR, as it will break the test case added in PR #2416 (and may also break some funny users' workloads). I would like to hear from @AkihiroSuda first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it compatible with the existing releases of runc.
I wish we could simplify the implementation though.
I had one question, but feel free to merge anyway. |
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP), if it is available. Since it requires a recent kernel and might not work, implement a fallback to older way of joining the cgroup. Based on work done in - https://go-review.googlesource.com/c/go/+/417695 - coreos/go-systemd#458 - opencontainers/cgroups#26 - opencontainers#4822 Signed-off-by: Kir Kolyshkin <[email protected]>
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP), if it is available. Since it requires a recent kernel and might not work, implement a fallback to older way of joining the cgroup. Based on work done in - https://go-review.googlesource.com/c/go/+/417695 - coreos/go-systemd#458 - opencontainers/cgroups#26 - opencontainers#4822 Signed-off-by: Kir Kolyshkin <[email protected]>
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP), if it is available. Since it requires a recent kernel and might not work, implement a fallback to older way of joining the cgroup. Based on work done in - https://go-review.googlesource.com/c/go/+/417695 - coreos/go-systemd#458 - opencontainers/cgroups#26 - opencontainers#4822 Signed-off-by: Kir Kolyshkin <[email protected]>
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP), if it is available. Since it requires a recent kernel and might not work, implement a fallback to older way of joining the cgroup. Based on work done in - https://go-review.googlesource.com/c/go/+/417695 - coreos/go-systemd#458 - opencontainers/cgroups#26 - opencontainers#4822 Regarding E2BIG check in shouldRetryWithoutCgroupFD. The clone3 syscall first appeared in kernel v5.3 via commit [1], which added a check that if the size of clone_args structure passed from the userspace is larger than known to kernel, and the "unknown" part contains non-zero values, E2BIG is returned. A similar check was already used in other similar scenarios at the time, and later in kernel v5.4, this was generalized by patch series [2]. [1]: torvalds/linux@7f192e3 [2]: https://lore.kernel.org/all/[email protected]/#r Signed-off-by: Kir Kolyshkin <[email protected]>
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP), if it is available. Since it requires a recent kernel and might not work, implement a fallback to older way of joining the cgroup. Based on work done in - https://go-review.googlesource.com/c/go/+/417695 - coreos/go-systemd#458 - opencontainers/cgroups#26 - opencontainers#4822 Regarding E2BIG check in shouldRetryWithoutCgroupFD. The clone3 syscall first appeared in kernel v5.3 via commit [1], which added a check that if the size of clone_args structure passed from the userspace is larger than known to kernel, and the "unknown" part contains non-zero values, E2BIG is returned. A similar check was already used in other similar scenarios at the time, and later in kernel v5.4, this was generalized by patch series [2]. [1]: torvalds/linux@7f192e3 [2]: https://lore.kernel.org/all/[email protected]/#r Signed-off-by: Kir Kolyshkin <[email protected]>
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP), if it is available. Since it requires a recent kernel and might not work, implement a fallback to older way of joining the cgroup. Based on: - https://go-review.googlesource.com/c/go/+/417695 - coreos/go-systemd#458 - opencontainers/cgroups#26 - opencontainers#4822 Signed-off-by: Kir Kolyshkin <[email protected]>
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add exec PID to a cgroup even when cgroup itself is
is not writable to us (rootless).
The implementation requires opencontainers/cgroups#26 (in oc/[email protected])
(which requires coreos/go-systemd#458 (in coreos/[email protected])).
This PR is a prerequisite for #4812 ("runc exec: use CLONE_INTO_CGROUP")