use clone3 for exec process creation to reduce cgroup lock contention #4782

lujinda · 2025-06-13T02:00:56Z

Note: This PR is for discussion only, and the code is a demonstration. If the approach is confirmed to be sound, the code structure may still need to be cleaned up.

Currently, runc exec creates a child process by first cloning it and then writing its PID to cgroup.procs. Under high container density with many exec probes, this leads to heavy contention on the cgroup_threadgroup_rwsem read-write lock and can cause the system to hang.

This change uses the clone3 system call in the setnsProcess.start function to place the child into its cgroup as part of the clone operation (assuming cgroup v2 is in use). This avoids writing PIDs to cgroup.procs directly, which bypasses the need to take the write lock and reduces the risk of lock contention.

Signed-off-by: jinda.ljd <[email protected]>

rata

I think using clone3 might be a good idea (with a fallback, of course), thanks!

Do you have perf numbers to share? They would help us understand when this is a problem and how much this change helps on kernels that support it.

I wonder if @kolyshkin, who IIRC wrote the patch for golang, already has ideas.

rata · 2025-06-20T14:35:35Z

libcontainer/process_linux.go

@@ -203,6 +204,28 @@ func (p *setnsProcess) start() (retErr error) {

 	// Get the "before" value of oom kill count.
 	oom, _ := p.manager.OOMKillCount()
+	useClone3 := false
+	if cgroups.IsCgroup2UnifiedMode() && p.initProcessPid != 0 {


start() is already complex, let's move this to a function with a clear name, so it's more readable.

rata · 2025-06-20T14:36:25Z

libcontainer/process_linux.go

+		procPid := p.pid()
+		if useClone3 {
+			procPid = -1
+		}
+		if err := cgroups.WriteCgroupProc(path, procPid); err != nil && !p.rootlessCgroups {


I guess if it's -1 it's not written? Let's just use useClone3 for the condition, with a comment explaining that if we are using clone3, the process is already in the cgroup.

rata · 2025-06-20T14:39:08Z

libcontainer/process_linux.go

+				p.cmd.SysProcAttr.UseCgroupFD = true
+				p.cmd.SysProcAttr.CgroupFD = int(fd.Fd())


man clone3 says this is available since Linux 5.7. You are setting useClone3 to true, but I don't think that detects whether CLONE_INTO_CGROUP is supported on this kernel, right? We will need to improve the detection. Not sure about the golang wrapper, but IIRC @kolyshkin wrote that for Go. He might have tips :)
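
As a rough sketch of one possible detection approach, the kernel release string could be compared against 5.7. This is an assumption, not what the PR or the Go runtime does: `kernelAtLeast` is a hypothetical helper, and a version check can be wrong on kernels with backports or when seccomp blocks clone3, so attempting the clone and falling back on error may be more robust.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// kernelAtLeast is a hypothetical helper. It reports whether a release
// string such as "5.15.0-91-generic" is at least major.minor.
// Unparseable input is treated as "not supported".
func kernelAtLeast(release string, major, minor int) bool {
	var gotMajor, gotMinor int
	if _, err := fmt.Sscanf(release, "%d.%d", &gotMajor, &gotMinor); err != nil {
		return false
	}
	if gotMajor != major {
		return gotMajor > major
	}
	return gotMinor >= minor
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(data))
	// CLONE_INTO_CGROUP is documented as available since Linux 5.7.
	fmt.Printf("kernel %s, version check for CLONE_INTO_CGROUP: %v\n",
		release, kernelAtLeast(release, 5, 7))
}
```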

lujinda force-pushed the clone3_exec branch from 342ed8e to 0298a45 on June 13, 2025 02:01

rata reviewed Jun 20, 2025
