Usage of systemd-cgroup flag produces substantial load on DBus #4853

@mattnappo

Description

At modal.com we run a custom multi-tenant container runtime that can use either runc or runsc (gVisor). We pass the --systemd-cgroup flag for all of our containers. Recently we noticed that DBus on the host becomes unresponsive when many containers are started or stopped in quick succession. For example, once a host has started enough containers, the following command takes more than 10 seconds and often times out entirely after 25 s:

busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1 \
           org.freedesktop.systemd1.Manager Version

Without --systemd-cgroup, however, DBus stays much more responsive, with notably shorter tail latencies.

We ran an experiment that starts n containers while timing the command above once per second. We repeated it for increasing values of n, with and without --systemd-cgroup. Each container simply runs sleep inf. The data covers a few seconds of host idling, container startup, 30 seconds of container runtime, and container shutdown. The experiment ran on an AWS c6i.8xlarge instance.
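For reference, a measurement loop of the kind described above can be sketched in shell. This is a minimal sketch, not the harness used in the experiment: the helper names (time_ms, probe_dbus, probe_loop) and the output format "<unix_ts> <latency_ms> <exit_status>" are hypothetical, the 25 s cap mirrors the timeout mentioned earlier, and it assumes GNU date (for %N) and coreutils timeout. It pauses one second between samples rather than firing on exact one-second ticks.

```shell
#!/bin/sh
# Minimal sketch of a DBus latency probe (hypothetical helper names).
# Assumes GNU date (for %N nanoseconds) and coreutils timeout.

# time_ms LIMIT CMD...: run CMD with a LIMIT-second timeout and print
# "<unix_ts> <latency_ms> <exit_status>"; exit status 124 means it timed out.
time_ms() {
  local limit start end rc
  limit=$1
  shift
  start=$(date +%s%N)
  timeout "$limit" "$@" >/dev/null 2>&1
  rc=$?
  end=$(date +%s%N)
  printf '%s %d %d\n' "$(date +%s)" $(( (end - start) / 1000000 )) "$rc"
}

# probe_dbus: one round-trip to systemd's Manager object, capped at 25 s.
probe_dbus() {
  time_ms 25 busctl get-property org.freedesktop.systemd1 \
    /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager Version
}

# probe_loop N: take N samples with a one-second pause after each one.
probe_loop() {
  local i=0
  while [ "$i" -lt "$1" ]; do
    probe_dbus
    sleep 1
    i=$((i + 1))
  done
}
```

For example, probe_loop 120 > dbus-lag.log records 120 samples (at least two minutes) while containers are started and stopped; a status of 124 marks a probe that was killed by the timeout.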

[Figure: busctl latency measurements from the experiment above]

My understanding is that without --systemd-cgroup, runc configures cgroups directly through cgroupfs, which places no additional load on DBus. How does runc configure cgroups when the flag is set? Does it send DBus messages to systemd through the system bus socket, or does it talk to systemd's private socket (/run/systemd/private)?
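One way to check empirically which path is in use is to capture method calls addressed to systemd on the system bus while containers start and stop. Calls made over systemd's private socket are peer-to-peer and do not pass through the bus daemon, so they should not show up in such a capture. The sketch below assumes the capture was taken with dbus-monitor (which generally needs root) and that its text output uses the usual "method call ... destination=... member=..." layout; count_systemd_calls is a hypothetical helper that tallies calls per method name.

```shell
#!/bin/sh
# count_systemd_calls (hypothetical helper): read dbus-monitor --system text
# on stdin and print "<count> <member>" for each method called on
# org.freedesktop.systemd1, most frequent first.
count_systemd_calls() {
  awk '
    /^method call / && /destination=org\.freedesktop\.systemd1/ {
      # Pick out the "member=<Name>" field and count it.
      for (i = 1; i <= NF; i++)
        if ($i ~ /^member=/) calls[substr($i, 8)]++
    }
    END { for (m in calls) print calls[m], m }
  ' | sort -k1,1nr -k2,2
}
```

For example, run sudo dbus-monitor --system "type='method_call',destination='org.freedesktop.systemd1'" > systemd-calls.txt while starting containers, then count_systemd_calls < systemd-calls.txt. Counts of unit-management calls (such as StartTransientUnit) that grow with n would suggest the traffic goes through the bus; if nothing appears, runc is either using the private socket or not calling systemd during that window.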

The gap in DBus latency between running with and without --systemd-cgroup is large enough to be a performance concern.

Steps to reproduce the issue

  1. In a loop, measure how long a simple DBus method call takes on the host, such as reading the Version property of org.freedesktop.systemd1.Manager.
  2. Start n containers that run sleep inf, where n is large enough to create DBus load measurable by (1).
  3. Let the containers run for a few seconds, then tear them down.
  4. Observe the DBus latency measurements.
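Step 4 largely comes down to comparing tail latencies between runs. Assuming each sample is recorded as a line "<unix_ts> <latency_ms> <exit_status>" (a hypothetical format, where a non-zero status means the call failed or timed out), a small sort/awk helper can summarize one run. summarize_latencies is a hypothetical name, and it uses the nearest-rank percentile definition without interpolation.

```shell
#!/bin/sh
# summarize_latencies (hypothetical helper): read "<unix_ts> <latency_ms> <exit_status>"
# lines on stdin and print sample count, p50, p99, max latency, and failures.
summarize_latencies() {
  sort -k2,2n | awk '
    # Nearest-rank percentile: the smallest sample with at least p*N samples at or below it.
    function pct(p,    r, i) {
      r = p * NR
      i = int(r)
      if (i < r) i++
      if (i < 1) i = 1
      return lat[i]
    }
    { lat[NR] = $2; if ($3 != 0) fail++ }
    END {
      if (NR == 0) { print "no samples"; exit }
      printf "n=%d p50=%dms p99=%dms max=%dms failures=%d\n", NR, pct(0.50), pct(0.99), lat[NR], fail
    }
  '
}
```

Running it on logs from runs with and without --systemd-cgroup (for example summarize_latencies < dbus-lag.log) gives directly comparable p50, p99, and max figures for each n.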

Describe the results you received and expected

We expect similar DBus responsiveness whether or not we use systemd-cgroup.

What version of runc are you using?

runc version 1.2.5
commit: v1.2.5-0-g59923ef
spec: 1.2.0
go: go1.23.7
libseccomp: 2.5.2

Host OS information

Ran on an AWS c6i.8xlarge.

NAME="Oracle Linux Server"
VERSION="9.6"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="9.6"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Oracle Linux Server 9.6"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:9:6:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 9"
ORACLE_BUGZILLA_PRODUCT_VERSION=9.6
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=9.6

Host kernel information

Linux 5.15.0-309.180.4.el9uek.x86_64 #2 SMP Wed May 21 06:56:22 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux
