cmd, pkg/nvidia: Enable the proprietary NVIDIA driver
This uses the NVIDIA Container Toolkit [1] to generate a Container
Device Interface specification [2] on the host during the 'enter' and
'run' commands. The specification is saved as JSON in the runtime
directories at /run/toolbox or $XDG_RUNTIME_DIR/toolbox to make it
available to the Toolbx container's entry point. The environment
variables in the specification are directly passed to 'podman exec',
while the hooks and mounts are handled by the entry point.
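
As a rough illustration of the flow above, the following Go sketch models a small subset of the CDI JSON schema, serializes it, and turns its environment variables into 'podman exec' options. The type names, the envArgs helper and the sample values are hypothetical and are not taken from toolbox(1); the real CDI schema has more fields, such as the list of devices.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hook, Mount, ContainerEdits and Spec mirror only a subset of the
// CDI specification's JSON schema. They are illustrative types, not
// the ones used by toolbox(1).
type Hook struct {
	HookName string   `json:"hookName"`
	Path     string   `json:"path"`
	Args     []string `json:"args,omitempty"`
}

type Mount struct {
	HostPath      string   `json:"hostPath"`
	ContainerPath string   `json:"containerPath"`
	Options       []string `json:"options,omitempty"`
}

type ContainerEdits struct {
	Env    []string `json:"env,omitempty"`
	Hooks  []Hook   `json:"hooks,omitempty"`
	Mounts []Mount  `json:"mounts,omitempty"`
}

type Spec struct {
	Version        string         `json:"cdiVersion"`
	Kind           string         `json:"kind"`
	ContainerEdits ContainerEdits `json:"containerEdits"`
}

// envArgs converts the KEY=VALUE entries of the specification into
// '--env KEY=VALUE' options for 'podman exec'. Hooks and mounts are
// deliberately left out, because the entry point handles those.
func envArgs(spec Spec) []string {
	var args []string
	for _, env := range spec.ContainerEdits.Env {
		args = append(args, "--env", env)
	}
	return args
}

func main() {
	// Sample values, chosen only for illustration.
	spec := Spec{
		Version: "0.5.0",
		Kind:    "nvidia.com/gpu",
		ContainerEdits: ContainerEdits{
			Env: []string{"NVIDIA_VISIBLE_DEVICES=void"},
			Mounts: []Mount{{
				HostPath:      "/usr/lib64/libcuda.so.1",
				ContainerPath: "/usr/lib64/libcuda.so.1",
			}},
		},
	}

	data, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		panic(err)
	}

	fmt.Println(string(data))
	fmt.Println(envArgs(spec))
}
```
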
Toolbx containers already have access to all the devices in the host
operating system's /dev, and containers share the kernel space driver
with the host. So, this is only about making the user space driver
available to the container. It's done by bind mounting the files
mentioned in the generated CDI specification from the host to the
container, and then updating the container's dynamic linker cache.
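
The bind mount and linker cache steps could be planned roughly as in the Go sketch below. It assumes that the host's file system is visible inside the container under a prefix such as /run/host, and that the work is done with the mount(8) and ldconfig(8) commands. Both are assumptions made for illustration, and planCommands is a hypothetical helper that only lists the commands without running them.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Mount is an illustrative pair of paths taken from a CDI
// specification; it is not a type from toolbox(1).
type Mount struct {
	HostPath      string
	ContainerPath string
}

// planCommands returns the commands that would bind mount every file
// from the host into the container, followed by a final ldconfig to
// refresh the dynamic linker cache. hostRoot is the assumed location
// of the host's file system inside the container.
func planCommands(hostRoot string, mounts []Mount) [][]string {
	var commands [][]string
	for _, mount := range mounts {
		source := filepath.Join(hostRoot, mount.HostPath)
		commands = append(commands,
			[]string{"mount", "--bind", source, mount.ContainerPath})
	}

	// The cache is updated once, after all the libraries are in place.
	commands = append(commands, []string{"ldconfig"})
	return commands
}

func main() {
	mounts := []Mount{
		{HostPath: "/usr/lib64/libcuda.so.1", ContainerPath: "/usr/lib64/libcuda.so.1"},
	}

	for _, command := range planCommands("/run/host", mounts) {
		fmt.Println(command)
	}
}
```
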
This neither depends on 'nvidia-ctk cdi generate' to generate the
Container Device Interface specification nor on 'podman create --device'
to consume it.
The main problem with nvidia-ctk and 'podman create' is that the
specification must be saved in /etc/cdi or /var/run/cdi, both of which
require root access, for it to be visible to 'podman create --device'.
Toolbx containers are often used rootless, so requiring root privileges
for hardware support, something that's not necessary on the host, will
be a problem.
Secondly, updating the toolbox(1) binary won't let existing containers
use the proprietary NVIDIA driver, because 'podman create' only affects
new containers.
Therefore, toolbox(1) uses the Go APIs used by 'nvidia-ctk cdi generate'
and 'podman create --device' to generate, save, load and apply the CDI
specification itself. This removes the need for root privileges due to
/etc/cdi or /var/run/cdi, and makes the driver available to existing
containers.
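
To show why no root privileges are needed, here is a minimal Go sketch of saving and loading a specification in a per-user runtime directory. The mapping of root to /run/toolbox and of other users to $XDG_RUNTIME_DIR/toolbox is an assumption based on the paths named earlier, and runtimeDirectory, saveSpec and loadSpec are hypothetical helpers, not the functions in toolbox(1). The demonstration writes to a temporary directory so that it can run anywhere.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runtimeDirectory picks a directory that the given user can write
// to without extra privileges. The root versus rootless split is an
// assumption made for this sketch.
func runtimeDirectory(uid int, xdgRuntimeDir string) string {
	if uid == 0 {
		return "/run/toolbox"
	}
	return filepath.Join(xdgRuntimeDir, "toolbox")
}

// saveSpec writes the serialized specification below dir and returns
// the path of the file, creating dir if it is missing.
func saveSpec(dir, name string, data []byte) (string, error) {
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", err
	}

	path := filepath.Join(dir, name)
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return "", err
	}
	return path, nil
}

// loadSpec reads back a specification saved by saveSpec.
func loadSpec(path string) ([]byte, error) {
	return os.ReadFile(path)
}

func main() {
	fmt.Println(runtimeDirectory(os.Getuid(), os.Getenv("XDG_RUNTIME_DIR")))

	// Use a temporary directory in place of the real runtime directory.
	base, err := os.MkdirTemp("", "cdi-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(base)

	dir := runtimeDirectory(1000, base)
	path, err := saveSpec(dir, "cdi-nvidia.json", []byte(`{"kind":"nvidia.com/gpu"}`))
	if err != nil {
		panic(err)
	}

	data, err := loadSpec(path)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```
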
Until Bats 1.10.0, 'run --keep-empty-lines' had a bug where it counted
the trailing newline on the last line as a separate line [3]. However,
Bats 1.10.0 is only available in Fedora >= 39 and is absent from Fedora
38.
Based on an idea from Ievgen Popovych.
[1] https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/
    https://github.com/NVIDIA/nvidia-container-toolkit
[2] https://github.com/cncf-tags/container-device-interface
[3] Bats commit 6648e2143bffb933
    bats-core/bats-core@6648e2143bffb933
    bats-core/bats-core#708

#116