Gpu detected in container created by podman run but not by podman compose #25196

Closed
stumpf84 opened this issue Feb 2, 2025 · 3 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
remote: Problem is in podman-remote

Comments

@stumpf84

stumpf84 commented Feb 2, 2025

Issue Description

I'm running Podman Desktop on Windows 11 23H2.
A container created with podman run detects my GPU correctly.
A container created with podman compose does not.

Steps to reproduce the issue

  1. run
    podman run -p 11434:11434 -v ollama_ollama_data:/root/.ollama --device nvidia.com/gpu=all --name ollama ollama/ollama:latest

  2. create a compose file

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama_compose
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    devices:
      - nvidia.com/gpu=all    
volumes:
  ollama_data:
  3. run
    podman compose up

Describe the results you received

Running the container using podman run shows
level=INFO source=types.go:131 msg="inference compute" id=GPU-2e3db6bb-6d29-dd8d-c4cb-2e891458ba6c library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4090" total="24.0 GiB" available="22.5 GiB"

Running the container using the compose file shows
level=INFO source=gpu.go:392 msg="no compatible GPUs were discovered"
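
For anyone comparing the two setups, here is a minimal sketch that classifies Ollama log output using only the two messages quoted above. The function name `gpu_status` and the regular expression are my own illustration, and other Ollama versions may word these messages differently, so treat it as a rough check rather than a reliable detector.

```python
import re

# Patterns are taken from the two log lines quoted in this issue.
# Assumption: other Ollama versions may phrase these messages differently.
GPU_FOUND = re.compile(r'msg="inference compute".*\blibrary=(\w+).*\bname="([^"]+)"')
NO_GPU = "no compatible GPUs were discovered"


def gpu_status(log_text: str) -> str:
    """Summarize GPU detection based on the first matching log line."""
    for line in log_text.splitlines():
        match = GPU_FOUND.search(line)
        if match:
            library, name = match.groups()
            return f"gpu: {name} ({library})"
        if NO_GPU in line:
            return "no gpu"
    # Neither message was seen, so nothing can be concluded.
    return "unknown"
```

The log text could come from `podman logs ollama` or `podman logs ollama_compose`, saved to a string and passed to `gpu_status`.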

Describe the results you expected

The GPU should be detected with both podman run and podman compose up.

podman info output

host:
  arch: amd64
  buildahVersion: 1.38.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 99.39
    systemPercent: 0.1
    userPercent: 0.51
  cpus: 16
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: container
    version: "40"
  eventLogger: journald
  freeLocks: 2041
  hostname: Ericsons-Desktop
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.167.4-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: journald
  memFree: 15965605888
  memTotal: 16725131264
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.13.1-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.1
    package: netavark-1.13.1-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.1
  ociRuntime:
    name: crun
    package: crun-1.19.1-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.19.1
      commit: 3e32a70c93f5aa5fea69b50256cca7fd4aa23c80
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20241211.g09478d5-1.fc40.x86_64
    version: |
      pasta 0^20241211.g09478d5-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: unix:///run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 4294967296
  swapTotal: 4294967296
  uptime: 0h 53m 44.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 0
    stopped: 4
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1081101176832
  graphRootUsed: 30164312064
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.3.1
  Built: 1732147200
  BuiltTime: Thu Nov 21 01:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.7
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

WSL-Version: 2.3.26.0
Kernelversion: 5.15.167.4-1
WSLg-Version: 1.0.65
MSRDC-Version: 1.2.5620
Direct3D-Version: 1.611.1-81528511
DXCore-Version: 10.0.26100.1-240331-1435.ge-release
Windows-Version: 10.0.22631.4830

Additional information

No response

stumpf84 added the kind/bug label Feb 2, 2025
github-actions bot added the remote label Feb 2, 2025
@giuseppe
Member

giuseppe commented Feb 3, 2025

could it be fixed with #25171?

Can you please try with the development version of Podman from the git main branch?

@danlydonut

I'm experiencing this same issue.

When using podman run the GPU is detected. When using podman compose for the same container, no luck.

> could it be fixed with #25171?
>
> Can you please try with the development version of Podman from the git main branch?

I successfully followed "Building the Podman client and client installer on Windows" and confirmed that podman.exe --version reports 5.5.0-dev. However, after running podman.exe machine init I end up with a podman machine that reports v5.4.0-rc3. From the logs, it looks like it is fetching the latest image, so I'm unsure why the resulting machine reports v5.4.0-rc3:

Looking up Podman Machine image at quay.io/podman/machine-os-wsl:5.5 to create VM

Further, after opening a shell in the podman machine, it looks like podman inside it is 5.3.2.

Is there something I'm missing? This is admittedly my first time building this project. Is there a separate way to build the podman machine, or to force the dev version?

Finally, I found this same issue reported on the NVIDIA developer forums, with a note indicating that podman-compose seemingly worked. I didn't validate that detail, and I don't know whether it matters given the difference between "docker-compose" and "docker compose", which I presume also applies to podman, but I figured it was worth sharing.
https://forums.developer.nvidia.com/t/podman-compose-not-working-with-gpu-support/292349

@Luap99
Member

Luap99 commented Feb 12, 2025

The podman machine WSL image uses an old build process that simply takes the stable Fedora version, so you will need to wait until podman v5.4.0 lands in Fedora stable and then until the image gets rebuilt. That might take a week or more.
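
Since the client and the podman inside the machine can report different versions, a small sketch like the following could help check whether the server side has reached the release mentioned here. The helper names and the default minimum of 5.4.0 are assumptions for illustration. The sketch also ignores suffixes such as -rc3 or -dev, so a release candidate is treated like the final release, which may not be accurate.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'v5.4.0-rc3' or '5.3.1' into (5, 4, 0) or (5, 3, 1).

    Any suffix after the first '-' is dropped (a simplification).
    """
    core = text.strip().lstrip("v").split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def server_is_new_enough(server_version: str, minimum: str = "5.4.0") -> bool:
    """Return True if the server version is at least `minimum`.

    The default minimum is taken from this thread and is an assumption,
    not a confirmed fix version.
    """
    return parse_version(server_version) >= parse_version(minimum)
```

With the versions reported in this issue, `server_is_new_enough("5.3.1")` and `server_is_new_enough("5.3.2")` are both False, which would be consistent with the behavior described. The version string to check is the one reported inside the machine, not the one from podman.exe on Windows.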

I did a quick test with

diff --git a/test/compose/cdi_device/docker-compose.yml b/test/compose/cdi_device/docker-compose.yml
index dfbeb2e906..69b3adc66e 100644
--- a/test/compose/cdi_device/docker-compose.yml
+++ b/test/compose/cdi_device/docker-compose.yml
@@ -6,10 +6,5 @@ services:
       - /dev:/dev-host
     security_opt:
       - label=disable
-    deploy:
-      resources:
-        reservations:
-          devices:
-          - driver: cdi
-            device_ids: ['vendor.com/device=myKmsg']
-            capabilities: []
+    devices:
+      - vendor.com/device=myKmsg

on main and that worked, so the CDI device can be listed as a normal device. That should just work once the server is updated to 5.4.0, I think.
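
To make the relationship between the two compose forms in the diff concrete, here is a rough sketch that rewrites a service definition, already loaded as a Python dict, from the deploy.resources.reservations.devices form with driver: cdi into the short devices: form. The function name is hypothetical and this is only a dict transformation mirroring the diff. It says nothing about how Podman itself handles either form.

```python
import copy


def cdi_reservations_to_devices(service: dict) -> dict:
    """Return a copy of a compose service with CDI reservations listed under `devices`."""
    result = copy.deepcopy(service)
    reservations = (
        result.get("deploy", {}).get("resources", {}).get("reservations", {})
    )
    entries = reservations.get("devices", [])
    cdi_ids = [
        device_id
        for entry in entries
        if entry.get("driver") == "cdi"
        for device_id in entry.get("device_ids", [])
    ]
    if not cdi_ids:
        return result  # nothing to convert

    # Keep reservations for other drivers untouched.
    remaining = [entry for entry in entries if entry.get("driver") != "cdi"]
    if remaining:
        reservations["devices"] = remaining
    else:
        # Drop the now-empty parent mappings, as the diff does.
        del reservations["devices"]
        resources = result["deploy"]["resources"]
        if not reservations:
            del resources["reservations"]
        if not resources:
            del result["deploy"]["resources"]
        if not result["deploy"]:
            del result["deploy"]

    result.setdefault("devices", []).extend(cdi_ids)
    return result
```

Applied to the service from the test file, the deploy block disappears and devices becomes ['vendor.com/device=myKmsg'], matching the "+" side of the diff. The input dict is left unmodified.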

Luap99 closed this as completed Feb 12, 2025