
Nvidia GPU support #1

Closed · aksiksi opened this issue Nov 19, 2023 · 10 comments

Labels: compose (Docker Compose), docker (Docker-specific), feature (New feature or request), podman (Podman-specific)

@aksiksi (Owner) commented Nov 19, 2023

Some random links:

@aksiksi added and then removed the feature (New feature or request) label Nov 19, 2023
@aksiksi (Owner, Author) commented Feb 24, 2024

With CDI support now added to NixOS (NixOS/nixpkgs#284507), GPU access should work in a container. See this thread for details: https://discourse.nixos.org/t/nvidia-gpu-support-in-podman-and-cdi-nvidia-ctk/36286.

Podman

Add the CDI device(s) to devices in your Compose file, like so:

jellyfin:
    image: lscr.io/linuxserver/jellyfin
    container_name: jellyfin
    security_opt:
      - label=disable
    devices:
      - nvidia.com/gpu=all
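
Note that this assumes the CDI spec for your GPU has already been generated on the host (usually under /var/run/cdi or /etc/cdi). On NixOS, the module added in the PR above can do this for you; a minimal sketch, assuming a recent nixpkgs where the option carries this name (older releases used virtualisation.containers.cdi.dynamic.nvidia.enable):

{
  # Generates the Nvidia CDI spec (nvidia.com/gpu=...) on the host.
  hardware.nvidia-container-toolkit.enable = true;
}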

Docker

  1. Enable experimental CDI support in the daemon:

{
  virtualisation.docker.daemon.settings = {
    features = { cdi = true; };
  };
}

  2. Pass in CDI devices to your service(s):

jellyfin:
    image: lscr.io/linuxserver/jellyfin
    container_name: jellyfin
    security_opt:
      - label=disable
    devices:
      - nvidia.com/gpu=all
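
To sanity-check the setup outside of Compose, you can list the CDI devices the host knows about and run a throwaway container against one; a rough sketch (nvidia-ctk ships with the NVIDIA Container Toolkit, and the CDI spec normally injects nvidia-smi from the host into the container):

# List the CDI device names generated on the host
nvidia-ctk cdi list

# Smoke test: Docker 25+ with the cdi feature flag accepts CDI names via --device
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi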

aksiksi closed this as completed Feb 24, 2024
@SpidFightFR commented
Any news on the Docker side? I added the CDI devices to my Docker Compose config: when I run docker-compose up directly, it uses my GPU, but when I use compose2nix, it falls back to the CPU.

@aksiksi (Owner, Author) commented Aug 31, 2024

I personally don’t use Docker. But I did some digging into Docker PRs and found that CDI support is still experimental: moby/moby#47087

To enable it, you’ll first need to set the CDI feature flag: https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices

In your NixOS config, try this:

virtualisation.docker.daemon.settings = {
  features = { cdi = true; };
};

And, of course, you will need to pass in devices in CDI format as part of your Compose config.
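
For reference, that Nix attribute set is serialized into the Docker daemon config, so the resulting /etc/docker/daemon.json should look roughly like this:

{
  "features": {
    "cdi": true
  }
}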

@SpidFightFR commented
> I personally don’t use Docker. But I did some digging into Docker PRs and found that CDI support is still experimental: moby/moby#47087
>
> To enable it, you’ll first need to set the CDI feature flag: https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices
>
> In your NixOS config, try this:
>
>     virtualisation.docker.daemon.settings = {
>       features = { cdi = true; };
>     };
>
> And, of course, you will need to pass in devices in CDI format as part of your Compose config.

I'm trying to test this using InvokeAI (that's the only GPU-based Docker tool I have right now).

The Compose part looks like this:

# Copyright (c) 2023 Eugene Brodsky https://github.com/ebr

x-invokeai: &invokeai
    image: "local/invokeai:latest"
    build:
      context: ..
      dockerfile: docker/Dockerfile

    # Create a .env file in the same directory as this docker-compose.yml file
    # and populate it with environment variables. See .env.sample
    env_file:
      - .env

    # variables without a default will automatically inherit from the host environment
    environment:
      # if set, CONTAINER_INVOKEAI_ROOT will override the Invoke runtime directory location *inside* the container
      - INVOKEAI_ROOT=${CONTAINER_INVOKEAI_ROOT:-/invokeai}
      - HF_HOME
    ports:
      - "${INVOKEAI_PORT:-9090}:${INVOKEAI_PORT:-9090}"
    volumes:
      - type: bind
        source: ${HOST_INVOKEAI_ROOT:-${INVOKEAI_ROOT:-~/invokeai}}
        target: ${CONTAINER_INVOKEAI_ROOT:-/invokeai}
        bind:
          create_host_path: true
      - ${HF_HOME:-~/.cache/huggingface}:${HF_HOME:-/invokeai/.cache/huggingface}
    tty: true
    stdin_open: true


services:
  invokeai-cuda:
    <<: *invokeai
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=all

However, the generated Nix config doesn't contain the devices part. I don't know if it was ignored because of an error on my side?

@aksiksi (Owner, Author) commented Aug 31, 2024

No error on your side: compose2nix does not yet support deploy.resources.reservations.devices. I can add support for that today, but only for CDI; it shouldn't be tricky. I'll work on a PR now.

Do you have the CDI feature enabled in your Docker config? If not, it's strange how the GPU is detected when running with Docker Compose directly.

Two other minor notes:

  • FYI, the Build spec (build) is not supported by compose2nix, so you'll need to build the container first. See: Support for Compose Build spec #4
  • tty and stdin_open are not supported - not sure how critical these are, though.

@SpidFightFR commented
> I can add support for that today, but only for CDI

Sounds great to me! Thank you for your answers.

> Do you have the CDI feature enabled in your Docker config? If not, it's strange how the GPU is detected when running with Docker Compose directly.

Nope, I only have this enabled: hardware.nvidia-container-toolkit.enable = true; on Docker 25. It's something that was recommended to me on the NixOS Discourse.

@aksiksi (Owner, Author) commented Aug 31, 2024

Ah gotcha, thanks for clarifying! It looks like the feature flag is set by the NixOS module here: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/nixos/modules/services/hardware/nvidia-container-toolkit/default.nix#L72

I'll also update the README with these steps for others who want to get CDI GPU support running in Docker.
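
For the curious, the linked module does roughly the equivalent of the following; this is a paraphrase of its behavior, not the exact nixpkgs source:

{ config, lib, ... }: {
  # Enable Docker's experimental CDI feature whenever Docker itself is enabled.
  virtualisation.docker.daemon.settings.features.cdi =
    lib.mkIf config.virtualisation.docker.enable true;
}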

@aksiksi (Owner, Author) commented Aug 31, 2024

In the meantime, can you please try passing your devices via the devices block, like this:

services:
  invokeai-cuda:
    <<: *invokeai
    restart: unless-stopped
    devices:
      - nvidia.com/gpu=all

The change I'm making will do exactly this, so you'll just have another way to write it in Compose.
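
For context, compose2nix turns each service into a virtualisation.oci-containers container, and a plain devices entry should come through as a --device flag on the generated container. A rough sketch of the relevant generated Nix (abridged; exact output may differ):

virtualisation.oci-containers.containers."invokeai-cuda" = {
  image = "local/invokeai:latest";
  # The Compose `devices` entry is passed straight through to the runtime:
  extraOptions = [
    "--device=nvidia.com/gpu=all"
  ];
};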

@SpidFightFR commented Aug 31, 2024

> In the meantime, can you please try passing your devices via the devices block, like this:
>
>     services:
>       invokeai-cuda:
>         <<: *invokeai
>         restart: unless-stopped
>         devices:
>           - nvidia.com/gpu=all
>
> The change I'm making will do exactly this, so you'll just have another way to write it in Compose.

Hey, I made the changes you suggested (thanks) and updated my flake to match your latest commit.
Docker Compose no longer works standalone; however, it works with compose2nix as a service, so I consider my problem solved. Thanks a lot! 👍 😄

Edit: Error log for docker compose in standalone:

$ docker compose up -d
[+] Running 0/1
 ⠙ Container docker-invokeai-cuda-1  Starting                                                                                                                                                                   0.2s 
Error response from daemon: error gathering device information while adding custom device "nvidia.com/gpu=all": no such file or directory

Edit 2: I guess I could make another service from the working standalone Docker Compose file, but honestly I don't have a use for that anymore...

@aksiksi (Owner, Author) commented Aug 31, 2024

Awesome! If you update to the latest compose2nix (I just merged the PR), you can go back to your original config and it should work with compose2nix as well :)

See step (2) in the readme section I just added: https://github.com/aksiksi/compose2nix?tab=readme-ov-file#nvidia-gpu-support

@aksiksi added the docker (Docker-specific), podman (Podman-specific), and compose (Docker Compose) labels Aug 31, 2024