apptainer docker tensorflow container issue with 2.18 - skipping loading of GPU #217

ashep29 · 2025-01-20T01:59:28Z

I pulled the image with apptainer pull docker://tensorflow/tensorflow:latest-gpu, but TensorFlow 2.18 skips registering the GPU and prints this message:

W0000 00:00:1736383795.205652 747392 gpu_device.cc:2344] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at [https://www.tensorflow.org/install/gpu] for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

The TensorFlow 2.17 container worked fine, but the 2.18 container appears to want libcudnn.9.0 while the container only provides libcudnn.8.0. The system libcudnn is not mapped into the container, even though the system has both 8 and 9 installed. This looks like a bug in the container build.

Note: Installing 2.18 with the system Python (no containers) works fine, as the system has both cuDNN 8 and 9 installed.
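To confirm which cuDNN major versions the dynamic loader can actually find inside the container, a small probe along these lines may help. This is only a sketch: the helper name can_dlopen is made up for illustration, and the sonames libcudnn.so.8 and libcudnn.so.9 are assumed from the versions discussed in this thread.

```python
import ctypes


def can_dlopen(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be loaded.
        return False


if __name__ == "__main__":
    # Sonames assumed from this thread: cuDNN 8 and cuDNN 9.
    for soname in ("libcudnn.so.8", "libcudnn.so.9"):
        status = "found" if can_dlopen(soname) else "MISSING"
        print(f"{soname}: {status}")
```

Saved as, say, probe.py (a hypothetical filename), it could be run inside the image with something like apptainer exec --nv tensorflow_latest-gpu.sif python3 probe.py, assuming that is the name of the pulled SIF file. Running the same script on the host gives a direct comparison with the system libraries.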

ngaywood · 2025-01-22T01:04:11Z

TensorFlow 2.18 tries to open libcudnn.so.9, which is missing from the Docker container.

This has also been reported as "libcudnn.so.9 missing" in tensorflow/tensorflow#80538 (comment).

JoshCaughtFire · 2025-01-26T08:43:26Z

Afaik, based on https://www.tensorflow.org/install/source#gpu, 2.18 requires cuDNN 9.3, but the current build (https://github.com/tensorflow/build/blob/master/tensorflow_runtime_dockerfiles/gpu.packages.txt) installs 8.9.6.50. From my understanding, that's the version last used with 2.13, which might explain some of the stability issues I've seen when training on the Docker image for recent versions.

Building a Docker image from the official image with that library installed seems to resolve it:

FROM tensorflow/tensorflow:2.18.0-gpu

RUN apt update && apt install -y --no-install-recommends libcudnn9-cuda-12=9.3.0.75-1

Fixing the official image would be preferred, but this has worked on my project. I haven't looked closely enough at how the image is built to know whether it's as simple as updating gpu.packages.txt or whether the package list needs to change based on the version it's being built against. I think the image is built internally at Google, and I'm not sure about their setup.
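After rebuilding the image with cuDNN 9 installed, one way to check whether the workaround took effect is to ask TensorFlow which GPUs it registered. The following is a minimal sketch; the helper name visible_gpus is hypothetical, and it returns None when TensorFlow cannot be imported so the script also runs outside the container.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow registered, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow loaded but skipped registering GPU devices.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow loaded, but no GPU was registered")
    else:
        print("Registered GPUs:", ", ".join(gpus))
```

An empty result on the patched image would suggest that some other GPU library is still failing to load, in which case the warnings printed before the "Cannot dlopen some GPU libraries" message should name it.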
