[Offload] Failure to check validity of image for sm_120 architecture #148703

Open
@jafudev

After building an offloading-capable compiler with the recommended CMake cache file,

cd llvm-project
mkdir build
cd build
cmake ../llvm -G Ninja \
    -C ../offload/cmake/caches/Offload.cmake \
    -DCMAKE_BUILD_TYPE="Release" \
    -DCMAKE_INSTALL_PREFIX="/opt/llvm/llvm-project/install" \
    -DCMAKE_C_COMPILER="clang-19" \
    -DCMAKE_CXX_COMPILER="clang++-19" \
    -DLIBOMPTARGET_ENABLE_DEBUG=ON
ninja -j 24 install

running an executable that contains any offloading, with both OMP_TARGET_OFFLOAD=mandatory and LIBOMPTARGET_DEBUG=1 set in the environment of ./a.out, crashes with

omptarget --> Init offload library!
OMPT --> Entering connectLibrary
OMPT --> OMPT: Trying to load library libomp.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomp_connect
OMPT --> OMPT: Library connection handle = 0x70ef79559e60
OMPT --> Exiting connectLibrary
omptarget --> Loading RTLs...
omptarget --> RTLs loaded!
PluginInterface --> Failure to check validity of image 0x5651b5deed20: Invalid CUDA addressing modePluginInterface --> Failure to check validity of image 0x5651b5deed20: Invalid CUDA addressing modePluginInterface --> Failure to check validity of image 0x5651b5deed20: Invalid CUDA addressing modeomptarget --> No RTL found for image 0x00005651b5dea120!
omptarget --> Done registering entries!
omptarget --> Entering target region for device -1 with entry point 0x00005651b5dea047
omptarget --> Use default device id 0
omptarget --> Call to omp_get_num_devices returning 0
omptarget --> omp_get_num_devices() == 0 but offload is manadatory
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: No images found compatible with the installed hardware. Segmentation fault (core dumped)

on an RTX 5070 Ti (sm_120 architecture).
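For anyone trying to reproduce this, a minimal sketch of "an executable containing any offloading" could look like the following. The function name offload_sum is hypothetical and not taken from the original report; any target region should exercise the same image registration path.

```cpp
// Hypothetical reproducer (not from the original report): sums 0..n-1
// inside an OpenMP target region. With offloading enabled, the loop is
// compiled into a device image that the runtime must accept at startup.
// Without OpenMP support the pragma is ignored and the loop runs on the host.
int offload_sum(int n) {
  int sum = 0;
#pragma omp target teams distribute parallel for reduction(+ : sum) map(tofrom : sum)
  for (int i = 0; i < n; ++i)
    sum += i;
  return sum;
}
```

Calling offload_sum from main and compiling with something like clang++ -fopenmp --offload-arch=sm_120 (assuming the freshly built clang++ accepts that architecture) should be enough to hit the image validity check when run with OMP_TARGET_OFFLOAD=mandatory.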

In the related discussion on the LLVM Discourse (https://discourse.llvm.org/t/issues-compiling-with-offloading-support/87258/7), @jhuber6 suspects that a change in the CUDA ELF ABI is causing this issue. The relevant message is PluginInterface --> Failure to check validity of image 0x5651b5deed20: Invalid CUDA addressing mode, which is related to the ELF check at https://github.com/llvm/llvm-project/blob/main/offload/plugins-nextgen/common/src/Utils/ELF.cpp#L76.
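To illustrate why an ABI change could trip such a check, here is a rough sketch of reading e_machine and e_flags from an ELF64 header and testing one flag bit. It is not the code in ELF.cpp. The constants kEmCuda and kCuda64BitAddress are assumptions recalled from LLVM's ELF definitions and should be verified, and all function and type names are hypothetical. If a newer toolchain lays out e_flags differently, a fixed-bit test like this would reject an otherwise valid image.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed values (verify against LLVM's BinaryFormat/ELF.h before relying on them).
constexpr std::uint16_t kEmCuda = 190;             // e_machine for CUDA images
constexpr std::uint32_t kCuda64BitAddress = 0x400; // flag bit in the older e_flags layout

struct ElfSummary {
  std::uint16_t machine; // e_machine
  std::uint32_t flags;   // e_flags
};

// Read an n-byte little-endian integer (n <= 4) starting at offset off.
static std::uint32_t read_le(const std::vector<std::uint8_t> &buf,
                             std::size_t off, std::size_t n) {
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < n; ++i)
    v |= static_cast<std::uint32_t>(buf[off + i]) << (8 * i);
  return v;
}

// Extract e_machine and e_flags from a little-endian ELF64 header.
// Returns std::nullopt if the buffer is not such a header.
std::optional<ElfSummary> summarize_elf64(const std::vector<std::uint8_t> &buf) {
  if (buf.size() < 64) // an ELF64 header is 64 bytes long
    return std::nullopt;
  if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
    return std::nullopt;
  if (buf[4] != 2 || buf[5] != 1) // ELFCLASS64, ELFDATA2LSB
    return std::nullopt;
  ElfSummary s;
  s.machine = static_cast<std::uint16_t>(read_le(buf, 18, 2)); // e_machine at offset 18
  s.flags = read_le(buf, 48, 4);                               // e_flags at offset 48
  return s;
}

// Sketch of a fixed-bit addressing-mode test under the assumed older layout.
bool passes_addressing_check(const ElfSummary &s) {
  return s.machine == kEmCuda && (s.flags & kCuda64BitAddress) != 0;
}
```

The sketch needs C++17 for std::optional. Printing the flags value of an sm_120 image next to one built for an older architecture would show whether the bit the check expects has moved.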