
Use system's default rocgdb instead of AOMP's #853

Open
saiislam wants to merge 1 commit into aomp-dev from saiislam-clang-325070

@saiislam
Member

@saiislam saiislam commented Mar 6, 2024

rocgdb requires libpython.so which is more likely to be found by the system's default rocgdb.

The one in AOMP/bin/rocgdb complains about missing libpython.so file.

@jplehr
Contributor

jplehr commented Mar 6, 2024

I don't think we want to test the system's rocgdb (by accident or on purpose).

@ronlieb
Contributor

ronlieb commented Mar 7, 2024

I also wonder why we don’t want to test the rocgdb we built and packed?

@dpalermo
Contributor

dpalermo commented Mar 7, 2024

I am seeing a different complaint instead of a missing libpython:

[r6 ~]$ /COD/LATEST/aomp/bin/rocgdb
amd-dbgapi library version mismatch, got 0.70.1, need 0.71+

Seems to have started on:

[r6 ~]$ /COD/2023-12-20/aomp/bin/rocgdb
amd-dbgapi library version mismatch, got 0.70.1, need 0.71+

Works before that date:

[r6 ~]$ /COD/2023-12-19/aomp/bin/rocgdb
GNU gdb (AOMP_18.0-1) 13.2
...
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) q
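
The manual date-by-date check above could be scripted roughly as follows. This is a minimal sketch: the function name `first_failing_build` is hypothetical, and it assumes that `rocgdb --version` exits with a non-zero status when the library version check fails, which the thread does not confirm.

```shell
# Hypothetical helper: print the first install root whose rocgdb fails to start.
# Arguments are AOMP install roots, in the order they should be checked.
first_failing_build() {
    for root in "$@"; do
        # Assumption: a broken rocgdb returns non-zero even for --version.
        if ! "$root/bin/rocgdb" --version >/dev/null 2>&1; then
            printf '%s\n' "$root"
            return 0
        fi
    done
    # Every build started successfully.
    return 1
}
```

Called as `first_failing_build /COD/2023-12-*/aomp`, the shell expands the glob in lexical order, which matches chronological order for these date-named directories.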

@dpalermo
Contributor

dpalermo commented Mar 7, 2024

We are actually staging python libs into COD to allow tools that are linked to specific versions of python shared objects to work. If you see a missing python lib error, paste in the exact error message and the system you saw it on.

@saiislam
Member Author

saiislam commented Mar 7, 2024

We are actually staging python libs into COD to allow tools that are linked to specific versions of python shared objects to work. If you see a missing python lib error, paste in the exact error message and the system you saw it on.

I am getting the same error regardless of whether I use the 2024-03-07 build or the 2023-12-04 build.

Note: results are on r11

/COD/2024-03-07/aomp/bin/clang++  -g -O0    -fopenmp --offload-arch=gfx90a  -D__OFFLOAD_ARCH_gfx90a__ clang-325070.cpp -o clang-325070
/COD/2024-03-07/aomp/bin/rocgdb -x doit.gdb --args ./clang-325070 0 2>&1 | tee run.log
/COD/2024-03-07/aomp/bin/rocgdb: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
make: *** [../Makefile.rules:71: run] Error 127
/COW/2023-12-04/aomp/bin/clang++  -g -O0    -fopenmp --offload-arch=gfx90a  -D__OFFLOAD_ARCH_gfx90a__ clang-325070.cpp -o clang-325070
/COW/2023-12-04/aomp/bin/rocgdb -x doit.gdb --args ./clang-325070 0 2>&1 | tee run.log
/COW/2023-12-04/aomp/bin/rocgdb: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
make: *** [../Makefile.rules:71: run] Error 127

@dpalermo
Contributor

dpalermo commented Mar 7, 2024

Looking back at the original thread on the 'CI OpenMP compiler daily triage group' Teams chat that motivated the staging fix, you will need to do the following on a 22.04 system:

[r11 ~]$ PYTHONHOME=/COD/LATEST/aomp/lib/python3.8 PYTHONPATH=/COD/LATEST/aomp/lib/python3.8  LD_LIBRARY_PATH=/COD/LATEST/aomp/lib /COD/LATEST/aomp/bin/rocgdb
GNU gdb (AOMP_19.0-0) 13.2
...
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb)

Note that setting the above env vars also fixes the 'amd-dbgapi library version mismatch, got 0.70.1, need 0.71+' error now seen on 20.04 systems.

Not a "fix" so much as a workaround for running rocgdb built on an older OS.

Not that it helps us in this situation, but the moral of the story is don't link your product with the python shared objects. There is just no backward compatibility guaranteed (at least not building on 20.04 and running on 22.04).
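
The workaround above can be wrapped in a small shell helper so the variables do not have to be retyped. This is only a sketch: the function name `run_with_staged_libs` is hypothetical, the `python3.8` subdirectory is taken from the commands above, and keeping any existing `LD_LIBRARY_PATH` after the staged directory is an added assumption, not something stated in the thread.

```shell
# Hypothetical helper: run a command with the staged AOMP libraries searched first.
# $1 is the AOMP install root; the remaining arguments are the command to run.
run_with_staged_libs() {
    aomp_root=$1
    shift
    # env sets the variables only for the child process, not the calling shell.
    # Assumption: any existing LD_LIBRARY_PATH is kept, after the staged lib dir.
    env PYTHONHOME="$aomp_root/lib/python3.8" \
        PYTHONPATH="$aomp_root/lib/python3.8" \
        LD_LIBRARY_PATH="$aomp_root/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        "$@"
}
```

For example, `run_with_staged_libs /COD/LATEST/aomp /COD/LATEST/aomp/bin/rocgdb` should be equivalent to the one-line command shown earlier when `LD_LIBRARY_PATH` is otherwise unset.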

@dpalermo
Contributor

dpalermo commented Mar 7, 2024

The 'amd-dbgapi library version mismatch, got 0.70.1, need 0.71+' error is even a problem on the same system where rocgdb was built. Without specifying LD_LIBRARY_PATH, it is picking up the library from the system /opt/rocm:

[r5 /COD/LATEST/aomp]$ ldd /COD/LATEST/aomp/bin/rocgdb | grep dbgapi
        librocm-dbgapi.so.0 => /opt/rocm-5.7.0/lib/librocm-dbgapi.so.0 (0x00007f8046edb000)

Gets the staged librocm-dbgapi.so.0 with the workaround:

[r5 /COD/LATEST/aomp]$ PYTHONHOME=/COD/LATEST/aomp/lib/python3.8 PYTHONPATH=/COD/LATEST/aomp/lib/python3.8  LD_LIBRARY_PATH=/COD/LATEST/aomp/lib ldd /COD/LATEST/aomp/bin/rocgdb | grep dbgapi
        librocm-dbgapi.so.0 => /COD/LATEST/aomp/lib/librocm-dbgapi.so.0 (0x00007f25342d1000)

This issue feels like a cmake bug in rocgdb, as it should try to pick up shared libraries relative to its installed location first.
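
The `ldd` comparison above could be turned into a scripted check along these lines. It is a sketch under assumptions: the function name `resolves_under_prefix` is hypothetical, and it only inspects the `name => path` lines that `ldd` prints, as in the output shown above.

```shell
# Hypothetical check: read `ldd` output on stdin and succeed only if the
# library named $2 resolves to a path under the install prefix $1.
resolves_under_prefix() {
    awk -v prefix="$1" -v lib="$2" '
        # ldd lines look like: libname => /resolved/path (0xaddress)
        $1 == lib && $2 == "=>" {
            found = 1
            if (index($3, prefix "/") == 1) ok = 1
        }
        END { exit((found && ok) ? 0 : 1) }
    '
}
```

A possible use is `ldd /COD/LATEST/aomp/bin/rocgdb | resolves_under_prefix /COD/LATEST/aomp librocm-dbgapi.so.0`, which would fail for the `/opt/rocm-5.7.0` resolution shown above and succeed for the staged one.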

