The Linux TSAN CI configuration is consistently timing out in the IREE async io_uring CTS core test on the managed ROCm Linux runner.
Failing run:
https://github.com/ROCm/hrx-system/actions/runs/26672062101?pr=18
Failing job:
Linux / CMake / CI (TSAN, tsan, true, true, false) / TSAN
Runner/job details from the log:
- Runner: aws-linux-scale-rocm-prod-vzddl-runner-9k42x
- Workflow ref: refs/pull/18/merge
- Merge commit: bcd69bc
- Config:
  - HRX_SANITIZER=tsan
  - HRX_ASSERTIONS=true
  - HRX_BUILD_TYPE=RelWithDebInfo
  - HRX_TEST_GPU=false
  - HRX_PACKAGE=false
- Test command:
    ctest --test-dir /__w/hrx-system/hrx-system/build/linux/install/composed/share/hrx-system/tests \
      --output-on-failure \
      --parallel 48
Failure:
Total Test time (real) = 60.06 sec
The following tests FAILED:
2 - iree/async/platform/io_uring/cts/core_tests (Timeout)
The timeout occurs inside:
CTS/TsanBridgeTest.MultiThreadReuse/io_uring
The last emitted test output before the 60s CTest timeout is:
[ RUN ] CTS/TsanBridgeTest.SingleThreadReuse/io_uring_minimal
[ OK ] CTS/TsanBridgeTest.SingleThreadReuse/io_uring_minimal (0 ms)
[ RUN ] CTS/TsanBridgeTest.MultiThreadReuse/io_uring
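To see where MultiThreadReuse/io_uring is stuck when the hang does reproduce, one option is to run that case on its own and dump every thread's stack before killing it. This is a sketch that assumes core_tests is a googletest binary (so `--gtest_filter` applies), that gdb is installed, and that the runner permits ptrace attach (for example running as root or with CAP_SYS_PTRACE). `CORE_TESTS_BIN`, `DEADLINE_SECS`, and `dump_on_hang` are hypothetical names, and the deadline is kept under the 60 s CTest timeout.

```shell
#!/usr/bin/env bash
# Sketch: run only the hanging case; if it outlives a deadline, dump all thread
# stacks with gdb, then kill it. CORE_TESTS_BIN and DEADLINE_SECS are
# hypothetical; assumes a googletest binary and that ptrace attach is allowed.
set -uo pipefail

CORE_TESTS_BIN="${CORE_TESTS_BIN:-./core_tests}"
DEADLINE_SECS="${DEADLINE_SECS:-50}"
FILTER='CTS/TsanBridgeTest.MultiThreadReuse/io_uring'

dump_on_hang() {
  "$CORE_TESTS_BIN" --gtest_filter="$FILTER" &
  local pid=$!
  local i
  for ((i = 0; i < DEADLINE_SECS; i++)); do
    # Process gone: collect and return its exit status.
    kill -0 "$pid" 2>/dev/null || { wait "$pid"; return; }
    sleep 1
  done
  echo "still running after ${DEADLINE_SECS}s; dumping stacks of pid $pid" >&2
  gdb -batch -p "$pid" -ex 'thread apply all bt'
  kill -KILL "$pid"
  wait "$pid" 2>/dev/null
  # Same status GNU timeout(1) uses when a deadline expires.
  return 124
}
```

Usage: set `CORE_TESTS_BIN` to the installed core_tests binary and call `dump_on_hang`; a return status of 124 means the deadline expired and stacks were printed.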
Expected behavior:
iree/async/platform/io_uring/cts/core_tests should complete under TSAN, or the specific TSAN-incompatible io_uring case should be disabled with an issue reference.
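If the case has to be disabled before the root cause is known, one CI-side stopgap is to exclude only that googletest case while the rest of core_tests keeps running. This is a sketch, assuming core_tests is a googletest binary that reads the `GTEST_FILTER` environment variable and that CTest passes the environment through to the test process; `TRACKING_ISSUE` is a hypothetical placeholder for the issue reference.

```shell
#!/usr/bin/env bash
# Sketch: exclude only the hanging TSAN case from the CI test run.
# Assumes core_tests is a googletest binary that reads GTEST_FILTER from the
# environment; TRACKING_ISSUE is a hypothetical placeholder.
set -euo pipefail

TRACKING_ISSUE="${TRACKING_ISSUE:-<tracking issue URL>}"
TEST_DIR="${TEST_DIR:-/__w/hrx-system/hrx-system/build/linux/install/composed/share/hrx-system/tests}"

# A filter containing only a negative pattern means "run everything except this".
export GTEST_FILTER='-CTS/TsanBridgeTest.MultiThreadReuse/io_uring'
echo "TSAN: skipping ${GTEST_FILTER#-} (tracking: ${TRACKING_ISSUE})"

if [[ -d "$TEST_DIR" ]]; then
  # Same invocation as the CI job; every test process inherits the filter.
  ctest --test-dir "$TEST_DIR" --output-on-failure --parallel 48
else
  echo "test directory not found, skipping ctest: $TEST_DIR" >&2
fi
```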
Local investigation:
- Built the same TSAN configuration locally using ROCm /srv/vm-shared/shared/rocm-7.14.0a20260527.
- Local host: Fedora 43, Linux 6.19.6-200.fc43.x86_64, 192 logical CPUs.
- A full local TSAN run of the installed tests with --parallel 96 did not reproduce the timeout; io_uring/cts/core_tests completed.
- Repeated isolated local runs of iree/async/platform/io_uring/cts/core_tests also completed.
- Repeated local runs of the whole iree/async/platform/io_uring group completed.
- Local TSAN did expose unrelated ROCr/HSA TSAN reports in AMDGPU tests on the GPU host, but those are distinct from this CI timeout.
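The repeated isolated runs can be scripted so that an intermittent hang is caught by repetition rather than a single pass. A minimal sketch, assuming CMake 3.17 or newer (for `--repeat until-fail`) and a local build tree at the relative `TEST_DIR` shown; `ITERATIONS` is an arbitrary illustrative count.

```shell
#!/usr/bin/env bash
# Sketch: rerun only the io_uring CTS core_tests under CTest until it fails or
# ITERATIONS runs have passed. TEST_DIR and ITERATIONS are hypothetical
# defaults; --repeat until-fail needs CMake >= 3.17.
set -euo pipefail

TEST_DIR="${TEST_DIR:-build/linux/install/composed/share/hrx-system/tests}"
ITERATIONS="${ITERATIONS:-50}"

ctest_args=(
  --test-dir "$TEST_DIR"
  --output-on-failure
  --tests-regex '^iree/async/platform/io_uring/cts/core_tests$'
  --repeat "until-fail:${ITERATIONS}"
)

if [[ -d "$TEST_DIR" ]]; then
  ctest "${ctest_args[@]}"
else
  # Without a build tree, just show the command that would run.
  echo "dry run: ctest ${ctest_args[*]}"
fi
```

An isolated loop puts far less load on the host than the CI job's `--parallel 48` run, so it may still miss a hang that depends on contention.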
Current hypothesis:
This is likely a Linux kernel or runtime interaction specific to the current ROCm CI runner environment, affecting io_uring under TSAN. The hang is confined to the io_uring parameterization of CTS/TsanBridgeTest.MultiThreadReuse; the TSAN suite as a whole is not hanging.
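To check this hypothesis, the same few host facts can be captured on the local Fedora host and on the CI runner and compared side by side. A sketch only: the choice of facts (kernel version, usable CPU count, the `io_uring_disabled` sysctl, cgroup CPU quota, seccomp mode) is an assumption about what could plausibly differ, not a finding, and `print_runner_env` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print host facts that may differ between the local host and the CI
# runner. Which facts matter is an assumption, not a diagnosis.
set -uo pipefail

print_runner_env() {
  echo "kernel: $(uname -r)"
  echo "usable cpus: $(nproc)"
  # Present on Linux >= 6.6: 0 = allowed, 1 = restricted, 2 = disabled.
  if [[ -r /proc/sys/kernel/io_uring_disabled ]]; then
    echo "io_uring_disabled: $(cat /proc/sys/kernel/io_uring_disabled)"
  else
    echo "io_uring_disabled: (sysctl not present)"
  fi
  # cgroup v2 CPU quota: a container can see many CPUs yet be throttled.
  if [[ -r /sys/fs/cgroup/cpu.max ]]; then
    echo "cgroup cpu.max: $(cat /sys/fs/cgroup/cpu.max)"
  fi
  # Seccomp mode inherited from this shell (0 = none, 2 = filter).
  echo "seccomp: $(awk '/^Seccomp:/ {print $2}' /proc/self/status)"
}

print_runner_env
```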