This minor release focuses on stabilizing GPU/NCCL Lanczos and QR workflows, tightening MPI/BLACS integration, and strengthening tests and build configuration.
Highlights
GPU/NCCL performance & stability: Fully GPU-resident Lanczos, warm-up phases for Lanczos and QR, fused kernels, asynchronous host copies, and better CUDA error reporting.
MPI/BLACS correctness: BLACS contexts are now correctly bound to the user’s MPI communicator, fixing issues when using sub-communicators.
Robust testing & CI: Broader unit test coverage (including the Fortran interfaces), full test runs for all builds, and stabilized CI across configurations.
Better configuration & packaging: Added pkg-config support and chase_config.h.in for easier downstream integration.
New & Improved
Implemented fully GPU-resident Lanczos with fused kernels and warm-up phases for Lanczos and QR when using NCCL.
Cleaned up GPU kernels and enabled async host copies to reduce overhead.
Improved examples and interfaces, including updates to 1_hello_world.cpp and new/updated Fortran interface tests.
Added pkg-config support and a chase_config.h.in template. Together they form the new chase_config infrastructure, which allows more flexible build-time configuration and makes ChASE easier to discover from external projects.
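As a rough illustration of how a downstream project might consume a header generated from a .h.in template, the sketch below selects a code path with a compile-time feature macro. It assumes the template is expanded by a build-system step such as CMake's configure_file. The macro name EXAMPLE_HAS_NCCL and the function communication_backend are hypothetical and are not taken from ChASE's actual generated header.

```cpp
#include <string>

// In a real project the macro would come from the generated header, e.g.
//   #include "chase_config.h"
// The macro name used here is hypothetical; consult the generated header
// for the names ChASE actually defines.
// #define EXAMPLE_HAS_NCCL

// Report which communication path this translation unit was compiled for.
inline std::string communication_backend() {
#ifdef EXAMPLE_HAS_NCCL
    return "nccl";  // NCCL-specific code would be compiled only in this branch
#else
    return "mpi";   // fallback when the optional dependency was not configured
#endif
}
```

Guarding optional dependencies behind a macro like this is one common way to let the same sources build whether or not the dependency was found at configure time.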
Bug Fixes
Fixed a critical bug: BLACS contexts are now bound to the user’s MPI communicator, which ensures correct behavior with sub-communicators.
Resolved multiple issues in GPU-resident Lanczos, including stream handling in chase_gpu and bugs in Lanczos with NCCL.
Fixed a typing issue for single-precision eigenvalue vectors in the Fortran interface.
Fixed build problems for the GPU version when NCCL is not present, allowing clean builds with or without NCCL.
Added explicit CUDA kernel error retrieval to aid debugging and improve robustness.
Testing & CI
Added full runs of all builds, including GPU and NCCL combinations, to the unit tests.
Expanded and stabilized Lanczos unit tests and related test files.