10 Mar 15:44

nidode

6fed1ae

Stabilized GPU Lanczos, BLACS Context Fixes

Latest

This minor release focuses on stabilizing GPU/NCCL Lanczos and QR workflows, tightening MPI/BLACS integration, and strengthening tests and build configuration.

Highlights

GPU/NCCL performance & stability: More GPU-resident Lanczos with warm-up phases for Lanczos/QR, fused kernels, async host copies, and better CUDA error reporting.
MPI/BLACS correctness: BLACS contexts are now correctly bound to the user’s MPI communicator, fixing issues when using sub-communicators.
Robust testing & CI: Broader unit test coverage (including Fortran interfaces), more complete test runs, and stabilized CI across configurations.
Better configuration & packaging: Added pkg-config support and chase_config.h.in for easier downstream integration.

New & Improved

Implemented fully GPU-resident Lanczos with fused kernels and warm-up phases for Lanczos and QR when using NCCL.
Cleaned up GPU kernels and enabled async host copies to reduce overhead.
Improved examples and interfaces, including updates to 1_hello_world.cpp and new/updated Fortran interface tests.
Added pkg-config support and chase_config.h.in to streamline configuration and discovery of ChASE from external projects.
Introduced chase_config infrastructure for more flexible build-time configuration.

Bug Fixes

Fixed a critical bug by binding BLACS contexts to the user’s MPI communicator, ensuring correct behavior with sub-communicators.
Resolved multiple issues in GPU-resident Lanczos, including stream handling in chase_gpu and bugs in Lanczos with NCCL.
Fixed a typing issue for single-precision eigenvalue vectors in the Fortran interface.
Fixed build problems for the GPU version when NCCL is not present, allowing clean builds with or without NCCL.
Added explicit CUDA kernel error retrieval to aid debugging and robustness.

Testing & CI

Added full runs of all build configurations to the unit tests, including GPU and NCCL combinations.
Expanded and stabilized Lanczos unit tests and related test files.


22 Jan 15:31

nidode

v1.7.0

95c4def

Official release: ChASE is now extended to tackle pseudo-Hermitian eigenproblems

In this release:

New solver for pseudo-Hermitian definite eigenvalue problems. This includes:
- Introduction of the PseudoHermitianMatrix class
- Modified algorithm for the Chebyshev filter
- Modified algorithm for the QR factorization
- New oblique Rayleigh-Ritz projection
- Parallel distribution and implementations for distributed CPU and GPU
Improved and expanded documentation
Updated README file
Improved and updated C and Fortran interfaces for pseudo-Hermitian problems
Resolved bugs in printing parameters
Updated and improved CI pipeline
Updated performance counters
Improved statistics on early locking vectors
Added mixed precision support


19 Dec 19:26

nidode

v1.7.0-rc1

9afa354

Major functionality extension: added solver for pseudo-Hermitian eigenproblems

Christmas release!

ChASE can now tackle pseudo-Hermitian problems as they arise, for instance, in the Bethe-Salpeter equation. The solver has been extended by generalizing the Chebyshev filter and designing a new oblique method for the Rayleigh-Ritz projection. The algorithmic improvements are also part of a paper that will soon be submitted for publication. This is a release candidate.
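To make the term concrete, here is a minimal sketch of what "pseudo-Hermitian" means for a Bethe-Salpeter-type matrix. It assumes the block structure commonly used in the literature, H = [[A, B], [-conj(B), -conj(A)]] with A Hermitian and B complex symmetric, and checks that H itself is not Hermitian while S H is, where S = diag(I, -I). All function names are hypothetical helpers for illustration only; this is not the API of the PseudoHermitianMatrix class.

```python
def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def conj_transpose(X):
    """Return the conjugate transpose of a dense matrix."""
    return [[X[j][i].conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]


def is_hermitian(X, tol=1e-12):
    """Check X == X^H entrywise up to an absolute tolerance."""
    XH = conj_transpose(X)
    n = len(X)
    return all(abs(X[i][j] - XH[i][j]) <= tol
               for i in range(n) for j in range(n))


def bse_like_matrix(A, B):
    """Assemble H = [[A, B], [-conj(B), -conj(A)]] (assumed block structure)."""
    n = len(A)
    top = [A[i] + B[i] for i in range(n)]
    bottom = [[-B[i][j].conjugate() for j in range(n)] +
              [-A[i][j].conjugate() for j in range(n)] for i in range(n)]
    return top + bottom


def signature(n):
    """Return S = diag(I_n, -I_n)."""
    return [[(1 if i < n else -1) if i == j else 0 for j in range(2 * n)]
            for i in range(2 * n)]


# Small illustrative blocks: A is Hermitian, B is complex symmetric.
A = [[2 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]
B = [[0.5 + 0.2j, 0.1j], [0.1j, -0.3 + 0j]]

H = bse_like_matrix(A, B)
S = signature(2)

print("H Hermitian:  ", is_hermitian(H))
print("S*H Hermitian:", is_hermitian(matmul(S, H)))
```

Because S H is Hermitian rather than H, the eigenvectors of H are not orthogonal in the usual inner product, which is one way to see why an oblique rather than an orthogonal Rayleigh-Ritz projection is a natural fit.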


28 Jan 15:01

nidode

v1.6.0

97e69f0

ChASE code structure fully revised

ChASE has undergone a complex restructuring process that included major changes in the hierarchical folder structure, a clearer division between parallel implementations, and a new Test-Driven Development approach that introduced unit testing for many of its routines. In particular:

A clear division between the ChASE algorithm and its implementations has been maintained
The name of the base virtual class has changed from Chase to ChaseBase
All implementations for sequential and parallel CPU/GPU architectures have been clearly separated and based on the new ChaseBase class.
All reliance on external numerical libraries (BLAS, LAPACK, etc.) has been separated and templated
Utilities for the initialization of pure MPI, CUDA-aware MPI and NCCL grids have been redefined and clearly separated
Implementations of the kernels have been grouped in a linalg folder which contains
- matrix classes for shared-memory CPU and GPU architectures
- matrix classes for distinct distribution of matrices and vectors
- shared memory kernels
- distributed CPU kernels
- single GPU kernels
- distributed GPU kernels
Functionalities for mixed precision have been added
Unit testing has been introduced for
- the grid functionalities
- matrix type functionalities
- all kernels for the separate cases (CPU, GPU, distributed CPU, etc.)
Examples have been generalized and redistribution of matrices has been abstracted
A template for Collaboration Agreement (CLA) has been added


27 Sep 14:29

nidode

v1.5.0

025d942

Added CI pipeline for automatic building and testing

Created a CI pipeline and included unit and integration testing for the QR decomposition.


08 Dec 14:55

nidode

v1.4.1

f95fd44

Bug fix: GPU-timing synchronization

A problem was observed with different NVTX ranges on CPU and GPU. The problem has been solved by explicitly synchronizing the CPU with the GPU.


07 Aug 11:52

nidode

v1.4.0

58dce4e

ChASE v1.4.0. Major release.

Introduced a new distributed GPU build of ChASE entirely based on the NVIDIA NCCL library, which avoids explicit data movement between host and device memory and leads to much faster collective communications among the involved GPUs. This new release achieves a speedup of between 1.5x and 3x with respect to the traditional distributed multi-GPU build. ChASE can now be compiled and executed with the following distinct parallel configurations:

Distributed CPU only
Distributed multi-GPUs (traditionally based on host-device communication standards)
Distributed multi-GPUs (using NVIDIA NCCL library)


05 Apr 13:08

nidode

v1.3.1

2f0babf

ChASE v1.3.1: minor release

Updated the estimated bound for the condition number of the matrix of filtered vectors V. This estimate bounds the actual condition number of V from above, allowing the Communication-Avoiding QR-decomposition (CAQR) variant to be selected dynamically within the ChASE library at run time.
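The idea of choosing a QR variant from a condition number bound can be sketched as follows. This is an illustration under stated assumptions, not ChASE's actual selection logic: the function name, the variant names, and the `single_pass_limit` threshold are hypothetical. The only threshold taken from the general literature is that a two-pass CholeskyQR is considered reliable roughly while the condition number stays below u^(-1/2), where u is the unit roundoff.

```python
import sys


def select_qr_variant(cond_upper_bound,
                      unit_roundoff=sys.float_info.epsilon / 2,
                      single_pass_limit=20.0):
    """Pick a QR variant from an upper bound on cond(V).

    Hypothetical sketch: `single_pass_limit` is an illustrative value,
    and the returned names are labels, not ChASE identifiers.
    """
    if cond_upper_bound < single_pass_limit:
        # Very well conditioned: one Cholesky-based pass is assumed enough.
        return "CholeskyQR"
    if cond_upper_bound < unit_roundoff ** -0.5:
        # Moderately conditioned: repeat the pass to restore orthogonality.
        return "CholeskyQR2"
    # Beyond u^(-1/2) the Gram matrix may be numerically indefinite,
    # so a more robust (for example shifted) variant is needed.
    return "shifted CholeskyQR"


for bound in (5.0, 1e6, 1e10):
    print(bound, "->", select_qr_variant(bound))
```

Since the estimate is an upper bound on the true condition number, a selection of this kind errs on the side of the more stable variant: it may do extra work, but it does not pick a cheaper variant for a matrix that is worse conditioned than estimated.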


10 Mar 14:40

nidode

v1.3.0

16deae1

ChASE v1.3.0. Major release.

This release features a number of changes in the parallel implementation and the algorithm.

The QR factorization, which was previously done redundantly on each MPI process, is now parallelized on a 1D sub-grid of the 2D MPI Cartesian grid.
As a consequence of the additional parallelization, the number and structure of the workspace buffers have changed, greatly diminishing the memory footprint of the entire library
The use of the postApplication function has been substituted, with the result that some of the communication is now hidden behind computation during the execution of the Rayleigh-Ritz kernel and the Residual kernel
The parallel HouseholderQR algorithm has been substituted with the CholeskyQR algorithm (and its more stable variants). A mechanism to avoid failure of this algorithm has been introduced based on numerical analysis results.
A new parallel random generator has been added to reduce the time spent initializing the computation, especially for large scale problems.
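For readers unfamiliar with CholeskyQR, the following is a minimal serial sketch of the textbook algorithm and its two-pass variant (CholeskyQR2) for a real tall matrix. It is pure Python for illustration and does not reflect ChASE's distributed implementation; all function names are hypothetical. In a distributed setting, the Gram matrix would typically be formed from local products followed by a reduction, which is what makes the method communication-friendly.

```python
import math


def gram(V):
    """Return G = V^T V for a real tall matrix V stored as a list of rows."""
    n = len(V[0])
    return [[sum(row[i] * row[j] for row in V) for j in range(n)]
            for i in range(n)]


def cholesky_upper(G):
    """Return upper-triangular R with R^T R = G.

    Raises ValueError if G is not numerically positive definite, which is
    how CholeskyQR fails on ill-conditioned input.
    """
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = G[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if d <= 0.0:
            raise ValueError("Gram matrix is not numerically positive definite")
        R[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            s = sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = (G[i][j] - s) / R[i][i]
    return R


def solve_right_upper(V, R):
    """Return Q = V R^{-1} by substitution on each row of V."""
    n = len(R)
    Q = []
    for row in V:
        q = [0.0] * n
        for j in range(n):
            q[j] = (row[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j]
        Q.append(q)
    return Q


def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def cholesky_qr(V):
    """One pass: V = Q R with R from the Cholesky factor of V^T V."""
    R = cholesky_upper(gram(V))
    return solve_right_upper(V, R), R


def cholesky_qr2(V):
    """Two passes: re-orthogonalize Q and accumulate R = R2 R1."""
    Q1, R1 = cholesky_qr(V)
    Q, R2 = cholesky_qr(Q1)
    return Q, matmul(R2, R1)


V = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
Q, R = cholesky_qr2(V)
G = gram(Q)
err = max(abs(G[i][j] - (1.0 if i == j else 0.0))
          for i in range(2) for j in range(2))
print("max deviation of Q^T Q from identity:", err)
```

The `ValueError` branch marks the failure mode that a safeguard has to avoid: when V is too ill-conditioned, the computed Gram matrix stops being positive definite and the Cholesky step breaks down.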


23 Jan 14:27

nidode

v1.2.1

a307474

ChASE is now integrated into the ELSI library.

In this release:

The C and Fortran interfaces have been improved
The dependency on the NVTX tool has been removed
The ELSI interface has been included


Releases: ChASE-library/ChASE
