Optional direct linkage to Intel MKL with fine-grained multithreading control #1

chillenb · 2024-10-08T02:28:11Z

Many PySCF users also use Intel MKL, since x86-64 is probably still the most common architecture on academic clusters. This PR lets PySCF use MKL-specific BLAS extensions and other convenience functions when it is linked against MKL.

As a result, MKL's fine-grained threading-control API becomes available to PySCF. MKL can then be switched to sequential mode whenever MKL, BLAS, or LAPACK functions are called inside an OpenMP parallel region, while remaining multithreaded outside such regions.
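A minimal sketch of that behavior, assuming the `PYSCF_USE_MKL` compile definition that this PR uses to guard MKL-specific code. `mkl_set_num_threads_local` is MKL's real per-thread setter and returns the previous thread-local value. The helpers `pyscf_mkl_enter_sequential` and `pyscf_mkl_leave_sequential` and the function `scale_blocks` are hypothetical names for illustration only, and the inner loop stands in for a BLAS call. Without `PYSCF_USE_MKL` the guards compile away, and without OpenMP the pragmas are ignored, so the sketch also builds as plain serial C.

```c
#include <stddef.h>
#ifdef PYSCF_USE_MKL
#include <mkl.h>
#endif

/* Hypothetical helper: make MKL single-threaded for the calling thread only.
   Returns the previous thread-local setting (0 means "use the global one"). */
static int pyscf_mkl_enter_sequential(void)
{
#ifdef PYSCF_USE_MKL
    return mkl_set_num_threads_local(1);
#else
    return 0;
#endif
}

/* Hypothetical helper: restore the setting saved by the call above. */
static void pyscf_mkl_leave_sequential(int saved)
{
#ifdef PYSCF_USE_MKL
    mkl_set_num_threads_local(saved);
#else
    (void)saved;
#endif
}

/* Hypothetical example: scale nblk contiguous blocks of length blksize.
   Each OpenMP thread handles whole blocks; in real code the inner loop
   would be a BLAS call such as cblas_dscal, which then runs sequentially. */
void scale_blocks(double *a, size_t nblk, size_t blksize, double alpha)
{
#pragma omp parallel
    {
        int saved = pyscf_mkl_enter_sequential();
        long ib;
#pragma omp for schedule(static)
        for (ib = 0; ib < (long)nblk; ib++) {
            double *blk = a + (size_t)ib * blksize;
            for (size_t j = 0; j < blksize; j++) {
                blk[j] *= alpha;
            }
        }
        pyscf_mkl_leave_sequential(saved);
    }
}
```

Outside any parallel region nothing is changed, so MKL calls there still use the global thread count (for example, the one set by `MKL_NUM_THREADS`).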

A side benefit is that, if an MKL variant of the PySCF conda package is built, PySCF will be able to retain control over the number of threads used for BLAS. Currently, conda-installed PySCF with libblas=mkl effectively links to the Intel MKL Single Dynamic Library, which is multithreaded by default. (Performance is still quite good in this case, and the OMP_NESTED and MKL_THREADING_LAYER environment variables can always be used to prevent oversubscription if it ever occurs.)

chillenb · 2024-10-08T02:41:32Z

The majority of changed lines are of one of two types:

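- `malloc`, `calloc`, `free`, and `realloc` are replaced with the macros `pyscf_malloc`, `pyscf_calloc`, `pyscf_free`, and `pyscf_realloc`. When MKL is used, these macros are redefined as the corresponding MKL allocators (`mkl_malloc`, etc.), which allocate chunks aligned to 64 bytes.
- In functions that call BLAS inside a parallel region, the following structure is used to prevent oversubscription:

```c
#pragma omp parallel
{
#ifdef PYSCF_USE_MKL
    /* Make MKL single-threaded for this OpenMP thread only and
       remember the previous thread-local setting. */
    int save = mkl_set_num_threads_local(1);
#endif

    /* ... per-thread work that calls BLAS/LAPACK ... */

#ifdef PYSCF_USE_MKL
    /* Restore the saved setting (0 means "use the global setting"). */
    mkl_set_num_threads_local(save);
#endif
}
```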
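The allocation macros described above could be defined along the following lines. This is a minimal sketch, not the PR's actual header: it assumes the `PYSCF_USE_MKL` definition, that `pyscf_calloc` and `pyscf_realloc` take the same arguments as `calloc` and `realloc`, and a hypothetical `PYSCF_ALIGNMENT` constant for the 64-byte alignment. `mkl_malloc`, `mkl_calloc`, `mkl_realloc`, and `mkl_free` are MKL's real memory functions.

```c
#include <stdlib.h>
#ifdef PYSCF_USE_MKL
#include <mkl.h>
#endif

#ifdef PYSCF_USE_MKL
/* Hypothetical name for the 64-byte alignment used with MKL. */
#define PYSCF_ALIGNMENT 64
#define pyscf_malloc(size)        mkl_malloc((size), PYSCF_ALIGNMENT)
#define pyscf_calloc(num, size)   mkl_calloc((num), (size), PYSCF_ALIGNMENT)
#define pyscf_realloc(ptr, size)  mkl_realloc((ptr), (size))
#define pyscf_free(ptr)           mkl_free(ptr)
#else
/* Without MKL the macros fall back to the C standard library. */
#define pyscf_malloc(size)        malloc(size)
#define pyscf_calloc(num, size)   calloc((num), (size))
#define pyscf_realloc(ptr, size)  realloc((ptr), (size))
#define pyscf_free(ptr)           free(ptr)
#endif
```

All four functions are switched together because memory obtained from `mkl_malloc` or `mkl_calloc` must be resized with `mkl_realloc` and released with `mkl_free`, not with their standard-library counterparts.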

chillenb added 5 commits October 7, 2024 15:47

- Add option to link MKL directly. (916dff0)
- MKL CI (bbe9a48)
- add service function pyscf_has_mkl() (43e2b1e)
- match all mkl_set_num_threads_local (bc2534c)
- fix realloc and add basic parallel arithmetic (4168f10)

chillenb force-pushed the mkl branch 2 times, most recently from ca865aa to 7834c99, October 8, 2024 02:35

- few more fixes (1f20a26)

chillenb force-pushed the mkl branch from 7834c99 to 1f20a26, October 8, 2024 02:38
