Optional direct linkage to Intel MKL with fine-grained multithreading control #1

Open: chillenb wants to merge 6 commits into master

@chillenb chillenb commented Oct 8, 2024

Many PySCF users also use Intel MKL, since x64 is probably still the most common architecture on academic clusters. This PR lets PySCF optionally link directly to MKL and use MKL-specific BLAS extensions and other convenience functions.

As a result, MKL's fine-grained threading control API becomes available. MKL can be set to sequential mode whenever MKL, BLAS, or LAPACK functions are invoked inside an OpenMP parallel region, while remaining multithreaded outside such regions.
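
To make the idea concrete, here is a minimal, self-contained sketch of a function that forces MKL into sequential mode inside a parallel region. The kernel `scale_rows` is hypothetical and its inner loop only stands in for a per-row BLAS call. The fallback definition of `mkl_set_num_threads_local` is a stand-in I am assuming for builds without MKL, so the sketch compiles anywhere; it is not how the PR itself handles that case.

```c
#ifdef PYSCF_USE_MKL
#include <mkl.h>
#else
/* Stand-in (assumption, for builds without MKL): mimics MKL's contract of
 * setting a thread-local thread count and returning the previous one,
 * where 0 means "no thread-local override". */
static _Thread_local int local_nthreads = 0;
static int mkl_set_num_threads_local(int nt)
{
    int prev = local_nthreads;
    local_nthreads = nt;
    return prev;
}
#endif

/* Hypothetical kernel: scales every row of an n-by-m row-major matrix.
 * The inner loop stands in for a per-row BLAS call. */
void scale_rows(double *a, int n, int m, double alpha)
{
#pragma omp parallel
    {
        /* Make MKL sequential for this thread only and keep the old value. */
        int save = mkl_set_num_threads_local(1);

#pragma omp for
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                a[i * m + j] *= alpha;
            }
        }

        /* Restore the old value so calls made by this thread outside the
         * region use the global thread count again. */
        mkl_set_num_threads_local(save);
    }
}
```

Because the setting is thread-local and restored before the region ends, code that runs outside the parallel region is unaffected and MKL can still use all of its threads there.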

A side benefit: if an MKL variant of the PySCF conda package is built, PySCF retains control over the number of threads used for BLAS. In practice, conda-installed PySCF with libblas=mkl links to the Intel MKL Single Dynamic Library, which is multithreaded by default. (Performance is still quite good in this case, and one can always use the OMP_NESTED and MKL_THREADING_LAYER environment variables to prevent oversubscription if it ever occurs.)

@chillenb chillenb force-pushed the mkl branch 2 times, most recently from ca865aa to 7834c99 Compare October 8, 2024 02:35

chillenb commented Oct 8, 2024

Most of the changed lines fall into one of two categories:

  1. Calls to malloc, calloc, free, and realloc are replaced with the macros pyscf_malloc, pyscf_calloc, pyscf_free, and pyscf_realloc. When MKL is used, these macros expand to mkl_malloc and its counterparts, which allocate blocks aligned to 64 bytes.
  2. In functions that call BLAS in a parallel region, the following structure is used to prevent oversubscription:
```c
#pragma omp parallel
{
#ifdef PYSCF_USE_MKL
    int save = mkl_set_num_threads_local(1);
#endif

    ...

#ifdef PYSCF_USE_MKL
    mkl_set_num_threads_local(save);
#endif
}
```
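
For the first category, the allocation macros might look roughly like the sketch below. The macro names and the 64-byte alignment come from the description above; the exact definitions, the `PYSCF_ALIGN` constant, and the `make_zeros` helper are my assumptions for illustration and may differ from what the PR actually contains.

```c
#include <stdlib.h>

#ifdef PYSCF_USE_MKL
#include <mkl.h>
#define PYSCF_ALIGN 64 /* hypothetical name for the alignment in bytes */
#define pyscf_malloc(size)       mkl_malloc((size), PYSCF_ALIGN)
#define pyscf_calloc(n, size)    mkl_calloc((n), (size), PYSCF_ALIGN)
#define pyscf_realloc(ptr, size) mkl_realloc((ptr), (size))
#define pyscf_free(ptr)          mkl_free(ptr)
#else
/* Without MKL the macros fall back to the C standard library. */
#define pyscf_malloc(size)       malloc(size)
#define pyscf_calloc(n, size)    calloc((n), (size))
#define pyscf_realloc(ptr, size) realloc((ptr), (size))
#define pyscf_free(ptr)          free(ptr)
#endif

/* Hypothetical helper: returns a zero-initialized array of n doubles,
 * or NULL on failure. Release it with pyscf_free. */
double *make_zeros(size_t n)
{
    return (double *)pyscf_calloc(n, sizeof(double));
}
```

One consequence of this design is that every buffer obtained through these macros has to be released with pyscf_free rather than plain free, since memory from mkl_malloc must be returned with mkl_free.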
