Optional direct linkage to Intel MKL with fine-grained multithreading control #1
A large number of PySCF users also use Intel MKL, since x64 is probably still the most common architecture in academic clusters. This PR allows PySCF to make use of MKL-specific BLAS extensions and other convenient functions when MKL is used.
As a result, MKL's fine-grained threading control API is exposed. This allows MKL to be switched to sequential mode whenever MKL BLAS/LAPACK functions are invoked within an OpenMP parallel region, while remaining multithreaded outside such regions.
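To illustrate the idea, here is a minimal Python sketch that uses `ctypes` to call MKL's thread-local threading control. It is not the code in this PR. The helper names `load_mkl_rt` and `mkl_sequential` are hypothetical, and the library file names tried are assumptions about a typical installation. `MKL_Set_Num_Threads_Local` is the C entry point of MKL's `mkl_set_num_threads_local`, which returns the previous thread-local setting so it can be restored afterwards.

```python
import ctypes
import ctypes.util
from contextlib import contextmanager


def load_mkl_rt():
    """Return a ctypes handle to the MKL Single Dynamic Library, or None.

    Hypothetical helper: the candidate file names are guesses for common
    platforms and may need adjusting for a given installation.
    """
    candidates = (
        ctypes.util.find_library("mkl_rt"),
        "libmkl_rt.so",
        "libmkl_rt.so.2",
        "libmkl_rt.dylib",
        "mkl_rt.dll",
    )
    for name in candidates:
        if name is None:
            continue
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


@contextmanager
def mkl_sequential(lib):
    """Force MKL to one thread for the calling thread, then restore.

    Yields True if the setting was applied and False if MKL is unavailable.
    """
    if lib is None:
        yield False
        return
    set_local = lib.MKL_Set_Num_Threads_Local
    set_local.argtypes = [ctypes.c_int]
    set_local.restype = ctypes.c_int
    previous = set_local(1)  # returns the prior thread-local value (0 = unset)
    try:
        yield True
    finally:
        set_local(previous)  # passing 0 falls back to the global setting


if __name__ == "__main__":
    mkl = load_mkl_rt()
    with mkl_sequential(mkl) as active:
        # BLAS/LAPACK calls issued here by this thread would run sequentially.
        print("MKL sequential mode active:", active)
```

In compiled code the same pattern would wrap the BLAS call sites that sit inside an OpenMP parallel region, leaving calls outside such regions multithreaded.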
A side benefit is that, if an MKL-variant PySCF conda package is built, PySCF will be able to retain control over the number of threads used for BLAS. In effect, conda-installed PySCF with libblas=mkl links to the Intel MKL Single Dynamic Library, which is multithreaded by default. (Performance is still quite good in this case, and one can always use the `OMP_NESTED` and `MKL_THREADING_LAYER` environment variables to prevent oversubscription, if it ever occurs.)
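As a rough sketch of the environment-variable workaround, the two variables can be set from Python before any MKL-backed library is imported. The helper name `threading_env` and the values chosen (`sequential` and `false`) are assumptions for illustration. The right values depend on the workload, and the variables only take effect if they are set before the OpenMP and MKL runtimes initialize.

```python
import os


def threading_env(base=None):
    """Return a copy of an environment mapping with conservative settings.

    Hypothetical helper; the values below are illustrative assumptions.
    """
    env = dict(os.environ if base is None else base)
    env["MKL_THREADING_LAYER"] = "sequential"  # MKL itself runs single-threaded
    env["OMP_NESTED"] = "false"                # no nested OpenMP parallel regions
    return env


if __name__ == "__main__":
    # Apply to the current process before importing MKL-linked packages.
    os.environ.update(threading_env())
    print(os.environ["MKL_THREADING_LAYER"], os.environ["OMP_NESTED"])
```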