Hi @dhjx1996, I have already implemented this. My implementation does not do anything special to optimise. I did, however, have to manually change some of your expressions so that work was not repeated along the path, and so that BLAS operations (often matrix-vector multiplies) are activated whenever possible. From timing, I think we currently spend 90% or so of the DISORT port's runtime inside the LAPACK functions, so I am not sure we could save much time in our own code unless we optimise the core LAPACK calls themselves. One of the places where CDISORT is significantly faster is when it computes pure fluxes. Somehow, it must be using a completely different path through the code, or some optimisation I am not aware of.
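To illustrate the kind of restructuring meant above (a toy sketch, not taken from the actual PythonicDISORT or ARTS code): the same contraction can be written as explicit loops, which BLAS never sees, or as a matrix-vector product, which NumPy dispatches to a BLAS `*gemv` call.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# Hand-rolled version: explicit Python loops, no BLAS involvement.
y_loop = np.zeros(n)
for i in range(n):
    for j in range(n):
        y_loop[i] += A[i, j] * x[j]

# Restructured version: the same contraction expressed as a single
# matrix-vector product, which NumPy forwards to BLAS (dgemv).
y_blas = A @ x

assert np.allclose(y_loop, y_blas)
```

The two produce identical results; the second form just hands the whole inner loop to the optimised BLAS kernel instead of the interpreter.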
Hi @riclarsson, this is an idea I have had ever since we discussed the importance of `numpy.einsum` in making PythonicDISORT fast; sorry that I am following up on it only now (and maybe you have already done this). You probably know that all of NumPy and SciPy, not to mention many base Python functions, are implemented in C. Here is the source code for `np.einsum`: https://github.com/numpy/numpy/blob/e2805398f9a63b825f4a2aab22e9f169ff65aae9/numpy/core/src/multiarray/einsum.c.src. Would such C source code help in porting important NumPy, SciPy, and other Python functions into ARTS? A well-built `einsum` algorithm may also greatly speed up tensor operations in other parts of ARTS. In this vein, the C source code of the `scipy.sparse` library may be worth looking into as well.
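As a small sketch of what a well-built `einsum` buys: a single call can express a tensor contraction that would otherwise require transposes and reshapes. The shapes and axis names below are hypothetical, not taken from PythonicDISORT.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 4, 5))   # hypothetical (layers, streams, moments)
M = rng.standard_normal((5, 4))      # hypothetical (moments, streams_out)

# Contract the last axis of T with the first axis of M:
#   out[l, s, t] = sum_m T[l, s, m] * M[m, t]
out_einsum = np.einsum("lsm,mt->lst", T, M)

# Equivalent formulation via matmul, broadcast over the leading axis.
out_matmul = T @ M

assert out_einsum.shape == (3, 4, 4)
assert np.allclose(out_einsum, out_matmul)
```

A native `einsum` in ARTS would let such contractions be written once, in index notation, and still compile down to efficient loops or BLAS calls.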