# Dask meta-namespace in apply_where
#196 · crusaderky started this conversation in Ideas
## State of the art
Dask struggles with masked updates. This is due to the fact that `x[mask]` has unknown shape, and Dask today is not smart enough to track that in `x[mask] = y[mask]` the lhs and rhs have the same shape (dask/dask#11831).

As a way to cope with that, `xpx.apply_where` calls `da.map_blocks` and applies `f1` and `f2` to the individual chunks. While this works, it has the issue that the final user needs to be aware of the meta-namespace, that is, the namespace of the Dask chunks.

This is currently solved internally with a private function `meta_namespace`:
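A rough sketch of the idea (not the actual array-api-extra implementation; the helper below and the exact `apply_where` call shape are assumptions for illustration):

```python
import dask.array as da
import numpy as np
import array_api_extra as xpx
from array_api_compat import array_namespace, is_dask_array


def meta_namespace_sketch(*arrays):
    """Sketch of what the private helper does: for Dask arrays, return the
    namespace of the chunk prototypes (``._meta``), e.g. numpy or cupy,
    rather than the namespace of the Dask arrays themselves."""
    if all(is_dask_array(a) for a in arrays):
        return array_namespace(*(a._meta for a in arrays))
    return array_namespace(*arrays)


x = da.from_array(np.linspace(-2.0, 2.0, 8), chunks=4)
y = da.full_like(x, 3.0)

xp = array_namespace(x, y)         # the dask.array namespace (shown only for contrast)
mxp = meta_namespace_sketch(x, y)  # the chunks' namespace, here numpy

# f1 and f2 receive chunks, not Dask arrays, so they must use mxp rather than xp:
z = xpx.apply_where(
    x > 0,
    (x, y),
    lambda x, y: mxp.divide(x, y),
    lambda x, y: mxp.zeros_like(x),
)
```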
If you forget about the meta-namespace and just use `xp` in the lambdas, at the moment most things will keep working. This is because, by accident, several functions in the `dask.array`, `numpy`, and `cupy` namespaces are interoperable or even the same function. However, you will find cases where this doesn't hold true and you need the correct namespace.

This will become a much bigger source of headaches in the future, when wrapping Dask around generic Array API compatible namespaces becomes commonplace (note: Dask does NOT support them today).
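The "works by accident" behaviour can be seen directly with cupy-backed chunks: ufunc-style calls dispatch correctly, while anything that implicitly converts to numpy does not. A minimal illustration (assuming a CUDA-capable environment with cupy installed):

```python
import numpy as np
import cupy as cp

# What a lambda would see if the Dask array had cupy-backed chunks:
chunk = cp.arange(4.0)

np.add(chunk, 1.0)  # works: dispatches to cupy via __array_ufunc__
np.asarray(chunk)   # raises TypeError: cupy refuses implicit conversion to numpy
```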
This pattern repeats itself many, many times in scipy. At the moment there are only a handful of cases that are array API-aware, and they all use `xp.divide`, so the problem can be worked around by replacing it with `operator.truediv`. But if you look at scipy.stats in scipy/scipy#22557 you'll find a myriad of calls to `np.` functions inside the lambdas.
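Illustration of that workaround (the `apply_where` call shape mirrors the sketch above and is an assumption, not scipy's actual code):

```python
import operator

import dask.array as da
import numpy as np
import array_api_extra as xpx

num = da.from_array(np.arange(8.0), chunks=4)
den = da.from_array(np.array([0.0, 1.0, 2.0, 0.0, 4.0, 5.0, 0.0, 7.0]), chunks=4)

# A bare Python operator dispatches on the chunk type itself, so this branch
# no longer needs to know whether the chunks are numpy or cupy:
safe = xpx.apply_where(
    den != 0,
    (num, den),
    operator.truediv,               # was: lambda n, d: xp.divide(n, d)
    lambda n, d: np.zeros_like(n),  # "else" branch; numpy chunks assumed here
)
```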
## Proposed solutions
In the long term, I see several possible ways forward:
1. Make `meta_namespace` public API.
   - like: explicit is better than implicit
   - dislike: very verbose
2. Add signature magic to `apply_where`: if `f1` or `f2` accept a keyword argument called `xp`, pass the meta-namespace to it (see the sketch after this list).
   - like: concise; no need for helper functions
   - dislike: obscure functionality which needs to be commented every time; otherwise unwary maintainers will break it by trying to simplify it (this negates its compactness benefit)
3. As above, but call the special parameter `mxp`.
   - like: unlikely to shadow another local variable, so new readers are forced to stop and think about how it's populated
   - dislike: the pattern is not used anywhere else. It's immediately clear to everyone what `xp` means; not so much with `mxp`.
4. Just use `xp` from the outer context in the lambdas. Expect that, by the time Dask starts supporting arbitrary Array API compliant meta-namespaces, it will also have fixed its issues with NaN shapes. When that happens, we'll remove all special-case handling for Dask in array-api-extra and the lambdas will just run on filtered Dask arrays.
   - like: cleanest; no need to explicitly test alternative backends
   - dislike: not going to happen without substantial effort.
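The sketch referenced in options 2 and 3: hypothetical code showing how `apply_where` could inspect `f1`/`f2` and inject the meta-namespace into a reserved keyword argument (`xp` or `mxp`). None of this exists in array-api-extra today.

```python
import inspect


def _inject_meta_namespace(func, mxp, kwarg="xp"):
    """Hypothetical helper: if ``func`` accepts a keyword argument named
    ``kwarg`` (e.g. "xp" or "mxp"), return a wrapper that fills it in with
    the meta-namespace ``mxp``; otherwise return ``func`` unchanged."""
    params = inspect.signature(func).parameters
    if kwarg in params and params[kwarg].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return lambda *args: func(*args, **{kwarg: mxp})
    return func


# The user-facing pattern would then look like:
#   xpx.apply_where(cond, (x, y), lambda x, y, xp: xp.divide(x, y), f2)
# with apply_where internally doing  f1 = _inject_meta_namespace(f1, mxp)
# before handing f1 to da.map_blocks.
```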
## What about `lazy_apply`?

`lazy_apply(as_numpy=False)` has the same issue. However, one would expect most applied functions there not to be lambdas, so one can expect that they all start with the pattern `xp = array_namespace(x, ...)` on their first line.
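For example, a function handed to `lazy_apply` would typically be written like this (the call shape shown in the comment is an assumption):

```python
from array_api_compat import array_namespace


def standardize(x):
    xp = array_namespace(x)  # first line: derive the namespace from the input
    return (x - xp.mean(x)) / xp.std(x)


# out = xpx.lazy_apply(standardize, x, as_numpy=False)
```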