v0.17.0

barronalex released this 23 Aug 18:48

Commit: 684e11c

Highlights

mx.einsum: PR
Big speedups in reductions: benchmarks
2x faster model loading: PR
mx.fast.metal_kernel for custom GPU kernels: docs
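
mx.einsum takes a NumPy-style subscript string such as "ij,jk->ik". The plain-Python sketch below shows what such a string means. It uses a hypothetical helper, `einsum`, that works on nested lists and supports only the explicit "->" form. It is not MLX's implementation. It records the size of every index letter, multiplies the operands elementwise over all index combinations, and sums over any letter that does not appear in the output.

```python
from itertools import product


def einsum(spec, *operands):
    """Tiny reference einsum over nested Python lists (explicit '->' form only)."""
    spec = spec.replace(" ", "")
    if "->" not in spec:
        raise ValueError("this sketch needs an explicit '->' output")
    inputs, output = spec.split("->")
    in_subs = inputs.split(",")
    if len(in_subs) != len(operands):
        raise ValueError("number of subscripts and operands differ")

    # Record the size of every index letter from the operand shapes.
    sizes = {}
    for subs, op in zip(in_subs, operands):
        node = op
        for letter in subs:
            n = len(node)
            if sizes.setdefault(letter, n) != n:
                raise ValueError(f"size mismatch for index {letter!r}")
            node = node[0]

    # Letters missing from the output are contracted (summed over).
    summed = [letter for letter in sorted(sizes) if letter not in output]

    def get(op, subs, env):
        # Index into a nested list using the current letter -> position mapping.
        for letter in subs:
            op = op[env[letter]]
        return op

    def build(env, remaining):
        # Recursively build the nested output, one output letter per level.
        if not remaining:
            total = 0
            for values in product(*(range(sizes[l]) for l in summed)):
                full = {**env, **dict(zip(summed, values))}
                term = 1
                for subs, op in zip(in_subs, operands):
                    term *= get(op, subs, full)
                total += term
            return total
        letter, rest = remaining[0], remaining[1:]
        return [build({**env, letter: i}, rest) for i in range(sizes[letter])]

    return build({}, output)
```

For example, `einsum("ij,jk->ik", A, B)` is a matrix product, `einsum("ii->", A)` is a trace, and `einsum("ij->ji", A)` is a transpose. On MLX arrays the matrix product would be written `mx.einsum("ij,jk->ik", a, b)`.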

Core

Faster program exits
Laplace sampling
mx.nan_to_num
tanh approximation for nn.gelu
Fused GPU quantization ops
Faster group norm
bf16 winograd conv
vmap support for mx.scatter
mx.pad "edge" padding
More numerically stable mx.var
mx.linalg.cholesky_inv/mx.linalg.tri_inv
mx.isfinite
Complex mx.sign now mirrors NumPy 2.0 behaviour
More flexible mx.fast.rope
Update to nanobind 2.1
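
mx.nan_to_num and mx.isfinite are assumed here to follow NumPy's semantics. By default NaN becomes 0, and +inf and -inf become the largest and smallest finite values. The plain-Python sketch below uses float64 limits and hypothetical helper names. MLX would presumably use the limits of the array's own dtype instead.

```python
import math
import sys


def isfinite(xs):
    """Elementwise finiteness check (False for NaN and +/-inf)."""
    return [math.isfinite(x) for x in xs]


def nan_to_num(xs, nan=0.0, posinf=None, neginf=None):
    """Replace NaN and infinities; infinities default to the float64 extremes."""
    big = sys.float_info.max  # largest finite float64
    posinf = big if posinf is None else posinf
    neginf = -big if neginf is None else neginf
    out = []
    for x in xs:
        if math.isnan(x):
            out.append(nan)
        elif x == math.inf:
            out.append(posinf)
        elif x == -math.inf:
            out.append(neginf)
        else:
            out.append(x)
    return out
```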
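
The new Laplace sampler (in MLX, `mx.random.laplace`) draws from the Laplace distribution with location `loc` and scale `scale`. The plain-Python sketch below uses a hypothetical helper, `sample_laplace`. It relies on a standard identity: the difference of two independent Exp(1) variables is Laplace(0, 1). MLX's own sampler may use a different method, such as inverting the CDF.

```python
import random


def sample_laplace(n, loc=0.0, scale=1.0, seed=0):
    """Draw n Laplace(loc, scale) samples.

    If E1, E2 ~ Exp(1) are independent, then E1 - E2 ~ Laplace(0, 1);
    shifting by loc and multiplying by scale gives Laplace(loc, scale).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    return [loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(n)]
```

As a sanity check, the sample mean should approach `loc` and the mean absolute deviation from `loc` should approach `scale`.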
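
The tanh approximation of GELU replaces the exact erf-based formula, 0.5·x·(1 + erf(x/√2)), with the widely used 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))). In MLX it is presumably selected through an approximation option on the nn GELU layer or function. The scalar Python comparison below uses hypothetical helper names. On [-5, 5] the two formulas stay within 1e-3 of each other.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """tanh-based approximation of GELU."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))
```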
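
"edge" padding in mx.pad is assumed to match NumPy's `mode="edge"`: the padding repeats the border values instead of filling with a constant. The plain-Python sketch below uses hypothetical helpers, `pad_edge` and `pad_edge_2d`. With MLX the 2-D case would presumably be written `mx.pad(x, [(1, 1), (1, 1)], mode="edge")`.

```python
def pad_edge(xs, before, after):
    """1-D edge padding: repeat the first/last element `before`/`after` times."""
    if not xs:
        raise ValueError("edge padding needs at least one element")
    return [xs[0]] * before + list(xs) + [xs[-1]] * after


def pad_edge_2d(rows, pad_rows, pad_cols):
    """2-D edge padding: pad every row, then replicate the first/last padded rows."""
    padded = [pad_edge(r, *pad_cols) for r in rows]
    top, bottom = pad_rows
    # Copy the border rows so the result contains no aliased lists.
    return (
        [list(padded[0]) for _ in range(top)]
        + padded
        + [list(padded[-1]) for _ in range(bottom)]
    )
```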
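
The notes do not say how mx.var was made more numerically stable. One standard technique is Welford's single-pass update, which avoids the catastrophic cancellation of the naive E[x²] − E[x]² formula. The sketch below contrasts the two approaches. The helper names are hypothetical, and Welford is not necessarily what MLX uses.

```python
def var_naive(xs):
    """E[x^2] - E[x]^2: cheap, but cancels badly when the mean is large."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean


def var_welford(xs):
    """Welford's running update of the mean and sum of squared deviations."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / len(xs)  # population variance
```

With values near 1e9 whose true variance is 22.5, the naive formula returns a badly wrong (even negative) result, while Welford recovers 22.5.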
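
mx.linalg.tri_inv inverts a triangular matrix. As we understand the API, mx.linalg.cholesky_inv inverts a symmetric positive-definite matrix A from its Cholesky factor L, where A = L·Lᵀ, using A⁻¹ = L⁻ᵀ·L⁻¹. The plain-Python sketch below covers only the lower-triangular case. The helper names `tri_inv_lower`, `transpose`, `matmul`, and `cholesky_inv` are hypothetical.

```python
def tri_inv_lower(L):
    """Invert a lower-triangular matrix by forward substitution, one column at a time."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):          # solve L @ x = e_j for column j of the inverse
        for i in range(j, n):   # entries above the diagonal stay zero
            s = 1.0 if i == j else 0.0
            for k in range(j, i):
                s -= L[i][k] * inv[k][j]
            inv[i][j] = s / L[i][i]
    return inv


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def cholesky_inv(L):
    """Given lower-triangular L with A = L @ L^T, return A^{-1} = L^{-T} @ L^{-1}."""
    Linv = tri_inv_lower(L)
    return matmul(transpose(Linv), Linv)
```

Working from the factor reuses a triangular inverse and never inverts A directly.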
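
NumPy 2.0 defines the sign of a nonzero complex number as z / |z|, a unit-modulus value, and returns 0 for z = 0. NumPy 1.x instead returned the sign of the real part, or of the imaginary part when the real part is zero. The sketch below, using a hypothetical helper `complex_sign`, shows the NumPy 2.0 convention that complex mx.sign now mirrors.

```python
def complex_sign(z):
    """NumPy 2.0 convention: z / |z| for z != 0, and 0 for z == 0."""
    if z == 0:
        return 0j
    return z / abs(z)
```

For example, `complex_sign(3 + 4j)` is 0.6 + 0.8j. The NumPy 1.x rule would have given 1.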

Bug Fixes

gguf zero initialization
expm1f overflow handling
bfloat16 hadamard
large-array handling in various ops
rope fix
bf16 array creation
preserve dtype in nn.Dropout
nn.TransformerEncoder with norm_first=False
excess copies caused by a contiguity bug