# v0.13.0

## Highlights
- Block-sparse matrix multiply speeds up MoEs by >2x
- Improved quantization algorithm that should work well for all networks
- Improved GPU command submission speeds up training and inference
## Core
- Added bitwise ops: `mx.bitwise_[or|and|xor]`, `mx.[left|right]_shift`, and the corresponding operator overloads
- Added `groups` support to `Conv1d`
- Added `mx.metal.device_info` to get better-informed memory limits
- Added resettable memory stats
- Added `mlx.optimizers.clip_grad_norm` and `mlx.utils.tree_reduce`
- Added `mx.arctan2`
- Unary ops now accept array-like inputs, e.g. `mx.sqrt(2)`
## Bugfixes
- Fixed shape for slice update
- Fixed a bug in `quantize` that used slightly incorrect scales and biases
- Fixed a memory leak in multi-output primitives encountered with gradient checkpointing
- Fixed conversion from other frameworks for all data types
- Fixed index overflow in matmul with large batch sizes
- Fixed an initialization ordering issue that occasionally caused segfaults