v0.12.0
Highlights
- Faster quantized matmul
- Up to 40% faster QLoRA or prompt processing (some numbers)
Core
- mx.synchronize to wait for computation dispatched with mx.async_eval
- mx.radians and mx.degrees
- mx.metal.clear_cache to return to the OS the memory held by MLX as a cache for future allocations
- Change quantization to always represent 0 exactly (relevant issue)
Bugfixes
- Fixed quantization of a block with all 0s that produced NaNs
- Fixed the len field in the buffer protocol implementation
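To illustrate what "always represent 0 exactly" and the all-zeros fix mean for quantization, here is a pure-Python sketch of affine quantization for a single block. It is an assumption-laden illustration, not MLX's actual kernel: the function names, the 4-bit default, and the way the bias is snapped to a multiple of the scale are all hypothetical.

```python
def quantize_group(values, bits=4):
    """Hypothetical affine quantizer: value ~= code * scale + bias."""
    n_levels = 2**bits - 1
    # Extend the range so that 0 always lies inside it.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        # All-zero block: avoid a 0/0 division that would yield NaNs.
        return [0] * len(values), 1.0, 0.0
    scale = (hi - lo) / n_levels
    # Integer code that should decode to exactly 0.
    zero_code = round(-lo / scale)
    # Snap the bias to a multiple of the scale so that
    # zero_code * scale + bias cancels to exactly 0.0.
    bias = -(zero_code * scale)
    codes = [min(n_levels, max(0, round((v - bias) / scale))) for v in values]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Inverse of the hypothetical quantizer above."""
    return [c * scale + bias for c in codes]


# Illustrative blocks: one mixed, one made entirely of zeros.
mixed = [-0.7, 0.0, 0.25, 1.3]
mixed_codes, mixed_scale, mixed_bias = quantize_group(mixed)
mixed_restored = dequantize_group(mixed_codes, mixed_scale, mixed_bias)

zeros_codes, zeros_scale, zeros_bias = quantize_group([0.0] * 4)
zeros_restored = dequantize_group(zeros_codes, zeros_scale, zeros_bias)
```

Snapping the bias this way can clip the largest or smallest value by up to half a quantization step, which is the trade-off this sketch accepts in exchange for an exact zero.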