51 changes: 51 additions & 0 deletions ReleaseNotes.md
@@ -1,5 +1,56 @@
# Release Notes

## New in Release 2.16.0

Post-training Quantization:

- Breaking changes:
- ...
- General:
- ...
- Features:
- (Torch) Introduced a novel weight compression method that significantly improves the accuracy of Large Language Models (LLMs) with int4 weights. By combining Quantization-Aware Training (QAT) with absorbable LoRA adapters, this approach can achieve a 2x reduction in accuracy loss during compression compared to the best post-training weight compression technique in NNCF (Scale Estimation + AWQ + GPTQ). The `nncf.compress_weights` API now includes a new `compression_format` option, `CompressionFormat.FQ_LORA`, for this QAT method, and a sample compression pipeline with preview support is available [here](examples/llm_compression/torch/qat_with_lora).
- (Torch) Added support for 4-bit weight compression, along with the AWQ and Scale Estimation data-aware methods, which reduce quality loss after compression.
- Fixes:
- Fixed occasional failures of the weight compression algorithm on ARM CPUs.
- (Torch) Fixed weight compression for float16/bfloat16 models.
- Improvements:
- Reduced the run time and peak memory of the mixed-precision assignment procedure during weight compression in the OpenVINO backend. In the mixed-precision case, overall compression time is reduced by about 20-40% and peak memory by about 20%.
- (TorchFX, Experimental) Added quantization support for [TorchFX](https://pytorch.org/docs/stable/fx.html) models exported with dynamic shapes.
- Deprecations/Removals:
- ...
- Tutorials:
- ...
- Known issues:
- ...
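
As a rough illustration of the new `compression_format` option mentioned under Features, the call might look like the sketch below. The helper name `compress_with_fq_lora`, the `INT4_ASYM` mode, and the group size of 64 are assumptions chosen for illustration and are not prescribed by these notes; the sample pipeline linked above remains the reference for the actual QAT flow.

```python
def compress_with_fq_lora(model, dataset, group_size=64):
    """Hypothetical helper: compress a Torch model's weights to int4 using FQ_LORA.

    `model` is a torch.nn.Module and `dataset` is an nncf.Dataset wrapping
    example inputs. The mode and group size are illustrative assumptions.
    """
    # Imported lazily so that this sketch can be loaded without NNCF installed.
    import nncf
    from nncf.parameters import CompressionFormat, CompressWeightsMode

    return nncf.compress_weights(
        model,
        mode=CompressWeightsMode.INT4_ASYM,  # assumed int4 mode
        group_size=group_size,  # assumed group size
        dataset=dataset,
        compression_format=CompressionFormat.FQ_LORA,  # option added in 2.16.0
    )
```

The returned model is expected to be tuned further with QAT before export; that part of the flow is not shown here.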

Compression-aware training:

- Breaking changes:
- ...
- General:
- ...
- Features:
- ...
- Fixes:
- ...
- Improvements:
- ...
- Deprecations/Removals:
- ...
- Tutorials:
- ...
- Known issues:
- ...

Deprecations/Removals:

- ...

Requirements:

- Updated PyTorch (2.6.0) and Torchvision (0.21.0) versions.

## New in Release 2.15.0

Post-training Quantization: