
feat(quantization): add calibration cache to quantize_static#28221

Open
Rishi-Dave wants to merge 2 commits into microsoft:main from Rishi-Dave:rishidave/feat/calibration-cache

Conversation

@Rishi-Dave (Contributor)

Summary

  • Add an optional calibration_cache_path parameter to quantize_static() so users can save and reload the calibration result (TensorsData) across runs.
  • Avoids re-running the expensive calibration inference pass when iterating on post-calibration options such as nodes_to_exclude, activation_type, or weight_type.
  • Cache format is JSON, mirroring the encoder already used by write_calibration_table — no new serialization surface area.
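The cache hit/miss decision described above can be sketched in a self-contained way; a plain dict stands in for `TensorsData`, and `get_ranges` / `run_calibration` are illustrative names, not the PR's actual internals:

```python
import json
from pathlib import Path

def get_ranges(cache_path, data_reader, run_calibration):
    """Return calibration ranges, loading from or saving to an optional cache."""
    cache_path = Path(cache_path) if cache_path else None
    if cache_path is None and data_reader is None:
        raise ValueError("provide a calibration_data_reader or an existing cache")
    if cache_path is not None and cache_path.exists():
        # Cache hit: skip the calibration inference pass entirely.
        return json.loads(cache_path.read_text())
    if data_reader is None:
        raise ValueError("cache file missing and no calibration_data_reader given")
    ranges = run_calibration(data_reader)   # cache miss: run calibration once
    if cache_path is not None:
        cache_path.write_text(json.dumps(ranges))   # persist for later runs
    return ranges
```

On a second call with the same `cache_path`, the reader is never touched, which is what makes iterating on post-calibration options cheap.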

Motivation

Fixes #21908. Users commonly re-run quantize_static multiple times on the same model and calibration dataset while varying the set of excluded nodes or the quant types, to trade off accuracy vs. speed. Today, every call repeats the full calibration inference loop even though the calibration result is identical, which is costly on large calibration datasets. There was no supported way to persist the computed tensor ranges — write_calibration_table writes a lossy table (drops histogram data) and has no paired reader. This PR closes that gap.

Changes

  • python/tools/quantization/calibrate.py:
    • Add TensorData.from_dict and TensorsData.from_dict classmethods (inverse of existing to_dict).
    • Add module-level _CalibrationCacheEncoder(json.JSONEncoder), save_tensors_data(tensors, path), and load_tensors_data(path). The encoder handles TensorData/TensorsData/np.ndarray/CalibrationMethod/numpy scalars. Writes are atomic (tmp file + os.replace) and auto-create parent directories.
  • python/tools/quantization/quantize.py:
    • quantize_static gains calibration_cache_path: str | Path | None = None. If the path exists, calibration is skipped and ranges are loaded from the cache. If the path is new, calibration runs and the result is saved. Raises ValueError if the cached calibration_method does not match the caller's calibrate_method.
    • calibration_data_reader becomes optional; at least one of it or an existing cache must be provided, else ValueError.
  • python/tools/quantization/__init__.py: export TensorData, TensorsData, save_tensors_data, load_tensors_data.
  • Tests: new TestCalibrationCache in test/python/quantization/test_calibration.py covering MinMax roundtrip, Entropy roundtrip (with histogram), missing-path error, parent-dir auto-creation, numpy scalar bins handling, method-mismatch guard, end-to-end quantize_static cache hit/miss, and ValueError when neither reader nor cache is provided.
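The atomic-write behavior called out for `save_tensors_data` (tmp file + `os.replace`, parent-directory auto-creation) follows a standard stdlib pattern; this is a minimal sketch with illustrative names, not the PR's exact code, and it serializes a plain dict rather than `TensorsData`:

```python
import json
import os
import tempfile
from pathlib import Path

def save_cache(data: dict, path) -> None:
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)   # auto-create parent dirs
    # Write to a sibling temp file, then atomically swap it into place so a
    # crash mid-write can never leave a truncated cache behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)   # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

def load_cache(path) -> dict:
    with open(path) as f:
        return json.load(f)
```

A custom `json.JSONEncoder` subclass (as in `_CalibrationCacheEncoder`) would slot into the `json.dump` call via its `cls=` argument to handle numpy arrays and scalars.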

Test Plan

  • python -m pytest onnxruntime/test/python/quantization/test_calibration.py::TestCalibrationCache -v
  • python -m pytest onnxruntime/test/python/quantization/test_calibration.py::TestCalibrateMinMaxCalibrator -v (regression)
  • lintrunner -a on changed files: clean.

Backward Compatibility

calibration_data_reader changes from required-positional to optional-keyword. Existing call sites — whether positional or keyword — continue to work unchanged. The new behavior is engaged only when calibration_cache_path is provided.
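The compatibility claim can be checked against a stand-in with the same signature shape (a stub, not the real `quantize_static`; whether `calibration_cache_path` is keyword-only in the PR is an assumption here):

```python
def quantize_static_stub(model_input, model_output,
                         calibration_data_reader=None, *,
                         calibration_cache_path=None, **kwargs):
    """Stub mirroring the proposed signature; returns the two values of interest."""
    return (calibration_data_reader, calibration_cache_path)

# Old positional call site still works.
assert quantize_static_stub("in.onnx", "out.onnx", "reader") == ("reader", None)

# Old keyword call site still works.
assert quantize_static_stub(
    "in.onnx", "out.onnx", calibration_data_reader="reader") == ("reader", None)

# New cache-only style: no reader, just a cache path.
assert quantize_static_stub(
    "in.onnx", "out.onnx", calibration_cache_path="c.json") == (None, "c.json")
```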

Introduce an optional calibration_cache_path parameter on quantize_static
so users can save the computed TensorsData after calibration and reload
it on subsequent runs. This avoids repeating the expensive calibration
inference pass when only post-calibration knobs (e.g. nodes_to_exclude,
quant types) change between runs.

The cache is a human-readable JSON file whose schema mirrors the encoder
used by write_calibration_table: TensorData / TensorsData round-trip
through new from_dict classmethods and module-level save_tensors_data /
load_tensors_data helpers in calibrate.py. calibration_data_reader is now
optional; at least one of it or an existing cache file must be provided.

Fixes microsoft#21908

Copilot AI left a comment


Pull request overview

Adds a JSON-backed calibration cache for Python static quantization to allow reusing TensorsData across runs and skipping repeated calibration inference when inputs are unchanged.

Changes:

  • Add JSON serialization/deserialization utilities for TensorData/TensorsData (save/load calibration caches).
  • Extend quantize_static() with optional calibration_cache_path and make calibration_data_reader optional when an existing cache is provided.
  • Add test coverage for cache roundtrips and quantize_static cache hit/miss behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

  • onnxruntime/python/tools/quantization/calibrate.py: Adds from_dict support plus the JSON cache encoder and save_tensors_data / load_tensors_data.
  • onnxruntime/python/tools/quantization/quantize.py: Adds calibration_cache_path to quantize_static and reuses cached TensorsData when present.
  • onnxruntime/python/tools/quantization/__init__.py: Exports cache-related helpers and data structures via onnxruntime.quantization.
  • onnxruntime/test/python/quantization/test_calibration.py: Adds TestCalibrationCache covering serialization and end-to-end cache usage.


Comment threads:
  • onnxruntime/python/tools/quantization/calibrate.py
  • onnxruntime/python/tools/quantization/quantize.py (two threads)
- load_tensors_data: reject non-file paths up front with a ValueError instead
  of letting Path.open raise IsADirectoryError or similar.
- quantize_static: when extra_options['SmoothQuant']=True, require a non-None
  calibration_data_reader since the cache stores per-tensor ranges only and
  cannot drive the SmoothQuant transform.
- quantize_static: treat the cache path as a hit only when it is a regular
  file; raise ValueError if it exists but is e.g. a directory, so callers get
  a clear message instead of a low-level IOError.
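The first and third suggestions amount to the same guard; a minimal sketch (function name and message wording are illustrative, not the reviewer's exact proposal):

```python
from pathlib import Path

def check_cache_path(path) -> bool:
    """Return True on a cache hit, False on a miss; reject unusable paths early."""
    p = Path(path)
    if not p.exists():
        return False   # miss: calibration will run and then populate the cache
    if not p.is_file():
        # Fail fast with a clear message instead of letting open() raise
        # IsADirectoryError or a platform-specific OSError later.
        raise ValueError(f"calibration_cache_path exists but is not a file: {p}")
    return True
```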


Development

Successfully merging this pull request may close these issues.

[Feature Request] May the Calibration Cache in the roadmap?
