feat(quantization): add calibration cache to quantize_static #28221
Open
Rishi-Dave wants to merge 2 commits into microsoft:main from
Conversation
Introduce an optional calibration_cache_path parameter on quantize_static so users can save the computed TensorsData after calibration and reload it on subsequent runs. This avoids repeating the expensive calibration inference pass when only post-calibration knobs (e.g. nodes_to_exclude, quant types) change between runs. The cache is a human-readable JSON file whose schema mirrors the encoder used by write_calibration_table: TensorData / TensorsData round-trip through new from_dict classmethods and module-level save_tensors_data / load_tensors_data helpers in calibrate.py. calibration_data_reader is now optional; at least one of it or an existing cache file must be provided. Fixes microsoft#21908
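The cache-hit/cache-miss flow described above can be sketched in plain Python. This is an illustrative stand-in, not the PR's actual API: `run_calibration`, `get_tensor_ranges`, and the JSON layout here are hypothetical simplifications.

```python
import json
from pathlib import Path


def run_calibration():
    """Stand-in for the expensive calibration inference pass."""
    return {"conv1_output": {"lowest": -1.0, "highest": 1.0}}


def get_tensor_ranges(cache_path=None):
    """Load cached ranges when the file exists; otherwise calibrate and save."""
    if cache_path is not None and Path(cache_path).is_file():
        # Cache hit: skip calibration entirely and reuse the stored result.
        return json.loads(Path(cache_path).read_text())
    ranges = run_calibration()
    if cache_path is not None:
        # Cache miss: persist the result so the next run can reuse it.
        Path(cache_path).write_text(json.dumps(ranges))
    return ranges
```

On the second call with the same path, only the JSON load runs, which is the speedup the PR is after when post-calibration knobs change between runs.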
Contributor
Pull request overview
Adds a JSON-backed calibration cache for Python static quantization to allow reusing TensorsData across runs and skipping repeated calibration inference when inputs are unchanged.
Changes:
- Add JSON serialization/deserialization utilities for TensorData / TensorsData (save/load calibration caches).
- Extend quantize_static() with an optional calibration_cache_path and make calibration_data_reader optional when an existing cache is provided.
- Add test coverage for cache roundtrips and quantize_static cache hit/miss behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/python/tools/quantization/calibrate.py | Adds from_dict support plus JSON cache encoder and save_tensors_data / load_tensors_data. |
| onnxruntime/python/tools/quantization/quantize.py | Adds calibration_cache_path to quantize_static and reuses cached TensorsData when present. |
| onnxruntime/python/tools/quantization/__init__.py | Exports cache-related helpers and data structures via onnxruntime.quantization. |
| onnxruntime/test/python/quantization/test_calibration.py | Adds TestCalibrationCache covering serialization and end-to-end cache usage. |
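A rough sketch of the from_dict round-trip pattern the first row describes. TensorRange, save_ranges, and load_ranges are simplified stand-ins (the PR's TensorData carries more state, such as histogram data), but the to_dict/from_dict inverse-pair shape is the same idea:

```python
import json
from dataclasses import dataclass


@dataclass
class TensorRange:
    """Simplified stand-in for TensorData: just one tensor's min/max."""
    lowest: float
    highest: float

    def to_dict(self):
        return {"lowest": self.lowest, "highest": self.highest}

    @classmethod
    def from_dict(cls, d):
        # Inverse of to_dict, mirroring the classmethods the PR adds.
        return cls(lowest=d["lowest"], highest=d["highest"])


def save_ranges(ranges, path):
    """Serialize a {tensor_name: TensorRange} mapping to a JSON file."""
    with open(path, "w") as f:
        json.dump({name: t.to_dict() for name, t in ranges.items()}, f)


def load_ranges(path):
    """Rebuild the mapping from a JSON file written by save_ranges."""
    with open(path) as f:
        return {name: TensorRange.from_dict(d) for name, d in json.load(f).items()}
```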
- load_tensors_data: reject non-file paths up front with a ValueError instead of letting Path.open raise IsADirectoryError or similar.
- quantize_static: when extra_options['SmoothQuant']=True, require a non-None calibration_data_reader since the cache stores per-tensor ranges only and cannot drive the SmoothQuant transform.
- quantize_static: treat the cache path as a hit only when it is a regular file; raise ValueError if it exists but is e.g. a directory, so callers get a clear message instead of a low-level IOError.
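The file-type guard suggested in the last point might look like the following sketch (the function name and error message are illustrative, not from the PR):

```python
from pathlib import Path


def check_cache_path(cache_path):
    """Return True on a cache hit, False on a miss.

    Raises ValueError up front when the path exists but is not a regular
    file (e.g. a directory), instead of surfacing a low-level error later
    when the file is opened.
    """
    path = Path(cache_path)
    if path.exists() and not path.is_file():
        raise ValueError(
            f"calibration cache path '{path}' exists but is not a regular file"
        )
    return path.is_file()
```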
Summary
- Adds an optional calibration_cache_path parameter to quantize_static() so users can save and reload the calibration result (TensorsData) across runs.
- Skips repeated calibration when only post-calibration knobs change between runs, e.g. nodes_to_exclude, activation_type, or weight_type.
- The cache schema mirrors the encoder used by write_calibration_table — no new serialization surface area.

Motivation
Fixes #21908. Users commonly re-run quantize_static multiple times on the same model and calibration dataset while varying the set of excluded nodes or the quant types, to trade off accuracy vs. speed. Today, every call repeats the full calibration inference loop even though the calibration result is identical, which is costly on large calibration datasets. There was no supported way to persist the computed tensor ranges — write_calibration_table writes a lossy table (drops histogram data) and has no paired reader. This PR closes that gap.

Changes
- python/tools/quantization/calibrate.py:
  - TensorData.from_dict and TensorsData.from_dict classmethods (inverse of existing to_dict).
  - _CalibrationCacheEncoder (json.JSONEncoder), save_tensors_data(tensors, path), and load_tensors_data(path). The encoder handles TensorData / TensorsData / np.ndarray / CalibrationMethod / numpy scalars. Writes are atomic (tmp file + os.replace) and auto-create parent directories.
- python/tools/quantization/quantize.py:
  - quantize_static gains calibration_cache_path: str | Path | None = None. If the path exists, calibration is skipped and ranges are loaded from the cache. If the path is new, calibration runs and the result is saved. Raises ValueError if the cached calibration_method does not match the caller's calibrate_method.
  - calibration_data_reader becomes optional; at least one of it or an existing cache must be provided, else ValueError.
- python/tools/quantization/__init__.py: export TensorData, TensorsData, save_tensors_data, load_tensors_data.
- Tests: TestCalibrationCache in test/python/quantization/test_calibration.py covering MinMax roundtrip, Entropy roundtrip (with histogram), missing-path error, parent-dir auto-creation, numpy scalar bins handling, method-mismatch guard, end-to-end quantize_static cache hit/miss, and ValueError when neither reader nor cache is provided.

Test Plan
- python -m pytest onnxruntime/test/python/quantization/test_calibration.py::TestCalibrationCache -v
- python -m pytest onnxruntime/test/python/quantization/test_calibration.py::TestCalibrateMinMaxCalibrator -v (regression)
- lintrunner -a on changed files: clean.

Backward Compatibility
calibration_data_reader changes from required-positional to optional-keyword. Existing call sites — whether positional or keyword — continue to work unchanged. The new behavior is only engaged when calibration_cache_path is provided.
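A minimal stub illustrates the compatibility claim. Only the parameter arrangement mirrors the PR; the body and return value are placeholders, and the real quantize_static takes more parameters:

```python
def quantize_static(model_input, calibration_data_reader=None, *,
                    calibration_cache_path=None):
    """Stand-in showing the relaxed signature: the reader now defaults to None."""
    if calibration_data_reader is None and calibration_cache_path is None:
        raise ValueError(
            "Provide calibration_data_reader or an existing calibration_cache_path."
        )
    return "quantized"


# Pre-existing call styles keep working unchanged:
quantize_static("model.onnx", object())                            # positional
quantize_static("model.onnx", calibration_data_reader=object())    # keyword
```

Because the reader keeps its original position and merely gains a None default, neither positional nor keyword call sites need to change; only calls that pass neither argument now fail fast with a ValueError.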