Add size threshold to prevent constant folding from inflating model memory footprint#28204
Conversation
- Add `kOrtSessionOptionsConfigConstantFoldingNodeWeightSizeThreshold` to `onnxruntime_session_options_config_keys.h`
- Modify `ConstantFolding::ApplyImpl` to skip folding when the computed output exceeds the threshold
- Add `ConstantFoldingWithSizeThreshold` test to `graph_transform_test.cc`

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/68d58b05-bfab-48bc-9539-9d3ef5f4091e
Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
```cpp
// If a size threshold was configured, check whether any output tensor exceeds it.
```
@copilot you need to move the test before the computation happens and use the estimated size before computation
Done in 38c9da7. The check now runs before the computation: for each output NodeArg, the elem type and all concrete dim values are read from the TypeAsProto, and estimated_size = num_elements * elem_size is compared against the threshold before OptimizerExecutionFrame is created or the kernel is run. If any dim is symbolic/unknown the pre-check is skipped for that output (safe fallback).
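The pre-computation check described in this reply can be sketched as a standalone helper. This is an illustration only, not the actual onnxruntime code: `DimValue` and `EstimateOutputSize` are hypothetical stand-ins for reading dims and element size out of the real `NodeArg`/`TypeAsProto` accessors.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a shape dim read from a NodeArg's TypeAsProto:
// a concrete value >= 0, or -1 when the dim is symbolic/unknown.
using DimValue = int64_t;

// Estimate the byte size of an output tensor before running the kernel.
// Returns std::nullopt when any dim is symbolic, mirroring the "safe
// fallback" above: the pre-check is skipped and the node proceeds normally.
std::optional<size_t> EstimateOutputSize(const std::vector<DimValue>& dims,
                                         size_t elem_size) {
  size_t num_elements = 1;
  for (DimValue d : dims) {
    if (d < 0) return std::nullopt;  // symbolic/unknown dimension
    num_elements *= static_cast<size_t>(d);
  }
  return num_elements * elem_size;  // estimated_size = num_elements * elem_size
}
```

For example, a float tensor with static shape {2, 3, 4} yields 2 × 3 × 4 × 4 = 96 bytes, computed without ever allocating the tensor; any symbolic dim disables the pre-check for that output.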
```cpp
for (const OrtValue& ort_value : fetches) {
  if (ort_value.IsTensor()) {
    const size_t tensor_size = ort_value.Get<Tensor>().SizeInBytes();
    if (tensor_size > output_size_threshold) {
```
@copilot if inputs are deleted, you need to take that into account and only consider the increase
Done in dea8bad. The threshold check now computes the net memory increase: total_output_size - freed_input_size, where freed_input_size sums the sizes of input initializers that are exclusively consumed by this node (consumer count == 1). If the net increase ≤ threshold the node is folded; if > threshold it is skipped. For the Unsqueeze model, inputs are exclusively consumed (same size as outputs), so net = 0 and any non-zero threshold still allows folding. The updated test reflects this: Case 2 (1-byte threshold) now asserts 0 Unsqueeze remaining. Cases 3a/3b add a Tile graph with a genuine 788-byte net increase and verify the threshold blocks (100 bytes) or allows (1000 bytes) folding.
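The net-increase rule from this reply can be expressed as a small decision helper. A minimal sketch, assuming the caller has already summed the output sizes and the sizes of input initializers with exactly one consumer; the function name is illustrative, not the actual onnxruntime implementation.

```cpp
#include <cassert>
#include <cstddef>

// Decide whether folding a node stays within the configured threshold.
// Only the *net* memory increase counts: initializers consumed exclusively
// by this node (consumer count == 1) are freed after folding, so their
// total size is subtracted. A threshold of 0 means the check is disabled.
bool FoldingWithinThreshold(size_t total_output_size,
                            size_t freed_input_size,
                            size_t threshold) {
  if (threshold == 0) return true;                         // check disabled
  if (freed_input_size >= total_output_size) return true;  // net increase <= 0
  return (total_output_size - freed_input_size) <= threshold;
}
```

With the numbers from this thread: the Unsqueeze case (freed inputs equal the outputs in size, net = 0) folds under any non-zero threshold, while the Tile case (788-byte net increase) is blocked at 100 bytes and allowed at 1000 bytes.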
Per reviewer feedback, estimate output size from NodeArg type/shape info before running the kernel. This avoids wasted computation when the folded output would exceed the configured threshold.

- Read `elem_type` and concrete shape dims from each output NodeArg
- If all dims are known static values, compute `estimated_size = num_elements * elem_size`
- If `estimated_size > output_size_threshold`, skip the node before `OptimizerExecutionFrame` is created
- When the shape is unknown (symbolic dims), proceed with computation (no pre-check possible)
- Remove the old post-computation threshold check

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d9f1c887-e20e-4f28-b8f0-a7362922ce19
Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/505dd4fb-8403-4ebe-98cd-ec3a1ff16fd6 Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Description
Adds `kOrtSessionOptionsConfigConstantFoldingNodeWeightSizeThreshold` (`"session.constant_folding_node_weight_size_threshold"`), a session config option that caps the maximum net memory increase from constant-folding any single node. A node is skipped if `total_output_size - freed_input_size > threshold`, where `freed_input_size` accounts for input initializers that are exclusively consumed by the node being folded (and will be deleted after folding).

- `onnxruntime_session_options_config_keys.h`: New config key. Value is a non-negative decimal integer; `"0"` (default) disables the check, preserving existing behavior.
- `constant_folding.cc`: At the start of each `ApplyImpl` pass, reads the threshold once. For each candidate node, the estimated net size increase is computed from `NodeArg` type/shape info (element type × product of concrete dim values) before `OptimizerExecutionFrame` is created or any kernel is run. Input initializers with a single consumer are counted as freed. If the net increase exceeds the threshold, the node is skipped with an INFO-level log. When a dimension is symbolic or unknown, the pre-check is skipped and the node proceeds normally.
- `graph_transform_test.cc`: `ConstantFoldingWithSizeThreshold` verifies multiple scenarios: no threshold folds all nodes; a 1-byte threshold still folds Unsqueeze (net increase = 0, since inputs are exclusively consumed at equal size); a Tile graph with a 788-byte net increase is blocked by a 100-byte threshold and allowed by a 1000-byte threshold.

Example usage:
```cpp
SessionOptions so;
so.config_options.AddConfigEntry(
    kOrtSessionOptionsConfigConstantFoldingNodeWeightSizeThreshold,
    "1048576");  // 1 MB cap
```

Motivation and Context
Constant folding materializes computed tensors as graph initializers. Without a size limit, a single large folded node can dramatically increase the in-memory model size relative to the original. This threshold gives users control over the memory/optimization trade-off. The check accounts for inputs that will be freed after folding, so only the true net memory increase is compared against the threshold. The check is performed before computation to avoid wasting CPU and memory on tensors that will ultimately be discarded.