
Add size threshold to prevent constant folding from inflating model memory footprint#28204

Draft
Copilot wants to merge 6 commits into main from copilot/add-threshold-for-constant-folding

Conversation

Contributor

Copilot AI commented Apr 23, 2026

Description

Adds kOrtSessionOptionsConfigConstantFoldingNodeWeightSizeThreshold ("session.constant_folding_node_weight_size_threshold") — a session config option that caps the maximum net memory increase from constant-folding any single node. A node is skipped if total_output_size - freed_input_size > threshold, where freed_input_size accounts for input initializers that are exclusively consumed by the node being folded (and will be deleted after folding).

  • onnxruntime_session_options_config_keys.h: New config key. Value is a non-negative decimal integer; "0" (default) disables the check, preserving existing behavior.
  • constant_folding.cc: At the start of each ApplyImpl pass, reads the threshold once. For each candidate node, the estimated net size increase is computed from NodeArg type/shape info (element type × product of concrete dim values) before OptimizerExecutionFrame is created or any kernel is run. Input initializers with a single consumer are counted as freed. If the net increase exceeds the threshold, the node is skipped with an INFO-level log. When a dimension is symbolic or unknown, the pre-check is skipped and the node proceeds normally.
  • graph_transform_test.cc: ConstantFoldingWithSizeThreshold — verifies multiple scenarios: no threshold folds all nodes; a 1-byte threshold still folds Unsqueeze (net increase = 0 since inputs are exclusively consumed at equal size); a Tile graph with a 788-byte net increase is blocked by a 100-byte threshold and allowed by a 1000-byte threshold.

Example usage:

SessionOptions so;
so.config_options.AddConfigEntry(
    kOrtSessionOptionsConfigConstantFoldingNodeWeightSizeThreshold, "1048576"); // 1 MB cap

Motivation and Context

Constant folding materializes computed tensors as graph initializers. Without a size limit, a single large folded node can dramatically increase the in-memory model size relative to the original. This threshold gives users control over the memory/optimization trade-off. The check accounts for inputs that will be freed after folding, so only the true net memory increase is compared against the threshold. The check is performed before computation to avoid wasting CPU and memory on tensors that will ultimately be discarded.

Copilot AI and others added 2 commits April 23, 2026 12:18
- Add kOrtSessionOptionsConfigConstantFoldingNodeWeightSizeThreshold to onnxruntime_session_options_config_keys.h
- Modify ConstantFolding::ApplyImpl to skip folding when computed output exceeds the threshold
- Add ConstantFoldingWithSizeThreshold test to graph_transform_test.cc

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/68d58b05-bfab-48bc-9539-9d3ef5f4091e

Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add a threshold to disable constant folding when memory exceeds limit" to "Add size threshold to prevent constant folding from inflating model memory footprint" on Apr 23, 2026
Copilot AI requested a review from xadupre on Apr 23, 2026 at 12:21
}
}

// If a size threshold was configured, check whether any output tensor exceeds it.
Member


@copilot you need to move the check before the computation happens and use the estimated size rather than the computed tensors

Contributor Author


Done in 38c9da7. The check now runs before the computation: for each output NodeArg, the elem type and all concrete dim values are read from the TypeAsProto, and estimated_size = num_elements * elem_size is compared against the threshold before OptimizerExecutionFrame is created or the kernel is run. If any dim is symbolic/unknown the pre-check is skipped for that output (safe fallback).
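The pre-check described in that reply can be sketched roughly as follows. This is an illustrative sketch, not the actual constant_folding.cc code: `TryEstimateTensorSize` is a hypothetical helper, and a symbolic/unknown dimension is modeled here as a negative value, which makes the estimate unavailable so the caller falls back to folding without a pre-check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Estimate a tensor's byte size from its static shape and element size.
// Returns std::nullopt when any dim is symbolic/unknown (modeled as < 0),
// mirroring the "skip the pre-check and proceed normally" fallback.
std::optional<size_t> TryEstimateTensorSize(const std::vector<int64_t>& dims,
                                            size_t elem_size) {
  size_t num_elements = 1;
  for (int64_t d : dims) {
    if (d < 0) return std::nullopt;  // symbolic/unknown dim: no estimate
    num_elements *= static_cast<size_t>(d);
  }
  return num_elements * elem_size;  // estimated_size = num_elements * elem_size
}
```

For example, a float tensor of shape [2, 3, 4] estimates to 24 × 4 = 96 bytes, while any shape containing a symbolic dim yields no estimate at all.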

for (const OrtValue& ort_value : fetches) {
  if (ort_value.IsTensor()) {
    const size_t tensor_size = ort_value.Get<Tensor>().SizeInBytes();
    if (tensor_size > output_size_threshold) {
Member


@copilot if inputs are deleted, you need to take that into account and only consider the net increase

Contributor Author


Done in dea8bad. The threshold check now computes the net memory increase: total_output_size - freed_input_size, where freed_input_size sums the sizes of input initializers that are exclusively consumed by this node (consumer count == 1). If the net increase ≤ threshold the node is folded; if > threshold it is skipped. For the Unsqueeze model, inputs are exclusively consumed (same size as outputs), so net = 0 and any non-zero threshold still allows folding. The updated test reflects this: Case 2 (1-byte threshold) now asserts 0 Unsqueeze remaining. Cases 3a/3b add a Tile graph with a genuine 788-byte net increase and verify the threshold blocks (100 bytes) or allows (1000 bytes) folding.
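The net-increase rule from that reply can be sketched as below. Names are hypothetical, not the actual onnxruntime code; the behavior matches what the PR describes: a threshold of 0 disables the check, and only output growth beyond the freed exclusively-consumed inputs counts against the threshold.

```cpp
#include <cassert>
#include <cstddef>

// Decide whether a node may be constant-folded under the size threshold.
// Illustrative sketch only; signature and name are hypothetical.
bool ShouldFold(size_t total_output_size, size_t freed_input_size,
                size_t threshold) {
  if (threshold == 0) return true;  // "0" (default) disables the check
  // Input initializers exclusively consumed by this node are deleted after
  // folding, so only the net memory growth is compared to the threshold.
  const size_t net_increase = total_output_size > freed_input_size
                                  ? total_output_size - freed_input_size
                                  : 0;
  return net_increase <= threshold;
}
```

This mirrors the test cases above: the Unsqueeze graph (net increase 0) still folds under a 1-byte threshold, while a 788-byte net increase is blocked by a 100-byte threshold and allowed by a 1000-byte one.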

Per reviewer feedback, estimate output size from NodeArg type/shape info
before running the kernel. This avoids the wasted computation when the
folded output would exceed the configured threshold.

- Read elem_type and concrete shape dims from each output NodeArg
- If all dims are known static values, compute estimated_size = num_elements * elem_size
- If estimated_size > output_size_threshold, skip node before OptimizerExecutionFrame is created
- When shape is unknown (symbolic dims), proceed with computation (no pre-check possible)
- Remove the old post-computation threshold check

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/d9f1c887-e20e-4f28-b8f0-a7362922ce19

Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

add a threshold above which constant folding is disabled

2 participants