Xnnpack: Support clone.default with skip_dim_order=True #19797
With the default XNNPACK test config, skip_dim_order=False rewrites aten.clone.default to dim_order_ops._clone_dim_order.default. That path is already supported through CloneDimOrderConfig.

Some XNNPACK export flows use skip_dim_order=True, where aten.clone.default stays as aten.clone.default and is not selected by the partitioner.

Adds CloneConfig for dim-order-preserving aten.clone.default nodes so this path is partitioned directly.

This reduces delegate splits in the EdgeTAM mask decoder, where profiling exports use skip_dim_order=True.

Signed-off-by: Måns Nilsson <mans.nilsson@arm.com>
Change-Id: Ic48ec187f26048b68a805c6edd6dad41b3dab481
Pull request overview
Adds direct XNNPACK partitioning support for dim-order-preserving aten.clone.default when Edge export is run with skip_dim_order=True, reducing delegate segmentation in flows that bypass the dim-order rewrite.
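To illustrate why an unsupported clone increases delegate segmentation, here is a minimal sketch under a simplifying assumption: the graph is modeled as a linear sequence of ops, and each maximal run of supported ops becomes one delegate. The real partitioner operates on a graph rather than a list, and the op names and the `count_delegate_segments` helper are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def count_delegate_segments(ops, supported):
    """Count maximal runs of consecutive supported ops in a linear op sequence.

    Simplified model (an assumption): each run of supported ops maps to
    one delegate call.
    """
    return sum(
        1
        for is_supported, _ in groupby(ops, key=lambda op: op in supported)
        if is_supported
    )


# Illustrative op names, not taken from the EdgeTAM mask decoder.
ops = ["aten.conv2d", "aten.clone", "aten.relu"]
without_clone = {"aten.conv2d", "aten.relu"}
with_clone = without_clone | {"aten.clone"}

segments_before = count_delegate_segments(ops, without_clone)
segments_after = count_delegate_segments(ops, with_clone)
print(segments_before, segments_after)
```

In this toy model, the unsupported clone splits the sequence into two delegates, and supporting it merges them into one.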
Changes:
- Add CloneConfig to partition plain clone.default nodes when input/output dim_order matches.
- Register CloneConfig in the XNNPACK partitioner config set.
- Extend XNNPACK clone op serialization to explicitly define input/output tensors with quant params, and add a new test covering the skip_dim_order=True partitioning path.
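The "input/output dim_order matches" condition can be sketched in plain Python as follows. This is not the CloneConfig implementation from the PR, which presumably reads dim order from node metadata. It is a standalone illustration that derives a dim order from strides, and the helper names `dim_order_from_strides` and `clone_preserves_dim_order` are hypothetical. Ambiguous strides on size-1 dimensions are ignored here.

```python
def dim_order_from_strides(strides):
    """Return dimension indices sorted from outermost (largest stride) to innermost.

    sorted() is stable, so dimensions with equal strides keep ascending index order.
    """
    return tuple(sorted(range(len(strides)), key=lambda d: -strides[d]))


def clone_preserves_dim_order(in_strides, out_strides):
    """True when input and output share the same dim order (hypothetical check)."""
    return dim_order_from_strides(in_strides) == dim_order_from_strides(out_strides)


# Shape (2, 3, 4, 5): contiguous (NCHW) strides vs. channels-last strides.
contiguous = (60, 20, 5, 1)
channels_last = (60, 1, 15, 3)

print(dim_order_from_strides(contiguous))     # (0, 1, 2, 3)
print(dim_order_from_strides(channels_last))  # (0, 2, 3, 1)
print(clone_preserves_dim_order(contiguous, contiguous))
print(clone_preserves_dim_order(contiguous, channels_last))
```

Under this sketch, a clone that changes layout (contiguous to channels-last) fails the check and would be left out of the partition, while a layout-preserving clone passes.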
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| backends/xnnpack/test/ops/test_clone.py | Adds a test for skip_dim_order=True to ensure aten.clone.default is delegated (and proposes adding a negative/regression case). |
| backends/xnnpack/partition/config/generic_node_configs.py | Introduces CloneConfig to partition dim-order-preserving clone.default. |
| backends/xnnpack/partition/config/__init__.py | Exposes/registers CloneConfig in ALL_PARTITIONER_CONFIGS. |
| backends/xnnpack/operators/op_clone.py | Updates CloneVisitor tensor definition to use QuantParams for input/output tensors. |
```python
def test_fp32_clone_default_partitions_with_skip_dim_order(self):
    """Test plain aten.clone.default partitioning without dim-order rewrite."""
    inputs = (torch.randn(2, 3, 4, 5),)
    (
        Tester(self.Clone(), inputs)
        .export()
        .check_count({"torch.ops.aten.clone.default": 1})
        .to_edge_transform_and_lower(
            ToEdgeTransformAndLower(
                edge_compile_config=get_xnnpack_edge_compile_config(
                    skip_dim_order=True
                )
            )
        )
        .check_count({"torch.ops.higher_order.executorch_call_delegate": 1})
        .check_not(
            [
                "executorch_exir_dialects_edge__ops_aten_clone_default",
                "executorch_exir_dialects_edge__ops_dim_order_ops__clone_dim_order_default",
            ]
        )
        .to_executorch()
        .serialize()
        .run_method_and_compare_outputs()
    )
```
cc @GregoryComer @digantdesai @cbilgin @freddan80 @per @zingo @oscarandersson8218 @Sebastian-Larsson @robell @rascani