
[webgpu/js] Optimize resize webgpu op & fix precision issues #23591

Merged
3 commits merged into microsoft:main on Feb 6, 2025

Conversation

xenova
Contributor

@xenova xenova commented Feb 5, 2025

Description

This PR is a follow-up to #23488 and partially improves upon #23403. It does the following:

  • Prevents unnecessary shader recompilation (cache misses) for the 'nearest' resize operation.
  • Fixes precision (off-by-one) errors in the asymmetric coordinate transform. When running the Kokoro TTS model, the /decoder/decoder/generator/f0_upsamp/Resize_output_0 output differs at the end bounds due to a precision issue: dividing 21600 by 72 should give exactly 300, but seemingly yields ~299.999, which causes an off-by-one when flooring.
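The off-by-one can be sketched in plain JS (standing in for the WGSL shader, which is not reproduced here): with the asymmetric transform, the source coordinate is x_original = x_resized / scale and the source index is floor(x_original). WGSL only guarantees division to within about 2.5 ULP, so a quotient that should be exactly an integer can land a hair below it; the second half below simulates such a 1-ULP-low f32 result.

```javascript
// Asymmetric transform: x_original = x_resized / scale, source index = floor(x_original).
const scale = 21600 / 72;                      // exactly 300 in JS doubles

// Correctly rounded division: output index 300 maps to source index 1.
console.log(Math.floor(300 / scale));          // 1

// Simulate a GPU quotient that comes out 1 f32 ULP below the exact 1.0:
const gpuQuotient = Math.fround(1 - 2 ** -24); // 0.99999994...
console.log(Math.floor(gpuQuotient));          // 0 -- off by one
```

The snippet only illustrates why a sub-ULP division error flips the floored index at integer boundaries; it is not the PR's actual shader code.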

Motivation and Context

I did a deep dive over the weekend to try to fix Kokoro TTS on WebGPU and found that the above node had a large difference. Thinking this was a major issue, I spent some time fixing it. It turns out it only happens for a small number of values, leading to a high maximum error, while most values are correct (as seen here).

BEFORE:

[/decoder/decoder/generator/f0_upsamp/Resize_output_0] atol: 78.6640682220459 | rtol: 24.13991587587724 | avgDiff: 0.009967932171121087 | medianDiff: 0.000030517578125

AFTER:

[/decoder/decoder/generator/f0_upsamp/Resize_output_0] atol: 0.0011138916015625 | rtol: 0.0020059924232260704 | avgDiff: 0.00008570214675873825 | medianDiff: 0.000030517578125

So, although the bug has a very small impact on the final output (waveform) here, it could affect other models more severely.

BEFORE:

[waveform] atol: 0.04784199967980385 | rtol: 1366.0462001093495 | avgDiff: 0.0009544936942737713 | medianDiff: 0.00015346752479672432

AFTER:

[waveform] atol: 0.04775865003466606 | rtol: 1354.7002460360852 | avgDiff: 0.000954830244055033 | medianDiff: 0.00015274062752723694

@xenova xenova changed the title Optimize resize webgpu op & fix precision issues [webgpu/js] Optimize resize webgpu op & fix precision issues Feb 5, 2025
@guschmue
Contributor

guschmue commented Feb 5, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@guschmue
Contributor

guschmue commented Feb 5, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue
Contributor

guschmue commented Feb 5, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@guschmue
Contributor

guschmue commented Feb 5, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI

Azure Pipelines successfully started running 1 pipeline(s).

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@xenova
Contributor Author

xenova commented Feb 5, 2025

CI failed because I forgot to npm run format 🤦

@guschmue
Contributor

guschmue commented Feb 5, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

guschmue commented Feb 5, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@guschmue
Contributor

guschmue commented Feb 5, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@guschmue
Contributor

guschmue commented Feb 5, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI

Azure Pipelines successfully started running 1 pipeline(s).

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@guschmue
Contributor

guschmue commented Feb 6, 2025

one unit test is unhappy:
[screenshot of the failing test omitted]

That test is part of the ONNX test suite. You should be able to run it locally:

cd js/web
npm test -- model test_resize_downsample_linear -b=webgpu --wasm-number-threads 1

Yeah, a bit painful to make our CI happy :)

@xenova
Contributor Author

xenova commented Feb 6, 2025

4c6617a should fix it.

@satyajandhyala I discovered two test cases for which the previous code would produce incorrect results, and I think it could be a good idea to add these to the CI. However, I'm unsure how to do this myself.

(here's the model, same as the one in the opset10 test data).
test_resize_downsample_linear.zip

Test 1: Fractional upscale (e.g., 2.6)

const feeds = {
  X: new ort.Tensor(
    'float32',
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 1, 2, 4],
  ),
  scales: new ort.Tensor(
    'float32',
    [1, 1, /* downscale */ 0.6, /* upscale */ 2.6], [4],
  )
};

Test 2: Specific integer upscale leading to floating-point issues (300 in this case). A linear search reveals the smallest number for which this happens is 37; it also happens for 41, 47, 55, 66, and many more.

const feeds = {
  X: new ort.Tensor(
    'float32',
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 1, 2, 4],
  ),
  scales: new ort.Tensor(
    'float32',
    [1, 1, /* downscale */ 0.6, /* upscale */ 300], [4],
  )
};
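For reference, a quick sketch of the output shapes the two feeds above should produce, assuming the standard ONNX Resize sizing rule output_dim = floor(input_dim * scale) (the helper name here is illustrative, not from the PR):

```javascript
// Expected output shapes for the two test feeds, assuming the ONNX Resize
// rule output_dim = floor(input_dim * scale).
const dims = [1, 1, 2, 4];
const outShape = (scales) => dims.map((d, i) => Math.floor(d * scales[i]));

console.log(outShape([1, 1, 0.6, 2.6])); // [ 1, 1, 1, 10 ]
console.log(outShape([1, 1, 0.6, 300])); // [ 1, 1, 1, 1200 ]
```

A fractional scale and a large integer scale exercise different floor boundaries along the resized axis, which is presumably why both cases are worth covering in CI.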

@guschmue
Contributor

guschmue commented Feb 6, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

guschmue commented Feb 6, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@guschmue
Contributor

guschmue commented Feb 6, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@guschmue
Contributor

guschmue commented Feb 6, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI

Azure Pipelines successfully started running 1 pipeline(s).

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@guschmue guschmue merged commit d981b15 into microsoft:main Feb 6, 2025
44 checks passed