@pytorchbot pytorchbot commented Aug 30, 2025

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #13815 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/315/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/315/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/314/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/315/orig
@diff-train-skip-merge

cc @SS-JIA @manuelcandales @cbilgin

ssjia added 2 commits August 29, 2025 17:33
…upgrade glslc

Pull Request resolved: #13814

## Motivation

Prepare for shaders that will use the accelerated int8 dot product functions from the GLSL integer dot product extension, e.g. `dotPacked4x8AccSatEXT`.
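
For reference, here is a rough CPU-side sketch of what a signed packed 4x8 dot product with saturating accumulate computes. It assumes the usual reading of the `GL_EXT_integer_dot_product` semantics: four signed 8-bit lanes per 32-bit word, with the dot product added to the accumulator using saturating arithmetic. The function names `unpack_s8` and `dot_packed_4x8_acc_sat_ref` are hypothetical and are not part of this PR.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: extract lane i (0..3) of a packed word and
// sign-extend it from 8 bits to 32 bits.
inline int32_t unpack_s8(uint32_t packed, int i) {
  return static_cast<int8_t>((packed >> (8 * i)) & 0xFFu);
}

// Hypothetical CPU reference for a signed packed 4x8 dot product with
// saturating accumulate. The sum is formed in 64 bits so that it cannot
// overflow, then clamped to the int32 range.
inline int32_t dot_packed_4x8_acc_sat_ref(uint32_t a, uint32_t b, int32_t acc) {
  int64_t sum = acc;
  for (int i = 0; i < 4; ++i) {
    sum += static_cast<int64_t>(unpack_s8(a, i)) * unpack_s8(b, i);
  }
  const int64_t lo = std::numeric_limits<int32_t>::min();
  const int64_t hi = std::numeric_limits<int32_t>::max();
  if (sum < lo) {
    sum = lo;
  }
  if (sum > hi) {
    sum = hi;
  }
  return static_cast<int32_t>(sum);
}
```

A reference like this could be used to validate an int8 shader on small inputs; the unsigned and mixed-signedness variants are not covered here.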

## Changes

* Query for support for the shader integer dot product extension when creating the VkPhysicalDevice
* Request the shader integer dot product extension when creating VkDevice
* Provide APIs to check if the extension is available in the current runtime.
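
The three steps above follow a common query, request, check pattern. The sketch below models that pattern without the Vulkan headers so that it stays self-contained. `DeviceInfo`, `request_if_available`, and `supports_int8_dot_product` are hypothetical names, not the APIs added by this PR, and the extension is assumed to be `VK_KHR_shader_integer_dot_product`. In a real backend the list of available extensions would come from the Vulkan driver rather than from a hand-built vector.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed name of the Vulkan device extension for integer dot products.
const std::string kIntegerDotProductExt = "VK_KHR_shader_integer_dot_product";

// Hypothetical stand-in for the backend's device bookkeeping.
struct DeviceInfo {
  std::vector<std::string> available_extensions;  // reported by the driver
  std::vector<std::string> enabled_extensions;    // requested at device creation
};

// Step 1: query whether a name appears in a list of extension names.
inline bool contains(
    const std::vector<std::string>& names,
    const std::string& name) {
  return std::find(names.begin(), names.end(), name) != names.end();
}

// Step 2: request the extension only if the device reports it, so that
// device creation does not fail on hardware that lacks it.
inline void request_if_available(DeviceInfo& info, const std::string& name) {
  if (contains(info.available_extensions, name) &&
      !contains(info.enabled_extensions, name)) {
    info.enabled_extensions.push_back(name);
  }
}

// Step 3: let callers check at runtime whether int8 dot product shaders
// can be used.
inline bool supports_int8_dot_product(const DeviceInfo& info) {
  return contains(info.enabled_extensions, kIntegerDotProductExt);
}
```

Operators could then fall back to a non-accelerated shader when the check returns false.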
ghstack-source-id: 306632732
@exported-using-ghexport

Differential Revision: [D81323427](https://our.internmc.facebook.com/intern/diff/D81323427/)
…ulkan operator testing to CI

Pull Request resolved: #13815

## Motivation

Provide an easy way to test and benchmark custom operators when developing them.

## Changes

Introduces a custom op test suite under `backends/vulkan/test/custom_ops`. Each operator will have its own test file, as seen in the next diff. `utils.[h|cpp]` define common utilities that can be used across test files.

To facilitate prototyping, C++ host code and prototype shaders can be placed under the `impl/` and `glsl/` folders, respectively.
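
As an illustration of how a per-operator test file might enumerate its cases, the sketch below builds the cross product of shapes, storage types, and dtypes that matches the test names in the sample output in this description (8 shapes x 2 storage types x 2 dtypes = 32 cases). The `TestCase` struct and `make_add_test_cases` function are hypothetical and do not reflect the actual helpers in `utils.[h|cpp]`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a single test case.
struct TestCase {
  std::string name;
  std::vector<int64_t> sizes;
  std::string storage;
  std::string dtype;
};

// Format sizes as "1x64x64".
inline std::string shape_str(const std::vector<int64_t>& sizes) {
  std::string s;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    if (i > 0) {
      s += "x";
    }
    s += std::to_string(sizes[i]);
  }
  return s;
}

// Build every shape/storage/dtype combination, in the same order as the
// sample output: Texture3D before Buffer, Float before Half.
inline std::vector<TestCase> make_add_test_cases() {
  const std::vector<std::vector<int64_t>> shapes = {
      {1, 64, 64},  {1, 128, 128}, {1, 256, 256}, {1, 512, 512},
      {1, 1, 1024}, {1, 1024, 1},  {32, 32, 32},  {16, 128, 64}};
  const std::vector<std::string> storages = {"Texture3D", "Buffer"};
  const std::vector<std::string> dtypes = {"Float", "Half"};

  std::vector<TestCase> cases;
  for (const auto& sizes : shapes) {
    for (const auto& storage : storages) {
      for (const auto& dtype : dtypes) {
        const std::string name =
            "Add_" + shape_str(sizes) + "_" + storage + "_" + dtype;
        cases.push_back(TestCase{name, sizes, storage, dtype});
      }
    }
  }
  return cases;
}
```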

Output of the test binary looks like:

```
=== Compute Shader Performance Benchmark ===
Add Operation Prototyping Framework
----------------------------------------------------------------------
Executing 32 test cases for Add
----------------------------------------------------------------------
Add_1x64x64_Texture3D_Float                                                                 [1x64x64]               3.094 μs                1.324 GFLOP/s     PASSED
Add_1x64x64_Texture3D_Half                                                                  [1x64x64]               2.574 μs                1.591 GFLOP/s    SKIPPED
Add_1x64x64_Buffer_Float                                                                    [1x64x64]               3.084 μs                1.328 GFLOP/s     PASSED
Add_1x64x64_Buffer_Half                                                                     [1x64x64]               2.668 μs                1.535 GFLOP/s    SKIPPED
Add_1x128x128_Texture3D_Float                                                             [1x128x128]               6.001 μs                2.730 GFLOP/s     PASSED
Add_1x128x128_Texture3D_Half                                                              [1x128x128]               4.004 μs                4.092 GFLOP/s    SKIPPED
Add_1x128x128_Buffer_Float                                                                [1x128x128]               6.074 μs                2.698 GFLOP/s     PASSED
Add_1x128x128_Buffer_Half                                                                 [1x128x128]               5.112 μs                3.205 GFLOP/s    SKIPPED
Add_1x256x256_Texture3D_Float                                                             [1x256x256]              17.852 μs                3.671 GFLOP/s     PASSED
Add_1x256x256_Texture3D_Half                                                              [1x256x256]              10.057 μs                6.517 GFLOP/s    SKIPPED
Add_1x256x256_Buffer_Float                                                                [1x256x256]              19.027 μs                3.444 GFLOP/s     PASSED
Add_1x256x256_Buffer_Half                                                                 [1x256x256]              15.330 μs                4.275 GFLOP/s    SKIPPED
Add_1x512x512_Texture3D_Float                                                             [1x512x512]              48.292 μs                5.428 GFLOP/s     PASSED
Add_1x512x512_Texture3D_Half                                                              [1x512x512]              26.832 μs                9.770 GFLOP/s    SKIPPED
Add_1x512x512_Buffer_Float                                                                [1x512x512]              48.828 μs                5.369 GFLOP/s     PASSED
Add_1x512x512_Buffer_Half                                                                 [1x512x512]              48.308 μs                5.427 GFLOP/s    SKIPPED
Add_1x1x1024_Texture3D_Float                                                               [1x1x1024]               2.376 μs                0.431 GFLOP/s     PASSED
Add_1x1x1024_Texture3D_Half                                                                [1x1x1024]               2.215 μs                0.462 GFLOP/s    SKIPPED
Add_1x1x1024_Buffer_Float                                                                  [1x1x1024]               2.402 μs                0.426 GFLOP/s     PASSED
Add_1x1x1024_Buffer_Half                                                                   [1x1x1024]               2.304 μs                0.445 GFLOP/s    SKIPPED
Add_1x1024x1_Texture3D_Float                                                               [1x1024x1]               6.120 μs                0.167 GFLOP/s     PASSED
Add_1x1024x1_Texture3D_Half                                                                [1x1024x1]               6.245 μs                0.164 GFLOP/s    SKIPPED
Add_1x1024x1_Buffer_Float                                                                  [1x1024x1]               2.392 μs                0.428 GFLOP/s     PASSED
Add_1x1024x1_Buffer_Half                                                                   [1x1024x1]               2.304 μs                0.445 GFLOP/s    SKIPPED
Add_32x32x32_Texture3D_Float                                                               [32x32x32]              10.249 μs                3.197 GFLOP/s     PASSED
Add_32x32x32_Texture3D_Half                                                                [32x32x32]               6.583 μs                4.978 GFLOP/s    SKIPPED
Add_32x32x32_Buffer_Float                                                                  [32x32x32]              10.468 μs                3.130 GFLOP/s     PASSED
Add_32x32x32_Buffer_Half                                                                   [32x32x32]               8.481 μs                3.864 GFLOP/s    SKIPPED
Add_16x128x64_Texture3D_Float                                                             [16x128x64]              26.000 μs                5.041 GFLOP/s     PASSED
Add_16x128x64_Texture3D_Half                                                              [16x128x64]              17.841 μs                7.347 GFLOP/s    SKIPPED
Add_16x128x64_Buffer_Float                                                                [16x128x64]              28.917 μs                4.533 GFLOP/s     PASSED
Add_16x128x64_Buffer_Half                                                                 [16x128x64]              28.792 μs                4.552 GFLOP/s    SKIPPED
```

`SKIPPED` means that correctness checking is not performed on that test case. This usually happens in one of the following cases:

* Input/output dtype is fp16. There is no fp16 dtype support in the reference calculation functions.
* Input sizes are too big. Since reference calculation functions are implemented in a naive manner, calculating reference data may take too long for large inputs. Larger test cases are usually meant to test performance, not correctness.
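
The two conditions above could be expressed as a small predicate, sketched below. The names `should_check_correctness` and `max_ref_numel` are hypothetical, and the size cutoff is left as a parameter because the PR does not state a threshold.

```cpp
#include <cstdint>

// Hypothetical dtype tag for a test case.
enum class DType { Float, Half };

// Returns true when the test case should be compared against reference
// data, and false when it should be reported as SKIPPED.
// max_ref_numel is a hypothetical cutoff; the real suite may decide this
// differently.
inline bool should_check_correctness(
    DType dtype,
    int64_t numel,
    int64_t max_ref_numel) {
  if (dtype == DType::Half) {
    return false;  // no fp16 support in the reference functions
  }
  if (numel > max_ref_numel) {
    return false;  // naive reference would take too long
  }
  return true;
}
```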
ghstack-source-id: 306632731
@exported-using-ghexport

Differential Revision: [D81323426](https://our.internmc.facebook.com/intern/diff/D81323426/)
@pytorch-bot added the `module: vulkan` label (Issues related to the Vulkan delegate and code under backends/vulkan/) on Aug 30, 2025

pytorch-bot bot commented Aug 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13835

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 6 New Failures, 19 Pending

As of commit 760d3a5 with merge base e2098f8 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla added the `CLA Signed` label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on Aug 30, 2025
Base automatically changed from gh/SS-JIA/314/orig to main August 30, 2025 13:18
@SS-JIA SS-JIA merged commit 1520f9f into main Aug 30, 2025
107 of 120 checks passed
@SS-JIA SS-JIA deleted the gh/SS-JIA/315/orig branch August 30, 2025 13:19