
Conversation

Contributor

@Stonepia Stonepia commented Nov 14, 2025

This fixes pytorch/pytorch#167253. It does the following:

  1. Use index_t instead of int and dispatch kernels accordingly. (follows [CUDA] Large max pool fix pytorch/pytorch#167427)
  2. Use NHWC when output > INT_MAX (follows cuda max_pool2d: switch to NHWC when output > INT_MAX to avoid overflow pytorch/pytorch#167322)
  3. Change other related dtypes (like num_wg) to index_t to avoid overflow; a minimal sketch of the dispatch follows after this list.
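
A minimal sketch of the index-type dispatch, under the assumption that dispatch keys off the 64-bit output element count (kernel and function names here are illustrative, not the exact ones in this PR):

// Illustrative only: the real kernels take tensor pointers, sizes and strides;
// this stub just reports which index type was selected.
#include <climits>
#include <cstdint>
#include <iostream>

template <typename index_t>
void launch_max_pool2d_kernel(int64_t output_numel) {
  std::cout << "indexing with " << sizeof(index_t) * 8 << "-bit integers for "
            << output_numel << " output elements\n";
}

void max_pool2d_dispatch(int64_t output_numel) {
  if (output_numel <= INT_MAX) {
    launch_max_pool2d_kernel<int32_t>(output_numel);  // cheap 32-bit indexing
  } else {
    launch_max_pool2d_kernel<int64_t>(output_numel);  // avoids int32 overflow
  }
}

int main() {
  max_pool2d_dispatch(int64_t(74) * 32 * 30090 * 40);  // > INT_MAX, picks int64_t
}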

Details

Test case:

import torch

x = torch.zeros(74, 32, 30090, 81, device=torch.device("xpu"), dtype=torch.bfloat16)
torch.nn.functional.max_pool2d(x, kernel_size=(1, 2), stride=(1, 2), ceil_mode=False, padding=0)

It will throw the error:

[MaxPool2d] Input shape: [74, 32, 30090, 81] output: [74, 32, 30090, 40]
[MaxPool2d] Strides: n=77993280 c=1 h=2592 w=32
[MaxPool2d] Memory format: ChannelsLast
[MaxPool2d Forward] ChannelsLast path: numBatch=74 numPlane=32 inputH=30090 inputW=81 outputH=30090 outputW=40 index_t=int64
[MaxPool2d Forward] Using vec_size=1 num_wg=-72057583935701024
Segmentation fault from GPU at 0xff00000c04e33000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 1 (PDE), access: 0 (Read), banned: 1, aborting.
Segmentation fault from GPU at 0xff00000c04e33000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 1 (PDE), access: 0 (Read), banned: 1, aborting.
Abort was called at 279 line in file:
./shared/source/os_interface/linux/drm_neo.cpp
[1]    77805 IOT instruction (core dumped)  python

As the log above shows, num_wg overflows to a negative value, which causes the segfault.
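
As a rough numerical illustration (the exact expression used to compute num_wg inside the kernel may differ), the output shape [74, 32, 30090, 40] of the test case above already overflows 32-bit indexing:

// Sketch: 74 * 32 * 30090 * 40 = 2,850,124,800 output elements, which does
// not fit in a signed 32-bit integer and wraps to a negative value.
#include <cstdint>
#include <iostream>

int main() {
  int64_t numel64 = int64_t(74) * 32 * 30090 * 40;   // 2850124800
  int32_t numel32 = static_cast<int32_t>(numel64);   // wraps to -1444842496
  std::cout << "int64: " << numel64 << "\n";
  std::cout << "int32: " << numel32 << "\n";
}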

@Stonepia Stonepia marked this pull request as ready for review November 17, 2025 08:24
Copilot AI review requested due to automatic review settings November 17, 2025 08:24
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes integer overflow issues in XPU max pooling operations on large tensors. It addresses a segmentation fault that occurred when the output element count exceeded INT_MAX by introducing index-type templating and automatic memory format selection.

Key Changes:

  • Introduced index_t template parameter (int32_t or int64_t) for kernel functors and functions to handle both small and large tensor sizes
  • Added validation functions can_use_int32_nhwc and can_use_int32_nchw to determine when int32 indexing is safe to use (a rough sketch follows after this list)
  • Automatically switches to ChannelsLast memory format when contiguous format would exceed int32 limits
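
A rough sketch of what such an int32-safety guard might look like, using a hypothetical name and a simplified check (the actual functions in the PR take tensor metadata and may bound different quantities, such as stride/size products):

// Sketch only, in the spirit of can_use_int32_nhwc / can_use_int32_nchw.
// Every linear index the kernel computes must fit in int32_t; the real
// checks may examine per-dimension offsets rather than raw element counts.
#include <climits>
#include <cstdint>

bool can_use_int32(int64_t input_numel, int64_t output_numel) {
  return input_numel <= INT_MAX && output_numel <= INT_MAX;
}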


Contributor

@EikanWang EikanWang left a comment


LGTM. Please address the copilot's comments.

  const vec_t* grad_output_vec = reinterpret_cast<const vec_t*>(gradOutput); \
  vec_t* grad_input_vec = reinterpret_cast<vec_t*>(gradInput); \
- auto kfn = MaxPool2dBackwardChannelLastVec<scalar_t, vec_t, vec_size>( \
+ auto kfn = MaxPool2dBackwardChannelLastVec<scalar_t, vec_t, vec_size, index_t>( \
Contributor


Pls. fix the code style.

Contributor Author


Thanks! Added in 474ac9d

@github-actions

Performance outliers, please check!

  • 🟡 [80%, 90%), may be fluctuations
| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
|---|---|---|---|
| torchbench_bfloat16_training | resnext50_32x4d | 0.934002 | 0.822647 |
| torchbench_bfloat16_training | squeezenet1_1 | 1.021592 | 0.832874 |
