Conversation

@botbigeyes (Contributor) commented Oct 30, 2025

PR Category
Operator

Type of Change
Bug Fix

Description
Fixes unsupported bool-type indices in index_put and index_put_: a single boolean mask index is now converted to coordinate indices via torch.where, and the values tensor is expanded or reshaped to match the number of selected elements.
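
A minimal before/after illustration (shapes and device are illustrative; assumes the usual flag_gems.enable() global patch):

    import torch
    import flag_gems

    flag_gems.enable()

    inp = torch.zeros(4, 4, device="cuda")
    mask = inp > -1.0                        # a torch.bool index selecting all 16 elements
    values = torch.ones(16, device="cuda")   # one value per True entry

    # A bool tensor as the sole index previously hit the unsupported path;
    # it is now converted to coordinate indices internally.
    out = torch.index_put(inp, (mask,), values)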

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a UT.

Performance
accumulate=False:

Operator: index_put  Performance Test (dtype=torch.float16, mode=kernel,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.370992            0.374848               0.990          [torch.Size([268435456]), [torch.Size([65536])], torch.Size([65536]), False]
SUCCESS               0.009920            0.009504               1.044          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([8])], torch.Size([8]), False]
SUCCESS               0.012768            0.009664               1.321          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([2, 8])], torch.Size([8]), False]
SUCCESS               0.010592            0.009888               1.071          [torch.Size([32, 32]), [torch.Size([2, 8])], torch.Size([32]), False]
SUCCESS               0.011936            0.011296               1.057          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([64])], torch.Size([64]), False]
SUCCESS               0.016768            0.011520               1.456          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([4, 64])], torch.Size([64]), False]
SUCCESS               0.013280            0.012208               1.088          [torch.Size([1024, 1024]), [torch.Size([4, 64])], torch.Size([1024]), False]
SUCCESS               0.188864            0.186736               1.011          [torch.Size([512, 512, 512]), [torch.Size([128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), False]
SUCCESS               0.195584            0.186304               1.050          [torch.Size([512, 512, 512]), [torch.Size([2, 128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), False]
SUCCESS               0.453216            0.228400               1.984          [torch.Size([512, 512, 512]), [torch.Size([2, 128])], torch.Size([512]), False]
Operator: index_put  Performance Test (dtype=torch.float32, mode=kernel,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.719808            0.724256               0.994          [torch.Size([268435456]), [torch.Size([65536])], torch.Size([65536]), False]
SUCCESS               0.009984            0.009536               1.047          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([8])], torch.Size([8]), False]
SUCCESS               0.012544            0.009440               1.329          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([2, 8])], torch.Size([8]), False]
SUCCESS               0.010976            0.010720               1.024          [torch.Size([32, 32]), [torch.Size([2, 8])], torch.Size([32]), False]
SUCCESS               0.013216            0.012800               1.033          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([64])], torch.Size([64]), False]
SUCCESS               0.017152            0.012864               1.333          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([4, 64])], torch.Size([64]), False]
SUCCESS               0.014560            0.013728               1.061          [torch.Size([1024, 1024]), [torch.Size([4, 64])], torch.Size([1024]), False]
SUCCESS               0.366272            0.364512               1.005          [torch.Size([512, 512, 512]), [torch.Size([128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), False]
SUCCESS               0.373088            0.363904               1.025          [torch.Size([512, 512, 512]), [torch.Size([2, 128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), False]
SUCCESS               0.625728            0.447040               1.400          [torch.Size([512, 512, 512]), [torch.Size([2, 128])], torch.Size([512]), False]
Operator: index_put  Performance Test (dtype=torch.bfloat16, mode=kernel,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.369008            0.374592               0.985          [torch.Size([268435456]), [torch.Size([65536])], torch.Size([65536]), False]
SUCCESS               0.009952            0.009536               1.044          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([8])], torch.Size([8]), False]
SUCCESS               0.013888            0.009408               1.476          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([2, 8])], torch.Size([8]), False]
SUCCESS               0.010912            0.009648               1.131          [torch.Size([32, 32]), [torch.Size([2, 8])], torch.Size([32]), False]
SUCCESS               0.011808            0.011424               1.034          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([64])], torch.Size([64]), False]
SUCCESS               0.015712            0.011680               1.345          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([4, 64])], torch.Size([64]), False]
SUCCESS               0.013312            0.012384               1.075          [torch.Size([1024, 1024]), [torch.Size([4, 64])], torch.Size([1024]), False]
SUCCESS               0.188736            0.186624               1.011          [torch.Size([512, 512, 512]), [torch.Size([128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), False]
SUCCESS               0.195600            0.186336               1.050          [torch.Size([512, 512, 512]), [torch.Size([2, 128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), False]
SUCCESS               0.453248            0.227648               1.991          [torch.Size([512, 512, 512]), [torch.Size([2, 128])], torch.Size([512]), False]

accumulate=False and bool:

Operator: index_put  Performance Test (dtype=torch.bool, mode=kernel,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.192096            0.196032               0.980          [torch.Size([268435456]), [torch.Size([65536])], torch.Size([65536]), False]
SUCCESS               0.009600            0.009392               1.022          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([8])], torch.Size([8]), False]
SUCCESS               0.012384            0.009376               1.321          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([2, 8])], torch.Size([8]), False]
SUCCESS               0.010560            0.009856               1.071          [torch.Size([32, 32]), [torch.Size([2, 8])], torch.Size([32]), False]
SUCCESS               0.010720            0.010464               1.024          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([64])], torch.Size([64]), False]
SUCCESS               0.014688            0.010752               1.366          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([4, 64])], torch.Size([64]), False]
SUCCESS               0.012448            0.010944               1.137          [torch.Size([1024, 1024]), [torch.Size([4, 64])], torch.Size([1024]), False]
SUCCESS               0.100544            0.098816               1.017          [torch.Size([512, 512, 512]), [torch.Size([128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), False]
SUCCESS               0.107808            0.098144               1.098          [torch.Size([512, 512, 512]), [torch.Size([2, 128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), False]
SUCCESS               0.361824            0.125248               2.889          [torch.Size([512, 512, 512]), [torch.Size([2, 128])], torch.Size([512]), False]

accumulate=True:

Operator: index_put  Performance Test (dtype=torch.float16, mode=kernel,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.498336            0.376224               1.325          [torch.Size([268435456]), [torch.Size([65536])], torch.Size([65536]), True]
SUCCESS               0.074368            0.010144               7.331          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([8])], torch.Size([8]), True]
SUCCESS               0.080256            0.012000               6.688          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([64])], torch.Size([64]), True]
SUCCESS               0.286240            0.187104               1.530          [torch.Size([512, 512, 512]), [torch.Size([128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), True]
SUCCESS               0.286640            0.187136               1.532          [torch.Size([512, 512, 512]), [torch.Size([2, 128]), torch.Size([2, 128]), torch.Size([2, 128])], torch.Size([2, 128]), True]
Operator: index_put  Performance Test (dtype=torch.float32, mode=kernel,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.852768            0.723968               1.178          [torch.Size([268435456]), [torch.Size([65536])], torch.Size([65536]), True]
SUCCESS               0.074016            0.009440               7.841          [torch.Size([32, 32]), [torch.Size([8]), torch.Size([8])], torch.Size([8]), True]
SUCCESS               0.081376            0.012896               6.310          [torch.Size([1024, 1024]), [torch.Size([64]), torch.Size([64])], torch.Size([64]), True]
SUCCESS               0.465376            0.364608               1.276          [torch.Size([512, 512, 512]), [torch.Size([128]), torch.Size([128]), torch.Size([128])], torch.Size([128]), True]
SUCCESS               0.464640            0.365104               1.273          [torch.Size([512, 512, 512]), [torch.Size([2, 128]), torch.Size([2, 128]), torch.Size([2, 128])], torch.Size([2, 128]), True]

@gemini-code-assist (Contributor)

Summary of Changes

Hello @botbigeyes, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug where boolean type indices were not properly handled by the index_put and index_put_ operations. The changes introduce robust logic to correctly interpret and apply boolean masks, converting them into explicit coordinate indices. This ensures that these essential tensor manipulation functions now fully support boolean indexing, enhancing their flexibility and preventing errors when users provide boolean tensors for selection.

Highlights

  • Boolean Indexing Support: Implemented support for boolean type indices in the index_put and index_put_ functions. This involves converting a single boolean mask into coordinate indices using torch.where and adjusting the values tensor to match the number of elements identified by the mask.
  • Value Tensor Reshaping: Added logic to handle the values tensor when boolean indexing is used. If values is a scalar, it's expanded to match the number of masked elements; otherwise, it's reshaped to ensure compatibility with the new indices (see the sketch below).
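
As a quick illustration of the mask-to-coordinates step (plain PyTorch, not the PR's code):

    import torch

    mask = torch.tensor([[True, False], [False, True]])
    rows, cols = torch.where(mask)  # coordinate indices of the True entries
    # rows -> tensor([0, 1]), cols -> tensor([0, 1]); one index tensor per dimension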

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes an issue with boolean type indices in index_put and index_put_. The approach of converting the boolean mask to indices using torch.where is correct. However, I've found a couple of issues in the implementation for handling the values tensor that could lead to runtime errors. I've also suggested a refactoring to reduce code duplication between the two functions. Overall, this is a good fix, but some improvements are needed for robustness and maintainability.

Comment on lines 268 to 273

        if values.numel() == 1 and values.shape == torch.Size([]):
            values = torch.full((K,), values.item(), dtype=inp.dtype, device=inp.device)
        else:
            if values.numel() != K:
                pass
            values = values.reshape((K,))
high

The logic for handling values when using a boolean mask has a couple of issues:

  1. The condition values.numel() == 1 and values.shape == torch.Size([]) is too restrictive. It only handles 0-dimensional scalar tensors, but not 1-element tensors like torch.tensor([5]). This will cause an error in the else branch if K != 1. The check should just be if values.numel() == 1:.
  2. The if values.numel() != K: pass is misleading. It suggests the case where the number of elements doesn't match is handled, but the following values.reshape((K,)) will fail with a RuntimeError. It's better to raise an explicit RuntimeError with a clear error message.
        if values.numel() == 1:
            values = torch.full((K,), values.item(), dtype=inp.dtype, device=inp.device)
        else:
            if values.numel() != K:
                raise RuntimeError(
                    f"Number of values to put ({values.numel()}) does not match "
                    f"the number of elements to put ({K})"
                )
            values = values.reshape((K,))

Comment on lines 305 to 310

        if values.numel() == 1 and values.shape == torch.Size([]):
            values = torch.full((K,), values.item(), dtype=inp.dtype, device=inp.device)
        else:
            if values.numel() != K:
                pass
            values = values.reshape((K,))
high

This block has the same issues as in the index_put function:

  1. The scalar check values.numel() == 1 and values.shape == torch.Size([]) is too restrictive.
  2. The if values.numel() != K: pass is misleading and will lead to a crash if the number of elements doesn't match.

Please see the comment on the index_put function for a detailed explanation and suggested fix.

        if values.numel() == 1:
            values = torch.full((K,), values.item(), dtype=inp.dtype, device=inp.device)
        else:
            if values.numel() != K:
                raise RuntimeError(
                    f"Number of values to put ({values.numel()}) does not match "
                    f"the number of elements to put ({K})"
                )
            values = values.reshape((K,))

Comment on lines 258 to 274

    if len(indices) == 1 and indices[0].dtype == torch.bool:
        mask = indices[0]

        if mask.device != inp.device:
            mask = mask.to(inp.device)

        indices = list(torch.where(mask))

        K = indices[0].numel()

        if values.numel() == 1 and values.shape == torch.Size([]):
            values = torch.full((K,), values.item(), dtype=inp.dtype, device=inp.device)
        else:
            if values.numel() != K:
                pass
            values = values.reshape((K,))

medium

This block of logic for handling boolean indices is duplicated in the index_put_ function (lines 295-311). To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider extracting this logic into a private helper function. This would also mean that bug fixes, like the ones for values handling, only need to be applied in one place.
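
One possible shape for such a helper, folding in the values fixes suggested above (the function name and exact placement are illustrative, not the PR's code):

    import torch

    def _canonicalize_bool_index(inp, indices, values):
        # Shared preprocessing for index_put and index_put_: turn a single
        # bool mask index into per-dimension coordinate indices.
        if len(indices) == 1 and indices[0].dtype == torch.bool:
            mask = indices[0].to(inp.device)
            indices = list(torch.where(mask))
            K = indices[0].numel()
            if values.numel() == 1:
                # Broadcast a scalar (0-d or 1-element) value to every selected slot.
                values = torch.full((K,), values.item(), dtype=inp.dtype, device=inp.device)
            elif values.numel() != K:
                raise RuntimeError(
                    f"Number of values to put ({values.numel()}) does not match "
                    f"the number of elements to put ({K})"
                )
            else:
                values = values.reshape((K,))
        return indices, values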

Comment on lines 270 to 273

        else:
            if values.numel() != K:
                pass
            values = values.reshape((K,))
Collaborator
Can we rewrite this as

Suggested change

    -        else:
    -            if values.numel() != K:
    -                pass
    -            values = values.reshape((K,))
    +        elif values.numel() == K:
    +            values = values.reshape((K,))

?

Comment on lines 307 to 310

        else:
            if values.numel() != K:
                pass
            values = values.reshape((K,))
Collaborator
The same applies here:

Suggested change

    -        else:
    -            if values.numel() != K:
    -                pass
    -            values = values.reshape((K,))
    +        elif values.numel() == K:
    +            values = values.reshape((K,))

Contributor Author
ok

@botbigeyes botbigeyes requested a review from sgjzfzzf November 4, 2025 07:19
sgjzfzzf previously approved these changes Nov 5, 2025
sgjzfzzf previously approved these changes Nov 6, 2025
@0x45f (Collaborator) commented Nov 11, 2025

Please add bool indices cases to the benchmark.

@0x45f (Collaborator) commented Nov 17, 2025

Please fix the op-test-quick-cpu CI.

@botbigeyes (Contributor Author)

> Please fix the op-test-quick-cpu CI.

But the tests passed on my local machine. Where could the issue be?
