Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDA] Fix beam search of num_beams > 32 #23599

Merged
merged 3 commits into from
Feb 7, 2025

Conversation

tianleiwu
Copy link
Contributor

@tianleiwu tianleiwu commented Feb 6, 2025

Description

  • Pass topk_scores to beam scorer in slow topk path.
  • Add an env variable ORT_BEAM_SEARCH_USE_FAST_TOPK to enable/disable fast topk path.
  • Add a test case for slow topk path.

Motivation and Context

This bug was introduced in #16272

Beam search uses fast cuda kernel when number of beams <= 32. When beam size is larger than that threshold, we use another code path (slower cuda kernel) to get topk. In such slow topk path, topk_scores shall be passed to beam scorer but it is not.

This bug will cause incorrect result when num_beams > 32. It was not found previously since such large beam size is rarely used.

Copy link
Member

@RyanUnderhill RyanUnderhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, we made the original change assuming nobody would ever do a beam search of this size.

@tianleiwu tianleiwu merged commit 09e5724 into main Feb 7, 2025
98 checks passed
@tianleiwu tianleiwu deleted the tlwu/fix_beam_search_topk_scores branch February 7, 2025 00:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants