[gemma 3/4] Fix bidirectional attention masking crossing sliding window boundaries #46850

Open
douglas-reid wants to merge 4 commits into huggingface:main from douglas-reid:fix/attention-masking-sliding-window-bidirectional

@douglas-reid

Contributor

What does this PR do?

This PR fixes attention masking for image tokens in Gemma 3/4 models so that bidirectional attention no longer crosses sliding window boundaries. The change is needed to match the intended behavior of the local (sliding-window) layers.
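
To illustrate the idea, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `local_layer_mask`, the `image_group_ids` encoding (-1 for text tokens, a block index for image tokens), the `respect_window` flag, and the exact window convention (`q - k < sliding_window`, applied symmetrically inside image blocks) are all assumptions made for illustration.

```python
def local_layer_mask(image_group_ids, sliding_window, respect_window=True):
    """Return mask[q][k] == True where query q may attend to key k.

    Hypothetical helper, not the transformers implementation.
    image_group_ids: one entry per token; -1 for text tokens, otherwise
    the index of the image block the token belongs to (assumed encoding).
    respect_window: if False, image tokens attend to their whole block
    regardless of distance (the behavior this sketch treats as the bug).
    """
    n = len(image_group_ids)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            # Standard local-layer rule: causal and within the window.
            causal_local = k <= q and q - k < sliding_window
            # Tokens of the same image block may attend in both directions.
            same_image = (
                image_group_ids[q] != -1
                and image_group_ids[q] == image_group_ids[k]
            )
            if respect_window:
                # Assumed fix: bidirectional attention is also window-limited.
                bidirectional = same_image and abs(q - k) < sliding_window
            else:
                bidirectional = same_image
            mask[q][k] = causal_local or bidirectional
    return mask


# One text token, a four-token image block, then one text token.
ids = [-1, 0, 0, 0, 0, -1]
fixed = local_layer_mask(ids, sliding_window=2)
unfixed = local_layer_mask(ids, sliding_window=2, respect_window=False)
```

In this sketch, `unfixed[1][4]` is allowed even though the two image tokens are three positions apart, while `fixed[1][4]` is masked out because the distance exceeds the window.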

  • I confirm that this is not a pure code agent PR.

Before submitting

  • Did you read the contributor guideline and the
    Pull Request checks?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes according to the guidelines?
  • Did you write any new necessary tests?

@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3, gemma4, gemma4_unified

@github-actions

Contributor

CI Dashboard: View test results in Grafana
