
@josancamon19

  • Backward pass kernel: 27 ms
  • Backward pass split into 2 passes: 20 ms; with autotune: 17 ms
  • Skip all future (fully masked) tiles: 10 ms
  • TMA tensor descriptors: 10 ms
  • Base-2 ops: 9.55 ms
  • Separate causal phases: 8.13 ms
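
For readers skimming the list above, here is a minimal pure-Python sketch of what three of the steps usually mean in a causal attention kernel: skipping tiles that lie entirely in the future, handling diagonal tiles in a separate phase from fully visible ones, and doing the softmax in base 2. It is not the kernel from this PR. The function names (`classify_tiles`, `softmax_base2`, `softmax_ref`) and the tile sizes are hypothetical, and it assumes the standard causal rule that query `i` attends to key `j` only when `j <= i`.

```python
import math

LOG2_E = math.log2(math.e)  # exp(x) == 2 ** (x * LOG2_E)


def classify_tiles(seq_len, block_q, block_k):
    """Group (query_start, key_start) tiles by how the causal mask hits them.

    Hypothetical helper for illustration only.
    """
    full, diagonal, skipped = [], [], []
    for qi in range(0, seq_len, block_q):
        q_lo, q_hi = qi, min(qi + block_q, seq_len) - 1
        for kj in range(0, seq_len, block_k):
            k_lo, k_hi = kj, min(kj + block_k, seq_len) - 1
            if k_lo > q_hi:
                # Every key is in the future of every query: no work needed.
                skipped.append((qi, kj))
            elif k_hi <= q_lo:
                # Every key is visible to every query: no mask needed.
                full.append((qi, kj))
            else:
                # Tile straddles the diagonal: needs the element-wise mask.
                diagonal.append((qi, kj))
    return full, diagonal, skipped


def softmax_base2(scores, scale):
    """Softmax computed with powers of 2; log2(e) is folded into the scale."""
    s2 = [x * scale * LOG2_E for x in scores]
    m = max(s2)  # subtract the max for numerical stability
    p = [2.0 ** (v - m) for v in s2]
    total = sum(p)
    return [v / total for v in p]


def softmax_ref(scores, scale):
    """Reference softmax using the natural exponential."""
    s = [x * scale for x in scores]
    m = max(s)
    p = [math.exp(v - m) for v in s]
    total = sum(p)
    return [v / total for v in p]
```

In this sketch the `full` tiles can run in one loop with no masking logic and the `diagonal` tiles in a second loop with it, which is one common reading of "separate causal phases". The base-2 form gives the same probabilities as the natural-exponent form up to floating-point rounding.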
