
Flash-BEATs#160

Merged
mil-ad merged 3 commits into v2-dev from flashbeats on Feb 19, 2026

@david-rx commented on Feb 19, 2026

This PR makes several changes to BEATs.

One functional change: parameters that were previously hardcoded are now added to the config.

Then, a batch of changes that modernize the BEATs code for efficiency:

  • Rework of attention. The previous implementation was old, hand-written, and based on fairseq. By deleting roughly 500 lines of custom attention code and replacing it with torch.nn.functional.scaled_dot_product_attention (SDPA), we get a ~30-40% peak memory reduction and ~1.2x throughput, depending on batch size.
  • GPU batching of sequential spectrogram creation: BEATs hides a for-loop of spectrogram creation inside the model. It uses a niche Kaldi filterbank method that doesn't support batching; I reimplemented it with batching and GPU support, which nearly doubled throughput.
  • Removed the unnecessary float32 upcast in the GELU activation (gelu(x.float()).type_as(x) → gelu(x)), which was slightly slowing down bf16 training.
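A minimal sketch of the attention swap in the first bullet, assuming (batch, heads, seq, head_dim) tensors: an explicit softmax(QKᵀ/√d + bias)V in the spirit of hand-written fairseq-style attention, next to the same computation through torch.nn.functional.scaled_dot_product_attention. This is not the PR's code; the function names and the random stand-in for a relative-position bias are hypothetical. Padding is folded into a float bias because SDPA adds a float attn_mask to the scaled scores, whereas a boolean attn_mask uses True to mean "may attend", the opposite of a typical key-padding mask.

```python
import math

import torch
import torch.nn.functional as F


def padding_bias(key_padding_mask: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
    """(batch, seq) bool mask, True = padded key -> additive bias of shape (batch, 1, 1, seq)."""
    bias = torch.zeros(key_padding_mask.shape, dtype=dtype, device=key_padding_mask.device)
    bias = bias.masked_fill(key_padding_mask, float("-inf"))
    return bias[:, None, None, :]  # broadcasts over heads and query positions


def manual_attention(q, k, v, attn_bias=None):
    """Explicit softmax(Q K^T / sqrt(d) + bias) V; materializes the full score matrix."""
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q * scale, k.transpose(-2, -1))  # (batch, heads, seq, seq)
    if attn_bias is not None:
        scores = scores + attn_bias
    return torch.matmul(torch.softmax(scores, dim=-1), v)


def sdpa_attention(q, k, v, attn_bias=None):
    """Same math via SDPA; a float attn_mask is added to the scaled scores."""
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias)


if __name__ == "__main__":
    torch.manual_seed(0)
    b, h, n, d = 2, 4, 10, 16
    q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
    pad = torch.zeros(b, n, dtype=torch.bool)
    pad[1, 7:] = True                          # last 3 keys of the 2nd example are padding
    rel_bias = 0.1 * torch.randn(1, h, n, n)   # hypothetical stand-in for a relative-position bias
    bias = padding_bias(pad) + rel_bias        # (batch, heads, seq, seq)
    torch.testing.assert_close(
        manual_attention(q, k, v, bias), sdpa_attention(q, k, v, bias), rtol=1e-4, atol=1e-5
    )
    print("manual and SDPA attention agree")
```

On CUDA with fp16/bf16 inputs, SDPA can dispatch to fused kernels (FlashAttention or memory-efficient attention) that avoid materializing the full seq×seq weight matrix, which is the usual source of peak-memory savings like those reported above. The FlashAttention backend does not accept an arbitrary attn_mask, so a call with a bias or padding mask may fall back to the memory-efficient or math backend.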

All changes were extensively benchmarked and compared against the original implementations. Model outputs matched closely at the attention layer, after spectrogram creation, and at the final model output with all changes in place. Finally, I confirmed that NatureLM training and validation loss progressed identically for 10k steps after the changes.
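To illustrate the batched-spectrogram bullet together with the kind of equivalence check described above, here is a minimal sketch, not the PR's implementation: a Kaldi-style log-mel filterbank that processes a whole (batch, time) waveform tensor at once, on whatever device the input lives on, then compared against a per-clip loop. It assumes the filterbank being replaced follows torchaudio.compliance.kaldi.fbank with settings along the lines of dither=0, snip_edges=True, DC-offset removal, 0.97 pre-emphasis, a "povey" window, a power spectrum and a 20 Hz low cutoff, and it assumes 16 kHz audio, 128 mel bins, 25 ms frames and a 10 ms shift. batched_fbank and mel_banks are hypothetical names, and any waveform scaling or feature normalization applied around the filterbank is omitted.

```python
import torch
import torch.nn.functional as F


def mel_scale(freq: torch.Tensor) -> torch.Tensor:
    """Kaldi mel scale: 1127 * ln(1 + f / 700)."""
    return 1127.0 * torch.log1p(freq / 700.0)


def mel_banks(num_bins: int, n_fft: int, sample_rate: float,
              low_freq: float = 20.0, high_freq: float = 0.0) -> torch.Tensor:
    """Triangular mel filters, shape (num_bins, n_fft // 2 + 1), in Kaldi's layout."""
    if high_freq <= 0.0:
        high_freq += 0.5 * sample_rate  # non-positive high_freq is an offset from Nyquist
    mel_low = mel_scale(torch.tensor(low_freq))
    mel_high = mel_scale(torch.tensor(high_freq))
    delta = (mel_high - mel_low) / (num_bins + 1)
    b = torch.arange(num_bins, dtype=torch.float32).unsqueeze(1)
    left, center, right = mel_low + b * delta, mel_low + (b + 1) * delta, mel_low + (b + 2) * delta
    fft_freqs = (sample_rate / n_fft) * torch.arange(n_fft // 2, dtype=torch.float32)
    mel = mel_scale(fft_freqs).unsqueeze(0)                # (1, n_fft // 2)
    up = (mel - left) / (center - left)
    down = (right - mel) / (right - center)
    banks = torch.clamp(torch.minimum(up, down), min=0.0)
    return F.pad(banks, (0, 1))                            # zero weight on the Nyquist bin


def batched_fbank(wave: torch.Tensor, sample_rate: int = 16000, num_mel_bins: int = 128,
                  frame_length_ms: float = 25.0, frame_shift_ms: float = 10.0,
                  preemph: float = 0.97) -> torch.Tensor:
    """Log-mel fbank for a float (batch, time) waveform; returns (batch, frames, num_mel_bins)."""
    win = int(sample_rate * frame_length_ms * 0.001)       # 400 samples at 16 kHz
    hop = int(sample_rate * frame_shift_ms * 0.001)        # 160 samples at 16 kHz
    n_fft = 1 << (win - 1).bit_length()                    # next power of two: 512
    frames = wave.unfold(-1, win, hop)                     # (batch, frames, win); drops the ragged tail
    frames = frames - frames.mean(dim=-1, keepdim=True)    # remove DC offset per frame
    prev = torch.cat([frames[..., :1], frames[..., :-1]], dim=-1)
    frames = frames - preemph * prev                       # pre-emphasis; first sample uses itself
    window = torch.hann_window(win, periodic=False, dtype=wave.dtype, device=wave.device).pow(0.85)
    spec = torch.fft.rfft(frames * window, n=n_fft).abs().pow(2)  # zero-padded power spectrum
    mel = spec @ mel_banks(num_mel_bins, n_fft, sample_rate).to(wave).T
    return mel.clamp_min(torch.finfo(torch.float32).eps).log()


if __name__ == "__main__":
    torch.manual_seed(0)
    wave = torch.randn(4, 16000)                           # 4 clips of 1 s at 16 kHz
    batched = batched_fbank(wave)
    looped = torch.stack([batched_fbank(w.unsqueeze(0))[0] for w in wave])
    torch.testing.assert_close(batched, looped, rtol=1e-4, atol=1e-4)
    print(batched.shape)  # torch.Size([4, 98, 128])
```

The per-clip for-loop collapses into one unfold, one rfft and one matmul over the whole batch. The loop comparison only shows that batching does not change per-clip results; matching the original Kaldi features requires a separate check against the reference implementation on real audio.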

@david-rx changed the title from "[DRAFT] Flashbeats" to "BEATs-flash" on Feb 19, 2026
@david-rx changed the title from "BEATs-flash" to "Flash-BEATs" on Feb 19, 2026
@david-rx marked this pull request as ready for review on February 19, 2026 at 10:25
@mil-ad merged commit f33dc9c into v2-dev on Feb 19, 2026
6 of 8 checks passed

2 participants