fix loading of BEATs checkpoints #162

Merged
GaganNarula merged 2 commits into main from beats-fix
Feb 26, 2026

Conversation

@david-rx
Collaborator

Fixes two critical bugs in loading of BEATs checkpoints:

  1. If `pretrained` was set to `False` to load a fine-tuned checkpoint, including the official sl_beats models, the config was initialized without checking the checkpoint's stored config. This caused official checkpoints to load with `deep_norm=False` when it should have been `True`, producing random-level embeddings.
  2. `process_state_dict_` was not compatible with the original Microsoft BEATs checkpoints.

To test, I confirmed that loaded BEATs models went from near-random species-clustering performance on Xeno-canto to the expected strong performance.
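A minimal sketch of the first bug, using hypothetical names (the repo's actual config class and helpers differ): the model config should be built from the cfg stored inside the checkpoint, not from library defaults, so flags such as `deep_norm` match the weights being loaded.

```python
# Sketch only: DEFAULT_CFG and build_config are illustrative placeholders,
# not the repo's real API.
DEFAULT_CFG = {"deep_norm": False, "encoder_layers": 12}

def build_config(checkpoint: dict) -> dict:
    """Start from defaults, then override with the checkpoint's own cfg.

    Before the fix, the defaults were used unconditionally when
    pretrained=False, silently dropping flags like deep_norm=True.
    """
    cfg = dict(DEFAULT_CFG)
    cfg.update(checkpoint.get("cfg", {}))
    return cfg

# An sl_beats-style checkpoint whose weights were trained with deep_norm=True:
ckpt = {"cfg": {"deep_norm": True}, "model": {}}
assert build_config(ckpt)["deep_norm"] is True  # defaults alone would give False
```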

@david-rx david-rx added the bug Something isn't working label Feb 26, 2026
drop_model_prefix=not target_has_model_prefix,
)

# Adapt backbone. prefix when checkpoint and model disagree
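The prefix-adaptation idea in the diff above can be sketched roughly as follows. `adapt_prefix` is a hypothetical stand-in for the repo's actual helper, and the real code presumably scopes which keys are rewritten; this sketch applies the rule to every key.

```python
# Sketch (hypothetical name): reconcile key prefixes when a checkpoint's
# state dict and the target model disagree, e.g. "backbone." present on
# only one side.
def adapt_prefix(state_dict: dict, prefix: str, target_has_prefix: bool) -> dict:
    out = {}
    for key, value in state_dict.items():
        has = key.startswith(prefix)
        if has and not target_has_prefix:
            key = key[len(prefix):]   # drop the prefix the target lacks
        elif not has and target_has_prefix:
            key = prefix + key        # add the prefix the target expects
        out[key] = value
    return out

sd = {"backbone.encoder.weight": 1, "head.weight": 2}
# With target_has_prefix=False, "backbone." is stripped; other keys pass through.
assert adapt_prefix(sd, "backbone.", target_has_prefix=False) == {
    "encoder.weight": 1,
    "head.weight": 2,
}
```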
Contributor

This looks like it will affect all models? The tests are all passing, but when I look at test_api_integration.py or test_api_load.py, it looks like we either test just BEATs loading or mock models. I think we should test loading all models. Ideally, we should also add some samples from the Xeno-canto validation set (or something similar) with known labels, pass them through the models on CPU, get the top-3 logits, and make sure they are as expected.
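The suggested regression test could be sketched like this. Everything here is a hypothetical placeholder (the stand-in model, the sample, the expected labels); a real version would load each registered model and real Xeno-canto samples.

```python
# Hedged sketch of the suggested integration test: run known samples through
# each model on CPU and compare top-3 predictions against expected labels.
def top3_indices(logits: list[float]) -> list[int]:
    """Indices of the three largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:3]

def check_model(model_fn, sample, expected_top3: list[int]) -> bool:
    """True when the model's top-3 class indices match the known labels."""
    return top3_indices(model_fn(sample)) == expected_top3

# Stand-in "model" producing fixed logits for a known sample:
fake_model = lambda sample: [0.1, 2.3, 0.7, 1.9, 0.2]
assert check_model(fake_model, sample=None, expected_top3=[1, 3, 2])
```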

Contributor

☝️ this

Collaborator Author

That seems like a good idea. By the way, the reason I consider it relatively safe is that it only happens when the keys don't agree.

Collaborator Author

I will run that test.

Collaborator Author

Ran this important test: for a sample of Xeno-canto examples, I evaluated the models' embeddings by clustering, both after this fix and at the previous commit. For all the non-BEATs models I tested (EffNet-based, EAT-based) the results were identical. For all BEATs models, they dramatically improved.

However, the new logging and the test revealed an additional issue with the raw SSL EAT models: they are not being successfully loaded, due to a state dict mismatch that causes a fallback to the original EAT checkpoint. This existed before this change and is still there after it. It is marginally less urgent because I don't know whether the raw SSL EAT models are being used by anyone (the sl_eat models we fine-tuned are unaffected). I would propose following that up in a separate PR.

Contributor

I created an issue referencing this discussion for a future fix.

try:
config_checkpoint_path = _get_beats_checkpoint_path(use_naturelm=False, fine_tuned=fine_tuned)
beats_ckpt = universal_torch_load(config_checkpoint_path, cache_mode="use", map_location="cpu")
beats_cfg = BEATsConfig(**beats_ckpt["cfg"])
Contributor

Is this exact problem of a mismatch between a default config and a checkpoint config applicable to other models as well? I don't see it happening for EfficientNet, but maybe EAT?

Collaborator Author

Based on results reported by Eklavya, EAT is working, but this is worth a check (I think independently of this PR).

Collaborator

I used EfficientNet and its variants for my work over the past few days, and I got the same scores as the "what matters" paper. For EATs I extracted the embeddings but don't have all the results yet; from what I have, they look normal-ish. I can report back as soon as I have all of them (maybe 30 minutes from now?).

Collaborator

Here are the EAT accuracy results (my new results vs. the "what matters" paper):

| Model | CBI | Bats | Dogs |
| --- | --- | --- | --- |
| EAT-Bio | 23.95 vs 33.00 | 41.20 vs 63.90 | 71.94 vs 86.30 |
| EAT-All | 23.95 vs 32.60 | 42.95 vs 65.50 | 69.06 vs 75.50 |
| SL-EAT-Bio | 78.34 vs 81.80 | 62.85 vs 65.70 | 86.33 vs 87.10 |
| SL-EAT-All | 67.65 vs 75.50 | 50.75 vs 65.00 | 76.26 vs 86.30 |

SL-EAT-Bio seems close to the paper (within 1-3%). But the other EATs seem quite off, especially on CBI and Bats. 😞

# Falls back to BEATsConfig() defaults when the checkpoint registry
# is unavailable (e.g. in isolated unit tests).
try:
config_checkpoint_path = _get_beats_checkpoint_path(use_naturelm=False, fine_tuned=fine_tuned)
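The fallback behavior described in the comment above can be sketched as follows, with `resolve_cfg` as a hypothetical stand-in for the surrounding code: read the cfg stored in the reference checkpoint, but fall back to defaults when the checkpoint registry is unreachable (e.g. in isolated unit tests).

```python
# Sketch only: resolve_cfg is illustrative, not the repo's real function.
def resolve_cfg(load_checkpoint, defaults: dict) -> dict:
    try:
        ckpt = load_checkpoint()  # may raise if the registry is unavailable
        return {**defaults, **ckpt["cfg"]}
    except Exception:
        return dict(defaults)     # BEATsConfig()-style defaults

# Registry reachable: checkpoint cfg wins over defaults.
assert resolve_cfg(lambda: {"cfg": {"deep_norm": True}},
                   {"deep_norm": False}) == {"deep_norm": True}
```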
Contributor

Should use_naturelm be fixed to False here because it's not pretrained=True? Just checking.

Contributor

@GaganNarula left a comment

I have some comments, most importantly about the tests. I'm not sure we properly test that this doesn't introduce breaking changes.

Contributor

@GaganNarula left a comment

I'm approving this for now, but we do need the tests David ran as official integration tests. Made issue #164.

@GaganNarula GaganNarula merged commit c837da1 into main Feb 26, 2026
2 checks passed