Conversation
| drop_model_prefix=not target_has_model_prefix,
| )
|
|
| # Adapt backbone. prefix when checkpoint and model disagree
This looks like it will affect all models? The tests are all passing, but when I look at test_api_integration.py or test_api_load.py, it seems we only test BEATs loading or mock models. I think we should test loading all models. Ideally, we should also add some samples from the xeno-canto validation set (or something similar with known labels), pass them through the models on CPU, get the top-3 logits, and make sure they match expected values.
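The suggested check could look roughly like the sketch below. This is a hypothetical illustration only: the helper name, the fake logits, and the expected labels are all made up; a real test would run actual xeno-canto clips through each loaded model on CPU.

```python
# Hypothetical sketch of the suggested integration check.
# In a real test, `logits` would come from a forward pass of each model.
def top3_indices(logits):
    """Return the indices of the three largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:3]

# Fake logits for a 5-class model and the indices we expect to rank top-3.
logits = [0.1, 2.3, -0.5, 1.7, 0.9]
expected_top3 = [1, 3, 4]
assert top3_indices(logits) == expected_top3
```

Comparing only the top-3 indices (rather than exact logit values) keeps such a test robust to small numerical differences across platforms.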
That seems like a good idea. Btw, the reason I consider it relatively safe is that it only kicks in when the checkpoint keys and the model keys don't agree.
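The "only when the keys don't agree" behavior can be sketched as follows. The function and key names here are illustrative stand-ins, not the actual implementation: when both sides already agree on the prefix, the state dict passes through untouched.

```python
# Illustrative sketch (not the real code): strip the checkpoint's
# "backbone." prefix only when the target model's keys lack it.
def adapt_prefix(state_dict, model_keys, prefix="backbone."):
    ckpt_has_prefix = all(k.startswith(prefix) for k in state_dict)
    model_has_prefix = all(k.startswith(prefix) for k in model_keys)
    if ckpt_has_prefix and not model_has_prefix:
        return {k[len(prefix):]: v for k, v in state_dict.items()}
    return state_dict  # keys already agree: leave untouched

ckpt = {"backbone.conv.weight": 1, "backbone.conv.bias": 2}
adapted = adapt_prefix(ckpt, model_keys=["conv.weight", "conv.bias"])
assert set(adapted) == {"conv.weight", "conv.bias"}
```

Models whose checkpoints already match their key layout take the no-op branch, which is why the change should be a no-op for non-BEATs models.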
I will run that test
I ran this important test: for a sample of xeno-canto examples, I evaluated the models' embeddings by clustering, both after this fix and at the previous commit. For all the non-BEATs models I tested (EffNet-based and EAT-based), the results were identical. For all BEATs models, they improved dramatically.
However, the new logging and the test revealed an additional issue with the raw SSL EAT models: they are not being loaded successfully due to a state-dict mismatch, causing a fallback to the original EAT checkpoint. This existed before this change and is still there after it. It's marginally less urgent because I don't know whether the "raw" SSL EAT models are being used by anyone (the `sl_eat` models we fine-tuned are unaffected). So I would propose following that up in a separate PR.
I created an issue referencing this discussion for a future fix.
| try:
|     config_checkpoint_path = _get_beats_checkpoint_path(use_naturelm=False, fine_tuned=fine_tuned)
|     beats_ckpt = universal_torch_load(config_checkpoint_path, cache_mode="use", map_location="cpu")
|     beats_cfg = BEATsConfig(**beats_ckpt["cfg"])
Is this exact problem, a mismatch between a default config and a checkpoint config, applicable to other models as well? I don't see it happening for EfficientNet, but maybe EAT?
Based on reported results from Eklavya, EAT is working, but this is worth checking (independently from this PR, I think).
I used EfficientNet and its variants for my work just over the past few days, and I got the same scores as the "what matters" paper. For the EATs I extracted the embeddings but don't have all the results yet; from what I have, they look normal-ish. I can report back as soon as I have all of them (maybe 30 minutes from now?).
Here are the EAT accuracy results (my new results vs. the "what matters" paper):
| Model | CBI | Bats | Dogs |
|---|---|---|---|
| EAT-Bio | 23.95 vs 33.00 | 41.20 vs 63.90 | 71.94 vs 86.30 |
| EAT-All | 23.95 vs 32.60 | 42.95 vs 65.50 | 69.06 vs 75.50 |
| SL-EAT-Bio | 78.34 vs 81.80 | 62.85 vs 65.70 | 86.33 vs 87.10 |
| SL-EAT-All | 67.65 vs 75.50 | 50.75 vs 65.00 | 76.26 vs 86.30 |
SL-EAT-Bio seems close to the paper (within 1-3%), but the other EATs seem quite off, especially on CBI and Bats. 😞
| # Falls back to BEATsConfig() defaults when the checkpoint registry
| # is unavailable (e.g. in isolated unit tests).
| try:
|     config_checkpoint_path = _get_beats_checkpoint_path(use_naturelm=False, fine_tuned=fine_tuned)
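The fallback pattern in the diff above can be sketched generically. Note that `DEFAULTS`, `load_checkpoint_cfg`, and the config values below are illustrative stand-ins, not the real names or numbers: the idea is simply to prefer the config stored inside the checkpoint and fall back to library defaults when the checkpoint registry can't be reached.

```python
# Stand-in defaults; the real code would use BEATsConfig() defaults.
DEFAULTS = {"encoder_layers": 12, "embed_dim": 768}

def load_checkpoint_cfg():
    # Simulate the registry being unavailable, e.g. in isolated unit tests.
    raise FileNotFoundError("checkpoint registry unavailable")

try:
    cfg = load_checkpoint_cfg()          # preferred: cfg from the checkpoint
except (FileNotFoundError, KeyError):
    cfg = dict(DEFAULTS)                 # safe fallback to defaults

assert cfg == DEFAULTS
```

Catching only the specific expected exceptions (rather than a bare `except`) keeps genuine loading bugs visible instead of silently masking them behind defaults.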
Should use_naturelm be fixed to False because it's not pretrained=True? Just checking.
GaganNarula
left a comment
I have some comments, most importantly about the tests. I'm not sure we properly test that this doesn't introduce breaking changes.
GaganNarula
left a comment
I'm approving this for now, but we do need the tests David ran as official integration tests. Made issue #164.
Fixes two critical bugs in the loading of BEATs checkpoints:
To test, I confirmed that loaded BEATs models went from near-random performance on clustering by species on xeno-canto to the expected strong performance.