Conversation
| drop_model_prefix=not target_has_model_prefix,
| )
|
|
| # Adapt backbone. prefix when checkpoint and model disagree
This looks like it will affect all models? The tests are all passing, but when I look at test_api_integration.py or test_api_load.py, it seems we only test BEATs loading or mock models. I think we should test loading all models. Ideally, we should also add some samples from the xeno-canto validation set (or something similar with known labels), pass them through the models on CPU, get the top-3 logits, and make sure they match expected values.
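The suggested check could look roughly like the sketch below. This is a hypothetical illustration only: the helper name, the fake logits, and the expected labels are all made up; a real test would run actual xeno-canto clips through each loaded model on CPU.

```python
# Hypothetical sketch of the suggested integration check.
# In a real test, `logits` would come from a forward pass of each model.
def top3_indices(logits):
    """Return the indices of the three largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:3]

# Fake logits for a 5-class model and the indices we expect to rank top-3.
logits = [0.1, 2.3, -0.5, 1.7, 0.9]
expected_top3 = [1, 3, 4]
assert top3_indices(logits) == expected_top3
```

Comparing only the top-3 indices (rather than exact logit values) keeps such a test robust to small numerical differences across platforms.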
That seems like a good idea. Btw, the reason I consider it relatively safe is that it only kicks in when the checkpoint keys and the model keys don't agree.
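The "only when the keys don't agree" behavior can be sketched as follows. The function and key names here are illustrative stand-ins, not the actual implementation: when both sides already agree on the prefix, the state dict passes through untouched.

```python
# Illustrative sketch (not the real code): strip the checkpoint's
# "backbone." prefix only when the target model's keys lack it.
def adapt_prefix(state_dict, model_keys, prefix="backbone."):
    ckpt_has_prefix = all(k.startswith(prefix) for k in state_dict)
    model_has_prefix = all(k.startswith(prefix) for k in model_keys)
    if ckpt_has_prefix and not model_has_prefix:
        return {k[len(prefix):]: v for k, v in state_dict.items()}
    return state_dict  # keys already agree: leave untouched

ckpt = {"backbone.conv.weight": 1, "backbone.conv.bias": 2}
adapted = adapt_prefix(ckpt, model_keys=["conv.weight", "conv.bias"])
assert set(adapted) == {"conv.weight", "conv.bias"}
```

Models whose checkpoints already match their key layout take the no-op branch, which is why the change should be a no-op for non-BEATs models.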
I will run that test
I ran this important test: for a sample of xeno-canto examples, I evaluated the models' embeddings by clustering, both after this fix and at the previous commit. For all the non-BEATs models I tested (EffNet-based and EAT-based), the results were identical. For all BEATs models, they improved dramatically.
However, the new logging and the test revealed an additional issue with the raw SSL EAT models: they are not being loaded successfully due to a state-dict mismatch, causing a fallback to the original EAT checkpoint. This existed before this change and is still there after it. It's marginally less urgent because I don't know whether the "raw" SSL EAT models are being used by anyone (the `sl_eat` models we fine-tuned are unaffected). So I would propose following that up in a separate PR.
I created an issue referencing this discussion for a future fix.
| try:
|     config_checkpoint_path = _get_beats_checkpoint_path(use_naturelm=False, fine_tuned=fine_tuned)
|     beats_ckpt = universal_torch_load(config_checkpoint_path, cache_mode="use", map_location="cpu")
|     beats_cfg = BEATsConfig(**beats_ckpt["cfg"])
Is this exact problem, a mismatch between a default config and a checkpoint config, applicable to other models as well? I don't see it happening for EfficientNet, but maybe EAT?
Based on reported results from Eklavya, EAT is working, but this is worth checking (independently from this PR, I think).
I used EfficientNet and its variants for my work just over the past few days, and I got the same scores as the "what matters" paper. For the EATs I extracted the embeddings but don't have all the results yet; from what I have, they look normal-ish. I can report back as soon as I have all of them (maybe 30 minutes from now?).
Here are the EAT accuracy results (my new results vs. the "what matters" paper):
| Model | CBI | Bats | Dogs |
|---|---|---|---|
| EAT-Bio | 23.95 vs 33.00 | 41.20 vs 63.90 | 71.94 vs 86.30 |
| EAT-All | 23.95 vs 32.60 | 42.95 vs 65.50 | 69.06 vs 75.50 |
| SL-EAT-Bio | 78.34 vs 81.80 | 62.85 vs 65.70 | 86.33 vs 87.10 |
| SL-EAT-All | 67.65 vs 75.50 | 50.75 vs 65.00 | 76.26 vs 86.30 |
SL-EAT-Bio seems close to the paper (within 1-3%), but the other EATs seem quite off, especially on CBI and Bats. 😞
| # Falls back to BEATsConfig() defaults when the checkpoint registry
| # is unavailable (e.g. in isolated unit tests).
| try:
|     config_checkpoint_path = _get_beats_checkpoint_path(use_naturelm=False, fine_tuned=fine_tuned)
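The fallback pattern in the diff above can be sketched generically. Note that `DEFAULTS`, `load_checkpoint_cfg`, and the config values below are illustrative stand-ins, not the real names or numbers: the idea is simply to prefer the config stored inside the checkpoint and fall back to library defaults when the checkpoint registry can't be reached.

```python
# Stand-in defaults; the real code would use BEATsConfig() defaults.
DEFAULTS = {"encoder_layers": 12, "embed_dim": 768}

def load_checkpoint_cfg():
    # Simulate the registry being unavailable, e.g. in isolated unit tests.
    raise FileNotFoundError("checkpoint registry unavailable")

try:
    cfg = load_checkpoint_cfg()          # preferred: cfg from the checkpoint
except (FileNotFoundError, KeyError):
    cfg = dict(DEFAULTS)                 # safe fallback to defaults

assert cfg == DEFAULTS
```

Catching only the specific expected exceptions (rather than a bare `except`) keeps genuine loading bugs visible instead of silently masking them behind defaults.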
Should use_naturelm be fixed to False because it's not pretrained=True? Just checking.
GaganNarula
left a comment
I have some comments, most importantly about the tests. I'm not sure we properly test that this doesn't introduce breaking changes.
GaganNarula
left a comment
I'm approving this for now, but we do need the tests David ran as official integration tests. Made issue #164.
Fixes two critical bugs in the loading of BEATs checkpoints:
To test, I confirmed that loaded BEATs models went from near-random performance on clustering by species on xeno-canto to the expected strong performance.