MERT configuration in `fairseq` seems to differ from the configuration in `huggingface` #20

LiableFishYS · 2025-01-17T10:14:13Z

Hey!

In `fairseq`, for both MERT-v1-95M and MERT-v1-330M, you specified:

```yaml
task:
  _name: mert_pretraining
  ...
  pad_audio: false
  random_crop: true
  normalize: false

model:
  _name: mert
  ...
  extractor_mode: default # "group"
```

However, in `huggingface` you have:

`config.json`:

```json
{
  ...
  "feat_extract_norm": "group",
  ...
}
```

`preprocessor_config.json`:

```json
{
  ...
  "do_normalize": true,
  "return_attention_mask": true,
  ...
}
```

I see two discrepancies here:

- `normalize: false` vs `"do_normalize": true`
- According to the huggingface documentation for wav2vec2 and HuBERT, `"return_attention_mask": true` should be set for models whose `feat_extract_norm` is `"layer"` (i.e. `extractor_mode: layer_norm` in fairseq), whereas you have `default`/`group`.
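To make the comparison explicit, here is a minimal sketch of the check I did by hand. The values are copied from the snippets above; the mapping between fairseq `extractor_mode` and huggingface `feat_extract_norm`, and the helper name `find_mismatches`, are my own assumptions for illustration and not something taken from the MERT code.

```python
# Values copied from the config snippets quoted in this issue.
fairseq_cfg = {"normalize": False, "pad_audio": False, "extractor_mode": "default"}
hf_cfg = {
    "do_normalize": True,
    "return_attention_mask": True,
    "feat_extract_norm": "group",
}

# Assumed correspondence between fairseq and huggingface naming.
EXTRACTOR_MODE_TO_NORM = {"default": "group", "layer_norm": "layer"}


def find_mismatches(fs, hf):
    """Return the huggingface keys that look inconsistent with the fairseq config."""
    issues = []
    # Waveform normalization should be on in both places or off in both.
    if fs["normalize"] != hf["do_normalize"]:
        issues.append("do_normalize")
    # The feature extractor norm type should follow extractor_mode.
    if EXTRACTOR_MODE_TO_NORM[fs["extractor_mode"]] != hf["feat_extract_norm"]:
        issues.append("feat_extract_norm")
    # Per the wav2vec2/HuBERT docs, attention_mask is meant for "layer" models.
    if hf["return_attention_mask"] != (hf["feat_extract_norm"] == "layer"):
        issues.append("return_attention_mask")
    return issues


mismatches = find_mismatches(fairseq_cfg, hf_cfg)
print(mismatches)
```

Under these assumptions, `feat_extract_norm` itself is consistent, and only `do_normalize` and `return_attention_mask` are flagged.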

However, I couldn't determine from your code whether you used `attention_mask` during training. I suspect you didn't, since you specified `pad_audio: false`.

Could you please clarify this misalignment and confirm whether you used `attention_mask` during training?
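For context on why the `normalize` mismatch matters to me: as far as I understand, `"do_normalize": true` rescales each waveform to zero mean and unit variance before it reaches the model, so a model pretrained with `normalize: false` would see differently scaled inputs at inference. Below is a rough pure-Python sketch of that rescaling. The function name, the sample values, and the `eps` value are illustrative assumptions on my side, not taken from your code.

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Rescale a waveform to zero mean and (approximately) unit variance."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return [(s - mean) / math.sqrt(var + eps) for s in samples]


# Hypothetical low-amplitude waveform, standing in for raw audio samples.
raw = [0.01, -0.02, 0.03, -0.01, 0.02]
normalized = zero_mean_unit_var(raw)

print("raw peak:       ", max(abs(s) for s in raw))
print("normalized peak:", max(abs(s) for s in normalized))
```

The normalized signal has a much larger amplitude than the raw one, which is the kind of input shift I am worried about if the preprocessing at inference differs from pretraining.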

Thank you!
