questions about dictionary.encode_line to process encodec labels #8
I translated the inquiry to keep the language use of the repo consistent:
Regarding your question, although the extracted codes are the same, their corresponding audio segments are different (which can be calculated according to the 75 Hz frame rate of the EnCodec model). For example, the losses corresponding to the audio segments at [..., t_i, t_i+1, t_i+2, t_i+3, ...] are calculated separately. Specifically, in our code there are randomly initialised embeddings corresponding to all the discrete labels, i.e. the 8*1024 codecs, K_MFCC in the K-means_MFCC, or K_logmel in the K-means_logmel. If you look back at the code base, you'll find that the NCE loss is calculated between such label embeddings [..., y_i, y_i+1, y_i+2, y_i+3, ...] and the output of the model [..., o_i, o_i+1, o_i+2, o_i+3, ...] for different codebooks individually and then aggregated.
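To illustrate "individually and then aggregated", here is a minimal sketch, not the repo's exact implementation: the embedding dimension is assumed, and the HuBERT-style negative sampling and logit temperature used in practice are omitted in favour of a plain per-codebook cross entropy.

```python
import torch
import torch.nn.functional as F

N_CODEBOOKS, CODEBOOK_SIZE, DIM = 8, 1024, 768  # 8 * 1024 EnCodec codecs; DIM is an assumed model dimension

# randomly initialised embeddings for every discrete label
label_emb = torch.nn.ModuleList(
    torch.nn.Embedding(CODEBOOK_SIZE, DIM) for _ in range(N_CODEBOOKS)
)

def aggregated_loss(model_out, targets):
    """model_out: (T, DIM) frame outputs [..., o_i, o_i+1, ...]; targets: (N_CODEBOOKS, T) labels [..., y_i, y_i+1, ...]."""
    total = 0.0
    for k in range(N_CODEBOOKS):
        # similarity of every frame against all entries of codebook k
        logits = model_out @ label_emb[k].weight.t()          # (T, CODEBOOK_SIZE)
        total = total + F.cross_entropy(logits, targets[k])   # loss for this codebook, averaged over frames
    return total  # the per-codebook losses are summed, i.e. aggregated
```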
Thanks for your quick reply. I have run the scripts provided by HuBERT using k-means labels and checked the training data. For k-means there are two files, one is "train.km" and the other is "dict.km.txt". For codecs, I think prepare_codecs_from_manifest.py only provides files like "train.km", so I should count the labels in "train.km" and produce "dict.km.txt" myself. In my case the values of the labels are all 3 because encode_line returns the unk_index, which is 3. I should use "dict.km.txt" to initialize the Dictionary class instead of "train.km". Is that right?
Yes! That is correct.
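For reference, here is a small sketch of how such a dictionary file could be produced by counting the labels; the file names are just examples, and the assumption is the fairseq dictionary format of one "<symbol> <count>" per line:

```python
from collections import Counter

def build_dict(label_path, dict_path):
    # count every discrete label in the train.* label file
    counts = Counter()
    with open(label_path) as f:
        for line in f:
            counts.update(line.split())
    # write fairseq-style "<symbol> <count>" lines, most frequent first
    with open(dict_path, "w") as f:
        for sym, cnt in counts.most_common():
            f.write(f"{sym} {cnt}\n")

build_dict("train.encodec_0", "dict.encodec_0.txt")  # example file names
```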
I have trained MERT-95M on music4all data and the loss descends slowly. Is that normal? By the way, I made "dict.encodec_0.txt" ... "dict.encodec_7.txt" all the same, and self.num_classes = [len(d) for d in dictionaries].
The dummy dictionary construction code is from HuBERT. Could you confirm that your dictionaries are the same?
Hello, I used the script prepare_codecs_from_manifest.py to process some music files from "music4all" and generated 8 txt files. These values are loaded directly as a dictionary in the MERT model. I set a breakpoint to inspect the methods under the Dictionary class, where line, field = line.rstrip().rsplit(" ", 1), count = int(field), and word = line correspond to the values in the image below. This dictionary re-encodes each line's values during training, and the label I obtain is consistently the value 3. I'm curious how the CE (cross-entropy) loss can be calculated when all the labels are identical.
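The behaviour can be reproduced with fairseq's Dictionary directly: if the symbols in the dict file do not match the label tokens, every token is mapped to the unknown index. A minimal sketch, with an example file name:

```python
from fairseq.data import Dictionary

d = Dictionary.load("dict.encodec_0.txt")  # expects "<symbol> <count>" per line
ids = d.encode_line("17 942 3 511", append_eos=False, add_if_not_exist=False)
print(ids)      # symbols present in the dictionary map to indices >= 4 (after the special tokens)
print(d.unk())  # 3 -- what every token becomes when the symbols are not in the dictionary
```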
