questions about dictionary.encode_line to process encodec labels #8

Open
xujinchang opened this issue Oct 16, 2023 · 6 comments


xujinchang commented Oct 16, 2023

Hello, I used the script prepare_codecs_from_manifest.py to process some music files from "music4all" and generated 8 txt files. These files are loaded directly as dictionaries in the MERT model. I set a breakpoint to inspect the methods of the Dictionary class, where `line, field = line.rstrip().rsplit(" ", 1)`, `count = int(field)`, and `word = line` produce the values shown in the screenshot below. The dictionary re-encodes each line's values during training, and the label I obtain is always 3. How should the CE (cross-entropy) loss be calculated when all the labels are identical?
[screenshot: the inspected Dictionary values]

yizhilll (Owner) commented:

I've translated the inquiry to keep the language use of the repo consistent:

> Hello, I used the script prepare_codecs_from_manifest.py to process some music files from "music4all" and generated 8 txt files. These files are loaded directly as dictionaries in the MERT model. I set a breakpoint to inspect the methods of the Dictionary class, where `line, field = line.rstrip().rsplit(" ", 1)`, `count = int(field)`, and `word = line` produce the values shown in the screenshot above. The dictionary re-encodes each line's values during training, and the label I obtain is always 3. How should the CE (cross-entropy) loss be calculated when all the labels are identical?

Regarding your question: although the extracted codes may be the same, their corresponding audio segments are different (their positions can be calculated from the 75 Hz frame rate of the EnCodec model; e.g., a 5-second segment yields 5 × 75 = 375 code frames per codebook). For example, the losses corresponding to the audio segments at [..., t_i, t_i+1, t_i+2, t_i+3, ...] are calculated separately.

Specifically, in our implementation there are randomly initialised embeddings corresponding to all the discrete labels, i.e. the 8×1024 codecs, K_MFCC in the K-means_MFCC, or K_logmel in the K-means_logmel. If you look into the code base, you'll find that the NCE loss is calculated between these label embeddings [..., y_i, y_i+1, y_i+2, y_i+3, ...] and the outputs of the model [..., o_i, o_i+1, o_i+2, o_i+3, ...] for each codebook individually and then aggregated. In the end, all the losses are summed up and returned.
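
To make the aggregation concrete, here is a minimal, hypothetical sketch (the dimensions, temperature, and variable names are assumptions for illustration, not the repo's actual code) of computing per-frame logits against the label embeddings and summing the loss over codebooks:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: 8 EnCodec codebooks of 1024 entries, 375 frames (5 s at 75 Hz)
num_codebooks, vocab, dim, T = 8, 1024, 768, 375

model_out = torch.randn(T, dim)                                      # [..., o_i, o_i+1, ...]
label_emb = [torch.randn(vocab, dim) for _ in range(num_codebooks)]  # learnable in practice
targets = [torch.randint(vocab, (T,)) for _ in range(num_codebooks)] # [..., y_i, y_i+1, ...]

total_loss = 0.0
for k in range(num_codebooks):
    # cosine similarity of every frame against every label embedding of codebook k
    logits = F.cosine_similarity(
        model_out.unsqueeze(1), label_emb[k].unsqueeze(0), dim=-1
    ) / 0.1  # temperature (assumed value)
    # each frame position contributes its own term, even when frames share a code
    total_loss = total_loss + F.cross_entropy(logits, targets[k])
```

This is why identical labels are not a problem: the loss is computed per frame position and per codebook, then summed.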

xujinchang (Author) commented:

Thanks for your quick reply.

I have run the scripts provided by HuBERT using k-means labels and checked the training data. For k-means there are two files: one is "train.km" and the other is "dict.km.txt".

For the codecs, I think prepare_codecs_from_manifest.py only produces files like "train.km". I should count the symbols in "train.km" and produce "dict.km.txt" myself.

In my case the labels are all 3 because encode_line returns unk_index, which is 3. I should use "dict.km.txt" to initialize the Dictionary class instead of "train.km".

Is that right?

@xujinchang changed the title from the original Chinese ("question about using dictionary.encode_line to encode EnCodec quantization labels") to "questions about dictionary.encode_line to process encodec labels" on Oct 16, 2023
yizhilll (Owner) commented:

> Thanks for your quick reply.
>
> I have run the scripts provided by HuBERT using k-means labels and checked the training data. For k-means there are two files: one is "train.km" and the other is "dict.km.txt".
>
> For the codecs, I think prepare_codecs_from_manifest.py only produces files like "train.km". I should count the symbols in "train.km" and produce "dict.km.txt" myself.
>
> In my case the labels are all 3 because encode_line returns unk_index, which is 3. I should use "dict.km.txt" to initialize the Dictionary class instead of "train.km".
>
> Is that right?

Yes! That is correct.
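
For readers following the same path, a minimal sketch of building the missing dictionary file by counting symbols in "train.km" (the file names come from the discussion above; the rest is an assumption, not the repo's exact script):

```python
from collections import Counter

# "train.km" holds one utterance per line with space-separated labels;
# "dict.km.txt" holds one "symbol count" pair per line, the format that
# fairseq's Dictionary.load expects.
counts = Counter()
with open("train.km") as f:
    for line in f:
        counts.update(line.split())

with open("dict.km.txt", "w") as f:
    # codec labels are integer strings, so sort numerically for a stable order
    for sym, cnt in sorted(counts.items(), key=lambda x: int(x[0])):
        f.write(f"{sym} {cnt}\n")
```

fairseq's Dictionary prepends four special symbols (`<s>`, `<pad>`, `</s>`, `<unk>` at indices 0-3) before the file's symbols, which is why encode_line collapsed everything to 3 (the unk index) when the Dictionary was built from the wrong file.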

xujinchang (Author) commented Oct 18, 2023

I have trained MERT-95M on the music4all data and the loss descends slowly. Is this normal?
[screenshots: training loss curves]

By the way, I made "dict.encodec_0.txt" ... "dict.encodec_7.txt" all identical:
[screenshot: contents of the dictionary files]

`self.num_classes = [len(d) for d in dictionaries]`
num_classes comes out as (1024+4)*8, i.e. 1028 entries (1024 codec symbols plus the 4 special symbols) for each of the 8 codebooks. Are these parameters correct?
Attached: dict_encodec_0.txt
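
As a quick sanity check on such a dictionary (a sketch assuming fairseq's Dictionary API; the file name comes from the attachment above, and the codes on the encoded line are made up):

```python
from fairseq.data import Dictionary

d = Dictionary.load("dict_encodec_0.txt")
assert len(d) == 1024 + 4   # 1024 codec symbols + <s>, <pad>, </s>, <unk>
assert d.unk() == 3         # the index all labels collapsed to earlier

# With the proper dict, encoding a line of codes no longer yields only <unk>
ids = d.encode_line("17 512 901", append_eos=False, add_if_not_exist=False)
assert (ids != d.unk()).all()
```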

xujinchang (Author) commented:

I have found that the targets in the cross_entropy function are all zeros. Is this a bug or a feature? Can you show an example of dict_encodec_{i}.txt? I think the 2nd column is the count of each symbol, but it is not actually used, so I can put any number there.
[screenshots: zero-valued targets in cross_entropy]

yizhilll (Owner) commented:

> I have found that the targets in the cross_entropy function are all zeros. Is this a bug or a feature? Can you show an example of dict_encodec_{i}.txt? I think the 2nd column is the count of each symbol, but it is not actually used, so I can put any number there.

The dummy dictionary construction code is from HuBERT. Could you confirm your dictionaries are constructed the same way?
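
For reference, the HuBERT recipe writes every cluster id with a placeholder count, which is consistent with the observation that the second column is unused. A sketch of the same idea for one EnCodec codebook (file name from this thread; everything else is an assumption):

```python
# Dummy dictionary for one 1024-entry EnCodec codebook; the constant
# count "1" is a placeholder, since the count column is not used here.
with open("dict.encodec_0.txt", "w") as f:
    for code in range(1024):
        f.write(f"{code} 1\n")
```

Enumerating all 1024 codes, rather than only those that happen to appear in "train.km", also keeps the symbol-to-index mapping identical across the 8 codebook dictionaries.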
