questions about dictionary.encode_line to process encodec labels #8
I translated the inquiry to keep the language use of the repo consistent:
Regarding your question, although the extracted codes are the same, their corresponding audio segments are different (which can be calculated according to the 75 Hz frame rate of the EnCodec model). For example, the losses corresponding to the audio segments at [..., t_i, t_i+1, t_i+2, t_i+3, ...] are calculated separately. Specifically, in our code there are randomly initialised embeddings corresponding to all the discrete labels, i.e. the 8*1024 codecs, K_MFCC in the K-means_MFCC, or K_logmel in the K-means_logmel. If you look back at the code base, you'll find that the NCE loss is calculated between such label embeddings [..., y_i, y_i+1, y_i+2, y_i+3, ...] and the output of the model [..., o_i, o_i+1, o_i+2, o_i+3, ...] for different codebooks individually and then aggregated.
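To illustrate "individually and then aggregated", here is a minimal sketch, not the repo's exact implementation: the embedding dimension is assumed, and the HuBERT-style negative sampling and logit temperature used in practice are omitted in favour of a plain per-codebook cross entropy.

```python
import torch
import torch.nn.functional as F

N_CODEBOOKS, CODEBOOK_SIZE, DIM = 8, 1024, 768  # 8 * 1024 EnCodec codecs; DIM is an assumed model dimension

# randomly initialised embeddings for every discrete label
label_emb = torch.nn.ModuleList(
    torch.nn.Embedding(CODEBOOK_SIZE, DIM) for _ in range(N_CODEBOOKS)
)

def aggregated_loss(model_out, targets):
    """model_out: (T, DIM) frame outputs [..., o_i, o_i+1, ...]; targets: (N_CODEBOOKS, T) labels [..., y_i, y_i+1, ...]."""
    total = 0.0
    for k in range(N_CODEBOOKS):
        # similarity of every frame against all entries of codebook k
        logits = model_out @ label_emb[k].weight.t()          # (T, CODEBOOK_SIZE)
        total = total + F.cross_entropy(logits, targets[k])   # loss for this codebook, averaged over frames
    return total  # the per-codebook losses are summed, i.e. aggregated
```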
Thanks for your quick reply. I have run the scripts provided by HuBERT using k-means labels and checked the training data. For k-means there are two files, one is "train.km" and the other is "dict.km.txt". For codecs, I think prepare_codecs_from_manifest.py only provides files like "train.km", so I should count the labels in "train.km" and produce "dict.km.txt" myself. In my case the values of the labels are all 3 because encode_line returns the unk_index, which is 3. I should use "dict.km.txt" to initialize the Dictionary class instead of "train.km". Is that right?
Yes! That is correct.
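For reference, here is a small sketch of how such a dictionary file could be produced by counting the labels; the file names are just examples, and the assumption is the fairseq dictionary format of one "<symbol> <count>" per line:

```python
from collections import Counter

def build_dict(label_path, dict_path):
    # count every discrete label in the train.* label file
    counts = Counter()
    with open(label_path) as f:
        for line in f:
            counts.update(line.split())
    # write fairseq-style "<symbol> <count>" lines, most frequent first
    with open(dict_path, "w") as f:
        for sym, cnt in counts.most_common():
            f.write(f"{sym} {cnt}\n")

build_dict("train.encodec_0", "dict.encodec_0.txt")  # example file names
```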
I have trained MERT-95M on music4all data and the loss descends slowly. Is that normal? By the way, I made "dict.encodec_0.txt" ... "dict.encodec_7.txt" all the same, and self.num_classes = [len(d) for d in dictionaries].
The dummy dictionary construction code is from HuBERT. Could you confirm that your dictionaries are the same?
Hello, I used the script prepare_codecs_from_manifest.py to process some music files from "music4all" and generated 8 txt files. These values are loaded directly as a dictionary in the MERT model. I set a breakpoint to inspect the methods under the Dictionary class, where line, field = line.rstrip().rsplit(" ", 1), count = int(field), and word = line correspond to the values in the image below. This dictionary re-encodes each line's values during training, and the label I obtain is consistently the value 3. I'm curious how the CE (cross-entropy) loss can be calculated when all the labels are identical.
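The behaviour can be reproduced with fairseq's Dictionary directly: if the symbols in the dict file do not match the label tokens, every token is mapped to the unknown index. A minimal sketch, with an example file name:

```python
from fairseq.data import Dictionary

d = Dictionary.load("dict.encodec_0.txt")  # expects "<symbol> <count>" per line
ids = d.encode_line("17 942 3 511", append_eos=False, add_if_not_exist=False)
print(ids)      # symbols present in the dictionary map to indices >= 4 (after the special tokens)
print(d.unk())  # 3 -- what every token becomes when the symbols are not in the dictionary
```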
