Some question about the "LM Adaptation" #1

Open
qcwthu opened this issue Jun 9, 2021 · 1 comment

Comments

@qcwthu

qcwthu commented Jun 9, 2021

Hello!

Sorry to bother you. After reading this great work, I have a question about the "LM Adaptation" setting in the paper. As I understand it, this adaptation is designed for decoder-only architectures. How can it be used with an encoder-decoder model? Also, do you use the same maximum sequence length of 512 and batch size of 128 as the original T5 paper? And since individual sentences in C4 are usually shorter than the maximum sequence length, do you concatenate several sentences into a longer sequence of that length and then split it into input and target?

I hope you can give me some advice. Thanks in advance for any help.

@blester125

Hi!

Thanks for the interest! Doing an LM objective with encoder-decoder models is a lot like the prefix-LM objective you can do with a decoder-only model: part of the input is fed into the encoder and the decoder completes the sequence.
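For illustration, here is a minimal Python sketch of that idea (not the actual T5/SeqIO preprocessing code): a token sequence is split at a point, the prefix becomes the encoder input, and the continuation becomes the decoder target. The function name and the random split policy are assumptions made for the example.

```python
# Minimal sketch of a prefix-LM style example for an encoder-decoder model:
# split a (packed) token sequence, feed the prefix to the encoder, and train
# the decoder to produce the continuation.
import random

def make_prefix_lm_example(tokens, max_length=512, seed=None):
    """Split a token sequence into encoder inputs and decoder targets."""
    rng = random.Random(seed)
    tokens = tokens[:max_length]
    # Choose a split point strictly inside the sequence so both sides are non-empty.
    split = rng.randint(1, len(tokens) - 1)
    return {"inputs": tokens[:split], "targets": tokens[split:]}

# Toy usage with integer token IDs standing in for a tokenized C4 document.
example = make_prefix_lm_example(list(range(20)), seed=0)
print(example["inputs"], example["targets"])
```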

We used the same max sequence lengths as in T5. The T5 code for this is open source and you can find the SeqIO task definition we used here.

The checkpoints we trained with the LM adaptation are available here (or here in the original MeshTensorflow implementation).
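As a hedged example of using such a checkpoint, the snippet below loads an LM-adapted T5 model through Hugging Face Transformers; the specific model name ("google/t5-base-lm-adapt") is an assumption and should be checked against the released checkpoints.

```python
# Load an LM-adapted T5 checkpoint and continue a text prefix.
# Assumption: "google/t5-base-lm-adapt" is the published checkpoint name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/t5-base-lm-adapt")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-base-lm-adapt")

# The encoder receives the prefix; the decoder generates the continuation.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```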
