Some question about the "LM Adaptation" #1

Open
qcwthu opened this issue Jun 9, 2021 · 1 comment

Comments

@qcwthu

qcwthu commented Jun 9, 2021

Hello!

Sorry to bother you. After reading this great work, I have a question about the "LM Adaptation" setting in the paper. As I understand it, this adaptation is designed for decoder-only architectures. How can it be used with an encoder-decoder model? Also, do you use the same maximum sequence length of 512 and batch size of 128 as the original T5 paper? And since individual sentences in C4 are usually shorter than the maximum sequence length, do you concatenate several sentences into a longer sequence of that length and then split it into input and target?

I hope you can give me some advice. Thanks in advance for any help.

@blester125

Hi!

Thanks for the interest! Doing an LM objective with encoder-decoder models is a lot like the prefix-LM objective you can do with a decoder-only model: part of the input is fed into the encoder and the decoder completes the sequence.
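For illustration, here is a minimal Python sketch of that idea (not the actual T5/SeqIO preprocessing code): a token sequence is split at a point, the prefix becomes the encoder input, and the continuation becomes the decoder target. The function name and the random split policy are assumptions made for the example.

```python
# Minimal sketch of a prefix-LM style example for an encoder-decoder model:
# split a (packed) token sequence, feed the prefix to the encoder, and train
# the decoder to produce the continuation.
import random

def make_prefix_lm_example(tokens, max_length=512, seed=None):
    """Split a token sequence into encoder inputs and decoder targets."""
    rng = random.Random(seed)
    tokens = tokens[:max_length]
    # Choose a split point strictly inside the sequence so both sides are non-empty.
    split = rng.randint(1, len(tokens) - 1)
    return {"inputs": tokens[:split], "targets": tokens[split:]}

# Toy usage with integer token IDs standing in for a tokenized C4 document.
example = make_prefix_lm_example(list(range(20)), seed=0)
print(example["inputs"], example["targets"])
```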

We used the same max sequence lengths as in T5. The T5 code for this is open source and you can find the SeqIO task definition we used here.

The checkpoints we trained with the LM adaptation are available here (or here in the original MeshTensorflow implementation).
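As a hedged example of using such a checkpoint, the snippet below loads an LM-adapted T5 model through Hugging Face Transformers; the specific model name ("google/t5-base-lm-adapt") is an assumption and should be checked against the released checkpoints.

```python
# Load an LM-adapted T5 checkpoint and continue a text prefix.
# Assumption: "google/t5-base-lm-adapt" is the published checkpoint name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/t5-base-lm-adapt")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-base-lm-adapt")

# The encoder receives the prefix; the decoder generates the continuation.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```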
