Adding GPT-NeoX-20B #16659
Conversation
The documentation is not available anymore as the PR was closed or merged.
Incredible work! I have tested the model and it seems to work as intended. I did discover one problem with the tokenizer, though. Here is the full script:
I got the following error: File "testing/testDecoderOnly.py", line 104, in testModelSample
Hm yea, I can replicate that issue too. I'm not too familiar with the tokenization code. The fast tokenizer seems to work just fine, but the Python one (which I'm basing off GPT-2's tokenizer) seems to have some issues. Here's a minimal reproducible version:

import transformers
model_name = "EleutherAI/gpt-neox-20b"
tokenizer_slow = transformers.GPTNeoXTokenizer.from_pretrained(model_name)
tokenizer_fast = transformers.GPTNeoXTokenizerFast.from_pretrained(model_name)
print("Fast", repr(tokenizer_fast.decode([50274])))
print("Slow", repr(tokenizer_slow.decode([50274])))
I believe the NeoX tokenizer handles spaces a little differently (it has special tokens for single, double, triple spaces, etc.). Do you know if someone who's more familiar with the tokenization code might be able to chime in?
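A minimal sketch for checking that multi-space runs round-trip through the fast tokenizer, assuming the GPTNeoXTokenizerFast class from this PR and the EleutherAI/gpt-neox-20b checkpoint on the Hub (not part of the original repro, just an illustrative follow-up):

```python
# Sketch: verify that multi-space indentation survives an encode/decode
# round trip with the fast tokenizer (the slow/Python tokenizer reportedly
# mishandles the multi-space special tokens).
import transformers

tokenizer_fast = transformers.GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")

text = "def f(x):\n    return x  # two spaces before this comment"
ids = tokenizer_fast(text)["input_ids"]
print(ids)
print(repr(tokenizer_fast.decode(ids)))  # expected to reproduce `text` exactly
```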
Thanks for your PR, @zphang, this is great! There are a few tests failing, let me give you pointers on how to solve them:
- The check_code_quality run fails because the quality checks weren't applied. I recommend doing the following from the root of your fork: pip install -e .[quality], followed by make fixup. This should fix most of the issues, and tell you which issues remain to be solved manually.
- There's a missing mention of GPT-NeoX-20B in the index.mdx file of the doc. Running make fix-copies from the root of your clone should solve this issue.
The rest of the issues seem to be linked to you importing many different models, most of which do not exist, in both src/transformers/models/gpt_neox/__init__.py and src/transformers/models/auto/modeling_auto.py. Left some comments where that applies.
Did you use the add-new-model-like command to add this model? What was your experience like using the script? Thanks again for your contributions!
Hey @LysandreJik, thanks for taking a look! I'll look into getting the tests to pass today. Re: the model script, I did use the new model templating script, but many parts of it seemed to assume that the model would be an encoder-decoder model (e.g. mentioning cross attention). I removed most of the other model implementations aside from CausalLM, as that's the primary format that NeoX-20B would be used for, but it looks like I missed some other references to the other model implementations. Other than that, the script was very useful in setting up the boilerplate.
added a PR to the PR to support
I have resolved the merge conflicts in the config files, but I am not confident in my understanding of how these various configs are supposed to work. I would appreciate it if someone double-checked that I didn't do anything stupid.
Are there any further blockers to merging? It would be nice to have this merged in time for ACL next week :)
Hi @zphang. Many of the comments/suggestions on the previous reviews were just ignored. I have a few more suggestions on the style.
We will merge the model as soon as they are resolved; let us know if you need any help.
Apologies, I must have missed the previous comments. I've pushed an update with the desired changes.
There are still four open comments on the modeling file, if you could have a look.
I think I got to all of them now (is there an easy way to check on the GitHub web interface?), let me know if I'm missing any.
I see they are closed but not addressed, maybe you forgot to push your commit?
Terribly sorry! Pushed now.
Thanks again for all your work on this model!
* initial
* first try
* working 20B
* 20B tokenizers
* Docs
* Import fixes for missing classes
* Update docs, fixup
* black formatting
* isort
* flake
* dummy objects
* documentation
* Documentation yml
* more docs
* tweaks for tests
* tokenization auto
* fix neox tests
* test
* test
* einsum
* address PR feedback
* Documentation
* Update README.md Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/gpt_neox/__init__.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/gpt_neox/configuration_gpt_neox.py Co-authored-by: Sylvain Gugger <[email protected]>
* Apply suggestions from code review Co-authored-by: Sylvain Gugger <[email protected]>
* Remove undefined LaTeX syntax
* Update to full url to avoid confusion about if that's supposed to refer to the Hub
* fix auto
* move tests
* documentation fix
* more doc fixes
* test refactor
* fix import
* fix import
* fix import
* fix import
* fix import
* style fixes
* More modeling fixes

Co-authored-by: Jason Phang <[email protected]>
Co-authored-by: Stella Biderman <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
What does this PR do?
Adds GPT-NeoX-20B model and tokenizers.
Fixes #15642
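A minimal usage sketch, assuming the GPTNeoXForCausalLM class and fast tokenizer added in this PR, the EleutherAI/gpt-neox-20b Hub checkpoint referenced in the thread, and enough memory to load a 20B-parameter model:

```python
# Sketch of expected usage once the PR is merged; not a definitive example.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model_name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTNeoXForCausalLM.from_pretrained(model_name)

# Greedy generation of a short continuation.
inputs = tokenizer("GPT-NeoX-20B is a 20 billion parameter", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```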
Who can review?
@LysandreJik
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.