
Add default tokenizer for gpt_neox (the same as gpt_neo) #1

Open · wants to merge 1 commit into base: neox20b
Conversation

aalok-sathe

tokenization_auto.py was missing a mapping for gpt_neox, so AutoTokenizer initialization for GPT-NeoX failed at runtime:

File ..., in load_tokenizer(model_name_or_path='./gpt-neox-20b', **kwargs={'cache_dir': '.cache/'})
     16 def load_tokenizer(model_name_or_path: str = None, **kwargs) -> AutoTokenizer:
---> 17     return AutoTokenizer.from_pretrained(model_name_or_path, **kwargs)
        model_name_or_path = './gpt-neox-20b'
        kwargs = {'cache_dir': '.cache/'}

File lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py:525, in AutoTokenizer.from_pretrained(cls=<class 'transformers.models.auto.tokenization_auto.AutoTokenizer'>, pretrained_model_name_or_path='./gpt-neox-20b', *inputs=(), **kwargs={'_from_auto': True, 'cache_dir': '.cache/'})
    522         tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
    524     if tokenizer_class is None:
--> 525         raise ValueError(
    526             f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    527         )
    528     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    530 # Otherwise we have to be creative.
    531 # if model is an encoder decoder, the encoder tokenizer class is used by default

ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported.
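The failure path can be illustrated with a minimal sketch of how AutoTokenizer resolves a tokenizer class from a model-type-to-class-name mapping. The dict and function below are simplified stand-ins for the actual transformers internals, not the library's real code; only the shape of the lookup and the one-line nature of the proposed fix are what matters here:

```python
# Simplified stand-in for the mapping in tokenization_auto.py:
# model type -> (slow tokenizer class name, fast tokenizer class name).
TOKENIZER_MAPPING_NAMES = {
    "gpt2": ("GPT2Tokenizer", "GPT2TokenizerFast"),
    "gpt_neo": ("GPT2Tokenizer", "GPT2TokenizerFast"),
    # Before this PR there was no "gpt_neox" entry, so resolution failed.
}

def resolve_tokenizer_class(model_type: str, use_fast: bool = True) -> str:
    """Mimic the AutoTokenizer lookup that raised the ValueError above."""
    if model_type not in TOKENIZER_MAPPING_NAMES:
        raise ValueError(
            f"Tokenizer class for {model_type} does not exist "
            "or is not currently imported."
        )
    slow, fast = TOKENIZER_MAPPING_NAMES[model_type]
    return fast if use_fast else slow

# The fix this PR proposes: map gpt_neox to the same pair as gpt_neo.
TOKENIZER_MAPPING_NAMES["gpt_neox"] = ("GPT2Tokenizer", "GPT2TokenizerFast")
```

With the entry added, the lookup succeeds; without it, the same ValueError as in the traceback above is raised.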

@zphang (Owner)

zphang commented Apr 21, 2022

Thanks! Actually NeoX doesn't use the GPT2Tokenizer. I'll fix the current PR based on this though.

zphang pushed a commit that referenced this pull request Jun 30, 2022
* chore: initial commit

Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.

* chore: porting the rest of the modules to tensorflow

did not change the documentation yet, yet to try the playground on the model

* Fix initilizations (#1)

* fix: code structure in few cases.

* fix: code structure to align tf models.

* fix: layer naming, bn layer still remains.

* chore: change default epsilon and momentum in bn.

* chore: styling nits.

* fix: cross-loading bn params.

* fix: regnet tf model, integration passing.

* add: tests for TF regnet.

* fix: code quality related issues.

* chore: added rest of the files.

* minor additions..

* fix: repo consistency.

* fix: regnet tf tests.

* chore: reorganize dummy_tf_objects for regnet.

* chore: remove checkpoint var.

* chore: remove unnecessary files.

* chore: run make style.

* Update docs/source/en/model_doc/regnet.mdx

Co-authored-by: Sylvain Gugger <[email protected]>

* chore: PR feedback I.

* fix: pt test. thanks to @ydshieh.

* New adaptive pooler (#3)

* feat: new adaptive pooler

Co-authored-by: @Rocketknight1

* chore: remove image_size argument.

Co-authored-by: matt <[email protected]>

Co-authored-by: matt <[email protected]>

* Empty-Commit

* chore: remove image_size comment.

* chore: remove playground_tf.py

* chore: minor changes related to spacing.

* chore: make style.

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: amyeroberts <[email protected]>

* chore: refactored __init__.

* chore: copied from -> taken from./g

* adaptive pool -> global avg pool, channel check.

* chore: move channel check to stem.

* pr comments - minor refactor and add regnets to doc tests.

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: NielsRogge <[email protected]>

* minor fix in the xlayer.

* Empty-Commit

* chore: removed from_pt=True.

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: matt <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: NielsRogge <[email protected]>