Open
Labels
Tokenizers, documentation (Related to documentation of ML.NET), enhancement (New feature or request), question (Further information is requested)
Description
LLMs are often distributed on Hugging Face or similar hubs, where the tokenizer is presumed to be created via the transformers library. The model repo then typically contains a bunch of JSON/TXT files. I have found it hard to know how to create an ML.NET tokenizer (Microsoft.ML.Tokenizers) from those files. For example, how would one create a tokenizer for:
https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct/tree/main
Could there be a getting-started document detailing how to load tokenizers from such files, and how to identify which tokenizer type to use for a given set of files?
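To illustrate the "identify what to use" part, here is a minimal sketch in Python. It assumes the repo ships a `tokenizer.json` in the Hugging Face tokenizers format, where a top-level `model` object carries a `type` field (for example `BPE`, `WordPiece`, or `Unigram`). The helper names `describe_tokenizer` and `describe_tokenizer_file` are hypothetical, and the sample data is made up for illustration rather than taken from the SmolLM repo.

```python
import json
from pathlib import Path


def describe_tokenizer(tokenizer_json: dict) -> dict:
    """Summarize the fields of a parsed tokenizer.json that hint at how to load it.

    Assumes the Hugging Face tokenizers layout: a top-level "model" object with
    "type", "vocab" and (for BPE) "merges", plus an optional "pre_tokenizer".
    Missing fields are reported as None / 0 / False instead of raising.
    """
    model = tokenizer_json.get("model") or {}
    pre_tokenizer = tokenizer_json.get("pre_tokenizer") or {}
    return {
        "model_type": model.get("type"),
        "vocab_size": len(model.get("vocab", [])),
        "has_merges": bool(model.get("merges")),
        "pre_tokenizer": pre_tokenizer.get("type"),
    }


def describe_tokenizer_file(path: str) -> dict:
    """Load a tokenizer.json from disk and summarize it."""
    with Path(path).open(encoding="utf-8") as handle:
        return describe_tokenizer(json.load(handle))


# Made-up, tiny stand-in for a real tokenizer.json (not the SmolLM file).
sample = {
    "model": {
        "type": "BPE",
        "vocab": {"h": 0, "i": 1, "hi": 2},
        "merges": ["h i"],
    },
    "pre_tokenizer": {"type": "ByteLevel"},
}

print(describe_tokenizer(sample))
```

Running `describe_tokenizer_file("tokenizer.json")` on a downloaded file would report the model type and whether merges are present. Which Microsoft.ML.Tokenizers class that maps to, and which of the JSON/TXT files it needs, is exactly what the requested document would have to spell out.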