For a large language model to understand text input, the text is first ‘tokenized’: broken down into smaller pieces, where each piece is a token with its own unique ID. A good rule of thumb is that 100 tokens correspond to roughly 75 English words, although the exact ratio varies by model and by language. After tokenization, each token is assigned an embedding vector. The tokens needed to feed the input prompt to the model are called ‘input tokens’; the tokens the model generates to produce its output, for example text or images, are called ‘output tokens’. Tokens are what you pay for when consuming large language model services. For Embeddings resources, only input token consumption is measured, since only the generated embedding vectors are returned and no output tokens are produced. Text generation resources consume both input and output tokens (text sent to the model and text generated by the model). The short sketch below shows how the word-to-token ratio can be checked in practice.
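The following is a minimal sketch of counting input tokens before sending a prompt. It assumes the open-source `tiktoken` library and its `cl100k_base` encoding, which are not mentioned above and are used purely for illustration; the actual tokenizer depends on the model being called.

```python
# Minimal sketch: count the tokens a prompt would consume as input tokens.
# Assumes the `tiktoken` library and the `cl100k_base` encoding (illustrative
# choices only; the real tokenizer depends on the target model).
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens the given encoding produces for `text`."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

prompt = "Tokens are what you pay for when consuming large language model services."
tokens = count_tokens(prompt)
words = len(prompt.split())
# For English text, expect roughly 75 words per 100 tokens.
print(f"{words} words -> {tokens} tokens")
```

Counting tokens up front like this helps estimate cost and stay within a model's context window before any request is sent.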