Handle models with smaller context length #587

deboer-tim · 2025-01-14T19:26:45Z

Some models (e.g. granite3.1-dense:8b) have a short context length that runs out after a few prompts. In Ollama I don't notice this; I assume it either manages the context length or keeps clearing it. In Ramalama I see the following message and the process exits:

context size exceeded
llama_decode: failed to decode, ret = 1
failed to decode
failed to generate response
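One way a chat client can stay under the limit is to drop the oldest conversation turns before each request so the prompt fits in the model's context window, leaving room for the reply. The following is a minimal sketch of that idea, not how Ollama or Ramalama actually handle it. It assumes OpenAI-style `{"role", "content"}` messages, and `trim_history` and `count_tokens_approx` are hypothetical names. The whitespace word count is a crude stand-in for the model's real tokenizer.

```python
from typing import Callable

Message = dict[str, str]


def count_tokens_approx(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def trim_history(
    messages: list[Message],
    n_ctx: int,
    reserve_for_reply: int,
    count_tokens: Callable[[str], int] = count_tokens_approx,
) -> list[Message]:
    """Drop the oldest non-system messages until the prompt fits n_ctx minus the reply budget."""
    budget = n_ctx - reserve_for_reply
    # System prompts are always kept; only conversation turns are eligible for removal.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: list[Message]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    # Remove turns from the front (oldest first) until the estimate fits the budget.
    while rest and total(system + rest) > budget:
        rest.pop(0)
    return system + rest
```

Trimming from the front keeps the most recent exchange, which is usually what the next reply depends on. The trade-off is that the model silently forgets early turns, which may be what the Ollama behaviour described above looks like from the user's side.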


ericcurtin · 2025-01-14T19:41:27Z

Maybe we are supposed to read the context length from the model itself and use that:

https://ollama.com/library/granite3.1-dense/blobs/0a922eb99317
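GGUF model files record the trained context window in their header metadata under the key `<architecture>.context_length` (for example `llama.context_length`), with the architecture name itself stored under `general.architecture`. Below is a minimal standard-library sketch of reading that value, following the GGUF v2/v3 header layout from the ggml GGUF specification. It assumes the blob is a GGUF file. The function name `read_gguf_context_length` is hypothetical, and this is not Ramalama's code.

```python
import struct
from typing import BinaryIO, Optional

# GGUF metadata value type ids mapped to little-endian struct formats (per the GGUF spec).
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    # GGUF strings are a uint64 byte length followed by UTF-8 bytes.
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # Arrays carry their element type (uint32) and element count (uint64).
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_context_length(f: BinaryIO) -> Optional[int]:
    """Return <arch>.context_length from a GGUF header, or None if it is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    _tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    metadata = {}
    # Metadata keys come in no guaranteed order, so parse every key/value pair.
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        metadata[key] = _read_value(f, vtype)
    arch = metadata.get("general.architecture")
    if arch is None:
        return None
    return metadata.get(f"{arch}.context_length")
```

Usage would be along the lines of `with open(model_path, "rb") as f: n_ctx = read_gguf_context_length(f)`. The result could then be passed to llama.cpp through its `-c`/`--ctx-size` option instead of relying on a fixed default.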

deboer-tim · 2025-01-14T20:31:15Z

Yes, I think that's the expectation.
