Problem when Continue calls Ollama's gpt-oss:20b model #7061
Replies: 4 comments 3 replies
-
This is another program calling the model, and it loads into the GPU. [screenshots]
-
@main1015 Hi. Thank you for the detailed issue. I am wondering if this has something to do with keep alive. How much VRAM and system RAM do you have? To test this I think we can try:
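The concrete test didn't survive the copy above. As a minimal sketch, assuming the idea was to rule out keep-alive unloading, the Ollama model entry in Continue's config.yaml could set keepAlive explicitly (the later reply confirms keepAlive is the parameter in play; the mapping to Ollama's keep_alive semantics is an assumption):

```yaml
models:
  - name: gpt-oss 20b
    provider: ollama
    model: gpt-oss:20b
    # Assumption: keepAlive maps to Ollama's keep_alive option; a negative
    # value keeps the model resident indefinitely, 0 unloads it right after
    # each request.
    keepAlive: -1
```

After sending a request, `ollama ps` shows whether the loaded model sits on the GPU, the CPU, or is split between the two.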
-
Running Ollama on Ubuntu, RTX A5000 with 24GB VRAM, 128GB system RAM. I added the keepAlive: 0 parameter and tried it, but it didn't seem to work.
-
Ollama recently updated and implemented some new things under the hood relating to memory prediction and allocation. Try this:
I found this to work even through Continue, where similar-sized models would split between CPU and GPU when called from Continue, due to inflated memory allocation.
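The steps behind "Try this" weren't captured. One workaround in the same spirit, offered as an assumption rather than the commenter's exact suggestion, is to override Ollama's offload estimate by passing num_gpu in the request options, forcing a fixed number of layers onto the GPU instead of letting the (sometimes inflated) memory prediction decide the split. Written as YAML for consistency with the rest of the thread; the equivalent JSON body works from Postman:

```yaml
# Body for POST http://localhost:11434/api/chat (send the JSON equivalent
# from Postman or any HTTP client).
model: gpt-oss:20b
messages:
  - role: user
    content: Hello
stream: false
options:
  # num_gpu overrides Ollama's own offload estimate; a high value requests
  # every layer on the GPU. The exact value here is an assumption to tune.
  num_gpu: 99
keep_alive: -1
```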
-
I'm currently experiencing an odd issue when using Continue to connect to the gpt-oss:20b model on Ollama. When Ollama loads this model, it appears to load into the CPU rather than the GPU. However, when I connect to the gpt-oss:20b model through other tools or code, it loads into the GPU, which is quite puzzling. When I captured the data from Continue's call to Api.chat and sent it to Ollama via Postman, the model still loads into the GPU. Please help resolve this. The config.yaml configuration file for my Continue is as follows: