Question regarding loading model data from model GGUF file to Main Memory #12455
akapoor3518 started this conversation in General
            Replies: 0 comments
Hi,
Currently our GPU has a main-memory limitation: it is capped at 1 GB. Can we use a bigger model with our custom backend? Some larger models need more than 1 GB of memory. Do we have to load all tensors during llama_init_from_model, or can we load them just before a particular compute step instead? I understand this is not the best for performance, but for now we are only looking for functionality. Our memory constraint will be resolved soon, and at that point we can look at performance and do proper graph planning.
Thanks,