Question regarding loading model data from model GGUF file to Main Memory #12455
akapoor3518 started this conversation in General
            Replies: 0 comments
Hi,
Currently our GPU has a main-memory limitation: it is capped at 1 GB. Can we use a bigger model with our custom backend? Some larger models need more than 1 GB of memory. Do we have to load all tensors during llama_init_from_model, or can we load them just before a particular compute step instead? I understand this is not the best for performance, but for now we are only looking for functionality. Our memory constraint will be resolved soon, and at that point we can look at performance and do proper graph planning.
Thanks,