If a layer is not offloaded to the GPU, it will still use cuBLAS; the only difference is that its data has to be copied to the device before the calculation.
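For picking how many layers to offload on a small card, a rough back-of-the-envelope estimate can help. The sketch below is only an illustration: `layers_that_fit` is a hypothetical helper, it assumes every layer takes the same amount of memory, and the sizes used (200 MiB per layer, 1 GiB held back for scratch buffers) are made-up numbers, not measurements of any real model.

```python
def layers_that_fit(vram_bytes, layer_bytes, reserve_bytes=0):
    """Return how many equally sized layers fit in VRAM after a reserve.

    Hypothetical helper: assumes all layers are the same size, which is
    only an approximation for real models.
    """
    if layer_bytes <= 0:
        raise ValueError("layer_bytes must be positive")
    # Memory left for layers once the reserve is set aside (never negative).
    usable = max(vram_bytes - reserve_bytes, 0)
    return usable // layer_bytes


GIB = 1024 ** 3
MIB = 1024 ** 2

# Made-up numbers for illustration: a 6 GiB card, 1 GiB kept free for
# scratch buffers, and 200 MiB per layer.
n_layers = layers_that_fit(6 * GIB, 200 * MIB, reserve_bytes=1 * GIB)
print(n_layers)
```

The result would then be used as the number of layers to offload (the `--n-gpu-layers` option in llama.cpp). In practice it is worth starting a little lower and checking actual VRAM usage.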
  
I only have a 6GB NVIDIA GPU, so most of the time I will be offloading just some of the model layers to the GPU.
Does it make sense to compile with both LLAMA_OPENBLAS=1 and LLAMA_CUBLAS=1 enabled?
Will that give any overall performance improvement?