How to support recomputing the KV cache after a certain decoded token #6886
                  
                    
jiazhan-msft announced in Ideas
Replies: 0 comments
  
My model has a feature that switches the model setup after a certain decoded token. For example, once decoding reaches the n-th token, the model needs to recompute the KV cache for all previous tokens. What is a possible path to supporting this? Thanks!
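To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is a toy written in plain Python, not code from any framework: `ToyModel`, `generate`, `switch_at`, and `new_setup` are hypothetical names, and the "KV cache" is just a list of `(setup, token)` pairs. The idea is that normal steps append one entry to the cache, while the switching step discards the cache and re-runs prefill over every token seen so far under the new setup.

```python
class ToyModel:
    """Hypothetical stand-in for a decoder; not any real library's API."""

    def __init__(self, setup=0):
        self.setup = setup  # identifies the currently active model setup

    def kv_for(self, token):
        # A cache entry depends on both the token and the active setup,
        # which is why a setup switch invalidates older entries.
        return (self.setup, token)

    def prefill(self, tokens):
        # Build cache entries for a whole sequence in one pass.
        return [self.kv_for(t) for t in tokens]

    def next_token(self, cache):
        # Deterministic toy rule standing in for sampling.
        return (sum(tok for _, tok in cache) + self.setup) % 50


def generate(prompt, max_new_tokens, switch_at, new_setup):
    model = ToyModel(setup=0)
    tokens = list(prompt)
    cache = model.prefill(tokens)
    recomputes = 0
    for step in range(1, max_new_tokens + 1):
        tok = model.next_token(cache)
        tokens.append(tok)
        if step == switch_at:
            # Switch setup, drop the stale cache, and re-prefill
            # over the prompt plus everything decoded so far.
            model.setup = new_setup
            cache = model.prefill(tokens)
            recomputes += 1
        else:
            # Ordinary incremental decode: append a single entry.
            cache.append(model.kv_for(tok))
    return tokens, cache, recomputes


tokens, cache, recomputes = generate([1, 2, 3], max_new_tokens=5, switch_at=3, new_setup=1)
print(len(tokens), len(cache), recomputes)
```

In this sketch the recompute costs one extra full-sequence pass at the switch point, and every cache entry afterwards comes from the new setup. Whether a real runtime exposes a hook at this point in the decode loop is exactly what I am asking about.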