Commit
use correct total length to fix static kv_cache performance (#23615)
When using a static kv_cache, past_sequence_length is the max sequence length of the kv_cache. This causes two issues:
issue 1: total_sequence_length will be larger than the cache entry.
issue 2: we do far more calculations than needed, so things are noticeably slower.
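A minimal sketch of the length arithmetic behind both issues, assuming a decoding step that appends one token to a preallocated cache. The function names, `valid_past_length`, and the concrete sizes are hypothetical and chosen for illustration; this is not the code changed by the commit.

```python
def naive_total_length(past_sequence_length: int, sequence_length: int) -> int:
    """Treat the reported past length as the number of valid cached tokens."""
    return past_sequence_length + sequence_length


def corrected_total_length(valid_past_length: int, sequence_length: int,
                           cache_capacity: int) -> int:
    """Count only the tokens actually written, and check they fit in the cache."""
    total = valid_past_length + sequence_length
    if total > cache_capacity:
        raise ValueError("tokens do not fit in the static cache")
    return total


# With a static cache, the reported past_sequence_length is the buffer size.
cache_capacity = 2048    # max sequence length of the kv_cache (assumed value)
valid_past_length = 10   # tokens actually cached so far (assumed value)
sequence_length = 1      # one new token in this decoding step

naive = naive_total_length(cache_capacity, sequence_length)
fixed = corrected_total_length(valid_past_length, sequence_length, cache_capacity)

print(naive, fixed)  # 2049 11
```

Under these assumptions, the naive total (2049) exceeds the 2048-entry cache, which corresponds to issue 1, and any work sized by it covers 2049 positions where 11 would do, which corresponds to issue 2.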