perf: disable kvcache for semantic by default #368

no2chem · 2023-06-22T15:44:13Z

An analysis of my previous patches suggests that use_kv_caching=True is not a win at all for the text model.

The current KV caching code has quite a bit of overhead. In particular, it (re)allocates large tensors when adding to the cache via a torch.cat call. With the coarse model it seems to yield some performance gain, but it is definitely not a win for the smaller text model. The KV cache also seems to be what caused the bimodality in #366; my guess is that sometimes we would get unlucky and the model would have to reallocate.
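
To make the reallocation cost concrete, here is a minimal, framework-free sketch (not the code in this repository) that counts how many elements get copied when a cache grows by concatenation versus when entries are written into a buffer allocated once. Plain Python lists stand in for tensors; torch.cat likewise produces a new output tensor and copies its inputs into it. The function names `grow_by_concat` and `grow_preallocated` are hypothetical, chosen only for illustration.

```python
def grow_by_concat(steps):
    """Append one entry per step by building a new list each time.

    Returns (cache, copied), where `copied` counts element copies.
    """
    cache = []
    copied = 0
    for t in range(steps):
        # Concatenation allocates a fresh list and copies old + new entries.
        cache = cache + [t]
        copied += len(cache)
    return cache, copied


def grow_preallocated(steps):
    """Write each entry into a buffer allocated once up front.

    Returns (cache, copied), where `copied` counts element writes.
    """
    cache = [None] * steps  # single allocation (assumes max length is known)
    copied = 0
    for t in range(steps):
        cache[t] = t  # in-place write, no reallocation
        copied += 1
    return cache, copied
```

For `steps = 1000` the concatenating version copies 500,500 elements while the preallocated one writes 1,000, which is why per-step concatenation can outweigh the compute it saves on a small model.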

With this change I consistently get about 280 it/s for the text model on an H100 after warmup.
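
For anyone wanting to reproduce an it/s figure like this, a rough sketch of a throughput measurement with a warmup phase follows. It is an assumption about how one might measure, not the script used for the number above; `iterations_per_second` and its parameters are hypothetical names.

```python
import time


def iterations_per_second(step, warmup=10, iters=100):
    """Time `iters` calls to `step` after `warmup` untimed calls."""
    # Warmup calls are excluded so one-time setup costs do not skew the rate.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return iters / elapsed
```

On a GPU, kernels are launched asynchronously, so `step` should synchronize the device (for example with `torch.cuda.synchronize()`) before returning; otherwise the timer measures launch time rather than execution time. Running the measurement several times with caching on and off also helps catch the kind of bimodal behavior reported in #366.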

no2chem · 2023-06-22T16:13:12Z

Hm, never mind here. I think I drew an invalid conclusion due to a mix-up. It seems that caching still results in a small performance gain on the text model.

perf: disable kvcache for semantic by default

ae41e7f

no2chem closed this Jun 22, 2023
