
AMD CPU generation is very slow #588

Open
@ianbarber

Description


I'm seeing very slow tokens/second in FP32. It feels worse than it should be, but I'm not sure of the best way to debug it.

$ python3 torchchat.py generate --prompt "hello model" -v llama2
Using device=cpu AMD Ryzen 7 3700X 8-Core Processor
Loading model...
Time to load model: 2.35 seconds
tensor([ 1, 22172, 1904], dtype=torch.int32)
hello model

[snip output]
Time for inference 1: 1043.69 sec total, 0.19 tokens/sec
Bandwidth achieved: 2.58 GB/s
Max Sequence Length Reached. Ending Conversation.
Average tokens/sec: 0
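For context, single-batch decode on CPU is typically memory-bandwidth bound, so a rough upper bound on speed is usable DRAM bandwidth divided by the size of the weights streamed per token. A back-of-envelope sketch, using assumed figures (7B parameters, dual-channel DDR4 on a Ryzen 3700X), not measured ones:

```python
# Back-of-envelope estimate for memory-bound decode (assumed figures).
params = 7e9                 # llama2-7B parameter count (approximate)
bytes_per_param = 4          # FP32
model_gb = params * bytes_per_param / 1e9   # ~28 GB of weights per token
dram_gb_s = 40.0             # assumed dual-channel DDR4-3200 bandwidth

upper_bound_tok_s = dram_gb_s / model_gb
print(f"weights: {model_gb:.0f} GB, upper bound: {upper_bound_tok_s:.2f} tok/s")
```

Even this rough ceiling is several times higher than the 0.19 tokens/sec observed, and the reported 2.58 GB/s is far below what the memory system should sustain, which points at something other than raw bandwidth (e.g. threading or kernel selection).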

I will try a couple of other dtypes as well, but this feels outside the expected range. @malfet, any thoughts?
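One quick thing worth checking when CPU generation is unexpectedly slow is whether PyTorch is actually parallelizing across cores. A minimal diagnostic sketch (standard PyTorch APIs, not part of torchchat):

```python
# Check PyTorch's CPU threading configuration; a single-threaded run on an
# 8-core/16-thread Ryzen would explain a large slowdown.
import os
import torch

print(torch.__config__.parallel_info())          # OpenMP/MKL build and thread info
print("torch threads:", torch.get_num_threads()) # intra-op parallelism
print("logical cpus:", os.cpu_count())

# If torch is using far fewer threads than physical cores, try pinning it,
# e.g. torch.set_num_threads(8) or OMP_NUM_THREADS=8 in the environment.
```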
