
AMD CPU generation is very slow #588

Open
@ianbarber

Description


I'm seeing very slow tokens/second in FP32. It feels worse than it should be, but I'm not sure of the best way to debug it.

$ python3 torchchat.py generate --prompt "hello model" -v llama2
Using device=cpu AMD Ryzen 7 3700X 8-Core Processor
Loading model...
Time to load model: 2.35 seconds
tensor([ 1, 22172, 1904], dtype=torch.int32)
hello model

[snip output]
Time for inference 1: 1043.69 sec total, 0.19 tokens/sec
Bandwidth achieved: 2.58 GB/s
Max Sequence Length Reached. Ending Conversation.
Average tokens/sec: 0
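For context, single-batch decode on CPU is typically memory-bandwidth bound, so a rough upper bound on speed is usable DRAM bandwidth divided by the size of the weights streamed per token. A back-of-envelope sketch, using assumed figures (7B parameters, dual-channel DDR4 on a Ryzen 3700X), not measured ones:

```python
# Back-of-envelope estimate for memory-bound decode (assumed figures).
params = 7e9                 # llama2-7B parameter count (approximate)
bytes_per_param = 4          # FP32
model_gb = params * bytes_per_param / 1e9   # ~28 GB of weights per token
dram_gb_s = 40.0             # assumed dual-channel DDR4-3200 bandwidth

upper_bound_tok_s = dram_gb_s / model_gb
print(f"weights: {model_gb:.0f} GB, upper bound: {upper_bound_tok_s:.2f} tok/s")
```

Even this rough ceiling is several times higher than the 0.19 tokens/sec observed, and the reported 2.58 GB/s is far below what the memory system should sustain, which points at something other than raw bandwidth (e.g. threading or kernel selection).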

I will try a couple of other dtypes as well, but this feels outside the expected range. @malfet, any thoughts?
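One quick thing worth checking when CPU generation is unexpectedly slow is whether PyTorch is actually parallelizing across cores. A minimal diagnostic sketch (standard PyTorch APIs, not part of torchchat):

```python
# Check PyTorch's CPU threading configuration; a single-threaded run on an
# 8-core/16-thread Ryzen would explain a large slowdown.
import os
import torch

print(torch.__config__.parallel_info())          # OpenMP/MKL build and thread info
print("torch threads:", torch.get_num_threads()) # intra-op parallelism
print("logical cpus:", os.cpu_count())

# If torch is using far fewer threads than physical cores, try pinning it,
# e.g. torch.set_num_threads(8) or OMP_NUM_THREADS=8 in the environment.
```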
