Benchmarking
Laurent Mazare edited this page Jul 1, 2023 · 4 revisions
There are two main metrics of interest: the time to process a prompt (which matters for large prompts) and the time to generate each subsequent token once the initial prompt has been processed.
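As a rough illustration of how these two metrics can be measured separately, here is a minimal sketch using only the Rust standard library. The names `Timings` and `measure` and the two closures are hypothetical and are not part of the llama example; the closures stand in for whatever code processes the prompt and generates one token.

```rust
use std::time::{Duration, Instant};

/// Timings for the two metrics: prompt processing and per-token generation.
/// (Hypothetical helper type, not part of the llama example.)
pub struct Timings {
    pub prompt: Duration,
    pub per_token: Vec<Duration>,
}

impl Timings {
    /// Mean generation time per token, or `None` if no token was generated.
    pub fn mean_per_token(&self) -> Option<Duration> {
        if self.per_token.is_empty() {
            return None;
        }
        let total: Duration = self.per_token.iter().sum();
        Some(total / self.per_token.len() as u32)
    }
}

/// Times one call to `process_prompt`, then `n_tokens` calls to `generate_token`.
/// Both closures are placeholders for the real model calls.
pub fn measure<P, G>(mut process_prompt: P, mut generate_token: G, n_tokens: usize) -> Timings
where
    P: FnMut(),
    G: FnMut(),
{
    // First metric: time to process the initial prompt.
    let start = Instant::now();
    process_prompt();
    let prompt = start.elapsed();

    // Second metric: time for each subsequent token, recorded individually.
    let mut per_token = Vec::with_capacity(n_tokens);
    for _ in 0..n_tokens {
        let start = Instant::now();
        generate_token();
        per_token.push(start.elapsed());
    }
    Timings { prompt, per_token }
}
```

Calling `measure` with the prompt-processing step as the first closure and a single generation step as the second returns both numbers; keeping the per-token durations in a vector makes it possible to look at the spread as well as the mean.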
CPU benchmarking
The following command can be used to benchmark the per-token generation time (note that this uses f16 and a single thread).
OMP_NUM_THREADS=1 RAYON_NUM_THREADS=1 cargo run --release --example llama -- \
--cpu --npy llama.npz --prompt "the answer to life in the universe and everything is"
On a Ryzen 5 2600X, this results in a time of ~2s per token (a flamegraph of the run is available).