Description
I'd like to benchmark the optimized performance of the LLaMA2 model with BigDL acceleration on an SPR machine.
I followed the README in python/llm/example/CPU/Native-Models; the example ran normally and printed the timing messages.
However, in the timing message the prompt eval time (which is also the first-token latency) is abnormal, as shown below.
bigdl-llm timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token)
The prompt eval time is zero, and the token count does not include the prompt tokens; this differs from the timing output of upstream ggml llama.cpp.
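For reference, below is a minimal sketch of how I would cross-check the first-token latency by timing a single-token generate() call with BigDL-LLM's transformers-style API. This is only an approximation for comparison: the model path and prompt are placeholders, and the transformers-style load_in_4bit path may differ from the native (ggml) path used by the Native-Models example.

```python
# Sketch: estimate first-token latency by timing a one-token generate() call.
# MODEL_PATH and the prompt are placeholders; this uses the BigDL-LLM
# transformers-style API, not the native ggml path from Native-Models.
import time

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer

MODEL_PATH = "path/to/llama2-7b"  # placeholder

tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, load_in_4bit=True)

prompt = "Once upon a time"  # placeholder prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Timing generation of exactly one new token roughly corresponds to the
# "prompt eval" (first-token) latency that the native timings should report.
start = time.perf_counter()
model.generate(input_ids, max_new_tokens=1)
first_token_latency_ms = (time.perf_counter() - start) * 1000
print(f"first token latency: {first_token_latency_ms:.2f} ms")
```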