Llama benchmark #112

@gxsoar gxsoar commented Nov 25, 2023

Llama Benchmark

Use PyTorch with TorchDynamo to perform Vicuna end-to-end inference.

Environments

OS: Ubuntu 22.04.1 LTS
CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
GPU: NVIDIA GeForce RTX 3090
CUDA: 12.0
Python: 3.9
PyTorch: 2.0.0+cu118
Anaconda: Miniconda3

Benchmark Time

CPU time per round of inference:
PyTorch average time per round of inference: 982.4393878173828 ms
PyTorch with TorchDynamo average time per round of inference: 977.5693103027344 ms
GPU time per round of inference:
PyTorch average time per round of inference: 25.33698874791463 ms
PyTorch with TorchDynamo average time per round of inference: 19.13074951807658 ms
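A minimal sketch of the timing methodology above: run a few warmup rounds (so TorchDynamo compilation cost is excluded), then average wall-clock time per round, once for the eager model and once for the `torch.compile` model. The model, round counts, and the `backend="eager"` choice here are illustrative assumptions, not the PR's actual setup — the PR benchmarks Vicuna-7B, presumably with the default inductor backend, and GPU timing additionally needs `torch.cuda.synchronize()` around the timed region.

```python
# Hypothetical sketch, not the PR's benchmark script. A tiny MLP stands in
# for Vicuna-7B; backend="eager" keeps the example dependency-light (graph
# capture via TorchDynamo without the inductor compiler).
import time
import torch
import torch.nn as nn

def time_inference(model, inputs, rounds=10, warmup=3):
    """Average wall-clock time per round of inference, in milliseconds."""
    with torch.no_grad():
        for _ in range(warmup):        # warmup rounds trigger compilation
            model(inputs)
        start = time.perf_counter()
        for _ in range(rounds):        # on GPU, synchronize before/after
            model(inputs)
        elapsed = time.perf_counter() - start
    return elapsed / rounds * 1000.0

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
x = torch.randn(8, 64)

eager_ms = time_inference(model, x)
compiled = torch.compile(model, backend="eager")   # TorchDynamo entry point
dynamo_ms = time_inference(compiled, x)
print(f"eager: {eager_ms:.3f} ms, dynamo: {dynamo_ms:.3f} ms")
```

For an end-to-end LLM benchmark, `inputs` would be tokenized prompts and each round a full generation pass; the warmup step matters even more there, since the first compiled call can take seconds.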

@gxsoar gxsoar changed the title Vicuna-7b test Llama benchmark Nov 28, 2023