LLM Inference Frameworks Benchmark

About

This project benchmarks the performance of several popular LLM inference frameworks. I am currently planning to test vLLM, TensorRT-LLM, FasterTransformer, ONNX Runtime, and DeepSpeed. The goal is to determine the best inference framework for a given hardware setup and LLM. The comparison will be based on multiple evaluation criteria, including performance metrics (e.g., throughput, latency, and scalability), while also considering factors such as hardware utilization efficiency, model support, ease of use, hardware/software flexibility, optimization features, and deployment complexity.

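As a concrete illustration of the core measurement, here is a minimal sketch of how throughput and latency might be collected for one framework, using vLLM as the example. The model name, prompt workload, and sampling settings below are placeholders for illustration, not the project's actual benchmark configuration.

```python
import time

from vllm import LLM, SamplingParams

# Hypothetical workload: a small batch of identical prompts.
prompts = ["Explain attention in one sentence."] * 32
sampling = SamplingParams(temperature=0.0, max_tokens=128)

# Placeholder model; the real benchmark would sweep over target LLMs.
llm = LLM(model="facebook/opt-125m")

start = time.perf_counter()
outputs = llm.generate(prompts, sampling)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests to derive throughput.
generated_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"Batch latency: {elapsed:.2f} s")
print(f"Throughput:    {generated_tokens / elapsed:.1f} tokens/s")
```

A comparable harness for each framework under the same workload and hardware would let the raw numbers be compared side by side.
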
Test Specs

This project will be run solely on Kaggle (unless problems arise); the hardware and software specifications are documented in my Kaggle Specs Tester Notebook.

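For reference, a small sketch of how the runtime environment could be queried from inside a notebook (the actual contents of the Specs Tester Notebook are not reproduced here):

```python
import platform

import torch

# Record the software and GPU environment the benchmarks run under.
print(f"Python:  {platform.python_version()}")
print(f"PyTorch: {torch.__version__}")
if torch.cuda.is_available():
    print(f"GPU:     {torch.cuda.get_device_name(0)}")
    print(f"CUDA:    {torch.version.cuda}")
```
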
More to come...
