Open
Description
Current status
The framework currently supports only a limited set of inference engines. Many potentially relevant engines are missing: some were previously integrated but later dropped, while others are newer alternatives with promising performance.
- Tier 1 (High Priority)
  - vLLM
  - torch+transformers (true baseline)
  - TRT-LLM (NVIDIA's engine, the main competitor)
- Tier 2 (Medium Priority)
  - SGLang
  - any other inference backend with a server mode that could compete with vLLM
- Tier 3
  - optimum-habana (Gaudi2/3)
  - TGI/TGIS
  - llama.cpp
  - LMDeploy
  - any other backend that implements optimization techniques worth examining
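To make the prioritization easier to track during the investigation, the tier list above could be kept as data. The following is only a minimal sketch: `Tier`, `EngineCandidate`, `CANDIDATES`, and the helper functions are hypothetical names that do not exist in the framework, and the `LOW` label for Tier 3 is an assumption, since the list gives that tier no priority label. The open-ended "any other backend" entries are left out because they name no specific engine.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3  # assumed label: the issue only calls this "Tier 3"


@dataclass(frozen=True)
class EngineCandidate:
    """Hypothetical record for one engine under consideration."""
    name: str
    tier: Tier
    note: str = ""


# Engine names and notes are copied from the tier list in this issue.
CANDIDATES = [
    EngineCandidate("vLLM", Tier.HIGH),
    EngineCandidate("torch+transformers", Tier.HIGH, "true baseline"),
    EngineCandidate("TRT-LLM", Tier.HIGH, "NVIDIA's, main competitor"),
    EngineCandidate("SGLang", Tier.MEDIUM),
    EngineCandidate("optimum-habana", Tier.LOW, "Gaudi2/3"),
    EngineCandidate("TGI/TGIS", Tier.LOW),
    EngineCandidate("llama.cpp", Tier.LOW),
    EngineCandidate("LMDeploy", Tier.LOW),
]


def by_priority(candidates):
    """Order candidates by tier; sorted() is stable, so in-tier order is kept."""
    return sorted(candidates, key=lambda c: c.tier)


def names_in_tier(candidates, tier):
    """Return the engine names that belong to the given tier."""
    return [c.name for c in candidates if c.tier == tier]


if __name__ == "__main__":
    for candidate in by_priority(CANDIDATES):
        print(f"Tier {int(candidate.tier)}: {candidate.name}")
```

A flat list with a tier field, rather than a nested structure, keeps it simple to move an engine between tiers as the investigation proceeds.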
Plan to be initiated
- This issue serves as a tracking placeholder. A detailed implementation plan will be developed after an initial investigation of each engine's integration requirements and its compatibility with our framework.