
Commit c8c850a

Release 0.0.15

Signed-off-by: kerthcet <[email protected]>

1 parent 592d835 commit c8c850a

10 files changed: +66 -40 lines changed

README.md

Lines changed: 45 additions & 14 deletions
@@ -1,6 +1,16 @@
 # llmlite

-A library helps to communicate with all kinds of LLMs consistently.
+**llmlite** is a library that helps you communicate with all kinds of LLMs consistently.
+
+## Features
+
+- State-of-the-art LLMs support
+- Continuous Batching via [vLLM](https://github.com/vllm-project/vllm)
+- Quantization ([issue#37](https://github.com/InftyAI/llmlite/issues/37))
+- Adapter support ([issue#51](https://github.com/InftyAI/llmlite/issues/51))
+- Streaming support ([issue#52](https://github.com/InftyAI/llmlite/issues/52))
+
 ### Model Support

 | Model | State | System Prompt | Note |
 | ---- | ---- | ---- | ---- |
@@ -14,28 +24,26 @@ A library helps to communicate with all kinds of LLMs consistently.
 | Falcon | RoadMap 📋 | | [issue#8](https://github.com/InftyAI/ChatLLM/issues/8)
 | StableLM | RoadMap 📋 | | [issue#11](https://github.com/InftyAI/ChatLLM/issues/11) |
 | Baichuan2 | RoadMap 📋 | | [issue#34](https://github.com/InftyAI/llmlite/issues/34)
-| ... | ... | ... | ... |

-We're also planning to support different inference backends as below:
+### Backend Support

 | backend | State | Note |
 | ---- | ---- | ---- |
 | [huggingface](https://github.com/huggingface) | Done ✅ | Support by huggingface pipeline |
 | [vLLM](https://github.com/vllm-project/vllm) | Done ✅ | |
-| ... | ... | ... |

 ## How to install

 ```cmd
-pip install llmlite==0.0.9
+pip install llmlite==0.0.15
 ```

 ## How to use

 ### Chat

 ```python
-from llmlite.apis import ChatLLM, ChatMessage
+from llmlite import ChatLLM, ChatMessage

 chat = ChatLLM(
     model_name_or_path="meta-llama/Llama-2-7b-chat-hf", # required
@@ -53,6 +61,35 @@ result = chat.completion(

 ```

+### Continuous Batching
+
+_This is mostly supported by vLLM; you can enable it by configuring the **backend**._
+
+```python
+from llmlite import ChatLLM, ChatMessage
+
+chat = ChatLLM(
+    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
+    backend="vllm",
+)
+
+results = chat.completion(
+    messages=[
+        [
+            ChatMessage(role="system", content="You're an honest assistant."),
+            ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
+        ],
+        [
+            ChatMessage(role="user", content="What's the population of the world?"),
+        ],
+    ],
+    max_tokens=2048,
+)
+
+for result in results:
+    print(f"RESULT: \n{result}\n\n")
+```
+
 `llmlite` also supports other parameters like `temperature`, `max_length`, `do_sample`, `top_k`, `top_p` to help control the length, randomness and diversity of the generated text.

 See **[examples](./examples/)** for reference.
@@ -62,14 +99,14 @@ See **[examples](./examples/)** for reference.
 You can use `llmlite` to help you generate full prompts, for instance:

 ```python
-from llmlite.apis import ChatMessage, LlamaChat
+from llmlite import ChatLLM, ChatMessage

 messages = [
     ChatMessage(role="system", content="You're a honest assistant."),
     ChatMessage(role="user", content="There's a llama in my garden, what should I do?"),
 ]

-LlamaChat.prompt(messages)
+ChatLLM.prompt("meta-llama/Llama-2-7b-chat-hf", messages)

 # Output:
 # <s>[INST] <<SYS>>
@@ -83,12 +120,6 @@ LlamaChat.prompt(messages)

 Set the env variable `LOG_LEVEL` for log configuration, default to `INFO`, others like DEBUG, INFO, WARNING etc..

-## Roadmap
-
-- Adapter support
-- Quantization
-- Streaming
-
 ## Contributions

 🚀 All kinds of contributions are welcomed ! Please follow [Contributing](/CONTRIBUTING.md).
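The README's prompt-generation example shows output beginning with `<s>[INST] <<SYS>>`. For readers unfamiliar with that format, here is a minimal standalone sketch of the Llama-2 chat template; the helper below is illustrative, not llmlite's implementation:

```python
def llama2_prompt(system: str, user: str) -> str:
    # Llama-2 chat format: the system prompt sits inside <<SYS>> markers,
    # and the whole first turn is wrapped in [INST] ... [/INST].
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

print(llama2_prompt(
    "You're an honest assistant.",
    "There's a llama in my garden, what should I do?",
))
```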

examples/chatglm2.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from llmlite.apis import ChatLLM, ChatMessage
+from llmlite import ChatLLM, ChatMessage

 chat = ChatLLM(
     model_name_or_path="THUDM/chatglm2-6b",

examples/chatgpt.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from llmlite.apis import ChatLLM, ChatMessage
+from llmlite import ChatLLM, ChatMessage

 # You should set the OPENAI_API_KEY first.

examples/codellama.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from llmlite.apis import ChatLLM, ChatMessage
+from llmlite import ChatLLM, ChatMessage

 chat = ChatLLM(
     model_name_or_path="codellama/CodeLlama-13b-instruct-hf",

examples/llama2.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from llmlite.apis import ChatLLM, ChatMessage
+from llmlite import ChatLLM, ChatMessage

 chat = ChatLLM(
     model_name_or_path="meta-llama/Llama-2-7b-chat-hf",

examples/vllm.py

Lines changed: 5 additions & 6 deletions
@@ -1,12 +1,11 @@
-from llmlite.apis import ChatLLM, ChatMessage
+from llmlite import ChatLLM, ChatMessage

 chat = ChatLLM(
     model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
-    task="text-generation",
     backend="vllm",
 )

-result = chat.completion(
+results = chat.completion(
     messages=[
         [
             ChatMessage(role="system", content="You're a honest assistant."),
@@ -15,8 +14,7 @@
             ),
         ],
         [
-            ChatMessage(role="system", content="You're a honest assistant."),
-            ChatMessage(role="user", content="How many people are their in China?"),
+            ChatMessage(role="user", content="How many people are there in China?"),
         ],
     ],
     max_tokens=2048,
@@ -25,4 +23,5 @@
     # top_k=3,
 )

-print(result)
+for result in results:
+    print(f"RESULT: \n{result}\n\n")
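The `result` to `results` rename here reflects that, with the vLLM backend, `completion` takes a batch of conversations and returns one completion per conversation. A toy sketch of consuming such a return, assuming results come back in input order (stand-in data, not real model output):

```python
# Stand-in questions and completions, shaped like the example above.
questions = [
    "There's a llama in my garden, what should I do?",
    "How many people are there in China?",
]
results = [
    "Stay calm; llamas are usually docile. Contact local animal control.",
    "China's population is roughly 1.4 billion.",
]

# One completion per conversation, in the order they were submitted.
for question, result in zip(questions, results):
    print(f"Q: {question}\nA: {result}\n")
```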

llmlite/__init__.py

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+from llmlite.apis.chatllm import ChatLLM
+from llmlite.llms.messages import ChatMessage
+
+__version__ = "0.0.15"
+
+__all__ = [
+    "ChatLLM",
+    "ChatMessage",
+]

llmlite/apis/__init__.py

Lines changed: 0 additions & 15 deletions
@@ -1,15 +0,0 @@
-from llmlite.apis.chatllm import ChatLLM
-from llmlite.llms.messages import ChatMessage
-from llmlite.llms.chatglm import ChatGLM
-from llmlite.llms.llama import Llama
-from llmlite.llms.chatgpt import ChatGPT
-
-__version__ = "0.0.7"
-
-__all__ = [
-    "ChatLLM",
-    "ChatMessage",
-    "Llama",
-    "ChatGLM",
-    "ChatGPT",
-]

llmlite/backends/vllm_backend.py

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,8 @@ def __init__(
         **kwargs,
     ):
         trust_remote_code = kwargs.pop("trust_remote_code", True)
+        # 'task' is an unexpected keyword argument to vLLM.
+        _ = kwargs.pop("task", None)

         self._model = vllm(
             model=model_name_or_path,
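The two added lines follow a common defensive pattern: pop keys out of `**kwargs` that the downstream constructor does not accept before forwarding the rest. A minimal self-contained sketch of the pattern; the `Engine` class and `build_engine` helper below are hypothetical stand-ins, not llmlite's or vLLM's code:

```python
class Engine:
    # Hypothetical stand-in: accepts only the arguments it knows about.
    def __init__(self, model: str, trust_remote_code: bool = True):
        self.model = model
        self.trust_remote_code = trust_remote_code


def build_engine(model_name_or_path: str, **kwargs) -> Engine:
    trust_remote_code = kwargs.pop("trust_remote_code", True)
    # 'task' is meaningful to the huggingface pipeline backend but not here;
    # forwarding it would raise "TypeError: __init__() got an unexpected
    # keyword argument 'task'", so it is discarded.
    kwargs.pop("task", None)
    return Engine(
        model=model_name_or_path,
        trust_remote_code=trust_remote_code,
        **kwargs,
    )


# Callers can keep passing 'task' (as older examples did) without breaking.
engine = build_engine("meta-llama/Llama-2-7b-chat-hf", task="text-generation")
```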

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "llmlite"
-version = "0.0.9"
+version = "0.0.15"
 description = "A library helps to chat with all kinds of LLMs consistently."
 authors = ["InftyAI"]
 license = "MIT License"
