
Commit 6d06244

Add README and scripts that run successfully.
1 parent e694856 commit 6d06244

File tree

4 files changed: +194 −1 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
/.idea

README.md

Lines changed: 31 additions & 1 deletion
@@ -1,2 +1,32 @@
# chat2db-chatglm-6b-deploy
-It shows how to deploy your own chatglm-6b and use it in chat2db

## 📖 Introduction

This project shows how to deploy ChatGLM-6B to free cloud resources or to your local machine, and how to use the deployed model from the Chat2DB client.

## 📦 Prerequisites

| Model | GPU (Inference) | GPU (Fine-tune) |
| :----: | :----: | :----: |
| ChatGLM-6B-int4 | 6 GB | 7 GB |
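
Before deploying, it may help to confirm your GPU actually has enough memory for int4 inference. A minimal check with PyTorch (already a dependency of the scripts in this commit):

```python
import torch

# Report total memory of GPU 0 in GiB; ChatGLM-6B-int4 needs about 6 GB for inference.
if torch.cuda.is_available():
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0 total memory: {total_gib:.1f} GiB")
else:
    print("No CUDA device found.")
```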

## 📦 Deploy

### 📦 Deploy to Google Colab

1. Open [chatglm-6b-int4-deploy.ipynb](https://colab.research.google.com/drive/1-jKsKISmlMCWTbaV-3HYBWbrTWxNLzOo?usp=sharing) in Google Colab. In our case, the model runs on Google Colab completely free.
2. Run the code from step 1 to step 6 in the notebook.
3. After step 6, you will get a public demo URL for your model, such as `https://3cef73d65765afdfea.gradio.live`. Click the URL to check that the model deployed successfully, and experiment with the model as you like. Click the stop button to stop the web demo.

<img src="https://alidocs.oss-cn-zhangjiakou.aliyuncs.com/res/4j6OJdYA60Y7n3p8/img/4bc2c26f-fa57-44be-a336-3e5729a2d104.png?x-oss-process=image/resize,w_640,m_lfit,limit_1">

4. Run the code from step 7 to step 9 in the notebook.
5. After step 9, you will get an API URL for your model, such as `https://dfb1-34-87-2-137.ngrok.io`. Run the command below on your local machine to check that the model deployed successfully.

<img src="https://alidocs.oss-cn-zhangjiakou.aliyuncs.com/res/4j6OJdYA60Y7n3p8/img/bec200c8-a343-45ff-b9a0-2bd21985da9a.png?x-oss-process=image/resize,w_640,m_lfit,limit_1">

```bash
curl -X POST "your api url" \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "Hello", "history": []}'
```
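
If the call succeeds, the endpoint replies with the JSON object assembled in `api.py` below: `response` (the model's reply), `history` (the updated chat history to send with your next request), `status` (200), and `time` (a server-side timestamp).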

6. After you get the result, copy the API URL and use it in the Chat2DB client. Set the URL in the client as below:

<img src="https://alidocs.oss-cn-zhangjiakou.aliyuncs.com/res/4j6OJdYA60Y7n3p8/img/ca844185-2744-49e0-ab75-245e19b872d6.png?x-oss-process=image/resize,w_640,m_lfit,limit_1">

7. Now you can chat with the model in the Chat2DB client. Enjoy!

* Note: Google Colab disconnects after 12 hours; rerun the notebook to get a new public demo URL and API URL. Also, Colab's network speed is not very fast, so downloading and running the model may take a long time. Please be patient.

### 📦 Deploy to the local machine

* Since the network in Google Colab is not very fast, you can also deploy the model to your local machine. The local deployment script is similar to the Colab one; just follow the steps in [chatglm-6b-int4-deploy.ipynb](https://colab.research.google.com/drive/1-jKsKISmlMCWTbaV-3HYBWbrTWxNLzOo?usp=sharing).
* Note: when you deploy the model on your local machine, change the model path from `/content/chatglm-6b-int4` to a path on your machine, as sketched below. You also need to change the API URL in the Chat2DB client to the URL of your local machine.
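
A minimal sketch of that change, assuming you downloaded the weights to a local directory (the path below is a placeholder):

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder: point this at your local copy of the chatglm-6b-int4 weights.
MODEL_PATH = "/path/to/chatglm-6b-int4"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()
model = model.eval()
```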

api.py

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModel
import uvicorn, json, datetime
import torch
import nest_asyncio
from pyngrok import ngrok

DEVICE = "cuda"
DEVICE_ID = "0"
CUDA_DEVICE = f"{DEVICE}:{DEVICE_ID}" if DEVICE_ID else DEVICE


def torch_gc():
    # Free cached GPU memory between requests so a long-running server does not OOM.
    if torch.cuda.is_available():
        with torch.cuda.device(CUDA_DEVICE):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()


app = FastAPI()


@app.post("/")
async def create_item(request: Request):
    global model, tokenizer
    # Parse the JSON body; the sampling parameters fall back to defaults below.
    json_post_raw = await request.json()
    json_post = json.dumps(json_post_raw)
    json_post_list = json.loads(json_post)
    prompt = json_post_list.get('prompt')
    history = json_post_list.get('history')
    max_length = json_post_list.get('max_length')
    top_p = json_post_list.get('top_p')
    temperature = json_post_list.get('temperature')
    response, history = model.chat(tokenizer,
                                   prompt,
                                   history=history,
                                   max_length=max_length if max_length else 2048,
                                   top_p=top_p if top_p else 0.7,
                                   temperature=temperature if temperature else 0.95)
    now = datetime.datetime.now()
    time = now.strftime("%Y-%m-%d %H:%M:%S")
    answer = {
        "response": response,
        "history": history,
        "status": 200,
        "time": time
    }
    log = "[" + time + "] " + 'prompt:"' + prompt + '", response:"' + repr(response) + '"'
    print(log)
    torch_gc()
    # Return the full answer object so the caller also gets the updated history.
    return answer


if __name__ == '__main__':
    tokenizer = AutoTokenizer.from_pretrained("/content/chatglm-6b-int4", trust_remote_code=True)
    model = AutoModel.from_pretrained("/content/chatglm-6b-int4", trust_remote_code=True).half().cuda()
    model.eval()
    # Expose the local FastAPI server through an ngrok tunnel so it is reachable from outside Colab.
    ngrok_tunnel = ngrok.connect(8000)
    print('Public URL:', ngrok_tunnel.public_url)
    nest_asyncio.apply()
    uvicorn.run(app, host='0.0.0.0', port=8000, workers=1)
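
A quick way to exercise this endpoint from Python rather than curl might look like the sketch below, assuming the `requests` package is installed; the URL is a placeholder for your ngrok or local address, and the fields follow the `answer` object above:

```python
import requests

API_URL = "http://localhost:8000"  # placeholder: your ngrok URL or local address

payload = {"prompt": "Hello", "history": []}
resp = requests.post(API_URL, json=payload, timeout=120)
data = resp.json()
print(data["response"])  # the model's reply
print(data["history"])   # pass this back as "history" on the next turn
```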

web_demo.py

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
from transformers import AutoModel, AutoTokenizer
import gradio as gr
import mdtex2html

tokenizer = AutoTokenizer.from_pretrained("/content/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/content/chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()

"""Override Chatbot.postprocess"""


def postprocess(self, y):
    # Render both sides of each chat turn (user message, model response) as HTML.
    if y is None:
        return []
    for i, (message, response) in enumerate(y):
        y[i] = (
            None if message is None else mdtex2html.convert(message),
            None if response is None else mdtex2html.convert(response),
        )
    return y


gr.Chatbot.postprocess = postprocess


def parse_text(text):
    """copy from https://github.com/GaiZhenbiao/ChuanhuChatGPT/"""
    lines = text.split("\n")
    lines = [line for line in lines if line != ""]
    count = 0
    for i, line in enumerate(lines):
        if "```" in line:
            count += 1
            items = line.split('`')
            if count % 2 == 1:
                # Opening fence: start an HTML code block, keeping the language tag.
                lines[i] = f'<pre><code class="language-{items[-1]}">'
            else:
                # Closing fence.
                lines[i] = f'<br></code></pre>'
        else:
            if i > 0:
                if count % 2 == 1:
                    # Inside a code block: escape characters that markdown would mangle.
                    line = line.replace("`", "\`")
                    line = line.replace("<", "&lt;")
                    line = line.replace(">", "&gt;")
                    line = line.replace(" ", "&nbsp;")
                    line = line.replace("*", "&ast;")
                    line = line.replace("_", "&lowbar;")
                    line = line.replace("-", "&#45;")
                    line = line.replace(".", "&#46;")
                    line = line.replace("!", "&#33;")
                    line = line.replace("(", "&#40;")
                    line = line.replace(")", "&#41;")
                    line = line.replace("$", "&#36;")
                lines[i] = "<br>" + line
    text = "".join(lines)
    return text


def predict(input, chatbot, max_length, top_p, temperature, history):
    # Stream partial responses so the UI updates as tokens are generated.
    chatbot.append((parse_text(input), ""))
    for response, history in model.stream_chat(tokenizer, input, history, max_length=max_length, top_p=top_p,
                                               temperature=temperature):
        chatbot[-1] = (parse_text(input), parse_text(response))

        yield chatbot, history


def reset_user_input():
    return gr.update(value='')


def reset_state():
    return [], []


with gr.Blocks() as demo:
    gr.HTML("""<h1 align="center">ChatGLM</h1>""")

    chatbot = gr.Chatbot()
    with gr.Row():
        with gr.Column(scale=4):
            with gr.Column(scale=12):
                user_input = gr.Textbox(show_label=False, placeholder="Input...", lines=10).style(
                    container=False)
            with gr.Column(min_width=32, scale=1):
                submitBtn = gr.Button("Submit", variant="primary")
        with gr.Column(scale=1):
            emptyBtn = gr.Button("Clear History")
            max_length = gr.Slider(0, 4096, value=2048, step=1.0, label="Maximum length", interactive=True)
            top_p = gr.Slider(0, 1, value=0.7, step=0.01, label="Top P", interactive=True)
            temperature = gr.Slider(0, 1, value=0.95, step=0.01, label="Temperature", interactive=True)

    history = gr.State([])

    submitBtn.click(predict, [user_input, chatbot, max_length, top_p, temperature, history], [chatbot, history],
                    show_progress=True)
    submitBtn.click(reset_user_input, [], [user_input])

    emptyBtn.click(reset_state, outputs=[chatbot, history], show_progress=True)

demo.queue().launch(share=True, inbrowser=True)
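
`share=True` is what creates the temporary `*.gradio.live` URL used in the Colab steps above. For a purely local run, a plain local launch works too; a sketch using gradio's standard launch options:

```python
# Local-only alternative: serve on your own machine without a public gradio.live tunnel.
demo.queue().launch(server_name="0.0.0.0", server_port=7860)
```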

0 commit comments
