Commit 87c936c

Merge pull request #36 from LlamaEdge/michael-refactor
Refactor the docs
2 parents bec8e6c + 8c21eae commit 87c936c

85 files changed: +180 -776 lines


docs/ai-models/_category_.json

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
{
  "label": "Serve AI models",
  "position": 5,
  "link": {
    "type": "generated-index",
    "description": "Serve open-source AI models via web APIs."
  }
}

docs/user-guide/_category_.json renamed to docs/ai-models/embeddings/_category_.json

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
-  "label": "User Guide",
-  "position": 5,
+  "label": "Embeddings",
+  "position": 1,
   "link": {
     "type": "generated-index"
   }

docs/ai-models/embeddings/index.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
---
sidebar_position: 1
---

# Working with embedding models

Embedding models compute vectors from text inputs. The vectors can then be used as a search index
for semantic search in a vector database.

### Step 1: Install WasmEdge

First off, you'll need WasmEdge, a high-performance, lightweight, and extensible WebAssembly (Wasm) runtime optimized for server-side and edge computing. To install WasmEdge along with the necessary plugin for AI inference, open your terminal and execute the following command:

```
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s
```

This command fetches and runs the WasmEdge installation script, which automatically installs WasmEdge and the WASI-NN plugin that is essential for running AI models like the embedding model in this tutorial.

### Step 2: Download the embedding model

Next, you'll need to obtain a model file. For this tutorial, we're focusing on the **GTE Qwen2 1.5B** model, a top-rated text embedding model built on Qwen2. It generates vectors of 1536 dimensions. The steps are generally applicable to other models too. Use the following command to download the model file.

```
curl -LO https://huggingface.co/second-state/gte-Qwen2-1.5B-instruct-GGUF/resolve/main/gte-Qwen2-1.5B-instruct-Q5_K_M.gguf
```

### Step 3: Download a portable API server app

Next, you need an application that provides an OpenAI-compatible API server for the model.
The [LlamaEdge api server app](https://github.com/LlamaEdge/LlamaEdge/tree/main/llama-api-server) is a lightweight and cross-platform Wasm app that works on any device
you might have. Just download the compiled binary app.

```
curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llama-api-server.wasm
```

> The LlamaEdge apps are written in Rust and compiled to portable Wasm. That means they can run across devices and OSes without any change to the binary apps. You can simply download and run the compiled wasm apps regardless of your platform.

### Step 4: Start the API server

Start the API server with the following command. Notice that the context size of this particular embedding model is
32k and the prompt template is `embedding`.

```
wasmedge --dir .:. --nn-preload default:GGML:AUTO:gte-Qwen2-1.5B-instruct-Q5_K_M.gguf llama-api-server.wasm --model-name gte-qwen2-1.5b --ctx-size 32768 --batch-size 8192 --ubatch-size 8192 --prompt-template embedding
```

### Step 5: Use the /embeddings API

You can now send embedding requests to the server using the OpenAI-compatible `/embeddings` API endpoint.

```
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter..."
  }'
```

The response is:

```
{"object":"list","data":[{"index":0,"object":"embedding","embedding":[0.02968290634,0.04592291266,0.05229084566,-0.001912750886,-0.01647545397,0.01744602434,0.008423444815,0.01363539882,-0.005849621724,-0.004947130103,-0.02326701023,0.1068811566,0.01074867789, ... 0.005662892945,-0.01796873659,0.02428019233,-0.0333112292]}],"model":"gte-qwen2-1.5b","usage":{"prompt_tokens":9,"completion_tokens":0,"total_tokens":9}}
```
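
To see the semantic-search use case from the introduction in action, here is a minimal Python sketch (not part of the commit) that calls this `/v1/embeddings` endpoint and compares two sentences by cosine similarity. It assumes the server from Step 4 is running on `localhost:8080` and uses only the Python standard library.

```
import json
import urllib.request

def embed(text):
    """Fetch an embedding vector from the local LlamaEdge API server."""
    req = urllib.request.Request(
        "http://localhost:8080/v1/embeddings",
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

def cosine(a, b):
    """Cosine similarity: close to 1.0 means similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

v1 = embed("The food was delicious and the waiter was friendly.")
v2 = embed("The meal tasted great and the service was kind.")
print(cosine(v1, v2))  # semantically similar sentences score close to 1.0
```

A vector database performs the same comparison at scale: it stores one vector per document and returns the documents whose vectors are closest to the query's vector.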

docs/ai-models/index.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
---
sidebar_position: 1
---

# Introduction

LlamaEdge is a versatile platform supporting multiple types of AI models. The most common use of LlamaEdge is to
stand up API servers that can replace OpenAI as your application's backend.

## 🤖 Large Language Models (LLM)
Explore the LLM capabilities
[Get Started with LLM](/docs/category/llm)

## 👁️ Multimodal Vision
Work with vision-language models like Llava and Qwen-VL
[Get Started with Multimodal](/docs/category/multimodal)

## 👁️ Embeddings
Work with embedding models for vector and semantic search
[Get Started with Embeddings](/docs/category/embeddings)

## 🎙️ Speech to Text
Run speech-to-text models like Whisper
[Get Started with Speech to Text](/docs/category/speech-to-text)

## 🗣️ Text to Speech
Convert text to speech using models like GPT-SoVITS and Piper
[Get Started with Text to Speech](/docs/category/text-to-speech)

## 🎨 Text to Image
Generate images using models like Stable Diffusion and FLUX
[Get Started with Text-to-Image](/docs/category/text-to-image)

docs/user-guide/llamaedge-docker.md renamed to docs/ai-models/llamaedge-docker.md

Lines changed: 1 addition & 1 deletion
@@ -96,5 +96,5 @@ docker push secondstate/qwen-2-0.5b-allminilm-2:latest
 
 ## What's next
 
-Use the container as a drop-in replacement for the OpenAI API for your favorite agent app or framework! [See some examples here](openai-api/intro.md).
+Use the container as a drop-in replacement for the OpenAI API for your favorite agent app or framework! [See some examples here](../llama-nexus/openai-api/intro.md).
 
File renamed without changes.
File renamed without changes.
File renamed without changes.

docs/user-guide/llm/get-started-with-llamaedge.md renamed to docs/ai-models/llm/quick-start-llm.md

Lines changed: 32 additions & 5 deletions
@@ -40,24 +40,51 @@ curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llam
 
 > The LlamaEdge apps are written in Rust and compiled to portable Wasm. That means they can run across devices and OSes without any change to the binary apps. You can simply download and run the compiled wasm apps regardless of your platform.
 
-### Step 4: Chat with the chatbot UI
 
-The `llama-api-server.wasm` is a web server with an OpenAI-compatible API. You still need HTML files for the chatbot UI.
-Download and unzip the HTML UI files as follows.
+### Step 4: Use the API
+
+Start the web server by running the `llama-api-server.wasm` app in WasmEdge.
+
+```
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf llama-api-server.wasm -p llama-3-chat
+```
+
+The `llama-api-server.wasm` app is a web server.
+You can use the OpenAI-compatible `/chat/completions` API endpoint directly.
+
+```
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{"messages":[{"role":"system", "content": "You are a helpful assistant. Try to be as brief as possible."}, {"role":"user", "content": "Where is the capital of Texas?"}]}'
+```
+
+The response is:
+
+```
+{"id":"chatcmpl-5f0b5247-7afc-45f8-bc48-614712396a05","object":"chat.completion","created":1751945744,"model":"Mistral-Small-3.1-24B-Instruct-2503-Q5_K_M","choices":[{"index":0,"message":{"content":"The capital of Texas is Austin.","role":"assistant"},"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":38,"completion_tokens":8,"total_tokens":46}}
+```
+
+### Step 5: Chat with the chatbot UI
+
+The Chatbot UI is a web app that can interact with the OpenAI-compatible `/chat/completions` API to
+provide a human-friendly chatbot in your browser.
+
+Download and unzip the HTML and JS files for the Chatbot UI as follows.
 
 ```
 curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
 tar xzf chatbot-ui.tar.gz
 rm chatbot-ui.tar.gz
 ```
 
-Then, start the web server.
+Restart the web server to serve those HTML and JS files.
 
 ```
 wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf llama-api-server.wasm -p llama-3-chat
 ```
 
 Go to `http://localhost:8080` on your computer to access the chatbot UI on a web page!
 
-Congratulations! You have now started an LLM app on your own device. But if you are interested in running an agentic app beyond the simple chatbot, you will need to start an API server for this LLM along with the embedding model. Check out [this guide on how to do it](/docs/user-guide/openai-api/intro.md)!
+Congratulations! You have now started an LLM app on your own device.
 
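Because the `/chat/completions` endpoint added in this page is OpenAI-compatible, any OpenAI client library can point at it. Below is a minimal sketch (not part of the commit) using the official `openai` Python package; the model name is illustrative and should match the `--model-name` you pass to the server, and the API key is a placeholder since the local server is typically not configured to check one.

```
from openai import OpenAI

# Point the client at the local LlamaEdge server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="placeholder")

resp = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Q5_K_M",  # illustrative; match your server's model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Try to be as brief as possible."},
        {"role": "user", "content": "Where is the capital of Texas?"},
    ],
)
print(resp.choices[0].message.content)
```
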
docs/user-guide/llm/tool-call.md renamed to docs/ai-models/llm/tool-call.md

Lines changed: 7 additions & 9 deletions
@@ -14,7 +14,7 @@ In this tutorial, we will show you a simple Python program that allows a local L
 
 ## Prerequisites
 
-Follow [this guide](/docs/user-guide/openai-api/intro.md) to start an LlamaEdge API server.
+Follow [this guide](quick-start-llm.md) to start an LlamaEdge API server.
 For example, we will need an open source model that is capable of tool calling.
 The Llama 3.1 8B model is a good choice. Let's download the model file.
 
@@ -27,14 +27,12 @@ Then start the LlamaEdge API server for this model as follows.
 ```
 wasmedge --dir .:. \
   --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
-  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
   llama-api-server.wasm \
-  --model-alias default,embedding \
-  --model-name Meta-Llama-3.1-8B-Instruct-Q5_K_M,nomic-embed \
-  --prompt-template llama-3-tool,embedding \
-  --batch-size 128,8192 \
-  --ubatch-size 128,8192 \
-  --ctx-size 8192,8192
+  --model-name Meta-Llama-3.1-8B-Instruct-Q5_K_M \
+  --prompt-template llama-3-tool \
+  --batch-size 128 \
+  --ubatch-size 128 \
+  --ctx-size 8192
 ```
 
 Note the `llama-3-tool` prompt template. It formats user queries and LLM responses, including the JSON messages for tool calls, into the structures that the model is finetuned to follow.
@@ -56,7 +54,7 @@ pip install -r requirements.txt
 Set the environment variables for the API server and model name we just set up.
 
 ```
-export OPENAI_MODEL_NAME="llama-3-groq-8b"
+export OPENAI_MODEL_NAME="Meta-Llama-3.1-8B-Instruct-Q5_K_M"
 export OPENAI_BASE_URL="http://127.0.0.1:8080/v1"
 ```
 
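To make the tool-call flow concrete, here is a self-contained sketch (not part of the commit) of the round trip this guide describes, using the `openai` package against the server configured above. The `get_current_weather` function is a hypothetical stand-in for a real tool, and whether the model actually returns a `tool_calls` message depends on the model and the `llama-3-tool` prompt template.

```
import json
from openai import OpenAI

# Hypothetical local tool; a real agent would call an actual API here.
def get_current_weather(city):
    return json.dumps({"city": city, "forecast": "sunny", "temperature_c": 25})

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="placeholder")
MODEL = "Meta-Llama-3.1-8B-Instruct-Q5_K_M"

# Describe the tool in the JSON schema format the /chat/completions API expects.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "The city name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Austin?"}]
resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
msg = resp.choices[0].message

if msg.tool_calls:
    # Run the requested tool and send its result back for the final answer.
    call = msg.tool_calls[0]
    result = get_current_weather(**json.loads(call.function.arguments))
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```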