
Commit 3f7b06a

szabosteve, davidkyle, and leemthompo authored
[Search] Drafts how to use OpenAI compatible models with the inference API (#935)
## Overview

Related to elastic/developer-docs-team#266. This PR describes how to install an OpenAI-compatible model on your local machine using Ollama and how to connect it to ES.

### Preview

[Using OpenAI compatible models](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/935/solutions/search/semantic-search/using-openai-compatible-models)

---------

Co-authored-by: David Kyle <[email protected]>
Co-authored-by: Liam Thompson <[email protected]>
1 parent f481d95 commit 3f7b06a

File tree

5 files changed: +194 -0 lines changed
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
Create a connector using the public URL from ngrok.

1. In Kibana, go to **Search > Playground**, and click **Connect to an LLM**.
2. Select **OpenAI** on the fly-out.
3. Provide a name for the connector.
4. Under **Connector settings**, select **Other (OpenAI Compatible Service)** as the OpenAI provider.
5. Paste the ngrok-generated URL into the **URL** field and append the `v1/chat/completions` path. For example: `https://your-ngrok-endpoint.ngrok-free.app/v1/chat/completions`
6. Specify the default model, for example, `llama3.2`.
7. Provide any random string for the API key (it is not used for requests).
8. Click **Save**.

   :::{image} /solutions/images/elasticsearch-openai-compatible-connector.png
   :alt: Configuring an LLM connector in Playground
   :screenshot:
   :::

9. Click **Add data sources** and connect your index.

You can now use Playground with the LLM running locally.
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
You can use your locally installed LLM with the {{infer}} API.

Create an {{infer}} endpoint for the `chat_completion` task type with the `openai` service using the following request:

```console
PUT _inference/chat_completion/llama-completion
{
    "service": "openai",
    "service_settings": {
        "api_key": "ignored", <1>
        "model_id": "llama3.2", <2>
        "url": "https://your-ngrok-endpoint.ngrok-free.app/v1/chat/completions" <3>
    }
}
```

1. The `api_key` parameter is required for the `openai` service and must be set, but its specific value is not used by the local AI service.
2. The model name.
3. The ngrok-generated URL with the chat completions path (`v1/chat/completions`).
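To confirm that the endpoint was created, you can retrieve its configuration (this check assumes the `llama-completion` endpoint from the previous step):

```console
GET _inference/chat_completion/llama-completion
```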
Verify that the {{infer}} endpoint is working correctly:

```console
POST _inference/chat_completion/llama-completion/_stream
{
    "model": "llama3.2",
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    "temperature": 0.7,
    "max_completion_tokens": 300
}
```

The request results in a response similar to this:

```console-result
event: message
data: {
    "id" : "chatcmpl-416",
    "choices" : [
        {
            "delta" : {
                "content" : "The",
                "role" : "assistant"
            },
            "index" : 0
        }
    ],
    "model" : "llama3.2",
    "object" : "chat.completion.chunk"
}

(...)
```
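Because the endpoint accepts OpenAI-style chat requests, you can also send a conversation with multiple messages. The following is a minimal sketch that assumes the same `llama-completion` endpoint and `llama3.2` model created above:

```console
POST _inference/chat_completion/llama-completion/_stream
{
    "model": "llama3.2",
    "messages": [
        {
            "role": "system",
            "content": "You answer geography questions in one short sentence."
        },
        {
            "role": "user",
            "content": "What is the capital of France?"
        },
        {
            "role": "assistant",
            "content": "The capital of France is Paris."
        },
        {
            "role": "user",
            "content": "And the capital of Italy?"
        }
    ],
    "temperature": 0.7,
    "max_completion_tokens": 300
}
```

The response streams back in the same `chat.completion.chunk` format as the previous example.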
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
---
applies_to:
  stack: ga
  serverless: ga
navigation_title: Using OpenAI compatible models
---

# Using OpenAI compatible models with the {{infer-cap}} API

{{es}} enables you to use LLMs through the {{infer}} API, which supports providers such as Amazon Bedrock, Cohere, Google AI, HuggingFace, and OpenAI as a service.
It also allows you to use models deployed in your local environment that expose an OpenAI compatible API.

This page shows you how to connect local models to {{es}} using Ollama.

[Ollama](https://ollama.com/) enables you to download and run LLMs on your own infrastructure.
For a list of models compatible with Ollama, refer to this [page](https://ollama.com/library).

Using Ollama ensures that your interactions remain private, as the models run on your infrastructure.

## Overview

In this tutorial, you learn how to:

* download and run Ollama,
* use ngrok to expose the local web server that hosts Ollama to the internet,
* connect your local LLM to Playground.

## Download and run Ollama

1. [Download Ollama](https://ollama.com/download).
2. Install Ollama using the downloaded file.
   Enable the command line tool for Ollama during installation.
3. Choose a model from the [list of supported LLMs](https://ollama.com/library).
   This tutorial uses `llama3.2`.
4. Run the following command to download the model:

   ```shell
   ollama pull llama3.2
   ```
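To confirm that the model downloaded successfully, you can list the models available locally. This is an optional check that uses the standard Ollama CLI:

```shell
# Show the models that are available locally; llama3.2 should appear in the list
ollama list
```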
### Test the installed model

After installation, test the model.

1. Run `ollama run llama3.2` and ask a question, for example, "Are you working?"
   If the model is installed successfully, you receive a valid response.
2. When the model is running, an API endpoint is enabled by default on port `11434`.
   To test it, make a request to the API using the following command:

   ```shell
   curl http://localhost:11434/api/generate -d '{
     "model": "llama3.2",
     "prompt": "What is the capital of France?"
   }'
   ```

   Refer to the API [documentation](https://github.com/ollama/ollama/blob/main/docs/api.md) to learn more.
   The API returns a response similar to this:

   ```json
   {"model":"llama3.2","created_at":"2025-03-26T10:07:05.500614Z","response":"The","done":false}
   {"model":"llama3.2","created_at":"2025-03-26T10:07:05.519131Z","response":" capital","done":false}
   {"model":"llama3.2","created_at":"2025-03-26T10:07:05.537432Z","response":" of","done":false}
   {"model":"llama3.2","created_at":"2025-03-26T10:07:05.556016Z","response":" France","done":false}
   {"model":"llama3.2","created_at":"2025-03-26T10:07:05.574815Z","response":" is","done":false}
   {"model":"llama3.2","created_at":"2025-03-26T10:07:05.592967Z","response":" Paris","done":false}
   {"model":"llama3.2","created_at":"2025-03-26T10:07:05.611558Z","response":".","done":false}
   {"model":"llama3.2","created_at":"2025-03-26T10:07:05.630715Z","response":"","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,128009,128006,882,128007,271,3923,374,279,6864,315,9822,30,128009,128006,78191,128007,271,791,6864,315,9822,374,12366,13],"total_duration":2232589542,"load_duration":1052276792,"prompt_eval_count":32,"prompt_eval_duration":1048833625,"eval_count":8,"eval_duration":130808916}
   ```
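Ollama also exposes an OpenAI compatible API under the `/v1` path on the same port; this is the interface that the Playground connector and the {{infer}} endpoint in this tutorial rely on. As an optional check, you can query the chat completions endpoint directly (this sketch assumes the `llama3.2` model pulled earlier):

```shell
# Query Ollama's OpenAI-compatible chat completions endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```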
## Expose the endpoint using ngrok

Since the created endpoint only works locally, it cannot be accessed from external services (for example, your Elastic Cloud instance).
[ngrok](https://ngrok.com/) enables you to expose a local port with a public URL.

::::{warning}
Exposing a local endpoint to the internet can introduce security risks. Anyone with the public URL may be able to send requests to your service. Avoid exposing sensitive data or functionality, and consider using authentication or access restrictions to limit who can interact with the endpoint.
::::

1. Create an ngrok account and follow the [official setup guide](https://dashboard.ngrok.com/get-started/setup).
2. After installing and configuring the ngrok agent, expose the Ollama port by running:

   ```shell
   ngrok http 11434 --host-header="localhost:11434"
   ```

   The command returns a public link that works as long as ngrok and the Ollama server are running locally:

   ```shell
   Session Status                online
   Account                       [email protected] (Plan: Free)
   Version                       3.18.4
   Region                        United States (us)
   Latency                       561ms
   Web Interface                 http://127.0.0.1:4040
   Forwarding                    https://your-ngrok-endpoint.ngrok-free.app -> http://localhost:11434

   Connections                   ttl     opn     rt1     rt5     p50     p90
                                 0       0       0.00    0.00    0.00    0.00
   ```

3. Copy the ngrok-generated URL from the `Forwarding` line.
4. Test the endpoint again using the new URL:

   ```shell
   curl https://your-ngrok-endpoint.ngrok-free.app/api/generate -d '{
     "model": "llama3.2",
     "prompt": "What is the capital of France?"
   }'
   ```

   The response should be similar to the previous one.

## Connect the local LLM to Playground

:::{include} ../_snippets/connect-local-llm-to-playground.md
:::

## Use the local LLM with the {{infer}} API

:::{include} ../_snippets/use-local-llm-inference-api.md
:::

## Further reading

* [Using Ollama with the {{infer}} API](https://www.elastic.co/search-labs/blog/ollama-with-inference-api#expose-endpoint-to-the-internet-using-ngrok): A more comprehensive, end-to-end guide to using Ollama with {{es}}.

solutions/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ toc:
  - file: search/semantic-search/semantic-search-inference.md
  - file: search/semantic-search/semantic-search-elser-ingest-pipelines.md
  - file: search/semantic-search/cohere-es.md
+ - file: search/using-openai-compatible-models.md
  - file: search/rag.md
    children:
      - file: search/rag/playground.md
