
Commit 5b18c17

Added a tutorial in order to create a RAG using OpenSearch ML on Scalingo
resolves #3941
1 parent 166075f commit 5b18c17


8 files changed (+144, -0 lines)


_community_members/samirakarioh.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
short_name: samirakarioh
name: Samir Akarioh
photo: '/assets/media/community/members/samirakarioh.jpg'
github: SC-Samir
linkedin: 'samir-akarioh'
---
**Samir Akarioh** is a DevRel at Scalingo, a European PaaS. His hobbies include hiking, video games, and conferences.
Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
layout: post
title: "Build Your First RAG with OpenSearch® and Scalingo"
authors:
  - samirakarioh
date: 2025-09-12
categories:
  - technical-post
meta_keywords: opensearch, vector database, retrieval augmented generation, rag tutorial, huggingface, semantic search, ai search, embeddings, scalingo, ml, GenAI, machine learning
meta_description: A step-by-step tutorial on building a Retrieval-Augmented Generation (RAG) pipeline using a HuggingFace model and OpenSearch® on Scalingo’s PaaS platform, with full setup and code examples
has_math: false
has_science_table: false
---

In the past, building a RAG (Retrieval-Augmented Generation) pipeline meant juggling many different tools. Today, the process is much simpler: you just need [Hugging Face](https://huggingface.co/) for your model and OpenSearch® as a vector database. In this tutorial, we’ll walk you through the entire process step by step and show you how to build your own RAG using Scalingo and its OpenSearch® offering.

<aside>
📼

If you’d rather watch than read, [here’s the video version](https://youtu.be/Wmr-F72EUYs) of this tutorial.

</aside>

## Getting started

The first step is to [create an account on Scalingo](https://auth.scalingo.com/users/sign_up?utm_source=devrel&utm_medium=partner-post&utm_campaign=opensearch&utm_content=tutorial) or [log in](https://auth.scalingo.com/users/sign_in?utm_source=devrel&utm_medium=partner-post&utm_campaign=opensearch&utm_content=tutorial) to your existing one.

Keep in mind that the 30-day free trial offered at sign-up does **not** include the integration, use, or activation of OpenSearch®. If you want to follow this tutorial right away, you’ll need to end your trial by adding a payment method.

Alternatively, you can use your free trial period to explore other features of the platform and come back to this tutorial once you’re ready to get started with OpenSearch®.

<aside>
💡

[More info](https://scalingo.com/fr/blog/30-jours-pour-decouvrir-scalingo-tous-les-details-de-la-version-d-essai) on Scalingo’s free trial and what it includes.

</aside>

Once your account is set up, [choose one of the pretrained models provided by OpenSearch](https://docs.opensearch.org/latest/ml-commons-plugin/pretrained-models/). In our example, we’ll use `huggingface/sentence-transformers/all-MiniLM-L6-v2`.

## Creating Your App on Scalingo

Now, head back to your Scalingo dashboard. We’re going to create an application on the platform to host the OpenSearch® Dashboard.

![Create an app](/assets/media/blog-images/2025-09-18-Build-Your-First-RAG-with-OpenSearch-and-Scalingo/creation_of_app.png){:class="img-centered"}

Choose the Git deployment option. If your app handles sensitive data, select the HDS ([Health Data Hosting](https://scalingo.com/blog/health-data-hosting)) or [SecNumCloud](https://scalingo.com/qualification-secnumcloud) offering; otherwise, keep the default option.

![Choose a repo](/assets/media/blog-images/2025-09-18-Build-Your-First-RAG-with-OpenSearch-and-Scalingo/choose_git.png){:class="img-centered"}

Back in the Scalingo dashboard, it’s time to add an OpenSearch® database to the application. To do this, click on your application, then click “manage” in the “addons” section. Next, click “add an addon” and select OpenSearch®.

![Add an OpenSearch addon](/assets/media/blog-images/2025-09-18-Build-Your-First-RAG-with-OpenSearch-and-Scalingo/opensearch_addon.png){:class="img-centered"}

Scalingo offers several database plans depending on your needs. For this app, we recommend the Business plan so you can take advantage of high availability and a multi-node setup.

![OpenSearch plan prices](/assets/media/blog-images/2025-09-18-Build-Your-First-RAG-with-OpenSearch-and-Scalingo/opensearch_plan.png){:class="img-centered"}

<aside>
💡

Need help choosing the right plan? Visit the [comparison page](https://scalingo.com/databases/opensearch) or [reach out to their team](https://scalingo.com/contact).

</aside>

Now it’s time to install the OpenSearch® dashboard. To do this, go to the **Environment Variables** section of your OpenSearch® Dashboard app and add the following environment variable, which tells Scalingo which buildpack to use when it builds the app:

```
BUILDPACK_URL="https://github.com/Scalingo/opensearch-dashboards-buildpack"
```

Installing the OpenSearch® dashboard makes it easier to track each stage of the process and gives you access to Dev Tools, the console from which you’ll send requests to your cluster.

In your terminal, clone Scalingo’s OpenSearch® Dashboard repository:

```
git clone https://github.com/Scalingo/opensearch-dashboards-scalingo
```

Move into the cloned folder (`cd opensearch-dashboards-scalingo`) and add the Scalingo remote with `git remote add scalingo <your_opensearch_dashboard_app_url>`, replacing `<your_opensearch_dashboard_app_url>` with the Git remote URL of your OpenSearch® Dashboard application on Scalingo.

Finally, push the repository to the `scalingo` remote to deploy the dashboard.

## Setting Up the Model and Vectors

Now it’s time to register and deploy the model in OpenSearch®. Registering the model makes it known to the ML Commons plugin, which downloads it into the cluster; deploying it then loads it into memory on the nodes so it can generate embeddings.

To do this, your model must be in a format that OpenSearch® supports; in this tutorial, we use the ONNX format. You can find more details on how to configure your model on its Hugging Face page.

Go back to Scalingo and select the application that hosts your OpenSearch® Dashboard. Open it and make sure the OpenSearch® Dashboard page loads correctly. Log in with your user credentials, which are embedded in the `SCALINGO_OPENSEARCH_URL` environment variable shown on your application dashboard, then navigate to **Dev Tools**.
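
If you prefer to script the Dev Tools requests, you first need the cluster address and credentials. Below is a minimal Python sketch using only the standard library. It assumes `SCALINGO_OPENSEARCH_URL` has the usual `https://<user>:<password>@<host>:<port>` shape (check yours in the dashboard); the helper names `parse_opensearch_url` and `send` are hypothetical, not part of any Scalingo or OpenSearch SDK.

```python
import base64
import json
import urllib.request
from urllib.parse import unquote, urlsplit


def parse_opensearch_url(url: str) -> tuple[str, str, str]:
    """Split a URL of the assumed form https://user:password@host:port
    into (base_url, username, password)."""
    parts = urlsplit(url)
    if parts.username is None or parts.password is None:
        raise ValueError("URL does not embed credentials")
    base_url = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        base_url += f":{parts.port}"
    # urlsplit leaves percent-encoding in the user info, so decode it here.
    return base_url, unquote(parts.username), unquote(parts.password)


def send(base_url: str, user: str, password: str, method: str, path: str,
         body=None, content_type: str = "application/json"):
    """Send one REST request (the scripted equivalent of a Dev Tools line such as
    `PUT /my-nlp-index/_doc/1`) with HTTP Basic auth; return the decoded JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": content_type}
    data = None
    if body is not None:
        # Strings (e.g. an NDJSON bulk body) are sent as-is; dicts are JSON-encoded.
        data = body.encode() if isinstance(body, str) else json.dumps(body).encode()
    req = urllib.request.Request(base_url + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, passing `os.environ["SCALINGO_OPENSEARCH_URL"]` to `parse_opensearch_url` gives you the three values to pass to `send(..., "GET", "/")`, a quick connectivity check before you move on.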

Next, apply the [following cluster settings](https://docs.opensearch.org/latest/ml-commons-plugin/pretrained-models/#prerequisites):

![OpenSearchML Settings](/assets/media/blog-images/2025-09-18-Build-Your-First-RAG-with-OpenSearch-and-Scalingo/opensearch_settings.png){:class="img-centered"}

- The first setting allows OpenSearch® to download the model from the internet.
- The second allows the model to run on all OpenSearch® nodes, not only on dedicated ML nodes.
- The last two lift the memory limit and enable model access control.

These settings are crucial to ensure that your model is correctly loaded and available across your entire cluster.
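
Because the settings above appear only as a screenshot, here is a sketch of an equivalent `PUT _cluster/settings` body as a Python dict. The four keys are our assumption of what the screenshot shows, matched to the descriptions in the list; double-check them against the linked prerequisites page before applying them. Settings placed under `persistent` survive a cluster restart.

```python
def ml_cluster_settings() -> dict:
    """Body for `PUT _cluster/settings` enabling local ML models.
    The keys and values are assumptions matched to the list above."""
    return {
        "persistent": {
            # Assumed first setting: allow registering (downloading) a model from a URL.
            "plugins.ml_commons.allow_registering_model_via_url": "true",
            # Let models run on every node, not only on dedicated ML nodes.
            "plugins.ml_commons.only_run_on_ml_node": "false",
            # Raise the native-memory circuit breaker threshold to 99%.
            "plugins.ml_commons.native_memory_threshold": "99",
            # Turn on access control for model groups.
            "plugins.ml_commons.model_access_control_enabled": "true",
        }
    }
```

Serializing it with `json.dumps(ml_cluster_settings(), indent=2)` gives JSON you can paste under `PUT _cluster/settings` in Dev Tools.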

This is also where you’ll register your model group, by sending [this request](https://docs.opensearch.org/latest/tutorials/vector-search/semantic-search/semantic-search-asymmetric/#step-3-register-a-model-group) from Dev Tools. You can choose any name for your group, but make sure to keep the ID returned in the response. Then follow steps 4 and 5 of [this page](https://docs.opensearch.org/latest/tutorials/vector-search/semantic-search/semantic-search-asymmetric/#step-4-register-the-model) to register and deploy your model. All the information about the model you chose, such as its name and version, is available on the OpenSearch® website. After these steps, keep your model ID handy.
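
The register-and-deploy flow can be sketched as plain request descriptions (method, path, body). The paths and body fields follow the ML Commons REST API; the helper names are hypothetical, and any version string you pass must come from the pretrained models table, since we don't assume one here. Registration and deployment are asynchronous: each returns a `task_id` that you poll until its `state` is `COMPLETED`.

```python
from typing import Optional, Tuple

Request = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body or None)


def model_group_request(name: str, description: str = "") -> Request:
    """Register a model group; the response contains `model_group_id`."""
    return "POST", "/_plugins/_ml/model_groups/_register", {
        "name": name,
        "description": description,
    }


def register_model_request(model_group_id: str, name: str, version: str,
                           model_format: str = "ONNX") -> Request:
    """Register a pretrained model; the response contains a `task_id`."""
    return "POST", "/_plugins/_ml/models/_register", {
        "name": name,
        "version": version,
        "model_group_id": model_group_id,
        "model_format": model_format,
    }


def task_request(task_id: str) -> Request:
    """Poll an asynchronous ML Commons task."""
    return "GET", f"/_plugins/_ml/tasks/{task_id}", None


def model_id_from_task(task: dict) -> str:
    """Read the model ID from a finished registration task response."""
    if task.get("state") != "COMPLETED":
        raise RuntimeError(f"task not finished yet: {task.get('state')}")
    return task["model_id"]


def deploy_request(model_id: str) -> Request:
    """Deploy (load) the registered model on the cluster nodes."""
    return "POST", f"/_plugins/_ml/models/{model_id}/_deploy", None
```

The order is: model group, register, poll the task until it yields the model ID, then deploy and poll again until the deployment task completes.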

Now you need a way to convert your documents into embeddings. To do this, create an ingest pipeline by following the process described [here](https://docs.opensearch.org/latest/vector-search/ai-search/semantic-search/#step-1-create-an-ingest-pipeline). Make sure to put the model ID obtained in the previous step in the `model_id` field.
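
The linked pipeline boils down to a single `text_embedding` processor. Here is a sketch of the body for `PUT /_ingest/pipeline/<name>`, assuming the `passage_text` and `passage_embedding` field names used in the OpenSearch example; the function name is hypothetical.

```python
def ingest_pipeline_body(model_id: str,
                         source_field: str = "passage_text",
                         target_field: str = "passage_embedding") -> dict:
    """Body for `PUT /_ingest/pipeline/<name>`: a text_embedding processor that
    runs the deployed model on `source_field` and stores the vector in `target_field`."""
    return {
        "description": "Embed passages with the deployed model",
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    "field_map": {source_field: target_field},
                }
            }
        ],
    }
```

Every document indexed through this pipeline gets its `passage_embedding` vector computed automatically, so you never send vectors yourself.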

Next, you’ll need to create a [vector index](https://docs.opensearch.org/latest/vector-search/ai-search/semantic-search/#step-2-create-an-index-for-ingestion). A vector index is a structure that stores vectors and lets you retrieve the nearest ones efficiently. Send the request shown on the OpenSearch® website, and change the `default_pipeline` setting so that it matches the name of the pipeline you created in the previous step.

**Note:** Make sure that the `dimension` in your mapping matches the output dimension of your model. For `all-MiniLM-L6-v2`, that is 384.
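
Here is a sketch of the index body sized for `all-MiniLM-L6-v2`. The `lucene` engine, `hnsw` method, and `l2` space type mirror the OpenSearch semantic search example and are a choice rather than a requirement; the function name is hypothetical.

```python
MINILM_L6_DIMENSION = 384  # all-MiniLM-L6-v2 outputs 384-dimensional embeddings


def vector_index_body(pipeline_name: str, dimension: int = MINILM_L6_DIMENSION) -> dict:
    """Body for `PUT /<index>`: k-NN enabled, embeddings filled in by the default pipeline."""
    return {
        "settings": {
            "index.knn": True,
            # Every document indexed here goes through the ingest pipeline.
            "default_pipeline": pipeline_name,
        },
        "mappings": {
            "properties": {
                "id": {"type": "text"},
                "passage_text": {"type": "text"},
                "passage_embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,  # must equal the model's output size
                    "method": {"engine": "lucene", "space_type": "l2", "name": "hnsw"},
                },
            }
        },
    }
```

If you switch to another pretrained model, pass its output dimension explicitly; a mismatch makes document ingestion fail.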

Finally, we’ll add documents to the index. To do this, ingest the documents you chose with the following request:

```
PUT /my-nlp-index/_doc/1
{
  "passage_text": "Hello world",
  "id": "s1"
}
```

Repeat the request as many times as necessary, incrementing the document ID at the end of the path, as shown in [this example](https://docs.opensearch.org/latest/vector-search/ai-search/semantic-search/#step-3-ingest-documents-into-the-index).

You can also add several documents at once with the `/_bulk` endpoint, as shown in [this example](https://docs.opensearch.org/latest/tutorials/vector-search/semantic-search/semantic-search-asymmetric/#step-74-ingest-data). Make sure to change the index name so it matches yours.
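
The `/_bulk` endpoint expects newline-delimited JSON: for each document, an action line followed by the document itself, with a final trailing newline. Here is a minimal sketch of building that body with sequential IDs; the function name is hypothetical, and the body should be sent with the `Content-Type: application/x-ndjson` header.

```python
import json


def bulk_body(index: str, docs: list[dict], start_id: int = 1) -> str:
    """Build an NDJSON body for `POST /_bulk`: one action line plus one source
    line per document, IDs numbered from `start_id`, ending with a newline."""
    lines = []
    for offset, doc in enumerate(docs):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(start_id + offset)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API rejects a body without the final newline
```

Because the index has a `default_pipeline`, bulk-indexed documents are embedded exactly like those sent one by one.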

After this step, you can run a semantic search to make sure everything is working. The request can be found [here](https://docs.opensearch.org/latest/vector-search/ai-search/semantic-search/#step-4-search-the-index). Don’t forget to replace the model ID in the request with your own.
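
Below is a sketch of the `neural` search body, plus a helper that pulls the retrieved passages out of the response. In a RAG, those passages are what you would hand to a generation model as context; the helper names and the default `k` are our assumptions.

```python
def neural_query_body(model_id: str, query_text: str, k: int = 5,
                      vector_field: str = "passage_embedding") -> dict:
    """Body for `GET /<index>/_search`: OpenSearch embeds `query_text` with the
    same model and returns the k nearest passages."""
    return {
        # Don't send the stored vectors back; only the text is useful here.
        "_source": {"excludes": [vector_field]},
        "query": {
            "neural": {
                vector_field: {
                    "query_text": query_text,
                    "model_id": model_id,
                    "k": k,
                }
            }
        },
    }


def passages_from_response(response: dict, text_field: str = "passage_text") -> list[str]:
    """Extract passage texts from a search response, keeping OpenSearch's
    order (highest score first)."""
    return [hit["_source"][text_field] for hit in response["hits"]["hits"]]
```

Keeping the vector field out of `_source` keeps responses small, which matters once you retrieve several passages per question.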

## Conclusion

You now have everything you need to build your own RAG with OpenSearch® and Scalingo: a deployed embedding model, an ingest pipeline that generates embeddings automatically, and a vector index to retrieve relevant passages from. From here, simply add documents to your OpenSearch® index, and you’ll be able to run semantic queries directly from the OpenSearch® dashboard.

Need more guidance on using OpenSearch® with Scalingo? [Reach out to their friendly team!](https://scalingo.com/book-a-demo?utm_source=devrel&utm_medium=partner-post&utm_campaign=opensearch&utm_content=tutorial)