Commit 7875cf6

committed
update docs
1 parent 8e4f8e4 commit 7875cf6

19 files changed

+672
-222
lines changed

docs/pages/apis.mdx

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
# API Examples
2-
The repository already includes API pipeline manifest definitions for generating knowledge bases from several REST APIs. Each demonstrates how to define a YAML manifest for extracting data from target API endpoints using different Authentication/Pagination strategies. For a more in-depth review of how to build a manifest for creating a RAG pipeline for your own API, visit [Defining the API Pipeline Manifest](/manifest-definition).
2+
The repository already includes a few API pipeline manifest definitions that showcase how to use the `rag-api-pipeline` for generating knowledge bases from REST APIs.
3+
Each example demonstrates how to define a YAML manifest for extracting data from target API endpoints using different Authentication/Pagination strategies.
4+
For a more in-depth review of how to build a manifest for creating a RAG pipeline for your own API, visit the [Defining the API Pipeline Manifest](/manifest-definition) section.
35

46
## Boardroom Governance API
57
[Boardroom](https://boardroom.io/) offers its `Boardrooms Governance API` to provide comprehensive data on 350+ DAOs across chains. It offers endpoints that fetch information about proposals, delegates, discussions, and much more. You can find the complete API documentation at this [link](https://docs.boardroom.io/docs/api/cd5e0c8aa2bc1-overview).
@@ -12,4 +14,4 @@ The [Agora](https://www.agora.xyz/#Product) OP API provides various endpoints to
1214
Check the [Agora API](/apis/agora-api) section for details on how to extract data from the API and generate a knowledge base related to RetroPGF projects and proposals within the OP collective.
1315

1416
## Working with Other APIs
15-
If you are interested in working with any other API, visit the [API Examples](/apis/other-api-sources) section to get started.
17+
If you are interested in working with any other API, visit the [Other API Sources](/apis/other-api-sources) section to get started.

docs/pages/apis/agora-api.mdx

Lines changed: 91 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,65 @@
11
# Optimism Agora API
22

3-
This repository contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_openapi.yaml) and [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml) needed to create a RAG pipeline. This pipeline generates a knowledge base from RetroPGF projects and proposals within the OP collective. These files are typically located in the `config` folder.
3+
This repository contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_openapi.yaml) and [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml) needed to create a RAG pipeline.
4+
This pipeline generates a knowledge base from RetroPGF projects and proposals within the OP collective.
45

5-
To access this API, you'll need an API key. You can request one through [Agora's Discord server](https://www.agora.xyz/#Product). Once obtained, store the key in `config/secrets/api-key` or provide it directly using the `--api-key` CLI argument.
6+
## Pre-requisites
67

7-
## API Pipeline Manifest - Overview
8+
To access this API, you'll need an API key. You can request one through [Agora's Discord server](https://www.agora.xyz/#Product). You can run the `rag-api-pipeline setup` command to set the REST API Key,
9+
or you can directly store the key in the `config/secrets/api-key` file. A less secure option is to provide it using the `--api-key` CLI argument.
810

9-
The API pipeline extracts data from the `/proposals` and `/projects` [endpoints](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml#L79). Since no `api_parameters` are required, this section remains [empty](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml#L5).
11+
## Getting the Agora API OpenAPI Spec
1012

11-
Below is the requester definition. The API implements a BearerAuthenticator schema and retrieves the `api_token` from the `config` object:
13+
TODO:
1214

13-
```yaml
14-
# agora_api_pipeline.yaml
15+
## Defining the RAG API Pipeline Manifest
16+
17+
This pipeline will extract data related to DAO proposals (`/proposals`) and RetroPGF projects (`/projects`).
18+
Next, you can find an overview of the main sections in the API pipeline manifest.
19+
20+
### Basic Configuration
21+
22+
Since no `api_parameters` are required, this section remains empty.
23+
24+
```yaml [agora_api_pipeline.yaml]
25+
api_name: "optimism_agora_api"
26+
27+
api_parameters:
28+
29+
api_config:
30+
request_method: "get"
31+
content_type: "application/json"
32+
response_entrypoint_field: "data"
33+
```
34+
35+
### Connector Specification
36+
37+
The manifest then defines some metadata and the request parameters needed for making calls to the API. In this case, it only needs an `api_key`
38+
parameter for authentication:
39+
40+
```yaml [agora_api_pipeline.yaml]
41+
spec:
42+
connection_specification:
43+
$schema: http://json-schema.org/draft-07/schema#
44+
additionalProperties: true
45+
properties:
46+
api_key:
47+
airbyte-secret: true
48+
description: Agora API Key.
49+
type: string
50+
required:
51+
- api_key
52+
title: Agora API Spec
53+
type: object
54+
documentation_url: https://docs.airbyte.com/integrations/sources/agora
55+
type: Spec
56+
```
57+
58+
### API Request Configuration
59+
60+
Below is the `requester_base` definition. The API implements a BearerAuthenticator schema and retrieves the `api_token` from the `config` object:
61+
62+
```yaml [agora_api_pipeline.yaml]
1563
definition:
1664
requester_base:
1765
type: HttpRequester
@@ -22,10 +70,11 @@ definition:
2270
api_token: "{{ config['api_key'] }}"
2371
```
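As a rough illustration of what Bearer authentication amounts to at the HTTP level, the Python sketch below builds a request that carries an `Authorization: Bearer <token>` header. The `config` dict mirrors the `config['api_key']` lookup in the manifest; the base URL and the `build_request` helper are hypothetical and are not part of the pipeline.

```python
from urllib.request import Request

# Hypothetical base URL, used for illustration only.
BASE_URL = "https://api.example.com/v1"


def build_request(config: dict, path: str) -> Request:
    """Build a GET request that carries the API key as a Bearer token."""
    token = config["api_key"]
    return Request(
        f"{BASE_URL}{path}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# The request is only constructed here; nothing is sent over the network.
request = build_request({"api_key": "my-secret-key"}, "/proposals")
print(request.get_header("Authorization"))
```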
2472

73+
### Record Selection and Pagination
74+
2575
The API uses an Offset-based pagination strategy. The `page_size` is set to 50, while `offset` and `limit` parameters are dynamically inserted into the URL as request parameters:
2676

27-
```yaml
28-
# agora_api_pipeline.yaml
77+
```yaml [agora_api_pipeline.yaml]
2978
definition:
3079
paginator: # Details at https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/pagination
3180
type: DefaultPaginator
@@ -42,12 +91,41 @@ definition:
4291
field_name: "limit"
4392
```
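To make the offset-based strategy concrete, here is a minimal Python sketch of the loop such a paginator performs. It assumes the API signals the last page by returning fewer records than `limit`; that stop condition, the `fetch_all_offset` helper and the fake data source are illustrative assumptions, not the pipeline's actual implementation.

```python
from typing import Callable, Iterator

PAGE_SIZE = 50  # same page size as in the manifest


def fetch_all_offset(
    fetch_page: Callable[[int, int], list],
    page_size: int = PAGE_SIZE,
) -> Iterator[dict]:
    """Yield records page by page until a short (or empty) page is returned."""
    offset = 0
    while True:
        records = fetch_page(offset, page_size)
        yield from records
        if len(records) < page_size:  # assumed end-of-data signal
            break
        offset += page_size


# Stand-in for the real API: 120 fake records served in slices.
FAKE_RECORDS = [{"id": i} for i in range(120)]


def fake_fetch_page(offset: int, limit: int) -> list:
    """Return the slice a real endpoint would return for ?offset=..&limit=.."""
    return FAKE_RECORDS[offset:offset + limit]


records = list(fetch_all_offset(fake_fetch_page))
print(len(records))
```

With 120 fake records and a page size of 50, the loop requests offsets 0, 50 and 100 and stops after the third, shorter page.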
4493

45-
## Generating a Knowledge Base Using the `rag-api-pipeline` CLI
94+
### Endpoint Configuration
95+
96+
Below are the target endpoints with their respective schemas:
4697

47-
Before running the `run-all` command, ensure that `Ollama` is running locally with your preferred LLM embeddings model:
98+
```yaml [agora_api_pipeline.yaml]
99+
endpoints:
100+
/proposals:
101+
id: "proposals"
102+
primary_key: "id"
103+
responseSchema: "#/schemas/Proposal"
104+
textSchema:
105+
$ref: "#/textSchemas/Proposal"
106+
/projects:
107+
id: "projects"
108+
primary_key: "id"
109+
responseSchema: "#/schemas/Project"
110+
textSchema:
111+
$ref: "#/textSchemas/Project"
112+
```
113+
114+
## Using the RAG Pipeline to generate a Knowledge Base for the OP Collective
115+
116+
### RAG Pipeline CLI
117+
118+
1. Make sure to set up the pipeline's initial settings by running the `rag-api-pipeline setup` command.
119+
2. Execute the following command:
48120

49121
```bash
50-
poetry run rag-api-pipeline run-all config/agora_api_pipeline.yaml --openapi-spec-file config/agora_openapi.yaml --llm-provider ollama
122+
rag-api-pipeline run all config/agora_api_pipeline.yaml config/agora_openapi.yaml
51123
```
52124

53-
After execution, you'll find a compressed knowledge base snapshot in `{OUTPUT_FOLDER}/optimism_agora_api/` named `optimism_agora_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`. For instructions on importing this into your Gaianet node, refer to the documentation on [selecting a knowledge base](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base). Find recommended prompts and node configuration settings [here](/cli/node-deployment#recommended-gaianet-node-configuration).
125+
After execution, you'll find the processed data and compressed knowledge base snapshot in the `output/optimism_agora_api` folder.
126+
127+
### Import the KB Snapshot into a Gaianet Node
128+
129+
1. Locate the generated snapshot in `output/optimism_agora_api/` (named `optimism_agora_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`).
130+
2. Follow the official [knowledge base selection guide](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base)
131+
3. Configure your node using the recommended settings from the [node deployment guide](/cli/node-deployment#recommended-gaianet-node-configuration)
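
If several runs have left more than one snapshot in the output folder, a small helper can pick the most recent one. The Python sketch below is one possible approach: it assumes the `<api_name>_collection-*.snapshot.tar.gz` naming shown above and uses the file modification time to decide which snapshot is newest. The `find_latest_snapshot` helper is hypothetical and not part of the CLI.

```python
from pathlib import Path
from typing import Optional


def find_latest_snapshot(output_dir: str, api_name: str) -> Optional[Path]:
    """Return the most recently modified snapshot for `api_name`, or None."""
    folder = Path(output_dir) / api_name
    candidates = sorted(
        folder.glob(f"{api_name}_collection-*.snapshot.tar.gz"),
        key=lambda path: path.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


# Prints None when the folder is missing or holds no snapshot yet.
print(find_latest_snapshot("output", "optimism_agora_api"))
```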

docs/pages/apis/boardroom-api.mdx

Lines changed: 59 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,26 @@
11
# Boardroom Governance API
22

3-
This repository contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_openapi.yaml) and the [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) needed to create a RAG pipeline. This pipeline generates a knowledge base from any DAO/Protocol hosted by the Boardroom Governance API. All configuration files are located in the `config` folder.
3+
The repository already contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_openapi.yaml) and the [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) needed to create a RAG API pipeline.
4+
This pipeline generates a knowledge base from any DAO/Protocol hosted by the Boardroom Governance API.
45

5-
## Prerequisites
6+
## Pre-requisites
67

7-
To use this API, you'll need an API key. Request one from [Boardroom's developer portal](https://boardroom.io/developers/billing). Store the key in `config/secrets/api-key` or provide it directly using the `--api-key` CLI argument.
8+
To use this API, you'll need an API key. Request one from [Boardroom's developer portal](https://boardroom.io/developers/billing). You can run the `rag-api-pipeline setup` command to set the REST API Key,
9+
or you can directly store the key in the `config/secrets/api-key` file. A less secure option is to provide it using the `--api-key` CLI argument.
810

9-
## API Pipeline Manifest Overview
11+
## Getting the Boardroom API OpenAPI Spec
12+
13+
TODO:
14+
15+
## Defining the RAG API Pipeline Manifest
16+
17+
This pipeline will extract data related to protocol metadata (`/protocols/aave`), DAO proposals (`/protocols/aave/proposals`), and discussion posts from the Discourse forum site (`discourseTopics`, `discourseCategories` and `discourseTopicPosts`), if there are any.
1018

1119
### Basic Configuration
1220

13-
The manifest begins with defining the API name and parameters. This example uses the [Aave Governance DAO](https://boardroom.io/aave/insights):
21+
The manifest starts by defining the API name, parameters and request settings. You can visit this [link](https://docs.boardroom.io/docs/api/5b445a81af241-get-all-protocols) to get the list of all DAO protocols in Boardroom. This example focuses on the [Aave Governance DAO](https://boardroom.io/aave/insights):
1422

15-
```yaml
23+
```yaml [boardroom_api_pipeline.yaml]
1624
api_name: "aave_boardroom_api"
1725

1826
api_parameters:
@@ -27,9 +35,9 @@ api_config:
2735
2836
### Connector Specification
2937
30-
The manifest defines parameters required for API requests:
38+
The manifest then defines some metadata and the request parameters needed for making calls to the API:
3139
32-
```yaml
40+
```yaml [boardroom_api_pipeline.yaml]
3341
spec:
3442
type: Spec
3543
documentation_url: https://docs.airbyte.com/integrations/sources/boardroom
@@ -63,9 +71,9 @@ spec:
6371
6472
### API Request Configuration
6573
66-
The `requester_base` defines how to interact with the API:
74+
Then, the `requester_base` defines how the connector should make requests to the API. Here, an `ApiKeyAuthenticator` schema is required and gets the `api_token` value from the `config` object:
6775

68-
```yaml
76+
```yaml [boardroom_api_pipeline.yaml]
6977
definitions:
7078
requester_base:
7179
type: HttpRequester
@@ -82,9 +90,9 @@ definitions:
8290

8391
### Record Selection and Pagination
8492

85-
Data records are wrapped in the `data` field:
93+
Data records returned by the API are always wrapped in the `data` field, while pagination is handled using a Cursor-based approach:
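
For readers new to cursor-based pagination, the following Python sketch shows the general idea: each response carries its records under `data` plus an opaque cursor for the next page, and the client keeps requesting until no cursor is returned. The `nextCursor` field name, the `fetch_all_cursor` helper and the fake pages are assumptions made for illustration; check the API documentation for the real field names.

```python
from typing import Callable, Iterator, Optional


def fetch_all_cursor(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Follow cursors until the response no longer carries one."""
    cursor = None
    while True:
        response = fetch_page(cursor)
        yield from response["data"]          # records are wrapped in `data`
        cursor = response.get("nextCursor")  # hypothetical field name
        if not cursor:
            break


# Stand-in for the API: three pages keyed by the cursor that requests them.
PAGES = {
    None: {"data": [{"id": 1}, {"id": 2}], "nextCursor": "page-2"},
    "page-2": {"data": [{"id": 3}], "nextCursor": "page-3"},
    "page-3": {"data": [{"id": 4}]},
}

records = list(fetch_all_cursor(lambda cursor: PAGES[cursor]))
print([record["id"] for record in records])
```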
8694

87-
```yaml
95+
```yaml [boardroom_api_pipeline.yaml]
8896
definitions:
8997
selector:
9098
type: RecordSelector
@@ -106,9 +114,9 @@ definitions:
106114

107115
### Endpoint Configuration
108116

109-
Define endpoints with their respective schemas. Example for proposals endpoint:
117+
Now it's time to define the target endpoints with their respective schemas. Below is an example for the *proposals* endpoint:
110118

111-
```yaml
119+
```yaml [boardroom_api_pipeline.yaml]
112120
endpoints:
113121
"/protocols/{cname}/proposals":
114122
id: "proposals"
@@ -120,9 +128,9 @@ endpoints:
120128

121129
### Schema Definitions
122130

123-
The `responseSchema` defines the complete data structure:
131+
The `responseSchema` reference from above defines the complete *unwrapped* data schema that is returned by the API endpoint:
124132

125-
```yaml
133+
```yaml [boardroom_api_pipeline.yaml]
126134
schemas:
127135
Proposals:
128136
type: object
@@ -205,9 +213,10 @@ schemas:
205213
type: integer
206214
```
207215

208-
The `textSchema` specifies fields for text parsing. Note that all properties must be listed in the `responseSchema`. In this case, `title`, `content`, and `summary` will be parsed as texts, while other fields will be included as metadata properties in a JSON object:
216+
On the other hand, the endpoint's `textSchema` reference specifies the list of fields for text parsing. Note that all properties are also listed in the `responseSchema`.
217+
In this case, `title`, `content`, and `summary` will be parsed as texts, while other fields will be included as metadata properties in a JSON object:
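
The split between text fields and metadata can be pictured with a short Python sketch. It assumes the text fields are simply joined with blank lines and that every remaining property becomes metadata; how the pipeline actually assembles the text is not specified here, and the `split_record` helper and the sample record are made up for illustration.

```python
TEXT_FIELDS = ("title", "content", "summary")  # fields listed in the textSchema


def split_record(record: dict, text_fields=TEXT_FIELDS) -> tuple:
    """Separate the text to be parsed from the metadata kept alongside it."""
    text = "\n\n".join(
        str(record[field]) for field in text_fields if record.get(field)
    )
    metadata = {key: value for key, value in record.items() if key not in text_fields}
    return text, metadata


# Made-up sample record; real proposals carry many more properties.
proposal = {
    "id": "abc123",
    "title": "Enable X as collateral",
    "content": "Full proposal body...",
    "summary": "Short summary.",
    "currentState": "active",
}

text, metadata = split_record(proposal)
print(metadata)
```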
209218

210-
```yaml
219+
```yaml [boardroom_api_pipeline.yaml]
211220
textSchemas:
212221
Proposal:
213222
type: object
@@ -222,9 +231,9 @@ textSchemas:
222231

223232
### Chunking Parameters
224233

225-
Configure text chunking behavior:
234+
This section sets the parameters used when applying text chunking to the extracted content:
226235

227-
```yaml
236+
```yaml [boardroom_api_pipeline.yaml]
228237
chunking_params:
229238
mode: "elements"
230239
chunking_strategy: "by_title"
@@ -237,25 +246,44 @@ chunking_params:
237246
multipage_sections: true
238247
```
239248

240-
## Usage Guide
249+
## Using the RAG Pipeline to generate a Knowledge Base for Aave
241250

242-
### Generating a Knowledge Base
251+
### RAG Pipeline CLI
243252

244-
1. Ensure `Ollama` is running locally with your preferred LLM embeddings model
245-
2. Run the following command:
253+
1. Make sure to set up the pipeline's initial settings by running the `rag-api-pipeline setup` command.
254+
2. Execute the following command:
246255

247256
```bash
248-
poetry run rag-api-pipeline run-all config/boardroom_api_pipeline.yaml --openapi-spec-file config/openapi.yaml --llm-provider ollama
257+
rag-api-pipeline run all config/boardroom_api_pipeline.yaml config/boardroom_openapi.yaml
249258
```
250259

251-
The processed data and knowledge base snapshot for Aave are available on [Hugging Face](https://huggingface.co/datasets/uxman/aave_snapshot_boardroom/tree/main).
260+
The processed data and knowledge base snapshot for Aave will be available in the `output/aave_boardroom_api` folder. You can also find a public knowledge base snapshot on [Hugging Face](https://huggingface.co/datasets/uxman/aave_snapshot_boardroom/tree/main).
261+
262+
### Import the KB Snapshot into a Gaianet Node
263+
264+
1. Locate the generated snapshot in `output/aave_boardroom_api/` (named `aave_boardroom_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`) or download it from the HuggingFace link above.
265+
2. Follow the official [knowledge base selection guide](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base)
266+
3. Configure your node using the recommended settings from the [node deployment guide](/cli/node-deployment#recommended-gaianet-node-configuration)
267+
268+
Once the command above finishes, you'll find a compressed knowledge base snapshot in
269+
`{OUTPUT_FOLDER}/aave_boardroom_api/` named `aave_boardroom_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`. Now it's time to import it
270+
into your Gaianet node. You can find the instructions on how to select a knowledge base [here](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
271+
The recommended prompts and node config settings can be found [here](/cli/node-deployment#recommended-gaianet-node-configuration).
272+
273+
### Example user prompts
274+
275+
- Asking what information the RAG bot is able to provide
276+
277+
![intro_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/intro.png)
278+
279+
- Asking for information about the proposal [Enable Metis as Collateral on the Metis Chain](https://boardroom.io/aave/proposal/cHJvcG9zYWw6YWF2ZTpvbmNoYWluLXVwZ3JhZGU6MTUy)
280+
281+
![proposal1_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/proposal1_summary.png)
252282

253-
### Importing into Gaianet Node
283+
- Asking for information about [Onboarding USDS and sUSDS to Aave v3](https://boardroom.io/aave/discussions/18987)
254284

255-
1. Locate the generated snapshot in `{OUTPUT_FOLDER}/aave_boardroom_api/` (named `aave_boardroom_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`)
256-
2. Follow the [knowledge base selection guide](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base)
257-
3. Configure using the recommended settings from the [node deployment guide](/cli/node-deployment#recommended-gaianet-node-configuration)
285+
![proposal2_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/proposal2_summary.png)
258286

259287
### Customizing for Other DAOs
260288

261-
To generate a knowledge base for a different DAO, modify the `api_name` and `api_parameters` in the [boardroom_api_pipeline.yaml](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) file.
289+
To generate a knowledge base for a different DAO, you just need to modify the `api_name` and `api_parameters` values in the [boardroom_api_pipeline.yaml](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) manifest file.

docs/pages/apis/other-api-sources.mdx

Lines changed: 19 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -16,16 +16,28 @@ Want to supercharge your RAG pipeline with different APIs? We've got you covered
1616
* Look for existing OpenAPI/Swagger specifications
1717

1818
### 2. Schema Setup
19-
* Use help from LLMs to create OpenAPI schemas
19+
* Check out this guide on [OpenAPI](https://docs.bump.sh/guides/openapi/specification/v3.1/introduction/what-is-openapi/)
20+
* Look for an official OpenAPI spec file from the API provider.
21+
* Use some help from LLMs to create OpenAPI schemas
2022
* Validate your schema:
2123
* New schemas: [Swagger Editor](https://editor.swagger.io/)
2224
* Existing specs: [Swagger Validator](https://validator.swagger.io/)
2325

24-
### 3. Test and Deploy
25-
* Test each endpoint thoroughly
26-
* Use AI assistance to fix validation issues
27-
* Connect everything to your RAG pipeline
28-
29-
Do checkout this guide on [OpenAPI](https://docs.bump.sh/guides/openapi/specification/v3.1/introduction/what-is-openapi/)
26+
### 3. Define the RAG API Pipeline manifest
27+
* Define the target endpoints and required request parameters
28+
* Get an API Key if needed.
29+
* Check out our [guide](/manifest-definition) or [API examples](/apis) for inspiration.
30+
31+
### 4. Test and Deploy
32+
* Set up the pipeline's initial configuration by running the `rag-api-pipeline setup` command.
33+
* Test each endpoint thoroughly:
34+
* Run `rag-api-pipeline run all <API_MANIFEST_FILE> <OPENAPI_SPEC_FILE>` and check for any errors.
35+
* Comment out the other endpoints in the API manifest.
36+
* Use the `--normalized-only` CLI option and check results in the `output` folder.
37+
* Adjust data chunking parameter settings:
38+
* Use the `--chunked-only` CLI option and analyze results (e.g. using a Jupyter notebook)
39+
* If you want to include recent endpoint data, use the `--full-refresh` CLI option to clean up the cache.
40+
* Use AI assistance to fix validation issues.
41+
* Connect everything to your RAG pipeline.
3042

3143
Still need help? Feel free to reach out or open an issue on this repository!
