Commit f3fd016

DOC-783 | Graph analytics on the ArangoDB Platform (#776)
* Properly prevent text wrapping in kbd shortcode
* Avoid whitespace between HTML elements in the admonition shortcodes. The indentation can otherwise break the layout of the entire page if an admonition is nested in a tab
* WIP: Graph Analytics on the ArangoDB Platform
* Feedback
* Graph Analytics still not working on Google Cloud AFAICT
* AG: Explain that engine ID = id attribute
1 parent fb2ee05 commit f3fd016

9 files changed (+401 / -82 lines)

site/content/3.12/graphs/graph-analytics.md

Lines changed: 176 additions & 17 deletions
@@ -8,6 +8,8 @@ description: |
 aliases:
 - ../data-science/graph-analytics
 ---
+{{< tag "ArangoDB Platform" "ArangoGraph" >}}
+
 Graph analytics is a branch of data science that deals with analyzing information
 networks known as graphs, and extracting information from the data relationships.
 It ranges from basic measures that characterize graphs, over PageRank, to complex
@@ -16,12 +18,13 @@ and network flow analysis.

 ArangoDB offers a feature for running algorithms on your graph data,
 called Graph Analytics Engines (GAEs). It is available on request for the
-[ArangoGraph Insights Platform](https://dashboard.arangodb.cloud/home?utm_source=docs&utm_medium=cluster_pages&utm_campaign=docs_traffic).
+[ArangoGraph Insights Platform](https://dashboard.arangodb.cloud/home?utm_source=docs&utm_medium=cluster_pages&utm_campaign=docs_traffic)
+and included in the [ArangoDB Platform](../components/platform.md).

 Key features:

 - **Separation of storage and compute**: GAEs are a solution that lets you run
-graph analytics independent of your ArangoDB deployments on dedicated machines
+graph analytics independent of your ArangoDB Core, including on dedicated machines
 optimized for compute tasks. This separation of OLAP and OLTP workloads avoids
 affecting the performance of the transaction-oriented database systems.

@@ -37,6 +40,26 @@ Key features:
 The following list outlines how you can use Graph Analytics Engines (GAEs).
 How to perform the steps is detailed in the subsequent sections.

+{{< tabs "platforms" >}}
+
+{{< tab "ArangoDB Platform" >}}
+1. Determine the approximate size of the data that you will load into the GAE
+and ensure the machine to run the engine on has sufficient memory. The data as well as the
+temporarily needed space for computations and results needs to fit in memory.
+2. [Start a `graphanalytics` service](#start-a-graphanalytics-service) via the GenAI service
+that manages various Platform components for graph intelligence and machine learning.
+It only takes a few seconds until the engine service can be used. The engine
+runs adjacent to the pods of the ArangoDB Core.
+3. [Load graph data](#load-data) from the ArangoDB Core into the engine. You can load
+named graphs or sets of node and edge collections. This loads the edge
+information and a configurable subset of the node attributes.
+4. [Run graph algorithms](#run-algorithms) on the data. You only need to load the data once per
+engine and can then run various algorithms with different settings.
+5. [Write the computation results back](#store-job-results) to the ArangoDB Core.
+6. [Stop the engine service](#stop-a-graphanalytics-service) once you are done.
+{{< /tab >}}
+
+{{< tab "ArangoGraph Insights Platform" >}}
 {{< info >}}
 Before you can use Graph Analytics Engines, you need to request the feature
 via __Request help__ in the ArangoGraph dashboard for a deployment.
@@ -59,9 +82,28 @@ Single server deployments using ArangoDB version 3.11 are not supported.
 engine and can then run various algorithms with different settings.
 5. Write the computation results back to ArangoDB.
 6. Delete the engine once you are done.
+{{< /tab >}}
+
+{{< /tabs >}}

 ## Authentication

+{{< tabs "platforms" >}}
+
+{{< tab "ArangoDB Platform" >}}
+You can use any of the available authentication methods the ArangoDB Platform
+supports to start and stop `graphanalytics` services via the GenAI service as
+well as to authenticate requests to the [Engine API](#engine-api):
+
+- HTTP Basic Authentication
+- Access tokens
+- JWT session tokens
+<!-- TODO
+- Single Sign-On (SSO)
+-->
+{{< /tab >}}
+
+{{< tab "ArangoGraph Insights Platform" >}}
 The [Management API](#management-api) for deploying and deleting engines requires
 an ArangoGraph **API key**. See
 [Generating an API Key](../arangograph/api/get-started.md#generating-an-api-key)
@@ -81,18 +123,74 @@ setting in ArangoGraph:
 These session tokens need to be renewed every hour by default. See
 [HTTP API Authentication](../develop/http-api/authentication.md#jwt-user-tokens)
 for details.
+{{< /tab >}}

-## Management API
+{{< /tabs >}}

-You can save an ArangoGraph access token created with `oasisctl login` in a
-variable to ease scripting. Note that this should be the token string only and
-not include quote marks. The following examples assume Bash as the shell and
-that the `curl` and `jq` commands are available.
+## Start and stop Graph Analytics Engines

-```bash
-ARANGO_GRAPH_TOKEN="$(oasisctl login --key-id "<AG_KEY_ID>" --key-secret "<AG_KEY_SECRET>")"
+The interface for managing the engines depends on the environment you use:
+
+- **ArangoDB Platform**: [GenAI service](#genai-service)
+- **ArangoGraph**: [Management API](#management-api)
+
+### GenAI service
+
+{{< tag "ArangoDB Platform" >}}
+
+GAEs are deployed and deleted via the [GenAI service](../data-science/graphrag/services/gen-ai.md)
+in the ArangoDB Platform.
+
+If you use cURL, you need to use the `-k` / `--insecure` option for requests
+if the Platform deployment uses a self-signed certificate (default).
+
+#### Start a `graphanalytics` service
+
+`POST <PLATFORM_BASEURL>/gen-ai/v1/graphanalytics`
+
+Start a GAE via the GenAI service with an empty request body:
+
+```sh
+# Example with a JWT session token
+ADB_TOKEN=$(curl -sSk -d '{"username":"root", "password": ""}' -X POST https://127.0.0.1:8529/_open/auth | jq -r .jwt)
+
+Service=$(curl -sSk -H "Authorization: bearer $ADB_TOKEN" -X POST https://127.0.0.1:8529/gen-ai/v1/graphanalytics)
+ServiceID=$(echo "$Service" | jq -r ".serviceInfo.serviceId")
+if [[ -z "$ServiceID" || "$ServiceID" == "null" ]]; then
+  echo "Error starting gral engine"
+else
+  echo "Engine started successfully"
+fi
+echo "$Service" | jq
+```
+
+#### List the services
+
+`POST <PLATFORM_BASEURL>/gen-ai/v1/list_services`
+
+You can list all running services managed by the GenAI service, including the
+`graphanalytics` services:
+
+```sh
+curl -sSk -H "Authorization: bearer $ADB_TOKEN" -X POST https://127.0.0.1:8529/gen-ai/v1/list_services | jq
 ```

+#### Stop a `graphanalytics` service
+
+Delete the desired engine via the GenAI service using the service ID:
+
+```sh
+curl -sSk -H "Authorization: bearer $ADB_TOKEN" -X DELETE "https://127.0.0.1:8529/gen-ai/v1/service/$ServiceID" | jq
+```
+
+### Management API
+
+{{< tag "ArangoGraph" >}}
+
+GAEs are deployed and deleted with the Management API for graph analytics on the
+ArangoGraph Insights Platform. You can also list the available engine sizes and
+get information about deployed engines.
+
 To determine the base URL of the management API, use the ArangoGraph dashboard
 and copy the __APPLICATION ENDPOINT__ of the deployment that holds the graph data
 you want to analyze. Replace the port with `8829` and append
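When checking the response of the start request, note that `jq -r` prints the string `null` if `.serviceInfo.serviceId` is missing, but prints nothing at all if its input is empty or not JSON, which is what happens when `curl` fails before getting a usable response. A minimal sketch of a guard that rejects both cases, assuming Bash; `require_service_id` is a hypothetical helper, not part of the GenAI service:

```bash
# Hypothetical helper: succeed only if the argument looks like a usable
# service ID. `jq -r` yields "null" for a missing attribute and no output
# for empty or non-JSON input, so both count as a failed start.
require_service_id() {
  local id="$1"
  if [[ -z "$id" || "$id" == "null" ]]; then
    echo "Error starting graphanalytics service" >&2
    return 1
  fi
  return 0
}

# Usage, with $Service holding the response of the start request:
# ServiceID=$(echo "$Service" | jq -r ".serviceInfo.serviceId")
# require_service_id "$ServiceID" || exit 1
```

Returning a status instead of only printing a message lets a script stop before it tries to build an engine URL from an invalid ID.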
@@ -111,15 +209,24 @@ To authenticate requests, you need to use the following HTTP header:
 Authorization: bearer <ARANGO_GRAPH_TOKEN>
 ```

-For example, with cURL and using the token variable:
+You can create an ArangoGraph access token with `oasisctl login`. Save it in a
+variable to ease scripting. Note that this should be the token string only and
+not include quote marks. The following examples assume Bash as the shell and
+that the `curl` and `jq` commands are available.
+
+```bash
+ARANGO_GRAPH_TOKEN="$(oasisctl login --key-id "<AG_KEY_ID>" --key-secret "<AG_KEY_SECRET>")"
+```
+
+Example with cURL that uses the token variable:

 ```bash
 curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" "$BASE_URL/api-version"
 ```

 Request and response payloads are JSON-encoded in the management API.

-### Get the API version
+#### Get the API version

 `GET <BASE_URL>/api-version`

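The token instructions ask for the token string only, without quote marks. A token that still carries its quotes, for example one copied out of a JSON document, ends up in the header as `bearer "<token>"` and is unlikely to be accepted. A minimal sketch, assuming Bash; `bearer_header` is a hypothetical helper that drops a leading and a trailing double quote before building the header line:

```bash
# Hypothetical helper: strip a leading and/or trailing double quote from the
# token, then print the Authorization header line for use with curl -H.
bearer_header() {
  local token="$1"
  token="${token#\"}"   # drop a leading double quote, if present
  token="${token%\"}"   # drop a trailing double quote, if present
  printf 'Authorization: bearer %s\n' "$token"
}
```

Usage with the variable from the example above: `curl -H "$(bearer_header "$ARANGO_GRAPH_TOKEN")" "$BASE_URL/api-version"`.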
@@ -129,7 +236,7 @@ Retrieve the version information of the management API.
 curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" "$BASE_URL/api-version"
 ```

-### List engine sizes
+#### List engine sizes

 `GET <BASE_URL>/enginesizes`

@@ -140,7 +247,7 @@ and the size of the RAM, starting at 1 CPU and 4 GiB of memory (`e4`).
 curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" "$BASE_URL/enginesizes"
 ```

-### List engine types
+#### List engine types

 `GET <BASE_URL>/enginetypes`

@@ -151,28 +258,32 @@ called `gral`.
 curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" "$BASE_URL/enginetypes"
 ```

-### Deploy an engine
+#### Deploy an engine

 `POST <BASE_URL>/engines`

 Set up a GAE adjacent to the ArangoGraph deployment, for example, using an
 engine size of `e4`.

+The engine ID is returned in the `id` attribute.
+
 ```bash
 curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" -X POST -d '{"type_id":"gral","size_id":"e4"}' "$BASE_URL/engines"
 ```

-### List all engines
+#### List all engines

 `GET <BASE_URL>/engines`

 List all deployed GAEs of an ArangoGraph deployment.

+The engine IDs are in the `id` attributes.
+
 ```bash
 curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" "$BASE_URL/engines"
 ```

-### Get an engine
+#### Get an engine

 `GET <BASE_URL>/engines/<ENGINE_ID>`

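The deploy request body only varies in its `type_id` and `size_id` values, so a small helper can build it instead of editing the quoted JSON by hand. A sketch, assuming Bash; `engine_payload` is a hypothetical helper, and any other size ID would come from `GET <BASE_URL>/enginesizes`:

```bash
# Hypothetical helper: print the JSON body for `POST <BASE_URL>/engines`.
# Assumes the IDs contain no characters that need JSON escaping, which holds
# for values such as "gral" and "e4".
engine_payload() {
  local type_id="${1:-gral}"   # engine type, defaults to "gral" as in the example
  local size_id="${2:-e4}"     # engine size, defaults to "e4" as in the example
  printf '{"type_id":"%s","size_id":"%s"}' "$type_id" "$size_id"
}
```

Usage: `curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" -X POST -d "$(engine_payload gral e4)" "$BASE_URL/engines"`.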
@@ -183,7 +294,7 @@ ENGINE_ID="zYxWvU9876"
 curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" "$BASE_URL/engines/$ENGINE_ID"
 ```

-### Delete an engine
+#### Delete an engine

 `DELETE <BASE_URL>/engines/<ENGINE_ID>`

@@ -196,11 +307,56 @@ curl -H "Authorization: bearer $ARANGO_GRAPH_TOKEN" -X DELETE "$BASE_URL/engines

 ## Engine API

+### Determine the engine URL
+
+{{< tabs "platforms" >}}
+
+{{< tab "ArangoDB Platform" >}}
+To determine the base URL of the engine API, use the base URL of the Platform
+deployment and append `/gral/<SERVICE_ID>`, e.g.
+`https://127.0.0.1:8529/gral/arangodb-gral-tqcge`.
+
+The service ID is returned by the call to the GenAI service for
+[starting the `graphanalytics` service](#start-a-graphanalytics-service).
+You can also list the service IDs like so:
+
+```sh
+kubectl -n arangodb get svc arangodb-gral -o jsonpath="{.spec.selector.release}"
+```
+
+Store the base URL in a variable called `ENGINE_URL`:
+
+```bash
+ENGINE_URL='https://...'
+```
+
+To authenticate requests, you need to use a bearer token in the HTTP header:
+```
+Authorization: bearer <TOKEN>
+```
+
+You can save the token in a variable to ease scripting. Note that this should be
+the token string only and not include quote marks. The following examples assume
+Bash as the shell and that the `curl` and `jq` commands are available.
+
+An example of authenticating a request using cURL and a session token:
+
+```bash
+PLATFORM_BASEURL="https://127.0.0.1:8529"
+
+ADB_TOKEN=$(curl -k -X POST -d "{\"username\":\"<ADB_USER>\",\"password\":\"<ADB_PASS>\"}" "$PLATFORM_BASEURL/_open/auth" | jq -r '.jwt')
+
+curl -k -H "Authorization: bearer $ADB_TOKEN" "$ENGINE_URL/v1/jobs"
+```
+{{< /tab >}}
+
+{{< tab "ArangoGraph Insights Platform" >}}
 To determine the base URL of the engine API, use the ArangoGraph dashboard
 and copy the __APPLICATION ENDPOINT__ of the deployment that holds the graph data
 you want to analyze. Replace the port with `8829` and append
 `/graph-analytics/engines/<ENGINE_ID>`, e.g.
 `https://<123456abcdef>.arangodb.cloud:8829/graph-analytics/engines/zYxWvU9876`.
+If you can't remember the engine ID, you can [List all engines](#list-all-engines).

 Store the base URL in a variable called `ENGINE_URL`:

@@ -230,6 +386,9 @@ ADB_TOKEN=$(curl -X POST -d "{\"username\":\"<ADB_USER>\",\"password\":\"<ADB_PA

 curl -H "Authorization: bearer $ADB_TOKEN" "$ENGINE_URL/v1/jobs"
 ```
+{{< /tab >}}
+
+{{< /tabs >}}

 All requests to the engine API start jobs, each representing an operation.
 You can check the progress of operations and check if errors occurred.
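Both ways of determining `ENGINE_URL` described in the "Determine the engine URL" section are plain string manipulation, so they can be scripted. A minimal sketch, assuming Bash and an APPLICATION ENDPOINT of the form `https://<host>:<port>`; both function names are hypothetical:

```bash
# ArangoDB Platform: <PLATFORM_BASEURL>/gral/<SERVICE_ID>
platform_engine_url() {
  local base="${1%/}"                 # drop one trailing slash, if any
  printf '%s/gral/%s\n' "$base" "$2"
}

# ArangoGraph: APPLICATION ENDPOINT with the port replaced by 8829,
# followed by /graph-analytics/engines/<ENGINE_ID>
arangograph_engine_url() {
  local endpoint="${1%/}"             # drop one trailing slash, if any
  local scheme="${endpoint%%://*}"    # e.g. "https"
  local hostport="${endpoint#*://}"   # e.g. "<host>:<port>"
  local host="${hostport%%:*}"        # drop ":<port>" if present
  printf '%s://%s:8829/graph-analytics/engines/%s\n' "$scheme" "$host" "$2"
}
```

Usage: `ENGINE_URL="$(platform_engine_url "$PLATFORM_BASEURL" "$ServiceID")"` on the Platform, or `ENGINE_URL="$(arangograph_engine_url "$APP_ENDPOINT" "$ENGINE_ID")"` on ArangoGraph, where `APP_ENDPOINT` is a hypothetical variable holding the endpoint copied from the dashboard.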
