From a14e02818bc9000a9c9e5baa9554424feece6dff Mon Sep 17 00:00:00 2001 From: stefanbaxter Date: Sun, 19 Apr 2026 19:15:16 +0000 Subject: [PATCH] docs: API instructions for discovery, metadata, and queries Consolidated reference for the /api/v1 surface: authentication (WorkOS, FraiOS, legacy Hasura), discovery, metadata, and query endpoints. Serves as the single source of truth for integrators building against the Synmetrix API. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/api-instructions.md | 384 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 384 insertions(+) create mode 100644 docs/api-instructions.md diff --git a/docs/api-instructions.md b/docs/api-instructions.md new file mode 100644 index 00000000..2da9ff55 --- /dev/null +++ b/docs/api-instructions.md @@ -0,0 +1,384 @@ +# Synmetrix API Instructions — Discovery, Metadata & Queries + +Base URL: `http://{host}:4000` + +--- + +## Authentication + +All endpoints accept a JWT Bearer token in the `Authorization` header. Three token types are supported: + +| Token Type | Algorithm | Detection | Verification | +|---|---|---|---| +| **WorkOS** | RS256 | JWT header `alg === "RS256"` | JWKS from `https://api.workos.com/sso/jwks/{WORKOS_CLIENT_ID}` | +| **FraiOS** | HS256 | Payload contains `accountId` claim | HMAC with `TOKEN_SECRET` env var | +| **Legacy Hasura** | HS256 | Fallback (not RS256, no `accountId`) | HMAC with `JWT_KEY` env var | + +WorkOS and FraiOS tokens trigger just-in-time user provisioning — the user is created in the system on first use. + +--- + +## Step 1: Discover Available Datasources + +### `GET /api/v1/discover` + +Returns all datasources and cubes visible to the authenticated user. This is the entry point — call it first to get the IDs needed for all other endpoints. + +**Auth:** WorkOS or FraiOS token only. Legacy Hasura tokens are rejected. +**Headers:** `Authorization: Bearer ` (no datasource headers needed) + +**Response:** + +```json +{ + "datasources": [ + { + "id": "a1b2c3d4-...", + "name": "Production ClickHouse", + "db_type": "clickhouse", + "team_id": "e5f6g7h8-...", + "branch_id": "i9j0k1l2-...", + "version_id": "m3n4o5p6-...", + "cubes": [ + { "name": "Orders", "description": "All customer orders" }, + { "name": "Users", "description": null } + ] + } + ], + "usage": { ... } +} +``` + +The response fields map to headers required by subsequent endpoints: + +| Response field | Header to send | Required? | +|---|---|---| +| `id` | `x-hasura-datasource-id` | Yes | +| `branch_id` | `x-hasura-branch-id` | No (defaults to active branch) | +| `version_id` | `x-hasura-branch-version-id` | No (defaults to latest version) | + +**Partition filtering:** If the JWT contains a `partition` claim, only datasources from teams whose `settings.partition` matches are returned. + +**Errors:** + +| Status | Meaning | +|---|---| +| 403 | Missing auth, invalid token, or legacy Hasura token used | +| 500 | Internal error | + +--- + +## Step 2: Get Cube Metadata + +### `GET /api/v1/meta` + +Returns full metadata for all cubes in the selected datasource — dimensions, measures, segments, joins, and types. This is a Cube.js built-in endpoint. + +**Auth:** All three token types. +**Headers:** + +``` +Authorization: Bearer +x-hasura-datasource-id: +x-hasura-branch-id: (optional) +x-hasura-branch-version-id: (optional) +``` + +**Response:** + +```json +{ + "cubes": [ + { + "name": "Orders", + "title": "Orders", + "description": "All customer orders", + "measures": [ + { + "name": "Orders.count", + "title": "Orders Count", + "type": "number", + "aggType": "count", + "drillMembers": ["Orders.id", "Orders.createdAt"] + }, + { + "name": "Orders.totalRevenue", + "title": "Orders Total Revenue", + "type": "number", + "aggType": "sum" + } + ], + "dimensions": [ + { + "name": "Orders.id", + "title": "Orders Id", + "type": "number", + "primaryKey": true + }, + { + "name": "Orders.status", + "title": "Orders Status", + "type": "string" + }, + { + "name": "Orders.city", + "title": "Orders City", + "type": "string" + } + ], + "segments": [ + { + "name": "Orders.active", + "title": "Orders Active" + } + ], + "joins": {} + } + ] +} +``` + +Use this to build queries — every `measures`, `dimensions`, `timeDimensions`, and `segments` value in a load query must reference a `name` from this metadata. + +--- + +## Step 3: Run a Query + +### `POST /api/v1/load` + +Execute a Cube.js query against the selected datasource. + +**Auth:** All three token types. +**Headers:** Same as `/meta` (Authorization + datasource headers). + +**Request body:** + +```json +{ + "query": { + "measures": ["Orders.count", "Orders.totalRevenue"], + "dimensions": ["Orders.city"], + "timeDimensions": [ + { + "dimension": "Orders.createdAt", + "granularity": "month", + "dateRange": ["2024-01-01", "2024-12-31"] + } + ], + "filters": [ + { + "member": "Orders.status", + "operator": "equals", + "values": ["shipped"] + } + ], + "segments": ["Orders.active"], + "order": { "Orders.count": "desc" }, + "limit": 100, + "offset": 0 + } +} +``` + +Also available as `GET` with `query` as a URL-encoded JSON query parameter. + +#### Query Object Reference + +| Field | Type | Description | +|---|---|---| +| `measures` | `string[]` | Aggregated values to compute. References from `/meta`. | +| `dimensions` | `string[]` | Group-by columns. References from `/meta`. | +| `timeDimensions` | `object[]` | Time-based grouping with granularity and optional date range. | +| `filters` | `object[]` | Row-level filters applied before aggregation. | +| `segments` | `string[]` | Named filter presets defined in the cube model. | +| `order` | `object` | `{ "Member.name": "asc" | "desc" }` | +| `limit` | `number` | Max rows returned (default 10,000). | +| `offset` | `number` | Pagination offset. | + +#### Filter Operators + +| Operator | Applies to | Description | +|---|---|---| +| `equals` | all | Exact match (array of values = OR) | +| `notEquals` | all | Exclude values | +| `contains` | string | Case-insensitive substring match | +| `notContains` | string | Exclude substring matches | +| `startsWith` | string | Prefix match | +| `endsWith` | string | Suffix match | +| `gt` | number, time | Greater than | +| `gte` | number, time | Greater than or equal | +| `lt` | number, time | Less than | +| `lte` | number, time | Less than or equal | +| `set` | all | Value is not null | +| `notSet` | all | Value is null | +| `inDateRange` | time | Within date range `["start", "end"]` | +| `notInDateRange` | time | Outside date range | +| `beforeDate` | time | Before a date | +| `afterDate` | time | After a date | + +#### Time Dimension Granularities + +`second`, `minute`, `hour`, `day`, `week`, `month`, `quarter`, `year` + +#### Response (JSON, default) + +```json +{ + "query": { /* normalized query */ }, + "data": [ + { + "Orders.city": "London", + "Orders.count": 42, + "Orders.totalRevenue": 12500.00, + "Orders.createdAt.month": "2024-01-01T00:00:00.000" + } + ], + "annotation": { + "measures": { + "Orders.count": { "title": "Orders Count", "type": "number" }, + "Orders.totalRevenue": { "title": "Orders Total Revenue", "type": "number" } + }, + "dimensions": { + "Orders.city": { "title": "Orders City", "type": "string" } + }, + "timeDimensions": { + "Orders.createdAt.month": { "title": "Orders Created At", "type": "time" } + } + }, + "total": 42 +} +``` + +The `annotation` object provides display metadata for each column in the result — use it for labeling and type coercion. + +#### Async Polling Protocol + +Long-running queries return HTTP 200 with: + +```json +{ "error": "Continue wait" } +``` + +This is **not** an error. The client must retry the same request (same body, same headers). Cube.js processes the query in the background. Typically resolves within a few retries. Recommended retry interval: 1–2 seconds. + +#### Output Formats + +Add `format` to the request body or as a query parameter: + +| Format | Content-Type | Notes | +|---|---|---| +| `json` (default) | `application/json` | Standard response above | +| `csv` | `text/csv` | Streamed. Header row + data rows. Limit raised to `CUBEJS_DB_QUERY_LIMIT` (default 1M rows). | +| `arrow` | `application/vnd.apache.arrow.stream` | Apache Arrow IPC stream. 100K row safety limit on buffered path. Field mapping in `X-Synmetrix-Arrow-Field-Mapping` header (base64url JSON). | +| `jsonstat` | `application/json` | JSON-stat dataset format for statistical tools. | + +**Errors:** + +| Status | Meaning | +|---|---| +| 400 | Invalid query or unsupported format | +| 403 | Access denied (user lacks permission to queried members) | +| 500 | Query execution failure | + +--- + +## Step 4 (optional): Preview Generated SQL + +### `GET /api/v1/sql` + +Returns the SQL that Cube.js would generate for a query, without executing it. Useful for debugging or auditing. + +**Auth:** All three token types. +**Headers:** Same as `/meta`. +**Query parameter:** `query` — URL-encoded Cube.js query JSON (same format as `/load`). + +**Response:** + +```json +{ + "sql": { + "sql": ["SELECT city, count(*) FROM orders WHERE status = ? GROUP BY 1", ["shipped"]], + "order": [{ "id": "Orders.count", "desc": true }] + } +} +``` + +--- + +## Step 5 (optional): Run Raw SQL + +### `POST /api/v1/run-sql` + +Execute arbitrary SQL directly against the datasource. Useful for ad-hoc exploration. + +**Auth:** All three token types. +**Headers:** Same as `/meta`. + +**Request body:** + +```json +{ + "query": "SELECT city, count(*) as cnt FROM orders GROUP BY city ORDER BY cnt DESC LIMIT 10", + "format": "json" +} +``` + +**Response (JSON):** + +```json +[ + { "city": "London", "cnt": "42" }, + { "city": "Berlin", "cnt": "31" } +] +``` + +Supports `format`: `json`, `csv`, `arrow`, `jsonstat` (same as `/load`). + +**Access control:** When query rewrite rules are active for the datasource, freeform SQL is blocked unless the query is HMAC-signed (`sql_signature` field, SHA-256 hex digest using `JWT_KEY`). Signed queries come from the governed SQL generation pipeline. + +**Errors:** + +| Status | Body | Meaning | +|---|---|---| +| 400 | `{ "code": "query_missing" }` | No SQL provided | +| 403 | `{ "code": "sql_api_blocked" }` | Rules active, unsigned SQL | +| 500 | `{ "code": "run_sql_failed" }` | Execution error | + +--- + +## Complete Flow Example + +``` +1. GET /api/v1/discover + Authorization: Bearer + + → Pick a datasource from response, note id, branch_id, version_id + +2. GET /api/v1/meta + Authorization: Bearer + x-hasura-datasource-id: + + → Browse cubes, pick measures and dimensions to query + +3. POST /api/v1/load + Authorization: Bearer + x-hasura-datasource-id: + Content-Type: application/json + + { "query": { "measures": ["Orders.count"], "dimensions": ["Orders.city"] } } + + → If response is { "error": "Continue wait" }, retry after 1-2s + → Otherwise, data is in response.data[] +``` + +--- + +## Access Control + +Two layers of access control may restrict query results: + +1. **Query rewrite rules** (applies to all roles): Row-level filters injected based on team/member properties. If a required filter dimension doesn't exist on a cube, the query is blocked entirely. + +2. **Field-level access list** (non-owner/non-admin roles): Each queried dimension/measure must appear in the user's `access_list.config.datasources[id].cubes` config. Missing fields are rejected with 403. + +Both are transparent to the caller — queries either succeed with filtered data or fail with 403.