feat: ArcadeDB hybrid adapter (graph + vector) #94

Open
lvca wants to merge 6 commits into topoteretes:main from ArcadeData:feat/arcadedb-hybrid-adapter

lvca commented Apr 9, 2026

Summary

Upgrades the ArcadeDB adapter from graph-only to a hybrid adapter supporting both graph and vector operations in a single database:

  • Moved from packages/graph/arcadedb/ to packages/hybrid/arcadedb/ (same pattern as FalkorDB)
  • Implements both VectorDBInterface and GraphDBInterface in a single ArcadeDBAdapter class
  • Graph operations use Bolt protocol when available (auto-detected), with HTTP fallback
  • Vector operations use HTTP API with SQL for LSM_VECTOR indexes and vectorNeighbors()
  • neo4j is an optional dependency (pip install ...[bolt])
  • Added dataset database handlers for multi-tenant support

Why?

ArcadeDB natively supports vector search with HNSW indexes. Rather than requiring a separate vector database (Qdrant, Milvus, etc.), Cognee users can now use ArcadeDB as a unified graph+vector store, simplifying infrastructure and keeping embeddings co-located with the knowledge graph.

Implementation Details

  • Graph (Bolt): Binary protocol via neo4j async driver, auto-verified on first query
  • Graph (HTTP fallback): OpenCypher via HTTP API with parameterized queries
  • Vectors: stored as ARRAY_OF_FLOATS properties on vertices
  • Index: CREATE INDEX ... LSM_VECTOR METADATA { dimensions: N, similarity: 'COSINE' }
  • Search: SELECT expand(vectorNeighbors('Type[prop]', [vec], k)) returns results sorted by distance
  • Upsert: node created via Cypher MERGE, vector properties set via SQL UPDATE
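
The index clause and search query quoted in the list above can be assembled as plain SQL strings. The following is a minimal sketch, not the adapter's actual code: the helper names `vector_index_clause`, `knn_query`, and `_check_ident` are hypothetical, and the part of the `CREATE INDEX` statement that the description elides with `...` is deliberately left out.

```python
import re

# Hypothetical helpers; names are illustrative, not taken from the adapter.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_ident(name: str) -> str:
    """Reject anything that is not a plain identifier, since names are inlined into SQL."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def vector_index_clause(dimensions: int, similarity: str = "COSINE") -> str:
    """Build only the LSM_VECTOR METADATA clause quoted in the PR description."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    return (
        f"LSM_VECTOR METADATA {{ dimensions: {int(dimensions)}, "
        f"similarity: '{_check_ident(similarity)}' }}"
    )


def knn_query(type_name: str, prop: str, vector: list, k: int) -> str:
    """Build the vectorNeighbors() search in the shape quoted in the PR description."""
    vec = ", ".join(repr(float(x)) for x in vector)
    return (
        f"SELECT expand(vectorNeighbors('{_check_ident(type_name)}"
        f"[{_check_ident(prop)}]', [{vec}], {int(k)}))"
    )


if __name__ == "__main__":
    print(vector_index_clause(384))
    print(knn_query("Entity", "embedding", [0.1, 0.2], 5))
```

Validating the type and property names matters here because they are interpolated into the statement text rather than passed as parameters.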

ArcadeDB Setup

# HTTP only (simplest)
docker run -p 2480:2480 -e JAVA_OPTS="-Darcadedb.server.rootPassword=pwd" arcadedata/arcadedb:latest

# With Bolt (recommended)
docker run -p 2480:2480 -p 7687:7687 \
  -e JAVA_OPTS="-Darcadedb.server.rootPassword=pwd \
     -Darcadedb.server.plugins=Bolt:com.arcadedb.bolt.BoltProtocolPlugin" \
  arcadedata/arcadedb:latest

Test plan

All tests run against ArcadeDB 26.4.x in Docker:

  • Bolt transport: graph CRUD + vector ops (Bolt for Cypher, HTTP for SQL)
  • HTTP transport: graph CRUD + vector ops (HTTP for everything)
  • Bolt fallback: Bolt failure auto-falls back to HTTP mid-session
  • No neo4j package: adapter works with HTTP-only when neo4j not installed
  • LSM_VECTOR index creation and vectorNeighbors() KNN search
  • Full hybrid flow (Cypher graph + SQL vectors + KNN search)
  • Cross-protocol operations (Cypher-created nodes + SQL vector UPDATE)
  • Dataset handler registration (graph + vector adapters verified with cognee 0.5.8)
  • Integration test simulating cognee.cognify() pipeline (entity creation, embedding, vector indexing, KNN search, graph traversal, batch search, retrieve, delete, prune)

…pter

ArcadeDB natively supports both graph traversal and vector search (HNSW indexes
with vectorNeighbors()). This moves the adapter from packages/graph/ to
packages/hybrid/ and implements VectorDBInterface alongside GraphDBInterface.

Graph operations continue to use Neo4j Bolt protocol with OpenCypher.
Vector operations use ArcadeDB's HTTP API for SQL-based vector indexing and
KNN search via the vectorNeighbors() function.

lvca marked this pull request as draft April 9, 2026 23:04
lvca added 3 commits April 9, 2026 19:18

Major changes based on live testing against ArcadeDB 26.x:

- Replaced neo4j Bolt driver with HTTP API for all operations
  (ArcadeDB does not expose Bolt protocol, only HTTP on port 2480)
- Graph operations now use Cypher via HTTP with parameterized queries
- Fixed vector property type: ARRAY_OF_FLOATS (not ARRAY OF FLOAT)
- Fixed index creation: LSM_VECTOR METADATA { dimensions: N,
  similarity: 'COSINE' } (not HNSW N)
- Fixed vector search: expand(vectorNeighbors('Type[prop]', [vec], k))
  which returns 'distance' field (lower = closer)
- Removed neo4j dependency, only aiohttp needed
- Replace timestamp() with Python-side millisecond timestamps ($now param)
  since ArcadeDB Cypher doesn't support the timestamp() function
- Fix serialize_properties to handle datetime, None, and list types
- Fix get_node/get_nodes/extract_nodes/retrieve to handle flat vertex
  responses (ArcadeDB HTTP Cypher returns vertices as flat dicts, not
  wrapped in {"node": {...}} like Bolt does)
- Fix get_edges/get_connections to use aliased scalar returns instead
  of full vertex returns

Tested with cognee 0.5.8 DataPoint model including all base fields
(created_at, updated_at, metadata, belongs_to_set, etc.)
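
The timestamp, serialization, and response-shape fixes listed in this commit message can be illustrated with a short sketch. Only the name `serialize_properties` and the `$now` parameter come from the commit message; the function bodies, the helpers `now_ms`, `serialize_value`, and `unwrap_node`, and the chosen conversions (ISO strings for datetimes, `None` passed through, JSON text for nested dicts) are assumptions made for illustration.

```python
import json
import time
from datetime import datetime, timezone
from typing import Any


def now_ms() -> int:
    """Client-side millisecond timestamp, usable as the value of a $now parameter."""
    return int(time.time() * 1000)


def serialize_value(value: Any) -> Any:
    """Convert one property value into something JSON-safe (assumed conversions)."""
    if value is None:
        return None
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, (list, tuple)):
        return [serialize_value(v) for v in value]
    if isinstance(value, dict):
        # Assumption: nested dicts are stored as JSON text.
        return json.dumps(value, default=str)
    return value


def serialize_properties(props: dict) -> dict:
    """Apply serialize_value to every property of a node."""
    return {key: serialize_value(val) for key, val in props.items()}


def unwrap_node(row: dict) -> dict:
    """Accept either a flat vertex dict or one wrapped as {"node": {...}}."""
    inner = row.get("node")
    return inner if isinstance(inner, dict) else row


if __name__ == "__main__":
    stamp = datetime(2026, 4, 9, tzinfo=timezone.utc)
    print(serialize_properties({"created_at": stamp, "metadata": None}))
    print(unwrap_node({"node": {"id": "1"}}), unwrap_node({"id": "1"}))
```

One limitation of this sketch: a flat vertex that happens to carry a dict-valued property named `node` would be unwrapped by mistake, so a real implementation would need a stricter check.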

Graph operations now use Neo4j Bolt protocol when available (requires
BoltProtocolPlugin on port 7687 and neo4j Python package), falling
back to HTTP API transparently if Bolt is not reachable.

- Bolt verification happens on first query; if it fails, all
  subsequent Cypher queries use HTTP for the session lifetime
- neo4j is an optional dependency: pip install ...[bolt]
- Vector operations always use HTTP API (SQL)
- Bolt Node objects are normalized to flat dicts for consistent
  response format across both transports

Tested all 4 scenarios against ArcadeDB 26.4.x:
1. Bolt transport (graph + vector ops)
2. HTTP transport (graph + vector ops)
3. Bolt failure with automatic HTTP fallback
4. Without neo4j package installed (HTTP only)
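
The fallback behaviour described in this commit message can be sketched with two injectable query callables. This is an illustration under stated assumptions, not the adapter's code: the class name `CypherTransport` is hypothetical, and treating only connection-level errors (`ConnectionError`, `OSError`) as a reason to abandon Bolt is a choice made here so that ordinary query errors are not hidden.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A query callable takes (cypher, params) and returns a list of rows.
QueryFn = Callable[[str, dict], Awaitable[list]]


class CypherTransport:
    """Hypothetical wrapper: prefer Bolt, switch to HTTP for the rest of the session."""

    def __init__(self, http_query: QueryFn, bolt_query: Optional[QueryFn] = None):
        self._http = http_query
        self._bolt = bolt_query
        # Bolt is only attempted when a Bolt callable was supplied at all
        # (for example, when the optional neo4j package is installed).
        self._bolt_ok = bolt_query is not None

    async def run(self, cypher: str, params: Optional[dict] = None) -> list:
        params = params or {}
        if self._bolt_ok:
            try:
                return await self._bolt(cypher, params)
            except (ConnectionError, OSError):
                # Assumption: a connection failure disables Bolt for the session.
                self._bolt_ok = False
        return await self._http(cypher, params)


if __name__ == "__main__":
    async def failing_bolt(cypher: str, params: dict) -> list:
        raise ConnectionError("Bolt port not reachable")

    async def http(cypher: str, params: dict) -> list:
        return [{"via": "http"}]

    transport = CypherTransport(http, failing_bolt)
    print(asyncio.run(transport.run("RETURN 1")))
```

Passing `bolt_query=None` models the fourth scenario in the list, where the `neo4j` package is not installed and every query goes over HTTP.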

lvca marked this pull request as ready for review April 10, 2026 01:48

lvca commented Apr 10, 2026

I've also added an automatic fallback from Bolt to HTTP (for when the ArcadeDB server is started without the Bolt plugin).

AKC777 commented Apr 16, 2026

Couldn't do it?

siillee left a comment

Hey @lvca, thanks for opening this PR and your contribution!

I tried running this locally, and I am stuck with the following error:
RuntimeError: ArcadeDB HTTP error (500): {"error":"Internal error","detail":"Database 'cognee' is not available","exception":"com.arcadedb.exception.DatabaseOperationException"}

Is this something on my side? I used the docker image that is in the README and tried to run the example, just to see if it works, but after a couple of small fixes I ran into this error and cannot seem to get past it.

Thanks in advance for your reply!

ArcadeDB does not create databases on connect, so the very first HTTP
or Bolt call against a fresh server failed with a 500 "Database
'<name>' is not available". The adapter now POSTs a server-level
`create database` command lazily (once per session) and treats
"already exists" responses as success, removing a setup step from
the example in the README.

lvca commented Apr 26, 2026

Hi @siillee — thanks for the careful test and sorry for the friction. The error you hit (Database 'cognee' is not available) is the right one to flag: ArcadeDB does not auto-create databases on connect, and the README/example didn't tell you to create it manually. That's a setup papercut every first-time user would hit, so I've fixed it in the adapter rather than in the docs.

Just pushed 4ebcc6b — the adapter now lazily POSTs create database <name> to /api/v1/server on the first HTTP/Bolt call (once per session, idempotent: "already exists" responses are swallowed). Smoke-tested against a clean arcadedata/arcadedb:latest container:

  • 1st call: {"result":"ok"} HTTP 200 → flag set, no retry
  • 2nd call: Database 'cognee' already exists HTTP 400 → flag set, swallowed
  • Subsequent SQL/Cypher: works against the freshly created DB
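
The three outcomes listed above amount to a small decision on the response to the create call, plus a once-per-session flag. The sketch below shows that logic in isolation, assuming the HTTP call is supplied as a callable returning `(status, body)`. The names `database_ready_after_create` and `DatabaseGuard` are hypothetical; the error text follows the `ArcadeDB create database failed (...)` message quoted elsewhere in this thread.

```python
from typing import Callable, Tuple


def database_ready_after_create(status: int, body: str) -> bool:
    """Treat 200 and a 400 'already exists' reply as success; raise otherwise."""
    if status == 200:
        return True
    if status == 400 and "already exists" in body:
        return True
    raise RuntimeError(f"ArcadeDB create database failed ({status}): {body}")


class DatabaseGuard:
    """Hypothetical helper that issues the create call at most once per session."""

    def __init__(self, create: Callable[[], Tuple[int, str]]):
        self._create = create
        self._ready = False

    def ensure(self) -> None:
        if self._ready:
            return  # flag already set, no further server call
        status, body = self._create()
        self._ready = database_ready_after_create(status, body)


if __name__ == "__main__":
    guard = DatabaseGuard(lambda: (400, "Database 'cognee' already exists"))
    guard.ensure()
    guard.ensure()  # second call is a no-op
    print("database ready")
```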

Could you re-run the example on top of the new commit? With a stock Docker container and the README's config, it should now go end-to-end without any manual curl step. Happy to push more changes if you hit anything else.

@AKC777 — fix is in, give it another spin and let us know.

siillee commented Apr 27, 2026

Hey @lvca, thanks for the contribution!

I tried to run the example again, and now I get the following error, which gives almost no information about what could be wrong other than the 401 code:
RuntimeError: ArcadeDB create database failed (401): {"error":"","detail":""}
Again, I'm not sure whether it is something on my end. I am sure I used the correct username and password. The logs show that I created the database and, on subsequent calls, that the database exists, but it still fails.

Calling POST /api/v1/server requires server-level (root) credentials.
Users connecting with a non-root account to a pre-created database
were therefore hitting "ArcadeDB create database failed (401)" even
though the database already existed and they could query it fine.

The adapter now first issues GET /api/v1/exists/<db>, which any
authenticated database user can call. The server-level create
endpoint is only invoked when the database is genuinely missing,
and a 401/403 there now produces an actionable error message.

lvca commented Apr 27, 2026

Hi @siillee — sorry, that one's on me. The issue is that POST /api/v1/server is a server-level endpoint and requires root (server-admin) credentials. If your user has database-level access (enough to query the cognee database, which is why you saw it created and visible) but is not a server admin, ArcadeDB returns 401 with an empty body for that POST. My v1 fix called the create endpoint unconditionally, so it tripped on your setup.

Just pushed 2b0b9f7. New behaviour:

  1. First, GET /api/v1/exists/<db> — any authenticated database user can call this.
  2. If the result is true, the adapter is done — no server-level call is made.
  3. Only if the database is missing does it fall through to POST /api/v1/server create database <db>.
  4. If that fall-through then returns 401/403, you now get an actionable error: "... does not exist and the supplied credentials lack the server-level privileges required to create it. Either create the database manually or supply root credentials."

Smoke-tested all four paths against arcadedata/arcadedb:latest:

  • exists=true → skip create entirely ✓
  • exists=false → create succeeds with root ✓
  • wrong password on /api/v1/exists → clear auth error ✓
  • create fails with 401/403 → actionable error message ✓
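
The four paths above can be expressed as one function that checks existence first and only then attempts creation. This is a sketch under assumptions, not the code in 2b0b9f7: the two HTTP calls are passed in as callables so the logic can be read and tested without a server, `ensure_database` and its return values are hypothetical, and because the quoted error message begins with an elided `...`, the `Database '<name>'` prefix used here is a placeholder.

```python
from typing import Callable, Tuple


def ensure_database(
    name: str,
    get_exists: Callable[[str], Tuple[int, bool]],
    post_create: Callable[[str], Tuple[int, str]],
) -> str:
    """Return 'exists' or 'created'; raise with an actionable message otherwise."""
    # Step 1: existence check, callable by any authenticated database user.
    status, exists = get_exists(name)
    if status in (401, 403):
        raise PermissionError(f"Authentication failed while checking database '{name}'")
    # Step 2: short-circuit, so no server-level call is made.
    if status == 200 and exists:
        return "exists"
    # Step 3: the database is missing, so fall through to the server-level create.
    status, body = post_create(name)
    if status == 200:
        return "created"
    # Step 4: explain missing privileges instead of surfacing a bare 401.
    if status in (401, 403):
        raise PermissionError(
            f"Database '{name}' does not exist and the supplied credentials lack "
            "the server-level privileges required to create it. Either create "
            "the database manually or supply root credentials."
        )
    raise RuntimeError(f"ArcadeDB create database failed ({status}): {body}")


if __name__ == "__main__":
    def never_called(name: str) -> Tuple[int, str]:
        raise AssertionError("create must not be called when the database exists")

    print(ensure_database("cognee", lambda n: (200, True), never_called))
```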

Could you re-run the example one more time? In your case the database already exists, so step 2 should short-circuit and you'll never hit the server endpoint.

AKC777 commented May 7, 2026

Hi @lvca 👋

Quick heads-up: I've been running your adapter in production since April and it's been solid. However, after Cognee upgraded to 1.0+, some things broke (new get_neighborhood() abstract method, vectors not being populated by index_data_points, ArcadeDB 26.4 changing Cypher type casing to PascalCase).

I've fixed these issues and opened a new PR that builds on top of your work: #104

Key changes:
- index_data_points() now delegates to create_data_points(); without this, vectors were never stored on nodes, making vector search silently return empty
- get_neighborhood() implementation for Cognee 1.0+ GraphDBInterface
- Auto-detection of Cypher type casing (vertex vs Vertex) across ArcadeDB versions
- Retry on HTTP 503 ConcurrentModificationException (ArcadeDB's optimistic concurrency)

Thank you for building ArcadeDB and for this PR. The idea of a single engine doing both graph + vector is exactly what we needed: it eliminated a whole separate vector DB from our stack. 🙏
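
The retry on HTTP 503 mentioned in the list above could look roughly like the sketch below. It is not taken from #104, whose code is not shown in this thread: the function name `with_retry`, the retry count, and the exponential backoff delays are all assumptions, and the request is modelled as a callable returning `(status, body)`.

```python
import time
from typing import Callable, Tuple


def with_retry(
    call: Callable[[], Tuple[int, str]],
    retries: int = 3,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Repeat a request while it returns 503, backing off between attempts."""
    for attempt in range(retries + 1):
        status, body = call()
        # Return on any non-503 status, or when the retry budget is spent.
        if status != 503 or attempt == retries:
            return status, body
        sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")  # the loop always returns


if __name__ == "__main__":
    replies = iter([(503, "ConcurrentModificationException"), (200, "ok")])
    print(with_retry(lambda: next(replies), sleep=lambda s: None))
```

Retrying is a reasonable fit for optimistic concurrency because the conflicting transaction has usually finished by the time the request is repeated.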

lvca commented May 7, 2026

@AKC777 Great! Let me know if you need my help ;-)
