feat: ArcadeDB hybrid adapter (graph + vector)#94
…pter ArcadeDB natively supports both graph traversal and vector search (HNSW indexes with vectorNeighbors()). This moves the adapter from packages/graph/ to packages/hybrid/ and implements VectorDBInterface alongside GraphDBInterface. Graph operations continue to use Neo4j Bolt protocol with OpenCypher. Vector operations use ArcadeDB's HTTP API for SQL-based vector indexing and KNN search via the vectorNeighbors() function.
Major changes based on live testing against ArcadeDB 26.x:
- Replaced neo4j Bolt driver with HTTP API for all operations
(ArcadeDB does not expose Bolt protocol, only HTTP on port 2480)
- Graph operations now use Cypher via HTTP with parameterized queries
- Fixed vector property type: ARRAY_OF_FLOATS (not ARRAY OF FLOAT)
- Fixed index creation: LSM_VECTOR METADATA { dimensions: N,
similarity: 'COSINE' } (not HNSW N)
- Fixed vector search: expand(vectorNeighbors('Type[prop]', [vec], k))
which returns 'distance' field (lower = closer)
- Removed neo4j dependency, only aiohttp needed
- Replace timestamp() with Python-side millisecond timestamps ($now param)
since ArcadeDB Cypher doesn't support the timestamp() function
- Fix serialize_properties to handle datetime, None, and list types
- Fix get_node/get_nodes/extract_nodes/retrieve to handle flat vertex
responses (ArcadeDB HTTP Cypher returns vertices as flat dicts, not
wrapped in {"node": {...}} like Bolt does)
- Fix get_edges/get_connections to use aliased scalar returns instead
of full vertex returns
Tested with cognee 0.5.8 DataPoint model including all base fields
(created_at, updated_at, metadata, belongs_to_set, etc.)
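As a rough illustration of the vector search shape described in this commit, the query text could be assembled along these lines. This is a minimal sketch, not the adapter's actual code: the helper names `build_knn_query` and `closest_first` are hypothetical, and restricting identifiers to plain names is a choice made here. Only the `expand(vectorNeighbors('Type[prop]', [vec], k))` form and the `distance` field come from the notes above.

```python
from typing import Sequence


def build_knn_query(type_name: str, prop: str, vector: Sequence[float], k: int) -> str:
    """Build the SQL text for a k-nearest-neighbour lookup (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # The type and property names are interpolated into the SQL string, so this
    # sketch only accepts plain identifiers (an assumption, not adapter behaviour).
    for name in (type_name, prop):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    # Coerce every component to float so ints and numpy scalars render uniformly.
    vec = ", ".join(repr(float(x)) for x in vector)
    return f"SELECT expand(vectorNeighbors('{type_name}[{prop}]', [{vec}], {k}))"


def closest_first(rows: list) -> list:
    """Order result rows by their 'distance' field, where lower means closer."""
    return sorted(rows, key=lambda row: row["distance"])
```

The resulting string would be sent through the HTTP SQL endpoint; how the adapter converts distance into a score is not shown in the source and is left out here.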
Graph operations now use Neo4j Bolt protocol when available (requires BoltProtocolPlugin on port 7687 and neo4j Python package), falling back to HTTP API transparently if Bolt is not reachable.
- Bolt verification happens on first query; if it fails, all subsequent Cypher queries use HTTP for the session lifetime
- neo4j is an optional dependency: pip install ...[bolt]
- Vector operations always use HTTP API (SQL)
- Bolt Node objects are normalized to flat dicts for consistent response format across both transports
Tested all 4 scenarios against ArcadeDB 26.4.x:
1. Bolt transport (graph + vector ops)
2. HTTP transport (graph + vector ops)
3. Bolt failure with automatic HTTP fallback
4. Without neo4j package installed (HTTP only)
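The fallback behaviour described in this commit might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `FallbackCypherClient` and the three injected callables are hypothetical stand-ins for the real Bolt and HTTP code paths, and catching every exception during verification is a simplification.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical signatures: a runner executes a Cypher query with parameters,
# a verifier raises if the Bolt endpoint cannot be reached.
Runner = Callable[[str, dict], Awaitable[list]]
Verifier = Callable[[], Awaitable[None]]


class FallbackCypherClient:
    """Prefer Bolt for Cypher, falling back to HTTP for the session lifetime."""

    def __init__(
        self,
        http_run: Runner,
        bolt_run: Optional[Runner] = None,
        bolt_verify: Optional[Verifier] = None,
    ) -> None:
        self._http_run = http_run
        self._bolt_run = bolt_run        # None when the neo4j package is absent
        self._bolt_verify = bolt_verify
        self._checked = False            # has Bolt been verified yet?
        self._use_bolt = False

    async def run(self, query: str, params: Optional[dict] = None) -> list:
        params = params or {}
        if not self._checked:
            # Verification happens once, on the first query.
            self._checked = True
            if self._bolt_run is not None and self._bolt_verify is not None:
                try:
                    await self._bolt_verify()
                    self._use_bolt = True
                except Exception:
                    # Any failure here pins the session to HTTP.
                    self._use_bolt = False
        if self._use_bolt:
            return await self._bolt_run(query, params)
        return await self._http_run(query, params)
```

Injecting the transports as callables keeps the decision logic testable without a running server; the real adapter may structure this differently.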
I've also added an automatic switch from Bolt to HTTP (if the ArcadeDB server was started without the Bolt plugin).
Couldn't do it?
siillee
left a comment
Hey @lvca, thanks for opening this PR and your contribution!
I tried running this locally, and I am stuck with the following error:
RuntimeError: ArcadeDB HTTP error (500): {"error":"Internal error","detail":"Database 'cognee' is not available","exception":"com.arcadedb.exception.DatabaseOperationException"}
Is this something on my side? I used the docker image that is in the README and tried to run the example, just to see if it works, but after a couple of small fixes I ran into this error and cannot seem to get past it.
Thanks in advance for your reply!
ArcadeDB does not create databases on connect, so the very first HTTP or Bolt call against a fresh server failed with a 500 "Database '<name>' is not available". The adapter now POSTs a server-level `create database` command lazily (once per session) and treats "already exists" responses as success, removing a setup step from the example in the README.
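The "lazily, once per session" behaviour described here could be sketched as follows. This is a hedged illustration rather than the adapter's code: `DatabaseEnsurer` and the injected `create` callable are hypothetical, and the success check (HTTP 200, or a body containing "already exists") is an assumption about how such responses might be recognised.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical callable: sends the server-level create command for a database
# and returns (http_status, response_body).
CreateFn = Callable[[str], Awaitable[Tuple[int, str]]]


class DatabaseEnsurer:
    """Create the database at most once per session, on first use."""

    def __init__(self, database: str, create: CreateFn) -> None:
        self._database = database
        self._create = create
        self._ready = False
        self._lock: Optional[asyncio.Lock] = None  # built lazily inside the loop

    async def ensure(self) -> None:
        if self._ready:
            return  # fast path for every call after the first success
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._ready:
                return  # another task finished the setup while we waited
            status, body = await self._create(self._database)
            # Assumed success criteria: created now, or it was already there.
            if status == 200 or "already exists" in body.lower():
                self._ready = True
                return
            raise RuntimeError(
                f"ArcadeDB create database failed ({status}): {body}"
            )
```

Each query method would await `ensure()` before doing any work, so concurrent first calls still trigger only a single create request.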
Hi @siillee — thanks for the careful test and sorry for the friction. The error you hit ( Just pushed 4ebcc6b — the adapter now lazily POSTs
Could you re-run the example on top of the new commit? With a stock Docker container and the README's config, it should now go end-to-end without any manual
@AKC777: the fix is in, give it another spin and let us know.
Hey @lvca, thanks for the contribution! I tried to run the example again, and now I get the following error, which gives almost no information about what could be wrong other than the 401 code:
Calling POST /api/v1/server requires server-level (root) credentials. Users connecting with a non-root account to a pre-created database were therefore hitting "ArcadeDB create database failed (401)" even though the database already existed and they could query it fine. The adapter now first issues GET /api/v1/exists/<db>, which any authenticated database user can call. The server-level create endpoint is only invoked when the database is genuinely missing, and a 401/403 there now produces an actionable error message.
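The check-then-create order described above can be sketched like this. It is a sketch under stated assumptions: `ensure_database`, `DatabaseSetupError`, and the two injected callables are hypothetical names standing in for the GET /api/v1/exists/<db> and POST /api/v1/server requests, and the error wording is illustrative only.

```python
import asyncio
from typing import Awaitable, Callable


class DatabaseSetupError(RuntimeError):
    """Raised when the database is missing and cannot be created."""


async def ensure_database(
    database: str,
    exists: Callable[[str], Awaitable[bool]],  # stands in for the exists check
    create: Callable[[str], Awaitable[int]],   # stands in for the server create call
) -> str:
    """Return 'exists' or 'created'; raise with guidance on permission errors."""
    # Any authenticated database user may run the existence check, so
    # non-root users with a pre-created database stop here.
    if await exists(database):
        return "exists"
    # Only a genuinely missing database reaches the root-only endpoint.
    status = await create(database)
    if status in (401, 403):
        raise DatabaseSetupError(
            f"Database '{database}' does not exist and the configured user "
            f"is not allowed to create it (HTTP {status}). Create it with "
            f"root credentials, or point the adapter at an existing database."
        )
    if status >= 400:
        raise DatabaseSetupError(
            f"ArcadeDB create database failed ({status})"
        )
    return "created"
```

With a pre-created database the `create` callable is never invoked, which matches the short-circuit described in the commit message.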
Hi @siillee — sorry, that one's on me. The issue is that Just pushed 2b0b9f7. New behaviour:
Smoke-tested all four paths against
Could you re-run the example one more time? In your case the database already exists, so step 2 should short-circuit and you'll never hit the server endpoint.
Hi @lvca 👋
@AKC777 Great! Let me know if you need my help ;-)
Summary
Upgrades the ArcadeDB adapter from graph-only to a hybrid adapter supporting both graph and vector operations in a single database:
- packages/graph/arcadedb/ to packages/hybrid/arcadedb/ (same pattern as FalkorDB)
- VectorDBInterface and GraphDBInterface in a single ArcadeDBAdapter class
- neo4j is an optional dependency (pip install ...[bolt])
Why?
ArcadeDB natively supports vector search with HNSW indexes. Rather than requiring a separate vector database (Qdrant, Milvus, etc.), Cognee users can now use ArcadeDB as a unified graph+vector store, simplifying infrastructure and keeping embeddings co-located with the knowledge graph.
Implementation Details
- neo4j async driver, auto-verified on first query
- ARRAY_OF_FLOATS properties on vertices
- CREATE INDEX ... LSM_VECTOR METADATA { dimensions: N, similarity: 'COSINE' }
- SELECT expand(vectorNeighbors('Type[prop]', [vec], k)) returns results sorted by distance
ArcadeDB Setup
Test plan
All tests run against ArcadeDB 26.4.x in Docker: