Problem Statement
Currently, Semantica does not support SQLite as a vector store backend. This limits its usability in lightweight, embedded, or edge scenarios where running an external vector database (e.g., PostgreSQL + pgvector or Milvus) is overkill or impractical.
Many developers want a zero-dependency, file-based vector store for local development, prototyping, offline usage, CI environments, and small-scale deployments. SQLite is already widely used in these contexts, and the sqlite-vec extension enables efficient vector similarity search directly within SQLite.
Proposed Solution
Add first-class support for SQLite as a vector store, backed by the sqlite-vec extension.
This would allow users to:
- Store embeddings in a local SQLite database
- Perform similarity search using sqlite-vec
- Use Semantica without requiring a separate vector database service
- Ship an immutable, auditable database artifact
- Run an embedded, disk-based vector store
This should be implemented as an additional vector store backend that fits into Semantica’s existing abstraction for storage providers.
Alternatives Considered
- Using a full external vector database (e.g., pgvector, Milvus): Too heavyweight for local, embedded, offline, or auditable deployments. Requires operational overhead and network access.
- Remote SaaS vector stores: Introduces network latency, availability concerns, and makes reproducibility/auditing more difficult (not to mention cost).
- In-memory vector stores: Not persistent and unsuitable for many real-world or repeatable workloads.
- Rolling custom SQLite solutions per project: Leads to duplicated effort, inconsistent APIs, and fragmented implementations across users.
Use Cases
- Local development & prototyping: Developers can experiment with Semantica using a single SQLite file without additional infrastructure.
- Embedded / edge deployments: Applications running on edge devices or constrained environments can perform semantic search locally with no network dependency.
- Auditable & immutable deployments: A compiled SQLite database can be treated as an immutable artifact for compliance, auditing, or reproducible builds.
- Low-latency semantic search: By embedding the database directly with the application, vector search avoids network hops entirely, resulting in significantly lower latency.
- Small-to-medium scale production workloads: Many real-world datasets comfortably fit on disk or in memory once processed, making SQLite a practical and fast option.
- Cost reduction: For many users, big data is not that big once it has been processed, so most databases can run on relatively modest hardware. Dropping the need for yet another cloud service also removes another bill.
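To illustrate the auditable and immutable use case, here is a minimal sketch that uses only Python's standard library. It builds a small SQLite file, records a SHA-256 checksum, and reopens the file read-only through a SQLite URI filename (`mode=ro`). The table name `docs` and the helper names `file_sha256` and `open_read_only` are hypothetical and not part of Semantica.

```python
import hashlib
import os
import sqlite3
import tempfile
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, e.g. for an audit record."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_read_only(path):
    """Open an existing SQLite file read-only via a URI filename."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Build a small artifact (hypothetical schema), then close the writer.
path = os.path.join(tempfile.mkdtemp(), "store.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")
writer.execute("INSERT INTO docs VALUES ('d1', 'hello')")
writer.commit()
writer.close()

# Record the checksum, then reopen the artifact read-only.
checksum = file_sha256(path)
reader = open_read_only(path)
rows = reader.execute("SELECT id, body FROM docs").fetchall()

# Writes through the read-only connection are rejected by SQLite.
try:
    reader.execute("INSERT INTO docs VALUES ('d2', 'nope')")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
reader.close()
```

SQLite also documents an `immutable=1` URI parameter for files that are guaranteed not to change while open; whether a Semantica backend should expose it is an open design question.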
Impact Assessment
- Who would benefit? Developers building local-first, embedded, edge, or audit-sensitive applications; users who want minimal setup and maximum portability.
- Priority: Medium–High
- Breaking Changes: No (does not introduce regressions in the existing code base)
- Dependencies:
  - sqlite-vec (optional dependency? enabled only when using the SQLite backend)
  - SQLite (already bundled with many Python runtimes)
Implementation Ideas
A possible approach is to add a new vector store implementation (e.g. SQLiteVectorStore) that conforms to Semantica’s existing vector store interface.
An alternative approach would be to enable third-party plugins for VectorStores (and consequently GraphStores). At the moment they are hardcoded into the library.
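As a rough illustration of the plugin alternative, the sketch below shows one way a name-based registry could let third-party packages contribute vector stores. Everything here (`register_vector_store`, `create_vector_store`, `DemoSQLiteStore`) is hypothetical; it does not describe Semantica's current code or any planned API.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a backend name to a factory.
_VECTOR_STORES: Dict[str, Callable[..., object]] = {}


def register_vector_store(name):
    """Decorator that registers a vector store factory under a name."""
    def decorator(factory):
        _VECTOR_STORES[name] = factory
        return factory
    return decorator


def create_vector_store(name, **options):
    """Instantiate a registered backend, or fail with a clear error."""
    try:
        factory = _VECTOR_STORES[name]
    except KeyError:
        known = ", ".join(sorted(_VECTOR_STORES)) or "none"
        raise ValueError(
            f"Unknown vector store {name!r}; registered: {known}"
        ) from None
    return factory(**options)


# A third-party package could register its backend like this.
@register_vector_store("sqlite")
class DemoSQLiteStore:
    def __init__(self, path=":memory:"):
        self.path = path
```

With such a registry, `create_vector_store("sqlite", path="store.db")` would return a `DemoSQLiteStore` without the core library knowing about it in advance.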
High-level ideas:
- Use sqlite-vec for vector indexing and similarity search
- Lazy-load or optionally install sqlite-vec
- Keep schema simple and aligned with Semantica’s existing abstractions
- Support read-only / immutable database modes
- Provide clear errors or fallbacks if sqlite-vec is not available
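The ideas above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not a proposed final design: the class name follows the `SQLiteVectorStore` suggestion, but its methods (`add`, `search`), the `embeddings` table, and the `require_extension` flag are hypothetical and are not Semantica's actual interface. The sketch imports the `sqlite_vec` Python package lazily and, when it loads, relies on its `vec_distance_L2` scalar function over float32 BLOBs; otherwise it falls back to a brute-force scan in Python. A real backend would more likely use a sqlite-vec virtual table for indexing.

```python
import sqlite3
import struct


def pack_f32(vector):
    """Encode a sequence of floats as a compact float32 BLOB."""
    return struct.pack(f"{len(vector)}f", *vector)


def unpack_f32(blob):
    """Decode a float32 BLOB back into a tuple of floats."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def try_load_sqlite_vec(conn):
    """Load sqlite-vec if it is installed; return True on success."""
    try:
        import sqlite_vec  # optional dependency, imported lazily
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        return True
    except (ImportError, AttributeError, sqlite3.Error):
        return False


class SQLiteVectorStore:
    """Hypothetical backend sketch; not Semantica's actual interface."""

    def __init__(self, path=":memory:", require_extension=False):
        self.conn = sqlite3.connect(path)
        self.has_vec = try_load_sqlite_vec(self.conn)
        if require_extension and not self.has_vec:
            raise RuntimeError(
                "sqlite-vec could not be loaded; install the sqlite-vec "
                "package or pass require_extension=False to use the fallback"
            )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS embeddings ("
            "id TEXT PRIMARY KEY, vector BLOB NOT NULL)"
        )

    def add(self, item_id, vector):
        """Insert or overwrite one embedding."""
        self.conn.execute(
            "INSERT OR REPLACE INTO embeddings (id, vector) VALUES (?, ?)",
            (item_id, pack_f32(vector)),
        )
        self.conn.commit()

    def search(self, query, k=3):
        """Return the ids of the k nearest vectors by L2 distance."""
        if self.has_vec:
            rows = self.conn.execute(
                "SELECT id FROM embeddings "
                "ORDER BY vec_distance_L2(vector, ?) LIMIT ?",
                (pack_f32(query), k),
            ).fetchall()
            return [row[0] for row in rows]
        # Fallback: brute-force scan; squared L2 gives the same ordering.
        scored = []
        for item_id, blob in self.conn.execute(
            "SELECT id, vector FROM embeddings"
        ):
            dist = sum((a - b) ** 2 for a, b in zip(unpack_f32(blob), query))
            scored.append((dist, item_id))
        scored.sort()
        return [item_id for _, item_id in scored[:k]]
```

Passing `require_extension=True` turns a missing extension into an immediate, descriptive error, while the default keeps the store usable (if slower) without sqlite-vec.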
Additional Context
- sqlite-vec: https://github.com/asg017/sqlite-vec
- SQLite’s single-file format enables easy embedding, distribution, and auditing
- Similar SQLite-backed vector approaches exist in other libraries for local-first workflows
Contribution