[FEATURE] Add support for SQLite with sqlite-vec as a vector store #240

Description

@jdrew1303

Problem Statement

Semantica does not currently support SQLite as a vector store backend. This limits its usability in lightweight, embedded, or edge scenarios where running an external vector database (e.g., PostgreSQL with pgvector, or Milvus) is overkill or impractical.

Many developers want a zero-dependency, file-based vector store for local development, prototyping, offline usage, CI environments, and small-scale deployments. SQLite is already widely used in these contexts, and the sqlite-vec extension enables efficient vector similarity search directly within SQLite.

Proposed Solution

Add first-class support for SQLite as a vector store, backed by the sqlite-vec extension.

This would allow users to:

  • Store embeddings in a local SQLite database
  • Perform similarity search using sqlite-vec
  • Use Semantica without requiring a separate vector database service
  • Ship an immutable, auditable database artifact
  • Run an embedded, disk-based vector store

This should be implemented as an additional vector store backend that fits into Semantica’s existing abstraction for storage providers.
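
As a rough illustration of the storage and search flow described above, here is a minimal sketch that uses only Python's standard library. The table layout, the function names, and the `l2_distance` SQL function are all hypothetical; `l2_distance` is a brute-force stand-in for the distance functions and indexing that sqlite-vec would provide, and none of this reflects Semantica's actual interface.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): float32 bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def l2_distance(a, b):
    """Euclidean distance between two packed vectors (hypothetical helper)."""
    return math.dist(unpack(a), unpack(b))


def open_store(path=":memory:"):
    """Open (or create) a single-file store with one embeddings table."""
    db = sqlite3.connect(path)
    # Stand-in for sqlite-vec: expose the distance function to SQL.
    db.create_function("l2_distance", 2, l2_distance)
    db.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        " id INTEGER PRIMARY KEY, text TEXT NOT NULL, vector BLOB NOT NULL)"
    )
    return db


def add(db, text, vec):
    """Insert one text/embedding pair inside a transaction."""
    with db:
        db.execute(
            "INSERT INTO embeddings (text, vector) VALUES (?, ?)",
            (text, pack(vec)),
        )


def search(db, query, k=3):
    """Return the k nearest rows as (text, distance), closest first."""
    return db.execute(
        "SELECT text, l2_distance(vector, ?) AS distance"
        " FROM embeddings ORDER BY distance LIMIT ?",
        (pack(query), k),
    ).fetchall()
```

With rows for "cat" at [1.0, 0.0], "dog" at [0.9, 0.1], and "car" at [0.0, 1.0], `search(db, [1.0, 0.0], k=2)` returns "cat" first and "dog" second. A real backend would replace the full-table scan with sqlite-vec queries.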

Alternatives Considered

  • Using a full external vector database (e.g., pgvector, Milvus):
    Too heavyweight for local, embedded, offline, or auditable deployments. Requires operational overhead and network access.

  • Remote SaaS vector stores:
    Introduces network latency, availability concerns, and makes reproducibility/auditing more difficult (not to mention cost).

  • In-memory vector stores:
    Not persistent and unsuitable for many real-world or repeatable workloads.

  • Rolling custom SQLite solutions per project:
    Leads to duplicated effort, inconsistent APIs, and fragmented implementations across users.

Use Cases

  1. Local development & prototyping
    Developers can experiment with Semantica using a single SQLite file without additional infrastructure.

  2. Embedded / edge deployments
    Applications running on edge devices or constrained environments can perform semantic search locally with no network dependency.

  3. Auditable & immutable deployments
    A compiled SQLite database can be treated as an immutable artifact for compliance, auditing, or reproducible builds.

  4. Low-latency semantic search
    By embedding the database directly with the application, vector search avoids network hops entirely, resulting in significantly lower latency.

  5. Small-to-medium scale production workloads
    Many real-world datasets comfortably fit on disk or in memory once processed, making SQLite a practical and fast option.

  6. Cost reduction
    For many users, their big data isn't as big once it's been processed, so most databases can run on relatively modest hardware. Dropping yet another cloud service also removes another bill.
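
For the auditable and immutable use case (3), one possible shape is to build the database once, record a checksum, and open it read-only afterwards. The sketch below uses only the standard library; the function names and the `docs` table are hypothetical. It relies on SQLite's URI filename support (`mode=ro`), which Python exposes through `sqlite3.connect(..., uri=True)`.

```python
import hashlib
import sqlite3
from pathlib import Path


def build_artifact(path, rows):
    """Write the database once, then close it so the file is final."""
    db = sqlite3.connect(path)
    with db:
        db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT)")
        db.executemany("INSERT INTO docs (text) VALUES (?)", [(r,) for r in rows])
    db.close()


def sha256_of(path):
    """Checksum of the database file, suitable for an audit log."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def open_read_only(path):
    """Open the artifact so that any write attempt fails."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Any write through a connection from `open_read_only` raises `sqlite3.OperationalError`, and the checksum recorded at build time can be compared against the deployed file. SQLite also documents an `immutable=1` URI parameter for files that are guaranteed never to change; whether to use it would be a design decision for the backend.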

Impact Assessment

  • Who would benefit?
    Developers building local-first, embedded, edge, or audit-sensitive applications; users who want minimal setup and maximum portability.

  • Priority: Medium–High

  • Breaking Changes: No (the change is additive and should not affect existing backends)

  • Dependencies:

    • sqlite-vec (possibly as an optional dependency, enabled only when using the SQLite backend)
    • SQLite (already bundled with many Python runtimes)

Implementation Ideas

A possible approach is to add a new vector store implementation (e.g. SQLiteVectorStore) that conforms to Semantica’s existing vector store interface.

An alternative approach would be to enable third-party plugins for vector stores (and consequently graph stores). At the moment, the available backends are hardcoded into the library.
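
To make the plugin idea concrete, here is a minimal sketch of a name-based registry. Everything in it is hypothetical: the `VectorStore` protocol, `register_vector_store`, `create_vector_store`, and the toy `InMemoryStore` are illustrative names and are not taken from Semantica's code base.

```python
from typing import Callable, Dict, List, Protocol, Sequence, Tuple


class VectorStore(Protocol):
    """Hypothetical minimal interface; Semantica's real one may differ."""

    def add(self, key: str, vector: Sequence[float]) -> None: ...

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]: ...


_REGISTRY: Dict[str, Callable[..., VectorStore]] = {}


def register_vector_store(name: str):
    """Decorator that makes a backend available under `name`."""
    def decorator(factory):
        if name in _REGISTRY:
            raise ValueError(f"vector store {name!r} is already registered")
        _REGISTRY[name] = factory
        return factory
    return decorator


def create_vector_store(name: str, **options) -> VectorStore:
    """Instantiate a registered backend, with a clear error for unknown names."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY)) or "none"
        raise ValueError(f"unknown vector store {name!r}; registered: {known}") from None
    return factory(**options)


@register_vector_store("memory")
class InMemoryStore:
    """Toy backend used only to exercise the registry."""

    def __init__(self):
        self._items = {}

    def add(self, key, vector):
        self._items[key] = list(vector)

    def search(self, query, k):
        # Brute-force Euclidean distance over every stored vector.
        scored = [
            (key, sum((a - b) ** 2 for a, b in zip(vec, query)) ** 0.5)
            for key, vec in self._items.items()
        ]
        return sorted(scored, key=lambda item: item[1])[:k]
```

A SQLite backend could then register itself under a name such as "sqlite" from a separate package. Discovery through Python packaging entry points would be one way to avoid explicit imports, though that is a design choice for the maintainers.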

High-level ideas:

  • Use sqlite-vec for vector indexing and similarity search
  • Lazy-load or optionally install sqlite-vec
  • Keep schema simple and aligned with Semantica’s existing abstractions
  • Support read-only / immutable database modes
  • Provide clear errors or fallbacks if sqlite-vec is not available
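
The lazy-loading and clear-error ideas above could look roughly like the following. This is a sketch, not a proposed final API: `SqliteVecUnavailable` and `connect_with_sqlite_vec` are hypothetical names. It assumes the `sqlite_vec` Python package, whose `load()` helper loads the extension into a `sqlite3` connection.

```python
import sqlite3


class SqliteVecUnavailable(RuntimeError):
    """Raised when the sqlite-vec extension cannot be loaded."""


def connect_with_sqlite_vec(path=":memory:"):
    """Open a connection with sqlite-vec loaded, or fail with a clear message."""
    try:
        import sqlite_vec  # imported lazily so the dependency stays optional
    except ImportError as exc:
        raise SqliteVecUnavailable(
            "The SQLite vector store needs the optional 'sqlite-vec' package "
            "(pip install sqlite-vec)."
        ) from exc

    db = sqlite3.connect(path)
    try:
        # Some Python builds ship sqlite3 without extension loading, in which
        # case enable_load_extension is missing and raises AttributeError.
        db.enable_load_extension(True)
        sqlite_vec.load(db)
        db.enable_load_extension(False)
    except (AttributeError, sqlite3.Error) as exc:
        db.close()
        raise SqliteVecUnavailable(
            "This Python build's sqlite3 module could not load the sqlite-vec extension."
        ) from exc
    return db
```

Callers that want a fallback rather than an error can catch `SqliteVecUnavailable` and switch to another backend.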

Additional Context

  • sqlite-vec: https://github.com/asg017/sqlite-vec
  • SQLite’s single-file format enables easy embedding, distribution, and auditing
  • Similar SQLite-backed vector approaches exist in other libraries for local-first workflows

Contribution

  • I'm willing to help implement this feature
  • I can help with documentation
  • I can help with testing

Metadata

Labels

  • enhancement: New feature or request
  • integration: Integrating an external database, framework, or SDK
  • medium-scope: Requires moderate effort (1–3 days)

Projects

Status
In progress
