Add Supabase integration #2862

@kacperlukawski

Supabase Integration

Summary and motivation

Supabase is an open-source backend platform built on top of PostgreSQL that offers a rich ecosystem of storage and search capabilities beyond plain pgvector. A comprehensive Haystack integration would unlock several complementary use cases under a single, unified provider:

  • Vector search via pgvector – the most straightforward path, giving users a managed Postgres + pgvector experience with the operational simplicity of Supabase.
  • Cost-efficient vector storage via Vector Buckets – Supabase's S3-backed vector store (currently in alpha) is aimed at scenarios where query latency is less critical than cost, similar to alternatives like turbopuffer or TopK.
  • Full-text search via PGroonga – Supabase exposes PGroonga, a PostgreSQL extension for fast, multilingual full-text search, which could complement or replace BM25-based retrievers in certain pipelines.
  • File/object storage via Supabase Buckets – analogous to S3, Supabase Buckets can serve as a document source, similar to the existing S3Downloader.

Detailed design

The integration would live in integrations/supabase inside haystack-core-integrations and be published as supabase-haystack on PyPI. It would be structured around four sub-features:


1. SupabasePgVectorDocumentStore & SupabasePgvectorEmbeddingRetriever

A thin wrapper around the existing pgvector integration (pgvector-haystack), pre-configured to work with Supabase's connection strings and auth model (service role key / JWT). It reuses PgvectorDocumentStore under the hood and mainly handles Supabase-specific connection setup.

from haystack.utils import Secret
from haystack_integrations.document_stores.supabase import SupabasePgVectorDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorEmbeddingRetriever

document_store = SupabasePgVectorDocumentStore(
    supabase_url="https://<project>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_documents",
    embedding_dimension=1536,
)

retriever = SupabasePgvectorEmbeddingRetriever(document_store=document_store, top_k=5)
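
The proposal says the wrapper "mainly handles Supabase-specific connection setup". Since PgvectorDocumentStore connects through a Postgres connection string, one part of that setup is likely deriving such a string for a Supabase project. Below is a minimal sketch of that step, not the proposed implementation. The function name, its parameters, and the `db.<project-ref>.supabase.co` host layout (Supabase's direct, non-pooled connection format) are assumptions for illustration; the database password is a separate credential from the service key used in the example above.

```python
from urllib.parse import quote


def build_supabase_postgres_dsn(
    project_ref: str,
    db_password: str,
    *,
    user: str = "postgres",
    database: str = "postgres",
    port: int = 5432,
) -> str:
    """Build a Postgres connection string for a Supabase project.

    Hypothetical helper: the host layout below is an assumption about
    Supabase's direct connection format, not part of the proposal.
    """
    if not project_ref:
        raise ValueError("project_ref must not be empty")
    # Percent-encode credentials so characters such as '@' or '/' cannot
    # be misread as URL delimiters.
    encoded_user = quote(user, safe="")
    encoded_password = quote(db_password, safe="")
    host = f"db.{project_ref}.supabase.co"  # assumed direct-connection host
    return f"postgresql://{encoded_user}:{encoded_password}@{host}:{port}/{database}"


dsn = build_supabase_postgres_dsn("abcd1234", "p@ss/word")
print(dsn)
```

A wrapper could then hand the resulting string to the underlying store as a secret (for example via `Secret.from_token`), so that it is never written out when a pipeline is serialized.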

2. SupabaseVectorBucketDocumentStore & SupabaseVectorBucketEmbeddingRetriever

Backed by Supabase's S3-compatible Vector Buckets (currently alpha). Suitable for large-scale, latency-tolerant workloads where cost is the primary concern. The implementation would use the Supabase Storage API or an S3-compatible endpoint.

⚠️ This feature is in alpha upstream and may need to be gated behind an explicit opt-in or marked as experimental in Haystack.

from haystack.utils import Secret
from haystack_integrations.document_stores.supabase import SupabaseVectorBucketDocumentStore

document_store = SupabaseVectorBucketDocumentStore(
    supabase_url="https://<project>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    bucket_name="my-vector-bucket",
    embedding_dimension=1536,
)

3. SupabaseGroongaDocumentStore & SupabaseGroongaRetriever

Full-text search powered by PGroonga, which supports multilingual tokenization and fast index-based FTS on top of Postgres. This retriever would work without embeddings, complementing dense retrieval in hybrid search pipelines.

from haystack.utils import Secret
from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabaseGroongaRetriever

document_store = SupabaseGroongaDocumentStore(
    supabase_url="https://<project>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_fts_documents",
)

retriever = SupabaseGroongaRetriever(document_store=document_store, top_k=10)
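
To make the retrieval step more concrete, here is a rough sketch of the SQL such a retriever might issue. It relies on PGroonga's `&@~` query-syntax operator and the `pgroonga_score(tableoid, ctid)` ranking function; the helper name, the `id`, `content` and `meta` column names, and the psycopg-style named placeholders are assumptions, and the table is assumed to already have a PGroonga index on `content`.

```python
import re

# Conservative pattern for unquoted SQL identifiers; anything else is rejected
# because identifiers cannot be passed as bound query parameters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_pgroonga_search_sql(table_name: str, schema: str = "public") -> str:
    """Return a parameterized full-text query ranked by PGroonga's score.

    Hypothetical helper: column names are illustrative only.
    """
    for identifier in (schema, table_name):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe SQL identifier: {identifier!r}")
    return (
        "SELECT id, content, meta, pgroonga_score(tableoid, ctid) AS score\n"
        f'FROM "{schema}"."{table_name}"\n'
        "WHERE content &@~ %(query)s\n"  # query text is bound, never interpolated
        "ORDER BY score DESC\n"
        "LIMIT %(top_k)s"
    )


sql = build_pgroonga_search_sql("haystack_fts_documents")
params = {"query": "vector OR embedding", "top_k": 10}
print(sql)
```

Keeping the user query and `top_k` as bound parameters means only the validated schema and table names are ever formatted into the statement.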

4. SupabaseBucketDownloader

A component analogous to S3Downloader that fetches files from Supabase Storage buckets and returns them as ByteStream objects for further processing in indexing pipelines.

from haystack.utils import Secret
from haystack_integrations.components.fetchers.supabase import SupabaseBucketDownloader

downloader = SupabaseBucketDownloader(
    supabase_url="https://<project>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    bucket_name="my-documents",
)
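
The core of such a downloader could look roughly like the sketch below, which turns a list of object paths into byte payloads with a guessed MIME type and source metadata. To stay dependency-free, it uses a small stand-in dataclass instead of Haystack's ByteStream and takes the storage call as an injected function; `DownloadedFile`, `download_from_bucket` and the metadata keys are hypothetical names. In a real component the `fetch` callable would presumably wrap the supabase-py storage client.

```python
import mimetypes
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DownloadedFile:
    """Stand-in for Haystack's ByteStream, used only to keep this sketch self-contained."""

    data: bytes
    mime_type: Optional[str] = None
    meta: Dict[str, str] = field(default_factory=dict)


def download_from_bucket(
    fetch: Callable[[str, str], bytes],
    bucket_name: str,
    paths: List[str],
) -> List[DownloadedFile]:
    """Fetch every path from the bucket and attach MIME type and source metadata."""
    files: List[DownloadedFile] = []
    for path in paths:
        data = fetch(bucket_name, path)  # injected: (bucket, path) -> raw bytes
        mime_type, _ = mimetypes.guess_type(path)  # guessed from the file extension
        files.append(
            DownloadedFile(
                data=data,
                mime_type=mime_type,
                meta={"bucket": bucket_name, "file_path": path},
            )
        )
    return files


# In-memory fake standing in for Supabase Storage.
fake_storage = {("my-documents", "reports/q1.txt"): b"hello"}
streams = download_from_bucket(
    lambda bucket, path: fake_storage[(bucket, path)],
    "my-documents",
    ["reports/q1.txt"],
)
print(streams[0].mime_type, streams[0].meta)
```

Injecting the fetch function also makes the component straightforward to unit-test without network access.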

Authentication & Configuration

All components should support:

  • supabase_url and supabase_key (via Secret for secure handling)
  • Optional schema and table_name overrides
  • Connection pooling configuration where relevant

The primary dependency would be the supabase-py client library.
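
One way to keep these options consistent across the four components is a small shared configuration object. The sketch below is an illustration under stated assumptions, not a proposed API: `SupabaseConnectionConfig`, its field names, and the `pool_size` default are hypothetical, and the key is resolved from an environment variable at call time to mirror what `Secret.from_env_var` does in the examples above.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SupabaseConnectionConfig:
    """Hypothetical settings shared by all Supabase components."""

    supabase_url: str
    key_env_var: str = "SUPABASE_SERVICE_KEY"
    schema: str = "public"
    table_name: Optional[str] = None
    pool_size: int = 5  # assumed default, only relevant for Postgres-backed stores

    def __post_init__(self) -> None:
        if not self.supabase_url.startswith("https://"):
            raise ValueError("supabase_url must start with https://")
        if self.pool_size < 1:
            raise ValueError("pool_size must be at least 1")

    def resolve_key(self) -> str:
        """Read the key lazily so it is never stored on the object."""
        key = os.environ.get(self.key_env_var)
        if not key:
            raise RuntimeError(f"environment variable {self.key_env_var} is not set")
        return key

    def to_dict(self) -> Dict[str, Any]:
        """Serialize settings; only the env var name is included, never the key."""
        return {
            "supabase_url": self.supabase_url,
            "key_env_var": self.key_env_var,
            "schema": self.schema,
            "table_name": self.table_name,
            "pool_size": self.pool_size,
        }


os.environ["SUPABASE_SERVICE_KEY"] = "example-key"  # demo value only
config = SupabaseConnectionConfig(
    supabase_url="https://example-project.supabase.co",
    table_name="haystack_documents",
)
print(config.to_dict())
```

Storing the environment variable name rather than the key itself keeps serialized pipelines free of credentials.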


Future Considerations

Supabase Auth could be explored as an identity provider for RBAC in Haystack pipelines, allowing row-level security policies defined in Supabase to be honored at query time.

Checklist

If the request is accepted, ensure the following checklist is complete before closing this issue.

Tasks

  • The code is documented with docstrings and was merged in the main branch
  • Docs are published at https://docs.haystack.deepset.ai/
  • There is a GitHub workflow running the tests for the integration nightly and at every PR
  • A new label named like integration:supabase has been added to the list of labels for this repository
  • The labeler.yml file has been updated
  • The package has been released on PyPI
  • An integration tile with a usage example has been added to https://github.com/deepset-ai/haystack-integrations
  • The integration has been listed in the Inventory section of this repo README
  • The feature was announced through social media

Metadata

Labels: new integration (Discuss the creation of a new integration in Core)