Description
Supabase Integration
Summary and motivation
Supabase is an open-source backend platform built on top of PostgreSQL that offers a rich ecosystem of storage and search capabilities beyond plain pgvector. A comprehensive Haystack integration would unlock several complementary use cases under a single, unified provider:
- Vector search via pgvector – the most straightforward path, giving users a managed Postgres + pgvector experience with the operational simplicity of Supabase.
- Cost-efficient vector storage via Vector Buckets – Supabase's S3-backed vector store (currently in alpha) is aimed at scenarios where query latency is less critical than cost, similar to alternatives like turbopuffer or TopK.
- Full-text search via PGroonga – Supabase exposes PGroonga, a PostgreSQL extension for fast, multilingual full-text search, which could complement or replace BM25-based retrievers in certain pipelines.
- File/object storage via Supabase Buckets – analogous to S3, Supabase Buckets can serve as a document source, similar to the existing S3Downloader.
Detailed design
The integration would live in integrations/supabase inside haystack-core-integrations and be published as supabase-haystack on PyPI. It would be structured around four sub-features:
1. SupabasePgVectorDocumentStore & SupabasePgvectorEmbeddingRetriever
A thin wrapper around the existing pgvector integration (pgvector-haystack), pre-configured to work with Supabase's connection strings and auth model (service role key / JWT). It reuses PgvectorDocumentStore under the hood and mainly handles Supabase-specific connection setup.
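One way the Supabase-specific connection setup might look is sketched below. It assumes Supabase's direct-connection host pattern `db.<project-ref>.supabase.co` with the default `postgres` user and database, and it assumes a database password is supplied separately from the service role key. The helper name `build_supabase_conn_str` is hypothetical and not part of any existing package.

```python
from urllib.parse import quote, urlparse


def build_supabase_conn_str(supabase_url: str, db_password: str, port: int = 5432) -> str:
    """Derive a Postgres connection string from a Supabase project URL.

    Assumption: the project is reachable at ``db.<project-ref>.supabase.co``
    with the default ``postgres`` user and database.
    """
    host = urlparse(supabase_url).hostname
    if not host or not host.endswith(".supabase.co"):
        raise ValueError(f"Not a Supabase project URL: {supabase_url!r}")
    # The first label of the hostname is the project reference.
    project_ref = host.split(".")[0]
    # Percent-encode the password so characters like '@' or '/' stay valid in a URI.
    password = quote(db_password, safe="")
    return f"postgresql://postgres:{password}@db.{project_ref}.supabase.co:{port}/postgres"
```

The resulting string could then be handed to the existing store through its `connection_string` parameter, which takes a `Secret`; whether the wrapper should accept a password, a full connection string, or both is an open design question.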
from haystack.utils import Secret
from haystack_integrations.document_stores.supabase import SupabasePgVectorDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorEmbeddingRetriever

# Proposed API: the service role key is read from the environment via Secret
document_store = SupabasePgVectorDocumentStore(
    supabase_url="https://<project>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_documents",
    embedding_dimension=1536,
)
retriever = SupabasePgvectorEmbeddingRetriever(document_store=document_store, top_k=5)
2. SupabaseVectorBucketDocumentStore & SupabaseVectorBucketEmbeddingRetriever
Backed by Supabase's S3-compatible Vector Buckets (currently alpha). Suitable for large-scale, latency-tolerant workloads where cost is the primary concern. The implementation would use the Supabase Storage API or the S3-compatible endpoint.
⚠️ This feature is in alpha upstream and may need to be gated behind an explicit opt-in or marked as experimental in Haystack.
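A minimal sketch of what an explicit opt-in gate could look like, using only the standard library. The function name `require_opt_in`, the exception class, and the environment variable `HAYSTACK_SUPABASE_EXPERIMENTAL` are all hypothetical; Haystack may prefer a different mechanism for experimental features.

```python
import os
import warnings


class ExperimentalFeatureError(RuntimeError):
    """Raised when an experimental component is used without opting in."""


def require_opt_in(
    feature: str,
    enabled: bool = False,
    env_var: str = "HAYSTACK_SUPABASE_EXPERIMENTAL",  # hypothetical variable name
) -> None:
    """Allow an alpha feature only after explicit opt-in, and warn when it is used."""
    opted_in = enabled or os.environ.get(env_var, "").lower() in {"1", "true", "yes"}
    if not opted_in:
        raise ExperimentalFeatureError(
            f"{feature} is experimental; pass enabled=True or set {env_var}=1 to use it."
        )
    # Opted in: still surface a warning so the alpha status is visible in logs.
    warnings.warn(f"{feature} is experimental and may change without notice.", stacklevel=2)
```

A document store constructor could call `require_opt_in("SupabaseVectorBucketDocumentStore", enabled=experimental)` before doing any work, so accidental use fails fast with a clear message.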
from haystack.utils import Secret
from haystack_integrations.document_stores.supabase import SupabaseVectorBucketDocumentStore

# Proposed API: stores embeddings in an (alpha) Supabase Vector Bucket
document_store = SupabaseVectorBucketDocumentStore(
    supabase_url="https://<project>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    bucket_name="my-vector-bucket",
    embedding_dimension=1536,
)
3. SupabaseGroongaDocumentStore & SupabaseGroongaRetriever
Full-text search powered by PGroonga, which supports multilingual tokenization and fast index-based FTS on top of Postgres. This retriever would work without embeddings, complementing dense retrieval in hybrid search pipelines.
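To illustrate the kind of SQL such a retriever might issue, here is a hedged sketch that builds a PGroonga index statement and a parameterized search query. It relies on PGroonga's `&@~` query operator and `pgroonga_score(tableoid, ctid)` function; the helper names and the `id`/`content` column layout are assumptions, and identifiers are interpolated directly, so they must come from trusted configuration.

```python
from typing import Any, Tuple


def pgroonga_index_sql(table_name: str, column: str = "content") -> str:
    """DDL for a PGroonga index on the text column (identifiers assumed trusted)."""
    return (
        f"CREATE INDEX IF NOT EXISTS {table_name}_{column}_pgroonga_idx "
        f"ON {table_name} USING pgroonga ({column});"
    )


def pgroonga_search_sql(
    table_name: str, query: str, top_k: int = 10, column: str = "content"
) -> Tuple[str, Tuple[Any, ...]]:
    """Return (sql, params) for a ranked full-text search.

    The user query and the limit are passed as parameters (psycopg-style %s),
    never interpolated into the SQL text.
    """
    sql = (
        f"SELECT id, {column}, pgroonga_score(tableoid, ctid) AS score "
        f"FROM {table_name} WHERE {column} &@~ %s "
        f"ORDER BY score DESC LIMIT %s;"
    )
    return sql, (query, top_k)
```

The pair returned by `pgroonga_search_sql` can be passed to a database cursor's `execute(sql, params)`; the PGroonga extension has to be enabled in the database before the index can be created.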
from haystack.utils import Secret
from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore
from haystack_integrations.components.retrievers.supabase import SupabaseGroongaRetriever

# Proposed API: keyword retrieval over a PGroonga-indexed table, no embeddings required
document_store = SupabaseGroongaDocumentStore(
    supabase_url="https://<project>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_fts_documents",
)
retriever = SupabaseGroongaRetriever(document_store=document_store, top_k=10)
4. SupabaseBucketDownloader
A component analogous to S3Downloader that fetches files from Supabase Storage buckets and returns them as ByteStream objects for further processing in indexing pipelines.
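The core of such a downloader can be sketched independently of any client library. In the example below, `DownloadedFile` is a stand-in for Haystack's ByteStream so the code runs without Haystack installed, and `fetch` is a hypothetical callable that maps a file path to its bytes; both names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class DownloadedFile:
    """Stand-in for Haystack's ByteStream: raw bytes plus metadata."""
    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)


def download_files(
    fetch: Callable[[str], bytes], bucket_name: str, paths: Iterable[str]
) -> List[DownloadedFile]:
    """Fetch each path and wrap the bytes with provenance metadata."""
    files = []
    for path in paths:
        data = fetch(path)
        files.append(DownloadedFile(data=data, meta={"bucket": bucket_name, "file_path": path}))
    return files


# In-memory fake standing in for a real bucket.
fake_bucket = {"reports/a.txt": b"hello", "reports/b.txt": b"world"}
files = download_files(fake_bucket.__getitem__, "my-documents", ["reports/a.txt", "reports/b.txt"])
```

With supabase-py, `fetch` could plausibly be `lambda path: client.storage.from_(bucket_name).download(path)`, where `client` comes from `create_client(url, key)`. Injecting the callable keeps the component easy to test without network access.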
from haystack.utils import Secret
from haystack_integrations.components.fetchers.supabase import SupabaseBucketDownloader

# Proposed API: downloads files from a Supabase Storage bucket
downloader = SupabaseBucketDownloader(
    supabase_url="https://<project>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    bucket_name="my-documents",
)
Authentication & Configuration
All components should support:
- supabase_url and supabase_key (via Secret for secure handling)
- Optional schema and table_name overrides
- Connection pooling configuration where relevant
The primary dependency would be the supabase-py client library.
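The shared options listed above could be grouped in a single settings object that each component accepts. The sketch below uses only the standard library; the class name `SupabaseSettings`, its defaults, and the `pool_size` field are hypothetical, and in a real integration Haystack's `Secret.from_env_var` would take the place of the manual environment lookup.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SupabaseSettings:
    """Hypothetical container for options shared by all Supabase components."""
    supabase_url: str
    key_env_var: str = "SUPABASE_SERVICE_KEY"  # only the variable name is stored, never the key
    schema: str = "public"
    table_name: str = "haystack_documents"
    pool_size: Optional[int] = None  # None means no connection pooling

    def resolve_key(self) -> str:
        """Read the key at call time so it never ends up in serialized config."""
        key = os.environ.get(self.key_env_var)
        if not key:
            raise ValueError(f"Environment variable {self.key_env_var} is not set.")
        return key
```

Storing the variable name rather than the key itself mirrors how environment-backed secrets avoid leaking credentials into serialized pipeline definitions.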
Future Considerations
Supabase Auth could be explored as an identity provider for RBAC in Haystack pipelines, allowing row-level security policies defined in Supabase to be honored at query time.
Checklist
If the request is accepted, ensure the following checklist is complete before closing this issue.
Tasks
- The code is documented with docstrings and was merged in the main branch
- Docs are published at https://docs.haystack.deepset.ai/
- There is a GitHub workflow running the tests for the integration nightly and at every PR
- A new label named like integration:supabase has been added to the list of labels for this repository
- The labeler.yml file has been updated
- The package has been released on PyPI
- An integration tile with a usage example has been added to https://github.com/deepset-ai/haystack-integrations
- The integration has been listed in the Inventory section of this repo README
- The feature was announced through social media