Feature: Support metadata based filtering for delete operations #272
- Add filter parameter to adelete and delete methods in both async and sync vectorstores
- Support complex filter syntax (operators, etc.) for bulk deletion
- Add comprehensive test cases for filter-based deletion scenarios
- Update documentation with examples for filter-based deletion
… operations
- Add note that filters only work on metadata_columns, not metadata_json_column
- Update tests to use metadata_columns instead of langchain_metadata
- Ensure tests properly test filtering functionality with dedicated metadata columns
@yukiharada1228 Thank you for opening this PR!
dishaprakash left a comment
Please update the documentation (https://github.com/langchain-ai/langchain-postgres/blob/main/examples/pg_vectorstore_how_to.ipynb) mentioning these changes.
Updated! I've added documentation for the metadata filter-based deletion feature in the how-to notebook.
Fixed formatting issues discovered when running the lint checks.
Changes Made
Fix Details
Verification Results
make lint ✅ All checks passed
Commit:
@averikitsch Thank you very much for the review and approval! 🙏
@dishaprakash Thank you for the review and guidance on the documentation updates! I've addressed the feedback by updating the how-to notebook with examples for metadata-based deletion, and the changes have now been approved.
Add metadata-based filtering support for delete operations
Summary
This PR adds support for metadata-based filtering in the delete() and adelete() methods, enabling bulk deletion of documents based on metadata criteria rather than just by IDs.
Motivation
Currently, the delete methods only support deletion by document IDs, which is limiting for common use cases.
Other vector stores (Chroma, Pinecone, Weaviate) already support metadata-based deletion, and the infrastructure for metadata filtering already exists in this codebase via the _create_filter_clause() method.
Changes
Modified Files
langchain_postgres/v2/async_vectorstore.py
- adelete() method now accepts an optional filter parameter
- _create_filter_clause() is reused for consistent filter syntax
langchain_postgres/v2/vectorstores.py
- adelete() and delete() methods now accept a filter parameter
Test files
- Tests use metadata_columns to ensure proper filtering behavior
Usage Examples
Setup: Define metadata columns
Delete by metadata filter only
Delete by IDs only (existing behavior)
Delete by both IDs and filter (must match both criteria)
Sync methods work identically
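As a rough illustration of the three call shapes listed above (filter only, IDs only, and both together), here is a minimal in-memory sketch. ToyStore and its rows field are hypothetical names invented for this example; they are not part of langchain-postgres, and only plain equality filters are handled. Returning False when neither argument is given is an assumption, not confirmed behavior of the real methods.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in; NOT the langchain-postgres vector store."""

    # Maps document ID -> metadata dict (playing the role of metadata columns).
    rows: dict[str, dict[str, Any]] = field(default_factory=dict)

    def delete(
        self,
        ids: Optional[list[str]] = None,
        filter: Optional[dict[str, Any]] = None,
    ) -> bool:
        """Delete rows that match ids AND filter (equality filters only)."""
        if ids is None and filter is None:
            return False  # assumption: nothing is deleted without a criterion

        def matches(doc_id: str, meta: dict[str, Any]) -> bool:
            if ids is not None and doc_id not in ids:
                return False
            if filter is not None and any(meta.get(k) != v for k, v in filter.items()):
                return False
            return True

        # Collect first, then delete, so the dict is not mutated while iterating.
        doomed = [doc_id for doc_id, meta in self.rows.items() if matches(doc_id, meta)]
        for doc_id in doomed:
            del self.rows[doc_id]
        return True


store = ToyStore({
    "a": {"source": "web", "year": 2023},
    "b": {"source": "web", "year": 2024},
    "c": {"source": "pdf", "year": 2024},
    "d": {"source": "pdf", "year": 2023},
})
store.delete(filter={"source": "pdf", "year": 2023})    # filter only: removes "d"
store.delete(ids=["a"])                                  # IDs only: removes "a"
store.delete(ids=["b", "c"], filter={"source": "pdf"})   # both: removes only "c"
print(sorted(store.rows))  # ['b']
```

The last call shows the "must match both criteria" rule: "b" is in the ID list but fails the filter, so it survives.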
Filter Syntax
The filter parameter supports the same rich filtering syntax as similarity_search():
- Equality: {"field": "value"}
- Comparison: {"field": {"$lt": 100}} ($eq, $ne, $lt, $lte, $gt, $gte)
- Membership: {"field": {"$in": [1, 2, 3]}} ($in, $nin)
- Pattern matching: {"field": {"$like": "pattern%"}} ($like, $ilike)
- Logical operators: {"$and": [...]}, {"$or": [...]}, {"$not": {...}}
- Existence: {"field": {"$exists": True}}
- Range: {"field": {"$between": [10, 20]}}
Filters only work on fields defined in metadata_columns, not on fields stored in metadata_json_column.
This is consistent with how similarity_search() filtering works. To use metadata-based deletion, you must define the metadata fields as actual database columns when creating the vectorstore.
Fields stored only in metadata_json_column cannot be used in filters. This design choice provides better query performance and leverages PostgreSQL's native indexing capabilities.
Implementation Details
- adelete(ids=[...]) continues to work unchanged
- Reuses the existing _create_filter_clause() method
- Same filter syntax as similarity_search(), for consistency
Test Coverage
Added comprehensive test coverage (all tests passing):
- test_adelete_with_filter: Basic metadata filter deletion
- test_adelete_with_filter_and_operator: Deletion with comparison operators
- test_adelete_with_complex_filter: Complex filters with logical operators
- test_adelete_with_filter_and_ids: Combined ID and filter deletion
- test_adelete_with_filter_no_matches: Graceful handling of no matches
- test_adelete_with_filter (sync): Async method in sync wrapper
- test_delete_with_filter (sync): Sync method filtering
- test_adelete: Existing tests continue to pass
All tests follow existing patterns and integrate with the current test suite.
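To make the scenarios covered by these tests more concrete (comparison operators, logical operators, no matches), the following sketch interprets the filter operators from the Filter Syntax section against plain Python dicts. It is an in-memory approximation written for illustration, not the SQL that the library generates: matches, _check, and _like are hypothetical helpers, "$exists" is assumed to mean "value is not null", and SQL's three-valued NULL logic is simplified.

```python
import re
from typing import Any

# Binary operators that map directly onto Python comparisons.
_COMPARE = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
    "$between": lambda a, b: b[0] <= a <= b[1],
}


def _like(value: str, pattern: str, ignore_case: bool) -> bool:
    """Emulate SQL LIKE: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


def _check(value: Any, op: str, arg: Any) -> bool:
    """Evaluate a single operator against one metadata value."""
    if op == "$exists":
        return (value is not None) == arg  # assumption: "exists" means non-null
    if value is None:
        return False  # assumption: a missing value fails every other operator
    if op in ("$like", "$ilike"):
        return _like(value, arg, ignore_case=(op == "$ilike"))
    return _COMPARE[op](value, arg)


def matches(meta: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if the metadata dict satisfies every condition in the filter."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(meta, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(meta, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(meta, cond)
        elif not isinstance(cond, dict):
            ok = meta.get(key) == cond  # a bare value means equality
        else:
            ok = all(_check(meta.get(key), op, arg) for op, arg in cond.items())
        if not ok:
            return False
    return True


docs = {
    "1": {"topic": "cats", "year": 2019},
    "2": {"topic": "dogs", "year": 2021},
    "3": {"topic": "catfish", "year": 2023},
}
flt = {"$and": [{"topic": {"$like": "cat%"}}, {"year": {"$gte": 2020}}]}
kept = {k: v for k, v in docs.items() if not matches(v, flt)}
print(sorted(kept))  # ['1', '2']
```

Only document "3" satisfies both branches of the "$and", so a filter-based delete with this filter would remove it and leave the other two.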
Breaking Changes
None. This is a backward-compatible enhancement:
- filter parameter is optional
Checklist
- ruff linting
- mypy type checking
Related Issues
Closes #271
Additional Notes
Why metadata_columns are required for filtering
This implementation reuses the robust _create_filter_clause() method that's already extensively tested for search operations. The method generates SQL WHERE clauses that operate on actual database columns, which provides:
- Consistency with similarity_search() filtering
This design is consistent with the existing filtering implementation and aligns with how other parts of the codebase handle metadata filtering.
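The sketch below illustrates why a filter that compiles to SQL needs real columns, and how an ID list and a filter could be combined with AND in one DELETE. It is a simplified stand-in, not the library's _create_filter_clause(): where_clause and delete_statement are hypothetical helpers, only equality and comparison operators are handled, and the "langchain_id" default and the ":name" bind-parameter style are assumptions made for the example.

```python
from typing import Any, Optional

# Subset of operators with a direct SQL counterpart.
_SQL_OPS = {"$eq": "=", "$ne": "!=", "$lt": "<", "$lte": "<=", "$gt": ">", "$gte": ">="}


def where_clause(flt: dict[str, Any], columns: set[str], params: dict[str, Any]) -> str:
    """Translate a flat filter into SQL, adding bound values to params."""
    parts = []
    for field, cond in flt.items():
        if field not in columns:
            # A field that lives only in the JSON column has no SQL column to compare.
            raise ValueError(f"{field!r} is not a metadata column")
        conds = cond if isinstance(cond, dict) else {"$eq": cond}
        for op, value in conds.items():
            name = f"p{len(params)}"  # unique because params only ever grows
            params[name] = value
            parts.append(f'"{field}" {_SQL_OPS[op]} :{name}')
    return " AND ".join(parts)


def delete_statement(
    table: str,
    columns: set[str],
    ids: Optional[list[str]] = None,
    flt: Optional[dict[str, Any]] = None,
    id_column: str = "langchain_id",  # assumed default, for illustration only
) -> tuple[str, dict[str, Any]]:
    """Build a parameterized DELETE whose WHERE combines ids and filter with AND."""
    params: dict[str, Any] = {}
    clauses = []
    if ids:
        params["ids"] = list(ids)
        clauses.append(f'"{id_column}" = ANY(:ids)')
    if flt:
        clauses.append(where_clause(flt, columns, params))
    if not clauses:
        raise ValueError("refusing to build an unconditional DELETE")
    return f'DELETE FROM "{table}" WHERE ' + " AND ".join(clauses), params


sql, params = delete_statement(
    "docs",
    {"source", "year"},
    ids=["a", "b"],
    flt={"source": "pdf", "year": {"$lt": 2024}},
)
print(sql)
print(params)
```

Values travel as bound parameters rather than being spliced into the SQL string, and field names are checked against the known column set before they are quoted into the statement.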