
Support metadata-based filtering for delete operations #271

@yukiharada1228


Feature Request

Current Behavior

Currently, the delete() and adelete() methods only support deletion by document IDs:

```python
async def adelete(
    self,
    ids: Optional[list] = None,
    **kwargs: Any,
) -> Optional[bool]:
    """Delete records from the table."""
    if not ids:
        return False
    # ... only deletes by IDs
```

Reference: langchain_postgres/v2/async_vectorstore.py:400-419

Proposed Enhancement

Add support for metadata-based filtering in delete operations, similar to how similarity_search supports filtering:

```python
# Example usage
await vectorstore.adelete(filter={"source": "documentation"})
await vectorstore.adelete(filter={"$and": [{"category": "obsolete"}, {"year": {"$lt": 2020}}]})
```
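To make the filter syntax concrete, here is a minimal, hypothetical sketch of how such a filter dict might be translated into a parameterized SQL WHERE fragment. `build_filter_clause` is an illustrative name, not part of langchain_postgres; it assumes each metadata key maps to a table column of the same name and supports only `$and`, `$or`, and a few comparison operators. The library's actual `_create_filter_clause()` may accept more operators and emit different SQL.

```python
from typing import Any

# Hypothetical operator table; the real filter translation may differ.
_COMPARISONS = {"$eq": "=", "$ne": "!=", "$lt": "<", "$lte": "<=", "$gt": ">", "$gte": ">="}


def build_filter_clause(filter: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Translate a filter dict into a WHERE fragment plus named bind parameters.

    Assumes every metadata key is also a column name (hypothetical schema).
    """
    params: dict[str, Any] = {}

    def bind(value: Any) -> str:
        # Values are never inlined into the SQL; each gets a :pN placeholder.
        name = f"p{len(params)}"
        params[name] = value
        return f":{name}"

    def visit(node: dict[str, Any]) -> str:
        parts = []
        for key, value in node.items():
            if key in ("$and", "$or"):
                joiner = " AND " if key == "$and" else " OR "
                parts.append("(" + joiner.join(visit(sub) for sub in value) + ")")
                continue
            if not key.isidentifier():
                # Keys become SQL identifiers, so reject anything unusual.
                raise ValueError(f"unsupported metadata key: {key!r}")
            if isinstance(value, dict):
                # Operator form, e.g. {"year": {"$lt": 2020}}
                for op, operand in value.items():
                    parts.append(f'"{key}" {_COMPARISONS[op]} {bind(operand)}')
            else:
                # Shorthand equality, e.g. {"source": "documentation"}
                parts.append(f'"{key}" = {bind(value)}')
        return " AND ".join(parts)

    return visit(filter), params


print(build_filter_clause({"$and": [{"category": "obsolete"}, {"year": {"$lt": 2020}}]}))
```

For the second example filter this yields `("category" = :p0 AND "year" < :p1)` with `{"p0": "obsolete", "p1": 2020}`. The `:name` placeholder style is chosen because SQLAlchemy's `text()` uses it for bind parameters.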

Motivation

  1. Bulk deletions: Users often need to delete groups of documents based on metadata criteria (e.g., all documents from a specific source, time period, or category)
  2. Existing infrastructure: The codebase already has a _create_filter_clause() method that supports metadata filtering for search operations
  3. Consistency: Other vector stores (e.g., Chroma, Pinecone) support metadata-based deletion
  4. Practical use cases:
    • Remove all documents from a deprecated data source
    • Delete documents older than a certain date
    • Clean up documents with specific tags or categories

Implementation Suggestion

The adelete() method could be enhanced to accept an optional filter parameter:

```python
from typing import Any, Optional

from sqlalchemy import text


async def adelete(
    self,
    ids: Optional[list] = None,
    filter: Optional[dict] = None,
    **kwargs: Any,
) -> Optional[bool]:
    """Delete records from the table.

    Args:
        ids: List of document IDs to delete
        filter: Metadata filter dictionary for bulk deletion
    """
    if not ids and not filter:
        return False

    if filter:
        # Reuse the search-path filter translation; values go in as bind parameters
        safe_filter, filter_dict = self._create_filter_clause(filter)
        query = f'DELETE FROM "{self.schema_name}"."{self.table_name}" WHERE {safe_filter}'
        async with self.engine.connect() as conn:
            await conn.execute(text(query), filter_dict)
            await conn.commit()
    else:
        ...  # existing ID-based deletion logic (unchanged)

    return True
```

This would leverage the existing _create_filter_clause() infrastructure and provide a consistent filtering API across the library.
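The filtered DELETE path can be exercised end to end with the standard-library `sqlite3` module standing in for PostgreSQL and SQLAlchemy. This is only a sketch of the pattern: `delete_where` and the `docs` table are hypothetical, and the WHERE fragment is written by hand where the proposal would obtain it from `_create_filter_clause()`.

```python
import sqlite3
from typing import Any


def delete_where(conn: sqlite3.Connection, table: str, clause: str, params: dict[str, Any]) -> int:
    """Run a parameterized DELETE and return the number of rows removed."""
    # `clause` and `params` stand in for the output of a filter translator;
    # metadata values are bound as parameters, never formatted into the SQL.
    cur = conn.execute(f'DELETE FROM "{table}" WHERE {clause}', params)
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "docs" (id TEXT PRIMARY KEY, source TEXT)')
conn.executemany(
    'INSERT INTO "docs" VALUES (?, ?)',
    [("a", "documentation"), ("b", "documentation"), ("c", "blog")],
)

removed = delete_where(conn, "docs", '"source" = :p0', {"p0": "documentation"})
remaining = [row[0] for row in conn.execute('SELECT id FROM "docs" ORDER BY id')]
print(removed, remaining)  # 2 ['c']
```

Returning the row count rather than an unconditional `True` is one possible design choice; it would let callers tell whether a filter matched anything.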

Alternative: Support both IDs and filters

The implementation could also support combining both IDs and filters:

```python
# Delete specific IDs that also match filter criteria
await vectorstore.adelete(
    ids=["id1", "id2", "id3"],
    filter={"status": "archived"}
)
```

Rows would then be deleted only when they appear in the ID list and also match the filter, giving callers finer control over deletions.
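Combining both criteria amounts to AND-ing an ID membership test with the filter clause. The sketch below again uses `sqlite3` as a stand-in, with a hypothetical `delete_ids_matching` helper and `docs` table. Because `sqlite3` cannot bind a list to a single parameter, each ID gets its own placeholder; in PostgreSQL an array parameter with `= ANY(...)` is a common alternative.

```python
import sqlite3
from typing import Any


def delete_ids_matching(
    conn: sqlite3.Connection,
    table: str,
    ids: list[str],
    clause: str,
    params: dict[str, Any],
) -> int:
    """Delete rows whose id is in `ids` AND that satisfy the filter clause."""
    if not ids:
        raise ValueError("ids must be non-empty")
    # One named placeholder per ID; the "id" prefix assumes filter params use another prefix.
    id_params = {f"id{i}": value for i, value in enumerate(ids)}
    placeholders = ", ".join(f":{name}" for name in id_params)
    sql = f'DELETE FROM "{table}" WHERE id IN ({placeholders}) AND ({clause})'
    cur = conn.execute(sql, {**id_params, **params})
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "docs" (id TEXT PRIMARY KEY, status TEXT)')
conn.executemany(
    'INSERT INTO "docs" VALUES (?, ?)',
    [("id1", "archived"), ("id2", "active"), ("id3", "archived"), ("id4", "archived")],
)

removed = delete_ids_matching(
    conn, "docs", ["id1", "id2", "id3"], '"status" = :p0', {"p0": "archived"}
)
remaining = [row[0] for row in conn.execute('SELECT id FROM "docs" ORDER BY id')]
print(removed, remaining)  # 2 ['id2', 'id4']
```

Here `id2` survives because it fails the filter, and `id4` survives because it is not in the ID list.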


Labels: enhancement (New feature or request)
