-
Notifications
You must be signed in to change notification settings - Fork 18
feat: Add get_by_ids and aget_by_ids to vectorstore #276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,7 +17,7 @@ | |
|
|
||
| import json | ||
| import uuid | ||
| from typing import Any, Callable, Iterable, Optional, Sequence | ||
| from typing import Any, Callable, Iterable, List, Optional, Sequence | ||
|
|
||
| import numpy as np | ||
| from langchain_core.documents import Document | ||
|
|
@@ -291,6 +291,54 @@ async def __aadd_embeddings( | |
|
|
||
| return ids | ||
|
|
||
| async def aget_by_ids(self, ids: Sequence[str]) -> List[Document]: | ||
| """Get documents by ids.""" | ||
|
|
||
| quoted_ids = [f"'{id_val}'" for id_val in ids] | ||
| id_list_str = ", ".join(quoted_ids) | ||
|
|
||
| columns = self.metadata_columns + [ | ||
| self.id_column, | ||
| self.content_column, | ||
| ] | ||
| if self.metadata_json_column: | ||
| columns.append(self.metadata_json_column) | ||
|
|
||
| column_names = ", ".join(f'"{col}"' for col in columns) | ||
|
|
||
| query = f'SELECT {column_names} FROM "{self.schema_name}"."{self.table_name}" WHERE {self.id_column} IN ({id_list_str});' | ||
|
||
|
|
||
| async with self.pool.connect() as conn: | ||
| result = await conn.execute(text(query)) | ||
| result_map = result.mappings() | ||
| results = result_map.fetchall() | ||
|
|
||
| documents = [] | ||
| for row in results: | ||
| metadata = ( | ||
| row[self.metadata_json_column] | ||
| if self.metadata_json_column and row[self.metadata_json_column] | ||
| else {} | ||
| ) | ||
| for col in self.metadata_columns: | ||
| metadata[col] = row[col] | ||
| documents.append( | ||
| ( | ||
| Document( | ||
| page_content=row[self.content_column], | ||
| metadata=metadata, | ||
| id=row[self.id_column], | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am wondering if we should cast this to a string just in case
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we have that option, but wouldn't that negate the value of having customizable ID columns?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Document interface is enforcing the str Id in the document object https://github.com/langchain-ai/langchain/blob/33354f984fba660e71ca0039cfbd3cf37643cfab/libs/core/langchain_core/documents/base.py#L34. Users will still be able to insert and use the id data type of their choice |
||
| ) | ||
| ) | ||
| ) | ||
|
|
||
| return documents | ||
|
|
||
| def get_by_ids(self, ids: Sequence[str]) -> List[Document]: | ||
| raise NotImplementedError( | ||
| "Sync methods are not implemented for AsyncAlloyDBVectorStore. Use AlloyDBVectorStore interface instead." | ||
| ) | ||
|
|
||
| async def aadd_texts( | ||
| self, | ||
| texts: Iterable[str], | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a test for
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, added a test for the sync function in the async vectorstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use lower case "list" to follow guidance of https://peps.python.org/pep-0585/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved