aclark4life
Collaborator

No description provided.

aclark4life changed the title from "INTPYTHON-752 Integrate pymongo-vectorsearch-utils" to "INTPYTHON-752 Integrate pymongo-search-utils" on Oct 6, 2025
aclark4life marked this pull request as ready for review on October 6, 2025, 15:29
aclark4life requested a review from Copilot on October 6, 2025, 15:30
Copilot AI left a comment

Pull Request Overview

This PR integrates the pymongo-search-utils package to consolidate MongoDB search functionality and reduce code duplication.

  • Migrates vector search index creation/update functions to pymongo-search-utils
  • Updates Python version requirement from 3.9 to 3.10
  • Replaces local bulk_embed_and_insert_texts implementation with the external utility

Reviewed Changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Updates Python version requirement to 3.10 |
| libs/langchain-mongodb/pyproject.toml | Updates Python version requirement and adds pymongo-search-utils dependency |
| libs/langchain-mongodb/tests/utils.py | Imports and uses external bulk_embed_and_insert_texts function |
| libs/langchain-mongodb/langchain_mongodb/vectorstores.py | Removes local bulk_embed_and_insert_texts method and uses external function |
| libs/langchain-mongodb/langchain_mongodb/utils.py | Removes local _append_client_metadata function and imports from external package |
| libs/langchain-mongodb/langchain_mongodb/index.py | Removes local index creation/update functions and imports from external package |

Comment on lines +364 to 368
batch_res = bulk_embed_and_insert_texts(
    texts_batch, metadatas_batch, ids[i : j + 1]
)
Copilot AI Oct 6, 2025

The external bulk_embed_and_insert_texts function is called here without the instance state that the original method received through self. It needs the embedding model, collection, and field configuration passed explicitly to work properly.
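
To illustrate the concern raised in this comment, here is a minimal, self-contained sketch of what changes when an instance method becomes a module-level function. The function `bulk_embed_and_insert` and its parameters are hypothetical stand-ins, not the actual pymongo-search-utils signature; the fake embedder and the list-backed collection exist only so the sketch runs on its own.

```python
from typing import Any, Callable


# Hypothetical stand-in for an external helper; NOT the real
# pymongo-search-utils signature. State that a method would read from
# `self` must now arrive as explicit arguments.
def bulk_embed_and_insert(
    texts: list[str],
    metadatas: list[dict[str, Any]],
    *,
    embed: Callable[[list[str]], list[list[float]]],
    collection: list[dict[str, Any]],
    text_key: str = "text",
    embedding_key: str = "embedding",
) -> list[int]:
    """Embed texts and append one document per text; return their positions."""
    vectors = embed(texts)
    inserted = []
    for text, meta, vec in zip(texts, metadatas, vectors):
        collection.append({text_key: text, embedding_key: vec, **meta})
        inserted.append(len(collection) - 1)
    return inserted


class Store:
    """Toy vector store holding the state the helper needs."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []

    def _embed(self, texts: list[str]) -> list[list[float]]:
        # Fake embedding: the text length as a one-dimensional vector.
        return [[float(len(t))] for t in texts]

    def add_texts(
        self, texts: list[str], metadatas: list[dict[str, Any]]
    ) -> list[int]:
        # The call site forwards instance state explicitly.
        return bulk_embed_and_insert(
            texts, metadatas, embed=self._embed, collection=self._docs
        )


store = Store()
ids = store.add_texts(["a", "bcd"], [{"n": 1}, {"n": 2}])

# Omitting the state, as in the flagged call sites, fails at call time.
try:
    bulk_embed_and_insert(["a"], [{}])
    missing_state_error = None
except TypeError as exc:
    missing_state_error = exc
```

In this sketch the required state is keyword-only, so a call site that was converted mechanically from `self.method(...)` fails immediately with a `TypeError` instead of silently running against the wrong collection.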

Comment on lines +368 to 372
batch_res = bulk_embed_and_insert_texts(
    texts_batch, metadatas_batch
)
Copilot AI Oct 6, 2025

The external bulk_embed_and_insert_texts function is called here without the instance state that the original method received through self. It needs the embedding model, collection, and field configuration passed explicitly to work properly.

Comment on lines +378 to 382
batch_res = bulk_embed_and_insert_texts(
    texts_batch, metadatas_batch, ids[i : j + 1]
)
Copilot AI Oct 6, 2025

The external bulk_embed_and_insert_texts function is called here without the instance state that the original method received through self. It needs the embedding model, collection, and field configuration passed explicitly to work properly.

- batch_res = self.bulk_embed_and_insert_texts(
-     texts_batch, metadatas_batch
- )
+ batch_res = bulk_embed_and_insert_texts(texts_batch, metadatas_batch)
Copilot AI Oct 6, 2025

The external bulk_embed_and_insert_texts function is called here without the instance state that the original method received through self. It needs the embedding model, collection, and field configuration passed explicitly to work properly.

  result_ids.extend(
-     self.bulk_embed_and_insert_texts(
+     bulk_embed_and_insert_texts(
          texts=texts, metadatas=metadatas, ids=ids[start:end]
Copilot AI Oct 6, 2025

The external bulk_embed_and_insert_texts function is called here without the instance state that the original method received through self. It needs the embedding model, collection, and field configuration passed explicitly to work properly.

Suggested change
- texts=texts, metadatas=metadatas, ids=ids[start:end]
+ embedding_model=self._embedding,
+ collection=self._collection,
+ text_key=self._text_key,
+ embedding_key=self._embedding_key,
+ metadata_key=self._metadata_key,
+ texts=texts,
+ metadatas=metadatas,
+ ids=ids[start:end],

  ) -> List:
      """Patched insert_texts that waits for data to be indexed before returning"""
-     ids_inserted = super().bulk_embed_and_insert_texts(texts, metadatas, ids)
+     ids_inserted = bulk_embed_and_insert_texts(texts, metadatas, ids)
Copilot AI Oct 6, 2025

The external bulk_embed_and_insert_texts function is being called without the required instance context that was available when it was called as super().bulk_embed_and_insert_texts(). This function needs to receive the embedding model, collection, and field configuration to work properly.

Suggested change
- ids_inserted = bulk_embed_and_insert_texts(texts, metadatas, ids)
+ ids_inserted = bulk_embed_and_insert_texts(
+     self.embedding, self.collection, self._embedding_field_config, texts, metadatas, ids
+ )

Collaborator Author

aclark4life commented Oct 6, 2025

@NoahStapp Re: test failures: should driver_info be optional, or should it be passed to append_client_metadata at the call sites that currently omit it?

@NoahStapp
Collaborator

> @NoahStapp Re: test failures: should driver_info be optional, or should it be passed to append_client_metadata at the call sites that currently omit it?

It should be passed in. The previous append_client_metadata we used here relied on a DriverInfo set at the package level. The new one from pymongo-search-utils is generic for flexibility, so we need to pass the package-level DriverInfo explicitly here.
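
A rough sketch of the pattern described in this reply, under stated assumptions: the `DriverInfo` dataclass below is a stand-in that only mirrors the `name`, `version`, and `platform` fields of `pymongo.driver_info.DriverInfo`, `FakeClient` and its `append_metadata` method are hypothetical, and the two-argument `append_client_metadata` is a guess at the shape of a generic helper, not the confirmed pymongo-search-utils signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriverInfo:
    """Stand-in mirroring the fields of pymongo.driver_info.DriverInfo."""

    name: str
    version: Optional[str] = None
    platform: Optional[str] = None


class FakeClient:
    """Hypothetical client that just records the metadata it is given."""

    def __init__(self) -> None:
        self.metadata: list[DriverInfo] = []

    def append_metadata(self, driver_info: DriverInfo) -> None:
        self.metadata.append(driver_info)


# Hypothetical generic helper: it has no package-level default, so the
# caller decides which DriverInfo identifies it.
def append_client_metadata(client: FakeClient, driver_info: DriverInfo) -> None:
    client.append_metadata(driver_info)


# Package-level constant owned by the calling package (values illustrative).
DRIVER_METADATA = DriverInfo(name="example-package", version="0.0.0")

client = FakeClient()
append_client_metadata(client, DRIVER_METADATA)
```

Keeping the constant in the calling package and passing it at each call site is what lets one shared helper serve several packages with different metadata.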
