-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat(client): migrate MilvusClient to AsyncMilvusClient #3376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(client): migrate MilvusClient to AsyncMilvusClient #3376
Conversation
bd99ac2
to
ba8c755
Compare
@@ -106,7 +107,13 @@ async def query_vector(self, embedding: NDArray, k: int, score_threshold: float) | |||
return QueryChunksResponse(chunks=chunks, scores=scores) | |||
|
|||
async def delete(self): | |||
await maybe_await(self.client.delete_collection(self.collection.name)) | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes in a separate commit
The test was failing due to accessing a collection after being deleted
The issue is that the ChromaIndex.initialize() method was always empty (doing nothing), but the test pattern of calling delete() followed by initialize() only worked before because:
- The collection object was created once in the test fixture and reused
- The delete() method removes the collection from Chroma
- The empty initialize() method doesn't recreate the collection
- When add_chunks() is called, it tries to use a deleted collection
After the migration the test failed consistently, the fix was to re-add the deleted collection.
Please let me know if there is a better approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Elbehery was that commit squashed? That link is not working for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the unit/integration test configuration or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I meant this commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you don't use list_collections()
or get_collection() '
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just learnt from previous PRs and reviews, to minimize the changes as much as possible
So i used has_collection()
as it was used in the original code.
Let me try to use list_collections()
and see if the tests passes without more changes 👍🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has_collection()
is used in milvus, which i don't believe is available in Chroma (I tested the client locally and don't see it listed here: https://docs.trychroma.com/docs/collections/manage-collections).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has_collection() is used in milvus
yes that is what i meant. Therefore i have used the same method, and not list_collection()
to minimize code changes.
which i don't believe is available in Chroma
the chroma test that I cited above, started failing after the migration to AsyncMilvusClient
Thus I have made these changes to fix the test in a separate commit, so that it is easier for the reviewer.
Now I am trying use list_collection()
in milvus client, and testing whether this would not fail the Chroma
test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7a4fc9b
to
19f1374
Compare
self.kvstore = kvstore | ||
|
||
async def initialize(self): | ||
pass | ||
self.collection = await maybe_await(self.client.get_or_create_collection(self.collection_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there is a better way to avoid this code change ?
|
||
async def initialize(self): | ||
# MilvusIndex does not require explicit initialization | ||
# TODO: could move collection creation into initialization but it is not really necessary | ||
pass | ||
|
||
async def delete(self): | ||
if await asyncio.to_thread(self.client.has_collection, self.collection_name): | ||
await asyncio.to_thread(self.client.drop_collection, collection_name=self.collection_name) | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use two try and except blocks here?
squashing the list_collections
and drop_collection
into one isn't ideal.
|
||
async def add_chunks(self, chunks: list[Chunk], embeddings: NDArray): | ||
assert len(chunks) == len(embeddings), ( | ||
f"Chunk length {len(chunks)} does not match embedding length {len(embeddings)}" | ||
) | ||
|
||
if not await asyncio.to_thread(self.client.has_collection, self.collection_name): | ||
try: | ||
collections = await self.client.list_collections() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the usage here gain, it may make sense to just add a helper _has_collection()
that has the try and except block and checks the collection_name
if "attached to a different loop" in str(e): | ||
logger.warning("Recreating client due to event loop issue") | ||
|
||
if hasattr(self, "_parent_adapter"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this _parent_adapter? did you have issues with the event loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i had this issue
Problem: AsyncMilvusClient connections were established in one event loop but used in different event loops during test execution.
# BROKEN - Client created in setup loop, used in test loop
RuntimeError: Task <Task pending> got Future <Future pending> attached to a different loop
Symptoms:
- RuntimeError: attached to a different loop errors
- Tests failing during both execution and teardown
- Milvus operations hanging or crashing
the cause was that MilvusIndex
which reference MilvusClient
was being initialized before MilvusClient
Therefore the failure, the fix is to swap the init order, which is being fixed separately in this commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is that log from? Is that from Claude or some other AI tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, Claude
collection_name: str, | ||
consistency_level="Strong", | ||
kvstore: KVStore | None = None, | ||
parent_adapter=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concerned about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is part of the fix of the init order in commit
please let me know if there is a better approach :_
index_params=index_params, | ||
consistency_level=self.consistency_level, | ||
) | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice thanks for adding this.
@@ -143,8 +175,7 @@ async def add_chunks(self, chunks: list[Chunk], embeddings: NDArray): | |||
} | |||
) | |||
try: | |||
await asyncio.to_thread( | |||
self.client.insert, | |||
await self.client.insert( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably we should improve the error in the raise
@@ -321,6 +346,15 @@ def __init__( | |||
|
|||
async def initialize(self) -> None: | |||
self.kvstore = await kvstore_impl(self.config.kvstore) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be good to also add exception handling here. i think someone had an issue with the db_path at some point and we probably could've caught that bug better since it can be configured in the run.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, could you propose the exception to add ?
also shall I add it here, or in a follow-up PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this PR it's fine to add!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm doing a little scope creep but while we're here it's a good thing to add a little more resilience as end users will be happy :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm 💯
will do tmrw, i am sleeping on my desk :D
if self.client: | ||
await self.client.close() | ||
|
||
async def _recreate_client(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not convinced we need this and it feels like we shouldn't need it but maybe i'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its part of the init order in commit
I really did not have other options, all these failures happened after the migration, which actually was supposed to be straight forward :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think we need _recreate_client
can you provide more info on why this is necessary?
yes, I saved all the failure of this, as I spent time fixing it and knew would get a lot of questions
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR migrates the Milvus client from synchronous MilvusClient
to asynchronous AsyncMilvusClient
to improve performance and maintain consistency with the async architecture.
Key changes:
- Import statements updated from
MilvusClient
toAsyncMilvusClient
- Removed
asyncio.to_thread()
wrappers for all Milvus operations - Updated test mocks and fixtures to use
AsyncMock
for async operations
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
llama_stack/providers/remote/vector_io/milvus/milvus.py | Core migration from sync to async client with error handling improvements |
tests/unit/providers/vector_io/remote/test_milvus.py | Updated test mocks from MagicMock to AsyncMock for async operations |
tests/unit/providers/vector_io/conftest.py | Updated fixture to use AsyncMilvusClient and added proper cleanup |
tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py | Added await keywords to vector index operations |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
if hasattr(self, "_parent_adapter"): | ||
await self._parent_adapter._recreate_client() | ||
collections = await self.client.list_collections() | ||
collection_exists = self.collection_name in collections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After recreating the client through the parent adapter, the self.client
reference may still point to the old client. The client reference should be updated to the new client instance after recreation.
Copilot uses AI. Check for mistakes.
Initializing the MilvusClient before the index makes sense and we should keep that. What errors were you seeing that resulted in the alleged event loop issues? Can you provide the exact command you used to reproduce that? It would be better to reproduce those errors in the CI. Probably you could do that by removing the |
after the migration,
It was caught by the CI tests 👍🏽
I am searching the logs of Claude for more context |
the unit test failure is not related to changes in this PR |
there are all i could find
but i pasted here the actual context |
1eaed76
to
a9d5bac
Compare
looks like unit tests are failing, i updated your branch but this was the failure:
Which I think suggests that the event loop issue may not be resolved by the |
thanks so much for your deep support 🙏🏽 So i checked last thing yesterday the unit-test failure, and i thought its not related as i commented here Please correct me if i am wrong I will come back to your reviews soonish, just trying to finish a release :) |
12b6827
to
1006200
Compare
The commit makes the follwing changes. - Import statements updated: MilvusClient → AsyncMilvusClient - Removed asyncio.to_thread() wrappers: All Milvus operations now use native async/await - Test compatibility: Mock objects and fixtures updated to work with AsyncMilvusClient Signed-off-by: Mustafa Elbehery <[email protected]>
Signed-off-by: Mustafa Elbehery <[email protected]>
Signed-off-by: Mustafa Elbehery <[email protected]>
Signed-off-by: Mustafa Elbehery <[email protected]>
…tion" This reverts commit aac26e8.
…on() with list_collections() Signed-off-by: Mustafa Elbehery <[email protected]>
1006200
to
295d8b9
Compare
What does this PR do?
This PR migrates
MilvusClient
toAsyncMilvusClient
.The commit makes the follwing changes.
Closes #2684