Is there a potential issue with the Dify design when using Milvus as a vector database? #39444

ducanh997 · 2025-01-20T07:04:57Z


I'm using Milvus as the vector database for Dify, and every time I upload a data file, Dify creates a new collection within the configured database. However, I’m aware that Milvus only allows a maximum of 65536 collections per database.

Given this, is there a risk that Dify's design will hit this collection limit over time? How should this be handled at scale? My company has 6,000 employees, and I’ve deployed Dify for internal chatbot development.
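A quick back-of-the-envelope check makes the concern concrete. This sketch assumes, as described above, that every upload creates exactly one new collection in the same database and that uploads are spread evenly across employees. `max_uploads_per_user` is a hypothetical helper, not part of Dify or pymilvus.

```python
def max_uploads_per_user(collection_limit: int, users: int) -> int:
    """Uploads each user can make before the database runs out of collections.

    Assumes one new collection per upload, an even spread across users,
    and an otherwise empty database.
    """
    if users <= 0:
        raise ValueError("users must be positive")
    return collection_limit // users


LIMIT_PER_DB = 65536  # per-database limit cited in the question
EMPLOYEES = 6000

per_user = max_uploads_per_user(LIMIT_PER_DB, EMPLOYEES)
print(per_user)  # 10
```

So if every employee uploaded about ten files, the database would already hold 60,000 collections, close to the 65,536 ceiling.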

Answered by yhmo


xiaofan-luan (Maintainer) · 2025-01-20T07:25:14Z

Hi @ducanh997

For multi-tenant use cases, we recommend using a partition key rather than creating multiple collections.
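A minimal sketch of that recommendation with pymilvus, assuming a Milvus 2.4+ server at `http://localhost:19530`, a 768-dimensional embedding, and one shared collection named `dify_shared` (all hypothetical choices for illustration; this is not how Dify currently lays out its data). The `tenant_id` field is declared with `is_partition_key=True`, so Milvus hashes each tenant's rows into one of `num_partitions` partitions, and a filter on `tenant_id` lets it limit the search to the matching partition. pymilvus is imported inside the function so that the filter helper can be used without it installed.

```python
import re

# Hypothetical names chosen for this sketch.
COLLECTION = "dify_shared"
DIM = 768  # assumed embedding dimension
_TENANT_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def tenant_filter(tenant_id: str) -> str:
    """Build a Milvus boolean filter that restricts results to one tenant.

    Tenant IDs are limited to a safe character set so they can be embedded
    in the expression without any escaping.
    """
    if not _TENANT_RE.fullmatch(tenant_id):
        raise ValueError(f"unsupported tenant id: {tenant_id!r}")
    return f'tenant_id == "{tenant_id}"'


def create_shared_collection(uri: str = "http://localhost:19530"):
    # Imported lazily so tenant_filter() works without pymilvus installed.
    from pymilvus import DataType, MilvusClient

    client = MilvusClient(uri=uri)
    schema = MilvusClient.create_schema()
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True, auto_id=True)
    schema.add_field(
        field_name="tenant_id",
        datatype=DataType.VARCHAR,
        max_length=64,
        is_partition_key=True,  # rows are routed to partitions by this value
    )
    schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=65535)
    schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=DIM)

    index_params = client.prepare_index_params()
    index_params.add_index(field_name="vector", index_type="AUTOINDEX", metric_type="COSINE")

    client.create_collection(
        collection_name=COLLECTION,
        schema=schema,
        index_params=index_params,
        num_partitions=64,  # number of physical partitions the keys hash into
    )
    return client


def search_for_tenant(client, tenant_id: str, query_vector: list[float], limit: int = 5):
    # Filtering on the partition key lets Milvus prune other tenants' partitions.
    return client.search(
        collection_name=COLLECTION,
        data=[query_vector],
        filter=tenant_filter(tenant_id),
        limit=limit,
        output_fields=["text"],
    )
```

With this layout, 6,000 employees (or teams) share one collection instead of each consuming their own, so the collection limit no longer grows with the number of users.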

In the upcoming 2.5.4 release, due this week, we will raise the supported collection count to 10K.


yhmo (Collaborator) · 2025-01-20T08:02:54Z

In the Dify source code, MilvusVector is a wrapper around pymilvus (the Milvus Python SDK). MilvusVector derives from the BaseVector class and implements methods such as create(), add_texts(), and search_by_vector().
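The class relationship can be sketched in plain Python. This is a simplified, hypothetical model: the method names mirror those mentioned above, but the signatures, the `Document` type, and the in-memory `ToyVector` backend are assumptions rather than Dify's actual code. A real subclass such as MilvusVector would delegate each method to pymilvus instead of a Python list.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


class BaseVector(ABC):
    """Simplified stand-in for the interface every vector backend implements."""

    def __init__(self, collection_name: str):
        self.collection_name = collection_name

    @abstractmethod
    def create(self, texts: list[Document], embeddings: list[list[float]]) -> None:
        """Create the backing collection and insert the first batch."""

    @abstractmethod
    def add_texts(self, documents: list[Document], embeddings: list[list[float]]) -> None:
        """Append more documents to an existing collection."""

    @abstractmethod
    def search_by_vector(self, query_vector: list[float], top_k: int = 4) -> list[Document]:
        """Return the documents closest to query_vector."""


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVector(BaseVector):
    """In-memory backend; MilvusVector plays this role with a Milvus collection."""

    def __init__(self, collection_name: str):
        super().__init__(collection_name)
        self._rows: list[tuple[Document, list[float]]] = []

    def create(self, texts, embeddings):
        # Start from an empty collection, then insert the first batch.
        self._rows = []
        self.add_texts(texts, embeddings)

    def add_texts(self, documents, embeddings):
        self._rows.extend(zip(documents, embeddings))

    def search_by_vector(self, query_vector, top_k=4):
        # Brute-force cosine ranking; Milvus would use an ANN index instead.
        ranked = sorted(self._rows, key=lambda row: _cosine(query_vector, row[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]


store = ToyVector("demo")
store.create([Document("cats"), Document("dogs")], [[1.0, 0.0], [0.0, 1.0]])
store.add_texts([Document("kittens")], [[0.9, 0.1]])
print([d.page_content for d in store.search_by_vector([1.0, 0.0], top_k=2)])  # ['cats', 'kittens']
```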

Internally, MilvusVector creates a Milvus collection with a predefined schema: https://github.com/langgenius/dify/blob/bc3a570dda37cbb8539d9a5c1494e3a4317fc090/api/core/rag/datasource/vdb/milvus/milvus_vector.py#L257

The schema may contain the fields "id", "metadata", "vector", "text", and "sparse_vector". I didn't see any way to customize this schema. In addition, MilvusVector.search_by_vector() does not pass its "**kwargs" through to pymilvus.MilvusClient.search(), so users cannot supply extra search parameters to Milvus.
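The effect of not forwarding `**kwargs` can be shown with a recording stub in place of `pymilvus.MilvusClient`. Both wrapper classes and the stub are hypothetical and far simpler than MilvusVector; `search_params` is used only as an example of an extra keyword a caller might want to reach `MilvusClient.search()`, which does accept a `search_params` argument. The forwarding variant is one possible design, not a description of any Dify change.

```python
class RecordingClient:
    """Stub standing in for pymilvus.MilvusClient; it only records calls."""

    def __init__(self):
        self.calls = []

    def search(self, collection_name, data, limit=10, **kwargs):
        self.calls.append({"collection_name": collection_name, "limit": limit, **kwargs})
        return [[]]


class DroppingWrapper:
    """Reads the options it knows about and silently ignores the rest."""

    def __init__(self, client, collection_name):
        self._client = client
        self._collection_name = collection_name

    def search_by_vector(self, query_vector, **kwargs):
        top_k = kwargs.get("top_k", 4)
        return self._client.search(self._collection_name, [query_vector], limit=top_k)


class ForwardingWrapper(DroppingWrapper):
    """Consumes top_k, then passes any remaining keywords through to the client."""

    def search_by_vector(self, query_vector, **kwargs):
        top_k = kwargs.pop("top_k", 4)
        return self._client.search(self._collection_name, [query_vector], limit=top_k, **kwargs)


client = RecordingClient()
extra = {"top_k": 3, "search_params": {"params": {"ef": 128}}}
DroppingWrapper(client, "demo").search_by_vector([0.1, 0.2], **extra)
ForwardingWrapper(client, "demo").search_by_vector([0.1, 0.2], **extra)
print(client.calls[0])  # {'collection_name': 'demo', 'limit': 3}
print(client.calls[1])  # {'collection_name': 'demo', 'limit': 3, 'search_params': {'params': {'ef': 128}}}
```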

The collection name is determined by the "dataset" you pass to MilvusVectorFactory.init_vector(). I suppose you should call MilvusVectorFactory.init_vector() to get a MilvusVector object, then call MilvusVector.add_texts() to append texts to that collection.
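Because the name comes from the dataset, adding files to an existing dataset reuses its collection, while each new dataset gets a new one. The sketch below models that with a dictionary. `VectorRegistry` and `collection_name_for()` are hypothetical, and the name format is illustrative rather than the one Dify generates; in Dify the equivalent objects come from MilvusVectorFactory.init_vector().

```python
class VectorRegistry:
    """Hypothetical model: one collection per dataset, reused across uploads."""

    def __init__(self):
        self.collections: dict[str, list[str]] = {}

    @staticmethod
    def collection_name_for(dataset_id: str) -> str:
        # Illustrative naming only: derive a valid identifier from the dataset ID.
        return "dataset_" + dataset_id.replace("-", "_")

    def add_texts(self, dataset_id: str, texts: list[str]) -> str:
        name = self.collection_name_for(dataset_id)
        # setdefault creates the collection only the first time the dataset is seen.
        self.collections.setdefault(name, []).extend(texts)
        return name


registry = VectorRegistry()
registry.add_texts("kb-1", ["handbook.pdf chunk"])
registry.add_texts("kb-1", ["faq.md chunk"])  # same dataset: same collection
registry.add_texts("kb-2", ["policy.docx chunk"])  # new dataset: new collection
print(len(registry.collections))  # 2
```

Under this model, the collection count tracks the number of datasets rather than the number of uploaded files.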

I suggest opening issues in the Dify repository to request these features.
