Is there a potential issue with the Dify design when using Milvus as a vector database? #39444
-
I'm using Milvus as the vector database for Dify, and every time I upload a data file, Dify creates a new collection in the configured database. However, I'm aware that Milvus allows a maximum of 65536 collections per database. Given this, is there a risk that Dify's design will hit this collection limit over time? How should this be handled in large-scale use cases? My company has 6000 employees, and I've deployed Dify for internal chatbot development.
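A quick back-of-the-envelope check of the numbers in the question. It assumes the 65536 per-database limit quoted above and, hypothetically, that every collection is created on behalf of a single employee; the per-employee averages are illustrative inputs, not measurements.

```python
# Back-of-the-envelope capacity check for the scenario in the question.
# COLLECTION_LIMIT is the per-database figure quoted in the question;
# the per-employee averages used below are hypothetical inputs.
COLLECTION_LIMIT = 65536
EMPLOYEES = 6000


def max_collections_per_employee(limit: int, employees: int) -> int:
    """Largest whole number of collections every employee can own without exceeding the limit."""
    return limit // employees


def fits_under_limit(limit: int, employees: int, avg_collections_per_employee: float) -> bool:
    """True if the projected total number of collections stays within the limit."""
    return employees * avg_collections_per_employee <= limit


if __name__ == "__main__":
    print(max_collections_per_employee(COLLECTION_LIMIT, EMPLOYEES))  # 10
    print(fits_under_limit(COLLECTION_LIMIT, EMPLOYEES, 10))  # True (60000 collections)
    print(fits_under_limit(COLLECTION_LIMIT, EMPLOYEES, 11))  # False (66000 collections)
```

So under these assumptions, an average of more than ten collections per employee would exhaust a single database.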
-
Hi @ducanh997, for multi-tenant use cases we recommend using a partition key rather than creating multiple collections. In the upcoming 2.5.4 release, which will be out this week, we will raise the supported number of collections to 10K.
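To make the partition-key suggestion concrete, here is a toy, in-memory model of the idea in plain Python (no Milvus involved). All tenants share one collection, and each row is routed to one of a fixed number of partitions by hashing a tenant key, so adding tenants never adds collections. The class name, the `tenant_id` key, the default of 16 partitions, and the use of `zlib.crc32` as the hash are illustrative assumptions; Milvus uses its own internal hashing. In pymilvus the real feature is enabled by declaring a scalar field with `is_partition_key=True` and filtering searches on that field.

```python
import zlib
from collections import defaultdict


class ToyPartitionKeyCollection:
    """Toy model of ONE collection whose rows are spread over a fixed number of
    partitions by hashing a tenant key. Illustrative only: this is not how Milvus
    stores data, and crc32 is not the hash Milvus uses internally."""

    def __init__(self, num_partitions: int = 16) -> None:
        self.num_partitions = num_partitions
        # partition index -> list of (tenant_id, text) rows
        self.partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)

    def _partition_for(self, tenant_id: str) -> int:
        # Stable hash, so the same tenant always maps to the same partition.
        return zlib.crc32(tenant_id.encode("utf-8")) % self.num_partitions

    def insert(self, tenant_id: str, text: str) -> None:
        self.partitions[self._partition_for(tenant_id)].append((tenant_id, text))

    def query(self, tenant_id: str) -> list[str]:
        # Only the tenant's partition is scanned; the key check drops rows from
        # other tenants that happen to hash into the same partition.
        rows = self.partitions.get(self._partition_for(tenant_id), [])
        return [text for key, text in rows if key == tenant_id]


if __name__ == "__main__":
    shared = ToyPartitionKeyCollection()
    for i in range(6000):  # e.g. one knowledge base per employee, all in one collection
        shared.insert(f"employee-{i}", f"handbook chunk for employee {i}")
    print(shared.query("employee-42"))  # ['handbook chunk for employee 42']
    print(len(shared.partitions) <= shared.num_partitions)  # True
```

The trade-off is that isolation between tenants now relies on the key filter rather than on separate collections.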
-
In the Dify source code, MilvusVector is a wrapper around pymilvus (the Milvus Python SDK). MilvusVector derives from the BaseVector class and implements methods such as create(), add_texts(), and search_by_vector().
Internally, MilvusVector creates a Milvus collection with a predefined schema: https://github.com/langgenius/dify/blob/bc3a570dda37cbb8539d9a5c1494e3a4317fc090/api/core/rag/datasource/vdb/milvus/milvus_vector.py#L257
The schema can include the fields "id", "metadata", "vector", "text", and "sparse_vector". I didn't see any way to customize this schema. Also, MilvusVector.search_by_vector() doesn't forward its "**kwargs" to pymilvus.MilvusClient.search(), so users cannot pass extra search parameters to Milvus.
The collection name is determined by the "dataset" you provide when you call MilvusVectorFactory.init_vector(). I suggest using MilvusVectorFactory.init_vector() to obtain a MilvusVector object and then calling MilvusVector.add_texts() to append texts to that collection. For anything beyond that, I'd suggest opening an issue in the Dify repo to request the features you need.
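The `**kwargs` limitation can be illustrated with a small, self-contained sketch. `FakeMilvusClient` stands in for `pymilvus.MilvusClient` and only records what it receives, and `ForwardingMilvusVector` is a hypothetical variant of Dify's MilvusVector (not its actual code) whose `search_by_vector()` maps a `top_k` option onto `limit` and forwards everything else, such as a `filter` expression, to `search()`. The default of 4 results and the output field list are illustrative assumptions.

```python
from typing import Any


class FakeMilvusClient:
    """Stand-in for pymilvus.MilvusClient: records the arguments of the last search() call."""

    def __init__(self) -> None:
        self.last_search: dict[str, Any] = {}

    def search(self, collection_name: str, data: list[list[float]], **kwargs: Any) -> list[list[dict]]:
        self.last_search = {"collection_name": collection_name, "data": data, **kwargs}
        return [[]]  # one (empty) hit list per query vector


class ForwardingMilvusVector:
    """Hypothetical MilvusVector variant whose search_by_vector() forwards extra kwargs."""

    def __init__(self, client: Any, collection_name: str) -> None:
        self._client = client
        self._collection_name = collection_name

    def search_by_vector(self, query_vector: list[float], **kwargs: Any) -> list[list[dict]]:
        # Map a Dify-style top_k onto the client's limit, unless the caller set limit explicitly.
        kwargs.setdefault("limit", kwargs.pop("top_k", 4))
        kwargs.setdefault("output_fields", ["metadata", "text"])
        # Everything else (e.g. filter, search_params) reaches the client untouched.
        return self._client.search(
            collection_name=self._collection_name,
            data=[query_vector],
            **kwargs,
        )


if __name__ == "__main__":
    client = FakeMilvusClient()
    store = ForwardingMilvusVector(client, "hypothetical_collection")
    store.search_by_vector([0.1, 0.2, 0.3], top_k=5, filter='metadata["doc_id"] == "abc"')
    print(client.last_search["limit"], client.last_search["filter"])  # 5 metadata["doc_id"] == "abc"
```

Popping `top_k` before forwarding keeps wrapper-level options from leaking into the client call, while unknown keys are passed through for the client to accept or reject.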