Replies: 1 comment
Dify supports integrating local multimodal embedding models like Qwen3-VL-Embedding through its plugin-based provider system. To set this up, register your local model as a provider plugin and configure the endpoint and credentials in Dify's Model Provider settings. The backend supports multimodal embedding and retrieval for knowledge-base and RAG scenarios, but as of v1.11.2 the chat UI does not support multimodal input/output.

Your local model must expose an API endpoint (often via a plugin daemon) that Dify can call for multimodal embedding. When configuring it, enable the "vision" feature if required, and provide the correct endpoint and credentials. Dify treats local and cloud models the same way as long as the API is compatible. For multimodal input, Dify may send images as base64-encoded data.

If your model or plugin depends on external resources (such as vocab files), you may need to download these manually and update the paths for offline use. Also keep your plugin up to date to avoid compatibility issues.

In summary: implement or configure a plugin that wraps your Qwen3-VL-Embedding inference API, register it in Dify's Model Provider settings with the correct endpoint and credentials, and select it when building your knowledge base. If you run into errors, double-check endpoint URLs, credential schemas, and plugin compatibility.

To reply, just mention @dosu.
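A minimal sketch of what a multimodal embedding request to such a local endpoint might look like. The route name, payload shape, and data-URI image field are all assumptions for illustration (your plugin's actual credential schema and API contract may differ); the base64 encoding step is the part Dify-style integrations commonly rely on:

```python
import base64

# Hypothetical local endpoint -- replace with whatever URL your plugin
# daemon actually exposes.
EMBEDDING_ENDPOINT = "http://localhost:8000/v1/embeddings"

def build_multimodal_payload(
    text: str,
    image_bytes: bytes,
    model: str = "qwen3-vl-embedding",  # hypothetical model name
) -> dict:
    """Bundle text plus a base64-encoded image into one embedding request.

    The exact field names ("input", "type", "image") are illustrative;
    check your plugin's schema.
    """
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "input": [
            {"type": "text", "text": text},
            {"type": "image", "image": f"data:image/png;base64,{image_b64}"},
        ],
    }

# Example: build (but do not send) a payload from raw image bytes.
payload = build_multimodal_payload("a red square", b"\x89PNG...")
print(payload["input"][1]["image"][:22])  # data:image/png;base64,
```

You would then POST this payload to the endpoint with your HTTP client of choice; if the call fails, the first things to inspect are the endpoint URL, the credential headers, and whether the model expects images inline (as here) or by reference.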
Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
I want to use a multimodal knowledge base, but I don't want to use online APIs. How can I do this?
2. Additional context or comments
No response