fix: prefer native sparse embeddings when available#7431
rogernogueira wants to merge 10 commits into agno-agi:main from
PR Triage
A few things to address before this PR can be reviewed:
Missing issue link: Please link the issue this PR addresses using
Missing tests: This PR modifies source code but does not include any test changes. Please add or update tests to cover your changes.
Please make sure to fix the failing pipeline as well @rogernogueira.
@rogernogueira please make sure that gh pipeline is green.
2b8a2f8 to 7a4e21b
Updated the PR to keep sparse vector resolution consistent across the Qdrant paths.
Changes included:
`_get_sparse_vector()` helper
The branch has also been updated with the latest changes, and tests are passing locally.
rogernogueira
left a comment
I tried to simplify the PR.
sannya-singal
left a comment
Why did we revert the cookbook?
Please ensure a green CI pipeline @rogernogueira.
I’ve added a cookbook for this change.
Please review the changes.
Summary
This PR fixes Qdrant hybrid retrieval behavior by preferring embedder-native sparse vectors when available, while preserving the existing FastEmbed BM25 fallback.
fixes #7432
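The native-first resolution described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SparseResolver`, `SparseVector`, the `get_sparse_embedding` method name, and the `fallback_factory` argument are all hypothetical names, and the fallback is a plain callable standing in for FastEmbed BM25. The sketch assumes the fallback should only be built when the embedder cannot supply a sparse vector itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SparseVector:
    """Hypothetical stand-in for a sparse vector: parallel index/value lists."""
    indices: List[int]
    values: List[float]


class SparseResolver:
    """Prefer embedder-native sparse vectors; build the fallback lazily."""

    def __init__(
        self,
        embedder: Any,
        fallback_factory: Callable[[], Callable[[str], SparseVector]],
    ) -> None:
        self.embedder = embedder
        self._fallback_factory = fallback_factory
        # The fallback encoder is not created until it is first needed.
        self._fallback: Optional[Callable[[str], SparseVector]] = None

    def get_sparse_vector(self, text: str) -> SparseVector:
        # Assumed convention: an embedder with native sparse support exposes
        # a method named `get_sparse_embedding` (hypothetical name).
        native = getattr(self.embedder, "get_sparse_embedding", None)
        if callable(native):
            vector = native(text)
            if vector is not None:
                return vector
        # No native sparse vector: instantiate the fallback once and reuse it.
        if self._fallback is None:
            self._fallback = self._fallback_factory()
        return self._fallback(text)


class NativeEmbedder:
    """Toy embedder that provides sparse vectors natively."""

    def get_sparse_embedding(self, text: str) -> SparseVector:
        return SparseVector(indices=[3, 9], values=[0.8, 0.2])


class DenseOnlyEmbedder:
    """Toy embedder with no sparse support."""
```

With `NativeEmbedder`, the factory is never called; with `DenseOnlyEmbedder`, it is called once on the first request and the resulting encoder is reused afterwards.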
Problem
Currently, the Agno Qdrant integration always instantiates FastEmbed's `SparseTextEmbedding(BM25)` when `search_type=hybrid`, even when the configured embedder already provides sparse vectors natively.
This creates two issues:
A real example is `bge-m3`, which can produce sparse vectors natively.
Type of change
Summary of Changes
Checklist
(`./scripts/format.sh` and `./scripts/validate.sh`)
Duplicate and AI-Generated PR Check
Validation
Added unit tests covering:
`_get_sparse_vector()`
These tests validate the new native-sparse-first behavior while preserving the existing FastEmbed fallback.
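For readers who want a feel for what such tests check, here is a hedged sketch using `unittest.mock`. It does not reproduce the tests in this PR: `pick_sparse` is a hypothetical stand-in for the helper, `get_sparse_embedding` is an assumed method name, and the dict-shaped sparse vector is illustrative only.

```python
from unittest.mock import Mock


def pick_sparse(text, embedder, fallback):
    """Hypothetical stand-in for the helper under test."""
    native = getattr(embedder, "get_sparse_embedding", None)  # assumed name
    if callable(native):
        vector = native(text)
        if vector is not None:
            return vector
    return fallback(text)


def test_native_sparse_is_preferred():
    embedder = Mock()
    embedder.get_sparse_embedding.return_value = {"indices": [1], "values": [0.5]}
    fallback = Mock()

    result = pick_sparse("query", embedder, fallback)

    assert result == {"indices": [1], "values": [0.5]}
    # The BM25 stand-in must not be touched when native sparse is available.
    fallback.assert_not_called()


def test_falls_back_without_native_support():
    embedder = Mock(spec=[])  # exposes no attributes, so no native sparse method
    fallback = Mock(return_value={"indices": [0], "values": [1.0]})

    result = pick_sparse("query", embedder, fallback)

    assert result == {"indices": [0], "values": [1.0]}
    fallback.assert_called_once_with("query")
```

Both functions follow pytest naming conventions, so they would be collected automatically if placed in a test module.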