fix: filter empty documents in OpenAI rerank client by yc111233 · Pull Request #1345 · volcengine/OpenViking

yc111233 · 2026-04-09T18:23:10Z

Summary

Rerank providers like DashScope (qwen3-rerank) return HTTP 400 when any document in the batch is an empty string
This can happen when vector records have empty abstract fields (e.g. the case addressed by #1343, "fix: backfill abstract from file content in vectorize_file")
Currently the entire rerank call fails, causing fallback to raw vector scores for all results

Fix

Filter out empty/whitespace-only documents before sending to the rerank API, and map scores back to original indices. Empty documents receive a score of 0.0.

valid_indices = [i for i, d in enumerate(documents) if d and d.strip()]
if not valid_indices:
    return [0.0] * len(documents)
# ... call API with filtered_docs ...
# ... map scores back to original positions ...

This acts as a defensive safety net: rerank degrades gracefully (empty documents get a score of 0.0) instead of failing the entire batch.
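A self-contained sketch of the filter-and-remap step described above, for illustration only. It is not the code in openviking/models/rerank/openai_rerank.py: `rerank_with_empty_filter`, the `score_fn` callable and `fake_score` are hypothetical, and the sketch assumes the provider returns one score per submitted document, in input order. The all-empty branch mirrors the PR as written (all zeros).

```python
from typing import Callable, List, Sequence

# Hypothetical provider call: (query, docs) -> one score per doc, in input order.
ScoreFn = Callable[[str, List[str]], List[float]]


def rerank_with_empty_filter(
    query: str, documents: Sequence[str], score_fn: ScoreFn
) -> List[float]:
    """Send only non-empty documents to the provider; the rest score 0.0."""
    # Original positions of documents that are safe to send.
    valid_indices = [i for i, d in enumerate(documents) if d and d.strip()]
    if not valid_indices:
        # As described in the PR: nothing to send, so every score is 0.0.
        return [0.0] * len(documents)

    filtered_docs = [documents[i] for i in valid_indices]
    filtered_scores = score_fn(query, filtered_docs)

    # Put each score back at its document's original position.
    scores = [0.0] * len(documents)
    for pos, original_index in enumerate(valid_indices):
        scores[original_index] = filtered_scores[pos]
    return scores


def fake_score(query: str, docs: List[str]) -> List[float]:
    """Toy scorer that rejects empty input the way the provider is reported to."""
    if any(not d.strip() for d in docs):
        raise ValueError("HTTP 400: empty document in batch")
    return [float(len(d)) for d in docs]


print(rerank_with_empty_filter("q", ["alpha", "", "  ", "bc"], fake_score))
# → [5.0, 0.0, 0.0, 2.0]
```

Keeping `valid_indices` as the single source of truth for the mapping means the provider only ever sees the filtered list, while the caller still receives exactly one score per input document.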

Test plan

Call rerank_batch with a mix of valid and empty documents
Verify valid documents get proper rerank scores
Verify empty documents get score 0.0
Verify no HTTP 400 errors from DashScope

🤖 Generated with Claude Code

Rerank providers like DashScope (qwen3-rerank) return HTTP 400 when any document in the batch is an empty string. This can happen when vector records have empty abstract fields. Fix: filter out empty/whitespace-only documents before sending to the rerank API, and map scores back to original indices (empty documents receive a score of 0.0). This acts as a safety net so that rerank degrades gracefully instead of failing entirely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-09T18:24:11Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶
1343 - Partially compliant
Compliant requirements:
Provides defense-in-depth to prevent rerank 400 errors by filtering empty documents
Non-compliant requirements:
Does not address the root cause (backfilling abstract in vectorize_file)
Requires further human verification:
Verify that empty documents receive a score of 0.0
Verify that valid documents get proper rerank scores
Verify no HTTP 400 errors from rerank providers
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 92
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

github-actions · 2026-04-09T18:24:55Z

PR Code Suggestions ✨

No code suggestions found for the PR.

qin-ctx

Thanks for the defense-in-depth fix. I found one blocking correctness issue and one non-blocking test gap. The main concern is the all-empty batch path, which currently returns zero scores and suppresses the retriever's existing fallback to vector scores.

qin-ctx · 2026-04-10T08:06:12Z

openviking/models/rerank/openai_rerank.py

+        # empty strings with HTTP 400.
+        valid_indices = [i for i, d in enumerate(documents) if d and d.strip()]
+        if not valid_indices:
+            return [0.0] * len(documents)


[Bug] (blocking) When every input document is empty or whitespace, this returns an all-zero score list instead of signaling rerank failure. HierarchicalRetriever._rerank_scores() treats any numeric list with the expected length as a successful rerank, so this path bypasses fallback to vector scores and can filter out otherwise retrievable results at the rerank threshold. Mixed batches should keep the current behavior, but an all-empty batch should return None or otherwise trigger the existing fallback path.
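One way to address this blocking comment, sketched under assumptions: `rerank_or_none` keeps the mixed-batch behavior but returns `None` for an all-empty batch, and `scores_for_ranking` is a hypothetical stand-in for the retriever-side check described in the review. It is not the actual `HierarchicalRetriever._rerank_scores()` implementation, and `toy_score` is a made-up scorer.

```python
from typing import Callable, List, Optional, Sequence

ScoreFn = Callable[[str, List[str]], List[float]]


def rerank_or_none(
    query: str, documents: Sequence[str], score_fn: ScoreFn
) -> Optional[List[float]]:
    """Same filtering as the PR, but an all-empty batch signals failure."""
    valid_indices = [i for i, d in enumerate(documents) if d and d.strip()]
    if not valid_indices:
        # Nothing meaningful to rerank: return None so the caller falls back.
        return None
    filtered_scores = score_fn(query, [documents[i] for i in valid_indices])
    scores = [0.0] * len(documents)
    for pos, original_index in enumerate(valid_indices):
        scores[original_index] = filtered_scores[pos]
    return scores


def scores_for_ranking(
    rerank_scores: Optional[List[float]], vector_scores: List[float]
) -> List[float]:
    """Hypothetical caller check: a list of the expected length counts as a
    successful rerank; anything else falls back to the vector scores."""
    if rerank_scores is not None and len(rerank_scores) == len(vector_scores):
        return rerank_scores
    return vector_scores


def toy_score(query: str, docs: List[str]) -> List[float]:
    return [float(len(d)) for d in docs]


vector_scores = [0.81, 0.77]
print(scores_for_ranking(rerank_or_none("q", ["", "  "], toy_score), vector_scores))
# → [0.81, 0.77]
print(scores_for_ranking(rerank_or_none("q", ["abc", ""], toy_score), vector_scores))
# → [3.0, 0.0]
```

With this shape, an all-empty batch reaches the same fallback path as a failed provider call, so results keep their vector scores instead of a list of zeros that may be filtered out at the rerank threshold.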

qin-ctx · 2026-04-10T08:06:12Z

openviking/models/rerank/openai_rerank.py


+        # Filter out empty documents — rerank providers (e.g. DashScope) reject
+        # empty strings with HTTP 400.
+        valid_indices = [i for i, d in enumerate(documents) if d and d.strip()]


[Suggestion] (non-blocking) Please add regression tests for the new filtering behavior. The current suite does not cover either a mixed batch like ['doc', '', ' '] with index remapping or the all-empty batch path, so the rerank/fallback semantics introduced here are not locked down.
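A sketch of the requested regression tests as plain pytest-style functions. Everything here is hypothetical scaffolding: `RecordingProvider` fakes the provider (it records each batch it receives and rejects empty documents), and `rerank_batch_sketch` is a local stand-in that assumes the reviewer's requested semantics (`None` for an all-empty batch). Real tests would exercise the actual client and mock its HTTP call instead.

```python
from typing import List, Optional, Sequence


class RecordingProvider:
    """Hypothetical fake provider: records every batch it receives and rejects
    empty or whitespace-only documents, mimicking the reported HTTP 400."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def score(self, query: str, docs: List[str]) -> List[float]:
        self.calls.append(list(docs))
        if any(not d.strip() for d in docs):
            raise RuntimeError("HTTP 400: empty document in batch")
        # Distinct, exactly representable scores: n, n-1, ..., 1.
        return [float(len(docs) - i) for i in range(len(docs))]


def rerank_batch_sketch(
    query: str, documents: Sequence[str], provider: RecordingProvider
) -> Optional[List[float]]:
    """Local stand-in for the client method; None means "fall back"."""
    valid_indices = [i for i, d in enumerate(documents) if d and d.strip()]
    if not valid_indices:
        return None
    filtered = provider.score(query, [documents[i] for i in valid_indices])
    scores = [0.0] * len(documents)
    for pos, original_index in enumerate(valid_indices):
        scores[original_index] = filtered[pos]
    return scores


def test_mixed_batch_sends_only_non_empty_docs() -> None:
    provider = RecordingProvider()
    scores = rerank_batch_sketch("q", ["doc", "", " "], provider)
    assert provider.calls == [["doc"]]
    assert scores == [1.0, 0.0, 0.0]


def test_scores_map_back_to_original_positions() -> None:
    provider = RecordingProvider()
    scores = rerank_batch_sketch("q", ["", "a", " ", "b"], provider)
    assert provider.calls == [["a", "b"]]
    assert scores == [0.0, 2.0, 0.0, 1.0]


def test_all_empty_batch_skips_provider_and_signals_fallback() -> None:
    provider = RecordingProvider()
    assert rerank_batch_sketch("q", ["", " ", "\n"], provider) is None
    assert provider.calls == []
```

The second test places empty documents before and between valid ones, which is the case most likely to expose an off-by-one in the index remapping.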

github-project-automation bot added this to OpenViking project Apr 9, 2026

github-project-automation bot moved this to Backlog in OpenViking project Apr 9, 2026

github-actions bot added the Review effort 2/5 label Apr 9, 2026

qin-ctx requested changes Apr 10, 2026

yc111233 wants to merge 1 commit into volcengine:main from yc111233:fix/rerank-filter-empty-documents
