[DRAFT][llm][kv] KV-aware scoring via KvRouter.select_worker #64108

Draft
jeffreywang88 wants to merge 1 commit into tok-plus-kv-connector from kv-scoring

Conversation

@jeffreywang88

Contributor

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang88 changed the title from "[serve][llm] KV-aware scoring: route via KvRouter.select_worker" to "[DRAFT][llm][kv] KV-aware scoring via KvRouter.select_worker" on Jun 15, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements the core KV-aware routing logic by updating KVAwareActor to delegate worker selection, implementing state initialization and replica selection in KVAwareRouter, and adding unit tests for worker selection. Feedback highlights a critical issue in choose_replicas where a lack of error handling and fallback mechanisms for empty candidate lists, missing token IDs, or invalid worker IDs could cause routing crashes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +97 to +112
        # TODO: fall back to default routing when there are no token ids to score
        # on (``pending_request`` is None, or a body the tokenizer skipped). This
        # branch implements the KV-scoring happy path only.
        token_ids = pending_request.kwargs[REQUEST_TOKEN_IDS_KWARG]

        worker_id_to_replica = {
            get_worker_id(replica.replica_id.unique_id): replica
            for replica in candidate_replicas
        }
        selection = await self._kv_router_actor.select_worker.remote(
            pending_request.metadata.request_id,
            token_ids,
            list(worker_id_to_replica),
        )
        chosen = worker_id_to_replica[selection["worker_id"]]
        return [[chosen]]

high

The current implementation of choose_replicas does not handle a pending_request of None or a pending_request.kwargs that lacks REQUEST_TOKEN_IDS_KWARG. These cases raise an AttributeError and a KeyError respectively, and either one crashes the routing task. Additionally, if candidate_replicas is empty, or if the selected worker_id is not a key in worker_id_to_replica, the final lookup can raise a KeyError.

We should implement a robust fallback mechanism:

  1. If candidate_replicas is empty, return [].
  2. If pending_request is None or lacks token IDs, fall back to default routing by returning [candidate_replicas].
  3. If the selected worker_id is missing or invalid, fall back to the first candidate replica.
        if not candidate_replicas:
            return []

        if (
            pending_request is None
            or pending_request.kwargs is None
            or REQUEST_TOKEN_IDS_KWARG not in pending_request.kwargs
        ):
            return [candidate_replicas]

        token_ids = pending_request.kwargs[REQUEST_TOKEN_IDS_KWARG]

        worker_id_to_replica = {
            get_worker_id(replica.replica_id.unique_id): replica
            for replica in candidate_replicas
        }
        selection = await self._kv_router_actor.select_worker.remote(
            pending_request.metadata.request_id,
            token_ids,
            list(worker_id_to_replica),
        )
        chosen_worker_id = selection.get("worker_id")
        if chosen_worker_id in worker_id_to_replica:
            chosen = worker_id_to_replica[chosen_worker_id]
        else:
            chosen = candidate_replicas[0]
        return [[chosen]]
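
The fallback branches suggested above can be exercised without a Ray cluster. What follows is a minimal sketch under stated assumptions, not the PR's implementation: FakeReplica, FakeRequest, and FakeKvRouter are hypothetical stand-ins, the value of REQUEST_TOKEN_IDS_KWARG is assumed, and the actor call is a plain coroutine rather than a `.remote()` call on an actor handle. The real code also derives worker IDs with get_worker_id and reads the request ID from pending_request.metadata.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Assumed value for illustration; the real constant is defined in the PR.
REQUEST_TOKEN_IDS_KWARG = "token_ids"


@dataclass
class FakeReplica:
    """Hypothetical replica that carries its worker ID directly."""
    worker_id: int


@dataclass
class FakeRequest:
    """Hypothetical pending request with only the fields used here."""
    request_id: str
    kwargs: Optional[Dict[str, Any]] = None


class FakeKvRouter:
    """Hypothetical stand-in for the KV router actor; returns a canned answer."""

    def __init__(self, selection: Dict[str, Any]):
        self._selection = selection

    async def select_worker(self, request_id, token_ids, worker_ids):
        return self._selection


async def choose_replicas(
    router: FakeKvRouter,
    candidate_replicas: List[FakeReplica],
    pending_request: Optional[FakeRequest] = None,
) -> List[List[FakeReplica]]:
    # 1. Nothing to route to.
    if not candidate_replicas:
        return []

    # 2. No request, no kwargs, or no token IDs: hand every candidate back
    #    so that default routing can decide.
    kwargs = getattr(pending_request, "kwargs", None) or {}
    token_ids = kwargs.get(REQUEST_TOKEN_IDS_KWARG)
    if token_ids is None:
        return [candidate_replicas]

    by_worker = {r.worker_id: r for r in candidate_replicas}
    selection = await router.select_worker(
        pending_request.request_id, token_ids, list(by_worker)
    )

    # 3. Missing or unknown worker ID: fall back to the first candidate.
    chosen = by_worker.get(selection.get("worker_id"), candidate_replicas[0])
    return [[chosen]]


if __name__ == "__main__":
    replicas = [FakeReplica(1), FakeReplica(2)]
    request = FakeRequest("req-1", {REQUEST_TOKEN_IDS_KWARG: [11, 12, 13]})
    picked = asyncio.run(
        choose_replicas(FakeKvRouter({"worker_id": 2}), replicas, request)
    )
    print(picked)
```

One design question this sketch leaves open is whether falling back to the first candidate in step 3 is preferable to returning the full candidate list, as step 2 does; the latter would let default routing spread load instead of always favoring one replica.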
