[DRAFT][llm][kv] KV-aware scoring via KvRouter.select_worker #64108
jeffreywang88 wants to merge 1 commit into
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Code Review
This pull request implements the core KV-aware routing logic by updating KVAwareActor to delegate worker selection, implementing state initialization and replica selection in KVAwareRouter, and adding unit tests for worker selection. Feedback highlights a critical issue in choose_replicas where a lack of error handling and fallback mechanisms for empty candidate lists, missing token IDs, or invalid worker IDs could cause routing crashes.
# TODO: fall back to default routing when there are no token ids to score
# on (``pending_request`` is None, or a body the tokenizer skipped). This
# branch implements the KV-scoring happy path only.
token_ids = pending_request.kwargs[REQUEST_TOKEN_IDS_KWARG]

worker_id_to_replica = {
    get_worker_id(replica.replica_id.unique_id): replica
    for replica in candidate_replicas
}
selection = await self._kv_router_actor.select_worker.remote(
    pending_request.metadata.request_id,
    token_ids,
    list(worker_id_to_replica),
)
chosen = worker_id_to_replica[selection["worker_id"]]
return [[chosen]]
The current implementation of choose_replicas does not handle a pending_request of None or a pending_request.kwargs that lacks REQUEST_TOKEN_IDS_KWARG. These cases raise an AttributeError and a KeyError, respectively, and crash the routing task. Additionally, if candidate_replicas is empty or the selected worker_id is not a key of worker_id_to_replica, the final lookup worker_id_to_replica[selection["worker_id"]] can raise a KeyError.
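The first two failure modes can be reproduced in isolation. This is a minimal sketch, not the PR's code: FakeRequest, read_token_ids, failure_mode, and the "token_ids" value assigned to REQUEST_TOKEN_IDS_KWARG are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical value; the real constant's value is not shown in this PR.
REQUEST_TOKEN_IDS_KWARG = "token_ids"


@dataclass
class FakeRequest:
    """Stand-in for the pending request; only ``kwargs`` is modeled."""

    kwargs: Dict[str, Any] = field(default_factory=dict)


def read_token_ids(pending_request: Optional[FakeRequest]) -> List[int]:
    # Mirrors the unguarded access in the happy-path branch.
    return pending_request.kwargs[REQUEST_TOKEN_IDS_KWARG]


def failure_mode(pending_request: Optional[FakeRequest]) -> Optional[str]:
    """Return the name of the exception the unguarded access raises, if any."""
    try:
        read_token_ids(pending_request)
    except (AttributeError, KeyError) as exc:
        return type(exc).__name__
    return None
```

A None request fails on the attribute access, while a request without token IDs fails on the dictionary lookup.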
We should implement a robust fallback mechanism:
- If candidate_replicas is empty, return [].
- If pending_request is None or lacks token IDs, fall back to default routing by returning [candidate_replicas].
- If the selected worker_id is missing or invalid, fall back to the first candidate replica.
# Nothing to route to.
if not candidate_replicas:
    return []

# No token ids to score on: fall back to default routing.
if (
    pending_request is None
    or pending_request.kwargs is None
    or REQUEST_TOKEN_IDS_KWARG not in pending_request.kwargs
):
    return [candidate_replicas]

token_ids = pending_request.kwargs[REQUEST_TOKEN_IDS_KWARG]
worker_id_to_replica = {
    get_worker_id(replica.replica_id.unique_id): replica
    for replica in candidate_replicas
}
selection = await self._kv_router_actor.select_worker.remote(
    pending_request.metadata.request_id,
    token_ids,
    list(worker_id_to_replica),
)

# Guard against a missing or unknown worker id in the router's answer.
chosen_worker_id = selection.get("worker_id")
if chosen_worker_id in worker_id_to_replica:
    chosen = worker_id_to_replica[chosen_worker_id]
else:
    chosen = candidate_replicas[0]
return [[chosen]]
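The three fallbacks can be exercised without Ray. The following is a rough, self-contained sketch under stated assumptions: FakeReplica, FakeRequest, and FakeKvRouter are hypothetical stand-ins, the actor call is modeled as a plain coroutine rather than a .remote() call, replicas carry their worker id directly instead of going through get_worker_id, and "token_ids" is an assumed value for REQUEST_TOKEN_IDS_KWARG.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical value; the real constant's value is not shown in this PR.
REQUEST_TOKEN_IDS_KWARG = "token_ids"


@dataclass
class FakeReplica:
    """Stand-in replica that exposes its worker id directly."""

    worker_id: int


@dataclass
class FakeRequest:
    """Stand-in pending request with only the fields the sketch needs."""

    request_id: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


class FakeKvRouter:
    """Stand-in for the KV router actor; always answers with a fixed id."""

    def __init__(self, worker_id: int) -> None:
        self.worker_id = worker_id

    async def select_worker(
        self, request_id: str, token_ids: List[int], worker_ids: List[int]
    ) -> Dict[str, int]:
        return {"worker_id": self.worker_id}


async def choose_replicas(
    router: FakeKvRouter,
    candidate_replicas: List[FakeReplica],
    pending_request: Optional[FakeRequest] = None,
) -> List[List[FakeReplica]]:
    # Fallback 1: nothing to route to.
    if not candidate_replicas:
        return []
    # Fallback 2: no token ids to score on, so defer to default routing.
    kwargs = (pending_request.kwargs or {}) if pending_request else {}
    if REQUEST_TOKEN_IDS_KWARG not in kwargs:
        return [candidate_replicas]
    by_worker = {r.worker_id: r for r in candidate_replicas}
    selection = await router.select_worker(
        pending_request.request_id,
        kwargs[REQUEST_TOKEN_IDS_KWARG],
        list(by_worker),
    )
    # Fallback 3: unknown or missing worker id, so use the first candidate.
    chosen = by_worker.get(selection.get("worker_id"), candidate_replicas[0])
    return [[chosen]]
```

A unit test along these lines would cover each branch by varying the router's fixed answer and the request passed in, for example a router that returns an id absent from the candidate list.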