Add aliveness to rollout engine state by fzyzcjy · Pull Request #941 · radixark/miles

fzyzcjy · 2026-04-07T10:02:04Z

No description provided.

This reverts commit 1535dd3.

gemini-code-assist

Code Review

This pull request introduces more granular state management for rollout engines by distinguishing between 'allocated-uninitialized' and 'alive' states. It updates the ServerEngine class with new state transitions and modifies ServerGroup to track specific engine indices during startup and recovery. Review feedback indicates that the mark_alive call in rollout_server.py is premature because it occurs before the asynchronous initialization of Ray actors is complete. Additionally, a suggestion was made to update the start_engines docstring to accurately reflect its new return type and behavior.
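The distinction between the two states can be pictured as a small state machine. The following is a minimal Python sketch, not the repository's code: EngineState, ServerEngineSketch, ServerGroupSketch and allocate are hypothetical names. It assumes only three things: an engine starts out allocated-uninitialized, mark_alive is the single transition to alive, and the group addresses engines by index so that it marks only the newly started ones.

```python
from enum import Enum, auto


class EngineState(Enum):
    # Hypothetical names for the two states described in the review summary.
    ALLOCATED_UNINITIALIZED = auto()  # actor and ports exist; init() not yet confirmed
    ALIVE = auto()  # init() has completed and the engine may serve requests


class ServerEngineSketch:
    """Illustrative per-engine state holder, not the repository's ServerEngine."""

    def __init__(self) -> None:
        self.state = EngineState.ALLOCATED_UNINITIALIZED

    def mark_alive(self) -> None:
        # The only legal transition in this sketch: allocated-uninitialized -> alive.
        if self.state is not EngineState.ALLOCATED_UNINITIALIZED:
            raise RuntimeError(f"cannot mark alive from state {self.state.name}")
        self.state = EngineState.ALIVE


class ServerGroupSketch:
    """Illustrative group that addresses its engines by index."""

    def __init__(self) -> None:
        self.engines: list[ServerEngineSketch] = []

    def allocate(self, count: int) -> list[int]:
        # Append new engines and return their indices rather than just a count.
        start = len(self.engines)
        self.engines.extend(ServerEngineSketch() for _ in range(count))
        return list(range(start, start + count))

    def mark_alive(self, engine_indices: list[int]) -> None:
        # Touch only the engines the caller names, leaving the others unchanged.
        for i in engine_indices:
            self.engines[i].mark_alive()
```

Returning indices instead of a count lets the caller mark exactly the engines it just started. This matters when a recovery pass adds engines to a group that already has alive members.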

gemini-code-assist · 2026-04-07T10:05:21Z

miles/ray/rollout/rollout_server.py

            )
-            handles, _ = group.start_engines(port_cursors)
+            handles, new_engine_indices = group.start_engines(port_cursors)
+            group.mark_alive(engine_indices=new_engine_indices)


The mark_alive call here is premature. group.start_engines returns Ray ObjectRefs for the asynchronous init calls, which are only resolved later at line 86 via ray.get(all_init_handles). Marking the engines as alive before they have finished initializing is inconsistent with the state's intended meaning. Additionally, per repository rules, when waiting for a server process to start, checking process liveness is not sufficient; the check must also verify that the server is actively listening for connections on its designated port.

References

When waiting for a server process to start, checking process liveness (e.g., is_alive()) is not sufficient. The check must also verify that the server is actively listening for connections on its designated port, for instance by attempting a socket connection or making an HTTP request.
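The review asks for a specific ordering, which the following minimal sketch illustrates. It makes three assumptions. First, concurrent.futures.Future objects stand in for Ray ObjectRefs; in the real code the wait would be ray.get(all_init_handles). Second, each engine exposes a known host and port. Third, the helper names finish_startup, is_listening and wait_until_listening are hypothetical. A TCP connect probe is one way to satisfy the "actively listening" rule; an HTTP health request would be another.

```python
import socket
import time
from concurrent.futures import Future
from typing import Callable, Iterable


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_until_listening(host: str, port: int, deadline_s: float = 30.0, poll_s: float = 0.1) -> bool:
    """Poll until the port accepts connections or the deadline expires."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if is_listening(host, port):
            return True
        time.sleep(poll_s)
    return False


def finish_startup(
    init_futures: Iterable[Future],
    endpoints: Iterable[tuple[str, int]],
    mark_alive: Callable[[], None],
) -> None:
    # 1. Wait for every asynchronous init call to finish; result() re-raises
    #    any exception from init, so a failed engine is never marked alive.
    for fut in init_futures:
        fut.result()
    # 2. A live process or actor is not enough: each server must accept connections.
    for host, port in endpoints:
        if not wait_until_listening(host, port):
            raise TimeoutError(f"{host}:{port} is not accepting connections")
    # 3. Only after both checks pass, transition the engines to alive.
    mark_alive()
```

In rollout_server.py terms, this means calling group.mark_alive(engine_indices=new_engine_indices) after ray.get(all_init_handles) has returned, not directly after start_engines.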

gemini-code-assist · 2026-04-07T10:05:21Z

miles/ray/rollout/server_group.py

+    def start_engines(self, port_cursors: PortCursors) -> tuple[list, list[int]]:
        """Create Ray actors, allocate ports, and fire ``engine.init()`` without waiting.

        Returns ``(init_handles, curr_num_new_engines)`` where *init_handles* is a list
        of Ray ObjectRefs and *port_cursors* maps node index -> next free port.
        """


The docstring for start_engines is outdated. It still refers to curr_num_new_engines (an integer) as the second return value, but the method now returns new_engine_indices (a list of integers). Additionally, it incorrectly implies that port_cursors is part of the return value, whereas it is modified in-place.

Suggested change

-    def start_engines(self, port_cursors: PortCursors) -> tuple[list, list[int]]:
-        """Create Ray actors, allocate ports, and fire ``engine.init()`` without waiting.
-
-        Returns ``(init_handles, curr_num_new_engines)`` where *init_handles* is a list
-        of Ray ObjectRefs and *port_cursors* maps node index -> next free port.
-        """
+    def start_engines(self, port_cursors: PortCursors) -> tuple[list, list[int]]:
+        """Create Ray actors, allocate ports, and fire engine.init() without waiting.
+
+        Returns (init_handles, new_engine_indices) where *init_handles* is a list
+        of Ray ObjectRefs and *new_engine_indices* is a list of indices of the new engines.
+        """

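The corrected contract can be shown with a hypothetical, Ray-free sketch. The function appends one engine per entry in new_engine_nodes. It advances port_cursors in place, so the caller sees the update even though the dict is not returned. It returns the init handles together with the indices of the new engines. Tuples of (node, port) stand in for Ray ObjectRefs, and the function name and parameters are illustrative only.

```python
def start_engines_sketch(
    num_existing: int,
    new_engine_nodes: list[int],
    port_cursors: dict[int, int],  # node index -> next free port (mutated in place)
) -> tuple[list[tuple[int, int]], list[int]]:
    init_handles: list[tuple[int, int]] = []
    new_engine_indices: list[int] = []
    for offset, node in enumerate(new_engine_nodes):
        port = port_cursors[node]
        port_cursors[node] = port + 1  # advance in place; not part of the return value
        init_handles.append((node, port))  # stand-in for a Ray ObjectRef from init
        new_engine_indices.append(num_existing + offset)
    return init_handles, new_engine_indices
```

For example, with two existing engines and cursors {0: 30000, 1: 30000}, starting engines on nodes [0, 1, 0] yields indices [2, 3, 4] and leaves the cursors at {0: 30002, 1: 30001}.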
fzyzcjy added 5 commits April 7, 2026 17:56

Revert "simp" · 70ae54b
This reverts commit 1535dd3.

more · fd3063b

more · f7b43fd

more · a6d5989

fmt · e35fe34

fzyzcjy requested review from maocheng23 and yueming-yuan as code owners April 7, 2026 10:02

fzyzcjy added 2 commits April 7, 2026 18:03

more · a3f4975

more · e01c982

gemini-code-assist bot reviewed Apr 7, 2026


fzyzcjy added 4 commits April 7, 2026 18:05

more · 4791ea7

more · 6d47973

more · 88364c7

more · 508eba9

fzyzcjy wants to merge 11 commits into rollout_ft/23 from rollout_ft/24
