Closed
Labels
bug, community-backlog, core, stability, triage
Description
What happened + What you expected to happen
See the video for a demo of the bug: https://drive.google.com/file/d/1jmqsiEF4-pPCTwZfDkNLHbdXZVLh12Hg/view?usp=sharing
Basically, I got the following output when running a hello-world Ray example with `uv run`:
ray-exp git:(master) ✗ uv run main.py
2025-09-16 10:56:31,462 INFO worker.py:1951 -- Started a local Ray instance.
2025-09-16 10:56:31,481 INFO packaging.py:588 -- Creating a file package for local module '/home/costa/Documents/go/github.com/vwxyzjn/ray-exp'.
2025-09-16 10:56:31,498 INFO packaging.py:380 -- Pushing file package 'gcs://_ray_pkg_797cccfb8efdba24.zip' (0.09MiB) to Ray cluster...
2025-09-16 10:56:31,499 INFO packaging.py:393 -- Successfully pushed file package 'gcs://_ray_pkg_797cccfb8efdba24.zip'.
(raylet) warning: `VIRTUAL_ENV=/home/costa/Documents/go/github.com/vwxyzjn/ray-exp/.venv` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
(raylet) Using CPython 3.13.2
(raylet) Creating virtual environment at: .venv
(raylet) Installed 17 packages in 61ms
(raylet) Traceback (most recent call last):
File "/tmp/ray/session_2025-09-16_10-56-30_809544_3266270/runtime_resources/working_dir_files/_ray_pkg_797cccfb8efdba24/.venv/lib/python3.13/site-packages/ray/_private/worker.py", line 2472, in connect
node.check_version_info()
~~~~~~~~~~~~~~~~~~~~~~~^^
File "/tmp/ray/session_2025-09-16_10-56-30_809544_3266270/runtime_resources/working_dir_files/_ray_pkg_797cccfb8efdba24/.venv/lib/python3.13/site-packages/ray/_private/node.py", line 437, in check_version_info
ray._private.utils.check_version_info(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
cluster_metadata, f"node {node_ip_address}"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/tmp/ray/session_2025-09-16_10-56-30_809544_3266270/runtime_resources/working_dir_files/_ray_pkg_797cccfb8efdba24/.venv/lib/python3.13/site-packages/ray/_private/utils.py", line 1286, in check_version_info
raise RuntimeError(error_message)
RuntimeError: Version mismatch: The cluster was started with:
Ray: 2.49.1
Python: 3.12.5
This process on node 192.168.1.205 was started with:
Ray: 2.49.1
Python: 3.13.2
(MyActor pid=3267944) *** SIGSEGV received at time=1758034593 on cpu 13 ***
(MyActor pid=3267944) PC: @ 0x7f27ef7a7900 (unknown) _PyEval_EvalFrameDefault
(MyActor pid=3267944) @ 0x7f27ef042520 (unknown) (unknown)
(MyActor pid=3267944) [2025-09-16 10:56:33,100 E 3267944 3267944] logging.cc:474: *** SIGSEGV received at time=1758034593 on cpu 13 ***
(MyActor pid=3267944) [2025-09-16 10:56:33,100 E 3267944 3267944] logging.cc:474: PC: @ 0x7f27ef7a7900 (unknown) _PyEval_EvalFrameDefault
(MyActor pid=3267944) [2025-09-16 10:56:33,100 E 3267944 3267944] logging.cc:474: @ 0x7f27ef042520 (unknown) (unknown)
(MyActor pid=3267944) Fatal Python error: Segmentation fault
(MyActor pid=3267944)
(MyActor pid=3267944) Stack (most recent call first):
(MyActor pid=3267944) File "/home/costa/Documents/go/github.com/vwxyzjn/ray-exp/.venv/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py", line 449 in _resume_span
(MyActor pid=3267944) File "/tmp/ray/session_2025-09-16_10-56-30_809544_3266270/runtime_resources/working_dir_files/_ray_pkg_797cccfb8efdba24/.venv/lib/python3.13/site-packages/ray/_private/function_manager.py", line 689 in actor_method_executor
(MyActor pid=3267944) File "/tmp/ray/session_2025-09-16_10-56-30_809544_3266270/runtime_resources/working_dir_files/_ray_pkg_797cccfb8efdba24/.venv/lib/python3.13/site-packages/ray/_private/worker.py", line 984 in main_loop
(MyActor pid=3267944) File "/home/costa/Documents/go/github.com/vwxyzjn/ray-exp/.venv/lib/python3.12/site-packages/ray/_private/workers/default_worker.py", line 323 in <module>
(MyActor pid=3267944)
(MyActor pid=3267944) Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, yaml._yaml, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, ray._raylet (total: 9)
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff7c7d31f5e3e031a26d1fad7101000000 Worker ID: 4f9bb9147fd46bf9efcc12f476dfa2c8ee2718bfe5f3bacfc59f15f9 Node ID: 55318fd37f6d980afe78ceb071a747fd2913f340648020759061a266 Worker IP address: 192.168.1.205 Worker port: 35017 Worker PID: 3267944 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Traceback (most recent call last):
File "/home/costa/Documents/go/github.com/vwxyzjn/ray-exp/main.py", line 13, in <module>
ray.get(actor.f.remote())
File "/home/costa/Documents/go/github.com/vwxyzjn/ray-exp/.venv/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/costa/Documents/go/github.com/vwxyzjn/ray-exp/.venv/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/costa/Documents/go/github.com/vwxyzjn/ray-exp/.venv/lib/python3.12/site-packages/ray/_private/worker.py", line 2882, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/costa/Documents/go/github.com/vwxyzjn/ray-exp/.venv/lib/python3.12/site-packages/ray/_private/worker.py", line 970, in get_objects
raise value
ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
class_name: MyActor
actor_id: 7c7d31f5e3e031a26d1fad7101000000
pid: 3267944
namespace: 91d7391d-facd-4529-8d16-0307241f2020
ip: 192.168.1.205
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
The actor never ran - it was cancelled before it started running.
Versions / Dependencies
➜ ray-exp git:(master) ✗ uv pip freeze
attrs==25.3.0
certifi==2025.8.3
charset-normalizer==3.4.3
click==8.2.1
filelock==3.18.0
idna==3.10
jsonschema==4.25.1
jsonschema-specifications==2025.9.1
msgpack==1.1.1
packaging==25.0
protobuf==6.32.1
pyyaml==6.0.2
ray==2.49.1
referencing==0.36.2
requests==2.32.5
rpds-py==0.27.1
typing-extensions==4.14.1
urllib3==2.5.0
Reproduction script
import ray

ray.init()

class MyActor:
    def __init__(self):
        pass

    def f(self):
        print(f"Hello, world! {ray.get_gpu_ids()}")

actor = ray.remote(MyActor).options(num_gpus=1).remote()
ray.get(actor.f.remote())
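For anyone triaging this, here is a small diagnostic sketch (not part of the original reproduction) that prints which interpreter and virtual environment the driver process is using, so it can be compared against the `Python: 3.12.5` / `Python: 3.13.2` lines in the RuntimeError in the log above. The helper names `interpreter_summary` and `same_minor` are hypothetical, and comparing only major.minor is an assumption made for illustration; this report does not show exactly which fields Ray's own version check compares.

```python
import os
import platform
import sys


def interpreter_summary() -> dict:
    """Describe the interpreter running this process (hypothetical helper)."""
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "prefix": sys.prefix,
        # None when no virtual environment has been activated in the shell.
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
    }


def same_minor(a: str, b: str) -> bool:
    """Return True if two 'X.Y.Z' version strings share major and minor.

    Assumption: major.minor is the granularity of interest here; Ray's
    real check may be stricter.
    """
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
    # Versions copied from the RuntimeError in the log above.
    print("minor versions match:", same_minor("3.12.5", "3.13.2"))
```

Running this with `uv run` from the project directory shows the driver-side interpreter; the worker-side interpreter is the one under the `_ray_pkg_…/.venv` path in the traceback.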
Issue Severity
None