RayOnSpark would fail when running k8s from remote #3605

Open
@hkvision

Each k8s pod has a virtual IP. When Spark collects the IP of each executor, it gets ['172.30.22.6', '172.30.1.6', '172.30.1.7', '172.30.45.6', '172.30.1.7'],
and when launching Ray on the driver, the command is:

Executing command: ray start --address 172.30.22.7:10937 --redis-password 123456 --num-cpus 0 --node-ip-address 172.16.0.188

Since the remote node running the driver sits outside the k8s cluster, it cannot reach this cluster-internal virtual IP, so the `ray start` process gets stuck.
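
One way to confirm this diagnosis, and to fail fast instead of hanging, might be to test plain TCP reachability of the head address from the driver machine before invoking `ray start`. The sketch below is only an illustration: `parse_address` and `is_reachable` are hypothetical helper names, not part of RayOnSpark or Ray, and it assumes that a TCP connect to the `--address` host and port is a fair proxy for what `ray start` needs.

```python
import socket
from typing import Tuple


def parse_address(address: str) -> Tuple[str, int]:
    """Split a 'host:port' string (the form passed to `ray start --address`)."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the remote driver node, `is_reachable(*parse_address("172.30.22.7:10937"))` would be expected to return `False` if the pod IP is indeed not routable from outside the cluster, which would match the hang described above.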

Any ideas, @Le-Zheng @glorysdj?
cc @jason-dai
