Description
Each k8s pod has a virtual IP. When Spark collects the IP of each executor, it gets ['172.30.22.6', '172.30.1.6', '172.30.1.7', '172.30.45.6', '172.30.1.7'].
When launching Ray on the driver, the command is:
Executing command: ray start --address 172.30.22.7:10937 --redis-password 123456 --num-cpus 0 --node-ip-address 172.16.0.188
Since a remote node cannot resolve this virtual IP, which is only valid within the k8s cluster, the Ray launch process gets stuck.
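To confirm that this is a routing problem rather than something inside Ray, a quick probe from the driver node might help. The snippet below is only a sketch: `can_reach` is a hypothetical helper name, not part of Ray or Spark, and it only tests whether a plain TCP connection to the head address can be opened within a timeout.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper; it is not part of Ray or Spark.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unroutable addresses, and timeouts.
        return False


# Example usage on the driver node, with the address from the log above:
#   can_reach("172.30.22.7", 10937)
# A False result would suggest the pod IP is not routable from this node.
```

If the probe fails from the driver but succeeds from inside a pod, that would point to the pod network not being routable from outside the cluster.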
Any ideas? @Le-Zheng @glorysdj
cc @jason-dai