Follow-up issue for: #3549
Using this PR to test: #3603
Add a local Python file on the driver to `extra_python_lib` (i.e. `--py-files` in the submit command):
- If only Spark-related code is run (e.g. `rdd.map`), the file can be found and imported as expected.
- If Ray-related code is run, a `ModuleNotFoundError` is raised (see the reproduction sketch after this list).
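A minimal reproduction sketch of the two cases above. The `init_orca_context` import path and its `extra_python_lib` argument are assumptions based on the PR under test, and `my_module.py` is a hypothetical local file:

```python
import ray
from zoo.orca import init_orca_context  # import path may differ by version

# my_module.py lives on the driver and is shipped to the cluster via
# extra_python_lib (equivalent to --py-files in the submit command).
sc = init_orca_context(cluster_mode="k8s-client",
                       extra_python_lib="my_module.py",
                       init_ray_on_spark=True)

# Case 1: Spark-only code -- the shipped file is found and imported.
print(sc.range(0, 2).map(lambda _: __import__("my_module").__name__).collect())

# Case 2: Ray code -- the same import raises ModuleNotFoundError.
@ray.remote
def use_module():
    import my_module
    return my_module.__name__

print(ray.get(use_module.remote()))
```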
The `sys.path` returned by each Spark task:

```
['/opt/spark/work-dir', '.', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.9-src.zip', '/opt/spark/jars/spark-core_2.12-3.1.2.jar', '/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip', '/opt/spark/python/lib/py4j-*.zip', '/opt/models/research/slim', '/usr/local/envs/pytf1/lib/python37.zip', '/usr/local/envs/pytf1/lib/python3.7', '/usr/local/envs/pytf1/lib/python3.7/lib-dynload', '/usr/local/envs/pytf1/lib/python3.7/site-packages']
```

The `sys.path` returned by each Ray task:

```
['/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/thirdparty_files', '/opt/spark/work-dir/kai', '/opt/spark/work-dir/kai', '/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/pickle5_files', '/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/workers', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.9-src.zip', '/opt/spark/jars/spark-core_2.12-3.1.2.jar', '/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip', '/opt/spark/python/lib/py4j-*.zip', '/opt/models/research/slim', '/usr/local/envs/pytf1/lib/python37.zip', '/usr/local/envs/pytf1/lib/python3.7', '/usr/local/envs/pytf1/lib/python3.7/lib-dynload', '/usr/local/envs/pytf1/lib/python3.7/site-packages']
```
The main difference is that `.` is in the Spark python path but not in the Ray python path.
Manually adding `.` to `sys.path` in a Ray actor could work around the problem (see the sketch below), but we may need a way to better support this in our code.
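As a sketch, the workaround inside a Ray actor looks like the following (`my_module` is the hypothetical file shipped via `--py-files`):

```python
import sys
import ray

@ray.remote
class Worker:
    def __init__(self):
        # "." is missing from sys.path on Ray workers; prepend it so the
        # files shipped via --py-files (placed under the worker's working
        # directory) become importable.
        if "." not in sys.path:
            sys.path.insert(0, ".")
        import my_module  # now resolvable inside the Ray actor
        self.mod = my_module

    def run(self):
        return self.mod.__name__
```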
It seems that adding the file to a persistentVolume won't solve the issue either; at a minimum, we would need to add the volume path to `PYTHONPATH` in the Docker image.
cc @jason-dai