Follow up issue for extra_python_lib on k8s #3604

Open
@hkvision


Follow-up issue for: #3549

Using this PR to test: #3603
Add a local Python file on the driver to extra_python_lib (i.e. --py-files in the submit command):

  • If only Spark-related code is run (e.g. rdd.map), the file can be found and imported as expected.
  • If Ray-related code is run, a ModuleNotFoundError is raised.

The sys.path returned by each Spark task:

['/opt/spark/work-dir', '.', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.9-src.zip', '/opt/spark/jars/spark-core_2.12-3.1.2.jar', '/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip', '/opt/spark/python/lib/py4j-*.zip', '/opt/models/research/slim', '/usr/local/envs/pytf1/lib/python37.zip', '/usr/local/envs/pytf1/lib/python3.7', '/usr/local/envs/pytf1/lib/python3.7/lib-dynload', '/usr/local/envs/pytf1/lib/python3.7/site-packages']

The sys.path returned by each Ray task:

['/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/thirdparty_files', '/opt/spark/work-dir/kai', '/opt/spark/work-dir/kai', '/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/pickle5_files', '/usr/local/envs/pytf1/lib/python3.7/site-packages/ray/workers', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.9-src.zip', '/opt/spark/jars/spark-core_2.12-3.1.2.jar', '/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip', '/opt/spark/python/lib/py4j-*.zip', '/opt/models/research/slim', '/usr/local/envs/pytf1/lib/python37.zip', '/usr/local/envs/pytf1/lib/python3.7', '/usr/local/envs/pytf1/lib/python3.7/lib-dynload', '/usr/local/envs/pytf1/lib/python3.7/site-packages']
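
The two lists above can also be compared mechanically rather than by eye. The following is only a small sketch: the helper name only_in is made up for illustration, and the two lists are copied verbatim from the outputs above.

```python
# Hypothetical helper (not part of BigDL, Spark, or Ray): list the entries
# that appear in `first` but not in `second`, keeping order and dropping repeats.
def only_in(first, second):
    other = set(second)
    seen = set()
    result = []
    for entry in first:
        if entry not in other and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


ENV = '/usr/local/envs/pytf1/lib'

# Entries present in both outputs above, in the same order.
shared_tail = [
    '/opt/spark/python/lib/pyspark.zip',
    '/opt/spark/python/lib/py4j-0.10.9-src.zip',
    '/opt/spark/jars/spark-core_2.12-3.1.2.jar',
    '/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip',
    '/opt/spark/python/lib/py4j-*.zip',
    '/opt/models/research/slim',
    ENV + '/python37.zip',
    ENV + '/python3.7',
    ENV + '/python3.7/lib-dynload',
    ENV + '/python3.7/site-packages',
]

# sys.path as reported by a Spark task.
spark_path = ['/opt/spark/work-dir', '.'] + shared_tail

# sys.path as reported by a Ray task.
ray_path = [
    ENV + '/python3.7/site-packages/ray/thirdparty_files',
    '/opt/spark/work-dir/kai',
    '/opt/spark/work-dir/kai',
    ENV + '/python3.7/site-packages/ray/pickle5_files',
    ENV + '/python3.7/site-packages/ray/workers',
] + shared_tail

spark_only = only_in(spark_path, ray_path)
ray_only = only_in(ray_path, spark_path)

print("Only in Spark:", spark_only)
print("Only in Ray:  ", ray_only)
```

With these inputs, the Spark-only entries are '/opt/spark/work-dir' and '.', while the Ray-only entries are the three Ray-internal directories plus '/opt/spark/work-dir/kai'.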

The main difference is that '.' is in the Spark Python path but not in the Ray Python path.
Manually adding '.' to sys.path inside a Ray actor works around the problem, but we may need a better way to support this in our code.
Adding the file to a persistent volume alone does not seem to solve the issue; at a minimum, we would also need to add the volume path to PYTHONPATH in the Docker image.
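
For reference, the manual workaround could look roughly like the sketch below. It is an illustration only: ensure_on_sys_path and Worker are hypothetical names, and it assumes the Ray worker's current working directory is where the --py-files file ends up, which has not been verified here.

```python
import sys


def ensure_on_sys_path(path="."):
    """Put `path` at the front of sys.path unless it is already present.

    Returns True if the entry was added, False if it was already there.
    """
    if path in sys.path:
        return False
    sys.path.insert(0, path)
    return True


# Hypothetical actor class. In real use it would be decorated with
# @ray.remote; the decorator is left out so the sketch runs without Ray.
class Worker:
    def __init__(self):
        # Must run inside the worker process, before the module shipped
        # via --py-files is imported.
        ensure_on_sys_path(".")
```

The call has to happen in the Ray worker process itself (for example in the actor's __init__ or at the top of a remote function), since changing sys.path on the driver does not affect workers that are already running.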
cc @jason-dai
