Skip to content

[HWORKS-2190][APPEND] Updating job configuration to include file, pyfiles, archives and jars #478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion docs/user_guides/projects/jobs/pyspark_job.md
Original file line number Diff line number Diff line change
Expand Up @@ -217,7 +217,7 @@ The following table describes the JSON payload returned by `jobs_api.get_configu
| Field | Type | Description | Default |
| ------------------------------------------ | -------------- |-----------------------------------------------------| -------------------------- |
| `type` | string | Type of the job configuration | `"sparkJobConfiguration"` |
| `appPath` | string | Project path to script (e.g `Resources/foo.py`) | `null` |
| `appPath` | string | Project path to script (e.g `Resources/foo.py`) | `null` |
| `environmentName` | string | Name of the project spark environment | `"spark-feature-pipeline"` |
| `spark.driver.cores` | number (float) | Number of CPU cores allocated for the driver | `1.0` |
| `spark.driver.memory` | number (int) | Memory allocated for the driver (in MB) | `2048` |
Expand All @@ -229,6 +229,10 @@ The following table describes the JSON payload returned by `jobs_api.get_configu
| `spark.dynamicAllocation.maxExecutors` | number (int) | Maximum number of executors with dynamic allocation | `2` |
| `spark.dynamicAllocation.initialExecutors` | number (int) | Initial number of executors with dynamic allocation | `1` |
| `spark.blacklist.enabled` | boolean | Whether executor/node blacklisting is enabled | `false` |
| `files` | string | HDFS path(s) to files to be provided to the Spark application. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/file1.py,hdfs:///Project/<project_name>/Resources/file2.txt"` | `null` |
| `pyFiles` | string | HDFS path(s) to Python files to be provided to the Spark application. These will be added to the `PYTHONPATH` so they can be imported as modules. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/module1.py,hdfs:///Project/<project_name>/Resources/module2.py"` | `null` |
| `jars` | string | HDFS path(s) to JAR files to be provided to the Spark application. These will be added to the classpath. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/lib1.jar,hdfs:///Project/<project_name>/Resources/lib2.jar"` | `null` |
| `archives` | string | HDFS path(s) to archive files to be provided to the Spark application. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/archive1.zip,hdfs:///Project/<project_name>/Resources/archive2.tar.gz"` | `null` |


## Accessing project data
Expand Down
7 changes: 6 additions & 1 deletion docs/user_guides/projects/jobs/spark_job.md
Original file line number Diff line number Diff line change
Expand Up @@ -230,7 +230,12 @@ The following table describes the JSON payload returned by `jobs_api.get_configu
| `spark.dynamicAllocation.minExecutors` | number (int) | Minimum number of executors with dynamic allocation | `1` |
| `spark.dynamicAllocation.maxExecutors` | number (int) | Maximum number of executors with dynamic allocation | `2` |
| `spark.dynamicAllocation.initialExecutors` | number (int) | Initial number of executors with dynamic allocation | `1` |
| `spark.blacklist.enabled` | boolean | Whether executor/node blacklisting is enabled | `false` |
| `spark.blacklist.enabled` | boolean | Whether executor/node blacklisting is enabled | `false`
| `files` | string | HDFS path(s) to files to be provided to the Spark application. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/file1.py,hdfs:///Project/<project_name>/Resources/file2.txt"` | `null` |
| `pyFiles` | string | HDFS path(s) to Python files to be provided to the Spark application. These will be added to the `PYTHONPATH` so they can be imported as modules. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/module1.py,hdfs:///Project/<project_name>/Resources/module2.py"` | `null` |
| `jars` | string | HDFS path(s) to JAR files to be provided to the Spark application. These will be added to the classpath. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/lib1.jar,hdfs:///Project/<project_name>/Resources/lib2.jar"` | `null` |
| `archives` | string | HDFS path(s) to archive files to be provided to the Spark application. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/archive1.zip,hdfs:///Project/<project_name>/Resources/archive2.tar.gz"` | `null` |


## Accessing project data

Expand Down