Add support for Slurm arrays#2059
Conversation
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
| # Guarantee every emitted task has a task_id (derived id, or uuid fallback). | ||
| results = self._post_process_task_ids(tasks, results) | ||
|
|
||
| self._record_failed_tasks([r for r in results if isinstance(r, FailedTask)]) |
There was a problem hiding this comment.
Discussed with @abhinavg4 . For now the PR keeps track of FailedTask instances by looking for a user-set FAILED_TASKS_DIR_ENV_VAR = "NEMO_CURATOR_FAILED_TASKS_DIR" and writing a JSON file per failed task in the specified directory.
I did the environment variable and write approach because it seems more reliable than trying to handle a global Python variable, etc. And the reason it is an environment variable is so that BaseStageAdapter does not have to propagate an additional parameter for every single stage (which I think would involve having to update the executors as well?). Open to other suggestions.
There was a problem hiding this comment.
Ok I think a lot of the util functions are coming because of this feature, and there might be an easier way for this. Continuing in DMs.
There was a problem hiding this comment.
The functions are really only:
- Write info about failed tasks
- Use task IDs to filter by Slurm array index
but I can move to util scripts if that makes it easier to read.
praateekmahajan
left a comment
There was a problem hiding this comment.
Took a super quick look, here are some general thoughts
- Instead of adding the same 3/4 fields to every "source" stage, can we have a base class and inherit that?
- Alternatively (or maybe in addition),
pipeline.buildiirc now dynamically sets the first stage as is_source_stage=True, so can we just rely on those? If we do then inside backends/base.py we can say "if this is a source stage AND slurm is enabled then just usetask_idas my key and decide which shard it belongs to"... this is something @abhinavg4 and I had discussed, this reduces the number of changes needed across curator code base, and also generalizes, since source_stage have task_id which is likely assigned usingget_determenistic_task_idwhich is a hash(metadat['source_files'])
For 1, sure. For 2, we could but it makes this PR dependent on the resumability PR, which is what we were trying to avoid I thought... also, I guess it is not immediately obvious to me how it can work for source stages that are not a |
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
…stage Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
abhinavg4
left a comment
There was a problem hiding this comment.
I think I still have some major comments about the algorithm. Especially at scale, how will the writing and reading work. Will continue in the DMs.
| SLURM_ARRAY_ENABLED_ENV_VAR = "NEMO_CURATOR_SLURM_ARRAY_ENABLED" | ||
| SLURM_ARRAY_SHARD_INDEX_ENV_VAR = "NEMO_CURATOR_SLURM_ARRAY_SHARD_INDEX" | ||
| SLURM_ARRAY_TOTAL_SHARDS_ENV_VAR = "NEMO_CURATOR_SLURM_ARRAY_TOTAL_SHARDS" | ||
| SLURM_ARRAY_MINIMUM_SHARD_INDEX_ENV_VAR = "NEMO_CURATOR_SLURM_ARRAY_MINIMUM_SHARD_INDEX" |
There was a problem hiding this comment.
Do we need all these variables, or can they be self-inferred? I think the initial design that @praateekmahajan had in mind was we just add --array=1-100 to the slurm submit command, and everything else works OOTB. Currently, it seems like the effort from the user side is a bit more than that?
There was a problem hiding this comment.
So the idea is we want to give the user full control if they need to override total shards (needed for reruns) or minimum shard index (needed to get around any Slurm array size limits). Really to enable Slurm array partitioning, the only thing explicitly needed is:
NEMO_CURATOR_SLURM_ARRAY_ENABLED=1
and it can automatically grab the environment variables without any issues. And then the user can override with NEMO_CURATOR_SLURM_ARRAY_SHARD_INDEX, etc. as desired (but not required).
| # Guarantee every emitted task has a task_id (derived id, or uuid fallback). | ||
| results = self._post_process_task_ids(tasks, results) | ||
|
|
||
| self._record_failed_tasks([r for r in results if isinstance(r, FailedTask)]) |
There was a problem hiding this comment.
Ok I think a lot of the util functions are coming because of this feature, and there might be an easier way for this. Continuing in DMs.
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
Signed-off-by: Sarah Yurick <sarahyurick@gmail.com>
TODO:
FailedTasksupportnemo-curator-slurm-cli(not planned for this PR)SLURM_ARRAY_TASK_COUNT> cluster limit