return tres_per_task as a dict #384

Merged 1 commit on Jul 18, 2025

plaguedbypenguins
Contributor

the old job API had tres_req_str which we were using when folks request gpus-per-task. eg. requests like this

#SBATCH --nodes=6-24 --ntasks=24 --cpus-per-task=1 --gpus-per-task=1 --mem-per-cpu=8g

I couldn't find anything equivalent to tres_req_str in the new Job API, so I added this code that parses tres_per_task into a dict in a vaguely similar way to how you do gres_per_task. ie.

cpu=1,gres/gpu=1 --> {'cpu': 1, 'gres/gpu': 1}
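
For readers who want to see the mapping spelled out, here is a minimal plain-Python sketch of that string-to-dict conversion. It is not the Cython code from the patch, and `parse_tres_string` is a hypothetical name used only for illustration. It assumes the string is a comma-separated list of `name=value` pairs, and it keeps non-numeric values (such as a memory size with a unit suffix) as strings.

```python
def parse_tres_string(tres):
    """Turn 'cpu=1,gres/gpu=1' into {'cpu': 1, 'gres/gpu': 1}.

    Hypothetical helper for illustration only; not part of pyslurm.
    """
    result = {}
    if not tres:
        # None or an empty string means nothing was requested per task.
        return result

    for item in tres.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only, so the key keeps any '/' or ':' in it.
        key, sep, value = item.partition("=")
        if not sep:
            # Entry without a value: skipped in this sketch.
            continue
        # Plain integers become ints; anything else stays a string.
        result[key] = int(value) if value.isdigit() else value
    return result


print(parse_tres_string("cpu=1,gres/gpu=1"))
```

An empty or missing string maps to an empty dict, which matches the `tres_per_task = {}` case discussed later in this thread.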

full disclosure - this is my first attempt at cython code, so is probably sketchy :)

sadly(?) there are a bunch of other tres_per_{job,node,task,socket} things in slurm now, not just this one.
however this patch solves our problem for the moment.

anyway, I thought I'd contrib this back and maybe it'd kickstart a discussion on how/if you want to support these kinds of tres_per_* things, and whether you just want to pass back a _str, or if a dict is nicer, or ... ?

@tazend tazend self-assigned this Jul 11, 2025
Member

@tazend tazend left a comment

Hi,

thanks for the patch and ideas.
I think I've mainly held back on these tres_per_* things because I wasn't sure whether just returning a dict is good enough, or whether I'd want to introduce some other type for it.

Another thing was that most of the info these strings have is already available through dedicated attributes on the job instance, like cpu, cpus_per_task, memory, memory_per_cpu, memory_per_node, gres_per_task, ntasks, etc.

I think I've wanted people to use these dedicated attributes instead, so I left them out for the moment. Also some of them seem to be really confusing in the slurm api: tres_per_job just contains GRES stuff (why name it tres_per_job then?), while tres_per_task seems to include everything else. And the others, I don't even know :)

Anyway, this small rant aside: there is no drawback to exposing tres_per_task as an additional option to use, so we can add that :)

@plaguedbypenguins
Contributor Author

thanks for being ok with adding tres_per_task. that will help us.
no probs at all if it changes to something else in the future.

agree on the rant :) the TRES stuff all seems a bit inconsistent, with a lot of gres syntax still in there.
I guess there's a limit to how much pyslurm can make slurm's internals appear neat and tidy :)

I've updated the patch to use the to_dict() as you suggested. that is much better. thanks!
I'd previously tried to_gres_dict() but had completely missed that there was a to_dict().

@tazend
Member

tazend commented Jul 14, 2025

Hi @plaguedbypenguins

I just tested tres_per_task on my cluster. Does the dict it returns actually have any information in it when you run it on your cluster?

I tried it with multiple Jobs, but tres_per_task (even just the string slurm is supposed to return) is always empty for me.

@plaguedbypenguins
Contributor Author

Hi,

oh, really? darn. what do you see from this?
scontrol show job | grep TresPerTask

FWIW about 2/3 of our current running and submitted jobs have
tres_per_task = {}
but the other 1/3 show values in both scontrol's TresPerTask and pyslurm's tres_per_task, like below.

whether a job has a TresPerTask is very dependent upon the job submission args. eg.
#SBATCH --ntasks=1 --cpus-per-task=64
results in pyslurm returning
tres_per_task = {'cpu': 64}

or
#SBATCH --nodes=6-24 --ntasks=24 --cpus-per-task=1 --gpus-per-task=1 --mem-per-cpu=8g
results in
tres_per_task = {'cpu': 1, 'gres/gpu': 1}
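
As a rough sketch of how the returned dict could be consumed for the gpus-per-task case, the snippet below reads the GPU count out of `tres_per_task`. The helper name `gpus_per_task` is hypothetical, and the job here is a stand-in object rather than a real pyslurm job, so the example runs without a Slurm installation. It assumes the attribute is a dict keyed like the examples above, and empty when nothing was requested.

```python
from types import SimpleNamespace


def gpus_per_task(job):
    """Return the per-task GPU count from job.tres_per_task, or 0 if absent.

    Hypothetical helper; 'job' is any object with a tres_per_task dict.
    """
    tres = getattr(job, "tres_per_task", None) or {}
    return int(tres.get("gres/gpu", 0))


# Stand-in objects mirroring the two submissions shown above.
# With pyslurm installed, a real job object would be used here instead.
cpu_only_job = SimpleNamespace(tres_per_task={"cpu": 64})
gpu_job = SimpleNamespace(tres_per_task={"cpu": 1, "gres/gpu": 1})

print(gpus_per_task(cpu_only_job))
print(gpus_per_task(gpu_job))
```

Falling back to an empty dict keeps the helper safe for the jobs that report `tres_per_task = {}`.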

BTW, Re: future devel:
I was vaguely thinking over the weekend that maybe pyslurm could return all tres_per_* with one call as a dict of dicts? eg.
tres_per = {'task':{}, 'node':{}, 'job':{}, 'step':{}}
although we only seem to have jobs using 'task' and 'node' at the moment, so this seems like overkill.
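
A quick sketch of what that dict-of-dicts idea might look like on the caller's side, assuming each scope were exposed as a `tres_per_<scope>` attribute holding a dict. Only `tres_per_task` is added by this patch; the other attribute names and the `collect_tres_per` helper are hypothetical, and missing attributes simply come back as empty dicts.

```python
from types import SimpleNamespace

# Scopes taken from the example above; only 'task' exists in this patch.
SCOPES = ("task", "node", "job", "step")


def collect_tres_per(job):
    """Gather tres_per_<scope> dicts from a job-like object into one dict.

    Hypothetical helper; attributes that are missing or None become {}.
    """
    return {
        scope: dict(getattr(job, f"tres_per_{scope}", None) or {})
        for scope in SCOPES
    }


# Stand-in job object so the sketch runs without Slurm.
job = SimpleNamespace(tres_per_task={"cpu": 1, "gres/gpu": 1})
print(collect_tres_per(job))
```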

cheers,
robin

@tazend
Member

tazend commented Jul 18, 2025

Hey,

sorry for the late response. Yeah I goofed up and didn't actually request anything like --cpus-per-task=5, so it wasn't showing up. Everything works now.

Regarding adding the other tres_per_* things in the future: I think I would probably prefer having them separately, each as an attribute on the job class, just like tres_per_task now :) That would probably also go better with the documentation, but let's see when there is actually demand for it :)

Have a nice weekend.

@tazend tazend merged commit 546f3d2 into PySlurm:main Jul 18, 2025
1 of 5 checks passed