@harshit-soora
Contributor

@harshit-soora harshit-soora commented Dec 1, 2025

Workflows Epic Link - #4338

  • Extended Job Info to combine State, Dependency, and Reason (why the job is in its current state) into a single field called Message
    • :dependency=>"afterok:14935262(unfulfilled)", :reason=>"Dependency"
    • :dependency=>"afterok:14935270(failed)", :reason=>"DependencyNeverSatisfied"
  • Removed the empty line from the launcher selector.

I have not added any human readability for reasons, as the strings are different for each scheduler. Doing so would hinder scalability in the future.

@Bubballoo3
Contributor

Bubballoo3 commented Dec 1, 2025

I like this feature, and have no issues with the implementation. I do wonder if it would be helpful to convert the afterok:14935262(unfulfilled) message to something more (human) readable, like Awaiting JOBID success (unfulfilled). I am not sure what the possibilities are besides afterok, or how well they could be communicated using the same strategy. If you don't see any huge problems, this should just be accomplished with a method in WorkflowsHelper.
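
A minimal sketch of the kind of helper suggested here. The method name and output wording are assumptions for illustration, not code from this PR, and only the afterok form shown in this thread is handled; anything else is returned unchanged.

```ruby
# Hypothetical helper; name and wording are assumptions, not code from this PR.
# Converts a dependency string such as "afterok:14935262(unfulfilled)"
# into a more readable sentence.
def readable_dependency(dependency)
  return nil if dependency.nil? || dependency.empty?

  match = dependency.match(/\Aafterok:(\d+)\((\w+)\)\z/)
  # Fall back to the raw string for any format this sketch does not recognize.
  return dependency unless match

  "Awaiting job #{match[1]} success (#{match[2]})"
end

puts readable_dependency("afterok:14935262(unfulfilled)")
# prints: Awaiting job 14935262 success (unfulfilled)
```

Returning the raw string for unrecognized formats keeps the helper safe for dependency types other than afterok.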

@harshit-soora
Contributor Author

harshit-soora commented Dec 2, 2025

I am thinking of combining Status / Dependency / Reason into one thing. Let's call it Message.
Otherwise, dependency/reason doesn't make sense for a task launched outside a workflow (i.e., from a Project). Also, status and state can confuse users.

Will add this in ProjectHelper.

@harshit-soora harshit-soora changed the title from "Added important Native Job Info for Workflows" to "Added important Native job info as Message for Workflows" Dec 2, 2025
@harshit-soora
Contributor Author

image

@Bubballoo3
Contributor

"I have not added any human readability for reasons, as the strings are different for each scheduler. Doing so would hinder scalability in the future."

I wonder if this will be a future extension of ood_core, so that people can add human readable versions for each scheduler separately, and access it through the adapter. Definitely not something we should tackle now, but that is likely how we will get around the scheduler issue if we do it later.

Contributor

@Bubballoo3 Bubballoo3 left a comment

I like the approach of assembling those fields together into a single message, and added some notes on how we can make sure to do this clearly and consistently across schedulers.

end

# Special case: Showing dependency of cancelled job will confuse users
if state=="CANCELLED" and reason=="Dependency"
Contributor

Per our style guide, we should use && in place of and both here and on line 34.
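
A short illustration of why style guides often prefer && over and. In the conditional above both forms evaluate the same way, but and binds more loosely than assignment, which can surprise in other contexts. The variable names below are illustrative only.

```ruby
# `and` has lower precedence than `=`, so the assignment happens first.
loose = true and false   # parsed as (loose = true) and false
tight = true && false    # parsed as tight = (true && false)

puts loose  # prints: true
puts tight  # prints: false
```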

msg = "Current job state is #{state}"

reason = native.dig(:reason)
if reason != "None"
Contributor

You mention in the opening comment that

"I have not added any human readability for reasons, as the strings are different for each scheduler."

Are we still able to rely on 'None' and 'Dependency' as being consistent across schedulers? Otherwise this seems error-prone and we should find an agnostic way of determining how we want to assemble this message.

For example, another way to be super clear about what is happening (or which strings are missing or 'None') is to put every variable inside labelled quotes, so something like
"Job has current state:'JOBSTATE' because of reason:'REASON'. Waiting for dependency:'DEPENDENCY' to be satisfied"

That way we don't care what the strings themselves actually are, or even if they are present, as having a nil value for any of those could be useful for debugging what is actually going on.
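
A minimal sketch of the labelled-quotes idea, following the example sentence above. The method name and keyword arguments are assumptions; the point is that missing values are printed as 'nil' instead of being hidden behind scheduler-specific checks.

```ruby
# Hypothetical helper; the name and signature are assumptions for illustration.
# Every value is wrapped in labelled single quotes, and nil is shown explicitly.
def labelled_job_message(state: nil, reason: nil, dependency: nil)
  label = ->(value) { value.nil? ? "'nil'" : "'#{value}'" }

  "Job has current state:#{label.call(state)} because of reason:#{label.call(reason)}. " \
    "Waiting for dependency:#{label.call(dependency)} to be satisfied"
end

puts labelled_job_message(
  state: "PENDING",
  reason: "Dependency",
  dependency: "afterok:14935262(unfulfilled)"
)
puts labelled_job_message(state: "RUNNING")
```

No conditional in this sketch depends on a scheduler-specific string such as 'None' or 'Dependency'.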

Contributor Author

@harshit-soora harshit-soora Dec 3, 2025

The idea of wrapping values in ' ' is nice; users can then search online for the quoted code to find a readable reason/state.

Nice catch. In PBS / Torque, the fields map to different native keys:
:state => :job_state
:reason => :comment (human readable)
:dependency => :depend

:afterok is standard in all schedulers.
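
A sketch of how this mapping could be kept in one place. The PBS / Torque keys come from the list above; the constant name, method name, adapter name strings, and the Slurm keys (:state, :reason, :dependency) are assumptions for illustration and should be checked against each adapter.

```ruby
# Hypothetical lookup table; constant, method, and adapter names are assumptions.
# PBS / Torque keys are taken from the comment above; Slurm keys are assumed.
NATIVE_KEYS = {
  "slurm"  => { state: :state,     reason: :reason,  dependency: :dependency },
  "pbspro" => { state: :job_state, reason: :comment, dependency: :depend },
  "torque" => { state: :job_state, reason: :comment, dependency: :depend }
}.freeze

# Returns a hash with :state, :reason, and :dependency, or nil when the
# adapter is not in the table or there is no native data to read.
def job_fields(adapter, native)
  keys = NATIVE_KEYS[adapter]
  return nil if keys.nil? || native.nil?

  keys.transform_values { |native_key| native[native_key] }
end

p job_fields("torque", { job_state: "H", comment: "Waiting on job", depend: "afterok:123" })
p job_fields("unknown_adapter", {})
```

Returning nil for adapters outside the table gives the caller one place to decide what to show when a scheduler is not covered.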

@Bubballoo3
Contributor

All that being said, I do really like the message content as it is, and would hate to make it uglier with a less elegant composition. So my ideal outcome is that we ensure every conditional is scheduler agnostic and keep your message structure the same.

@harshit-soora
Contributor Author

"every conditional is scheduler agnostic and keep your message structure the same."

I agree. I removed the special case, as it would just make the parsing uglier.

@Bubballoo3
Contributor

So this now accounts for slurm and pbspro? I think that covers the vast majority of users, and is probably a good compromise that keeps us from having to mess with ood_core right now.

Now that I look at the adapters, I am not even sure if workflows will be supported on some of the other schedulers. We should likely figure that out and include some guardrails. Have you looked into workflows support at all?

@Bubballoo3 Bubballoo3 closed this Dec 4, 2025
@github-project-automation github-project-automation bot moved this from Awaiting Review to Merged/Closed in PR Review Pipeline Dec 4, 2025
@Bubballoo3 Bubballoo3 reopened this Dec 4, 2025
@github-project-automation github-project-automation bot moved this from Merged/Closed to Awaiting Review in PR Review Pipeline Dec 4, 2025
@harshit-soora
Contributor Author

harshit-soora commented Dec 4, 2025

Workflows are supported on all schedulers, as the afterok job dependency is available on every scheduler. We don't need guardrails here.

I was looking into which schedulers OOD supports: Slurm, PBS Pro, and Torque, all of which I have handled in the function. Are there any other major schedulers that I should be aware of and handle?

@Bubballoo3
Contributor

Bubballoo3 commented Dec 4, 2025

To be exhaustive we would want to cover all the adapters in https://github.com/OSC/ood_core/tree/master/lib/ood_core/job/adapters.
These encompass both 'real' schedulers like slurm, pbs, torque, lsf, etc. as well as support for linuxhost, kubernetes, systemd, etc. These all vary in capability and support, for example

https://github.com/OSC/ood_core/blob/bc307005c9753afc923941fa34c07643ab96affc/lib/ood_core/job/adapters/fujitsu_tcs.rb#L213-L220

will raise an error when an afterok parameter is passed.

Digging directly out of the native data is bad practice for this reason, but might be alright given the limited time we have to finish this. So I would say check out each adapter, and if the afterok parameter is accepted and passed to the scheduler, then we should also try to find the native key that this info will be stored under. Long term, we can implement this in ood_core and handle all the conversion there.

@Bubballoo3
Contributor

Also, having conditionals for a lot of different schedulers will get very messy, not to mention the work involved in figuring out what the proper keys are. So if you want to defer until @johrstrom can give us some advice, that might be a better use of your time. If you do want to take a stab at it, that wouldn't be bad though.


Projects

Status: Awaiting Review
