Upper api #609

marcmengel · 2025-03-07T04:25:27Z

This adds an upper-level API on top of the big_jobsub_api branch / pull #594. It has two main calls
and an extension of the Job class, SubmittedJob.

The two bare entry points are

submit(group="...", executable="...", ...), which accepts a plethora of jobsub_submit arguments and returns a SubmittedJob object for the newly submitted job, and
q(group="...", user="...", ...), which returns a list of SubmittedJob objects, one for each entry returned by jobsub_q with the mapped arguments. It may need more arguments mapped to meet people's needs.

Job objects (existing in condor.py) have members

id for jobids
cluster
proc
schedd

SubmittedJob objects add

pool
group
role
auth_methods
submit_output (only if the result of submit())
owner (if jobsub_q output or q() methods called)
submitted (datetime.datetime) ( "" )
runtime (datetime.timedelta) ( "" )
status (htcondor.JobStatus) ( "" )
prio
size
command

as well as methods:

hold()
release()
rm()
fetchlog( destdir, condor=False)
q() (update owner, status, etc. with jobsub_q)
q_long() (...and return --long info as dictionary)
q_analyze() (return jobsub_q --better-analyze output)
wait() (run q() periodically until status COMPLETED, HELD, or REMOVED.)

Any SubmittedJob object, whether returned by submit() or by q(), can manage its job with the methods listed above.
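As an illustration of the wait() behaviour described above (poll q() until the status is COMPLETED, HELD, or REMOVED), here is a minimal sketch. It is not the code from this PR: JobStatus below is a local stand-in for htcondor.JobStatus, and wait_for, FakeJob, and the injectable sleep parameter are hypothetical names chosen so the example runs without jobsub or htcondor installed.

```python
import time
from enum import Enum
from typing import Callable


class JobStatus(Enum):
    """Local stand-in for htcondor.JobStatus (values follow HTCondor status codes)."""
    IDLE = 1
    RUNNING = 2
    REMOVED = 3
    COMPLETED = 4
    HELD = 5


# Statuses at which polling stops, per the wait() description.
TERMINAL = {JobStatus.COMPLETED, JobStatus.HELD, JobStatus.REMOVED}


def wait_for(job, poll_seconds: float = 60.0,
             sleep: Callable[[float], None] = time.sleep) -> JobStatus:
    """Call job.q() repeatedly until job.status is terminal, then return it."""
    while True:
        job.q()  # refreshes job.status
        if job.status in TERMINAL:
            return job.status
        sleep(poll_seconds)


class FakeJob:
    """Hypothetical test double that reports a scripted sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.status = None

    def q(self):
        self.status = next(self._statuses)


job = FakeJob([JobStatus.IDLE, JobStatus.RUNNING, JobStatus.COMPLETED])
final = wait_for(job, poll_seconds=0, sleep=lambda seconds: None)
```

Passing the sleep function in makes the loop easy to exercise in tests without real delays.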

marcmengel · 2025-03-07T20:42:38Z

Recent playing with new api interactively:
demo.txt
although the job ended before I could demo hold() and release()

shreyb · 2025-03-07T21:48:26Z

In #594: @retzkek had a few comments, which we decided would move to this PR's scope. For completeness, I wanted to include those here:

General comment:

We should be careful and deliberate when adding an external API that it's something we can evolve without breaking compatibility, especially knowing how experiment code evolves (or doesn't).

Most APIs will offer high- and low-level interfaces, so I'm ok with offering jobsub_call() documented as a low-level interface that accepts arguments with little checking and returns the raw output as a string that is likely to change. It certainly meets the immediate request.

We should also look to offer high-level interfaces that return concrete types, but they don't all have to be (or should be) created now - we can let users drive that by requesting them. A basic jobsub_submit() and jobsub_q() is fine, and would be mostly wrapping up the already defined regexes.

In lib/jobsub_api.py; for jobsub_submit_re, and jobsub_q_re:

Can we wrap this in a small function that returns a concrete type as a high-level interface instead of exposing this detail? There already is a Job class that represents a job/submission.

jobsub_lite/lib/condor.py

Line 522 in 138cb3a

class Job:

Maybe something like def jobsub_submit(*args: str, **kwargs) -> Job
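To make the "return a concrete type instead of exposing the regex" idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the Job dataclass is a stand-in carrying the four members listed in the PR description (not the real class in condor.py), JOBID_RE is not the actual jobsub_submit_re, the cluster.proc@schedd id shape is assumed, and the sample output line is made up.

```python
import re
from dataclasses import dataclass

# Assumed job id shape: <cluster>.<proc>@<schedd>. Not the real jobsub_submit_re.
JOBID_RE = re.compile(r"(?P<cluster>\d+)\.(?P<proc>\d+)@(?P<schedd>[\w.\-]+)")


@dataclass
class Job:
    """Stand-in with the members named in the PR description; types are assumed."""
    id: str
    cluster: int
    proc: int
    schedd: str


def job_from_output(output: str) -> Job:
    """Find the first job id in command output and return it as a Job."""
    m = JOBID_RE.search(output)
    if m is None:
        raise RuntimeError("no job id found in output")
    return Job(
        id=m.group(0),
        cluster=int(m.group("cluster")),
        proc=int(m.group("proc")),
        schedd=m.group("schedd"),
    )


# Hypothetical output line, used only to exercise the parser.
job = job_from_output("Use job id 123.0@sched.example.org to retrieve output")
```

Callers then work with job.cluster or job.schedd and never see the regex, so the pattern can change without breaking them.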

shreyb · 2025-03-11T21:31:16Z

lib/jobsub_api.py

+# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=
+
+
+def optfix(s: Optional[str]) -> str:


Rather than have this function wrapping every jobsub_call invocation, perhaps it's better to have jobsub_call only return a string. Looking at that function, it looks like the only reason it needs to be Optional[str] is because you initialize res to None. Maybe if that's "", then you can always return a string, and then this function won't be necessary.
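A small sketch of that suggestion, using a hypothetical call_capture() as a stand-in for jobsub_call (the real function is not shown in this thread and may capture output differently). The only point is that initializing res to "" lets the return type be plain str, so no optfix() wrapper is needed.

```python
import subprocess
import sys
from typing import List


def call_capture(argv: List[str]) -> str:
    """Hypothetical stand-in for jobsub_call: run a command, return its stdout."""
    res = ""  # start from "" rather than None, so every path returns a str
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
        res = proc.stdout
    except OSError:
        # The command could not be started; keep the empty default.
        pass
    return res


out = call_capture([sys.executable, "-c", "print('hello')"])
missing = call_capture(["/nonexistent/command-for-demo"])
```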

shreyb · 2025-03-11T21:42:43Z

lib/jobsub_api.py

+    def hold(self, verbose: int = 0) -> str:
+        """Hold this job with jobsub_hold"""
+        args = ["jobsub_hold"]
+        self._update_args(verbose, args)


Since these last three lines are run for hold, release, and rm, could they be combined into a single function, which gets called from each of those funcs? Something like:

    def some_better_name(self, command, verbose):
        args = [command]
        self._update_args(verbose, args)
        rs = optfix(jobsub_call(args, True))
        return rs

    def hold(self, verbose):
        return self.some_better_name("jobsub_hold", verbose)

    def release(self, verbose):
        return self.some_better_name("jobsub_release", verbose)

    def rm(self, verbose):
        return self.some_better_name("jobsub_rm", verbose)

That was already the idea of the _update_args routine, so I collapsed it into a single method called _cmd() that takes
the initial args (the base command and base options), and moved the optfix(jobsub_call(...)) call up there.
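A sketch of what such a collapsed helper might look like. This is a guess at the shape, not the code in the PR: SubmittedJobSketch and the injected call function are hypothetical, the placement of the group option and job id is illustrative, and verbose handling is left out because the flag spelling is not shown in this thread.

```python
from typing import Callable, List


class SubmittedJobSketch:
    """Hypothetical reduced SubmittedJob; `call` stands in for jobsub_call."""

    def __init__(self, jobid: str, group: str, call: Callable[[List[str]], str]):
        self.id = jobid
        self.group = group
        self._call = call

    def _cmd(self, verbose: int, args: List[str]) -> str:
        """Add the per-job arguments to the base command, run it, return output."""
        full = list(args)  # copy so the caller's list is not modified
        full += ["-G", self.group, self.id]  # illustrative option layout
        # verbose handling omitted in this sketch
        return self._call(full)

    def hold(self, verbose: int = 0) -> str:
        return self._cmd(verbose, ["jobsub_hold"])

    def release(self, verbose: int = 0) -> str:
        return self._cmd(verbose, ["jobsub_release"])

    def rm(self, verbose: int = 0) -> str:
        return self._cmd(verbose, ["jobsub_rm"])


calls: List[List[str]] = []


def fake_call(argv: List[str]) -> str:
    """Record the argument list instead of running anything."""
    calls.append(argv)
    return "ok"


job = SubmittedJobSketch("1.0@sched.example.org", "uboone", fake_call)
result = job.hold()
job.rm()
```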

shreyb · 2025-03-12T15:48:15Z

lib/jobsub_api.py

+        """run 'jobsub_q' on this job and update values, status"""
+        rs = self._cmd(verbose, ["jobsub_q"])
+        lines = rs.split("\n")
+        if len(lines) == 2 and self.status is not None:


I worry about relying on the line count to decide whether jobsub_q returned output for this job. For example, I think the current check breaks in this case:

    [sbhat@host~]$ jobsub_q -G uboone [email protected]
    Attempting to get token from https://vaultserver.fnal.gov:8200 ... succeeded
    Storing bearer token in /tmp/bt_token_uboone_Analysis_UID
    JOBSUBJOBID OWNER SUBMITTED RUNTIME ST PRIO SIZE COMMAND
    0 total; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

By my count, that's 4 lines, but it could apply to a job that we previously saw, but has now left the queue. Maybe this check needs to be more specific, perhaps something that looks for the standard jobsub_q header, in the lines, and then sees if the next element of lines has the jobid in it? Just thinking "out loud" here, I think that could look something like:

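    def has_q_output(self, q_lines) -> bool:
        # Find the standard jobsub_q header line
        header_idx = 0
        for i, line in enumerate(q_lines):
            if re.search("^JOBSUBJOBID\tOWNER.+$", line):
                header_idx = i
                break
        else:
            # No header at all, so no jobsub_q listing
            return False
        try:
            # The line after the header should mention our job id
            return q_lines[header_idx + 1].find(self.jobid) != -1
        except IndexError:
            return False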

There may be a more intelligent way than this example to check the jobsub_q output - but either way, I think we should make the check stronger.

shreyb · 2025-03-12T15:55:07Z

lib/jobsub_api.py

+            # we saw it previously, and now it is not showing up..
+            self.status = JobStatus.COMPLETED
+            return
+        if len(lines) > 1:


Now reading this part, if we implement an output checker like above, the logic of this whole method could be something along the lines of:

    if not self.has_q_output(lines):
        if self.status is not None:
            self.status = JobStatus.COMPLETED
            return
        raise RuntimeError(....)
    # This is the happy path
    line = rs.split("\n")[1]
    ...
    return

I think that helps with readability as well - what do you think?

shreyb · 2025-03-12T16:00:08Z

lib/jobsub_api.py

+            if not line.find(" = ") > 0:
+                continue
+            k, v = line.split(" = ", 1)
+            if v[0] == '"':


I don't think you have to do this check:

    >>> print('"abc"'.strip('"'))
    abc
    >>> print('abc'.strip('"'))
    abc
    >>> 'abc'.strip('"') == '"abc"'.strip('"')
    True

shreyb · 2025-03-12T16:08:26Z

lib/jobsub_api.py

+            if v[0] == '"':
+                v = v.strip('"')
+            res[k] = v
+        if len(lines) == 1 and self.status is not None:


VERY nitpicky here, but I think the intent would be clearer if, rather than checking len(lines), you checked whether rs == "", or possibly if not rs. If I understand correctly, this case applies to a blank result in rs, and either of those would communicate that more clearly.

The other issue is that len(lines) COULD be 1 while rs is non-blank, for instance if the condor devs forget to put line breaks in at some point in the future. To illustrate the point:

    >>> lines = "".split("\n")
    >>> lines, len(lines)
    ([''], 1)
    >>> lines = "blahblahoopsnolineendingkey=value".split("\n")
    >>> lines, len(lines)
    (['blahblahoopsnolineendingkey=value'], 1)

Well, here I'm trying to handle the case where jobs just stop showing up once they complete, so you run
jobsub_q and it comes up empty. With plain jobsub_q you still get the header, so len(lines) == 2 (["JOBSUBJOBID ...", ""]); but with jobsub_q --long we don't get the header, so len(lines) == 1. I was trying to have parallel tests in the two cases. But you're right, at least a comment here about what len(lines) == n means would be appropriate.

Added a little dissertation in the comments on detecting the empty result, and on why we're marking it "COMPLETED".

I understand where you're coming from, but I'm still not too keen on the check counting lines, if there's a more reliable way. How about some internal function that checks both cases? Like a

    def _jobsub_q_output_nonempty(stdout: str, long: bool) -> bool:
        if long:
            return stdout != ""
        # logic if not long
        stuff()

Then you could have the same function called in both cases, so you'd satisfy having parallel checks.
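One way the combined check could be filled in, as a sketch only. It reuses the header-lookup idea from the earlier comment; the extra jobid parameter, the whitespace-tolerant header pattern, and the sample strings are assumptions added for illustration and are not part of the suggestion as written.

```python
import re

# Matches the plain jobsub_q header whether columns are tab- or space-separated.
HEADER_RE = re.compile(r"^JOBSUBJOBID\s+OWNER\b")


def jobsub_q_output_nonempty(stdout: str, long: bool, jobid: str) -> bool:
    """Return True if jobsub_q output appears to contain an entry for jobid."""
    if long:
        # --long output has no header, so any non-blank text counts
        return stdout.strip() != ""
    lines = stdout.split("\n")
    for i, line in enumerate(lines):
        if HEADER_RE.search(line):
            # the row right after the header should mention our job id
            return i + 1 < len(lines) and jobid in lines[i + 1]
    return False  # no header found at all


# Hypothetical sample outputs for the plain (non --long) case.
EMPTY = (
    "Attempting to get token ... succeeded\n"
    "JOBSUBJOBID OWNER SUBMITTED RUNTIME ST PRIO SIZE COMMAND\n"
    "0 total; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended\n"
)
LISTED = (
    "JOBSUBJOBID OWNER SUBMITTED RUNTIME ST PRIO SIZE COMMAND\n"
    "1.0@sched.example.org someuser 03/12 15:00 0+00:01:00 R 0 0.0 job.sh\n"
)

empty_result = jobsub_q_output_nonempty(EMPTY, False, "1.0@sched.example.org")
listed_result = jobsub_q_output_nonempty(LISTED, False, "1.0@sched.example.org")
```

Because the function keys on the header rather than on a line count, extra token messages before the header or summary lines after it do not change the answer.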

retzkek · 2025-03-26T19:20:51Z

We should also give some thought to how this API will be documented for users.

Conveniently that's the topic of tomorrow's CSAID roadmap meeting.

shreyb · 2025-03-26T21:17:51Z

I'm sorry I missed the context of your comment in the meeting earlier today, Kevin. Yes, we should have this be a part of #612 as well. It's kind of in both items' scopes.

marcmengel added 4 commits March 6, 2025 17:43

fleshing out q() method

ba3c21f

fetchlog working in api

63e0649

datetimes, JobStatus-es, etc

db39665

test passing

e968a27

marcmengel marked this pull request as draft March 7, 2025 04:25

marcmengel added 3 commits March 6, 2025 22:41

format string fixup

9d00a99

redid test with job.id

b68be6a

Draft of collapsing Job classes, adding per-job q methods

b191ec7

This was referenced Mar 7, 2025

Jobsub subcomand parser and API interface #594

Merged

Change spec file to include new files #611

Open

Document new API #612

Open

marcmengel added 4 commits March 10, 2025 22:39

more fixups, and wait() method

df04235

sigh

34f07fc

Merge branch 'master' into upper_api

0d534ab

year boundary date conversion fix

98da040

retzkek self-requested a review March 11, 2025 18:36

shreyb reviewed Mar 11, 2025

marcmengel added 3 commits March 11, 2025 16:49

Shouldn't raise SystemExit in api-called code

10aa78a

wrap any jobsub exceptions in RuntimeError with the jobsub command in…

f8b3e6f

… the api

Addressing @shreyb's suggestion

5c97a75

shreyb reviewed Mar 12, 2025

marcmengel added 2 commits March 12, 2025 15:02

comments on odd q(), q_long() dissappearing job case

b391fcd

maxConcurrent 0 vs None

676d994

more fun

ddebf4a
