Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jobsub subcomand parser and API interface #594

Merged
merged 23 commits into from
Mar 10, 2025
Merged

Jobsub subcomand parser and API interface #594

merged 23 commits into from
Mar 10, 2025

Conversation

marcmengel
Copy link
Contributor

Subcommand Parser

You can now say "jobsub submit ..." instead of "jobsub_submit ..." and jobsub gives help for all the commands.

There is also a jobsub_api which you use by saying:
jobsub_call( [argv] , collect_ouput) Which if you pass collect_ouput = True will return the commands output as a string, otherwise the output goes to stdout. There are some utility regexes to parse said output;
a simple example follows:

from jobsub_api import jobsub_call, jobsub_submit_re, jobsub_q_re

def test_api_demo():
    group = "fermilab"
    out1 = jobsub_call(
        ["jobsub_submit", "-G", group, "file://./job_scripts/lookaround.sh"], True
    )
    m1 = jobsub_submit_re.search(out1)
    if m1:
        jobid = m1.group("jobid")

        time.sleep(1)

        out2 = jobsub_call(["jobsub_q", "-G", group, jobid], True)
        for line in out2.split("\n")[1:]:
            m = jobsub_q_re.search(line)
            if m:
                print(f"saw job {m.group('jobid')} command {m.group('command')}")
    else:
        print("submission failed: output:\n", out1)

@marcmengel
Copy link
Contributor Author

So this looks like a HUGE change, because code moved from the various jobsub_xxx scripts into lib/mains.py
and then was sliced up into routines. But if you snip out the moved code and diff the chunks, it's much more reasonable to look at: here it is:

compare_moved.txt

@marcmengel
Copy link
Contributor Author

marcmengel commented Dec 20, 2024

At this stage, lots of things work, but I need to run and clean up from the whole test suite. It's looking like around 10 test failures. So look for a few more changes before this is ready for prime time.

@shreyb
Copy link
Collaborator

shreyb commented Jan 6, 2025

Do you want to move the PR into draft status until it passes the tests, or do you think it's close enough at this point?

@marcmengel
Copy link
Contributor Author

I think it's about ready. The only test I don't see why it's failing is test_get_schedd_1,
Maybe another pair of eyes on that would help.

@shreyb
Copy link
Collaborator

shreyb commented Jan 6, 2025

test_get_schedd_1 passes when I run it with this PR checked out, actually. What error are you getting?

I'm not sure why the pylint is now failing though - it looks like import errors, but those worked before. I'll check that out as well.

@marcmengel
Copy link
Contributor Author

Well, I've gotten completely sidetracked here on to this condor_submit test; I'm going to leave it broken for now; its having trouble copying the output files back to where jobsub_fetchlog wants to find them, and then it's finding the wrong flavor of ifdhc, and getting lbrary link errors... I'm going to leave that broken for now and redo the submit files for that test separately.

@shreyb
Copy link
Collaborator

shreyb commented Jan 13, 2025

Alright - so should we go ahead and start reviewing this PR with an eye to fixing the condor_submit test before the next release?

@marcmengel
Copy link
Contributor Author

Yes please. The problems with the condor_submit test are longstanding; it was passing for the wrong reasons previously, and there are several issues to debug so that the job submitted by condor_submit is retrievable by jobsub_fetchlog without reusing the same file output directory, etc.

@marcmengel
Copy link
Contributor Author

So this latest push gets you the split-out mains/cmd.py, mains/submit.py and mains/fetchlog.py with the
respective mainlines and support code included. Mostly passes regression tests (except test_condor_unit.py::TestCondorUnit::test_get_schedd_1 and one or two test jobs output tests, which
are unrelated)

@shreyb
Copy link
Collaborator

shreyb commented Feb 24, 2025

I'll re-review, but I think this is good to go.

@retzkek retzkek self-requested a review March 5, 2025 20:22
Copy link
Contributor

@retzkek retzkek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be careful and deliberate when adding an external API that it's something we can evolve without breaking compatibility, especially knowing how experiment code evolves (or doesn't).

Most APIs will offer high- and low-level interfaces, so I'm ok with offering jobsub_call() documented as a low-level interface that accepts arguments with little checking and returns the raw output as a string that is likely to change. It certainly meets the immediate request.

We should also look to offer high-level interfaces that return concrete types, but they don't all have to be (or should be) created now - we can let users drive that by requesting them. A basic jobsub_submit() and jobsub_q() is fine, and would be mostly wrapping up the already defined regexes.



# so clients can easily parse the result strings
jobsub_submit_re = re.compile(r"Use job id (?P<jobid>[0-9.]+\@[^ ]+) to retrieve")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap this in a small function that returns a concrete type as a high-level interface instead of exposing this detail? There already is a Job class that represents a job/submission.

class Job:

Maybe something like def jobsub_submit(*args: str, **kwargs) -> Job

# so clients can easily parse the result strings
jobsub_submit_re = re.compile(r"Use job id (?P<jobid>[0-9.]+\@[^ ]+) to retrieve")

jobsub_q_re = re.compile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, but a bit trickier to do "right." Should we add the attributes to the existing Job class and return a list of those, or create a new JobStatus class specific for this? Simpler API (with some added internal complexity) or easier?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favor of adding the attribute to the Job class, and returning a list of those. We wouldn't even have to make up a new enumeration in that case, as the HTCondor python bindings provide those:

https://htcondor.readthedocs.io/en/latest/apis/python-bindings/api/htcondor.html#htcondor.JobStatus

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@retzkek - given that we plan on addressing these in #609, are you OK with us moving ahead with merging this after my re-review?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my main concern was addressed (doc strings).

@marcmengel marcmengel mentioned this pull request Mar 7, 2025
@marcmengel marcmengel requested a review from retzkek March 7, 2025 15:52
@marcmengel
Copy link
Contributor Author

So I went ahead and moved the upper level api to its own pull request, #609 .
That way we can merge this one, and haggle over the upper API in that pull request.
It adds about 400 lines....

Copy link
Contributor

@retzkek retzkek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@shreyb shreyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@shreyb shreyb added this to the 1.12 milestone Mar 10, 2025
@shreyb shreyb merged commit 20e6afa into master Mar 10, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants