feat: SST1RSoXSDB: flexible tool for finding run numbers #31

Open
pbeaucage opened this issue Jun 5, 2022 · 2 comments · Fixed by #51
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers

Comments

@pbeaucage
Collaborator

One major obstacle to the use of the DataBroker loader is fishing the run you want out of the sea of all 40,000+ runs (as of summer 2022) on the instrument.

One approach is a hybrid scheme: build, say, a pandas DataFrame that holds basic metadata for each run, from which scans could then be loaded.

The API would look roughly like summarize_run(proposal=None, saf=None, user=None, institution=None, project=None, sample=None, plan=None), which could return a pd.DataFrame whose columns come from the start document.

The user could then narrow down that DataFrame, even using pandas tools such as
df.sample_id.drop_duplicates() to view the unique sample IDs, say.
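
To make the idea concrete, here is a minimal sketch of what such a summary function might look like. It is not the loader's actual implementation: the `START_DOCS` list, its field names, and the exact-match filtering are all assumptions made for illustration, and a real version would read start documents from the DataBroker catalog.

```python
import pandas as pd

# Hypothetical start-document metadata standing in for the real catalog.
START_DOCS = [
    {"scan_id": 101, "proposal": 300001, "user": "alice", "sample_id": "PS-b-PMMA", "plan": "nexafs"},
    {"scan_id": 102, "proposal": 300001, "user": "alice", "sample_id": "PS-b-PMMA", "plan": "rsoxs"},
    {"scan_id": 103, "proposal": 300001, "user": "alice", "sample_id": "P3HT", "plan": "rsoxs"},
    {"scan_id": 104, "proposal": 300002, "user": "bob", "sample_id": "P3HT", "plan": "rsoxs"},
]


def summarize_run(proposal=None, user=None, sample=None, plan=None, docs=START_DOCS):
    """Return a DataFrame of run metadata, keeping rows that match every given filter."""
    df = pd.DataFrame(docs)
    # Map each (assumed) keyword argument to the column it filters on.
    filters = {"proposal": proposal, "user": user, "sample_id": sample, "plan": plan}
    for column, wanted in filters.items():
        if wanted is not None:
            df = df[df[column] == wanted]
    return df.reset_index(drop=True)


runs = summarize_run(user="alice")
# Ordinary pandas tools then work on the result, e.g. listing unique sample IDs.
unique_samples = runs.sample_id.drop_duplicates().tolist()
print(unique_samples)
```

Returning a plain DataFrame keeps the search step cheap, since no detector data is touched until the user has settled on a short list of scans.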

Closing this loop would probably involve actually doing isel-, sel-, or where-style selections to get down to a reasonable number of scans, then passing that frame back into a loader function that would build one large xarray object out of those scans.
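
A rough sketch of that last step follows, assuming a hypothetical single-scan loader. Here `load_scan` just fabricates a small random image and `load_runs` is an invented name; the only real library call doing the work is `xarray.concat`, which stacks the per-scan arrays along a new `scan_id` dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical summary frame, as a metadata search might return it.
summary = pd.DataFrame(
    {
        "scan_id": [101, 102, 103, 104],
        "sample_id": ["PS-b-PMMA", "PS-b-PMMA", "P3HT", "P3HT"],
        "plan": ["nexafs", "rsoxs", "rsoxs", "rsoxs"],
    }
)


def load_scan(scan_id):
    """Stand-in for a real single-run loader: returns a small fake detector image."""
    rng = np.random.default_rng(scan_id)
    return xr.DataArray(rng.random((4, 5)), dims=("pix_y", "pix_x"))


def load_runs(frame):
    """Load every scan listed in the frame and stack them into one DataArray."""
    arrays = [load_scan(int(scan_id)) for scan_id in frame["scan_id"]]
    # A named pandas Index becomes both the new dimension and its coordinate.
    index = pd.Index(frame["scan_id"].to_numpy(), name="scan_id")
    return xr.concat(arrays, dim=index)


# Select down to a manageable set of scans, then load them in one call.
selected = summary[(summary.plan == "rsoxs") & (summary.sample_id == "P3HT")]
stack = load_runs(selected)
print(stack.dims, stack.scan_id.values)
```

Keeping `scan_id` as a coordinate means an individual run can still be pulled back out of the combined array with `stack.sel(scan_id=103)`.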

@pbeaucage
Collaborator Author

Remaining needs on this:

@pbeaucage pbeaucage added documentation Improvements or additions to documentation good first issue Good for newcomers labels Sep 21, 2022
@BijalBPatel BijalBPatel linked a pull request Oct 9, 2022 that will close this issue
BijalBPatel added a commit that referenced this issue Oct 12, 2022
Addresses feat #31 

Expanded functionality of the summarize_run function in SST1RSoXSDB.py (#31). I believe the signature matches that of the old version, so existing code should remain functional.

New Features:

Slightly expanded the set of preset keyword search terms; many are now case-insensitive and many are regex-based.
Additional search terms can be provided as keyword arguments that specify the match method.
If the catalog is reduced to zero runs, the user is notified which search term failed to match.
Expanded the variety of metadata output to the dataframe and added a set of preset metadata collections, selected through the outputType parameter, including one with scan numbers only.
Additional output metadata fields can be requested through the userOutputs keyword argument.
Fails gracefully at multiple stages.
Provides limited 'troubleshooting tips' on bad user input.
See signature docstring for full documentation and example functions.
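
As an illustration of the search behavior described above (case-insensitive regex matching, plus a notice naming the term that emptied the result), here is a small sketch on a plain pandas DataFrame. It is not the merged implementation: `search_runs`, the column names, and the sample data are hypothetical, and the real function works against the catalog rather than a DataFrame.

```python
import warnings

import pandas as pd

# Hypothetical run metadata used only for this example.
summary = pd.DataFrame(
    {
        "scan_id": [101, 102, 103, 104],
        "user": ["alice", "alice", "alice", "bob"],
        "sample_id": ["PS-b-PMMA", "PS-b-PMMA", "P3HT", "P3HT"],
        "plan": ["nexafs", "rsoxs", "rsoxs", "rsoxs"],
    }
)


def search_runs(frame, **terms):
    """Apply each column=pattern term in order as a case-insensitive regex filter."""
    result = frame
    for column, pattern in terms.items():
        if column not in result.columns:
            raise KeyError(f"unknown metadata field: {column!r}")
        values = result[column].astype(str)
        matches = values.str.contains(pattern, case=False, regex=True, na=False)
        result = result[matches]
        if result.empty:
            # Tell the user which term reduced the selection to zero runs.
            warnings.warn(f"No runs left after applying {column}={pattern!r}")
            break
    return result


hits = search_runs(summary, sample_id="^p3ht", plan="rsoxs")
print(hits["scan_id"].tolist())
```

Applying the terms one at a time, rather than as a single combined mask, is what makes it possible to report exactly which term failed to match.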
@pbeaucage
Collaborator Author

pbeaucage commented Oct 12, 2022

I'm going to reopen this one with the documentation and good first issue labels solely because it would be really nice to have docs and test coverage for this function. Anybody who wants to take @BijalBPatel's outstanding, best-in-class example docstring for SST1RSoXSDB.summarizeRun() and start a Sphinx page for it, please do so!

Feel free to reach out to me or @pdudenas if you would like a hand getting started with Sphinx docs.

@pbeaucage pbeaucage reopened this Oct 12, 2022
@andrewjlevin andrewjlevin self-assigned this Sep 3, 2024