feat: SST1RSoXSDB: flexible tool for finding run numbers #31
Remaining needs on this:
Addresses #31: expanded functionality of the `summarize_run` function in `SST1RSoXSDB.py`. I believe the signature matches that of the old version, so existing code should remain functional.

New features:
- Slightly expanded the set of preset keyword search terms; made many case-insensitive and many regex-based.
- Allowed additional search terms to be provided as keyword arguments, specifying the match method. If the catalog is reduced to zero, the user is notified which search term failed to match.
- Expanded the variety of metadata output to the dataframe, and provided preset collections of metadata through the `outputType` parameter, including scan numbers only.
- Allowed additional output metadata fields to be requested through the `userOutputs` keyword argument.
- Implemented failing gracefully at multiple stages.
- Implemented limited 'troubleshooting tips' on bad user input.

See the signature docstring for full documentation and example functions.
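To make the matching behavior concrete, here is a minimal, self-contained sketch of the case-insensitive, regex-based filtering described above, including the "which search term failed" notification when the catalog is reduced to zero. The `filter_runs` helper and the column names are hypothetical illustrations, not the actual SST1RSoXSDB API.

```python
import re

import pandas as pd


def filter_runs(df: pd.DataFrame, **search_terms: str) -> pd.DataFrame:
    """Apply each search term as a case-insensitive regex on its column.

    If any term eliminates all remaining runs, report which term failed,
    mirroring the troubleshooting behavior described in the PR text.
    (Hypothetical helper; not the real summarize_run implementation.)
    """
    for column, pattern in search_terms.items():
        mask = df[column].astype(str).str.contains(
            pattern, flags=re.IGNORECASE, regex=True
        )
        df = df[mask]
        if df.empty:
            raise ValueError(
                f"No runs left after matching {column!r} against {pattern!r}"
            )
    return df


# Toy run-metadata table standing in for the catalog summary
runs = pd.DataFrame(
    {
        "institution": ["NIST", "nist", "UChicago"],
        "plan": ["full_carbon_scan", "nexafs", "full_carbon_scan"],
    }
)

# Case-insensitive partial matches: keeps only row 0
matched = filter_runs(runs, institution="nist", plan="carbon")
print(len(matched))  # 1
```

The point of raising with the offending column and pattern is that, when several terms are combined, the user immediately sees which one emptied the catalog rather than just getting back an empty result.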
I'm going to reopen this one with the documentation/good-first-issue labels, solely because it would be really nice to have docs and test coverage for this function. Anybody who wants to take @BijalBPatel's outstanding, best-in-class example docstring as a starting point is welcome. Feel free to reach out to me or @pdudenas if anyone would like a hand getting started with Sphinx docs.
One major obstacle to the use of the DataBroker loader is fishing the run you want out of the sea of all 40,000+ runs (as of summer 2022) on the instrument.
One approach to this is to use a hybrid sort of scheme with, say, a Pandas frame that contains basic metadata, from which scans could be loaded.
The API would look vaguely like:

`summarize_run(proposal=None, saf=None, user=None, institution=None, project=None, sample=None, plan=None)`

which would return a `pd.DataFrame` with columns drawn from the start document. The user could then select down that DataFrame, even using pandas tools like `df.sample_id.drop_duplicates()` to view unique sample IDs, say. Closing this loop would probably involve actually doing `isel`, `sel`, or `where` calls to get down to a reasonable number of scans, then passing that frame back into a loader function that would make one big xarray out of those scans.
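The select-then-load loop above can be sketched with plain pandas. The column names, scan IDs, and the final loader step are all illustrative assumptions, not the actual instrument schema.

```python
import pandas as pd

# Stand-in for the summary DataFrame that summarize_run would return,
# with columns drawn from the start document (names are hypothetical)
summary = pd.DataFrame(
    {
        "scan_id": [41001, 41002, 41003, 41004],
        "sample_id": ["PS-b-PMMA", "PS-b-PMMA", "blank", "PS-b-P2VP"],
        "plan": ["full_carbon_scan"] * 4,
    }
)

# View the unique sample IDs, as suggested in the issue text
unique_samples = summary.sample_id.drop_duplicates()

# Select down to a reasonable number of scans for one sample
selected = summary[summary.sample_id == "PS-b-PMMA"]

# A loader would then take selected.scan_id and assemble one big xarray
# (e.g. concatenating per-scan DataArrays); the loader itself is omitted.
print(list(selected.scan_id))  # [41001, 41002]
```

The key design point is that the expensive step (loading data) only ever sees the handful of scan IDs that survive the cheap pandas filtering.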