Catch det match failures and add to failed list #1239
mmccrackan wants to merge 1 commit into master
Okay, based on the latest LAT det match run, this appears to have avoided the error:
mhasself left a comment
That error message suggests to me that it's trying to analyze a book that is missing. (It's from Sept. 2024, so no surprise.) But I guess you're saying it would not have tried to do that if this were truly a new detset that needs processing. Regardless...
The docstring advertises "if match fails for a known reason ..." (my emphasis :) ), and a general try/except is at odds with that, I think. A specific test should be written here.
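A minimal sketch of the narrower handling suggested above, assuming the known failure mode surfaces as a FileNotFoundError (as in the M_index.yaml error quoted in the PR description). run_matches and match_fn are hypothetical stand-ins, not the actual update_det_match code. Only the known error puts a detset on the failed list; anything else still propagates, so the flow run fails visibly.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger(__name__)


def run_matches(
    detsets: Iterable[str],
    match_fn: Callable[[str], object],
    failed: List[str],
) -> Dict[str, object]:
    """Run match_fn on each detset, recording only a known failure mode.

    A FileNotFoundError (e.g. a cleared book's M_index.yaml) is logged and the
    detset is appended to failed. Any other exception propagates, so an
    unexpected problem still stops the run instead of being silently skipped.
    """
    results: Dict[str, object] = {}
    for detset in detsets:
        try:
            results[detset] = match_fn(detset)
        except FileNotFoundError as err:
            # Known reason: the book's files are gone. Record it and move on.
            logger.warning("Skipping detset %s: %s", detset, err)
            failed.append(detset)
    return results
```

Catching the exception type alone may still be broader than the docstring promises; checking up front that the book exists is an even more specific alternative.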
The obs_id = obs_ids[0] line is suspicious, too. I can see how that could fail because a Book was auto-cleared (i.e. obs_ids[0] is gone, but obs_ids[-1] is still around and looking good).
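One way to avoid depending on obs_ids[0] is to scan the detset's obs_ids for the first one whose book is still available. This is a sketch under assumptions: pick_available_obs_id is hypothetical, and has_book stands in for whatever check the pipeline would actually use (for example, testing that the book's M_index.yaml exists on disk).

```python
from typing import Callable, Optional, Sequence


def pick_available_obs_id(
    obs_ids: Sequence[str], has_book: Callable[[str], bool]
) -> Optional[str]:
    """Return the first obs_id whose book is still available, else None.

    has_book is a hypothetical predicate supplied by the caller, e.g. a check
    that the book's M_index.yaml exists.
    """
    for obs_id in obs_ids:
        if has_book(obs_id):
            return obs_id
    # No usable book left: the caller can treat this as a known failure.
    return None
```

If it returns None, every book for the detset has been cleared, which is a concrete, testable reason to add the detset to the failed list.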
With automated processes, we need to be careful not to just give up at the first sign of trouble; that can lead to big problems going unnoticed for long periods, because the pipeline recovers so gracefully.
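To keep the pipeline from recovering too quietly, skipped detsets can be reported at the end of a run and escalated when they stop looking routine. A minimal sketch: check_failures is hypothetical, and the 10% threshold is an arbitrary placeholder, not a recommended value.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def check_failures(n_total: int, failed: List[str], max_fraction: float = 0.1) -> None:
    """Surface skipped detsets instead of recovering silently.

    Logs every skip, then raises if the skipped fraction exceeds max_fraction
    (a hypothetical threshold; the right value is a policy decision).
    """
    if not failed:
        return
    logger.warning("%d of %d detsets were skipped: %s",
                   len(failed), n_total, ", ".join(failed))
    if n_total and len(failed) / n_total > max_fraction:
        raise RuntimeError(
            f"{len(failed)} of {n_total} detsets failed; "
            "too many to treat as routine book clean-up"
        )
```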
Apologies if I'm not sufficiently understanding the issue and the fix!
The update_det_match process on Prefect is failing with the error:
State message: Flow run encountered an exception. FileNotFoundError: [Errno 2] No such file or directory: '/so/data/lati6/obs/17268/obs_1726844446_lati6_001/M_index.yaml'
I believe this is related to some detsets failing because only a handful of detectors have detcal. In the update_det_match output, I see:
Which is one of the obsids that is failing. This is preceded by:
WARNING: sotodlib.core.metadata.loader: Only 4 of 1794 detectors have data for metadata specified by spec={'db': '/global/cfs/cdirs/sobs/metadata/lat/manifests//det_cal/v0/det_cal_local.sqlite', 'name': 'det_cal'}. Trimming.
This branch just wraps the match function in a try/except and makes sure that the detset in question is added to the failed list. This doesn't directly address the underlying problem, which is that older files are missing, but once update_det_match is re-run on NERSC with this change, it should skip these files. It will still fail if files for newer obsids are missing, which I think is the behavior we want.
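The skip-on-rerun behavior described above can be sketched as follows, assuming the failed list is persisted between runs and consulted before matching. detsets_to_process and its arguments are hypothetical; the actual update_det_match bookkeeping may be organized differently.

```python
from typing import Iterable, List


def detsets_to_process(
    all_detsets: Iterable[str], done: Iterable[str], failed: Iterable[str]
) -> List[str]:
    """Return detsets that still need matching, preserving input order.

    Detsets already matched (done) or previously recorded as failed are
    skipped, so a rerun does not hit the same missing files again.
    """
    skip = set(done) | set(failed)
    return [d for d in all_detsets if d not in skip]
```

For example, detsets_to_process(["a", "b", "c", "d"], done=["a"], failed=["c"]) returns ["b", "d"].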