Changes to CoDICE I-ALiRT processing in preparation for I-ALiRT SIT#1832
Conversation
Pull Request Overview
This PR adapts the CoDICE I-ALiRT processing to work with SIT input data by mocking output products and updating tests to use the new packet parsing utility.
- Tests now load packets via `packet_file_to_datasets` and expect two lists of dicts instead of an `xarray.Dataset`.
- `process_codice` has been rewritten to generate mock CoDICE-Lo and CoDICE-Hi data with fill values.
- New constants for I-ALiRT data fields were introduced, and packet grouping now accepts a `prefix` parameter.
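The new return shape can be sketched as follows (a minimal illustration based on the PR description; the function signature, field names, and fill value are assumptions, not the repo's actual code):

```python
from typing import Any

FILL_VALUE = -1  # hypothetical fill value; the real constant lives in the repo


def mock_process_codice(
    n_packets: int = 2,
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return mock CoDICE-Lo and CoDICE-Hi I-ALiRT data as two lists of dicts."""
    cod_lo_data = [{"cod_lo_counts": FILL_VALUE} for _ in range(n_packets)]
    cod_hi_data = [{"cod_hi_counts": FILL_VALUE} for _ in range(n_packets)]
    return cod_lo_data, cod_hi_data
```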
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `imap_processing/tests/ialirt/unit/test_process_codice.py` | Switched from `decom_packets` to `packet_file_to_datasets`, updated fixtures and assertions for tuple output |
| `imap_processing/ialirt/l0/process_codice.py` | Replaced real processing with mock loops producing two lists of dicts using fill values |
| `imap_processing/codice/constants.py` | Added `CODICE_LO_IAL_DATA_FIELDS` and `CODICE_HI_IAL_DATA_FIELDS` |
| `imap_processing/codice/codice_l1a.py` | Extended `group_ialirt_data` to take a `prefix` and handle empty data in `create_ialirt_dataset` |
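The `prefix` extension to packet grouping could look roughly like this (an illustrative sketch, not the real `group_ialirt_data` implementation; the field-naming convention is assumed):

```python
def group_ialirt_data(packets: dict, data_field_range: range, prefix: str = "") -> dict:
    """Group packet fields whose names start with `prefix`, keyed by field index.

    This is a simplified stand-in for the repo's grouping logic, meant only
    to show how a prefix parameter would select the relevant fields.
    """
    return {
        i: packets[f"{prefix}FIELD_{i}"]
        for i in data_field_range
        if f"{prefix}FIELD_{i}" in packets
    }
```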
Comments suppressed due to low confidence (2)
`imap_processing/tests/ialirt/unit/test_process_codice.py:53`

- [nitpick] These assertions only verify types; consider adding checks for expected list lengths or sample field values to improve coverage of the new mock data logic.

```python
assert all(isinstance(item, dict) for item in cod_lo_data)
```
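Per the nitpick, the assertions could be strengthened along these lines (a sketch; the `met` field name is a hypothetical example, not necessarily one of the real mock fields):

```python
def check_mock_output(cod_lo_data: list, cod_hi_data: list) -> None:
    # Beyond type checks, verify expected lengths and that a sample
    # (hypothetical) field is present in every dict.
    assert all(isinstance(item, dict) for item in cod_lo_data)
    assert all(isinstance(item, dict) for item in cod_hi_data)
    assert len(cod_lo_data) == len(cod_hi_data)
    assert all("met" in item for item in cod_lo_data)
```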
`imap_processing/ialirt/l0/process_codice.py:17`

- Docstring should be updated to describe that this function now returns two lists (`cod_lo_data`, `cod_hi_data`) instead of a single dataset.

```python
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
```
```python
    else:
        logger.warning("No I-ALiRT data found")
        return None
```
Returning None when no I-ALiRT data is found may lead to downstream NoneType errors; consider returning an empty xarray.Dataset or raising a specific exception.
```diff
-        return None
+        return xr.Dataset()
```
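The suggestion above can be sketched as a guard function (illustrative only; the function name and the layout of `grouped_data` are assumptions, not the repo's actual code):

```python
import logging

import xarray as xr

logger = logging.getLogger(__name__)


def safe_ialirt_result(grouped_data):
    """Return an empty xarray.Dataset instead of None when no I-ALiRT data
    is found, so downstream code can iterate data_vars without NoneType
    errors. The grouped_data layout here is illustrative."""
    if not grouped_data:
        logger.warning("No I-ALiRT data found")
        return xr.Dataset()
    return xr.Dataset(
        {name: ("epoch", values) for name, values in grouped_data.items()}
    )
```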
```python
grouped_data = group_ialirt_data(packets, data_field_range)

# Process each group to get the science data and corresponding metadata
science_values, metadata_values = process_ialirt_data_streams(grouped_data)

# How data are processed is different for lo-ialirt and hi-ialirt
if apid == CODICEAPID.COD_HI_IAL:
    # Set some necessary values and process as a binned dataset similar to
    # a hi-omni data product
    metadata_for_processing = [
        "table_id",
        "plan_id",
        "plan_step",
        "view_id",
        "spin_period",
        "suspect",
    ]
    for var in metadata_for_processing:
        packets[var] = metadata_values[var.upper()]
    dataset = create_binned_dataset(apid, packets, science_values)

elif apid == CODICEAPID.COD_LO_IAL:
    # Create a nominal instance of the pipeline and process similar to a
    # lo-sw-species data product
    pipeline = CoDICEL1aPipeline(
        metadata_values["TABLE_ID"][0],
        metadata_values["PLAN_ID"][0],
        metadata_values["PLAN_STEP"][0],
        metadata_values["VIEW_ID"][0],
    )
    pipeline.set_data_product_config(apid, packets)
    pipeline.decompress_data(science_values)
    pipeline.reshape_data()

    # The calculate_epoch_values method needs acq_start_seconds and
    # acq_start_subseconds attributes on the dataset
    pipeline.dataset["acq_start_seconds"] = (
        "_",
        metadata_values["ACQ_START_SECONDS"],
    )
    pipeline.dataset["acq_start_subseconds"] = (
        "_",
        metadata_values["ACQ_START_SUBSECONDS"],
    )
```
```python
grouped_data = group_ialirt_data(packets, data_field_range, prefix)

if grouped_data:
    # Process each group to get the science data and corresponding metadata
    science_values, metadata_values = process_ialirt_data_streams(grouped_data)

    # How data are processed is different for lo-ialirt and hi-ialirt
    if apid == CODICEAPID.COD_HI_IAL:
        # Set some necessary values and process as a binned dataset similar to
        # a hi-omni data product
        metadata_for_processing = [
            "table_id",
            "plan_id",
            "plan_step",
            "view_id",
            "spin_period",
            "suspect",
        ]
        for var in metadata_for_processing:
            packets[var] = metadata_values[var.upper()]
        dataset = create_binned_dataset(apid, packets, science_values)

    elif apid == CODICEAPID.COD_LO_IAL:
        # Create a nominal instance of the pipeline and process similar to a
        # lo-sw-species data product
        pipeline = CoDICEL1aPipeline(
            metadata_values["TABLE_ID"][0],
            metadata_values["PLAN_ID"][0],
            metadata_values["PLAN_STEP"][0],
            metadata_values["VIEW_ID"][0],
        )
        pipeline.set_data_product_config(apid, packets)
        pipeline.decompress_data(science_values)
        pipeline.reshape_data()
```
Nothing much is changing here; I just put this all in an if/else statement to check whether there are grouped datasets, and if there are none, log a warning.
laspsandoval left a comment:
Approving. I think once you have addressed the comments it is good to start testing.
```yaml
  VALIDMAX: 1.7976931348623157e+308
  dtype: float64

codicehi_h:
```
Keep in mind that each of these variables will be a single column in the database. So there will not be dimensions here. For SWE, for example, I have a separate variable for each energy level. Does this need to be broken down further?
Good question/good point -- For this particular data variable, there are 4 dimensions: epoch, energy (there are 15 levels), ssd_index (there are 4 of these), and spin_sector (there are 4 of these). So the dimensions of the data are (<# epochs>, 15, 4, 4).
I am not sure how to break this down. Would this mean I need to make 15 × 4 × 4 = 240 separate variables?
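If each database column must be a scalar per epoch, flattening the (epoch, energy, ssd_index, spin_sector) variable into 240 named columns could look like this (a sketch; the column-naming scheme is hypothetical and the data are synthetic):

```python
import numpy as np

# Shape from the comment above: (<# epochs>, 15 energies, 4 SSDs, 4 spin sectors)
n_epochs, n_energy, n_ssd, n_spin = 2, 15, 4, 4
data = np.arange(n_epochs * n_energy * n_ssd * n_spin).reshape(
    n_epochs, n_energy, n_ssd, n_spin
)

# One scalar-per-epoch column per (energy, ssd_index, spin_sector) combination.
columns = {
    f"codicehi_h_e{e:02d}_ssd{s}_spin{p}": data[:, e, s, p]
    for e in range(n_energy)
    for s in range(n_ssd)
    for p in range(n_spin)
}
```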
```python
# TODO: Once I-ALiRT test data is acquired that actually has data in it,
# this can be turned back on
# codicelo_data = create_ialirt_dataset(CODICEAPID.COD_LO_IAL, dataset)
# codicehi_data = create_ialirt_dataset(CODICEAPID.COD_HI_IAL, dataset)
```
Right after launch we will immediately begin to receive I-ALiRT packets that are empty. So we might have this scenario in real-life. Something to consider maybe in another PR, though, if it is too complicated to change now.
Is there a preferred way to handle empty packets? i.e. in that case, should the code return a completely empty list? Or a list with dicts that have all the expected fields but are empty?
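One possible convention (an assumption, not a decided project behavior) is to keep the full expected schema and substitute fill values, so consumers always see stable keys; the field names and fill value below are illustrative:

```python
FILL_VALUE = -1  # hypothetical fill value
EXPECTED_FIELDS = ("met", "utc", "cod_lo_counts")  # illustrative field names


def handle_empty_packet(packet: dict) -> dict:
    """Return a dict with the full expected schema, substituting fill
    values for any field the (possibly empty) packet did not supply."""
    return {field: packet.get(field, FILL_VALUE) for field in EXPECTED_FIELDS}
```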
laspsandoval left a comment:
One more thing to add based on recent changes to db.
```python
epoch_data = {
    "apid": int(dataset.pkt_apid[epoch].data),
    "met": met,
    "utc": utc,
```
```diff
-    "utc": utc,
+    "met_to_utc": utc,
```
This PR makes various changes to the CoDICE I-ALiRT processing so that the code works on I-ALiRT SIT input test data and creates a mock CoDICE I-ALiRT dataset.