Eurec4a test #216

Open
chitvansingh03 wants to merge 10 commits into atmdrops:main from chitvansingh03:eurec4a_test

Conversation


@chitvansingh03 chitvansingh03 commented Aug 6, 2025

I have modified some of the files in order to process EUREC4A's HALO Level 0 data using pydropsonde. The first change listed below needs to be generalized so that both EUREC4A and ORCESTRA can be processed.

  1. Change in processor.py: modified the method get_circle_times_from_segmentation() so that 'flight_id' corresponds to the 'flight_id's in EUREC4A's flight segmentation file (format 'Platform-MMDD'). These are not the same as the flight ID that pydropsonde derives from the Level 0 data (format 'YYYYMMDD'), so this change may end up not working for ORCESTRA. It needs to be generalized so that both ORCESTRA and EUREC4A can be processed using the same code.

  2. Change in rawreader.py: modified the opening and reading of A-files. Earlier it strictly read only UTF-8 characters; I changed it so that if a non-UTF-8 character comes up, the error is ignored and reading proceeds. (This problem occurred in P3 files, and after making this change it magically worked, until another error came.)

  3. Added 2 more files relevant for EUREC4A: the segmentation file and a config file I made (the path needs to be modified).
    Thank you
    Chitvan Singh
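
The effect of ignoring decode errors, as described in change 2, can be illustrated with a minimal sketch. It uses only Python's built-in `open`; the helper name `read_text_lines` is hypothetical and is not the actual rawreader.py code. Note that `errors="ignore"` silently drops undecodable bytes, whereas `errors="replace"` leaves a visible U+FFFD marker where a byte was corrupt.

```python
def read_text_lines(path, encoding="utf-8", errors="replace"):
    """Read a text file into a list of lines, tolerating undecodable bytes.

    Hypothetical helper for illustration only; not part of pydropsonde.
    errors="strict"  -> raise UnicodeDecodeError on the first bad byte
    errors="ignore"  -> silently drop bad bytes
    errors="replace" -> substitute U+FFFD so the corruption stays visible
    """
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read().splitlines()
```

With `errors="replace"`, a downstream parser can still detect and skip lines that contained corrupt bytes instead of parsing silently shortened text.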

…concat_circle functions respectively. And 4 new files added which are to be removed
…te_and_populate_flight_object, to check if their output is empty or not. turns out they are empty!"
…fied for EUREC4A needs. pipeline.py - create_and_populate_circle_object , the print and saving statement updated.
… processor.py - generalised finding flight_id for P3 and HALO
…ad A- files in all functions doing so 2) In processor.py - modified get_circle_times_from_segmentation() for eurec4a data, it won't work on orchestra.
@tmieslinger
Collaborator

Thank you for testing pydropsonde on the EUREC4A dropsonde data!
This PR covers three things. We've already discussed some offline and I'll try to summarise my suggestions below:

  1. Reading some of the A-files from P3 results in errors. This was actually a very good discovery, as some of the P3 files are indeed broken by bit-flips and cannot be decoded properly as ASCII. Instead of catching and ignoring the error, I implemented a change in the reading code, which is detailed in the respective PR, "change reading code for a-file to handle P3 cases" (#217).
  2. The EUREC4A or JOANNE Level 0 data (raw files and folder names) do not directly fit into the input scheme of pydropsonde. Also, some files are duplicated (A-files) or partly duplicated (files with a short naming scheme). I would generally suggest that we improve the sparse pydropsonde documentation to clarify what the input should look like. In the end, cleaning up duplicate files is independent of pydropsonde and should be done beforehand. The folder naming is more complicated: pydropsonde assumes that the folder containing all files from a measurement flight has a name that uniquely identifies this flight (or measurement sequence). Accordingly, the folder name is written to the flight_id variable used within pydropsonde and also to the output files from Level 2 onwards. Your suggestion to extract the flight_id later in the processing from a flight segmentation file is a neat solution, but it would only work for the Level 4 dataset, leading to an inconsistency in flight_id between Levels 2, 3, and 4. Also, pydropsonde is designed to do additional QC (Level 2) and to concatenate single profile measurements that are meant to be evaluated together (Level 3). It includes further good things like adding derived variables. All of that is independent of a flight segmentation and of the possibility to combine certain profiles into mesoscale products (e.g. omega, Level 4). In fact, for most datasets to which pydropsonde could be applied, there is likely no flight segmentation information available. Therefore, I'd suggest updating the documentation to clarify that the folder names must uniquely identify a flight, and also recommending that users preferably make use of the official campaign-wide flight IDs.
  3. The EUREC4A config files would be super cool to have! However, I'd suggest adding them not here but to the orcestra-campaign/dropsondes repository. The current repo is meant to cover only the pydropsonde code, including minimalistic example data and a respective config to run tests on the code. Config files and further information on specific campaigns and their datasets are better placed in separate repos. For the EUREC4A case, I think it's a good option to add it to the ORCESTRA dropsondes repo, as that repo is meant to include everything needed to reproduce the processing that we do for the final ORCESTRA dropsonde datasets and the respective data paper. We include a comparison to JOANNE anyway, so it would be perfect to further compare the JOANNE profiles to reprocessed EUREC4A profiles with the most recent pydropsonde version :)
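
For reference, the two flight ID formats mentioned in this thread ('YYYYMMDD' as derived from the Level 0 data, and 'Platform-MMDD' as used in the EUREC4A flight segmentation file) can be related with a small helper. This is only a sketch: the function name `to_segmentation_flight_id` and its signature are hypothetical, not part of pydropsonde, and it assumes the platform name is known separately. Following point 2, such a conversion would be better applied when naming the input folders than inside the Level 4 processing, so that flight_id stays consistent across levels.

```python
from datetime import datetime


def to_segmentation_flight_id(date_flight_id: str, platform: str) -> str:
    """Convert a 'YYYYMMDD' flight ID into the 'Platform-MMDD' form.

    Hypothetical helper for illustration; the platform string is an
    assumption supplied by the caller (it is not encoded in the date).
    Raises ValueError if date_flight_id is not a valid YYYYMMDD date.
    """
    day = datetime.strptime(date_flight_id, "%Y%m%d")
    return f"{platform}-{day:%m%d}"


# Example call with an assumed platform name
print(to_segmentation_flight_id("20200122", "HALO"))
```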

Overall, this was a super helpful test, and I would appreciate it if you added the EUREC4A config to the linked repo and opened a PR there. Thanks again for all the work!
