…appropriate data for the report type
ian-noaa
reviewed
Aug 2, 2024
Collaborator
Author
I recommend using 3.12 because I have done all my prepbufr testing on 3.12.
I know that version works with xarray, NCEPLIBS-bufr, and netcdf, all of
which are needed for the ingest container.
…On Fri, Aug 2, 2024 at 9:40 AM Ian McGinnis wrote:
------------------------------
In pyproject.toml
<#395 (comment)>:
> -python = "^3.11"
+python = "^3.12"
If we're updating to Python 3.12, we should update our dockerfile & CI
too. Unless 3.12 is required for the RAOBS, I could see an argument for
making that a separate change/PR to keep the number of changes down. That
being said, I'd be in favor of updating to 3.12 if all of our dependencies
work with it.
------------------------------
In pyproject.toml
<#395 (comment)>:
> @@ -25,6 +26,7 @@ pytest = "^8.1.1"
types-pyyaml = "^6.0.12.20240311"
ruff = "^0.3.5"
coverage = "^7.4.4"
+ipykernel = "^6.29.4"
Do we need the jupyter python kernel for this project? I know sometimes
VSCode insists on including it but if it's not required, I'd rather remove
it so we don't have to worry about updating it for the security scanners.
------------------------------
In pyproject.toml
<#395 (comment)>:
> pyyaml = "^6.0.1"
xarray = "^2024.3.0"
+pyarrow = "^16.1.0"
I wasn't seeing pyarrow imported anywhere; did I miss a use case? I know
it can be useful for working with Parquet files & doing columnar operations
on data in memory. So it could be interesting to evaluate at some point.
------------------------------
In src/vxingest/grib2_to_cb/grib_builder.py
<#395 (comment)>:
> - spfh = []
+ specific_humidity = []
This is a great naming improvement. Thanks!
------------------------------
In src/vxingest/grib2_to_cb/grib_builder.py
<#395 (comment)>:
> for station in self.domain_stations:
geo_index = get_geo_index(
self.ds_translate_item_variables_map["fcst_valid_epoch"], station["geo"]
)
x_gridpoint = station["geo"][geo_index]["x_gridpoint"]
y_gridpoint = station["geo"][geo_index]["y_gridpoint"]
- spfh.append((float)(self.interp_grid_box(values, y_gridpoint, x_gridpoint)))
- return spfh
+ specific_humidity.append(
+ (float)(self.interp_grid_box(values, y_gridpoint, x_gridpoint))
We could lose an extra set of parentheses here to improve readability -
the ones around float don't do anything.
⬇️ Suggested change
- (float)(self.interp_grid_box(values, y_gridpoint, x_gridpoint))
+ float(self.interp_grid_box(values, y_gridpoint, x_gridpoint))
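To illustrate why the parentheses are redundant: in Python, parentheses around a bare name are a no-op, so `(float)` evaluates to the `float` builtin itself and the C-style cast spelling calls exactly the same function. A quick demonstration:

```python
# (float) is just the name float wrapped in grouping parentheses,
# so both spellings invoke the same builtin.
assert (float) is float
assert (float)(3) == float(3) == 3.0
```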
------------------------------
In src/vxingest/grib2_to_cb/grib_builder.py
<#395 (comment)>:
> +class RaobModelNativeBuilderV01(GribModelBuilderV01):
+ """This is the builder for model data that is ingested from grib2 NATIVE levels files.
+ It is a concrete builder specifically for the model raob data that are organized based
+ on the models preset vertical levels. This varies quite a bit from model to model
+ and is dependent on the configuration set up before the model runs.
+ This builder is a subclass of the GribModelBuilderV01 class.
+ The primary differences in these two classes are the handlers that derive the pressure level.
+ The pressure level needs to be interpolated according to a specific algorithm.
+
+ Args:
+ load_spec (Object): The load spec used to init the parent
+ ingest_document (Object): the ingest document
+ number_stations (int, optional): the maximum number of stations to process (for debugging). Defaults to sys.maxsize.
More so I can follow along - do we ingest RAOB's from GRIB? I thought it
was all sourced from PREPBUFR right now.
------------------------------
In src/vxingest/prepbufr_to_cb/README.md
<#395 (comment)>:
> +```text
+ To begin with, a PREPBUFR file does not always contain, within each single data subset, the data for an entire report! Instead, for reports which contain mass (i.e. temperature, moisture, etc.) as well as wind (i.e. direction and speed, U and V component, etc.) data values, such data values are stored within two separate but adjacent (within the overall file) data subsets, where each related subset, quite obviously, contains the same report time, location, station identification, etc. information as the other, but where the "mass" subset contains the pressures and/or height levels at which "mass" data values occur, while the corresponding "wind" subset contains the levels at which "wind" data values occur. While it is true that this may, in some cases, cause the same pressure and/or height level to appear in both subsets, this separation is nonetheless maintained for historical reasons peculiar to NCEP. At any rate, the below program will actually merge all of the data from both subsets into a single, unified report in such cases, so that the final decoded output is clearer and more intuitive.
+```
Using a code block for this confused me for a minute: I wasn't sure if you
were quoting a source document or if this was an aside. If it's a quote, you
can use Markdown's "quote" syntax (>) and do:
⬇️ Suggested change
-```text
- To begin with, a PREPBUFR file does not always contain, within each single data subset, the data for an entire report! Instead, for reports which contain mass (i.e. temperature, moisture, etc.) as well as wind (i.e. direction and speed, U and V component, etc.) data values, such data values are stored within two separate but adjacent (within the overall file) data subsets, where each related subset, quite obviously, contains the same report time, location, station identification, etc. information as the other, but where the "mass" subset contains the pressures and/or height levels at which "mass" data values occur, while the corresponding "wind" subset contains the levels at which "wind" data values occur. While it is true that this may, in some cases, cause the same pressure and/or height level to appear in both subsets, this separation is nonetheless maintained for historical reasons peculiar to NCEP. At any rate, the below program will actually merge all of the data from both subsets into a single, unified report in such cases, so that the final decoded output is clearer and more intuitive.
-```
+> To begin with, a PREPBUFR file does not always contain, within each single data subset, the data for an entire report! Instead, for reports which contain mass (i.e. temperature, moisture, etc.) as well as wind (i.e. direction and speed, U and V component, etc.) data values, such data values are stored within two separate but adjacent (within the overall file) data subsets, where each related subset, quite obviously, contains the same report time, location, station identification, etc. information as the other, but where the "mass" subset contains the pressures and/or height levels at which "mass" data values occur, while the corresponding "wind" subset contains the levels at which "wind" data values occur. While it is true that this may, in some cases, cause the same pressure and/or height level to appear in both subsets, this separation is nonetheless maintained for historical reasons peculiar to NCEP.
I also removed the last line since I originally took it to refer to our
ingest when it's referring to Fortran code displayed on the website.
------------------------------
In src/vxingest/prepbufr_to_cb/README.md
<#395 (comment)>:
> +I'm only putting this here temporarily so that I don't lose it before it gets implemented.
+RUC domain
+RRFS North American domain
+Great Lakes
+Global (all lat/lon)
+Tropics (-20 <= lat <= +20)
+Southern Hemisphere (-80 <= lat < -20)
+Northern Hemisphere (+20 < lat <= +80)
+Arctic (lat >= +70) -- Might want to change this to lat >= 60N to match EMC?
+Antarctic (lat <= -70) -- Might want to change this to lat <= 60S to match EMC?
+Alaska
+Hawaii
+HRRR domain
+Eastern HRRR domain
+Western HRRR domain
+CONUS
+Eastern CONUS (lon <= 100W)
+Western CONUS (lon <= 100W)
+Northeastern CONUS
+Southeastern CONUS
+Central CONUS
+Southern CONUS
+Northwest CONUS
+Southern Plain
Markdown treats single newlines as part of a paragraph so this renders
weirdly. If this is still needed, the following will render better:
⬇️ Suggested change
-I'm only putting this here temporarily so that I don't lose it before it gets implemented.
-RUC domain
-RRFS North American domain
-Great Lakes
-Global (all lat/lon)
-Tropics (-20 <= lat <= +20)
-Southern Hemisphere (-80 <= lat < -20)
-Northern Hemisphere (+20 < lat <= +80)
-Arctic (lat >= +70) -- Might want to change this to lat >= 60N to match EMC?
-Antarctic (lat <= -70) -- Might want to change this to lat <= 60S to match EMC?
-Alaska
-Hawaii
-HRRR domain
-Eastern HRRR domain
-Western HRRR domain
-CONUS
-Eastern CONUS (lon <= 100W)
-Western CONUS (lon <= 100W)
-Northeastern CONUS
-Southeastern CONUS
-Central CONUS
-Southern CONUS
-Northwest CONUS
-Southern Plain
+I'm only putting this here temporarily so that I don't lose it before it gets implemented.
+
+* RUC domain
+* RRFS North American domain
+* Great Lakes
+* Global (all lat/lon)
+* Tropics (-20 <= lat <= +20)
+* Southern Hemisphere (-80 <= lat < -20)
+* Northern Hemisphere (+20 < lat <= +80)
+* Arctic (lat >= +70) -- Might want to change this to lat >= 60N to match EMC?
+* Antarctic (lat <= -70) -- Might want to change this to lat <= 60S to match EMC?
+* Alaska
+* Hawaii
+* HRRR domain
+* Eastern HRRR domain
+* Western HRRR domain
+* CONUS
+* Eastern CONUS (lon <= 100W)
+* Western CONUS (lon <= 100W)
+* Northeastern CONUS
+* Southeastern CONUS
+* Central CONUS
+* Southern CONUS
+* Northwest CONUS
+* Southern Plain
------------------------------
In src/vxingest/prepbufr_to_cb/README.md
<#395 (comment)>:
> +HRRR domain
+Eastern HRRR domain
+Western HRRR domain
+CONUS
+Eastern CONUS (lon <= 100W)
+Western CONUS (lon <= 100W)
+Northeastern CONUS
+Southeastern CONUS
+Central CONUS
+Southern CONUS
+Northwest CONUS
+Southern Plain
+
+## Ingest template
+The ingest template for prepbufr RAOBS is "MD:V01:RAOB:obs:ingest:prepbufr".
+It follows the same small Domain Specific Language (DSL) that all ingest templates follow. This is the template portion...
For my knowledge - do we have the DSL documented somewhere? You've
explained the syntax with *, &, and | but I keep forgetting it, so it'd
be handy to have a reference. I was thinking it'd be nice to have a link to
that reference here.
If not, I'll make an issue to document it in something like a
docs/ingest-dsl.md file and then link to it where appropriate.
------------------------------
In src/vxingest/prepbufr_to_cb/README.md
<#395 (comment)>:
> +There are four sections of mappings.
+1 header basic header data like lat, lon, and station name
+2 q_marker quality data
+3 obs_err observation error data
+4 obs_data_120 observation MASS data
+5 obs_data_220 observation WIND data
This also renders weirdly due to how Markdown handles newlines. If it's an
ordered list, I'd recommend:
⬇️ Suggested change
-There are four sections of mappings.
-1 header basic header data like lat, lon, and station name
-2 q_marker quality data
-3 obs_err observation error data
-4 obs_data_120 observation MASS data
-5 obs_data_220 observation WIND data
+There are four sections of mappings.
+
+1. `header` basic header data like lat, lon, and station name
+2. `q_marker` quality data
+3. `obs_err` observation error data
+4. `obs_data_120` observation MASS data
+5. `obs_data_220` observation WIND data
Otherwise, you could do a code block or a table to preserve the formatting:
⬇️ Suggested change
-There are four sections of mappings.
-1 header basic header data like lat, lon, and station name
-2 q_marker quality data
-3 obs_err observation error data
-4 obs_data_120 observation MASS data
-5 obs_data_220 observation WIND data
+```text
+There are four sections of mappings.
+1 header basic header data like lat, lon, and station name
+2 q_marker quality data
+3 obs_err observation error data
+4 obs_data_120 observation MASS data
+5 obs_data_220 observation WIND data
+```
------------------------------
On src/vxingest/prepbufr_to_cb/run_ingest_threads.py
<#395 (comment)>:
A general note, we'll need to incorporate this into main.py.
We'll import the prepbufr_to_cb module here with an identifiable name
like PrepbufrIngest if this is general to PREPBUFR files, or
PrepbufrRaobIngest if we'll need to add a new module for other PREPBUFR
data types:
https://github.com/NOAA-GSL/VxIngest/blob/b43bb43838716d365eb19d76189d9ae40a4a395b/src/vxingest/main.py#L26-L35
And then, it's not my favorite approach, but we'll need to add an entry to
our large switch statement, here:
https://github.com/NOAA-GSL/VxIngest/blob/b43bb43838716d365eb19d76189d9ae40a4a395b/src/vxingest/main.py#L404-L467
We also may need to update the CLI flags if we have custom flags in the
code here.
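As a rough sketch of what that wiring could look like (all names here are hypothetical stand-ins — the real main.py uses an if/elif chain keyed on the ingest document, not a dict):

```python
# Hypothetical sketch of registering a new ingest type in a dispatch
# table. PrepbufrRaobIngest stands in for whatever prepbufr_to_cb's
# run_ingest_threads module actually exports.

class PrepbufrRaobIngest:
    """Stand-in for the prepbufr_to_cb ingest entry point."""

    def runit(self, args: dict) -> None:
        print(f"ingesting {args['file_pattern']}")


INGEST_DISPATCH = {
    "prepbufr": PrepbufrRaobIngest,
    # "grib2": Grib2Ingest, "netcdf": NetcdfIngest, ...
}


def get_ingest(ingest_type: str):
    """Look up the ingest class for a given ingest document type."""
    try:
        return INGEST_DISPATCH[ingest_type]
    except KeyError:
        raise ValueError(f"unknown ingest type: {ingest_type}") from None


ingest = get_ingest("prepbufr")()
ingest.runit({"file_pattern": "*.gdas.t12z.prepbufr.nr"})
```

A dict lookup like this would also shrink the existing switch statement, though that refactor would belong in its own PR.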
------------------------------
On src/vxingest/prepbufr_to_cb/prepbufr_builder.py
<#395 (comment)>:
Generally, I'm concerned about how deeply nested parts (e.g. -
interpolate_data and handle_document) of the prepbufr_to_cb module are.
Deeply nested code has been shown to be difficult to reason about and debug
and is mentioned in the Zen of Python - "Flat is better than nested."
<https://peps.python.org/pep-0020/> Typically, folks recommend keeping
nesting to 2-3 levels deep. They deal with deeper nesting via guard clauses
and extracting functions. If you'd like to refactor this, I'd find it
really useful to pair with you on it. Selfishly, I'd love to better
understand the ingest code & PREPBUFR data.
Some resources:
- Google's Code Health Blog: Nesting guidance
<https://testing.googleblog.com/2017/06/code-health-reduce-nesting-reduce.html?m=1>
- Google's Code Health Blog: Simplifying Control flow
<https://testing.googleblog.com/2023/10/simplify-your-control-flows.html?m=1>
- Jeff Atwood on "Flattening arrow code"
<https://blog.codinghorror.com/flattening-arrow-code/>
- Two methods for refactoring deeply nested code
<https://shuhanmirza.medium.com/two-simple-methods-to-refactor-deeply-nested-code-78eb302bb0b4>
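A minimal before/after illustration of the guard-clause pattern those links describe (the data and function names here are made up for the example, not taken from the ingest code):

```python
def station_pressures_nested(station):
    # Deeply nested: each check adds an indent level.
    result = []
    if station is not None:
        if "geo" in station:
            for level in station["geo"]:
                if "pressure" in level:
                    result.append(level["pressure"])
    return result


def station_pressures_flat(station):
    # Guard clauses return early, keeping the happy path at one level.
    if station is None or "geo" not in station:
        return []
    return [lvl["pressure"] for lvl in station["geo"] if "pressure" in lvl]


station = {"geo": [{"pressure": 1013.2}, {"elevation": 18.0}]}
assert station_pressures_nested(station) == station_pressures_flat(station)
```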
------------------------------
In src/vxingest/prepbufr_to_cb/prepbufr_builder.py
<#395 (comment)>:
> + # I cannot process this station - there is no array of pressure data
+ del interpolated_data[station]
Do we want to remove the station from the data if it can't be processed, or
do we want to set it to None? del has its uses, but it's rare to see in
Python, so this caught my eye.
------------------------------
In src/vxingest/prepbufr_to_cb/prepbufr_builder.py
<#395 (comment)>:
> + :return: the document_map
+ """
+ try:
+ if len(self.same_time_rows) != 0:
+ self.handle_document()
+ return self.document_map
+ except Exception as _e:
+ logger.exception(
+ "%s get_document_map: Exception in get_document_map: %s",
+ self.__class__.__name__,
+ str(_e),
+ )
+ return None
+
+ # named functions
+ def meterspersecond_to_milesperhour(self, params_dict):
Two thoughts:
1. Can we utilize Pint for these conversions?
2. If not, should we move these "data transformation" functions into a
separate module so we can use them across the various ingest types?
Bigger picture, and more as a discussion point, I wonder if we should make
more use of Pint within the new Ingest to help wrangle units. If we did
want to do that, it'd be handled in a separate issue.
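If we went the shared-module route instead of Pint, the extraction could be as simple as the following sketch (the module name is hypothetical; Pint would replace the hand-written factor with something like `Quantity(x, "m/s").to("mph")`):

```python
# Sketch of a shared conversions module, e.g. a hypothetical
# vxingest/utilities/unit_conversions.py, usable by every builder.

MPS_TO_MPH = 3600.0 / 1609.344  # seconds per hour / meters per mile


def meters_per_second_to_miles_per_hour(mps: float) -> float:
    """Convert a wind speed from meters per second to miles per hour."""
    return mps * MPS_TO_MPH


# 10 m/s is roughly 22.37 mph
assert abs(meters_per_second_to_miles_per_hour(10.0) - 22.3694) < 1e-3
```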
------------------------------
In tests/vxingest/README.md
<#395 (comment)>:
> @@ -40,9 +40,9 @@ Note that this currently (as of 1/2024) disables most of the tests.
## Test data
-For now, you'll need test resources from: https://drive.google.com/drive/folders/18YY74S8w2S0knKQRN-QxZdnfRjKxDN69?usp=drive_link unpacked to `/opt/data` in order to run the test suite.
+For now, you'll need test resources from: [opt_data.tar.gz](https://drive.google.com/drive/folders/18YY74S8w2S0knKQRN-QxZdnfRjKxDN69?usp=drive_linkunpacked) to '/opt/data' in order to run the test suite.
Looks like "unpacked" was added to the URL here and that made it invalid:
⬇️ Suggested change
-For now, you'll need test resources from: [opt_data.tar.gz](https://drive.google.com/drive/folders/18YY74S8w2S0knKQRN-QxZdnfRjKxDN69?usp=drive_linkunpacked) to '/opt/data' in order to run the test suite.
+For now, you'll need test resources from: [opt_data.tar.gz](https://drive.google.com/drive/folders/18YY74S8w2S0knKQRN-QxZdnfRjKxDN69) to '/opt/data' in order to run the test suite.
------------------------------
In tests/vxingest/prepbufr_to_cb/test_unit_prepbufr_builder.py
<#395 (comment)>:
> +def test_read_header(mock_header_bufr):
+ # Create an instance of PrepbufrBuilder
+ builder = PrepbufrRaobsObsBuilderV01(
+ None,
+ {
+ "template": {"subset": "RAOB"},
+ "ingest_document_ids": {},
+ "file_type": "PREPBUFR",
+ "origin_type": "GDAS",
+ "mnemonic_mapping": hdr_template,
+ },
+ )
+
+ # Call the read_header method with the mock bufr object
+ header_data = builder.read_data_from_bufr(mock_header_bufr, hdr_template)
+
+ # Assert the expected values
+ assert header_data["station_id"] == "SID123"
+ assert header_data["lon"] == 45.679
+ assert header_data["lat"] == -123.457
+ assert header_data["obs-cycle_time"] == 0.5
+ assert header_data["station_type"] == 1
+ assert header_data["elevation"] == 100.0
+ assert header_data["report_type"] == 2
I like the way the data is mocked out here!
------------------------------
In tests/vxingest/prepbufr_to_cb/test_int_read_data_from_file.py
<#395 (comment)>:
> +def test_read_header():
+ queue_element = (
+ "/opt/data/prepbufr_to_cb/input_files/241011200.gdas.t12z.prepbufr.nr"
+ )
+ vx_ingest = setup_connection()
+ ingest_doc = vx_ingest.collection.get("MD:V01:RAOB:obs:ingest:prepbufr").content_as[
+ dict
+ ]
+ template = ingest_doc["mnemonic_mapping"]
+ builder = PrepbufrRaobsObsBuilderV01(
+ None,
+ ingest_doc,
+ )
+
+ bufr = ncepbufr.open(queue_element)
+ bufr.advance()
+ assert bufr.msg_type == template["bufr_msg_type"], "Expected ADPUPA message type"
+ bufr.load_subset()
+ header = builder.read_data_from_bufr(bufr, template["header"])
+ bufr.close()
+ assert header is not None
+ assert header["station_id"] == "89571"
+ assert header["lon"] == 77.97
+ assert header["lat"] == -68.58
+ assert header["obs-cycle_time"] == -0.5
+ assert header["elevation"] == 18.0
+ assert header["data_dump_report_type"] == 11.0
+ assert header["report_type"] == 120
We may need to mark these as integration tests for CI with
@pytest.mark.integration():

@pytest.mark.integration()
def test_read_header():
    ...
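If we add that marker, pytest will also want it registered so it doesn't warn about an unknown mark. Assuming our pytest config lives in pyproject.toml (it may already be there), a sketch:

```toml
[tool.pytest.ini_options]
markers = [
    "integration: tests that need Couchbase and /opt/data test files",
]
```

CI could then skip them with `pytest -m "not integration"`.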
------------------------------
In src/vxingest/grib2_to_cb/grib_builder.py
<#395 (comment)>:
> +class RaobModelNativeBuilderV01(GribModelBuilderV01):
+ """This is the builder for model data that is ingested from grib2 NATIVE levels files.
+ It is a concrete builder specifically for the model raob data that are organized based
+ on the models preset vertical levels. This varies quite a bit from model to model
+ and is dependent on the configuration set up before the model runs.
+ This builder is a subclass of the GribModelBuilderV01 class.
+ The primary differences in these two classes are the handlers that derive the pressure level.
+ The pressure level needs to be interpolated according to a specific algorithm.
+
+ Args:
+ load_spec (Object): The load spec used to init the parent
+ ingest_document (Object): the ingest document
+ number_stations (int, optional): the maximum number of stations to process (for debugging). Defaults to sys.maxsize.
We discussed this during an in-person meeting - these are placeholder
classes for extracting model data to compare with the RAOB obs data.
--
Randy Pierce
Co-authored-by: Ian McGinnis <67600557+ian-noaa@users.noreply.github.com>
good catch
Co-authored-by: Ian McGinnis <67600557+ian-noaa@users.noreply.github.com>
…ub.com/NOAA-GSL/VxIngest into implement_prepbufr_RaobsObsBuilder_185
ian-noaa
reviewed
Oct 30, 2024
tests/vxingest/utilities/get_data_for_raobs_from_adpupa_dump.py
Contributor
This PR is stale because it has been open 90 days with no activity.
Collaborator
Author
makes sense.
…On Wed, May 14, 2025 at 11:53 AM Ian McGinnis wrote:
------------------------------
In pyproject.toml
<#395 (comment)>:
> +ncepbufr = [
+ { platform = "linux", markers = "platform_machine == 'x86_64'", file = "./third_party/NCEPLIBS-bufr/wheel_dist/ncepbufr-12.1.0-py312-none-linux_x86_64.whl" },
+ { platform = "darwin", markers = "platform_machine == 'arm64'", file = "./third_party/NCEPLIBS-bufr/wheel_dist/ncepbufr-12.1.0-py312-none-macosx_14_0_arm64.whl" }
+]
I'd probably spin up a VM on ARM Mac, or this could be a good occasion to
explore using the ARM Github Actions runners.
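A rough sketch of the runner option, assuming GitHub-hosted runners (macos-14 and later are Apple Silicon, so an arm64 wheel can be built without a local VM; the job name and step are hypothetical):

```yaml
jobs:
  build-arm-wheel:
    runs-on: macos-14  # Apple Silicon (arm64) hosted runner
    steps:
      - uses: actions/checkout@v4
      - run: ./third_party/NCEPLIBS-bufr/build.sh
```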
Collaborator
Author
I am updating all the things that include that wheel in my feature
branch, I think.
randy
…On Wed, May 14, 2025 at 2:56 PM Ian McGinnis wrote:
------------------------------
On third_party/NCEPLIBS-bufr/build.sh
<#395 (comment)>:
The VxIngest/third_party/NCEPLIBS-bufr/wheel_dist directory only has
ncepbufr-12.0.1-py312-none-*.whl wheels in it - so the include paths in
pyproject.toml will fail if we're trying to include 12.1 or 12.2 in the
code.
I could see git rm-ing the old wheels and adding new ones when we update
to 12.2.
…obsObsBuilder_185
Contributor
This PR is stale because it has been open 90 days with no activity.
Collaborator
Author
This issue is still ongoing; it just got delayed with the vxformsui work.
Contributor
This PR is stale because it has been open 90 days with no activity.
This is the implementation of the prepbufr builder.