
Add compression to SBN atmospheric product generation #4649

Open
ChristopherHill-NOAA wants to merge 7 commits into NOAA-EMC:dev/gfs.v17 from ChristopherHill-NOAA:feature/iss4614_v17_SBN-product-vol

Conversation

@ChristopherHill-NOAA
Contributor

Description

NCO requires that the total volume of post-processing products transmitted through the SBN not exceed current operational levels. The products generated in retrospective runs of GFSv17 have marginally exceeded operational levels, so the files must be reduced in size (in lieu of removing some of them outright).

This PR simply adds a compression/packing attribute to the generation of WMO-headed atmospheric products within the script exgfs_atmos_awips_20km_1p0deg.sh, accomplished by revising the -set_grib_type option of WGRIB2 from "same" to "complex2".

This PR is intended to resolve #4614.

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this change expected to change outputs (e.g. value changes to existing outputs, new files stored in COM, files removed from COM, filename changes, additions/subtractions to archives)? YES (If YES, please indicate to which system(s))
    • GFS
    • GEFS
    • SFS
    • GCAFS
  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

The GFS workflow was cloned and built on WCOSS. A CI test only confirmed nominal functionality with the code change, as the products resulting from the low resolution run were too small for the code change to produce the necessary effect.

Offline tests executing a segment of exgfs_atmos_awips_20km_1p0deg.sh with files from products/atmos/grib2/0p25 as input resulted in a reduction to the size of the products.

It is recommended that the changes from this PR be tested in a single-cycle, high-resolution test of GFSv17, to confirm that they reduce SBN product volume to the extent required by NCO.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@JessicaMeixner-NOAA
Contributor

@DavidHuber-NOAA @TravisElless-NOAA - I see that the shfmt scan failed. I've looked into this, but it doesn't appear to be related to this PR. Can you confirm whether that's true?

I have previously run a case with all output on. To avoid having to re-run a forecast, I am copying the COM products folder to original.products. After this completes, I will use the code here to re-run the gfs_awips_20km_1p0deg metatasks to see the effect of these changes on a single-cycle, high-resolution run.

Hopefully we will then finally have the answer we need about the SBN.

Do any files need to be deleted from the COM directory before this re-run for it to succeed? Does anyone see flaws in this test process?

@ChristopherHill-NOAA - You said you ran CI on this for technical testing; do you have the output of that to share?

@DavidHuber-NOAA
Contributor

@JessicaMeixner-NOAA You are correct. It's not related to this PR. I'll open a separate PR to resolve this.

@JessicaMeixner-NOAA
Contributor

Original products can be found in:
/lfs/h2/emc/ptmp/jessica.meixner/comroot/prod01/gfs.20260305/06/original.products

Running with the two line change from Chris, the output can be found at:
/lfs/h2/emc/ptmp/jessica.meixner/comroot/prod01/gfs.20260305/06/products

All gfs_awips_20km_1p0deg were rerun.

@JessicaMeixner-NOAA
Contributor

9.1G original.products/atmos/wmo
9.0G products/atmos/wmo/
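
Because du -h rounds to 0.1G, a byte-level total makes the comparison easier to read. The following is a minimal sketch; dir_bytes is a hypothetical helper, not part of the workflow, and the two directory names are the relative paths quoted above.

```shell
# Hypothetical helper: sum the sizes (in bytes) of all regular files under $1.
dir_bytes() {
    find "$1" -type f -exec wc -c {} + |
        awk '$2 != "total" { sum += $1 } END { printf "%.0f\n", sum }'
}

# Report both directories if they exist in the current working directory.
for d in original.products/atmos/wmo products/atmos/wmo; do
    if [ -d "$d" ]; then
        echo "$d: $(dir_bytes "$d") bytes"
    fi
done
```

The awk filter drops the "total" lines that wc prints when it is given more than one file, so the sum stays correct even if find splits the file list into several batches.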

This at least has trended in the correct direction.

We do need to confirm that the packing was applied and that quality of the output is acceptable.

@ChristopherHill-NOAA
Contributor Author

A spot check of output from the grib_util/bin/degrib2 utility for the original and new products from cycle 2026030506 shows differences only in 1) the Section 0 record size value, and 2) values within the Data Representation (or DRS) template, each of which is described in #4614. The degrib2 output includes minimum, maximum, and average values for each variable, and no differences in these values were observed between the original and new versions of select products.

Sample degrib2 output for original.products/atmos/wmo/grib2.awpgfs_20km_conus_f072, products/atmos/wmo/grib2.awpgfs_20km_conus_f072

Values of the total volume of operational products for 2026030506 from v16.3, original products from v17, and newly generated products from v17 are shown below, and entered in this spreadsheet:

v16.3 9287.98 MB | orig. v17 9423.79 MB | new v17 9356.20 MB
Difference in product volume, new v17 minus original v17: -67.59 MB
Difference in product volume, new v17 minus v16.3: +68.22 MB
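
The two differences can be reproduced from the totals quoted above with a short sketch; vol_diff is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: print ($1 - $2) in MB with an explicit sign.
vol_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.2f\n", a - b }'
}

vol_diff 9356.20 9423.79   # new v17 minus original v17
vol_diff 9356.20 9287.98   # new v17 minus v16.3
```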

The greatest value of v16.3 SBN product volume observed over the past week was 9336.94 MB, from cycle 2026031012.

@JessicaMeixner-NOAA
Contributor

@ChristopherHill-NOAA - does your decrease in volume size also account for the files that are being removed? I believe we were going to have about 70 MB from removed files?

@ChristopherHill-NOAA
Contributor Author

The total volume values include xtrn.awpgfs* files for v16.3, which are scheduled to be removed and are not being generated with v17. Assuming we must reduce the SBN product volume for v17 relative to the same cycle for v16, a further reduction of 70 MB would be needed in the case of 2026030506. If there is instead a hard volume limit (e.g. 9400 MB) to be met, that would be easier to reconcile.

As noted with #4614, I will try different versions of g2lib for potentially reducing product volume.

@JessicaMeixner-NOAA
Contributor

Our goal for SBN is to be neutral or decreased from GFSv16.

@ChristopherHill-NOAA
Contributor Author

As discussed in #4614, the initially committed code change to exgfs_atmos_awips_20km_1p0deg.sh in this PR did not reduce the SBN product volume adequately, which prompted additional action. The latest commit restores -set_grib_type same in the wgrib2 command that generates the product file and adds a separate wgrib2 -set_grib_type complex2 step after product generation.
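
As a rough illustration of the separate repacking step, the sketch below only prints the kind of wgrib2 command that could follow product generation; it does not run it. The helper name repack_cmd, the WGRIB2_BIN variable, and the output file name are assumptions for illustration and are not taken from the committed script.

```shell
# Dry-run sketch: print, rather than execute, a second wgrib2 pass that
# rewrites an already generated product with complex2 packing.
WGRIB2_BIN="wgrib2"   # assumption: the executable name; the script uses ${WGRIB2}

repack_cmd() {
    # $1 = input GRIB2 file, $2 = output GRIB2 file (both names hypothetical)
    printf '%s %s -set_grib_type complex2 -grib_out %s\n' "${WGRIB2_BIN}" "$1" "$2"
}

repack_cmd "awps_file_f024_20km_conus" "awps_file_f024_20km_conus.complex2"
```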

From a spot test of the relevant segment of code with F024 GRIB2 files generated by retrov17_01_realtime run of cycle 2026030506, the original, first commit, and second commit product file sizes (in bytes) are as follows:

product                       |  original | 1st commit | 2nd commit
grib2.awpgfs_20km_ak_f024     |  13975428 |   13905757 |   13787115
grib2.awpgfs_20km_conus_f024  |  21132785 |   20969252 |   20768071
grib2.awpgfs_20km_pac_f024    | 122661267 |  122482405 |  121880110
grib2.awpgfs_20km_prico_f024  |  13642174 |   13013060 |   12893948
grib2.awpgfs024.003           |   6953598 |    6749455 |    6745135

When expanding these results to a hypothetical full forecast range for each product, which is 40 forecast hours for the '003' grid and 54 forecast hours for the other grids, the file size values (in MB) become:

product                       |  original | 1st commit | 2nd commit
grib2.awpgfs_20km_ak_fFFF     |   719.712 |    716.124 |    710.015
grib2.awpgfs_20km_conus_fFFF  |  1088.305 |   1079.883 |   1069.523
grib2.awpgfs_20km_pac_fFFF    |  6316.861 |   6307.649 |   6276.632
grib2.awpgfs_20km_prico_fFFF  |   702.550 |    670.152 |    664.018
grib2.awpgfsFFF.003           |   265.259 |    257.471 |    257.306

total                         |  9092.687 |   9031.280 |   8977.494
difference                    |           |    -61.407 |   -115.193
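
The extrapolation from a single F024 file to a full forecast range can be sketched as below. It assumes 1 MB = 1048576 bytes, which reproduces the tabulated values; full_range_mb is a hypothetical helper name.

```shell
# Hypothetical helper: scale one forecast-hour file size to a full range.
full_range_mb() {
    # $1 = size of one file in bytes, $2 = number of forecast hours
    awk -v bytes="$1" -v hours="$2" \
        'BEGIN { printf "%.3f\n", bytes * hours / 1048576 }'
}

full_range_mb 13975428 54   # grib2.awpgfs_20km_ak, original
full_range_mb 6953598 40    # grib2.awpgfs024.003, original
```

This treats every forecast hour as having the same size as F024, so it is an estimate rather than a measured total.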

A partial workflow test, similar to the one conducted on March 12, may help to confirm the volume reduction for a full cycle of SBN products, which are stored in $COMROOT/{date}/{cycle}/products/atmos/wmo, when incorporating the latest change to exgfs_atmos_awips_20km_1p0deg.sh.

@ChristopherHill-NOAA
Contributor Author

According to degrib2 output, the addition of 'undefined' values for PV-coordinate variables (and cloud-layer PRES) corresponds to an increase in the listed number of data points for the variable, up to the maximum number available for any variable, with no change to the calculated minimum, maximum, or average values.

In the case of the F024 CONUS grid product (grib2.awpgfs_20km_conus_f024), the number of available data points for PV-coordinate variables is increased to the maximum of 94833 (the full 369 x 257 grid).
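
The maximum of 94833 matches the number of points in the CONUS grid defined in the script (gridconus). A small sketch that derives the point count from a wgrib2 -new_grid specification string follows; grid_points is a hypothetical helper, and it assumes the nx and ny values are the second colon-separated field of the second and third tokens, as in the grid strings used by this script.

```shell
# Hypothetical helper: number of grid points in a -new_grid specification.
grid_points() {
    # Intentional word splitting: projection, x-spec, y-spec.
    # shellcheck disable=SC2086
    set -- $1
    nx=$(printf '%s' "$2" | cut -d: -f2)
    ny=$(printf '%s' "$3" | cut -d: -f2)
    echo $((nx * ny))
}

grid_points "lambert:265.0:25.0:25.0 226.541:369:20318.0 12.19:257:20318.0"
```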

The degrib2 output for the F024 sample of original products and new products are available here:

grib2_awpgfs_003_f024_new_degrib2.txt
grib2_awpgfs_003_f024_orig_degrib2.txt
grib2_awpgfs_20km_ak_f024_new_degrib2.txt
grib2_awpgfs_20km_ak_f024_orig_degrib2.txt
grib2_awpgfs_20km_conus_f024_new_degrib2.txt
grib2_awpgfs_20km_conus_f024_orig_degrib2.txt
grib2_awpgfs_20km_pac_f024_new_degrib2.txt
grib2_awpgfs_20km_pac_f024_orig_degrib2.txt
grib2_awpgfs_20km_prico_f024_new_degrib2.txt
grib2_awpgfs_20km_prico_f024_orig_degrib2.txt

ChristopherHill-NOAA and others added 5 commits March 19, 2026 10:05
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@ChristopherHill-NOAA
Contributor Author

@JessicaMeixner-NOAA Apologies for the multiple commits, as I should have made and pushed the changes from WCOSS. The most recent commit of code should be ready for your testing.

@JessicaMeixner-NOAA
Contributor

Updated output is here: /lfs/h2/emc/ptmp/jessica.meixner/comroot/prod01/gfs.20260305/06/products

The original v17 products are in original.products -- I forgot to copy over the first test of this PR, so I do not have that.

@ChristopherHill-NOAA
Contributor Author

The SBN product size spreadsheet is updated to reflect the latest workflow test with the code changes newly committed to this PR. A summary of volume values (MB) for cycle 2026030506 is provided here:

                     |     v16 | v17 original | v17 1st commit | v17 2nd commit
total                | 9287.95 |      9423.90 |        9356.31 |        9304.87
difference from v16  |         |      +135.95 |         +68.36 |         +16.92

In addition to the grib2.awpgfs* files:

  • the volume for v16 includes gfs_collective#.postsnd_{CC} files (178.95 MB), xtrn.awpgfs* files (70.93 MB), gfs_500_hgt_tmp_nh_anl_{CC}.tif (0.11 MB), and tran.fbwnd_pacific.gfs_atmos_fbwind_{CC} (0.01 MB)
  • the volumes for v17 include products/atmos/bufr/gfs_collective#.fil files (178.95 MB), gfs_500_hgt_tmp_nh_anl_{CC}.tif (0.11 MB), and gfs.atmos.t{CC}z.fbwind.pacific.ascii (<0.01 MB)

The difference of +16.92 MB is significantly less than the standard deviation (68.87 MB) of the v16 product volume from 70 recent cycles.
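
For reference, a mean and standard deviation over a list of per-cycle volumes could be computed as sketched below. The helper name volume_stats is hypothetical, the input values are placeholders rather than real cycle volumes, and the sketch uses the sample (n - 1) standard deviation, which may differ from the formula used in the spreadsheet.

```shell
# Hypothetical helper: read one volume (MB) per line on stdin and print
# the mean and the sample standard deviation.
volume_stats() {
    awk 'NF { n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n < 2) exit 1
             mean = sum / n
             var = (sumsq - n * mean * mean) / (n - 1)
             if (var < 0) var = 0   # guard against rounding error
             printf "mean=%.2f sd=%.2f\n", mean, sqrt(var)
         }'
}

# Placeholder input only; real use would feed the tabulated v16 volumes.
printf '%s\n' 1 2 3 4 5 | volume_stats
```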

@ChristopherHill-NOAA
Contributor Author

ChristopherHill-NOAA commented Mar 19, 2026

{reposting revised comment}
The SBN product volume spreadsheet has been updated to include volume size information from the workflow test of code changes newly committed to this PR.

With the latest changes to exgfs_atmos_awips_20km_1p0deg.sh, the total difference of the v17 product volume from the v16 product volume for cycle 2026030506 is now calculated as +16.92 MB, which is significantly less than the standard deviation (68.87 MB) of the values of v16 product volume tabulated over 70 recent cycles.

@ChristopherHill-NOAA
Contributor Author

Following Jessica's successful test of the committed code changes, and following up on Jason's email from this morning, I am reiterating the request for review of this PR.

@JessicaMeixner-NOAA
Contributor

I am running CI tests on WCOSS2. It would be good to have a reviewer from the products team as well. @ChristopherHill-NOAA do you have a suggestion for that?

@ChristopherHill-NOAA
Contributor Author

@JessicaMeixner-NOAA I agree with adding a reviewer from the products team. I am adding @WenMeng-NOAA and @BenjaminBlake-NOAA .

export opt25=":(APCP|ACPCP|PRATE|CPRAT):"
export opt26=' -set_grib_max_bits 25 -fi -if '
export opt27=":(APCP|ACPCP|PRATE|CPRAT|DZDT):"
export opt28=' -new_grid_interpolation budget -fi '

@ChristopherHill-NOAA You might add a new option (opt29) here for switching compression.

Suggested change
export opt28=' -new_grid_interpolation budget -fi '
export opt29='-set_grib_type complex2'

gridconus="lambert:265.0:25.0:25.0 226.541:369:20318.0 12.19:257:20318.0"
# shellcheck disable=SC2086,SC2248
${WGRIB2} tmp_masterfile ${opt1uv} ${opt21} ${opt22} ${opt23} ${opt24} ${opt25} ${opt26} \
${opt27} ${opt28} -new_grid ${gridconus} "awps_file_f${fcsthr}_${GRID}"

@ChristopherHill-NOAA Switching compression can be completed here.

Suggested change
${opt27} ${opt28} ${opt29} -new_grid ${gridconus} "awps_file_f${fcsthr}_${GRID}"

gridak="nps:210.0:60.0 170.0:277:22500 35.0:225:22500"
# shellcheck disable=SC2086,SC2248
${WGRIB2} tmp_masterfile ${opt1uv} ${opt21} ${opt22} ${opt23} ${opt24} ${opt25} ${opt26} \
${opt27} ${opt28} -new_grid ${gridak} "awps_file_f${fcsthr}_${GRID}"

Suggested change
${opt27} ${opt28} ${opt29} -new_grid ${gridak} "awps_file_f${fcsthr}_${GRID}"

gridprico="latlon 271.75:275:0.25 50.75:205:-0.25"
# shellcheck disable=SC2086,SC2248
${WGRIB2} tmp_masterfile ${opt1} ${opt21} ${opt22} ${opt23} ${opt24} ${opt25} ${opt26} \
${opt27} ${opt28} -new_grid ${gridprico} "awps_file_f${fcsthr}_${GRID}"

Suggested change
${opt27} ${opt28} ${opt29} -new_grid ${gridprico} "awps_file_f${fcsthr}_${GRID}"

gridpac="mercator:20.0 110.0:837:20000:270.0 -45.0:725:20000:65.7345"
# shellcheck disable=SC2086,SC2248
${WGRIB2} tmp_masterfile ${opt1} ${opt21} ${opt22} ${opt23} ${opt24} ${opt25} ${opt26} \
${opt27} ${opt28} -new_grid ${gridpac} "awps_file_f${fcsthr}_${GRID}"

Suggested change
${opt27} ${opt28} ${opt29} -new_grid ${gridpac} "awps_file_f${fcsthr}_${GRID}"

grid003="latlon 0:360:1.0 90:181:-1.0"
# shellcheck disable=SC2086,SC2248
${WGRIB2} tmp_masterfile ${opt1} ${opt21} ${opt22} ${opt23} ${opt24} ${opt25} ${opt26} \
${opt27} ${opt28} -new_grid ${grid003} "awps_file_f${fcsthr}_${GRID}"

Suggested change
${opt27} ${opt28} ${opt29} -new_grid ${grid003} "awps_file_f${fcsthr}_${GRID}"

@WenMeng-NOAA
Contributor

@ChristopherHill-NOAA I suggest combining the compression switch with the interpolation process. You might also test the overall runtime, as generating these SBN data files could affect data dissemination latency via SBN in GFS operations.
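
One simple way to compare runtimes between the two approaches is a wall-clock wrapper around each command. This is only a sketch: timed is a hypothetical helper, date +%s gives one-second resolution, and sleep stands in for the real wgrib2 call.

```shell
# Hypothetical helper: run a command and report elapsed seconds on stderr.
timed() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "elapsed_s=$((end - start))" >&2
    return "$rc"
}

timed sleep 1   # placeholder for the product-generation command
```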

@BenjaminBlake-NOAA

@ChristopherHill-NOAA Thanks for your work on this. I agree with @WenMeng-NOAA 's suggestion to combine the -set_grib_type complex2 command with the wgrib2 interpolation step - it should have the same effect as doing the compression in a separate command.
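
A dry-run sketch of the combined form is shown below. It prints the assembled command instead of executing it, omits opt1 through opt28 for brevity, and uses the hypothetical helper name build_cmd. The point is that opt29 must be separated from -new_grid by a space so that wgrib2 receives them as distinct arguments.

```shell
# Dry-run sketch: assemble (and only print) a single-pass command in which
# the packing type is set in the same wgrib2 call as the interpolation.
opt29='-set_grib_type complex2'
gridconus="lambert:265.0:25.0:25.0 226.541:369:20318.0 12.19:257:20318.0"

build_cmd() {
    # $1 = output file name; word splitting of opt29 and gridconus is intended.
    # shellcheck disable=SC2086
    echo wgrib2 tmp_masterfile ${opt29} -new_grid ${gridconus} "$1"
}

build_cmd "awps_file_f024_20km_conus"
```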

@JessicaMeixner-NOAA
Contributor

@ChristopherHill-NOAA - I will re-run after any code changes requested by @WenMeng-NOAA and @BenjaminBlake-NOAA

In the meantime, I wanted to let you know that the regression tests failed for one test due to large gempak log files which was also reported here: #3630 (comment)

I will not finish running the regression tests until the code changes have been confirmed. I'll also re-run the high resolution tests to confirm any code changes are not unintentionally changing output sizes.


Labels

compliance GFS V17 This issue/PR is targeting GFS V17.
