Skip to content

BCS processing to support HWSDv2 soil preprocessing and new Peat GPA22#1196

Draft
biljanaorescanin wants to merge 15 commits intodevelopfrom
feature/borescan_peat_and_hwsd
Draft

BCS processing to support HWSDv2 soil preprocessing and new Peat GPA22#1196
biljanaorescanin wants to merge 15 commits intodevelopfrom
feature/borescan_peat_and_hwsd

Conversation

@biljanaorescanin
Copy link
Contributor

@biljanaorescanin biljanaorescanin commented Jan 7, 2026

This PR has two parts.

First, it adds the full HWSDv2 soil preprocessing workflow (STEP1–STEP5), including supporting QA utilities and documentation. This workflow is included in the PR for continued development and testing, but it is not intended to be enabled for production BCS generation at this time, since current testing shows it produces substantially different land/soil outcomes than the existing baseline.

Second, it updates raster BCS soil processing to improve PEATMAP/GPA22-based peat handling in SoilProperties generation. The main production-targeted change in this PR is the strict GPA22 peat logic, which removes problematic HWSD-driven peat assignments and makes peat classification/property assignment internally consistent in the generated tiles.

Production intent in this PR:
retain legacy non-GPA22 behavior unchanged - tested for zero diff to v12 BCs
enable strict GPA22 peat handling through flags - tested with v15 BCs
prevent HWSD from generating peat outside the GPA22 peat support
preserve consistent peat/mineral tile identity during downstream soil parameter repair

Included in this PR:
HWSDv2 STEP1–STEP4 preprocessing (R / Python / Fortran)
STEP5 IDL tile generation (noMASK mode)
PEATMAP/GPA22 updates in SoilProperties generation
QA utilities (STATSGO comparison, plotting, tile checks)
README documenting code sequence, flags, and usage

Current status:
HWSDv2 full preprocessing path:
included for testing/development, not yet intended for production BCSs
PEATMAP/GPA22 updates:
validated for the targeted peat-classification fix and intended for use in the current BCS workflow

More details on the HWSDv2 workflow, inputs, and processing sequence are documented in the README.

NOTEs:
This PR does not switch production BCS generation over to the full HWSDv2 path by default. That path remains available behind flags while the GPA22 peat updates are the primary functional change being advanced here.

We enable use of GPA22 Peat with BCS v15.

This PR is zero diff trivial for GCM since all is preprocessing for BCS. It is also zero diff tested for v12 BCs.

@biljanaorescanin biljanaorescanin added 0 diff trivial The changes in this pull request are trivially zero-diff (documentation, build failure, &c.) Draft labels Jan 7, 2026
@biljanaorescanin
Copy link
Contributor Author

@gmao-rreichle I've tested with last commit BCS for v12 EASE M36 vs what is on develop now is zero diff and new ones are doing what they suppose to. So backward compatibility is preserved.

Copy link
Contributor

@gmao-rreichle gmao-rreichle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@biljanaorescanin : Many thanks for making the processing changes. I looked through the code in some detail but haven't yet had a chance to inspect the new bcs datasets. I'll do that later, but I'm confident that's just a formality since you already checked them.
See below for a few more comments and open questions.
Also, I added one commit that mostly edits the in-code comments but also makes one (probably non-zero-diff) change: 3d03f10?w=1
(The link hides white-space changes.)
We can discuss at the next tag-up.

Comment on lines +3823 to +3824
! In all PEATMAP variants, subsurface orgC must *not* fall into peat class.
! Reassign HWSD subsurface peat to peat-rich mineral Group 3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still wondering if this legacy behavior is what we wanted originally, and if it makes sense for the new processing approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that’s a fair question. In this PR I kept that behavior because the intended rule in the strict GPA22 path was that tile peat should be controlled by mapped top-layer peat support, not created independently by HWSD subsurface OC. So carrying forward the oc_sub cap was a way to prevent HWSD sub-layer peat from reintroducing peat outside the mapped top-layer signal.

In that sense it is internally consistent with the current strict approach, but I agree the implementation is not very transparent, especially since profile peat is then effectively made to follow top peat later in the PEATMAP path. If we want to revisit this more cleanly, I think the right rule would be: top-layer peat support determines whether the tile is peat, and profile follows consistently, rather than relying on capped oc_sub plus later write-time forcing.

For this PR I left the legacy oc_sub handling unchanged to minimize scope, but I’m happy to change it if you want the new processing path to express that rule more explicitly. Just let me know should I change this?

! threshold relative to the full weighted tile support. Otherwise
! choose the dominant mineral orgC class (1–3).
!
if (cFamily(4)/(real(i) + 2.33*real(i)) > PEATMAP_THRESHOLD_2) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments here:

  1. Like for the top layer, for the tile to have o_clp=4 (i.e., profile is peat), the if condition requires that an absolute majority of all raster grid cells that make up the tile is peat, regardless of how many raster grid cells are no-data.

  2. I'm still wondering how we want to handle the peat class for the sub-layer. Early on in the processing, we limit oc_sub to 8/sf (peat-rich mineral class). Then here we compute a weighted average of orgC in the top and sub layers with the goal to determine the orgC profile class. This makes little sense. I guess our choice of oc_top=33/sf for peat raster grid cells ensure that the profile class becomes peat regardless of the value of oc_sub: even when oc_sub =0, we have oc_profile = (0.333+0.70)=9.9 > cF_Lim(4). This works except for the case when the sub-layer orgC is good data but the top layer orgC is missing (e.g., outside the coverage of PEATMAP or GPA22). Then the result is case-dependent.
    Would it make more sense to simply assign the peat class of the tile based on oc_top alone? That is, if the tile has oc_top>cF_Lim(4), then o_cl = 4 and o_clp = 4 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a fair point. In the strict GPA22 path, we already remove sub-layer peat support earlier by capping oc_sub below the peat threshold. That means the weighted profile peat test here is no longer really the mechanism that assigns peat profile class. In practice, peat profile output is currently coming from the later PEATMAP rule that forces written profile peat when the top class is peat. So the intended science is really that top-layer GPA22 peat support determines tile peat, and profile follows consistently.
A cleaner strict-GPA22 implementation would therefore be to set o_clp = 4 whenever the strict top peat condition is met, and otherwise use the weighted profile support only to distinguish mineral classes 1–3.

Is that the behavior we want in the strict GPA22 path? I can change this if it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this if condition:
if (cFamily(4)/(real(i) + 2.33*real(i)) > PEATMAP_THRESHOLD_2)
can ever be true. In the loop above where cFamily is computed, we should never get cFamily>real(i) because oc_sub was earlier capped at cF_lim(4), so the sub layer should never contribute to the sum. After normalization with (real(i) + 2.33*real(i)), we should then always be below PEATMAP_THRESHOLD_2. In other words, we would never get o_clp=4 from this section of the code.
This would then always put us into the "mineral" block of the code below (~Line 4524), and poc_vec would be computed from the organic carbon content values of the contributing mineral raster grid cells.
What I don't understand is how still we end up with soil_class_com=253 even when o_clp<=3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that comment is right.
In the strict GPA22 path, after oc_sub is capped below the peat threshold, the weighted profile peat test can no longer make o_clp = 4.
The reason we still see profile peat in the output is that later in the PEATMAP path the written profile class fac is forced to 253 whenever the top class is peat.
In addition, soil_class_com(n) itself can still become 253 through the strict fallback (soil_class_com < 1 -> soil_class_top) or neighbor fill. So the output is peat/peat, but the current implementation is still mixed.

I think I found fix in my notes and will try that and see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the strict GPA22 path, after oc_sub is capped below the peat threshold, the weighted profile peat test can no longer make o_clp = 4.

Then the if block makes no sense at all.

The reason we still see profile peat in the output is that later in the PEATMAP path the written profile class fac is forced to 253 whenever the top class is peat.

Where is that exactly?

In addition, soil_class_com(n) itself can still become 253 through the strict fallback (soil_class_com < 1 -> soil_class_top) or neighbor fill.

This should not happen very often. But we get %OCProf<8.72 for nearly all peat tiles.

Copy link
Contributor Author

@biljanaorescanin biljanaorescanin Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The place where written profile peat is still being forced is the later PEATMAP output block:

fac_surf = soil_class_top(n)
fac = soil_class_com(n)
if(use_PEATMAP) then
! the maximum peat soil depth is set to the value Michel used to derive parameters (5000.)
if (fac_surf == 253) soildepth(n) = 5000. ! max(soildepth(n),5000.)
! reset subsurface to peat if surface soil type is peat
if (fac_surf == 253) fac = 253
endif

So the final written prof can become peat even when the upstream profile classification path remained on the mineral side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks. But at this point in the code, only the profile soil class is changed, and the associated parameters (e.g., poc_vec [%OCProf]) remain unchanged, even though for peat tiles they were calculated from the mineral raster grid cells.
It will NOT work to add modifications of poc_vec etc here. For backward compatibility, we need to keep the legacy path as is. For "STRICT GPA22", we need more invasive changes further up. It makes no sense to reset oc_sub to mineral, and then diagnose o_clp from that. After the reset, o_clp can never be peat, at least not if it's based on the logic in what's currently in L4347.
My hope is that with the appropriate in-line "if (STRICT GPA)" conditions, we can salvage the overall structure of the code and fix the new logic while maintaining legacy calcs in one subroutine. If we had more time and resources, we'd want to rewrite all of this stuff from scratch in a new subroutine soil_para_hwsd_peat() or some such, but that would almost certainly require more effort than we can afford at this time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last commit should resolve this. I've shared BCs set produced with this commit with you on chat to confirm.

endif

! If the profile class is missing, search "neighboring" tiles for a fill value,
! where "neighbors" are defined in terms of proximity of tile index (not necessarily geographical proximity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that "neighbor" here is simply a neighbor in indexing space, and not necessarily in geographic space. This approach is used in the legacy approach and can't be changed for the legacy approach, but it may be worth thinking about something more geographically meaningful for "STRICT_GPA22" processing. Not a high priority though, and probably doesn't impact many tiles anyway. I'm adding the comment mostly so we can revisit the question later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. In this section, “neighbor” is still defined in index space, not geographic space. For the legacy path that needs to stay unchanged for backward compatibility. In the strict GPA22 path we only tightened the consistency of what gets copied once a donor is chosen, we did not redesign the donor search itself. A geographically based search could be cleaner for strict GPA22, but it would also be more computationally expensive. Since this fill is rare and the current behavior is already much improved, I’d treat that as a follow-up rather than part of this PR. But we can change this if you insist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to revise my earlier response to this question after the follow-up investigation.

The concern about index-space “neighbors” is more relevant to STRICT_GPA22 than I first implied. The raw GPA22 → conservative PEATMAP rasterization checks out, and the first discrepancy appears later, in the PEATMAP → M09 stage. In the strict path, the early donor fill still uses index-space neighbor selection, and once a donor is chosen it can copy peat-defining state (soil_class_top, soil_class_com, pmap) from that donor. So this is now a plausible source of the most suspicious anomaly subset we found.

In particular, the strongest anomaly set is small: 20 tiles where the raw GPA22 tile box is fill-only but M09 still ends up peat; 18 of those 20 are in Maritime SE Asia. That makes the donor-fill behavior worth tightening in STRICT_GPA22.

If we want to resolve this in the PR, I see two options:

Option 1 (minimal fix): keep the legacy donor search, but in STRICT_GPA22 do not allow the fill step to create peat identity from the donor. In practice, that means tightening the strict-path fill so donor-selected values cannot newly assign peat-defining state (soil_class_top, soil_class_com, pmap) to a target tile without direct strict GPA22 / PEATMAP support.

Option 2 (cleaner fix): switch donor selection in STRICT_GPA22 from index-space to geographic-space.

My preference for this PR is Option 1, because it enforces the intended strict GPA22 logic with much less algorithmic risk. Option 2 is conceptually cleaner, but it is a broader redesign and computationally more expensive.

The main location that needs attention is here:

if (PEATMAP_STRICT_GPA22) then
! Strict GPA22:
! borrow only from a neighbor with both top and profile classes
! defined, and copy the associated tile properties consistently.
if (i1 >= 1) then
if ((soil_class_top(i1) >= 1) .and. (soil_class_com(i1) >= 1)) j = i1 ! start with "lower" neighbor unless out of range
endif
if (i2 <= n_land) then
if ((soil_class_top(i2) >= 1) .and. (soil_class_com(i2) >= 1)) j = i2 ! "upper" neighbor prevails unless out of range
endif
if (j > 0) then
soil_class_com(n) = soil_class_com(j)
soil_class_top(n) = soil_class_top(j)
grav_vec(n) = grav_vec(j)
soc_vec(n) = soc_vec(j)
poc_vec(n) = poc_vec(j)
pmap(n) = pmap(j)
soildepth(n) = soildepth(j)
endif
else

I have an idea for the minimal fix and will test it to see whether it resolves this anomaly set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: Unfortunately, after testing, this block does not appear to be responsible for the 20-tile anomaly subset. The strict donor-fill path never triggered in this case, and replacing it did not change the results.

! NGDC : Soil parameters from Reynolds et al. 2000, doi:10.1029/2000WR900130 (MERRA-2, Fortuna, Ganymed, Icarus)
! HWSD : Merged HWSDv1.21-STATSGO2 soil properties on 43200x21600 with Woesten et al. (1999) parameters
! HWSD_b : As in HWSD but with surgical fix of Argentina peatland issue (38S,60W)
! HWSD_c : HWSD updated from v1.2 to v2. And sub and top are D2 layer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of HWSD_c we should call this HWSDv2 for clarity.


logical, public, save :: use_PEATMAP = .false.
logical, public, save :: jpl_height = .false.
logical, public, save :: PEATMAP_STRICT_GPA22 = .false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config variables and the variable names in the code will need some cleanup.

  • We won't need to run the legacy PEATMAP inputs with the "STRICT_GPA22" processing tweaks. It would be cleaner to identify the name of the peat map data product, and have the if conditions based on the product name.
  • The name "PEATMAP" is specific to the legacy peat map, and the name of the "GPA22" map in the published literature is "Global Peat Map 2.0". (I think GPA22 was just a working name during the earlier phase of product development.)
  • So instead of the logical "use_PEATMAP", we may want to have an integer config variables that something like:
  PEAT_INFO =              0 (HWSD), 
                           1 (HWSD and PEATMAP)
                           2 (Global Peat Map 2.0)

E.g., the if condition "if (use_PEATMAP)" would then translate to something like:

select case (PEAT_INFO)
case (0)
   ! use_PEATMAP=.false.
case (1,2)
   ! use_PEATMAP=.true.

etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. The current use_PEATMAP + PEATMAP_STRICT_GPA22 split was mainly introduced to isolate and validate the GPA22 logic while preserving legacy behavior. I agree that the long term configuration would be cleaner if it keyed off the peat information product explicitly rather than overloading the legacy PEATMAP name. I can work on cleaning that up and push a follow-up commit once it is done and retested.

Copy link
Contributor

@gmao-rreichle gmao-rreichle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@biljanaorescanin : I still don't understand how this code works. See comment below

! threshold relative to the full weighted tile support. Otherwise
! choose the dominant mineral orgC class (1–3).
!
if (cFamily(4)/(real(i) + 2.33*real(i)) > PEATMAP_THRESHOLD_2) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this if condition:
if (cFamily(4)/(real(i) + 2.33*real(i)) > PEATMAP_THRESHOLD_2)
can ever be true. In the loop above where cFamily is computed, we should never get cFamily>real(i) because oc_sub was earlier capped at cF_lim(4), so the sub layer should never contribute to the sum. After normalization with (real(i) + 2.33*real(i)), we should then always be below PEATMAP_THRESHOLD_2. In other words, we would never get o_clp=4 from this section of the code.
This would then always put us into the "mineral" block of the code below (~Line 4524), and poc_vec would be computed from the organic carbon content values of the contributing mineral raster grid cells.
What I don't understand is how still we end up with soil_class_com=253 even when o_clp<=3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

0 diff trivial The changes in this pull request are trivially zero-diff (documentation, build failure, &c.) Draft

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants