Updating section names to more useful categories, alphabetizing names within each section #140

Open
mkavulich wants to merge 26 commits into ESCOMP:main from mkavulich:feature/update_sections

Conversation

@mkavulich
Collaborator

Description

This PR reorganizes the existing standard names (with no changes, except some updated descriptions) into a new section hierarchy that removes references to specific modeling systems (specifically GFS typedefs). The new sections mostly describe how the variable is used, and I attempted to make the sections as generic as possible. With this type of natural language organization I believe it's impossible to unambiguously assign certain variables to certain sections, but I have attempted to keep things as organized as possible. The new sections are in bold below.

  • Base names
    • Generic names
    • Chemical species
    • Base standard Names
  • Dimensions
  • Constants
  • Coordinates
  • Timing: Variables defining or relating to timing, dates, calendar, and related concepts
  • Atmospheric properties
  • Marine
  • Tracers: Tracers are numerically zero-mass particles advected in fluid flow, typically representing some trace gas, particle, or other physical substance
  • Atmospheric composition
    • Gasses
    • Precipitation, cloud, and hydrometeor variables
    • Aerosols
    • Emissions: Emissions variables, contributed for the Community Emissions Data System (CEDS)
  • Application-specific variables
    • Required CCPP framework-provided variables
    • Optional CCPP framework-provided variables
  • System variables
  • Control variables: Variables that indicate or control some action
  • Indices: Values indicating the index of some array or other data structure
  • Coefficients: Coefficients include scaling factors, tunable parameters, and other similar variables
  • Thresholds: Thresholds represent some value at which the behavior of some process changes, including maximums and minimums
  • Stochastic physics variables
  • Radiation
  • Atmospheric surface and boundary layer
  • Land surface, subsurface, and vegetation properties
  • Convective physics parameters
  • Gravity wave drag parameters
  • Tendencies
  • Chemistry processes

I am very open to feedback about changing these specific section names, so please review away. I tried to keep the sections as generic as possible, avoiding references to specific models or types of parameterization, but it wasn't always possible from my point of view.

Within each section, standard names are now alphabetized to give a more logical and unambiguous sorting. This was accomplished with a new tool, tools/sort_standard_names.py, written by Claude Code running locally with gpt-oss:20b. I also added another Claude-Code-written tool, tools/list_names.py, that gives a monolithic alphabetized list of all standard names; I used this to ensure that no names were accidentally lost in the reorganization. I have thoroughly reviewed the Claude-generated scripts, and attest that I understand and approve of their functionality.
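As an illustration only (not the contents of tools/sort_standard_names.py or tools/list_names.py), the two steps described above could be sketched as follows. The element and attribute names (`section`, `standard_name`, `name`), the sample entries, and the helper names `all_names` and `sort_sections` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature registry; tag and attribute names are assumptions.
SAMPLE = """<standard_names>
  <section name="Constants">
    <standard_name name="pi"/>
    <standard_name name="avogadro_number"/>
  </section>
  <section name="Marine">
    <standard_name name="sea_surface_temperature"/>
    <standard_name name="ocean_mixed_layer_thickness"/>
  </section>
</standard_names>"""


def all_names(root):
    """Return one alphabetized list of every standard name in the tree."""
    return sorted(el.get("name") for el in root.iter("standard_name"))


def sort_sections(root):
    """Alphabetize the standard_name children of each section in place.

    Other children (for example nested sections) keep their positions.
    """
    for section in list(root.iter("section")):
        slots = [i for i, child in enumerate(section) if child.tag == "standard_name"]
        ordered = sorted((section[i] for i in slots), key=lambda el: el.get("name"))
        for i, el in zip(slots, ordered):
            section[i] = el


root = ET.fromstring(SAMPLE)
before = all_names(root)   # flat list prior to sorting
sort_sections(root)
after = all_names(root)    # flat list after sorting
per_section = {
    s.get("name"): [e.get("name") for e in s.findall("standard_name")]
    for s in root.iter("section")
}
```

Comparing `before` and `after` is the same kind of sanity check as the monolithic list: if the two flat lists are equal, no name was lost or duplicated while entries were moved.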

I have integrated the alphabetization check into the GitHub CI, and added a rule about this alphabetization to the Rules document.
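A check of this kind might look like the sketch below. It is not the script added in this PR: the function names `unsorted_sections` and `main`, and the assumed XML layout (`section` elements with `standard_name` children carrying a `name` attribute), are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def unsorted_sections(root):
    """Return the names of sections whose direct standard_name children are not alphabetized."""
    bad = []
    for section in root.iter("section"):
        names = [el.get("name") for el in section.findall("standard_name")]
        if names != sorted(names):
            bad.append(section.get("name"))
    return bad


def main(path):
    """Report unsorted sections on stderr; return 1 if any were found, else 0."""
    bad = unsorted_sections(ET.parse(path).getroot())
    for name in bad:
        print(f"Section not alphabetized: {name}", file=sys.stderr)
    return 1 if bad else 0
```

A CI step could pass the return value of `main` to `sys.exit`, so that a nonzero status fails the job whenever a section is out of order.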

Because the alphabetization is maintained by a tool, it does constrain the formatting and indentation of the XML. I believe this is a fine tradeoff, since the Metadata files are designed to be human-readable and it's a lesser concern for the XML. But I'm open to feedback on this.

Finally, there was a lot of text in the comment of the Dimensions section that was specific to CCPP; I have removed this text and added it to the CCPP technical documentation (NCAR/ccpp-doc#80).

Issues

Resolves #135

 - New sections "timing" and "stochastic physics"
 - Continue populating dimensions, coordinates, system variables
 - Rename "state_variables" --> "atmospheric properties"
 - Delete and reallocate "diagnostics" section
 - Rearrange atmospheric_composition into subsections
 - New "radiation" section
 - Continuing to depopulate bad "GFS_typedef" sections
 - Fix some indentation
 - Rename "precipitation and hydrometeors" to "precipitation, cloud, and hydrometeor variables"
 - New section "control variables"; move all "do_" prefix variables here
…adding more parameterization-specific sections

 - New sections "Convective physics parameters", "Gravity wave drag parameters"
 - Merged the two different Aerosol sections; those that are model-specific added to description
 - Merged "Land and water surface properties" into "Land surface, subsurface, and vegetation properties"
 - Added an "Other" section for now, I hope to clean this up into more discrete categories going forward
…; can be used for comparisons after reorganization
 - Update CI tests to consistently call python scripts with python3
 - Add execution permissions to python scripts
run: |
tools/check_xml_unique.py standard_names.xml
tools/check_xml_unique.py standard_names.xml --field="description"
python3 tools/check_xml_unique.py standard_names.xml
Collaborator

It doesn't hurt, but why do we need this?

All of the scripts have

#!/usr/bin/env python3

in the shebang.

Collaborator Author

I just wanted to go for consistency, but you're right, it's unnecessary, so I've removed it from all script calls.

from pathlib import Path

try:
    from lxml import etree
Collaborator

Is there a reason we can't use functionality in lib/xml_tools.py, or at least the same Python XML libraries? Why install another, potentially redundant lxml library?

import xml.etree.ElementTree as ET

Collaborator Author

I thought of this, but for some reason I thought the built-in XML library couldn't output these nicely formatted and indented XML files. Turns out the built-in is actually better, fixing a few indent problems I noticed.
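For reference, the standard library can pretty-print a tree with `xml.etree.ElementTree.indent`, available since Python 3.9. The snippet below is a minimal sketch with a made-up one-entry tree; it is not taken from the scripts in this PR, and the tag names are assumptions.

```python
import xml.etree.ElementTree as ET

# Build a tiny hypothetical tree with no whitespace between elements.
root = ET.Element("standard_names")
section = ET.SubElement(root, "section", name="Marine")
ET.SubElement(section, "standard_name", name="sea_surface_temperature")

# indent() rewrites text/tail whitespace in place; requires Python 3.9 or newer.
ET.indent(root, space="  ")
text = ET.tostring(root, encoding="unicode")
print(text)
```

Because `indent` normalizes whitespace for the whole tree, it also evens out entries that were previously indented inconsistently.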

- name: Marine
  comment: null
  standard_names:
  - name: derivative_of_diurnal_thermocline_layer_thickness_wrt_surface_skin_temperature
Collaborator

I noticed that for the atmospheric variables, the surface and boundary levels (often used for coupling?) are in a separate category from the atmospheric properties. Should we have a similar structure for Marine variables?

Collaborator Author

I'm not opposed to further categorization, but I personally don't have a good sense of how ocean variables might be binned in this way. It seems to me that in all our current contexts (being atmosphere-centric), ocean modeling deals with just the surface and boundary layers, with nothing really done below that, so it might be redundant? I'm not sure to be honest.

mkavulich and others added 4 commits March 12, 2026 14:32
…output these well-formatted XMLs with just the standard libraries. It even fixes some indent problems with the original script
Wording change from Dom

Co-authored-by: Dom Heinzeller <dom.heinzeller@icloud.com>
Collaborator

climbfuji left a comment

Thanks very much for addressing my comments. This is a lot nicer now.

I am happy with the proposed sections.


Development

Successfully merging this pull request may close these issues.

Need a better organization for the naming sections

3 participants