
Graphics Maintenance - compatible with latest npl#2

Closed
ahijevyc wants to merge 13 commits into JCSDA:develop from ahijevyc:maintenance/graphics/use_pandas_not_custom_functions


@ahijevyc
Contributor

@ahijevyc ahijevyc commented Mar 5, 2026

TYPE: maintenance

DESCRIPTION OF CHANGES:
This PR provides a comprehensive update to the MPAS-JEDI graphics suite. Key improvements focus on code simplicity, robustness, and compatibility with the latest Python conda environment (npl-2025b).

1. Modernization

  • Pandas Integration: Migrated datetime parsing in AnalyzeStats.py to pd.to_datetime, so first and last cycle times can be given as either '20250610' or '20250610T00'.
  • Vectorized Data Extraction: Rewrote DFWrapper and associated helper functions (dfIndexLevels, uniquevals) to use native Pandas .unique(), .isin(), and boolean masking. This eliminates slow Python loops and drastically reduces memory overhead for large datasets.
  • NumPy Compatibility: Replaced the np.int and np.float aliases (deprecated in NumPy 1.20 and removed in 1.24) with the built-in int and float types, which avoids AttributeError under NumPy 1.24+.
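A minimal sketch of the three items above, assuming cycle times arrive as strings and the statistics live in a DataFrame with a MultiIndex. The helpers parse_cycle_window and select_rows, and the index levels expName, varName, and fcHour, are hypothetical illustrations, not the actual AnalyzeStats.py or DFWrapper code. The sketch parses both cycle formats with pd.to_datetime, applies the firstCycle/lastCycle checks described in the commit notes, selects rows with Index.isin and a single boolean mask instead of a per-row Python loop, and uses the built-in float as a dtype.

```python
import numpy as np
import pandas as pd


def parse_cycle_window(first_cycle=None, last_cycle=None):
    """Hypothetical helper: parse and validate a firstCycle/lastCycle pair."""
    if first_cycle is None:
        return None, None  # no window requested (assumed behavior)
    if last_cycle is None:
        raise ValueError("lastCycle must be defined when firstCycle is")
    # pd.to_datetime accepts both '20250610' and '20250610T00'.
    first = pd.to_datetime(first_cycle)
    last = pd.to_datetime(last_cycle)
    if not first < last:
        raise ValueError(f"firstCycle ({first}) must precede lastCycle ({last})")
    return first, last


def select_rows(df, **criteria):
    """Hypothetical DFWrapper.loc-style selection using one boolean mask."""
    mask = np.ones(len(df), dtype=bool)
    for level, values in criteria.items():
        # A scalar or a list of allowed values per index level; a str stays scalar.
        allowed = values if isinstance(values, (list, tuple, set)) else [values]
        # Index.isin is vectorized, so there is no loop over individual rows.
        mask &= df.index.get_level_values(level).isin(allowed)
    return df[mask]


# Hypothetical index levels standing in for the real statistics database.
index = pd.MultiIndex.from_product(
    [["control", "test"], ["T", "U"], [0, 6]],
    names=["expName", "varName", "fcHour"],
)
# Built-in float as the dtype, in place of the removed np.float alias.
stats = pd.DataFrame({"RMS": np.arange(8, dtype=float)}, index=index)

first, last = parse_cycle_window("20250610", "20250610T06")
subset = select_rows(stats, expName="test", varName=["T", "U"])
print(first, last)
print(subset["RMS"].tolist())  # [4.0, 5.0, 6.0, 7.0]
```

The loop in select_rows runs once per index level in the query, not once per row, which is where the vectorized approach saves time on large tables.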

2. Warning Elimination (Limits & Normalization)

  • Zero-Centered Limiter Logic: Reordered the logic in oneHundredCenteredLimiter and zeroCenteredLimiter to apply safety clamps before performing division. This silences the warning "RuntimeWarning: divide by zero encountered in scalar divide", which occurred when dmin was initially 0.

  • Safe Normalization: Fixed BinValAxes2D.py to explicitly check if normalizingStat is non-zero and finite before calculating relative differences. Gaps in control data now correctly result in NaN rather than inf or runtime warnings.
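The "check before dividing" idea behind the normalization fix can be sketched as follows. This is not the BinValAxes2D.py code: relative_difference is a hypothetical name, and expressing the relative difference as a percentage, 100 * (stat - normalizingStat) / normalizingStat, is an assumption. Entries whose normalizing value is zero or non-finite are never divided, so they come out as NaN without any RuntimeWarning. The limiter reordering is not reproduced here because its exact clamping rules are not described in this PR.

```python
import warnings

import numpy as np


def relative_difference(stat, normalizing_stat):
    """Hypothetical percent difference of stat relative to normalizing_stat.

    Only entries with a finite, non-zero normalizing value are divided;
    all other entries are NaN, so NumPy emits no divide-by-zero or
    invalid-value RuntimeWarning.
    """
    stat = np.asarray(stat, dtype=float)
    norm = np.asarray(normalizing_stat, dtype=float)
    stat, norm = np.broadcast_arrays(stat, norm)
    out = np.full(stat.shape, np.nan)
    # Build the mask first; np.isfinite and != do not warn on NaN input.
    valid = np.isfinite(norm) & (norm != 0.0)
    out[valid] = 100.0 * (stat[valid] - norm[valid]) / norm[valid]
    return out


with warnings.catch_warnings():
    warnings.simplefilter("error")  # any RuntimeWarning would raise here
    rel = relative_difference([2.0, 1.0, 3.0, 4.0], [1.0, 0.0, np.nan, 2.0])
print(rel)
```

A gap in the control data (NaN) and a zero control value both yield NaN rather than inf, matching the behavior described above.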

3. Dependency and Code Cleanup

  • plot_utils Decoupling: Severed dependencies on plot_utils in several core modules. Replaced pu.prepends and pu.postpends with native Python .startswith() and .endswith().
  • String Parsing: Refactored TDelta_dir into a cleaner dictionary-based replacement loop.
  • Path Resilience: Updated analyze_config.py to use os.path.join and generic environment variables ($SCRATCH) instead of hardcoded user paths.
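The cleanup items above rely on standard-library calls, which a short sketch can illustrate. The names tdelta_dirname and scratch_path, the replacement table, and the sample string "omb_RMS_T" are all hypothetical; the real TDelta_dir mapping and analyze_config.py layout are not shown in this PR. Falling back to the current directory when $SCRATCH is unset is also an assumption made only for this sketch.

```python
import os


def tdelta_dirname(tdelta_str, replacements=None):
    """Hypothetical dictionary-driven cleanup of a time-delta label."""
    if replacements is None:
        # Illustrative mapping only; not the actual TDelta_dir table.
        replacements = {" days ": "d_", ":": ""}
    for old, new in replacements.items():
        tdelta_str = tdelta_str.replace(old, new)
    return tdelta_str


def scratch_path(*parts):
    """Build a path under $SCRATCH instead of a hard-coded user directory."""
    # Assumption: use the current directory when SCRATCH is not set.
    root = os.environ.get("SCRATCH", os.getcwd())
    return os.path.join(root, *parts)


name = "omb_RMS_T"
# str.startswith/str.endswith also accept a tuple of candidate affixes.
print(name.startswith("omb_"), name.endswith(("_T", "_U")))  # True True
print(tdelta_dirname("0 days 06:00:00"))  # 0d_060000
```

Because dictionaries preserve insertion order, the replacements are applied in the order they are listed, which matters when one pattern could overlap another.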

Note: This PR removes the requirement for the npl-2023a conda environment; the suite now runs on npl-2025b. Standard workflows no longer emit divide-by-zero or invalid-value warnings. The code is shorter but preserves legacy functionality.

ISSUE:
N/A

LIST OF MODIFIED FILES:

  • graphics/AnalyzeStats.py
  • graphics/analysis/AnalysisBase.py
  • graphics/analysis/BinValAxisStatsComposite.py
  • graphics/analysis/StatisticsDatabase.py
  • graphics/analysis/category/CategoryBinMethodBase.py
  • graphics/analysis/category/FCScoreCard.py
  • graphics/analysis/multidim/BinValAxes2D.py
  • graphics/analysis/multidim/BinValAxisProfile.py
  • graphics/analyze_config.py
  • graphics/basic_plot_functions.py
  • graphics/standalone/plot_diag.py

TESTS CONDUCTED:

  • Compared generated plots side by side with those from the legacy code; all output is identical.
  • Verified that logs are now silent regarding "Divide by zero" and "Invalid value" warnings during standard workflows.
  • Verified execution in the latest NPL environment without legacy environment activation.

ahijevyc added 13 commits March 4, 2026 10:16
is always a group in the h5 file. Sometimes it is DerivedObsError.
Provide a warning to look in the file if a group does not exist.
predictors are plotted. Also make plot_predictors_distri a command line
option. False by default, as usual.
to warn user instead of print statements. May want to use specific logger as done
in scripts in parent graphics dir, but this is a start.
In older versions of NumPy, np.float was just an alias for the Python built-in float.
to parse first and last cycle times. Allows '20250610' and
'20250610T00'.
instead of hard-coding /derecho/scratch/user

Add checks:
  1. if firstCycle time is defined, lastCycle must be too
  2. firstCycle < lastCycle
Overhauled DataFrame Slicing (loc, levels):
Replaced the fragile locTuple and locdf helper functions with native Pandas boolean masking inside DFWrapper.loc().

Replaced self.df.append with pd.concat()

Stripped out custom Python loops in uniquevals, min, max, va, loc1 and
replaced with native Pandas
Fixed insert method of MultipleBinnedStatistics
Don't use isinstance(val, Iterable) to verify that val is an array. It could be a
str, which would then get sliced up. Fix Assertion message (comma inside parentheses).
Remove unneeded copy (val[:]).
and oneHundredCenteredLimiter
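Two of the commit notes above can be illustrated with a short sketch. DataFrame.append was removed in pandas 2.0, so rows are added with pd.concat instead; and because a str is itself an Iterable, an isinstance(val, Iterable) test would treat "RMS" as the sequence "R", "M", "S". The helpers append_rows and as_value_array are hypothetical and only show the pattern, not the actual StatisticsDatabase or MultipleBinnedStatistics code.

```python
from collections.abc import Iterable

import numpy as np
import pandas as pd


def append_rows(df, new_rows):
    # DataFrame.append no longer exists in pandas 2.x; pd.concat replaces it.
    return pd.concat([df, new_rows], ignore_index=True)


def as_value_array(val):
    """Hypothetical normalizer for a scalar-or-array argument.

    A str would pass an Iterable check, so it is treated as a scalar here;
    np.ndim returns 0 for Python scalars and for str.
    """
    if isinstance(val, str) or np.ndim(val) == 0:
        return np.array([val])
    return np.asarray(val)


print(isinstance("RMS", Iterable))  # True: the pitfall noted in the commit

stats = pd.DataFrame({"stat": ["RMS"], "value": [1.5]})
stats = append_rows(stats, pd.DataFrame({"stat": ["Mean"], "value": [0.2]}))
print(stats["stat"].tolist())  # ['RMS', 'Mean']
print(as_value_array("RMS").tolist(), as_value_array([1, 2]).tolist())  # ['RMS'] [1, 2]
```

Collecting new rows and calling pd.concat once is generally cheaper than concatenating inside a loop, since each call copies the existing data.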
@ahijevyc ahijevyc closed this Mar 5, 2026
@ahijevyc
Contributor Author

ahijevyc commented Mar 5, 2026

Sorry. I meant to push to JCSDA-internal, not JCSDA.

@ahijevyc ahijevyc deleted the maintenance/graphics/use_pandas_not_custom_functions branch March 5, 2026 23:50
