A MATLAB-based framework for spatiotemporal analysis and estimation of tropospheric ozone concentrations using Bayesian Maximum Entropy (BME) methods. This codebase processes TOAR-II (Tropospheric Ozone Assessment Report) observational data and performs data fusion with RAMP-corrected chemical transport model (CTM) outputs to generate high-resolution ozone maps.
This framework implements a comprehensive spatiotemporal data fusion system that:
- Processes multi-source ozone data: Integrates TOAR-II observational monitoring data with CTM model outputs
- Applies advanced statistical methods: Uses BME and kriging techniques for optimal spatial estimation
- Handles large-scale datasets: Processes global ozone data with efficient caching and HPC support
- Provides uncertainty quantification: Generates both mean predictions and variance estimates
- Supports multiple estimation strategies: Offers different global offset scenarios, covariance models, and data formats
- Global coverage: Analyzes ozone data across multiple regions (CONUS, Europe, East Asia, etc.)
- Temporal analysis: Monthly time resolution from 2015-2020
- Soft data integration: Incorporates RAMP-corrected CTM model outputs as probabilistic constraints
- Validation framework: Leave-one-out cross-validation (LOOCV) with HPC parallelization
- Diagnostic tools: Comprehensive visualization and quality control utilities
- Multiple kriging formats: Optimized algorithms for vector (STV), grid (STG), and unstructured grid (STUG) data
.
├── README.md # This file
├── README_diagnostic_visualization.md # Diagnostic tools documentation
├── 2softdata/ # RAMP-corrected CTM soft data
│ └── README.md # Soft data integration guide
├── hpc_validation/ # HPC validation system
│ └── README.md # HPC job submission guide
│
├── Core Data Processing
│ ├── getTOARobservationalData.m # Load and QC observational data
│ ├── getTOARglobalOffset.m # Estimate global offset (7 scenarios)
│ ├── getTOARautoCov.m # Fit spatiotemporal covariance model
│ └── getTOARknowledgeBase.m # Create BME knowledge base
│
├── BME Estimation
│ ├── estTOARsBME.m # Main BME spatial estimation
│ ├── krigingME_stg.m # STG format kriging
│ ├── krigingME_stug.m # STUG format kriging (optimized)
│ └── estBMEs_stg.m # Alternative BME interface
│
├── Validation
│ ├── validateTOAR_LOOCV.m # Leave-one-out cross-validation
│ ├── validateTOAR_monthly.m # Monthly validation statistics
│ ├── validateTOARsBME.m # BME estimation validation
│ └── calculateValidationStats.m # Compute performance metrics
│
├── Soft Data System
│ ├── loadRAMPdata.m # Load RAMP-corrected parquet files
│ ├── createSoftDataStructure.m # Create BME-compatible soft data
│ ├── plotSoftData.m # Visualize soft data fields
│ └── subsetSoftData.m # Spatial/temporal subsetting
│
├── Visualization & Diagnostics
│ ├── plotTOARsBME.m # Plot BME mean estimates
│ ├── plotTOARsBMEvar.m # Plot BME variance/uncertainty
│ ├── visualizeBMEdiagnostic.m # Diagnostic tool for artifacts
│ ├── plotTOARvalidation.m # Validation scatter plots
│ └── createTOARanalysisReport.m # Generate analysis reports
│
├── Utilities
│ ├── getTOARmapGrid.m # Generate estimation grids
│ ├── getTOARareaBoundaries.m # Define regional boundaries
│ ├── getCTMspatialGrid.m # Extract CTM model grids
│ ├── extractModelSpatialInfo.m # Process model spatial info
│ ├── parseBMEmethod.m # Parse BME method codes
│ └── BMEmethodType.m # BME method configuration
│
├── Analysis & Exploration
│ ├── analyzeTOAR.m # Data exploration and QC
│ ├── exploreTOARdata.m # Interactive data exploration
│ ├── assessTOARdataQuality.m # Quality assessment
│ └── calculateTOARseasonality.m # Seasonal pattern analysis
│
└── Examples & Tests
├── example_diagnose_vertical_lines.m # Diagnostic workflow example
├── test_softdata_workflow.m # Soft data integration test
└── test_gos.m # Global offset testing
Required Software:
- MATLAB R2019b or later
- Statistics and Machine Learning Toolbox
- Mapping Toolbox (optional, for enhanced visualization)
Required Libraries:
- BMELIB - Bayesian Maximum Entropy library
- Parquet file reader (included in MATLAB R2019a+)
-
Clone the repository:
git clone <repository-url> cd gO3_II_mo
-
Install BMELIB:
# Download and add to MATLAB path git clone https://github.com/wiesnerfriedman/BMELIB.gitIn MATLAB:
addpath(genpath('/path/to/BMELIB')); savepath;
-
Prepare data directories:
% Create required directories mkdir('1data/TOAR-II'); mkdir('1data/CTM'); mkdir('1data/CTM/model_output_data/spatial_grids'); mkdir('2globalOffset'); mkdir('3covariance'); mkdir('4knowledgeBase'); mkdir('5BMEspatialPlots'); mkdir('7validation');
-
Download TOAR-II data: Place TOAR-II monthly MDA8 CSV files in
1data/TOAR-II/:TOAR-II-monthly-mda8-2015.csv TOAR-II-monthly-mda8-2016.csv ... TOAR-II-monthly-mda8-2020.csv -
Prepare CTM model data (optional, for soft data):
- Place RAMP-corrected parquet files in
1data/CTM/ - Run
extractModelSpatialInfo.mto generate spatial grid files
- Place RAMP-corrected parquet files in
Here's a complete example for estimating ozone over Europe in 2016:
%% 1. Load observational data
obs = getTOARobservationalData('all', [2015 2020], 0);
% 'all' = all station types, [2015 2020] = years, 0 = no log transform
%% 2. Estimate global offset
go = getTOARglobalOffset(obs, 3, 1);
% Scenario 3 = regional smoothing (recommended)
% Plot level 1 = basic plots
%% 3. Fit covariance model
cov = getTOARautoCov(obs, go, [], 1);
% Auto-fit exponential covariance model with plotting
%% 4. Create BME knowledge base
BMEmethod = '10000132'; % Hard data only, nhmax=100
[KG, KS, BMEparam] = getTOARknowledgeBase(obs, go, cov, [], BMEmethod);
%% 5. Set up estimation parameters
estParam.areaCode = 2; % Europe
estParam.mapResolution = 1.0; % 1 degree grid
estParam.tkVec = 2016:1/12:2017; % Monthly for 2016
estParam.forceEstimation = 0; % Use cache if available
estParam.plotResults = 1; % Create plots
estParam.keepOnlyLand = 1; % Land points only
estParam.includeAntarctica = 0; % Exclude Antarctica
%% 6. Run BME estimation
estTOARsBME(obs, go, cov, KG, KS, BMEparam, estParam);
% Results saved to: 5BMEspatialPlots/BME10000132_go3_lt0_area2_res1.00_*.matTo incorporate RAMP-corrected CTM model outputs:
%% Load RAMP-corrected CTM data
ctmData = loadRAMPdata('MERRA2-GMI', [2015:2020]);
%% Create soft data structure
options.spatialBounds = [-125 -65 24 50]; % CONUS
options.thinningFactor = 2; % Use every 2nd grid point
softData = createSoftDataStructure(ctmData, obs, options);
%% Create knowledge base with soft data
BMEmethod = '11000142'; % Soft data enabled, nsmax=50
[KG, KS, BMEparam] = getTOARknowledgeBase(obs, go, cov, softData, BMEmethod);
%% Proceed with estimation as above
estTOARsBME(obs, go, cov, KG, KS, BMEparam, estParam);Run leave-one-out cross-validation:
%% Single year validation
validationYears = 2016;
validationMonths = 1:12; % All months
forceEstimation = 0; % Use cache
results = validateTOAR_LOOCV(obs, go, cov, BMEparam, ...
validationYears, validationMonths, forceEstimation);
%% View validation statistics
fprintf('R² = %.3f\n', results.R2);
fprintf('RMSE = %.2f ppb\n', results.RMSE);
fprintf('MAE = %.2f ppb\n', results.MAE);The 8-digit BME method code controls estimation behavior:
Format: ABCDEFGH
A (Estimation Type):
1 = BME probabilities (BMEprobaMoments)
2 = Kriging with measurement error (krigingME)
B (Soft Data):
0 = Hard data only
1 = Hard + soft data (CTM models)
C-E: Reserved (use 000)
F (Hard Data Neighbors):
0 = 0 neighbors
1 = 100 neighbors
2 = 200 neighbors
3 = 300 neighbors
4 = 400 neighbors
G (Soft Data Neighbors):
0 = 0 neighbors
1 = 3 neighbors
2 = 4 neighbors
4 = 50 neighbors
6 = 200 neighbors
H (Probability Type):
1 = BME probabilities
2 = Kriging
Common configurations:
10000132: Hard data only, nhmax=100, kriging (default)11000142: Hard + soft, nhmax=100, nsmax=50, kriging (recommended with CTM)20000132: BME probabilities, hard data only, nhmax=100
| Scenario | Type | Spatial Scale | Temporal Scale | Use Case |
|---|---|---|---|---|
| 0 | Zero | None | None | Testing/debugging |
| 1 | Flat | None | None | Constant offset |
| 2 | Domain-wide | Global (180°) | Long (50 years) | Continental patterns |
| 3 | Regional | Regional (90°) | Medium (20 years) | Recommended |
| 4 | Local | Local (45°) | Short (10 years) | Urban-scale |
| 5 | Regional S/T | Regional (90°) | Short (10 years) | Seasonal regional |
| 6 | Local S/T | Local (45°) | Very short (5 years) | Urban seasonal |
| 7 | Super-local | Very local (10°) | Ultra-short (2 years) | High-resolution |
Recommendation: Start with Scenario 3 (regional) for most applications.
Pre-defined regional boundaries:
| Code | Region | Longitude | Latitude |
|---|---|---|---|
| 0 | Global | -180 to 180 | -60 to 75 |
| 1 | CONUS | -125 to -65 | 24 to 50 |
| 2 | Europe | -12 to 45 | 35 to 72 |
| 3 | East Asia | 95 to 145 | 18 to 55 |
| 4 | South Asia | 60 to 95 | 5 to 40 |
| 5 | Africa | -20 to 52 | -35 to 38 |
| 6 | South America | -82 to -34 | -56 to 13 |
| 7 | Australia | 112 to 154 | -44 to -10 |
| 8 | North America | -168 to -50 | 15 to 75 |
| 9 | Middle East | 25 to 65 | 10 to 42 |
| 10 | Arctic | -180 to 180 | 60 to 75 |
Optimized for regularly-gridded soft data (CTM models):
BMEparam.dataFormat = 'stg';- Best for: Regular CTM model grids
- Performance: Fast for grid-aligned data
- Function:
krigingME_stg.m
Fastest for large uniform grids:
BMEparam.dataFormat = 'stug';- Best for: Large-scale estimation with uniform CTM grids
- Performance: Fastest overall
- Function:
krigingME_stug.m
Standard vector format:
BMEparam.dataFormat = 'stv';- Best for: Hard data only or small datasets
- Performance: Slower but most flexible
- Function:
krigingME.m
If you observe vertical lines or artifacts in BME standard deviation maps:
% Use diagnostic visualization tool
opts.metric = 'std';
opts.display = 'grid';
opts.modelName = 'MERRA2-GMI'; % Overlay model grid
visualizeBMEdiagnostic('5BMEspatialPlots/BME*.mat', opts);See README_diagnostic_visualization.md for detailed troubleshooting.
Common solutions:
- Try different data format:
stg↔stug - Adjust search radius: Increase
BMEparam.dmax - Check covariance parameters
- Verify soft data grid consistency
Run data quality assessment:
assessTOARdataQuality(obs, 1); % 1 = create plotsExplore data interactively:
exploreTOARdata(obs, go);For large-scale validation across multiple scenarios, use the HPC system:
cd hpc_validation
bash submit_all_jobs.sh # Submit LOOCV jobs for all scenarios
bash monitor_jobs.sh watch # Monitor progressAfter completion:
cd hpc_validation
aggregate_results([2 3 6 7], [2016:2019]);See hpc_validation/README.md for details.
The framework computes comprehensive validation statistics:
- R² (R-squared): Variance explained (0-1, higher is better)
- RMSE (Root Mean Square Error): Overall prediction error (ppb, lower is better)
- MAE (Mean Absolute Error): Average absolute error (ppb, lower is better)
- ME (Mean Error): Systematic bias (ppb, closer to 0 is better)
- NMB (Normalized Mean Bias): Percentage bias (%, closer to 0 is better)
- NME (Normalized Mean Error): Percentage error (%, lower is better)
- IOA (Index of Agreement): Overall agreement (0-1, higher is better)
- FAC2 (Factor of 2): Fraction within factor of 2 (0-1, higher is better)
Location: 5BMEspatialPlots/
Filename format:
BME{method}_go{scenario}_lt{logtransf}_area{code}_res{resolution}_{format}_land{flag}_time{YYYY.YY}.mat
Example:
BME10000132_go3_lt0_area2_res1.00_stg_land1_time2016.50.mat
Contents:
sk: Estimation grid coordinates [nGrid × 2]tk: Estimation time (decimal year)XkBMEm: Residual mean estimates [nGrid × 1]XkBMEv: Residual variance estimates [nGrid × 1]gok: Global offset at estimation points [nGrid × 1]YkBMEm: Final predictions (XkBMEm + gok) [nGrid × 1]sMSobs: Observation locations [nObs × 2]Yobs: Observation values [nObs × 1]
Location: 7validation/
Monthly cached results:
TOAR_LOOCV_BME{method}_go{scenario}_y{year}_m{month}.mat
Summary results:
validation_summary_go{scenario}_y{years}.mat
- Use caching: Set
forceEstimation = 0to reuse cached results - Optimize data format: Use
stugfor large grids,stgfor regular CTM grids - Adjust neighbor limits: Balance accuracy vs speed with
nhmaxandnsmax - Spatial thinning: For soft data, use
thinningFactor = 2or higher - Regional subsetting: Limit
spatialBoundsto area of interest - Parallel validation: Use HPC system for multi-scenario validation
If you use this code in your research, please cite:
[Citation information to be added]
- Soft Data Integration: RAMP-corrected CTM model data fusion
- Diagnostic Tools: Troubleshooting BME artifacts
- HPC Validation: Large-scale parallel validation
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with clear documentation
- Submit a pull request
[License information to be added]
For questions or issues:
- Open an issue on GitHub
- [Contact information to be added]
- TOAR-II project for providing observational data
- BMELIB developers for the BME framework
- RAMP team for bias-corrected CTM outputs