RAMP-Corrected Soft Data for BME Data Fusion

This directory contains RAMP-corrected CTM model outputs formatted as soft data for BME data fusion.

Directory Structure

2softdata/
├── README.md                      # This file
├── softData_UKML_YYYY-YYYY.mat   # Cached soft data structures
├── plots/                         # Visualization plots
│   ├── UKML_mean_yYYYY_mMM.png
│   └── UKML_variance_yYYYY_mMM.png
└── archived/                      # Old versions (if any)

Data Sources

The system requires TWO types of files:

1. Spatial Grid .mat Files

Generated by extractModelSpatialInfo.m from original CSV model outputs.

Located in 1data/CTM/model_output_data/spatial_grids/:

  • Naming: {modelName}_spatial_grid.mat (e.g., M3fusion_spatial_grid.mat)

Contents:

  • lon - Longitude vector [nGrid × 1]
  • lat - Latitude vector [nGrid × 1]
  • nGridPoints - Total number of grid points
  • yearsChecked - Years verified for consistency
  • isConsistent - Boolean flag for spatial consistency

Purpose: Provides spatial grid structure (lon, lat coordinates)

Generation: Run extractModelSpatialInfo.m once to create these files from CSV data
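Before relying on a spatial grid file, it can help to confirm that it carries the fields listed above. The following is a minimal sketch, not the check that loadRAMPdata performs: it writes a tiny synthetic grid file to a temporary location (the two-point grid is purely illustrative) and validates it.

```matlab
% Write a tiny synthetic grid file so the sketch is self-contained.
gridFile = [tempname '.mat'];
lon = [-100; -99];
lat = [40; 40];
nGridPoints = numel(lon);
yearsChecked = 2015:2020;
isConsistent = true;
save(gridFile, 'lon', 'lat', 'nGridPoints', 'yearsChecked', 'isConsistent');

% Load it back and verify the expected fields are present.
g = load(gridFile);
required = {'lon', 'lat', 'nGridPoints', 'yearsChecked', 'isConsistent'};
absent = required(~isfield(g, required));
assert(isempty(absent), 'Grid file is missing fields: %s', strjoin(absent, ', '));

% lon and lat should both have one entry per grid point.
sizesMatch = numel(g.lon) == g.nGridPoints && numel(g.lat) == g.nGridPoints;
if ~g.isConsistent
    warning('Spatial grid is not consistent across all years.');
end
delete(gridFile);
```

For a real file, replace the synthetic part with the path under 1data/CTM/model_output_data/spatial_grids/.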

2. Parquet Files (RAMP-Corrected Values)

Located in 1data/CTM/:

  • lambda1_{model}_{year}_v{version}-parallel.parquet - Mean field
  • lambda2_{model}_{year}_v{version}-parallel.parquet - Variance field

Where:

  • lambda1 = RAMP-corrected mean MDA8 ozone (ppb)
  • lambda2 = RAMP-corrected variance (ppb²)
  • model = Model name (e.g., UKML)
  • version = RAMP calibration version number (e.g., 3, which appears as v3 in the file name)

Parquet File Format:

  • Columns 1-12: Monthly MDA8 values (Jan-Dec)
  • NO spatial information - spatial coordinates come from .mat files
  • Row order must match .mat file grid order (critical assumption)

Data Integration

The workflow:

  1. Read lon, lat from spatial grid .mat file
  2. Read lambda1, lambda2 from parquet files (12 monthly columns)
  3. Match rows assuming same grid order
  4. Create unified structure with sMS ([lon, lat]), Z (mean), Zv (variance)

Important: Run extractModelSpatialInfo.m first to generate spatial grid .mat files!
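The four integration steps can be sketched as follows. This is an illustration under stated assumptions, not the body of loadRAMPdata: the grid and the two tables are synthetic stand-ins for what load and parquetread would return, and only the field names sMS, Z and Zv come from this README.

```matlab
% Synthetic stand-ins for the real inputs (assumption: 4 grid points).
lon = [-100; -99; -100; -99];
lat = [  40;  40;   41;  41];
lambda1 = array2table(40 + rand(4, 12));  % stand-in for parquetread(lambda1File)
lambda2 = array2table( 1 + rand(4, 12));  % stand-in for parquetread(lambda2File)

% The row-order assumption can only be checked by count, not by content.
nGrid = numel(lon);
assert(height(lambda1) == nGrid && height(lambda2) == nGrid, ...
    'Parquet row count must match the spatial grid size.');
assert(width(lambda1) == 12 && width(lambda2) == 12, ...
    'Expected 12 monthly columns (Jan-Dec).');

% Unified structure for one year, using the field names from this README.
s.sMS = [lon, lat];              % [nGrid x 2]
s.Z   = table2array(lambda1);    % [nGrid x 12] mean
s.Zv  = table2array(lambda2);    % [nGrid x 12] variance
```

Because the parquet files carry no coordinates, a row-count check is the only safeguard available here; a reordered grid with the same number of points would pass unnoticed.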

Workflow

1. Load RAMP Data

% Load from parquet files (creates cache on first run)
ctmData = loadRAMPdata('UKML', 2015:2020);

% Force reload from parquet (ignore cache)
ctmData = loadRAMPdata('UKML', 2015:2020, '1data/CTM', 1);

Output structure:

ctmData.lon      % [nGrid × 1] Longitude (degrees)
ctmData.lat      % [nGrid × 1] Latitude (degrees)
ctmData.sMS      % [nGrid × 2] Spatial coordinates as [lon, lat]
ctmData.tME      % [1 × nMonths] Time in decimal years
ctmData.Z        % [nGrid × nMonths] Mean field (lambda1)
ctmData.Zv       % [nGrid × nMonths] Variance field (lambda2)
ctmData.gridInfo % Grid metadata (nGridPoints, yearsChecked, isConsistent)

Caching: Creates 1data/CTM/CTM_RAMP_UKML_2015-2020_v3.mat (~50-500 MB)
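As an illustration of the tME layout, the sketch below builds a monthly time axis in decimal years for 2015-2020. The README does not state which point of the month is used, so the mid-month convention here is an assumption and may differ from what loadRAMPdata produces.

```matlab
years = 2015:2020;

% Month index varies fastest, so columns run Jan 2015, Feb 2015, ..., Dec 2020.
[m, y] = ndgrid(1:12, years);

% Assumed convention: each month is stamped at its midpoint.
tME = reshape(y(:) + (m(:) - 0.5) / 12, 1, []);   % [1 x nMonths]
nMonths = numel(tME);
```

With this ordering, column k of Z and Zv would correspond to tME(k).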

2. Create Soft Data Structure

% Basic usage
softData = createSoftDataStructure(ctmData, obs);

% With spatial subsetting (e.g., CONUS)
options.spatialBounds = [-125 -65 24 50];
softData = createSoftDataStructure(ctmData, obs, options);

% With spatial thinning (every 2nd grid point)
options.thinningFactor = 2;
softData = createSoftDataStructure(ctmData, obs, options);

Output structure:

softData.sMS     % [nPoints × 2] Spatial coordinates as [lon, lat]
softData.lon     % [nPoints × 1] Longitude (degrees, for reference)
softData.lat     % [nPoints × 1] Latitude (degrees, for reference)
softData.tME     % [1 × nMonths] Time vector (aligned with obs)
softData.Z       % [nPoints × nMonths] Mean values
softData.Zv      % [nPoints × nMonths] Variance values

Memory: ~14-140 MB depending on grid resolution and thinning

3. Visualize Soft Data

% Plot mean and variance for January
plotSoftData(softData, obs, 1, 'both');

% Plot mean only for multiple months
plotSoftData(softData, obs, [1 6 12], 'mean');

Output: PNG files in 2softdata/plots/

4. Use in BME

% Specify soft data when creating knowledge base
BMEmethod = '11000132';  % Note: digit 2 = '1' for soft data
[KG, KS, BMEparam] = getTOARknowledgeBase(obs, go, cov, softData, BMEmethod);

% KS.softdata now contains:
%   .p  - [nSoft × 3] coordinates (lon, lat, time)
%   .z  - [nSoft × 1] residual mean values
%   .vs - [nSoft × 1] variance values
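To show how a grid-by-month structure maps onto the point-list layout of KS.softdata, here is a minimal sketch with synthetic values. It only demonstrates the reshaping; it does not subtract the mean trend that makes .z a residual, and it is not taken from getTOARknowledgeBase.

```matlab
sMS = [-100 40; -99 40];            % 2 grid points as [lon, lat]
tME = [2015.04 2015.12 2015.21];    % 3 months (illustrative values)
Z   = [41 42 43; 51 52 53];         % [nPoints x nMonths] mean
Zv  = [ 1  2  3;  4  5  6];         % [nPoints x nMonths] variance

% Enumerate every (point, month) pair; the point index varies fastest,
% which matches the column-major order of Z(:) and Zv(:).
[iS, iT] = ndgrid(1:size(sMS, 1), 1:numel(tME));
p  = [sMS(iS(:), :), reshape(tME(iT(:)), [], 1)];   % [nSoft x 3]
z  = Z(:);                                          % [nSoft x 1]
vs = Zv(:);                                         % [nSoft x 1]
```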

BME Method Codes with Soft Data

Without Soft Data (current):

'10000132'
  ↑
  Digit 2 = 0 (no CTM data)

With Soft Data (new):

'11000132'
  ↑
  Digit 2 = 1 (includes CTM soft data)

Change in BME parameters:

  • nsmax: Maximum soft data neighbors (digit 6)
    • 0 → 0 soft neighbors (hard only)
    • 1 → 3 soft neighbors
    • 2 → 4 soft neighbors
    • 4 → 50 soft neighbors
    • 6 → 200 soft neighbors

Example: '11000242' = hard + soft data; per the mapping above, digit 6 = 2 gives nsmax = 4
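The digit conventions described above can be sketched as a small decoder. This is a hypothetical illustration built only from the digit 2 flag and the digit 6 table in this README; the real parsing lives in getTOARknowledgeBase.m and may handle more digits or other codes.

```matlab
BMEmethod = '11000132';

% Digit 2: '1' means CTM soft data is included.
useSoft = BMEmethod(2) == '1';

% Digit 6: lookup table copied from the list above.
codes  = [0 1 2  4   6];
nsVals = [0 3 4 50 200];
code6  = str2double(BMEmethod(6));
nsmax  = nsVals(codes == code6);   % empty if the code is not in the table

fprintf('useSoft = %d, nsmax = %d\n', useSoft, nsmax);
```

An empty nsmax signals a digit 6 value that the table above does not cover.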

Optimization Options

Spatial Subsetting

Reduce to analysis domain only:

options.spatialBounds = [minLon maxLon minLat maxLat];

Benefit: 50-90% memory reduction for regional studies
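A bounding-box filter of this kind can be sketched with a logical mask. The coordinates below are synthetic, and whether createSoftDataStructure treats the bounds as inclusive is an assumption.

```matlab
sMS = [-130 45; -100 40; -90 30; -60 35; -100 60];   % [lon, lat], synthetic
bounds = [-125 -65 24 50];                           % [minLon maxLon minLat maxLat]

% Keep points inside the box (bounds treated as inclusive here).
keep = sMS(:, 1) >= bounds(1) & sMS(:, 1) <= bounds(2) & ...
       sMS(:, 2) >= bounds(3) & sMS(:, 2) <= bounds(4);
sMSsub = sMS(keep, :);
```

The same mask would be applied to the rows of Z and Zv so that all fields stay aligned.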

Spatial Thinning

Keep every Nth grid point:

options.thinningFactor = 2;  % Every 2nd point

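Benefit: about 75% memory reduction at thinningFactor = 2 (see the File Sizes table), with minimal accuracy loss because BME uses at most nsmax soft neighbors per estimation point anyway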
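One way to thin a regular grid is to keep every Nth point along each axis, which for N = 2 retains a quarter of the points. The sketch below assumes that interpretation and a regular lon/lat grid; the actual rule inside createSoftDataStructure is not documented here and may differ.

```matlab
% Synthetic regular grid: 8 longitudes x 6 latitudes = 48 points.
[lonG, latG] = meshgrid(0:7, 0:5);
lon = lonG(:);
lat = latG(:);
N = 2;   % thinning factor

% Position of each point along the sorted lon and lat axes.
[~, ~, iLon] = unique(lon);
[~, ~, iLat] = unique(lat);

% Keep points whose axis positions are both 1, 1+N, 1+2N, ...
keep = mod(iLon - 1, N) == 0 & mod(iLat - 1, N) == 0;
fractionKept = nnz(keep) / numel(keep);
```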

Remove Hard Data Locations

Avoid double-counting:

options.removeHardData = 1;

Benefit: Prevents soft data at same location/time as hard data
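Removing soft points that coincide with hard data amounts to a tolerance match in space and time. The sketch below uses the 0.01 degree tolerance mentioned in the version history; the time tolerance and the per-axis comparison are assumptions, and all coordinates are synthetic.

```matlab
softP = [-100 40 2015.04; -99 40 2015.04; -100 41 2015.12];   % [lon lat time]
hardP = [-100.005 40.002 2015.04];                            % one station record
tolDeg = 0.01;    % spatial tolerance in degrees
tolT   = 1e-6;    % time tolerance in years (assumed)

isDup = false(size(softP, 1), 1);
for k = 1:size(hardP, 1)
    dLon = abs(softP(:, 1) - hardP(k, 1));
    dLat = abs(softP(:, 2) - hardP(k, 2));
    dT   = abs(softP(:, 3) - hardP(k, 3));
    % A soft point is a duplicate if it matches this hard point on all three axes.
    isDup = isDup | (dLon <= tolDeg & dLat <= tolDeg & dT <= tolT);
end
softKept = softP(~isDup, :);
```

For large grids a loop over hard points may be slow; a spatial index or rounding-based lookup would be a natural replacement.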

File Sizes

Typical sizes for 6 years (2015-2020, 72 months):

Grid Resolution   Points   Raw Cache   Soft Data   With Thinning (×2)
0.1° × 0.1°       64,800   450 MB      140 MB      35 MB
0.25° × 0.25°     10,368   72 MB       22 MB       5.5 MB
0.5° × 0.5°       2,592    18 MB       5.5 MB      1.4 MB

All stored as single precision for efficiency

Quality Control

Automated QC during structure creation:

  • ✓ Negative variance → set to 0.01
  • ✓ Infinite/NaN values → removed
  • ✓ Temporal alignment with obs
  • ✓ Minimum variance threshold (default: 0.01)
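The QC rules listed above can be sketched on a small point list. The order of operations (drop non-finite entries first, then apply the variance floor) is an assumption; the 0.01 floor is the default quoted in the list.

```matlab
z  = [40;  NaN; 42;   43;    Inf];    % synthetic means
vs = [1.5; 2.0; -0.3; 0.001; 1.0];    % synthetic variances
minVar = 0.01;                        % default minimum variance threshold

% Remove entries where either the mean or the variance is NaN or Inf.
valid = isfinite(z) & isfinite(vs);
z  = z(valid);
vs = vs(valid);

% Negative or too-small variances are raised to the floor.
vs(vs < minVar) = minVar;
```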

Expected Impact

Based on typical BME data fusion results:

Dense Observation Regions

  • R² improvement: +0.02 to +0.05
  • RMSE reduction: 1-2 ppb

Sparse Observation Regions

  • R² improvement: +0.10 to +0.20
  • RMSE reduction: 3-8 ppb

Overall (Mixed)

  • R² improvement: +0.05 to +0.10
  • RMSE reduction: 2-4 ppb

Test Script

Run the complete workflow:

test_softdata_workflow

This demonstrates:

  1. Loading RAMP data
  2. Creating soft data structure
  3. Visualization
  4. BME integration
  5. Test estimation

Helper Functions

extractModelSpatialInfo.m

Extracts spatial grids from original CSV model outputs and saves as .mat files.

Run this ONCE before using loadRAMPdata:

% Edit the script to select which models to process
% The script will:
% 1. Read first available CSV file to get reference grid
% 2. Verify consistency across all available years
% 3. Save {modelName}_spatial_grid.mat files

% Run the script
extractModelSpatialInfo

Output files:

1data/CTM/model_output_data/spatial_grids/
├── M3fusion_spatial_grid.mat
├── AM4_spatial_grid.mat
├── CAMS_spatial_grid.mat
└── ...

Each .mat file contains:

  • lon - Longitude coordinates
  • lat - Latitude coordinates
  • nGridPoints - Number of points
  • yearsChecked - Years verified
  • isConsistent - Spatial consistency flag

Troubleshooting

Spatial grid .mat file not found

Error: Spatial grid file not found: 1data/CTM/model_output_data/spatial_grids/M3fusion_spatial_grid.mat
Please run extractModelSpatialInfo.m first to generate spatial grid files.

Solution: Run extractModelSpatialInfo.m to generate spatial grid .mat files from CSV data

Parquet files not found

Error: Lambda1 file not found

Solution: Check file naming convention and path. Expected format:

1data/CTM/lambda1_UKML_2017_v3-parallel.parquet
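To check a path against the naming convention, the expected file names can be built from their parts. This sketch assumes the version is stored as a number; the variable names are illustrative and not those used inside loadRAMPdata.

```matlab
model = 'UKML';
yr = 2017;
rampVersion = 3;                      % appears as "v3" in the file name
dataDir = fullfile('1data', 'CTM');

lambda1Name = sprintf('lambda1_%s_%d_v%d-parallel.parquet', model, yr, rampVersion);
lambda2Name = sprintf('lambda2_%s_%d_v%d-parallel.parquet', model, yr, rampVersion);

% Report which of the two expected files is absent, if any.
names = {lambda1Name, lambda2Name};
for k = 1:numel(names)
    if ~isfile(fullfile(dataDir, names{k}))
        fprintf('Missing: %s\n', fullfile(dataDir, names{k}));
    end
end
```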

Memory issues

Out of memory error during loading

Solutions:

  1. Use spatial thinning: options.thinningFactor = 2
  2. Subset to analysis domain: options.spatialBounds = [...]
  3. Process fewer years at once

Temporal misalignment

Warning: No temporal overlap between obs and CTM

Solution: Check that obs.tME and ctmData.tME have overlapping time periods
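A quick way to diagnose this is to compare the two time axes directly. The sketch below uses synthetic axes in decimal years (the mid-month convention is an assumption) and ismembertol with an absolute tolerance to find which observation months exist in the CTM axis.

```matlab
obsT = 2017 + ((1:24) - 0.5) / 12;   % stand-in for obs.tME, 2017-2018
ctmT = 2015 + ((1:72) - 0.5) / 12;   % stand-in for ctmData.tME, 2015-2020

% Range overlap: the later start must not exceed the earlier end.
hasOverlap = max(obsT(1), ctmT(1)) <= min(obsT(end), ctmT(end));

% Month-by-month match with an absolute tolerance of 1e-6 years.
[isShared, idxInCtm] = ismembertol(obsT, ctmT, 1e-6, 'DataScale', 1);
fprintf('%d of %d obs months found in the CTM axis\n', nnz(isShared), numel(obsT));
```

If hasOverlap is true but few months match, the two axes probably use different conventions for stamping a month (start versus midpoint, for example).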

Grid size mismatch

Error: Lambda1 file has 5000 rows, expected 6480 (grid size)

Solution: Parquet file rows must match spatial grid .mat file size. Check:

  1. Spatial grid .mat file: load and check nGridPoints
  2. Parquet file row count: should match nGridPoints
  3. Ensure parquet files and CSV files used the same grid structure

Spatial consistency warning

Warning: Spatial grid is not consistent across all years. Proceed with caution.

Information: This warning means extractModelSpatialInfo.m detected that some years have different grid structures. The reference grid (first available year) is used. Check the yearsChecked and isConsistent fields in the gridInfo for details.

Version History

  • v1.3 (2025-01-28): Simplified coordinate system

    • Removed Mercator coordinate conversion
    • sMS field now simply [lon, lat] in degrees
    • Updated distance calculations to use a degree-based tolerance (0.01° ≈ 1.1 km in latitude)
    • Changed spaceUnit from 'mercator' to 'degrees'
    • Simplified workflow - no coordinate transformations needed
  • v1.2 (2025-01-28): .mat file integration for spatial coordinates

    • Added extractModelSpatialInfo.m to extract spatial grids from CSV files
    • Updated loadRAMPdata.m to use .mat file spatial grids (not NetCDF)
    • Spatial grids saved as {modelName}_spatial_grid.mat
    • Added spatial consistency checking across years
    • Breaking change: Requires running extractModelSpatialInfo.m first
  • v1.1 (2025-01-28): NetCDF integration for spatial coordinates (deprecated)

    • Added getCTMspatialGrid.m to read lon/lat from NetCDF files
    • Updated loadRAMPdata.m to use NetCDF spatial grid
    • Added Mercator coordinate conversion
    • Fixed createSoftDataStructure.m to use Mercator coordinates
    • Breaking change: Parquet files now have 12 columns (no spatial info)
  • v1.0 (2025-01-15): Initial soft data loading system

    • Parquet file loading with caching
    • Soft data structure creation
    • Visualization tools
    • BME integration

Contact

For questions about:

  • RAMP correction: See RAMP documentation
  • Data format: Check parquet file structure
  • BME integration: See getTOARknowledgeBase.m documentation