Unable to ingest obs4MIPs GPCP-V2.3 #260

Open
bouweandela opened this issue Apr 23, 2025 · 4 comments

@bouweandela
Contributor

bouweandela commented Apr 23, 2025

$ ref datasets ingest --source-type obs4mips /work/bd0854/DATA/ESMValTool2/download/obs4MIPs/GPCP-V2.3/
2025-04-23 15:55:44.242 | WARNING  | cmip_ref.datasets.obs4mips:parse_obs4mips:109 - ['activity_id', 'grid', 'grid_label', 'institution_id', 'nominal_resolution', 'variable_id', 'variant_label'] are missing from the file metadata
/home/b/b381141/src/Climate-REF/climate-ref/.venv/lib/python3.11/site-packages/ecgtools/builder.py:208: UserWarning: Unable to parse 1 assets. A list of these assets can be found in `.invalid_assets` attribute.
  ).clean_dataframe()
2025-04-23 15:55:44.246 | ERROR    | cmip_ref.datasets.obs4mips:find_local_datasets:206 - No datasets found
╭───────────────────────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────────────────────────────────────────────────────────────────╮
│ /home/b/b381141/src/Climate-REF/climate-ref/packages/ref/src/cmip_ref/cli/datasets.py:125 in ingest                                                                                                                                 │
│                                                                                                                                                                                                                                     │
│   122 │   │   logger.error(f"File or directory {file_or_directory} does not exist")                                                                                                                                                 │
│   123 │   │   raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), file_or_directo                                                                                                                                      │
│   124 │                                                                                                                                                                                                                             │
│ ❱ 125 │   data_catalog = adapter.find_local_datasets(file_or_directory)                                                                                                                                                             │
│   126 │   data_catalog = adapter.validate_data_catalog(data_catalog, skip_invalid=skip_invalid                                                                                                                                      │
│   127 │                                                                                                                                                                                                                             │
│   128 │   logger.info(                                                                                                                                                                                                              │
│                                                                                                                                                                                                                                     │
│ ╭──────────────────────────────────────────────────────────────────────────── locals ─────────────────────────────────────────────────────────────────────────────╮                                                                 │
│ │           adapter = <cmip_ref.datasets.obs4mips.Obs4MIPsDatasetAdapter object at 0x7f28cfdd6990>                                                                │                                                                 │
│ │            config = Config(                                                                                                                                     │                                                                 │
│ │                     │   log_level='WARNING',                                                                                                                    │                                                                 │
│ │                     │   paths=PathConfig(                                                                                                                       │                                                                 │
│ │                     │   │   log=PosixPath('/work/bk1088/b381141/cmip_ref/log'),                                                                                 │                                                                 │
│ │                     │   │   scratch=PosixPath('/scratch/b/b381141/cmip_ref/scratch'),                                                                           │                                                                 │
│ │                     │   │   software=PosixPath('/work/bk1088/b381141/cmip_ref/software'),                                                                       │                                                                 │
│ │                     │   │   results=PosixPath('/work/bk1088/b381141/cmip_ref/results'),                                                                         │                                                                 │
│ │                     │   │   dimensions_cv=PosixPath('/home/b/b381141/src/Climate-REF/climate-ref/packages/ref-core/src/cmip_ref_core/pycmec/cv_cmip7_aft.yaml') │                                                                 │
│ │                     │   ),                                                                                                                                      │                                                                 │
│ │                     │   db=DbConfig(database_url='sqlite:////home/b/b381141/.config/cmip_ref/db/cmip_ref.db', run_migrations=True),                             │                                                                 │
│ │                     │   executor=ExecutorConfig(executor='cmip_ref.executor.local.LocalExecutor', config={}),                                                   │                                                                 │
│ │                     │   metric_providers=[                                                                                                                      │                                                                 │
│ │                     │   │   MetricsProviderConfig(provider='cmip_ref_metrics_esmvaltool.provider', config={}),                                                  │                                                                 │
│ │                     │   │   MetricsProviderConfig(provider='cmip_ref_metrics_ilamb.provider', config={}),                                                       │                                                                 │
│ │                     │   │   MetricsProviderConfig(provider='cmip_ref_metrics_pmp.provider', config={})                                                          │                                                                 │
│ │                     │   ]                                                                                                                                       │                                                                 │
│ │                     )                                                                                                                                           │                                                                 │
│ │               ctx = <click.core.Context object at 0x7f28cfd8d250>                                                                                               │                                                                 │
│ │                db = <cmip_ref.database.Database object at 0x7f28d0ca9610>                                                                                       │                                                                 │
│ │           dry_run = False                                                                                                                                       │                                                                 │
│ │ file_or_directory = PosixPath('/work/bd0854/DATA/ESMValTool2/download/obs4MIPs/GPCP-V2.3')                                                                      │                                                                 │
│ │            kwargs = {}                                                                                                                                          │                                                                 │
│ │            n_jobs = None                                                                                                                                        │                                                                 │
│ │      skip_invalid = False                                                                                                                                       │                                                                 │
│ │             solve = False                                                                                                                                       │                                                                 │
│ │       source_type = <SourceDatasetType.obs4MIPs: 'obs4mips'>                                                                                                    │                                                                 │
│ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯                                                                 │
│                                                                                                                                                                                                                                     │
│ /home/b/b381141/src/Climate-REF/climate-ref/packages/ref/src/cmip_ref/datasets/obs4mips.py:207 in find_local_datasets                                                                                                               │
│                                                                                                                                                                                                                                     │
│   204 │   │   datasets = builder.df                                                                                                                                                                                                 │
│   205 │   │   if datasets.empty:                                                                                                                                                                                                    │
│   206 │   │   │   logger.error("No datasets found")                                                                                                                                                                                 │
│ ❱ 207 │   │   │   raise ValueError("No obs4MIPs-compliant datasets found")                                                                                                                                                          │
│   208 │   │                                                                                                                                                                                                                         │
│   209 │   │   # Convert the start_time and end_time columns to datetime objects                                                                                                                                                     │
│   210 │   │   # We don't know the calendar used in the dataset (TODO: Check what ecgtools does                                                                                                                                      │
│                                                                                                                                                                                                                                     │
│ ╭─────────────────────────────────────────────────────────────────────────────────────────────────────── locals ───────────────────────────────────────────────────────────────────────────────────────────────────────╮            │
│ │           builder = Builder(paths=['/work/bd0854/DATA/ESMValTool2/download/obs4MIPs/GPCP-V2.3'], storage_options={}, depth=10, exclude_patterns=[], include_patterns=['*.nc'], joblib_parallel_kwargs={'n_jobs': 1}) │            │
│ │          datasets = Empty DataFrame                                                                                                                                                                                  │            │
│ │                     Columns: []                                                                                                                                                                                      │            │
│ │                     Index: []                                                                                                                                                                                        │            │
│ │ file_or_directory = PosixPath('/work/bd0854/DATA/ESMValTool2/download/obs4MIPs/GPCP-V2.3')                                                                                                                           │            │
│ │              self = <cmip_ref.datasets.obs4mips.Obs4MIPsDatasetAdapter object at 0x7f28cfdd6990>                                                                                                                     │            │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯            │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: No obs4MIPs-compliant datasets found
$ ncdump -h /work/bd0854/DATA/ESMValTool2/download/obs4MIPs/GPCP-V2.3/v20180519/pr_GPCP-SG_L3_v2.3_197901-201710.nc 
netcdf pr_GPCP-SG_L3_v2.3_197901-201710 {
dimensions:
	time = UNLIMITED ; // (466 currently)
	bnds = 2 ;
	lat = 72 ;
	lon = 144 ;
variables:
	double time(time) ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1900-1-1" ;
		time:calendar = "gregorian" ;
		time:axis = "T" ;
		time:long_name = "time" ;
		time:standard_name = "time" ;
	double time_bnds(time, bnds) ;
	double lat(lat) ;
		lat:bounds = "lat_bnds" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
		lat:long_name = "latitude" ;
		lat:standard_name = "latitude" ;
	double lat_bnds(lat, bnds) ;
	double lon(lon) ;
		lon:bounds = "lon_bnds" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
		lon:long_name = "longitude" ;
		lon:standard_name = "longitude" ;
	double lon_bnds(lon, bnds) ;
	float pr(time, lat, lon) ;
		pr:standard_name = "precipitation_flux" ;
		pr:long_name = "Precipitation" ;
		pr:comment = "at surface; includes both liquid and solid phases from all types of clouds (both large-scale and convective)" ;
		pr:units = "kg m-2 s-1" ;
		pr:original_name = "precip" ;
		pr:cell_methods = "time: mean" ;
		pr:cell_measures = "area: areacella" ;
		pr:missing_value = 1.e+20f ;
		pr:_FillValue = 1.e+20f ;
		pr:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_Obs-GPCP_GPCP_r0i0p0.nc areacella: areacella_fx_Obs-GPCP_GPCP_r0i0p0.nc" ;
		pr:history = "2018-02-08T15:25:54Z altered by CMOR: Inverted axis: lat." ;

// global attributes:
		:institution = "NASA Goddard Space Flight Center, Greenbelt MD, USA" ;
		:institute_id = "NASA-GSFC" ;
		:source = "Obs-GPCP (Global Precipitation Climatology Project) v23rB1" ;
		:model_id = "Obs-GPCP" ;
		:contact = "George Huffman ([email protected])" ;
		:references = "Huffman et al. 1997, http://dx.doi.org/10.1175/1520-0477(1997)078<0005:TGPCPG>2.0.CO;2; Adler et al. 2003, http://dx.doi.org/10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2; Huffman et al. 2009, http://dx.doi.org/10.1029/2009GL040000; Adler et al. 2016, Global Precipitation Climatology Project (GPCP) Monthly Analysis: Climate Algorithm Theoretical Basis Document (C-ATBD)" ;
		:tracking_id = "4070c751-6c2d-440f-a4d7-5b325fb98990" ;
		:mip_specs = "CMIP5" ;
		:source_id = "GPCP" ;
		:product = "observations" ;
		:frequency = "mon" ;
		:creation_date = "2018-02-08T15:25:54Z" ;
		:history = "2018-02-08T15:25:54Z CMOR rewrote data to comply with CF standards and CMIP5 requirements." ;
		:Conventions = "CF-1.4" ;
		:project_id = "obs4MIPs" ;
		:table_id = "Table Amon_ana (10 March 2011) 34230b4cbd7bedf38c827d6e41c1b8ea" ;
		:title = "Global Precipitation Climatology Project (GPCP) Climate Data Record (CDR), Monthly V2.3 observation output prepared for obs4MIPs." ;
		:cmor_version = "2.9.1" ;
		:comment = "NOAA Climate Data Record Program for satellites, FY 2011. Global Precipitation Climatology Project (GPCP) Monthly Version 2.3 gridded, merged satellite/gauge precipitation Climate Data Record (CDR) with errors from 1979 to present." ;
		:source_type = "satellite_retrieval_and_gauge_analysis" ;
		:realm = "atmos" ;
		:modeling_realm = "atmos" ;
}
@minxu74
Contributor

minxu74 commented Apr 24, 2025

Where is the data from? The file metadata does not follow the CMIP6 CVs. For example, institute_id should be institution_id.
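
To make the mismatch concrete, here is a minimal Python sketch that checks a set of global attributes against the seven names listed in the parse_obs4mips warning at the top of this issue. It is an illustration only, not how cmip_ref handles the file: the helper names (`normalise_attrs`, `still_missing`) are hypothetical, and the rename table contains only the one pair mentioned in this comment.

```python
# Attribute names reported as missing by the parse_obs4mips warning in this issue.
REPORTED_MISSING = [
    "activity_id", "grid", "grid_label", "institution_id",
    "nominal_resolution", "variable_id", "variant_label",
]

# Legacy -> current attribute names. Only this single pair comes from the
# thread; any further entries would be assumptions.
LEGACY_RENAMES = {"institute_id": "institution_id"}


def normalise_attrs(attrs):
    """Return a copy of attrs with legacy names copied to their current names."""
    out = dict(attrs)
    for old, new in LEGACY_RENAMES.items():
        if old in out and new not in out:
            out[new] = out[old]
    return out


def still_missing(attrs, required=REPORTED_MISSING):
    """List the required attribute names that are absent from attrs."""
    return [name for name in required if name not in attrs]


# A subset of the global attributes shown in the ncdump output of this issue.
file_attrs = {
    "institute_id": "NASA-GSFC",
    "source_id": "GPCP",
    "project_id": "obs4MIPs",
    "frequency": "mon",
    "realm": "atmos",
}

fixed = normalise_attrs(file_attrs)
print("missing before rename:", still_missing(file_attrs))
print("missing after rename: ", still_missing(fixed))
```

Renaming institute_id alone still leaves six of the seven attributes absent, so the file would need more than a single rename to match the newer metadata layout.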

@bouweandela
Contributor Author

It is obs4MIPs data. Here is the ncdump output, this time read directly from the URL:

$ ncdump -h https://dpesgf03.nccs.nasa.gov/thredds/fileServer/obs4MIPs/observations/NASA-GSFC/Obs-GPCP/GPCP/V2.3/atmos/pr/pr_GPCP-SG_L3_v2.3_197901-201710.nc#bytes
netcdf pr_GPCP-SG_L3_v2.3_197901-201710 {
dimensions:
	time = UNLIMITED ; // (466 currently)
	bnds = 2 ;
	lat = 72 ;
	lon = 144 ;
variables:
	double time(time) ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1900-1-1" ;
		time:calendar = "gregorian" ;
		time:axis = "T" ;
		time:long_name = "time" ;
		time:standard_name = "time" ;
	double time_bnds(time, bnds) ;
	double lat(lat) ;
		lat:bounds = "lat_bnds" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
		lat:long_name = "latitude" ;
		lat:standard_name = "latitude" ;
	double lat_bnds(lat, bnds) ;
	double lon(lon) ;
		lon:bounds = "lon_bnds" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
		lon:long_name = "longitude" ;
		lon:standard_name = "longitude" ;
	double lon_bnds(lon, bnds) ;
	float pr(time, lat, lon) ;
		pr:standard_name = "precipitation_flux" ;
		pr:long_name = "Precipitation" ;
		pr:comment = "at surface; includes both liquid and solid phases from all types of clouds (both large-scale and convective)" ;
		pr:units = "kg m-2 s-1" ;
		pr:original_name = "precip" ;
		pr:cell_methods = "time: mean" ;
		pr:cell_measures = "area: areacella" ;
		pr:missing_value = 1.e+20f ;
		pr:_FillValue = 1.e+20f ;
		pr:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_Obs-GPCP_GPCP_r0i0p0.nc areacella: areacella_fx_Obs-GPCP_GPCP_r0i0p0.nc" ;
		pr:history = "2018-02-08T15:25:54Z altered by CMOR: Inverted axis: lat." ;

// global attributes:
		:institution = "NASA Goddard Space Flight Center, Greenbelt MD, USA" ;
		:institute_id = "NASA-GSFC" ;
		:source = "Obs-GPCP (Global Precipitation Climatology Project) v23rB1" ;
		:model_id = "Obs-GPCP" ;
		:contact = "George Huffman ([email protected])" ;
		:references = "Huffman et al. 1997, http://dx.doi.org/10.1175/1520-0477(1997)078<0005:TGPCPG>2.0.CO;2; Adler et al. 2003, http://dx.doi.org/10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2; Huffman et al. 2009, http://dx.doi.org/10.1029/2009GL040000; Adler et al. 2016, Global Precipitation Climatology Project (GPCP) Monthly Analysis: Climate Algorithm Theoretical Basis Document (C-ATBD)" ;
		:tracking_id = "4070c751-6c2d-440f-a4d7-5b325fb98990" ;
		:mip_specs = "CMIP5" ;
		:source_id = "GPCP" ;
		:product = "observations" ;
		:frequency = "mon" ;
		:creation_date = "2018-02-08T15:25:54Z" ;
		:history = "2018-02-08T15:25:54Z CMOR rewrote data to comply with CF standards and CMIP5 requirements." ;
		:Conventions = "CF-1.4" ;
		:project_id = "obs4MIPs" ;
		:table_id = "Table Amon_ana (10 March 2011) 34230b4cbd7bedf38c827d6e41c1b8ea" ;
		:title = "Global Precipitation Climatology Project (GPCP) Climate Data Record (CDR), Monthly V2.3 observation output prepared for obs4MIPs." ;
		:cmor_version = "2.9.1" ;
		:comment = "NOAA Climate Data Record Program for satellites, FY 2011. Global Precipitation Climatology Project (GPCP) Monthly Version 2.3 gridded, merged satellite/gauge precipitation Climate Data Record (CDR) with errors from 1979 to present." ;
		:source_type = "satellite_retrieval_and_gauge_analysis" ;
		:realm = "atmos" ;
		:modeling_realm = "atmos" ;
}

@minxu74
Contributor

minxu74 commented Apr 25, 2025

Hmm. It does not seem to be official obs4MIPs data; even its filename does not follow the conventions. Maybe it is a very old version. I searched the ESGF-1-5 Globus index and found the following records:

> esgf15mms query-globus stage obs4MIPs --order-by _timestamp.desc --time-range TO2025-03-16   --printvar id,data_node,tracking_id --type=File --data_node=dpesgf03.nccs.nasa.gov --limit 10 --id=obs4MIPs.NASA-GSFC.GPCP::like
{"total": 5, "subject": "obs4MIPs.NASA-GSFC.GPCP.atmos.day.v20180519.pr_GPCP-1DD_L3_v1.3_19961001-20161231.nc|dpesgf03.nccs.nasa.gov", "id": "obs4MIPs.NASA-GSFC.GPCP.atmos.day.v20180519.pr_GPCP-1DD_L3_v1.3_19961001-20161231.nc|dpesgf03.nccs.nasa.gov", 
"data_node": "dpesgf03.nccs.nasa.gov", "tracking_id": ["722b1cf4-411d-46d1-a540-ddebfb38c69e"]}
{"total": 5, "subject": "obs4MIPs.NASA-GSFC.GPCP.atmos.mon.v20180518.pr_GPCP-SG_L3_v2.2_197901-201312.nc|dpesgf03.nccs.nasa.gov", "id": "obs4MIPs.NASA-GSFC.GPCP.atmos.mon.v20180518.pr_GPCP-SG_L3_v2.2_197901-201312.nc|dpesgf03.nccs.nasa.gov", 
"data_node": "dpesgf03.nccs.nasa.gov", "tracking_id": ["e75db178-c222-4849-9ccf-c463fbbe7774"]}
{"total": 5, "subject": "obs4MIPs.NASA-GSFC.GPCP.atmos.mon.v20180518.prStderr_GPCP-SG_L3_v2.2_197901-201312.nc|dpesgf03.nccs.nasa.gov", "id": 
"obs4MIPs.NASA-GSFC.GPCP.atmos.mon.v20180518.prStderr_GPCP-SG_L3_v2.2_197901-201312.nc|dpesgf03.nccs.nasa.gov", "data_node": "dpesgf03.nccs.nasa.gov", "tracking_id": ["09a7bdaa-33b0-4a40-9e8e-7ab59cfbebef"]}
{"total": 5, "subject": "obs4MIPs.NASA-GSFC.GPCP.atmos.day.v20180518.pr_GPCP-1DD_L3_v1.2_19961001-20110630.nc|dpesgf03.nccs.nasa.gov", "id": "obs4MIPs.NASA-GSFC.GPCP.atmos.day.v20180518.pr_GPCP-1DD_L3_v1.2_19961001-20110630.nc|dpesgf03.nccs.nasa.gov", 
"data_node": "dpesgf03.nccs.nasa.gov", "tracking_id": ["66a6d983-a8d9-412d-9de8-34424c2adcbe"]}
{"total": 5, "subject": "obs4MIPs.NASA-GSFC.GPCP.atmos.mon.v20180519.pr_GPCP-SG_L3_v2.3_197901-201710.nc|dpesgf03.nccs.nasa.gov", "id": "obs4MIPs.NASA-GSFC.GPCP.atmos.mon.v20180519.pr_GPCP-SG_L3_v2.3_197901-201710.nc|dpesgf03.nccs.nasa.gov", 
"data_node": "dpesgf03.nccs.nasa.gov", "tracking_id": ["4070c751-6c2d-440f-a4d7-5b325fb98990"]}

The last one looks very similar to yours. Maybe you can use that one?

Never mind, I just found that the URL of the last one points to your link above, so they are exactly the same file.
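
The same conclusion can be reached from the metadata alone by comparing the tracking_id global attribute in the ncdump output with the tracking_id field of the index records. The following is a small Python sketch of that comparison; the function name `find_by_tracking_id` is hypothetical, and the two records are copied from the query output above, trimmed to the fields used and joined onto one line each.

```python
import json

# tracking_id global attribute from the ncdump output earlier in this issue.
local_tracking_id = "4070c751-6c2d-440f-a4d7-5b325fb98990"

# Two of the index records quoted above, one JSON object per line.
records_jsonl = """\
{"id": "obs4MIPs.NASA-GSFC.GPCP.atmos.mon.v20180518.pr_GPCP-SG_L3_v2.2_197901-201312.nc|dpesgf03.nccs.nasa.gov", "tracking_id": ["e75db178-c222-4849-9ccf-c463fbbe7774"]}
{"id": "obs4MIPs.NASA-GSFC.GPCP.atmos.mon.v20180519.pr_GPCP-SG_L3_v2.3_197901-201710.nc|dpesgf03.nccs.nasa.gov", "tracking_id": ["4070c751-6c2d-440f-a4d7-5b325fb98990"]}
"""


def find_by_tracking_id(lines, tracking_id):
    """Return the ids of all records whose tracking_id list contains tracking_id."""
    matches = []
    for line in lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Records without a tracking_id field simply never match.
        if tracking_id in record.get("tracking_id", []):
            matches.append(record["id"])
    return matches


matches = find_by_tracking_id(records_jsonl, local_tracking_id)
print(matches)
```

Only the v2.3 monthly record matches, which is consistent with the local file and the indexed file being the same.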

Some newer versions of the GPCP data are available. I do not know whether they follow the conventions or whether you can use them.

{"total": 9, "subject": "obs4MIPs.NASA-GSFC.GPCP-Monthly-3-2.mon.pr.gn.v20231205|eagle.alcf.anl.gov", "id": "obs4MIPs.NASA-GSFC.GPCP-Monthly-3-2.mon.pr.gn.v20231205|eagle.alcf.anl.gov", "data_node": "eagle.alcf.anl.gov"}
{"total": 9, "subject": "obs4MIPs.NASA-GSFC.GPCP-Daily-3-2.mon.pr.gn.v20231205|eagle.alcf.anl.gov", "id": "obs4MIPs.NASA-GSFC.GPCP-Daily-3-2.mon.pr.gn.v20231205|eagle.alcf.anl.gov", "data_node": "eagle.alcf.anl.gov"}
{"total": 9, "subject": "obs4MIPs.NASA-GSFC.GPCP-Monthly-3-2.mon.pr.gn.v20231205|esgf-data2.llnl.gov", "id": "obs4MIPs.NASA-GSFC.GPCP-Monthly-3-2.mon.pr.gn.v20231205|esgf-data2.llnl.gov", "data_node": "esgf-data2.llnl.gov"}
{"total": 9, "subject": "obs4MIPs.NASA-GSFC.GPCP-Daily-3-2.mon.pr.gn.v20231205|esgf-data2.llnl.gov", "id": "obs4MIPs.NASA-GSFC.GPCP-Daily-3-2.mon.pr.gn.v20231205|esgf-data2.llnl.gov", "data_node": "esgf-data2.llnl.gov"}
{"total": 9, "subject": "obs4MIPs.NASA-GSFC.GPCPMON-3-1.atmos.mon.pr.v20200831|esgf-node.ornl.gov", "id": "obs4MIPs.NASA-GSFC.GPCPMON-3-1.atmos.mon.pr.v20200831|esgf-node.ornl.gov", "data_node": "esgf-node.ornl.gov", "tracking_id": 
["hdl:21.14102/902f4f3f-977b-42cc-b469-dd0647415a1d"]}
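
The five records above describe three distinct datasets, two of which appear on two data nodes. A short Python sketch that groups them, assuming (as the records suggest) that each "id" has the form dataset id, then "|", then data node; the function name `group_by_dataset` is hypothetical:

```python
from collections import defaultdict

# The "id" fields of the five records listed above.
record_ids = [
    "obs4MIPs.NASA-GSFC.GPCP-Monthly-3-2.mon.pr.gn.v20231205|eagle.alcf.anl.gov",
    "obs4MIPs.NASA-GSFC.GPCP-Daily-3-2.mon.pr.gn.v20231205|eagle.alcf.anl.gov",
    "obs4MIPs.NASA-GSFC.GPCP-Monthly-3-2.mon.pr.gn.v20231205|esgf-data2.llnl.gov",
    "obs4MIPs.NASA-GSFC.GPCP-Daily-3-2.mon.pr.gn.v20231205|esgf-data2.llnl.gov",
    "obs4MIPs.NASA-GSFC.GPCPMON-3-1.atmos.mon.pr.v20200831|esgf-node.ornl.gov",
]


def group_by_dataset(ids):
    """Map each dataset id to the list of data nodes that hold it."""
    nodes = defaultdict(list)
    for record_id in ids:
        # Split on the first "|": left is the dataset id, right is the node.
        dataset_id, _, data_node = record_id.partition("|")
        nodes[dataset_id].append(data_node)
    return dict(nodes)


replicas = group_by_dataset(record_ids)
for dataset_id, data_nodes in replicas.items():
    print(dataset_id, "->", ", ".join(data_nodes))
```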

@bouweandela
Contributor Author

We're unable to use the newer versions due to SciTools/iris#6411 at the moment.
