Description
What happened?
Hi everyone,
Excited to report my first bug! I have been creating some grouped test netCDF4 files for unit testing our internal repository. I started getting segmentation faults when I added variables of a string datatype. This only happens with engine='netCDF4'. When you change the engine to 'h5netcdf' there are no segmentation faults. We think this has something to do with the netcdf-c library. However, I have only been able to replicate this issue with open_datatree(), with the engine set to the default netCDF4 library, and not with nc4.Dataset() or xr.open_dataset(). My colleague @lsterzinger has been getting segmentation faults with all three of these methods and will elaborate on this thread.
We've been able to narrow this down to a problem with data variables of a non-numerical datatype, by creating netCDF4 files with variables of a string datatype, np.dtype('<U4'). open_datatree() seg faults after the fourth call (see example below). I have not been able to replicate segmentation faults for netCDF4 files without string data variables, even with a thousand calls to open_datatree(), or with the engine set to 'h5netcdf' for datasets with string variables.
To replicate this error:
In [9]: max_retry = 0
   ...: while max_retry < 15:
   ...:     oco2_tree = open_datatree(
   ...:         "./downloads/OCO2_L2_Lite_SIF.11r/oco2_LtSIF_220101_B11012Ar_220627180315s.nc4"
   ...:     )
   ...:     max_retry += 1
   ...:     print(max_retry)
1
2
3
4
Segmentation fault
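Since a segmentation fault kills the whole interpreter (and, under pytest, the whole test session), one way to compare engines or count how many calls succeed is to run the loop in a child process and inspect its exit status. The sketch below uses only the standard library; run_isolated and REPRO_SNIPPET are illustrative names, the file path in the snippet is a placeholder, and the negative-return-code check assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Placeholder reproduction script; the path is hypothetical and must be replaced.
REPRO_SNIPPET = """
from datatree import open_datatree
for i in range(15):
    open_datatree("granule.nc4")
    print(i + 1)
"""


def run_isolated(snippet, timeout=300):
    """Run `snippet` in a fresh Python process and report how it ended.

    Returns (returncode, crashed, stdout), where `crashed` is True if the
    child process was killed by SIGSEGV.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code is the number of the signal
    # that terminated the child process.
    crashed = proc.returncode == -signal.SIGSEGV
    return proc.returncode, crashed, proc.stdout


# Example call (not executed here):
# code, crashed, out = run_isolated(REPRO_SNIPPET)
```

The captured stdout shows how many iterations completed before the crash, and the parent process survives to try the next engine or file.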
In a Docker container we are running netcdf-c library version 4.8.1 and building the netCDF4 Python library from source. On my local machine I am running netcdf-c library version 4.9.3-development. I get the segmentation faults on both machines.
Data source
Here is the granule data download link from our online archive. It has non-numerical datatypes, specifically string and datetime types.
Granule tree structure:
In [4]: open_datatree(
   ...:     "./downloads/OCO2_L2_Lite_SIF.11r/oco2_LtSIF_220101_B11012Ar_220627180315s.nc4"
   ...: )
Out[4]:
DataTree('None', parent=None)
│   Dimensions: (sounding_dim: 188677, vertex_dim: 4)
│   Dimensions without coordinates: sounding_dim, vertex_dim
│   Data variables: (12/15)
│       Delta_Time (sounding_dim) float64 2MB ...
│       SZA (sounding_dim) float32 755kB ...
│       VZA (sounding_dim) float32 755kB ...
│       SAz (sounding_dim) float32 755kB ...
│       VAz (sounding_dim) float32 755kB ...
│       Longitude (sounding_dim) float32 755kB ...
│       ... ...
│       SIF_740nm (sounding_dim) float32 755kB ...
│       SIF_Uncertainty_740nm (sounding_dim) float32 755kB ...
│       Daily_SIF_740nm (sounding_dim) float32 755kB ...
│       Daily_SIF_757nm (sounding_dim) float32 755kB ...
│       Daily_SIF_771nm (sounding_dim) float32 755kB ...
│       Quality_Flag (sounding_dim) float64 2MB ...
│   Attributes: (12/32)
│       References: ['Sun, Y. et al., Remote Sensing of En...
│       conventions: CF-1.6
│       product_version: B11012Ar
│       summary: Fraunhofer-line based SIF retrievals
│       keywords: ISS, OCO-2, Solar Induced Fluorescence...
│       keywords_vocabulary: NASA Global Change Master Directory (G...
│       ... ...
│       InputBuildId: B11.0.06
│       InputPointers: oco2_L2MetGL_39883a_211231_B11006r_220...
│       CoordSysBuilder: ucar.nc2.dataset.conv.CF1Convention
│       identifier_product_doi_authority: http://dx.doi.org/
│       gesdisc_collection: 11r
│       identifier_product_doi: 10.5067/OTRE7KQS8AU8
├── DataTree('Cloud')
│       Dimensions: (sounding_dim: 188677)
│       Dimensions without coordinates: sounding_dim
│       Data variables:
│           surface_albedo_abp (sounding_dim) float32 755kB ...
│           cloud_flag_abp (sounding_dim) float64 2MB ...
│           delta_pressure_abp (sounding_dim) float32 755kB ...
│           co2_ratio (sounding_dim) float32 755kB ...
│           o2_ratio (sounding_dim) float32 755kB ...
├── DataTree('Geolocation')
│       Dimensions: (sounding_dim: 188677, vertex_dim: 4)
│       Dimensions without coordinates: sounding_dim, vertex_dim
│       Data variables:
│           time_tai93 (sounding_dim) datetime64[ns] 2MB ...
│           solar_zenith_angle (sounding_dim) float32 755kB ...
│           solar_azimuth_angle (sounding_dim) float32 755kB ...
│           sensor_zenith_angle (sounding_dim) float32 755kB ...
│           sensor_azimuth_angle (sounding_dim) float32 755kB ...
│           altitude (sounding_dim) float32 755kB ...
│           longitude (sounding_dim) float32 755kB ...
│           latitude (sounding_dim) float32 755kB ...
│           footprint_longitude_vertices (sounding_dim, vertex_dim) float32 3MB ...
│           footprint_latitude_vertices (sounding_dim, vertex_dim) float32 3MB ...
├── DataTree('Metadata')
│       Dimensions: (sounding_dim: 188677)
│       Dimensions without coordinates: sounding_dim
│       Data variables:
│           CollectionLabel <U17 68B ...
│           BuildId <U8 32B ...
│           OrbitId (sounding_dim) float64 2MB ...
│           SoundingId (sounding_dim) float64 2MB ...
│           FootprintId (sounding_dim) float64 2MB ...
│           MeasurementMode (sounding_dim) float64 2MB ...
├── DataTree('Meteo')
│       Dimensions: (sounding_dim: 188677)
│       Dimensions without coordinates: sounding_dim
│       Data variables:
│           surface_pressure (sounding_dim) float32 755kB ...
│           specific_humidity (sounding_dim) float32 755kB ...
│           vapor_pressure_deficit (sounding_dim) float32 755kB ...
│           temperature_skin (sounding_dim) float32 755kB ...
│           temperature_two_meter (sounding_dim) float32 755kB ...
│           wind_speed (sounding_dim) float32 755kB ...
├── DataTree('Offset')
│       Dimensions: (signalbin_dim: 227, footprint_dim: 8,
│                    statistics_dim: 2)
│       Dimensions without coordinates: signalbin_dim, footprint_dim, statistics_dim
│       Data variables: (12/13)
│           signal_histogram_bins (signalbin_dim) float32 908B ...
│           signal_histogram_757nm (signalbin_dim, footprint_dim) float64 15kB ...
│           signal_histogram_771nm (signalbin_dim, footprint_dim) float64 15kB ...
│           SIF_Relative_Mean_757nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
│           SIF_Mean_757nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
│           SIF_Relative_Median_757nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
│           ... ...
│           SIF_Relative_SDev_757nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
│           SIF_Relative_Mean_771nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
│           SIF_Mean_771nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
│           SIF_Relative_Median_771nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
│           SIF_Median_771nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
│           SIF_Relative_SDev_771nm (signalbin_dim, footprint_dim, statistics_dim) float32 15kB ...
├── DataTree('Science')
│       Dimensions: (sounding_dim: 188677)
│       Dimensions without coordinates: sounding_dim
│       Data variables: (12/16)
│           sounding_qual_flag (sounding_dim) float64 2MB ...
│           IGBP_index (sounding_dim) float64 2MB ...
│           continuum_radiance_757nm (sounding_dim) float32 755kB ...
│           SIF_757nm (sounding_dim) float32 755kB ...
│           SIF_Unadjusted_757nm (sounding_dim) float32 755kB ...
│           SIF_Relative_757nm (sounding_dim) float32 755kB ...
│           ... ...
│           SIF_Unadjusted_771nm (sounding_dim) float32 755kB ...
│           SIF_Relative_771nm (sounding_dim) float32 755kB ...
│           SIF_Unadjusted_Relative_771nm (sounding_dim) float32 755kB ...
│           SIF_Uncertainty_771nm (sounding_dim) float32 755kB ...
│           daily_correction_factor (sounding_dim) float32 755kB ...
│           sounding_land_fraction (sounding_dim) float32 755kB ...
└── DataTree('Sequences')
        Dimensions: (sequences_dim: 0, sounding_dim: 188677)
        Dimensions without coordinates: sequences_dim, sounding_dim
        Data variables:
            SequencesName (sequences_dim) <U1 0B ...
            SequencesId (sequences_dim) <U1 0B ...
            SequencesMode (sequences_dim) <U1 0B ...
            SequencesIndex (sounding_dim) float64 2MB ...
            SegmentsIndex (sounding_dim) float64 2MB ...
What did you expect to happen?
I expected open_datatree(engine='netCDF4') to return a DataTree object. Instead, it seg faults.
Minimal Complete Verifiable Example
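from datatree import open_datatree  # import inferred from the traceback below (datatree/io.py)

max_retry = 0
while max_retry < 15:
    # Seg faults after the fourth call when the file contains string variables
    oco2_tree = open_datatree('./OCO2_L2_Lite_SIF.11r/oco2_LtSIF_220101_B11012Ar_220627180315s.nc4')
    max_retry += 1
    print(max_retry)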
MVCE confirmation
- Minimal example: the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example: the example is self-contained, including all data and the text of any traceback.
- Verifiable example: the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue: a search of GitHub Issues suggests this is not a duplicate.
- Recent environment: the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /usr/src/app
configfile: pyproject.toml
plugins: subtests-0.12.1, inline-snapshot-0.10.2, cov-5.0.0, anyio-4.4.0
collected 28 items
tests/test_compare.py::test_smoke_test PASSED [ 3%]
tests/test_compare.py::test_class_auto_runs_one_test PASSED [ 7%]
tests/test_compare.py::test_compare_global_attrs_keys_values PASSED [ 10%]
tests/test_compare.py::test_get_intersection Fatal Python error: Segmentation fault
Current thread 0x0000ffffb3b63020 (most recent call first):
File "/usr/local/lib/python3.12/site-packages/xarray/backends/file_manager.py", line 217 in _acquire_with_cache_info
File "/usr/local/lib/python3.12/site-packages/xarray/backends/file_manager.py", line 199 in acquire_context
File "/usr/local/lib/python3.12/contextlib.py", line 137 in __enter__
File "/usr/local/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 412 in _acquire
File "/usr/local/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 418 in ds
File "/usr/local/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 356 in __init__
File "/usr/local/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 409 in open
File "/usr/local/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 646 in open_dataset
File "/usr/local/lib/python3.12/site-packages/xarray/backends/api.py", line 571 in open_dataset
File "/usr/local/lib/python3.12/site-packages/datatree/io.py", line 66 in _open_datatree_netcdf
File "/usr/local/lib/python3.12/site-packages/datatree/io.py", line 58 in open_datatree
File "/usr/src/app/regression_tests/compare.py", line 40 in to_xarray_datatree
File "/usr/src/app/regression_tests/compare.py", line 29 in __init__
File "/usr/src/app/tests/test_compare.py", line 34 in __init__
File "/usr/src/app/tests/test_compare.py", line 113 in variable_comparison_class_test_data_a_b_fixture
File "/usr/local/lib/python3.12/site-packages/_pytest/fixtures.py", line 880 in call_fixture_func
File "/usr/local/lib/python3.12/site-packages/_pytest/fixtures.py", line 1125 in pytest_fixture_setup
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/local/lib/python3.12/site-packages/_pytest/fixtures.py", line 1076 in execute
File "/usr/local/lib/python3.12/site-packages/_pytest/fixtures.py", line 606 in _get_active_fixturedef
File "/usr/local/lib/python3.12/site-packages/_pytest/fixtures.py", line 521 in getfixturevalue
File "/usr/local/lib/python3.12/site-packages/_pytest/fixtures.py", line 686 in _fillfixtures
File "/usr/local/lib/python3.12/site-packages/_pytest/python.py", line 1635 in setup
File "/usr/local/lib/python3.12/site-packages/_pytest/runner.py", line 514 in setup
File "/usr/local/lib/python3.12/site-packages/_pytest/runner.py", line 159 in pytest_runtest_setup
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/local/lib/python3.12/site-packages/_pytest/runner.py", line 241 in <lambda>
File "/usr/local/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
File "/usr/local/lib/python3.12/site-packages/_pytest/runner.py", line 240 in call_and_report
File "/usr/local/lib/python3.12/site-packages/_pytest/runner.py", line 129 in runtestprotocol
File "/usr/local/lib/python3.12/site-packages/_pytest/runner.py", line 116 in pytest_runtest_protocol
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/local/lib/python3.12/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/local/lib/python3.12/site-packages/_pytest/main.py", line 339 in _main
File "/usr/local/lib/python3.12/site-packages/_pytest/main.py", line 285 in wrap_session
File "/usr/local/lib/python3.12/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
File "/usr/local/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/local/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/local/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/local/lib/python3.12/site-packages/_pytest/config/__init__.py", line 178 in main
File "/usr/local/lib/python3.12/site-packages/_pytest/config/__init__.py", line 206 in console_main
File "/usr/local/bin/pytest", line 8 in <module>
Extension modules: charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, cftime._cftime, netCDF4._netCDF4 (total: 58)
Segmentation fault
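The "Fatal Python error: Segmentation fault" report and Python-level stack above are in the format produced by the standard library's faulthandler module, which pytest turns on by default. Outside pytest, the same kind of trace can be captured by enabling it manually, as in this sketch (enable_crash_traces and the log path are illustrative names, not part of xarray or datatree):

```python
import faulthandler


def enable_crash_traces(log_path):
    """Dump Python tracebacks to `log_path` if the process receives a fatal signal.

    The returned file object must stay open for as long as faulthandler is
    enabled, because the handler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Example call (not executed here):
# crash_log = enable_crash_traces("segfault_trace.log")
```

Running a script with python -X faulthandler has a similar effect, with the traceback written to stderr instead of a file.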
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.4 (main, Jun 7 2024, 19:15:23) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.6.16-linuxkit
machine: aarch64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: 1.10.8
libnetcdf: 4.8.1
xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: None
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.0.1
sphinx: None