Description
Reproduced on
- hdf5 1.14.6, Linux x64
- hdf5 2.0.0.4 git tip (cd2414b)
Issue description
If I create a variable-length string dtype
dtype = H5Tcopy(H5T_C_S1);
assert(dtype >= 0);
r = H5Tset_size(dtype, H5T_VARIABLE);
assert(r >= 0);
and then I create a dataset with a filter
r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 0, NULL);
assert(r >= 0);
dset = H5Dcreate(fid, "dset", dtype, sid, H5P_DEFAULT, plist, H5P_DEFAULT);
assert(dset >= 0);
then the can_apply and set_local callbacks previously registered with H5Zregister are not invoked by H5Dcreate. Nothing in the documentation hints that these callbacks can be skipped.
Impact
Both hdf5-blosc and hdf5-BLOSC2 rely on set_local to automatically amend cd_values. If set_local doesn't run, the filter crashes.
What follows are two examples of typical user configuration:
unsigned int cd_values[7];
/* 0 to 3 (inclusive) param slots are reserved. */
cd_values[4] = 4;       /* compression level */
cd_values[5] = 1;       /* 0: shuffle not active, 1: shuffle active */
cd_values[6] = BLOSC_BLOSCLZ; /* the actual compressor to use */
r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 7, cd_values);
assert(r >= 0);
In the above example, blosc_set_local fills in the previously uninitialised cd_values[0:4], so that blosc_filter can use them. Crucially, this includes the size of the data type, which is not retrievable from within the filter callback itself.
r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 0, NULL);
assert(r >= 0);
In the above example, blosc_set_local bumps up cd_nelmts to 4 and fills cd_values.
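The two cases above can be modelled with a small standalone sketch. This is not the real blosc_set_local: the name `sketch_set_local`, the `SKETCH_*` placeholder constants, and the meaning given to slot 3 (chunk size in bytes) are assumptions made for illustration, and no HDF5 call is involved. It only illustrates how the reserved slots get filled and cd_nelmts is raised to at least 4, while any user-supplied slots from 4 onwards are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values only; the real FILTER_BLOSC_VERSION and
 * BLOSC_VERSION_FORMAT come from the hdf5-blosc and Blosc headers. */
#define SKETCH_FILTER_VERSION 2u
#define SKETCH_FORMAT_VERSION 2u

/* Hypothetical stand-in for the amendment a set_local callback performs.
 * cd_values must have room for at least 4 elements. */
void sketch_set_local(size_t *cd_nelmts, unsigned int *cd_values,
                      unsigned int type_size, unsigned int chunk_bytes)
{
    assert(cd_nelmts != NULL && cd_values != NULL);

    /* Reserved slots 0..3: always overwritten, whatever the user passed. */
    cd_values[0] = SKETCH_FILTER_VERSION;
    cd_values[1] = SKETCH_FORMAT_VERSION;
    cd_values[2] = type_size;   /* size of the data type */
    cd_values[3] = chunk_bytes; /* assumed meaning of slot 3 */

    /* The "0, NULL" configuration ends up with exactly 4 values;
     * a 7-value configuration keeps its slots 4..6 untouched. */
    if (*cd_nelmts < 4)
        *cd_nelmts = 4;
}
```

Calling it with `cd_nelmts = 7` leaves the count at 7 and preserves the compression level, shuffle flag and compressor; calling it with `cd_nelmts = 0` raises the count to 4. When the callback is skipped, neither happens, and the filter sees whatever the user passed.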
Workaround
hdf5-blosc
The end user can explicitly initialise cd_values[0:4] to work around the problem:
unsigned int cd_values[7] = {
    FILTER_BLOSC_VERSION, BLOSC_VERSION_FORMAT, 1, 0,
    4, 1, BLOSC_BLOSCLZ
};
r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 7, cd_values);
assert(r >= 0);
or just the bare minimum:
unsigned int cd_values[4] = {FILTER_BLOSC_VERSION, BLOSC_VERSION_FORMAT, 1, 0};
r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 4, cd_values);
assert(r >= 0);
hdf5-BLOSC2
There is no simple workaround for Blosc2, short of manually invoking blosc2_set_local, because the cd_values are heavily data-dependent and non-trivial.
Full reproducer
$ git clone https://github.com/crusaderky/hdf5-blosc.git 
$ cd hdf5-blosc
$ git checkout strings
$ pixi r build
$ pixi r test
[...]
 9/10 Test  #9: test_strings ...........................Subprocess aborted***Exception:   1.11 sec
test_strings: /home/crusaderky/github/tmp/hdf5-blosc/src/blosc_filter.c:176: blosc_filter: Assertion `cd_nelmts >= 4' failed.
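The abort above comes from an assert inside the filter callback. One possible mitigation on the filter side, sketched here and not taken from hdf5-blosc, is to validate cd_nelmts and report failure by returning 0, which is how an HDF5 filter callback signals an error. The function `guarded_filter` and the `RESERVED_SLOTS` macro are hypothetical; the function copies the parameter list of an HDF5 filter callback but uses only standard C types, and the compression step is replaced by a pass-through.

```c
#include <stddef.h>

#define RESERVED_SLOTS 4 /* slots 0..3 are expected to be filled by set_local */

/* Hypothetical filter body with the same parameter list as an HDF5 filter
 * callback. Returns the number of valid bytes in *buf, or 0 on failure. */
size_t guarded_filter(unsigned int flags, size_t cd_nelmts,
                      const unsigned int cd_values[], size_t nbytes,
                      size_t *buf_size, void **buf)
{
    (void)flags;    /* unused in this sketch */
    (void)buf_size; /* unused in this sketch */

    /* set_local never ran, or the user passed too few values:
     * fail cleanly instead of reading parameters that are not there. */
    if (cd_nelmts < RESERVED_SLOTS || cd_values == NULL)
        return 0;

    if (buf == NULL || *buf == NULL)
        return 0;

    /* Real compression or decompression would go here;
     * the sketch leaves the buffer untouched. */
    return nbytes;
}
```

As far as I understand the H5Pset_filter documentation, a filter registered with H5Z_FLAG_OPTIONAL that fails during a write is skipped for that chunk, so the data would be stored unfiltered rather than the process aborting. This does not address the skipped callbacks themselves; it only turns the crash into a recoverable failure.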