can_apply and set_local filter callbacks skipped for H5T_VARIABLE dtypes #5942

@crusaderky

Reproduced on

  • hdf5 1.14.6, Linux x64
  • hdf5 2.0.0.4 git tip (cd2414b)

Issue description

If I create a variable-length string dtype

dtype = H5Tcopy(H5T_C_S1);
assert(dtype >= 0);
r = H5Tset_size(dtype, H5T_VARIABLE);
assert(r >= 0);

and then I create a dataset with a filter

r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 0, NULL);
assert(r >= 0);
dset = H5Dcreate(fid, "dset", dtype, sid, H5P_DEFAULT, plist, H5P_DEFAULT);
assert(dset >= 0);

then the can_apply and set_local callbacks previously registered with H5Zregister are not invoked by H5Dcreate. Nothing in the documentation hints that these callbacks can be skipped.

Impact

Both hdf5-blosc and hdf5-BLOSC2 rely on set_local to automatically amend cd_values. If set_local doesn't run, the filter crashes.

Two examples of typical user configuration follow:

unsigned int cd_values[7];
/* 0 to 3 (inclusive) param slots are reserved. */
cd_values[4] = 4;       /* compression level */
cd_values[5] = 1;       /* 0: shuffle not active, 1: shuffle active */
cd_values[6] = BLOSC_BLOSCLZ; /* the actual compressor to use */
r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 7, cd_values);
assert(r >= 0);

In the above example, blosc_set_local fills in the previously uninitialised cd_values[0:4], so that blosc_filter can use them. Crucially, this includes the size of the data type, which is not retrievable from within the filter callback itself.

r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 0, NULL);
assert(r >= 0);

In the above example, blosc_set_local bumps up cd_nelmts to 4 and fills cd_values.
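
To make the dependency concrete, here is a minimal, self-contained model of the two cases above. It is not the hdf5-blosc source: the function names, the MODEL_* constants and the meaning assigned to slots 2 and 3 (type size and chunk size in bytes) are assumptions made for illustration, and it deliberately avoids the HDF5 API so that it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only; the real constants are
 * FILTER_BLOSC_VERSION and BLOSC_VERSION_FORMAT and may differ. */
#define MODEL_FILTER_VERSION 2u
#define MODEL_FORMAT_VERSION 2u
#define MODEL_RESERVED_SLOTS 4u

/* Hypothetical stand-in for a set_local callback: fill the four reserved
 * slots and make sure cd_nelmts covers them. User-supplied slots (4 and
 * above) are left untouched. cd_values must have room for 4 elements. */
static void model_set_local(unsigned int *cd_values, size_t *cd_nelmts,
                            unsigned int typesize, unsigned int chunk_bytes)
{
    cd_values[0] = MODEL_FILTER_VERSION;
    cd_values[1] = MODEL_FORMAT_VERSION;
    cd_values[2] = typesize;    /* assumed meaning of slot 2 */
    cd_values[3] = chunk_bytes; /* assumed meaning of slot 3 */
    if (*cd_nelmts < MODEL_RESERVED_SLOTS)
        *cd_nelmts = MODEL_RESERVED_SLOTS;
}

/* Models the precondition that the filter asserts on (cd_nelmts >= 4).
 * Returns 1 if the reserved slots are present, 0 otherwise. */
static int model_filter_params_ok(size_t cd_nelmts)
{
    return cd_nelmts >= MODEL_RESERVED_SLOTS;
}
```

With cd_nelmts = 0 and no call to model_set_local, model_filter_params_ok returns 0, which corresponds to the failure mode reported here for H5T_VARIABLE dtypes.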

Workaround

hdf5-blosc

The end user can explicitly initialise cd_values[0:4] to work around the problem:

unsigned int cd_values[7] = {
    FILTER_BLOSC_VERSION, BLOSC_VERSION_FORMAT, 1, 0,
    4, 1, BLOSC_BLOSCLZ
};
r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 7, cd_values);
assert(r >= 0);

or just the bare minimum:

unsigned int cd_values[4] = {FILTER_BLOSC_VERSION, BLOSC_VERSION_FORMAT, 1, 0};
r = H5Pset_filter(plist, FILTER_BLOSC, H5Z_FLAG_OPTIONAL, 4, cd_values);
assert(r >= 0);

hdf5-BLOSC2

There is no simple workaround for Blosc2, short of manually invoking blosc2_set_local, because its cd_values are heavily data-dependent and non-trivial to compute by hand.

Full reproducer

Blosc/hdf5-blosc#38

$ git clone https://github.com/crusaderky/hdf5-blosc.git 
$ cd hdf5-blosc
$ git checkout strings
$ pixi r build
$ pixi r test
[...]
 9/10 Test  #9: test_strings ...........................Subprocess aborted***Exception:   1.11 sec
test_strings: /home/crusaderky/github/tmp/hdf5-blosc/src/blosc_filter.c:176: blosc_filter: Assertion `cd_nelmts >= 4' failed.
