Blosc/Blosc2 segfault with variable-width strings #364

@crusaderky

Description

Blosc and Blosc2 segfault when given variable-width strings, both the legacy object strings and the new NpyStrings, a.k.a. StringDType.

This is caused by an upstream bug; PyTables is also affected.
#363 introduces unit tests for string dtypes, which are temporarily skipped for Blosc and Blosc2.

Reproducer

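| compression | i8 | S3 | object | T |
|---|---|---|---|---|
| "gzip" | ✔️ | ✔️ | ✔️ | ✔️ |
| "lzf" | ✔️ | ✔️ | ✔️ | ✔️ |
| hdf5plugin.BZip2() | ✔️ | ✔️ | ✔️ | ✔️ |
| hdf5plugin.LZ4() | ✔️ | ✔️ | ✔️ | ✔️ |
| hdf5plugin.Blosc() | ✔️ | ✔️ | segfault | segfault |
| hdf5plugin.Blosc2() | ✔️ | ✔️ | segfault | segfault |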
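
The failing cells of the table can be encoded as a small predicate, for example to skip the known-crashing combinations in a test suite. This is only a sketch: `CRASHING_FILTERS`, `VARIABLE_WIDTH_KINDS` and `is_known_crash` are hypothetical names, not part of h5py or hdf5plugin, and the filter names are assumed to be the class names of the `hdf5plugin` objects.

```python
# Hypothetical helper; none of these names exist in h5py or hdf5plugin.
# The contents mirror the "segfault" cells reported in the table.
CRASHING_FILTERS = frozenset({"Blosc", "Blosc2"})
# NumPy dtype kinds: "O" = object strings, "T" = StringDType.
VARIABLE_WIDTH_KINDS = frozenset({"O", "T"})


def is_known_crash(filter_name: str, dtype_kind: str) -> bool:
    """Return True for the filter/dtype combinations reported to segfault."""
    return filter_name in CRASHING_FILTERS and dtype_kind in VARIABLE_WIDTH_KINDS
```

With the objects used in the reproducer, this could be called as `is_known_crash(type(compression).__name__, data.dtype.kind)`; built-in filters such as `"gzip"` are plain strings, so they yield `"str"` and are never flagged.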

Full reproducer:

import os

import h5py
import hdf5plugin
import numpy as np

fname = "/tmp/ds.h5"

for compression in (
    None,
    "gzip",
    "lzf",
    hdf5plugin.BZip2(),
    hdf5plugin.LZ4(),
    hdf5plugin.Blosc(),
    hdf5plugin.Blosc2(),
):
    for data in (
        np.asarray([1]),
        np.asarray(["foo"], dtype="S"),
        np.asarray([b"foo"], dtype="O"),
        np.asarray(["foo"], dtype="T"),
    ):
        print("desired compression =", compression)
        print("dtype =", data.dtype)

        # Optional: produce meaningful differences in file size
        data = np.tile(data, 1_000_000)

        with h5py.File(fname, "w") as f:
            f.create_dataset("mydataset", data=data, compression=compression)

        print("file size =", os.path.getsize(fname))
        with h5py.File(fname, "r+") as f:
            ds = f["mydataset"]
            print("actual compression =", ds.compression)
            print("compression_opts =", ds.compression_opts)

            actual = (ds.astype("T") if data.dtype.kind == "T" else ds)[:]
        np.testing.assert_array_equal(actual, data)

        print("=" * 80, flush=True)
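
Because a segfault kills the interpreter, the loop above stops at the first crashing combination. One way to fill in the whole table in a single run is to execute each case in a child process and classify its exit status. The sketch below assumes a POSIX system, where a process killed by a signal reports a negative return code; `run_isolated` is a hypothetical helper, not part of any library mentioned here.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> str:
    """Run `code` in a fresh interpreter and classify how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a child killed by signal N has returncode == -N.
    # (Assumption: on Windows a crash shows up as a different exit code
    # and would fall through to the generic branch below.)
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"error (exit {proc.returncode})"
```

Each iteration of the reproducer would then be passed as a source string, one compression/dtype pair at a time, so that a crash in one cell does not prevent the remaining cells from running.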
