Add SparseHist wrapper for large multi-systematic histograms by bendavid · Pull Request #25 · WMass/wums

bendavid · 2026-04-09T06:40:01Z

Three commits adding a SparseHist wrapper class around scipy
sparse arrays carrying hist axes metadata, plus supporting fixes.

This provides a minimal Python representation of the sparse boost
histograms produced in C++ by narf, so that they can be pickled
and/or passed directly to rabbit without creating a dense intermediate.

- Add SparseHist wrapper combining a scipy sparse array with hist axes (c833677): stores the dense N-D shape implied by a sequence of hist axes in the with-flow layout (axis.extent per axis) and provides toarray and to_flat_csr methods that extract either the with-flow or no-flow representation. Also supports dict-style slicing along axes by regular-bin index for use cases such as multi-systematic dispatch in rabbit.
- Use int64 indices in SparseHist.to_flat_csr for large flat sizes (256be1f): the CSR returned previously cast indices and indptr to int32, which silently overflowed when the flat target size exceeded the int32 range. This affected SparseHist instances built from large multi-axis inputs (e.g. an (eta, phi, pt, mass, corparms) hist with ~108k corparms, where the with-flow flat size is ~6.3 billion bins). Now switch to int64 whenever the target size does not fit in int32.
- Protect against future incompatible change in hist (71f7eb6).
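As a rough illustration of the with-flow layout and the regular-bin slicing described in the first commit, the sketch below uses a hypothetical `ToyAxis` stand-in (not a real hist axis) and two hypothetical helpers, `dense_from_flat` and `select_bin`. It assumes unique, C-ordered flat indices over the with-flow shape; the actual SparseHist internals may well differ.

```python
import numpy as np


class ToyAxis:
    """Hypothetical stand-in for a hist axis with optional flow bins."""

    def __init__(self, size, underflow=True, overflow=True):
        self.size = size
        self.underflow = underflow
        self.overflow = overflow

    @property
    def extent(self):
        # Number of bins including flow bins.
        return self.size + int(self.underflow) + int(self.overflow)


def dense_from_flat(axes, flat_indices, values, flow=True):
    """Scatter sparse values into a dense array, with or without flow bins."""
    shape = tuple(ax.extent for ax in axes)
    dense = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    dense[flat_indices] = values  # assumes flat_indices are unique
    dense = dense.reshape(shape)
    if not flow:
        # Drop the underflow/overflow bins on every axis.
        cut = tuple(
            slice(int(ax.underflow), int(ax.underflow) + ax.size) for ax in axes
        )
        dense = dense[cut]
    return dense


def select_bin(axes, flat_indices, values, axis, index):
    """Keep entries in regular bin `index` of `axis` and drop that axis."""
    shape = tuple(ax.extent for ax in axes)
    coords = np.unravel_index(flat_indices, shape)
    # A regular-bin index is shifted by one when an underflow bin exists.
    keep = coords[axis] == index + int(axes[axis].underflow)
    rest_axes = tuple(ax for i, ax in enumerate(axes) if i != axis)
    rest_coords = tuple(c[keep] for i, c in enumerate(coords) if i != axis)
    rest_shape = tuple(ax.extent for ax in rest_axes)
    new_flat = np.ravel_multi_index(rest_coords, rest_shape)
    return rest_axes, new_flat, values[keep]


axes = (ToyAxis(3), ToyAxis(2))           # with-flow shape (5, 4)
flat_indices = np.array([5, 14, 10])      # with-flow coords (1,1), (3,2), (2,2)
values = np.array([1.0, 2.0, 3.0])

with_flow = dense_from_flat(axes, flat_indices, values, flow=True)
no_flow = dense_from_flat(axes, flat_indices, values, flow=False)
print(with_flow.shape, no_flow.shape)     # (5, 4) (3, 2)

# Select regular bin 1 of the second axis, leaving a 1-D sparse result.
sub_axes, sub_flat, sub_values = select_bin(axes, flat_indices, values, 1, 1)
sub_dense = dense_from_flat(sub_axes, sub_flat, sub_values, flow=False)
```

Working on flat indices keeps memory proportional to the number of stored entries; a dense array is only built when `dense_from_flat` is called explicitly.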

The commit messages for c833677 and 256be1f repeat the descriptions above, and each ends with the trailer Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
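The int32 overflow fixed in 256be1f can be reproduced in isolation. The sketch below is not the code from the PR: `index_dtype_for` is a hypothetical helper, and the sizes are illustrative values chosen to match the orders of magnitude quoted above.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2147483647


def index_dtype_for(flat_size, nnz):
    """Pick the narrowest index dtype that can hold both indices and indptr.

    Column indices go up to flat_size - 1 and indptr entries go up to nnz.
    """
    return np.int32 if max(flat_size - 1, nnz) <= INT32_MAX else np.int64


small = index_dtype_for(22, 20)                 # fits comfortably in int32
large = index_dtype_for(6_300_000_000, 1_000)   # ~6.3 billion flat bins

# An unconditional cast to int32 wraps silently instead of raising.
big_index = np.array([6_000_000_000], dtype=np.int64)
wrapped = big_index.astype(np.int32)
print(small.__name__, large.__name__, int(wrapped[0]))  # int32 int64 1705032704
```

The wrapped value is a valid-looking bin index, which is why the failure was silent rather than an error.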

davidwalter2 · 2026-04-10T01:46:27Z

wums/sparse_hist.py

+        return tuple.__getitem__(self, key)
+
+
+class SparseHist:


Not sure if this is the idea, but if we want to use SparseHist as a drop-in replacement for a regular Hist object, we should give it the same attributes.

Right now "name" and "label" are the obvious ones missing.

There are also small differences, e.g. the .shape for SparseHist includes under/overflow while it is not included in the regular Hist.

On the Hist object I can also do things like "h_dense.axes.name" which doesn't work for the SparseHist.
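One possible way to support `axes.name` would be a small tuple subclass. The sketch below is an assumption, not the code in this PR: `AxesTuple` and the `SimpleNamespace` axes are hypothetical stand-ins that only mimic the `name` attribute of real hist axes.

```python
from types import SimpleNamespace


class AxesTuple(tuple):
    """Hypothetical tuple of axes with lookup by name and a .name view."""

    def __getitem__(self, key):
        if isinstance(key, str):
            # Look the axis up by its name.
            for ax in self:
                if ax.name == key:
                    return ax
            raise KeyError(key)
        # Fall back to ordinary integer or slice indexing.
        return tuple.__getitem__(self, key)

    @property
    def name(self):
        # Tuple of axis names, in axis order.
        return tuple(ax.name for ax in self)


axes = AxesTuple([SimpleNamespace(name="x"), SimpleNamespace(name="y")])
print(axes.name)  # ('x', 'y')
```

With this in place, both `axes["y"]` and `axes[0]` work on the same object.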

Functions like "fill" or "project" could be set as "NotImplemented" or "NotSupported"
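A minimal sketch of what these suggestions could look like, assuming a hypothetical mixin (`DropInMixin`) and toy class (`ToySparseHist`) that are not part of the PR. In Python the usual way to mark an unsupported method is to raise `NotImplementedError`; the `NotImplemented` singleton is reserved for binary-operator dispatch.

```python
class DropInMixin:
    """Hypothetical mixin adding Hist-like attributes and unsupported stubs."""

    name = None
    label = None

    def fill(self, *args, **kwargs):
        raise NotImplementedError("fill is not supported for sparse histograms")

    def project(self, *args, **kwargs):
        raise NotImplementedError("project is not supported for sparse histograms")


class ToySparseHist(DropInMixin):
    """Toy class for illustration; it stores only per-axis regular-bin counts."""

    def __init__(self, sizes, name=None, label=None):
        self._sizes = tuple(sizes)  # regular bins per axis, flow bins excluded
        self.name = name
        self.label = label

    @property
    def shape(self):
        # No-flow shape, matching what a regular Hist reports for .shape.
        return self._sizes


h = ToySparseHist((20,), name="h")
print(h.shape, h.name, h.label)  # (20,) h None
```

Whether `.shape` should change to the no-flow convention or the with-flow shape should stay the default is a design decision for the PR author; the sketch only shows the drop-in variant.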

Just for reference this is the full list:

>>> h_sparse.__dir__()
['_axes', '_dense_shape', '_size', '_flat_indices', '_values', '__module__', '__firstlineno__', '__doc__', '_underflow_offset', '__init__', '_from_flat', 'axes', 'shape', 'dtype', 'nnz', 'toarray', 'tocoo', 'to_flat_csr', '__getitem__', '__static_attributes__', '__dict__', '__weakref__', '__new__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__getstate__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']
>>> h_dense.__dir__()
['_variance_known', 'name', 'label', '__module__', '__firstlineno__', '__static_attributes__', '__orig_bases__', '__weakref__', '__doc__', '__parameters__', '_family', '__slots__', '__init__', '_generate_axes_', '_repr_html_', '_name_to_index', '_to_uhi_', 'from_columns', 'project', 'T', 'fill', 'fill_flattened', 'sort', '_convert_index_wildcards', '_loc_shortcut', '_step_shortcut', '_index_transform', '__getitem__', '__setitem__', 'profile', 'density', 'show', 'plot', 'plot1d', 'plot2d', 'plot2d_full', 'plot_ratio', 'plot_pull', 'plot_pie', 'stack', 'integrate', '__annotations__', '__init_subclass__', '_clone', '_new_hist', '_from_histogram_cpp', '_from_histogram_object', '_import_bh_', '_export_bh_', '__getattr__', '_from_uhi_', 'ndim', 'view', '__array__', '__hash__', '__eq__', '__ne__', '__add__', '__iadd__', '__radd__', '__sub__', '__isub__', '__mul__', '__rmul__', '__truediv__', '__div__', '__idiv__', '__itruediv__', '__imul__', '_compute_inplace_op', '__str__', '_axis', 'storage_type', '_storage_type', '_reduce', '__copy__', '__deepcopy__', '__getstate__', '__setstate__', '__repr__', '_compute_uhi_index', '_compute_commonindex', 'to_numpy', 'copy', 'reset', 'empty', 'sum', 'size', 'shape', '_handle_slice', '_rebin_with_groups', 'kind', 'values', 'variances', 'counts', '_hist', 'axes', '__dict__', '_types', '__class_getitem__', '__new__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__format__', '__sizeof__', '__dir__', '__class__']
>>> h_dense.__dict__
{'_variance_known': True, 'name': None, 'label': None}
>>> h_sparse.__dict__
{'_axes': (Regular(20, -5, 5, name='x'),), '_dense_shape': (22,), '_size': 22, '_flat_indices': array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]), '_values': array([265., 235., 247., 249., 249., 263., 260., 265., 226., 248., 247., 254., 230., 227., 261., 246., 254., 283., 242., 249.])}

bendavid and others added 3 commits April 7, 2026 02:09

protect against future incompatible change in hist

71f7eb6

This was referenced Apr 9, 2026

SparseStorage, concurrent_flat_map, and SparseMatrixAtomic bendavid/narf#43

Open

Sparse mode performance, SparseHist input dispatch, and low-memory --noHessian mode WMass/rabbit#129

Open

davidwalter2 reviewed Apr 10, 2026

bendavid wants to merge 3 commits into WMass:main from bendavid:sparsehists
