Add SparseHist wrapper for large multi-systematic histograms #25
Open
bendavid wants to merge 3 commits into WMass:main from
The wrapper stores the dense N-D shape implied by a sequence of hist axes in the with-flow layout (axis.extent per axis) and provides toarray and to_flat_csr methods that can extract either the with-flow or no-flow representation. Also supports dict-style slicing along axes by regular-bin index for use cases such as multi-systematic dispatch in rabbit.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The CSR returned by to_flat_csr always cast indices and indptr to int32, which silently overflowed when the flat target size exceeded the int32 range. This affected SparseHist instances built from large multi-axis inputs (e.g. a (eta, phi, pt, mass, corparms) hist with ~108k corparms, where the with-flow flat size is ~6.3 billion bins). Now switch to int64 whenever the target size does not fit in int32.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
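To make the overflow concrete, here is a minimal sketch of the dtype choice described in this commit message. The helper name `pick_index_dtype` is hypothetical and is not taken from the PR; the sketch only assumes that the flat target size is known as a Python integer.

```python
import numpy as np


def pick_index_dtype(flat_size: int) -> np.dtype:
    """Hypothetical helper: int32 if every flat index fits, else int64."""
    if flat_size <= np.iinfo(np.int32).max:
        return np.dtype(np.int32)
    return np.dtype(np.int64)


# Casting an out-of-range index down to int32 wraps around without an error,
# which is the silent failure mode the commit message describes.
big_index = np.array([6_300_000_000], dtype=np.int64)
wrapped = big_index.astype(np.int32)

small_dtype = pick_index_dtype(22)             # fits in int32
large_dtype = pick_index_dtype(6_300_000_000)  # needs int64
```

Choosing the dtype from the target size rather than from the stored indices keeps small histograms compact while staying safe for very large flat layouts.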
return tuple.__getitem__(self, key)

class SparseHist:
Collaborator
Not sure if this is the idea, but if we want to use SparseHist as a drop-in replacement for a regular Hist object, we should give it the same attributes.
Right now "name" and "label" are the obvious ones missing.
There are also small differences, e.g. the .shape for SparseHist includes under/overflow while it is not included in the regular Hist.
On the Hist object I can also do things like "h_dense.axes.name", which doesn't work for the SparseHist.
Functions like "fill" or "project" could be set as "NotImplemented" or "NotSupported".
Just for reference, this is the full list:
>>> h_sparse.__dir__()
['_axes', '_dense_shape', '_size', '_flat_indices', '_values', '__module__', '__firstlineno__', '__doc__', '_underflow_offset', '__init__', '_from_flat', 'axes', 'shape', 'dtype', 'nnz', 'toarray', 'tocoo', 'to_flat_csr', '__getitem__', '__static_attributes__', '__dict__', '__weakref__', '__new__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__getstate__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']
>>> h_dense.__dir__()
['_variance_known', 'name', 'label', '__module__', '__firstlineno__', '__static_attributes__', '__orig_bases__', '__weakref__', '__doc__', '__parameters__', '_family', '__slots__', '__init__', '_generate_axes_', '_repr_html_', '_name_to_index', '_to_uhi_', 'from_columns', 'project', 'T', 'fill', 'fill_flattened', 'sort', '_convert_index_wildcards', '_loc_shortcut', '_step_shortcut', '_index_transform', '__getitem__', '__setitem__', 'profile', 'density', 'show', 'plot', 'plot1d', 'plot2d', 'plot2d_full', 'plot_ratio', 'plot_pull', 'plot_pie', 'stack', 'integrate', '__annotations__', '__init_subclass__', '_clone', '_new_hist', '_from_histogram_cpp', '_from_histogram_object', '_import_bh_', '_export_bh_', '__getattr__', '_from_uhi_', 'ndim', 'view', '__array__', '__hash__', '__eq__', '__ne__', '__add__', '__iadd__', '__radd__', '__sub__', '__isub__', '__mul__', '__rmul__', '__truediv__', '__div__', '__idiv__', '__itruediv__', '__imul__', '_compute_inplace_op', '__str__', '_axis', 'storage_type', '_storage_type', '_reduce', '__copy__', '__deepcopy__', '__getstate__', '__setstate__', '__repr__', '_compute_uhi_index', '_compute_commonindex', 'to_numpy', 'copy', 'reset', 'empty', 'sum', 'size', 'shape', '_handle_slice', '_rebin_with_groups', 'kind', 'values', 'variances', 'counts', '_hist', 'axes', '__dict__', '_types', '__class_getitem__', '__new__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__format__', '__sizeof__', '__dir__', '__class__']
>>> h_dense.__dict__
{'_variance_known': True, 'name': None, 'label': None}
>>> h_sparse.__dict__
{'_axes': (Regular(20, -5, 5, name='x'),), '_dense_shape': (22,), '_size': 22, '_flat_indices': array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20]), '_values': array([265., 235., 247., 249., 249., 263., 260., 265., 226., 248., 247.,
254., 230., 227., 261., 246., 254., 283., 242., 249.])}
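One possible shape for the compatibility suggestion above, as a rough sketch only: the class name `SparseHistCompatSketch` is hypothetical and stands in for the PR's SparseHist, and raising `NotImplementedError` is an assumed reading of the "NotImplemented" or "NotSupported" idea rather than a decision made in this PR.

```python
class SparseHistCompatSketch:
    """Hypothetical stand-in, not the PR's SparseHist implementation."""

    def __init__(self, name=None, label=None):
        # Mirror the instance attributes shown in h_dense.__dict__.
        self.name = name
        self.label = label

    def fill(self, *args, **kwargs):
        # Fail loudly instead of with an AttributeError.
        raise NotImplementedError("fill is not supported on a sparse histogram")

    def project(self, *args, **kwargs):
        raise NotImplementedError("project is not supported on a sparse histogram")


h = SparseHistCompatSketch(name="nominal")
```

With explicit stubs, code that treats the object like a Hist gets a clear error message at the unsupported call rather than a missing-attribute failure.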
Three commits adding a SparseHist wrapper class around scipy sparse arrays carrying hist axes metadata, plus supporting fixes.
This provides a minimal Python representation for sparse boost histograms in C++ from narf, which allows them to be pickled and/or passed directly to rabbit without creating a dense intermediate.
Add SparseHist wrapper combining a scipy sparse array with hist axes (c833677): stores the dense N-D shape implied by a sequence of hist axes in the with-flow layout (axis.extent per axis) and provides toarray and to_flat_csr methods that extract either the with-flow or no-flow representation. Also supports dict-style slicing along axes by regular-bin index for use cases such as multi-systematic dispatch in rabbit.
Use int64 indices in SparseHist.to_flat_csr for large flat sizes (256be1f): the CSR returned previously cast indices and indptr to int32, which silently overflowed when the flat target size exceeded the int32 range. This affected SparseHist instances built from large multi-axis inputs (e.g. an (eta, phi, pt, mass, corparms) hist with ~108k corparms, where the with-flow flat size is ~6.3 billion bins). Now switch to int64 whenever the target size does not fit in int32.
Protect against future incompatible change in hist (71f7eb6).
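The with-flow and no-flow layouts mentioned in the first commit can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the PR's code: the two axes are hypothetical `(size, underflow, overflow)` tuples, and row-major (C-order) flattening is assumed for the flat index.

```python
import numpy as np

# Hypothetical axes as (regular bins, has underflow, has overflow).
axes = [(3, 1, 1), (2, 1, 1)]

# With-flow shape: one extent per axis (regular bins plus flow bins).
extents = tuple(n + under + over for n, under, over in axes)

# One filled bin, addressed by regular-bin index on each axis.
regular_index = (0, 1)
# Shift past the underflow bin to get the with-flow index.
flow_index = tuple(i + under for i, (_, under, _) in zip(regular_index, axes))
# Flat position in the with-flow layout (C order assumed).
flat = np.ravel_multi_index(flow_index, extents)

# Densify only for illustration, then drop the flow bins.
dense_flow = np.zeros(extents)
dense_flow.flat[flat] = 7.0
no_flow = dense_flow[tuple(slice(under, under + n) for n, under, _ in axes)]
```

The same offset appears in the 1-D REPL output in the review: a 20-bin axis gives a dense shape of (22,), and the stored flat indices run from 1 to 20 because index 0 is the underflow bin.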