(very fundamental) feat: get rid of the blessed 'system' multiindex #39

pbeaucage · 2022-07-24T04:23:38Z

Gather round, and let me tell you a story.

In the old days of PyHyper, long before there were clusters with terabytes of memory to play with, and when xarray was young, a postdoc wanted software that would collate heaps and heaps of RSoXS data - or indeed, any other data. RSoXS, you see, is the poster child for the curse of dimensionality. If you try to track 5 samples, 55 energies, 2 polarizations, and 5 rotations with a 2048x2048 image at each, your head quickly explodes. Or at least, your computer's memory does, because that stack has a footprint in the hundreds of GB. So the postdoc learned about pandas.MultiIndex, realized you could stack that horrible pile of garbage into one, and then the computer's memory would not explode. There was much rejoicing. And the code grew, and grew, building out from the idea that there was one index that an integrator could operate on. People always asked "why do I have to .unstack('system')? what's a .system? what does it mean to .unstack()?" And there was always much wailing and gnashing of teeth when the tutorial inevitably took a detour into how xarray is not just a magical data cloud but a real in-memory representation with a footprint, and actually a quite restrictive one at that, and what is a sparse array, and why don't we use one, and arrrgh. Cue the inevitable detour away from science into data machinery - exactly what this library is intended to prevent.

Here's the thing: if your code is clever, and can inspect its own xarray.Indexes, it can just dynamically generate this multiindex when it's needed (say, for integration), then unstack it out of existence. Or, it can notice that your array already has a multiindex and just use that.
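A rough sketch of what that could look like, not the actual PyHyper implementation. The helper names (`find_multiindex_dim`, `integrate_any_layout`), the `pix_y`/`pix_x` dimension names, and the temporary `_stacked` name are all made up for illustration, and a plain sum over the detector axes stands in for the real integration step. It only relies on `DataArray.indexes`, `.stack()`, and `.unstack()`:

```python
import numpy as np
import pandas as pd
import xarray as xr


def find_multiindex_dim(da):
    """Return the first dimension backed by a pandas.MultiIndex, or None."""
    for dim in da.dims:
        # Dimensions without a coordinate have no index, hence .get().
        if isinstance(da.indexes.get(dim), pd.MultiIndex):
            return dim
    return None


def integrate_any_layout(da, image_dims=("pix_y", "pix_x"), temp_dim="_stacked"):
    """Hypothetical integrator wrapper that does not require a 'system' index.

    The sum over image_dims is only a placeholder for real integration.
    """
    image_dims = list(image_dims)

    # Case 1: the user already stacked their data; use their multiindex as is.
    if find_multiindex_dim(da) is not None:
        return da.sum(dim=image_dims)

    # Case 2: build a temporary multiindex over every non-image dimension...
    extra = [d for d in da.dims if d not in image_dims]
    if not extra:
        return da.sum(dim=image_dims)  # a single image, nothing to stack
    stacked = da.stack({temp_dim: extra})
    reduced = stacked.sum(dim=image_dims)

    # ...and unstack it again so the caller never sees it.
    return reduced.unstack(temp_dim)


# Toy data: 3 energies x 2 polarizations of 4x4 images filled with ones.
raw = xr.DataArray(
    np.ones((3, 2, 4, 4)),
    dims=("energy", "polarization", "pix_y", "pix_x"),
    coords={"energy": [270.0, 285.0, 300.0], "polarization": [0, 90]},
)

plain = integrate_any_layout(raw)  # comes back with energy/polarization dims
pre_stacked = raw.stack(system=["energy", "polarization"])
kept = integrate_any_layout(pre_stacked)  # keeps the user's own 'system' index
```

The choice to leave a pre-existing multiindex stacked on output is an assumption here: the idea is that whatever layout the user handed in is the layout they get back.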

There are still cases - many cases, in fact - where having a single multiindex is good. Massive RSoXS data cubes are one. But there are at least an equal number of cases where imposing a multiindex makes users jump through needless hoops.

This is that clever issue. Make PFEnergySeriesIntegrator and PFGeneralIntegrator and WPIntegrator not care about the existence of a system, neither the name nor the single multiindex concept.

pbeaucage · 2022-09-12T12:52:43Z

This appears to have caused follow-on errors, #41 and probably others. Shouldn’t have been closed.

pbeaucage added a commit that referenced this issue Jul 24, 2022

Initial attempt at #39

711af20

pbeaucage closed this as completed in b7a9e04 Jul 24, 2022

pbeaucage added a commit that referenced this issue Jul 24, 2022

Addnl typo fixes #39

8ddfef4

pbeaucage added a commit that referenced this issue Jul 24, 2022

Bugfix #39

c8c6965

pbeaucage added a commit that referenced this issue Jul 24, 2022

typo fix #39

2371ac2

pbeaucage reopened this Sep 12, 2022

pbeaucage mentioned this issue Sep 12, 2022

Multiple tests failing on KeyError: Energy #41

Closed

pbeaucage mentioned this issue Jan 22, 2023

refactor: xr.Dataset as primary data structure #62

Open
