(very fundamental) feat: get rid of the blessed 'system' multiindex #39

Open
pbeaucage opened this issue Jul 24, 2022 · 1 comment

@pbeaucage
Collaborator

Gather round, and let me tell you a story.

In the old days of PyHyper, long before there were clusters with terabytes of memory to play with, and when xarray was young, a postdoc wanted software that would collate heaps and heaps of RSoXS data - or indeed, any other data. RSoXS, you see, is the poster child for the curse of dimensionality. If you try to track 5 samples, 55 energies, 2 polarizations, and 5 rotations with a 2048x2048 image at each, your head quickly explodes. Or at least, your computer's memory does, because that stack has a footprint in the hundreds of GB. So the postdoc learned about pandas.MultiIndex, realized you could stack that horrible pile of garbage into one, and then the computer's memory would not explode. There was much rejoicing. And the code grew, and grew, building out from the idea that there was one index that an integrator could operate on.

People always asked "why do I have to .unstack('system')? what's a .system? what does it mean to .unstack()?" And there was always much wailing and gnashing of teeth when the tutorial inevitably took a detour into how xarray is not just a magical data cloud but a real in-memory representation with a footprint, and actually a quite restrictive one at that, and what is a sparse array, and why don't we use one, and arrrgh. Cue the inevitable detour away from science into data machinery - exactly what this library is intended to prevent.

Here's the thing: if your code is clever, and can inspect its own xarray.Indexes, it can just dynamically generate this multiindex when it's needed (say, for integration), then unstack it out of existence. Or, it can notice that your array already has a multiindex and just use that.
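A minimal sketch of that idea, under stated assumptions: the helper names (`find_multiindex_dim`, `with_stacked_index`), the temporary dimension name `_stacked`, the image dimension names `pix_y`/`pix_x`, and the toy "integrator" are all hypothetical and are not PyHyperScattering API. The only library calls used are `DataArray.indexes`, `DataArray.stack`, and `DataArray.unstack`. A real integrator would do actual reduction work where the toy one just sums pixels.

```python
import numpy as np
import pandas as pd
import xarray as xr


def find_multiindex_dim(da):
    """Return the first dimension backed by a pandas.MultiIndex, or None."""
    for dim in da.dims:
        if isinstance(da.indexes.get(dim), pd.MultiIndex):
            return dim
    return None


def with_stacked_index(da, func, image_dims=("pix_y", "pix_x"), temp_dim="_stacked"):
    """Run func(stacked, stack_dim) on data with a single stacked dimension.

    If `da` already carries a multiindex, use it as-is and leave it in place.
    Otherwise stack all non-image dims into a temporary multiindex, run
    `func`, and unstack the result so the caller never sees the temporary dim.
    """
    existing = find_multiindex_dim(da)
    if existing is not None:
        return func(da, existing)

    extra = [d for d in da.dims if d not in image_dims]
    if not extra:
        # A single image: nothing to stack.
        return func(da, None)

    stacked = da.stack({temp_dim: extra})
    return func(stacked, temp_dim).unstack(temp_dim)


def toy_integrate(frames, stack_dim):
    """Stand-in for an integrator: collapse each image to one number."""
    return frames.sum(dim=["pix_y", "pix_x"])


# Plain N-dimensional input, no 'system' anywhere.
da = xr.DataArray(
    np.arange(2 * 3 * 4 * 4, dtype=float).reshape(2, 3, 4, 4),
    dims=("sample", "energy", "pix_y", "pix_x"),
    coords={"sample": ["a", "b"], "energy": [270.0, 280.0, 290.0]},
)
out = with_stacked_index(da, toy_integrate)

# Input that already has a multiindex (here named 'system', but any name works).
pre = da.stack(system=["sample", "energy"])
out_pre = with_stacked_index(pre, toy_integrate)
```

With this shape, the user who hands in a plain `(sample, energy, pix_y, pix_x)` array gets back `(sample, energy)` and never has to learn what `.unstack()` means, while the user with a giant pre-stacked cube keeps their multiindex untouched.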

There are still cases - many cases, in fact - where having a single multiindex is good. Massive RSoXS data cubes are one. But there are at least an equal number of cases where imposing a multiindex makes users jump through needless hoops.

This is that clever issue. Make PFEnergySeriesIntegrator and PFGeneralIntegrator and WPIntegrator not care about the existence of a system, neither the name nor the single multiindex concept.

pbeaucage added a commit that referenced this issue Jul 24, 2022
@pbeaucage pbeaucage reopened this Sep 12, 2022
@pbeaucage
Collaborator Author

This appears to have caused follow-on errors (#41, and probably others). It shouldn't have been closed.
