Gather round, and let me tell you a story.
In the old days of PyHyper, long before there were clusters with terabytes of memory to play with, and when xarray was young, a postdoc wanted software that would collate heaps and heaps of RSoXS data (or indeed, any other data). RSoXS, you see, is the poster child for the curse of dimensionality. If you try to track 5 samples, 55 energies, 2 polarizations, and 5 rotations with a 2048x2048 image at each, your head quickly explodes. Or at least, your computer's memory does. That is 2,750 images, roughly 92 GB as a dense float64 array, and a dense array has to reserve space (as NaN) for every combination whether or not you ever measured it. So the postdoc learned about `pandas.MultiIndex` and realized you could stack that horrible pile of garbage into a single dimension that holds only the combinations you actually measured, and then the computer's memory would not explode. There was much rejoicing. And the code grew, and grew, building out from the idea that there was one index that an integrator could operate on. People always asked "why do I have to `.unstack('system')`? What's a `.system`? What does it mean to `.unstack()`?" And there was always much wailing and gnashing of teeth when the tutorial inevitably took a detour into how xarray is not just a magical data cloud but a real in-memory representation with a footprint (and actually a quite restrictive one at that), and what is a sparse array, and why don't we use one, and arrrgh. Cue the inevitable detour away from science into data machinery, which is exactly what this library is intended to prevent.

Here's the thing: if your code is clever and can inspect its own `xarray.Indexes`, it can dynamically generate this MultiIndex when it's needed (say, for integration), then unstack it out of existence. Or it can notice that your array already has a MultiIndex and just use that.

There are still cases (many cases, in fact) where having a single MultiIndex is good. Massive RSoXS data cubes are one. But there are at least as many cases where imposing a MultiIndex makes users jump through needless hoops.
This is that clever issue: make `PFEnergySeriesIntegrator`, `PFGeneralIntegrator`, and `WPIntegrator` stop depending on a `system` dimension entirely, neither its name nor the single-MultiIndex concept.
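A minimal sketch of the "inspect the indexes, stack only when needed, then unstack" idea, assuming raw images whose detector dimensions are named `pix_y`/`pix_x` and a per-image integrator that turns one 2D frame into a 1D profile of fixed length. `integrate_any_layout`, `column_sum`, `IMAGE_DIMS`, and the output dimension `bin` are hypothetical names for illustration, not PyHyper's API, and `column_sum` is a toy stand-in for a real integrator. If the array has exactly one non-image dimension (a `MultiIndex` or a plain index), it is used as-is; if it has several, they are stacked into a temporary `system` dimension and unstacked again before returning.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumed detector dimension names (hypothetical); adjust to your images.
IMAGE_DIMS = ("pix_y", "pix_x")


def _integrate_along(arr, dim, frame_func, image_dims, out_dim):
    """Loop frame_func over the single dimension `dim`, keeping dim's index."""
    # Plain numpy stack of images, shape (n_frames, ny, nx)
    frames = arr.transpose(dim, *image_dims).values
    # frame_func maps one 2D image to a 1D profile of fixed length
    out = np.stack([frame_func(frame) for frame in frames])  # (n_frames, n_out)
    # Drop the image dims but keep `dim` and its (possibly Multi-) index
    template = arr.isel({d: 0 for d in image_dims}, drop=True)
    # Add the output dimension (no coordinate), then swap in the computed values
    result = template.expand_dims({out_dim: out.shape[1]}).copy(data=out.T)
    return result.transpose(dim, out_dim)


def integrate_any_layout(data, frame_func, image_dims=IMAGE_DIMS,
                         stack_dim="system", out_dim="bin"):
    """Hypothetical helper: apply a per-image integrator to any layout."""
    frame_dims = [d for d in data.dims if d not in image_dims]

    if not frame_dims:
        # A single image: give it a throwaway length-1 loop dimension
        expanded = data.expand_dims(stack_dim)
        return _integrate_along(expanded, stack_dim, frame_func,
                                image_dims, out_dim).isel({stack_dim: 0})

    if len(frame_dims) == 1:
        # Already one loop dimension (MultiIndex or plain index): use it as-is
        return _integrate_along(data, frame_dims[0], frame_func,
                                image_dims, out_dim)

    # Several loop dimensions: inspect the indexes before stacking anything
    stacked_dims = [d for d in frame_dims
                    if isinstance(data.indexes.get(d), pd.MultiIndex)]
    if stacked_dims:
        raise ValueError(f"this sketch does not mix an existing MultiIndex "
                         f"{stacked_dims} with other loop dimensions")

    # Build the MultiIndex only for the duration of the integration...
    stacked = data.stack({stack_dim: frame_dims})
    result = _integrate_along(stacked, stack_dim, frame_func,
                              image_dims, out_dim)
    # ...then unstack it out of existence so the user never sees it
    return result.unstack(stack_dim)


def column_sum(frame):
    """Toy stand-in for a real integrator: sum each image over its rows."""
    return frame.sum(axis=0)
```

With this shape, an array with dims `(energy, polarization, pix_y, pix_x)` comes back with dims `energy`, `polarization`, and `bin` and no `system` in sight, while an array the user already stacked into `system` keeps its `MultiIndex` untouched. Mixing an existing `MultiIndex` with extra loop dimensions is deliberately left as an error here rather than guessed at.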