Description
- can we reduce file size
- without affecting training
- and without requiring a ton of re-engineering of the dataset prep / datapipe class
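One low-effort option that fits all three constraints: swap `np.savez` for `np.savez_compressed`. It writes the same npz format, so loading code is untouched. A minimal sketch (filename and shapes are made up; real spectrograms will compress less well than this toy array of zeros):

```python
import tempfile
from pathlib import Path

import numpy as np

root = Path(tempfile.mkdtemp())
spect = np.zeros((128, 1000), dtype=np.float32)  # highly compressible toy spectrogram

# savez vs savez_compressed: same npz container, same loading code,
# so swapping it in would not require re-engineering the datapipe.
np.savez(root / "raw.npz", s=spect)
np.savez_compressed(root / "compressed.npz", s=spect)

raw_size = (root / "raw.npz").stat().st_size
small_size = (root / "compressed.npz").stat().st_size
assert small_size < raw_size

# Loading is identical either way.
x = np.load(root / "compressed.npz")["s"]
assert x.shape == (128, 1000)
```

The compression ratio on real spectrograms would need to be measured before deciding whether this alone is enough.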
Currently we use a "just a bunch of files" approach, which lets us use the same npz file (the spectrogram, the input to a model) with multiple npy files (the labels, the target of the model).
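For concreteness, a minimal sketch of that layout (filenames, suffixes, and shapes here are made up, not the real dataset conventions): one npz holds the spectrogram, and any number of npy label files pair with it by a shared stem.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical "just a bunch of files" layout:
# one .npz spectrogram shared by multiple .npy label files.
root = Path(tempfile.mkdtemp())

spect = np.random.rand(128, 1000).astype(np.float32)  # freq bins x time bins
np.savez(root / "bird1_day1.spect.npz", s=spect)

# Two different annotation sets targeting the same spectrogram.
np.save(root / "bird1_day1.syllable-labels.npy", np.zeros(1000, dtype=np.int64))
np.save(root / "bird1_day1.onset-labels.npy", np.zeros(1000, dtype=np.int64))

# A dataset class only needs the shared stem to pair inputs with targets.
stem = "bird1_day1"
x = np.load(root / f"{stem}.spect.npz")["s"]
y = np.load(root / f"{stem}.syllable-labels.npy")
assert x.shape[1] == y.shape[0]  # one label per time bin
```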
A sort of worst case might be that we'd get a big benefit from jamming all the spectrograms into a single zarr archive, but then we'd have to re-engineer all the code that assumes the spectrograms exist as separate files: the prep step, the dataset class, etc. The reason to prefer separate files is mainly for tracking metadata and for readability, but maybe I am overvaluing this.
This doesn't need to be highest priority, but it could help make it easier to upload the dataset.
edit: if we were to cram all the spectrograms into a single zarr archive, then we might want to access it with a mem-mapping approach. The DAS docs suggest it's not easy to squeeze good performance out of this:
> While zarr, h5py, and xarray provide mechanisms for out-of-memory access, they tend to be slower in our experience or require fine tuning to reach the performance reached with memmapped npy files.
I did previously find examples of pytorch + zarr in other domains, but I similarly got the impression that it's not a simple, clear process to follow, and it's not easy to troubleshoot. Although the point about just mem-mapping npy files makes me wonder if I should try that.
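The memmapped-npy idea the DAS docs point at can be sketched with plain numpy (filename and shapes are made up): stack all the spectrogram windows into one npy file, then open it with `mmap_mode="r"` so a dataset class reads only the windows it indexes instead of loading the whole array into RAM.

```python
import tempfile
from pathlib import Path

import numpy as np

root = Path(tempfile.mkdtemp())
path = root / "all_spects.npy"

# Hypothetical: all spectrogram windows stacked into one big on-disk array.
all_spects = np.random.rand(500, 128, 64).astype(np.float32)
np.save(path, all_spects)

# mmap_mode="r" maps the file instead of reading it into RAM;
# fancy indexing then copies only the requested windows into memory.
spects = np.load(path, mmap_mode="r")
assert isinstance(spects, np.memmap)

batch = spects[[3, 17, 42]]  # a plain ndarray holding just these windows
assert batch.shape == (3, 128, 64)
```

A `__getitem__` in the dataset class could index the memmap the same way, which would keep the per-file metadata question separate from how the arrays are stored.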