Description
What would you like to see added to PyNWB?
In a conversation with @satra and @oruebel at the BRAIN meeting, we discussed syntax differences in pynwb that are the result of using different backends. This has the potential to cause bugs and complications for downstream code. I am aware of three areas in PyNWB where differences appear.
- **IO for reading and writing files:** A user needs to decide between `NWBHDF5IO` and `NWBZarrIO` in order to read or write a file. The writing part doesn't seem that bad, since the user will be required to indicate the backend of choice somehow, and using a different class seems like a reasonable way to do that. For read, however, you might prefer that PyNWB determine the backend automatically; otherwise, we are going to need to cover the read classes for all of the different possible backends in tutorials. IMHO it wouldn't hurt to have an `NWBIO` class that can automatically determine the backend. This is a relatively simple solution and would make the pynwb library more user-friendly. Relevant issues: NWBIO #858
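To illustrate why read-time auto-detection is feasible, the on-disk formats are easy to tell apart without opening them through either IO class. A minimal sketch, assuming a hypothetical `detect_backend` helper (not part of PyNWB), using only the stdlib:

```python
import os
import tempfile

# The 8-byte signature that begins every HDF5 file (at offset 0; HDF5 also
# permits the signature at offsets 512, 1024, ... when a userblock is
# present, which this sketch ignores for brevity).
HDF5_MAGIC = b'\x89HDF\r\n\x1a\n'

def detect_backend(path):
    """Hypothetical helper: guess the NWB storage backend from the path alone."""
    if os.path.isfile(path):
        with open(path, 'rb') as f:
            if f.read(8) == HDF5_MAGIC:
                return 'hdf5'
    # A Zarr (v2) directory store is a directory containing .zgroup/.zattrs metadata
    if os.path.isdir(path):
        if any(os.path.exists(os.path.join(path, name)) for name in ('.zgroup', '.zattrs')):
            return 'zarr'
    raise ValueError(f'Could not determine NWB backend for {path!r}')

# Demo with stand-in files (neither h5py nor zarr is needed for detection itself)
with tempfile.TemporaryDirectory() as tmp:
    h5_path = os.path.join(tmp, 'file.nwb')
    with open(h5_path, 'wb') as f:
        f.write(HDF5_MAGIC + b'\x00' * 8)

    zarr_path = os.path.join(tmp, 'file.nwb.zarr')
    os.makedirs(zarr_path)
    open(os.path.join(zarr_path, '.zgroup'), 'w').close()

    print(detect_backend(h5_path))    # hdf5
    print(detect_backend(zarr_path))  # zarr
```

An `NWBIO` wrapper could then dispatch to `NWBHDF5IO` or `NWBZarrIO` based on the detected backend.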
- **Dataset configuration on write:** Chunking, compression, filters, shuffling, etc. are specified slightly differently between HDF5/h5py and Zarr. You might consider creating a unifying API layer that translates into the different specs for each backend (@satra was advocating for this); however, there are enough differences in the capabilities and logic of the two approaches that this would not be straightforward. If we do create a unifying language, it would be difficult not to restrict our use of the configuration capabilities of particular backends. For example, I'd like to use WavPack, but that is currently only available with Zarr.
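One way to picture such a unifying layer: a small, backend-neutral config object that translates into backend-specific keyword arguments. This is a hypothetical sketch (`DatasetConfig` and its method names are invented here); h5py's real `create_dataset` keywords are `chunks`, `compression`, `compression_opts`, and `shuffle`, while Zarr bundles compression into a single numcodecs compressor object, represented below as a plain spec dict to avoid the dependency:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class DatasetConfig:
    """Hypothetical backend-neutral dataset configuration."""
    chunks: Optional[Tuple[int, ...]] = None
    compression: Optional[str] = None   # e.g. 'gzip'
    level: int = 4
    shuffle: bool = False

    def to_h5py_kwargs(self) -> dict:
        # h5py: each setting is a separate create_dataset keyword
        kw = {}
        if self.chunks is not None:
            kw['chunks'] = self.chunks
        if self.compression is not None:
            kw['compression'] = self.compression
            kw['compression_opts'] = self.level
        if self.shuffle:
            kw['shuffle'] = True
        return kw

    def to_zarr_spec(self) -> dict:
        # Zarr: compression settings collapse into one codec object.
        # Note the HDF5 shuffle filter has no direct slot here, which is
        # exactly the kind of mismatch a shared vocabulary has to resolve.
        kw = {}
        if self.chunks is not None:
            kw['chunks'] = self.chunks
        if self.compression is not None:
            kw['compressor'] = {'id': self.compression, 'level': self.level}
        return kw

cfg = DatasetConfig(chunks=(100, 64), compression='gzip', level=5, shuffle=True)
print(cfg.to_h5py_kwargs())
print(cfg.to_zarr_spec())
```

A backend-only codec such as WavPack has no place in this shared vocabulary unless the API also offers backend-specific escape hatches, which is the trade-off described above.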
- **Dataset indexing when reading:** When you read datasets from an NWB file, you get an `h5py.Dataset` or a `zarr.Array`. These classes act similarly to an `np.ndarray`, but there are enough differences that they will likely cause bugs in any analysis script that aims to work across backends. For example, this code works in Zarr:
```python
import zarr
import numpy as np

# Create a new Zarr array
zarr_array = zarr.zeros((10, 10))  # Create a 10x10 array of zeros
zarr_array[:] = np.random.rand(10, 10)  # Fill the array with random numbers

# Assume zarr_array is a 2D array, create index arrays
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 3])

# Use multidimensional indexing
subset = zarr_array[rows, cols]

# Print the subset
print(subset)
# [0.43752435 0.57966441 0.86366265]
```

but the following analogous code in h5py does not work:
```python
import h5py
import numpy as np

# Create a new HDF5 file and dataset
file = h5py.File('filename.hdf5', 'w')
data = np.random.rand(10, 10)  # Create a 10x10 array of random numbers
dataset = file.create_dataset('dataset', data=data)

# Assume dataset is a 2D array, create index arrays
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 3])

# Use multidimensional indexing
subset = dataset[rows, cols]

# Print the subset
print(subset)

# Remember to close the file
file.close()
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [9], line 10
      7 dataset = file.create_dataset('dataset', data=data)
      9 # Use multidimensional indexing
---> 10 subset = dataset[rows, cols]
     12 # Print the subset
     13 print(subset)

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()
File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()
File ~/opt/miniconda3/lib/python3.9/site-packages/h5py/_hl/dataset.py:814, in Dataset.__getitem__(self, args, new_dtype)
    809     return arr
    811 # === Everything else ===================
    812
    813 # Perform the dataspace selection.
--> 814 selection = sel.select(self.shape, args, dataset=self)
    816 if selection.nselect == 0:
    817     return numpy.zeros(selection.array_shape, dtype=new_dtype)
File ~/opt/miniconda3/lib/python3.9/site-packages/h5py/_hl/selections.py:82, in select(shape, args, dataset)
     79 space = h5s.create_simple(shape)
     80 selector = _selector.Selector(space)
---> 82 return selector.make_selection(args)
File h5py/_selector.pyx:276, in h5py._selector.Selector.make_selection()
File h5py/_selector.pyx:189, in h5py._selector.Selector.apply_args()

TypeError: Only one indexing vector or array is currently allowed for fancy indexing
```

There are many other subtle differences between the indexing of these array-like classes. See the Zarr docs on fancy indexing.
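Until the array classes behave uniformly, downstream code has to normalize indexing itself. A minimal sketch of one workaround, assuming a hypothetical `pick_coords` helper: read coordinate pairs one at a time, which `h5py.Dataset`, `zarr.Array`, and `np.ndarray` all support, at the cost of one read per element. Demonstrated on a plain NumPy array standing in for either backend:

```python
import numpy as np

def pick_coords(dataset, rows, cols):
    """Hypothetical helper: element-wise coordinate selection that behaves the
    same on h5py.Dataset, zarr.Array, and np.ndarray (one read per element,
    so only suitable for small index sets)."""
    return np.array([dataset[r, c] for r, c in zip(rows, cols)])

# Stand-in for a dataset read from either backend
data = np.arange(100).reshape(10, 10)
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 3])

print(pick_coords(data, rows, cols))  # [ 1 12 23]
```

A real abstraction layer would presumably special-case each backend's native fancy indexing for performance; this only shows that a single portable behavior is definable.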
In NWB Widgets, we are starting to run into bugs that are the result of these differences, e.g. NeurodataWithoutBorders/nwbwidgets#283. Without a proper response, I anticipate that these types of issues will accumulate across the cross-section of analysis tools and backends.
Is your feature request related to a problem?
No response
What solution would you like?
There are some pretty big trade-offs here, including homogenization, backwards compatibility, and the implementation complexity of each solution. I think it deserves some discussion.
Do you have any interest in helping implement the feature?
Yes.
Code of Conduct
- I agree to follow this project's Code of Conduct
- Have you checked the Contributing document?
- Have you ensured this change was not already requested?