module 'spatialdata' has no attribute 'match_sdata_to_table' #912

Open
bellenger-l opened this issue Mar 25, 2025 · 5 comments

@bellenger-l

Hello,

I want to filter my Xenium data, but I get an error when calling the function match_sdata_to_table.

Here is a reproducible example:

```python
import spatialdata as spd
from spatialdata.datasets import blobs

sdata = blobs()
sub_adata = sdata.tables["table"][:10]
sub_sdata = spd.match_sdata_to_table(
sdata=sdata, table_name="table", table=sub_adata, how="right"
)
```

This returns the following error:

```
AttributeError: module 'spatialdata' has no attribute 'match_sdata_to_table'. Did you mean: 'match_element_to_table'?
```

Desktop (optional):

  • RedHat (8.7)
  • spatialdata 0.3.0

How can I fix this?
Thanks for your time
Best
Lea

@Pancreas-Pratik

Pancreas-Pratik commented Mar 27, 2025

Hi Lea,
I had this same issue. spatialdata 0.3.0 is the latest release, and match_sdata_to_table is currently only available in the development version.

To fix this, I installed a development version that includes pull request #627. I chose a commit that also includes pull request #883, just to avoid other potential issues.
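Before reinstalling, you can check whether the spatialdata in your current environment already exposes the function. This is a minimal standard-library sketch; `describe_installed` is a hypothetical helper, and it assumes the distribution name matches the import name (which is the case for spatialdata).

```python
import importlib
import importlib.metadata
import importlib.util


def describe_installed(package, attr):
    """Return (version, has_attr) for an importable package.

    Returns (None, False) if the package cannot be found. The version is looked
    up by distribution name, assumed here to equal the import name; it is None
    when no distribution metadata exists (e.g. for standard-library modules).
    """
    if importlib.util.find_spec(package) is None:
        return None, False
    module = importlib.import_module(package)
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return version, hasattr(module, attr)
```

For example, `describe_installed("spatialdata", "match_sdata_to_table")` returning `("0.3.0", False)` would match the situation described in this issue.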

How to install a version that includes a specific pull request:

Go to the main page, https://github.com/scverse/spatialdata, and click "Commits".

Then click the double-square icon next to the commit you want, to "Copy SHA".

Finally, construct the pip install command for spatialdata, pinning the dev version with that SHA:

```bash
pip install git+https://github.com/scverse/spatialdata.git@6e259f0afbf67379ade21e62560910e89f752c68
```

This command requires git. If you are on an institutional HPC cluster, you can check whether git is already available with `module avail git` and then load it with `module load <git output from module avail git>`.
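If you prefer to do this from inside a Jupyter notebook, here is a sketch (standard library only; `pip_spec_for_commit` and `install_commit` are hypothetical helpers) that builds the same requirement string and runs pip through the notebook's own interpreter, so the package is installed into the kernel's environment. Like the command above, it assumes git is available.

```python
import re
import subprocess
import sys

REPO_URL = "https://github.com/scverse/spatialdata.git"


def pip_spec_for_commit(sha, repo_url=REPO_URL):
    """Build a pip VCS requirement pinned to a full 40-character commit SHA."""
    if not re.fullmatch(r"[0-9a-f]{40}", sha):
        raise ValueError(f"not a full lowercase commit SHA: {sha!r}")
    return f"git+{repo_url}@{sha}"


def install_commit(sha):
    """Install the pinned commit with the pip of the running interpreter (needs git)."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", pip_spec_for_commit(sha)],
        check=True,  # raise CalledProcessError if pip fails
    )
```

Using `sys.executable -m pip` rather than a bare `pip` avoids installing into a different environment than the one the kernel runs in.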

Respectfully,
Pratik

@bellenger-l
Author

bellenger-l commented Mar 31, 2025

Hello @Pancreas-Pratik,

Thanks a lot for your help! Your explanation was completely clear. The function now works, but I lost a lot of information (all images, points, labels and some shapes) in my subsetted SpatialData object. I think that's a separate problem; I'll take a closer look.

Best regards,
Lea

@Pancreas-Pratik
You are welcome @bellenger-l
I am unsure about the loss of information. I am just learning how to use spatialdata myself, so I could not pinpoint the cause of your issue to help you troubleshoot it.

What I do is load a fresh SpatialData object via xenium() every time I analyze the data, and then rerun all the code I had written and saved in my Jupyter .ipynb notebook, from the start to where I left off. In your case, that would mean re-running your subsetting code every time you restart your Jupyter kernel. Maybe this is bad practice, since I have to read and write the entire Xenium /outs/ folder every time I work on this project, but it feels cleaner to me in terms of reproducibility: whoever runs my code should get the same result every time.

I have not completely figured out the .zarr saving option yet. I think that is how you save intermediate changes made to a SpatialData object?

@bellenger-l
Author

bellenger-l commented Mar 31, 2025

There are two separate things, I'm afraid:

  • the Zarr store, which lets you keep your SpatialData object on disk so that individual elements can be saved throughout your analysis;
  • the loss of information due to the filtering.

I use the Zarr store because I am testing spatialdata and different spatial transcriptomics packages, and some steps are very time-consuming, so I don't necessarily want to recompute everything from the beginning. In my opinion it works fine, except that when we want to save the table (the AnnData within the SpatialData object), we need to save the entire Zarr store again.

Regarding the match_sdata_to_table function, this is what I see:

  • My SpatialData object before filtering (sdata):

```
SpatialData object, with associated Zarr store: /home/blabla/object.zarr
├── Images
│     ├── 'he_image': DataArray[cyx]
│     ├── 'morphology_focus': DataTree[cyx]
│     └── 'morphology_mip': DataTree[cyx]
├── Labels
│     ├── 'cell_labels': DataTree[yx]
│     └── 'nucleus_labels': DataTree[yx]
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 10) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (413404, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (413404, 2) (2D shapes)
│     ├── 'nucleus_boundaries': GeoDataFrame shape: (413404, 1) (2D shapes)
│     └── 'tissue_outline': GeoDataFrame shape: (27, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (413404, 426)
with coordinate systems:
    ▸ 'global', with elements:
        he_image (Images), morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes), tissue_outline (Shapes)
```

  • My subsetted SpatialData object, obtained with `sub_sdata = spd.match_sdata_to_table(sdata=sdata, table_name="table", table=sub_adata, how="right")`:

```
SpatialData object
├── Shapes
│     └── 'cell_circles': GeoDataFrame shape: (411085, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (411085, 426)
with coordinate systems:
    ▸ 'global', with elements:
        cell_circles (Shapes)
```

So even when I perform the filtering and assign the result to a new variable, I correctly retrieve the cells of interest, but at the expense of the other SpatialData elements.
What I don't know is why. My hypothesis is that the annotation table is not linked to the other elements of the SpatialData object (except cell_circles?).
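To make this hypothesis concrete, here is a toy model in plain Python of a right join that keeps only the elements the table annotates. Dicts stand in for the elements and the table, `toy_match_to_table` is a hypothetical name, and this is not spatialdata's implementation. If the Xenium table annotates only cell_circles, as the output above suggests, every other element would be dropped in the same way.

```python
# Toy model, not spatialdata's implementation: dicts stand in for SpatialData
# elements (keyed by element name) and for the rows of an annotation table.

def toy_match_to_table(elements, table_regions, table_instance_ids):
    """Keep only the elements the table annotates, each restricted to the
    instance ids listed in the table (in table order)."""
    matched = {}
    for region in table_regions:
        element = elements[region]
        # element rows without a matching table row are dropped;
        # in this toy, table ids missing from the element are simply skipped
        matched[region] = {i: element[i] for i in table_instance_ids if i in element}
    return matched


elements = {
    "he_image": "pixels",            # not annotated by the table
    "transcripts": ["t1", "t2"],     # not annotated by the table
    "cell_circles": {1: "circle1", 2: "circle2", 3: "circle3"},
    "cell_boundaries": {1: "poly1", 2: "poly2", 3: "poly3"},
}

# the table annotates only cell_circles, and the subset keeps cells 1 and 3
sub = toy_match_to_table(elements, table_regions=["cell_circles"], table_instance_ids=[1, 3])
print(sub)  # {'cell_circles': {1: 'circle1', 3: 'circle3'}}
```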

Did you see the same behavior when using the match_sdata_to_table function?

@Pancreas-Pratik

Oh. I can help with this.

I had trouble with the same thing, which essentially comes down to putting the AnnData table back into the sdata and subsetting cell_boundaries and cell_circles (I imagine the same can be done for nucleus_boundaries). @LucaMarconato helped me with this exact issue; his solution is here: #898 (comment). Below is how I implemented his solution, with the object names changed to match yours.


How to subset cell_circles and cell_boundaries and put the subsetted AnnData back into the SpatialData object (see ***Note below):

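```python
import spatialdata as spd

# back up sdata first. A plain `sdata_backup = sdata` would NOT back anything up:
# it only binds a second name to the same object, so the in-place changes below
# would show up in both. spd.deepcopy() makes a real copy (it may load data into memory).
sdata_backup = spd.deepcopy(sdata)

# subset cell_circles (the element the table currently annotates)
sub_sdata = spd.match_sdata_to_table(sdata=sdata, table_name="table", table=sub_adata, how="right")
sdata.shapes["cell_circles"] = sub_sdata.shapes["cell_circles"]

# repeat for cell_boundaries: point the table annotation at cell_boundaries
sdata["table"].obs["region"] = "cell_boundaries"
sdata.set_table_annotates_spatialelement(table_name="table", region="cell_boundaries")

sub_sdata = spd.match_sdata_to_table(sdata=sdata, table_name="table", table=sub_adata, how="right")
sdata.shapes["cell_boundaries"] = sub_sdata.shapes["cell_boundaries"]

# then put the subsetted AnnData back into sdata
sdata.tables["table"] = sub_adata

# sdata should now have cell_circles and cell_boundaries filtered by sub_adata,
# with sub_adata stored back in sdata as its table
sdata
```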
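The workaround repeats the same steps for each shapes element. As a toy model in plain Python (dicts keyed by cell id stand in for the GeoDataFrames; `filter_elements_by_ids` is a hypothetical helper, not a spatialdata function), the idea generalizes to any list of element names, for example also nucleus_boundaries, while every element that is not listed is carried over untouched.

```python
def filter_elements_by_ids(elements, kept_ids, names):
    """Return a new dict in which each element listed in `names` keeps only the
    ids in `kept_ids`; all other elements are carried over unchanged."""
    kept = set(kept_ids)
    filtered = dict(elements)  # shallow copy: the input dict itself is not modified
    for name in names:
        filtered[name] = {i: geom for i, geom in elements[name].items() if i in kept}
    return filtered


shapes = {
    "cell_circles": {1: "c1", 2: "c2", 3: "c3"},
    "cell_boundaries": {1: "b1", 2: "b2", 3: "b3"},
    "nucleus_boundaries": {1: "n1", 2: "n2", 3: "n3"},
    "tissue_outline": {0: "outline"},
}

# cell ids kept in the subsetted table (stand-in for sub_adata's instance ids)
kept_cell_ids = [2, 3]
result = filter_elements_by_ids(
    shapes, kept_cell_ids, ["cell_circles", "cell_boundaries", "nucleus_boundaries"]
)
```

Restricting only the listed elements, instead of rebuilding the object from a join, is what keeps tissue_outline (and, in the real object, the images, labels and points) in place.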

***Note: In his reply to my issue, @LucaMarconato said the current solution is only a temporary fix. From #898 (comment):

> Two comments:
>
> we are currently improving the ergonomics around these type of operations with a new API called match_sdata_to_table(), merged here #627 and with a work-in-progress PR called for a new API called filter_table_by_query(), discussed here #894. Also, squidpy will offer APIs similar to the scanpy ones. The implementation for the moment will be separate because first we need to enable the join APIs (used in the functions above), to return a view and not always a copy. This is being worked out in this spatialdata PR here #701. The squidpy PR is this one here: scverse/squidpy#967
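The point about joins returning a view rather than a copy is about whether the result shares data with the original. Here is a minimal standard-library illustration of the three cases: an alias, a view, and an independent copy. It is plain Python and says nothing about spatialdata's internals; the alias case is also why a plain `sdata_backup=sdata` does not keep an independent backup.

```python
import copy

data = bytearray(b"abc")
alias = data                    # second name for the same object, no copy made
view = memoryview(data)         # view: shares the underlying buffer
snapshot = copy.deepcopy(data)  # independent copy of the data

data[0] = ord("z")              # modify the original in place

print(bytes(alias), bytes(view), bytes(snapshot))  # b'zbc' b'zbc' b'abc'
```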
