Xenium Ranger 3.0.1.1 after Cellpose resegmentation - "ValueError: A linearring requires at least 4 coordinates." #245

Open
Lem-P opened this issue Dec 2, 2024 · 15 comments · May be fixed by #283

@Lem-P

Lem-P commented Dec 2, 2024

After segmentation with Cellpose, I ran Xenium Ranger 3.0.1.1:
xeniumranger import-segmentation --id=my_ID \
    --xenium-bundle=$DIR/outs \
    --nuclei=$DIR/outs/mask_nuclei.ome_cp_masks.tif \
    --cells=$DIR/outs/mask_cells.ome_cp_masks.tif

When I try to read the output folder with spatialdata_io (0.1.7.dev5+g82ff327) and spatialdata (0.2.7.dev7+gd746485):
sdata = spatialdata_io.xenium(dir)
I get this error:
"ValueError: A linearring requires at least 4 coordinates."

@Lem-P
Author

Lem-P commented Dec 4, 2024

Is there a temporary workaround to manually create the SpatialData object from the Xenium Ranger output? (I mean a tutorial, notebook, ...)

@Lem-P
Author

Lem-P commented Dec 16, 2024

Does nobody have an idea how to resolve this issue?
Does anybody know how to download the previous working version of Xenium Ranger (v3.0.0.0)?
Or is there any tutorial for manually creating the SpatialData object from the Xenium Ranger output?

@Lem-P
Author

Lem-P commented Dec 20, 2024

After updating to Xenium Ranger v3.1, it is still not working.
Does nobody have any idea how to resolve this issue?

`Traceback (most recent call last):
  File "./Read_Write_zarr.py", line 11, in <module>
    sdata = spatialdata_io.xenium(dir)
  File "./lib/python3.10/site-packages/spatialdata_io/_utils.py", line 46, in wrapper
    return f(*args, **kwargs)
  File "./lib/python3.10/site-packages/spatialdata_io/readers/xenium.py", line 232, in xenium
    polygons["cell_boundaries"] = _get_polygons(
  File "./lib/python3.10/site-packages/spatialdata_io/readers/xenium.py", line 349, in _get_polygons
    out = Parallel(n_jobs=n_jobs)(
  File "./lib/python3.10/site-packages/joblib/parallel.py", line 1918, in __call__
    return output if self.return_generator else list(output)
  File "./lib/python3.10/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
  File "./lib/python3.10/site-packages/spatialdata_io/readers/xenium.py", line 341, in _poly
    return Polygon(arr[:-1])
  File "./lib/python3.10/site-packages/shapely/geometry/polygon.py", line 230, in __new__
    shell = LinearRing(shell)
  File "./lib/python3.10/site-packages/shapely/geometry/polygon.py", line 104, in __new__
    geom = shapely.linearrings(coordinates)
  File "./lib/python3.10/site-packages/shapely/decorators.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "./lib/python3.10/site-packages/shapely/creation.py", line 173, in linearrings
    return lib.linearrings(coords, out=out, **kwargs)
ValueError: A linearring requires at least 4 coordinates.`
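
For readers following the traceback, the failure can probably be reproduced without any Xenium data. The sketch below is not the spatialdata_io code; it only mirrors the `Polygon(arr[:-1])` call visible in the traceback. The helper name `build_polygon` and the toy coordinates are made up for illustration, and it assumes that each boundary repeats its first vertex as its last row.

```python
import numpy as np
from shapely.geometry import Polygon


def build_polygon(vertices: np.ndarray) -> Polygon:
    # Hypothetical helper mirroring the traceback: drop the closing vertex,
    # then let shapely close the ring again.
    return Polygon(vertices[:-1])


# A closed triangle: 3 distinct vertices plus the repeated first one (4 rows).
ok = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
# A degenerate boundary with only 3 rows: 2 vertices remain after the slice.
bad = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])

triangle = build_polygon(ok)

try:
    build_polygon(bad)
    error_message = None
except ValueError as exc:
    # shapely refuses to build a ring from so few coordinates.
    error_message = str(exc)
```

Under that assumption, a boundary needs at least 4 rows in the file to survive the slice, which matches the `< 4` threshold used in the workaround posted later in this thread.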

@ConstensouxAlexis

ConstensouxAlexis commented Jan 2, 2025

Hi,
You should inspect your Cellpose segmentation and make sure you do not have polygons with fewer than 4 vertices.
If you want to load this Cellpose segmentation anyway, you can remove all cell polygons with fewer than 4 vertices
(usually this represents a very low number of cells, ~0.1%):

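import pandas as pd
# One row per polygon vertex.
cell_boundaries = pd.read_parquet("./cell_boundaries.parquet")
# IDs of cells whose polygon has fewer than 4 vertex rows.
cells_to_remove = cell_boundaries.groupby("cell_id").filter(lambda x: len(x) < 4).cell_id.unique()
cell_boundaries = cell_boundaries[~cell_boundaries.cell_id.isin(cells_to_remove)]
# This overwrites the input file; keep a backup of the original.
cell_boundaries.to_parquet("./cell_boundaries.parquet")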

If you do so, you should certainly also modify the transcripts file and the count matrix. Be careful not to modify your raw data; doing this can be a bit dangerous!
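
Before filtering, it may help to check how many cells are actually affected and to keep the raw file untouched. The following is a minimal sketch on a toy table, not part of the original workaround. The column names `cell_id`, `vertex_x` and `vertex_y` are assumed, and the output file name in the final comment is hypothetical.

```python
import pandas as pd

# Toy stand-in for cell_boundaries.parquet: one row per polygon vertex.
# Column names are assumed for illustration.
cell_boundaries = pd.DataFrame(
    {
        "cell_id": ["a"] * 5 + ["b"] * 3 + ["c"] * 4,
        "vertex_x": [0, 1, 1, 0, 0, 5, 6, 5, 9, 10, 9, 9],
        "vertex_y": [0, 0, 1, 1, 0, 5, 5, 5, 9, 9, 10, 9],
    }
)

# Number of vertex rows per cell, then the cells below the threshold.
vertex_counts = cell_boundaries.groupby("cell_id").size()
too_small = vertex_counts[vertex_counts < 4]
fraction_affected = len(too_small) / len(vertex_counts)

# Filter into a new object instead of modifying the loaded table.
filtered = cell_boundaries[~cell_boundaries["cell_id"].isin(too_small.index)]

# On real data, the original file could be copied aside first, e.g.
# (hypothetical name): cell_boundaries.to_parquet("./cell_boundaries.raw.parquet")
```

If `fraction_affected` is much larger than the ~0.1% mentioned above, that is probably a sign to inspect the segmentation itself rather than filter.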

@LucaMarconato
Member

Thanks @Lem-P for reporting and thanks @ConstensouxAlexis for sharing a workaround. @Lem-P does this solve your issue?

If not, I'll be happy to have a look into this, but I'd ask you to share a reproducible script that runs on some public data, to make this easier to inspect. Thank you.

@ConstensouxAlexis

ConstensouxAlexis commented Jan 6, 2025

I think this issue is caused by xeniumranger import-segmentation and not by spatialdata. When I run Xenium Ranger with a custom segmentation (Baysor or Cellpose), even if I make sure all my segmented cells can be cast to Polygon and have more than 4 vertices, Xenium Ranger will create some cells with fewer than 4 vertices. I haven't figured out yet what the problem could be!

@Lem-P
Author

Lem-P commented Feb 25, 2025

Hi,
I am back on this project! I do indeed have polygons with fewer than 4 vertices in the cell_boundaries.parquet file, and I was able to filter them out of the cell boundaries file.
Should I then use the cell IDs from cells_to_remove to filter those IDs out of the 'transcripts.parquet' file? What about the 'cells.parquet' and 'nucleus_boundaries.parquet' files?
Also, how do I modify 'cell_feature_matrix.h5'? (Sorry for the noob question.)

@Lem-P
Author

Lem-P commented Feb 25, 2025

In line with @ConstensouxAlexis's last comment: since it seems to be Xenium Ranger that introduces these problematic polygons, is there an (easy) way to generate the input files for spatialdata-io without using Xenium Ranger?

@ConstensouxAlexis

I think it's fine to just remove the problematic polygons; this will only impact the visualization and not the analysis.

@Lem-P
Author

Lem-P commented Feb 25, 2025

Thank you, removing the problematic polygons only in the cell_boundaries.parquet file allowed me to create a SpatialData object.
I was afraid that a discrepancy in the cell IDs between files would cause issues.
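
For anyone with the same worry, the mismatch between files can be listed explicitly after filtering. This is only a sketch on toy tables: it assumes both tables carry a `cell_id` column, and on real data the two frames would presumably come from `pd.read_parquet` on 'cells.parquet' and the filtered boundaries file.

```python
import pandas as pd

# Toy stand-ins for the cell table and the filtered boundaries table.
# The cell_id column name is assumed for illustration.
cells = pd.DataFrame({"cell_id": ["a", "b", "c"]})
filtered_boundaries = pd.DataFrame({"cell_id": ["a"] * 4 + ["c"] * 4})

ids_in_cells = set(cells["cell_id"])
ids_with_polygon = set(filtered_boundaries["cell_id"])

# Cells that still have table entries but no polygon any more.
missing_polygon = sorted(ids_in_cells - ids_with_polygon)
# Polygons without a matching cell entry (expected to be empty).
unexpected_polygon = sorted(ids_with_polygon - ids_in_cells)
```

Here `missing_polygon` holds the cells that keep their expression data but can no longer be drawn, which is the situation described in this thread.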

timtreis linked a pull request on Feb 27, 2025 that will close this issue
@timtreis
Member

Hey @Lem-P and @ConstensouxAlexis, I linked a PR to this issue which should fix the issue. Could you verify with your data?

@LucaMarconato I'm filtering out the polygons and removing the IDs from the table. Am I missing something?

@ConstensouxAlexis

Hello @timtreis, thank you for the PR, I will verify with my data. Just to let you know, this polygon issue is caused by the Xenium Ranger software, and they are currently working on it: kharchenkolab/Baysor#153 (comment).

Also, the issue with those polygons only affects visualization; the associated expression vector in the table is valid, so maybe you shouldn't remove it from the table? I am not sure what the best fix would be.

@timtreis
Member

timtreis commented Mar 4, 2025

Hm, thanks for the extra info! I think given that we are seeing datasets with this issue in the wild now, we know they exist and will cause errors for some users who processed their data with the faulty model. I think my gut feeling would be to leave the error handling and filtering in, but to modify the warning telling the users to reprocess their data with one of the newer versions 🤔
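
The "filter, but tell the user" behaviour described here could look roughly like the sketch below. This is not the code from the linked PR. The function name `split_valid_boundaries`, the `min_rows` parameter, the column names and the warning text are all assumptions made for illustration.

```python
import warnings

import pandas as pd


def split_valid_boundaries(boundaries: pd.DataFrame, min_rows: int = 4):
    """Hypothetical helper: drop cells with too few boundary rows and warn."""
    # Per-row count of how many vertex rows the row's cell has.
    counts = boundaries.groupby("cell_id")["cell_id"].transform("size")
    keep = counts >= min_rows
    dropped_ids = sorted(boundaries.loc[~keep, "cell_id"].unique())
    if dropped_ids:
        warnings.warn(
            f"Dropped {len(dropped_ids)} cell(s) with fewer than {min_rows} "
            f"boundary rows: {dropped_ids}. Consider reprocessing the data "
            "with a newer Xenium Ranger release.",
            UserWarning,
            stacklevel=2,
        )
    return boundaries[keep], dropped_ids


toy = pd.DataFrame(
    {
        "cell_id": ["a"] * 4 + ["b"] * 2,
        "vertex_x": [0, 1, 0, 0, 5, 5],
        "vertex_y": [0, 0, 1, 0, 5, 5],
    }
)

# Capture the warning so the example can show that one is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    valid, dropped = split_valid_boundaries(toy)
```

Returning the dropped IDs alongside the filtered table leaves the caller free to decide whether the matching table rows should be kept, which is the open question raised above.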

@LoganAMorrison

We released a new version of XeniumRanger (3.1.1) that fixes the number of polygon vertices error

@timtreis
Member

Great to hear @LoganAMorrison! However, I think we should still merge that PR, since there might be datasets out there generated with the faulty versions. The current implementation tells users what has been filtered, so it's also not something sneaky happening in the background. Wdyt @LucaMarconato?
