Skip to content

allow filtering by ids#627

Merged
LucaMarconato merged 14 commits intoscverse:mainfrom
melonora:filter
Feb 23, 2025
Merged

allow filtering by ids#627
LucaMarconato merged 14 commits intoscverse:mainfrom
melonora:filter

Conversation

@melonora
Copy link
Collaborator

@melonora melonora commented Jul 9, 2024

closes #556

This PR allows for filtering elements (points, shapes, tables) on instance IDs. For labels I decided just as with the joins to not support it as it is potentially an expensive operation, where I am not certain what the usecase would be (one can adjust the color to a background color in any case for particular labels or even set it transparent).

The PR is related to #626, however I think it is better to separate out filtering on instance IDs. The function here can ultimately still be used in his function.

The PR will not cause an error in case of filtering on instance IDs that are not there. Furthermore it is not a problem to filter on instances in a table that is annotating more than 1 element. If instances for both annotated elements need to be filtered in the table this will be retrieved. If region_names is defined, a view of the table with rows annotating elements in region_names is first created before filtering on instance IDs.

After filtering of the table, the metadata for annotation of regions is checked and updated after which the table is validated.

Changelog

  • added new API match_sdata_to_table() which allows for complex subsetting of SpatialData objects
  • changed SpatialData.get_annotated_regions() to compute on-the-fly the regions annotated by a table, instead of returning `table.uns['spatialdata_attrs']['region']
  • added overwrite_metadata parameter to TableModel.parse() to conveniently replace region, region_key, instance_key metadata.

Update Luca (as shown with the code in the tests):
closes #280
closes #284

@codecov
Copy link

codecov bot commented Jul 9, 2024

Codecov Report

Attention: Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.

Project coverage is 92.11%. Comparing base (5e2b1a4) to head (f521f1f).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/models/models.py 90.90% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #627      +/-   ##
==========================================
+ Coverage   92.00%   92.11%   +0.10%     
==========================================
  Files          48       48              
  Lines        7408     7429      +21     
==========================================
+ Hits         6816     6843      +27     
+ Misses        592      586       -6     
Files with missing lines Coverage Δ
src/spatialdata/__init__.py 96.42% <ø> (ø)
src/spatialdata/_core/query/relational_query.py 91.14% <100.00%> (+0.15%) ⬆️
src/spatialdata/_core/spatialdata.py 91.46% <100.00%> (+0.20%) ⬆️
src/spatialdata/models/models.py 88.34% <90.90%> (+1.09%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This was referenced Jul 9, 2024
@LucaMarconato LucaMarconato marked this pull request as draft July 12, 2024 17:52
@LucaMarconato
Copy link
Member

LucaMarconato commented Jul 12, 2024

@melonora converting to draft, what's missing for the moment, just

  • the tests?

@LucaMarconato
Copy link
Member

LucaMarconato commented Jul 12, 2024

  • ah also the function should be private so it can be used internally by a single "entry point" filter(). made the function public and removed filter().

@LucaMarconato
Copy link
Member

Hi, I reviewed the PR. First please check my comment in this linked PR: #626 (comment).

In this PR, I refined the function described in the comment linked above, into a new public API match_sdata_to_table(). As shown in the tests, all the use cases of the linked PR are supported. The use cases of filtering by instance_ids from points/shapes are supported by the functions match_table_to_element() (or by the join operations).

Since the purpose of the current PR was to add a private function that could support the filter() method of the linked PR, and since now the new public function allows for this. I removed the filter() method.

This is supported also by the following reasons:

  • the filter() function was supposed to be private, but now we do not need such private function
  • the method introduces code redundancy that can be avoided by using the join APIs (this is how I implemented match_sdata_to_table()

Finally, since two comments that I had from the review of the filter() method (even if not relevant anymore if the method is removed).

  • the points were not filtered by the index, but by the instance_key column, whose role is to match the transcript to a cell index to which the transcript below to. I had fixed this in 122aa5f (#627). Note that the semantic of instance_key column for points is not well defined as reported here Bug with INSTANCE_KEY for points? #217.
  • the old API (with the parameter element_names) did not allow to specify particular instances for particular elements. For instance given two segmentation masks (shapes) "shapes0" and "shapes1" annotated by a table. It was not possible to select the cells [0, 1] for "shapes0", but only the cell [0] for "shapes1". The new implementation allows for these stratified types of filters.

@LucaMarconato LucaMarconato marked this pull request as ready for review February 2, 2025 16:52
@LucaMarconato
Copy link
Member

LucaMarconato commented Feb 2, 2025

Ready for review! I tag both @melonora and @aeisenbarth please.

In this PR I also:

  • improved the TableModel.parse() function.
    • Some lines were never reached from the tests, as codecov shows. I fixed this
    • I added a new parameter overwrite_metadata: bool that can be used to conveniently replace region, region_key, instance_key. I use this match_sdata_to_table().
  • I rewrote the method SpatialData.get_annotated_regions(), which I use in match_sdata_to_table(), to compute the regions that are actually annotated by the table instead of simply returning the region metadata. This is a first step in the direction of dropping region, as explained here Drop region (region_key and instance_key are enough) #629.

@LucaMarconato LucaMarconato enabled auto-merge (squash) February 23, 2025 23:04
@LucaMarconato LucaMarconato merged commit 38e73ba into scverse:main Feb 23, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Subset spatialdata by list of cell ids Feature request: spatial cropping from select table rows Filter spatialData

2 participants