Skip to content

Conversation

danlu1
Copy link
Contributor

@danlu1 danlu1 commented Oct 3, 2025

Problem:

At present, delete_rows accepts only a table query to provide ROW_ID and ROW_VERSION values for deletion. This will be extended to also accept a dataframe, replicating the deprecated function’s ability to specify rows through a row_id_vers_list.

Solution:

  1. Add a new param to delete_rows so rows can be deleted with a filtered dataframe.
  2. Tutorial has been updated to reflect this change.

Testing:

Unit test and integration test are added and they all passed.

@danlu1 danlu1 requested a review from a team as a code owner October 3, 2025 00:48
Copy link
Contributor

@jaymedina jaymedina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! 🔥

I did an initial pass-through and was wondering: What happens when a user feed a dataframe with row_ids/versions/etags that don't exist in that table? what does the API spit back and is that error-handling enough?

@danlu1 danlu1 changed the title [GEN-1667] [GEN-1667] Delete Table Rows Using Filtered DataFrame Oct 3, 2025
@danlu1
Copy link
Contributor Author

danlu1 commented Oct 3, 2025

@jaymedina Thanks for the feedback, I will address them accordingly.

"The dataframe must contain the 'ROW_ID' and 'ROW_VERSION' columns."
)

if self.__class__.__name__ in CLASSES_THAT_CONTAIN_ROW_ETAG:
Copy link
Contributor Author

@danlu1 danlu1 Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanFauble When I add test case for the below ValueError, I got the AttributeError: 'Dataset' object has no attribute 'delete_rows_async' for my test case of Dataset. As this line, I believe we originally intent to use delete_row in the Dataset, EntityView, DatasetCollection, and SubmissionView but I didn't find the function in any of the models. Have we not had the chance to add this function yet?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these other items are special @danlu1 . You can't actually delete rows of those Entity types, the reason is because there is a separate items attribute on the entity itself that dictates what should be present on the table:

https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/Dataset.html

If you notice that:

class Dataset(
    DatasetSynchronousProtocol,
    AccessControllable,
    ViewBase,
    ViewStoreMixin,
    DeleteMixin,
    ColumnMixin,
    GetMixin,
    QueryMixin,
    ViewUpdateMixin,
    ViewSnapshotMixin,
):

Doesn't contain the TableDeleteRowMixin which gives this ability to the class

@BryanFauble BryanFauble self-requested a review October 6, 2025 15:57
elif df is not None:
rows_to_delete = df
client.logger.info(
f"Found {len(rows_to_delete)} rows to delete for given dataframe."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Found {len(rows_to_delete)} rows to delete for given dataframe."
f"Received {len(rows_to_delete)} rows to delete for given dataframe."

Comment on lines +4157 to +4163
if (
"ROW_ID" not in rows_to_delete.columns
or "ROW_VERSION" not in rows_to_delete.columns
):
raise ValueError(
"The dataframe must contain the 'ROW_ID' and 'ROW_VERSION' columns."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be put before the log message is printed so the validation occurs first?

* Deleting specific rows - Query for the rows you want to delete and call syn.delete on the results

```python
table.delete_rows(query=f"SELECT * FROM {table.id} WHERE Strand = '+'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines up there is a part of this markdown that needs to be updated: Deleting specific rows - Query for the rows you want to delete and call syn.delete on the results

Could you update this please?

Copy link
Member

@BryanFauble BryanFauble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major complaints from me. Thanks for making these changes!

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 awesome work and great review. I'll defer to Bryan for final review once this is done.

Copy link
Contributor

@linglp linglp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danlu1 ! I tested the examples, and they worked for the most part. However, when I tried deleting a row with wrong version numbers, I expected an error but the request actually went through. Here's an example:

async def main():
    results = await query_async(query="SELECT * FROM syn63301937")
    print(results)

Here I saw:

Downloading files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 576/576 [00:00<00:00, 1.38kB/s, syn63301937]
   ROW_ID  ROW_VERSION     Component                                           Filename                                    Id     entityId
0      12            6  MockFilename    schematic - main/MockFilenameComponent/txt3.txt  6b27d97b-0344-4499-8f08-8535ad72b802  syn61682653
1      13            6  MockFilename  schematic - main/MockFilenameComponent/this_fi...  7185f1ee-00bb-42d4-9051-1b61166fde0f  syn61682653
2      14            6  MockFilename    schematic - main/MockFilenameComponent/txt4.txt  9ce08bef-35a0-40d5-8b06-96c6071b1609   syn6168265
3      15            6  MockFilename    schematic - main/MockFilenameComponent/txt6.txt  07973efb-7d36-475a-b29f-0aa470c722b5          NaN

As you can see, the row version number is 6. But when I did:

df = pd.DataFrame({"ROW_ID": [10, 11], "ROW_VERSION": [100, 100]})

async def main():
    await Table(id="syn63301937").delete_rows_async(df=df)
asyncio.run(main())

I expected the request to fail because row version 100 doesn't exist. But the request appeared to go through:

(synapsePythonClient) lpeng@Mac synapsePythonClient % python3 /Users/lpeng/code/synapsePythonClient/test.py
Welcome, Lingling Peng! You are using the 'default' profile.
Found 2 rows to delete for given dataframe.
Uploading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 33.0/33.0 [00:00<00:00, 68.8B/s, syn63301937_upload_636feaba-4b89-4d20-961c-034866078060.csv]
/entity/syn63301937/table/transaction/async: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.00/1.00 [00:01<00:00, 1.21s/it]

df = pd.DataFrame({"ROW_ID": [1, 2], "ROW_VERSION": [1, 1]})
async def main():
await Table(id="syn1234").delete_rows_async(df)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danlu1 I got an error when running the example, here it should be:

df = pd.DataFrame({"ROW_ID": [1, 2], "ROW_VERSION": [1, 1]})
            async def main():
                await Table(id="syn1234").delete_rows_async(df=df)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Yes, the param should be specified. I will update.

@danlu1
Copy link
Contributor Author

danlu1 commented Oct 7, 2025

@linglp Thanks for testing the code. Can you do me a favor and confirm the actual row with wrong row_version were deleted? The "Found 2 rows to delete for given dataframe." printed before the row deletion so I'd like to learn the behavior of the code. Appreciate your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants