Skip to content

Conversation

BryanFauble
Copy link
Member

@BryanFauble BryanFauble commented Sep 10, 2025

View the API docs here:

https://synapsepythonclient--1243.org.readthedocs.build/en/1243/reference/experimental/sync/factory_operations/
https://synapsepythonclient--1243.org.readthedocs.build/en/1243/reference/experimental/sync/link_entity/

async:

https://synapsepythonclient--1243.org.readthedocs.build/en/1243/reference/experimental/async/factory_operations/
https://synapsepythonclient--1243.org.readthedocs.build/en/1243/reference/experimental/async/link_entity/

Problem:

  • In order to use the current OOP object models you must know ahead of time what the class is for the Synapse ID that you have is. Sometimes, but not always, is this possible.
  • A generic get factory method will better support R.
  • The link entity was the last synapse ID type entity to be supported, and had some implications in the generic get function.

Solution:

  • Create a get/get_async function that serves as the entry point for searching for an entity by its id, or name+parent and return the correct class.
  • If the class is not yet implemented in the client it will return back an entity bundle
  • The link entity was created and with some get/store functionality as well as following the link to the target entity

Testing:

  • I did a bunch of manual testing to verify the logic would get entities I have stored in a test project across a variety of types
  • Integration tests to verify the logic across the 11 different entities fully supported.

@BryanFauble BryanFauble requested a review from a team as a code owner September 10, 2025 22:32
Comment on lines 26 to 183
@dataclass
class FileOptions:
"""
Configuration options specific to File entities when using the factory methods.
This dataclass allows you to customize how File entities are handled during
retrieval, including download behavior, file location, and collision handling.
Attributes:
download_file: Whether to automatically download the file content when
retrieving the File entity. If True, the file will be downloaded to
the local filesystem. If False, only the metadata will be retrieved.
Default is True.
download_location: The local directory path where the file should be
downloaded. If None, the file will be downloaded to the default Synapse
cache location. If specified,
must be a valid directory path. Default is None.
if_collision: Strategy to use when a file with the same name already
exists at the download location. Valid options are:
- "keep.both": Keep both files by appending a number to the new file
- "overwrite.local": Overwrite the existing local file
- "keep.local": Keep the existing local file and skip download
Default is "keep.both".
Example:
Configure file download options:
```python
from synapseclient.models import FileOptions
# Download file to specific location with overwrite
file_options = FileOptions(
download_file=True,
download_location="/path/to/downloads/",
if_collision="overwrite.local"
)
# Only retrieve metadata, don't download file content
metadata_only = FileOptions(download_file=False)
```
"""

download_file: bool = True
download_location: Optional[str] = None
if_collision: str = "keep.both"


@dataclass
class ActivityOptions:
"""
Configuration options for entities that support activity/provenance tracking.
This dataclass controls whether activity information (provenance data) should
be included when retrieving entities. Activity information tracks the computational
steps, data sources, and relationships that led to the creation of an entity.
Attributes:
include_activity: Whether to include activity/provenance information when
retrieving the entity. If True, the returned entity will have its
activity attribute populated with provenance data (if available).
If False, the activity attribute will be None. Including activity
may result in additional API calls and slower retrieval times.
Default is False.
Example:
Configure activity inclusion:
```python
from synapseclient.models import ActivityOptions
# Include activity information
with_activity = ActivityOptions(include_activity=True)
# Skip activity information (faster retrieval)
without_activity = ActivityOptions(include_activity=False)
```
Note:
Activity information is only available for entities that support provenance
tracking (File, Table, Dataset, etc...). For other entity
types, this option is ignored.
"""

include_activity: bool = False


@dataclass
class TableOptions:
"""
Configuration options for table-like entities when using the factory methods.
This dataclass controls how table-like entities (Table, Dataset, EntityView,
MaterializedView, SubmissionView, VirtualTable, and DatasetCollection) are
retrieved, particularly whether column schema information should be included.
Attributes:
include_columns: Whether to include column schema information when
retrieving table-like entities. If True, the returned entity will
have its columns attribute populated with Column objects containing
schema information (name, column_type, etc.). If False, the columns
attribute will be an empty dict. Including columns may result in
additional API calls but provides complete table structure information.
Default is True.
Example:
Configure table column inclusion:
```python
from synapseclient.models import TableOptions
# Include column schema information
with_columns = TableOptions(include_columns=True)
# Skip column information (faster retrieval)
without_columns = TableOptions(include_columns=False)
```
"""

include_columns: bool = True


@dataclass
class LinkOptions:
"""
Configuration options specific to Link entities when using the factory methods.
This dataclass controls how Link entities are handled during retrieval,
particularly whether the link should be followed to return the target entity
or if the Link entity itself should be returned.
Attributes:
follow_link: Whether to follow the link and return the target entity
instead of the Link entity itself. If True, the factory method will
return the entity that the Link points to (e.g., if a Link points
to a File, a File object will be returned). If False, the Link
entity itself will be returned, allowing you to inspect the link's
properties such as target_id, target_version, etc. Default is False.
Example:
Configure link following behavior:
```python
from synapseclient.models import LinkOptions
# Follow the link and return the target entity
follow_target = LinkOptions(follow_link=True)
# Return the Link entity itself
return_link = LinkOptions(follow_link=False)
```
Note:
- When follow_link=True, the returned entity type depends on what the
Link points to (could be File, Project, Folder, etc.)
- When follow_link=False, a Link entity is always returned
"""

follow_link: bool = False
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of these options is up for debate if this is how we want to implement them. The thought behind this is that if we add more options that fall under these buckets it won't change the signature of the generic get function, it remains relatively stable.

Additionally, it helps to group "like" functionality like the File settings you can change.

Copy link
Member

@thomasyu888 thomasyu888 Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to consider is that these options seem specific to the get factory operation. We may have different options for storing and etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point that in it's current implementation these are for the get operation only, what we could do is reuse these classes during the store/delete operation as well, but note in the docstrings of the Options class what will be used during get/store/delete, that way it's consistent how we are defining the behavior of the operation.

Something to note here if we look at the new File class for example.

When you store the file all the arguments that you would use in the Synapse class store are actually attributes on the File class instance:

    # To be deprecated store method:
    def store(
        self,
        obj,
        *,
        createOrUpdate=True,
        forceVersion=True,
        versionLabel=None,
        isRestricted=False,
        activity=None,
        used=None,
        executed=None,
        activityName=None,
        activityDescription=None,
        set_annotations=True,
    ):

Related attributes on the File class:

    activity: Optional[Activity] = field(default=None, compare=False)
    """The Activity model represents the main record of Provenance in Synapse.  It is
    analygous to the Activity defined in the
    [W3C Specification](https://www.w3.org/TR/prov-n/) on Provenance. Activity cannot
    be removed during a store operation by setting it to None. You must use:
    [synapseclient.models.Activity.delete_async][] or
    [synapseclient.models.Activity.disassociate_from_entity_async][].
    """

    annotations: Optional[
        Dict[
            str,
            Union[
                List[str],
                List[bool],
                List[float],
                List[int],
                List[date],
                List[datetime],
            ],
        ]
    ] = field(default_factory=dict, compare=False)
    """Additional metadata associated with the folder. The key is the name of your
    desired annotations. The value is an object containing a list of values
    (use empty list to represent no values for key) and the value type associated with
    all values in the list. To remove all annotations set this to an empty dict `{}`."""

    create_or_update: bool = field(default=True, repr=False, compare=False)
    """
    (Store only)

    Indicates whether the method should automatically perform an update if the file
    conflicts with an existing Synapse object.
    """

    force_version: bool = field(default=True, repr=False, compare=False)
    """
    (Store only)

    Indicates whether the method should increment the version of the object if something
    within the entity has changed. For example updating the description or name.
    You may set this to False and an update to the entity will not increment the
    version.

    Updating the `version_label` attribute will also cause a version update regardless
    of this flag.

    An update to the MD5 of the file will force a version update regardless of this
    flag.
    """

    is_restricted: bool = field(default=False, repr=False)
    """
    (Store only)

    If set to true, an email will be sent to the Synapse access control team to start
    the process of adding terms-of-use or review board approval for this entity.
    You will be contacted with regards to the specific data being restricted and the
    requirements of access.

    This may be used only by an administrator of the specified file.
    """

So rather than these be arguments of the new store factory operation, it's a part of the data model of the class that you pass into the function. Depending on if you fill that data in will it determine the behavior.

Comment on lines 1189 to 1209
from synapseclient import Synapse

client = Synapse.get_client(synapse_client=synapse_client)
client.logger.warning(
"Unknown entity type: %s. Falling back to returning %s as a dictionary bundle matching "
"https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/entitybundle/v2/EntityBundle.html",
entity_type,
synapse_id,
)

# This allows the function to handle new entity types that may be added in the future
if version_number is not None:
return await get_entity_id_version_bundle2(
entity_id=synapse_id,
version=version_number,
synapse_client=synapse_client,
)
else:
return await get_entity_id_bundle2(
entity_id=synapse_id, synapse_client=synapse_client
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fallback behavior so when more entity types get added to Synapse this function will still return something that folks can use.

Comment on lines 687 to 690
activity_options: Optional[ActivityOptions] = None,
file_options: Optional[FileOptions] = None,
table_options: Optional[TableOptions] = None,
link_options: Optional[LinkOptions] = None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what these option arguments look like. In the case of FileOptions you'd create an instance of it, change what you want to change and then pass it into the get/get_async function call.

Comment on lines -333 to -340
entity = await loop.run_in_executor(
None,
lambda: Synapse.get_client(synapse_client=synapse_client).store(
obj=synapse_folder,
set_annotations=False,
isRestricted=self.is_restricted,
createOrUpdate=False,
),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This run_in_executor code is stuff I am trying to get rid it since it was the original hack used to wrap the original Synapse client.


@dataclass()
@async_to_sync
class Link:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation of the Link entity. This is used to create a pointer to another Synapse entity. In the Web UI it looks like:

image

- For `csv`, you can specify:
- `quote_character`: Character used for quoting fields. Default is double quote (").
- `escape_character`: Character used for escaping special characters. Default is backslash (\).
- `escape_character`: Character used for escaping special characters. Default is backslash.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was seeing a warning about this since it's a special escape character, so I just removed it for now

@BryanFauble BryanFauble requested a review from Copilot September 10, 2025 22:46
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a factory function for getting and creating entities in the Synapse Python client, with the addition of the Link entity as the last entity type to be supported. The factory function provides a unified interface for retrieving any Synapse entity by ID or name without knowing the specific entity type beforehand.

  • Factory operations for generic entity retrieval with get() and get_async() functions
  • Addition of Link entity support with creation, storage, and retrieval capabilities
  • Cleanup of the wrap_async_to_sync function signature by removing the redundant syn parameter

Reviewed Changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/integration/synapseclient/models/synchronous/test_factory_operations.py Comprehensive integration tests for the synchronous factory operations across all entity types
tests/integration/synapseclient/models/async/test_factory_operations.py Comprehensive integration tests for the asynchronous factory operations across all entity types
synapseclient/models/factory_operations.py Implementation of the factory functions and options classes for entity retrieval
synapseclient/models/link.py New Link entity model with full CRUD operations
synapseclient/models/init.py Exports for factory operations and Link entity
Various other files Updates to support Link entity and remove syn parameter from wrap_async_to_sync

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

```python
from synapseclient import Synapse
from synapseclient.models import get
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For discussion: I had always thought that models should contain the classes and the "services" and/or factory methods could live elsewhere. That said, I'm not opposed to this, just wanted to be mindful of the "models" module growing a lot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point. I like the idea of splitting this out into a different place, right now we have:

  1. synapseclient/api
  2. synapseclient/models

that new code is actively being developed in.

We have the opportunity here to use:

  1. synapseclient/services (Already exists, and just contains the old json_schema.py module)
  2. synapseclient/operations
  3. synapseclient/factory

@Sage-Bionetworks/synapse-python-developers It would be nice to get your thoughts here

Copy link
Member

@thomasyu888 thomasyu888 Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be another team discussion - I added to the agenda

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to synapseclient/operations

@thomasyu888 thomasyu888 requested a review from a team September 10, 2025 23:35
Comment on lines 187 to 191
file_options = FileOptions(
download_file=True,
download_location=temp_dir,
if_collision="overwrite.local",
)
Copy link
Member

@thomasyu888 thomasyu888 Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for discussion. I like this and what was your rationale in having these as one class, instead of parameters in get?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is that all of the related options are all "together" in a single spot that folks can see "These are the flags I can specify that are specific to getting a file", rather than it being a list in the get factory function.

What this will mean is that we can add more options in the future to this Options class as well without overloading the argument section of the get function itself.

Column(name="id", column_type=ColumnType.ENTITYID),
Column(name="name", column_type=ColumnType.STRING, maximum_size=256),
]
entity_view = EntityView(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there is a need to test the get of a entity view with different snapshot versions?

Copy link
Contributor

@SageGJ SageGJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote would be to default to follow_link=True and to default to downloading so that naive users can access the original files without needing to perform extra steps and developers can set follow_link=False if they need to work on the link itself and make updates

- Introduced LinkSynchronousProtocol to define synchronous interface for Link operations.
- Updated Link class to implement LinkSynchronousProtocol, adding get and store methods with detailed docstrings.
- Modified get_async method in Link class to include file_options parameter for enhanced file retrieval options.
- Updated storable_entity_components to include DatasetCollection and MaterializedView in type hints.
- Added comprehensive tests for Link retrieval, ensuring default behavior follows links and supports custom file options.
@BryanFauble BryanFauble requested a review from a team September 19, 2025 17:50
Copy link
Contributor

@linglp linglp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your hard work, Bryan! I tested getting and storing links, and they both worked for me overall. The only issue I ran into was that if a link has already been created, the step of running merge_dataclass_entities throws an error.

@BryanFauble BryanFauble requested a review from linglp September 30, 2025 20:24
Copy link
Contributor

@linglp linglp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me overall! Just two small comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants