-
Notifications
You must be signed in to change notification settings - Fork 72
[SYNPY-1623] Factory function for get and creation of link entity #1243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
@dataclass | ||
class FileOptions: | ||
""" | ||
Configuration options specific to File entities when using the factory methods. | ||
This dataclass allows you to customize how File entities are handled during | ||
retrieval, including download behavior, file location, and collision handling. | ||
Attributes: | ||
download_file: Whether to automatically download the file content when | ||
retrieving the File entity. If True, the file will be downloaded to | ||
the local filesystem. If False, only the metadata will be retrieved. | ||
Default is True. | ||
download_location: The local directory path where the file should be | ||
downloaded. If None, the file will be downloaded to the default Synapse | ||
cache location. If specified, | ||
must be a valid directory path. Default is None. | ||
if_collision: Strategy to use when a file with the same name already | ||
exists at the download location. Valid options are: | ||
- "keep.both": Keep both files by appending a number to the new file | ||
- "overwrite.local": Overwrite the existing local file | ||
- "keep.local": Keep the existing local file and skip download | ||
Default is "keep.both". | ||
Example: | ||
Configure file download options: | ||
```python | ||
from synapseclient.models import FileOptions | ||
# Download file to specific location with overwrite | ||
file_options = FileOptions( | ||
download_file=True, | ||
download_location="/path/to/downloads/", | ||
if_collision="overwrite.local" | ||
) | ||
# Only retrieve metadata, don't download file content | ||
metadata_only = FileOptions(download_file=False) | ||
``` | ||
""" | ||
|
||
download_file: bool = True | ||
download_location: Optional[str] = None | ||
if_collision: str = "keep.both" | ||
|
||
|
||
@dataclass | ||
class ActivityOptions: | ||
""" | ||
Configuration options for entities that support activity/provenance tracking. | ||
This dataclass controls whether activity information (provenance data) should | ||
be included when retrieving entities. Activity information tracks the computational | ||
steps, data sources, and relationships that led to the creation of an entity. | ||
Attributes: | ||
include_activity: Whether to include activity/provenance information when | ||
retrieving the entity. If True, the returned entity will have its | ||
activity attribute populated with provenance data (if available). | ||
If False, the activity attribute will be None. Including activity | ||
may result in additional API calls and slower retrieval times. | ||
Default is False. | ||
Example: | ||
Configure activity inclusion: | ||
```python | ||
from synapseclient.models import ActivityOptions | ||
# Include activity information | ||
with_activity = ActivityOptions(include_activity=True) | ||
# Skip activity information (faster retrieval) | ||
without_activity = ActivityOptions(include_activity=False) | ||
``` | ||
Note: | ||
Activity information is only available for entities that support provenance | ||
tracking (File, Table, Dataset, etc...). For other entity | ||
types, this option is ignored. | ||
""" | ||
|
||
include_activity: bool = False | ||
|
||
|
||
@dataclass | ||
class TableOptions: | ||
""" | ||
Configuration options for table-like entities when using the factory methods. | ||
This dataclass controls how table-like entities (Table, Dataset, EntityView, | ||
MaterializedView, SubmissionView, VirtualTable, and DatasetCollection) are | ||
retrieved, particularly whether column schema information should be included. | ||
Attributes: | ||
include_columns: Whether to include column schema information when | ||
retrieving table-like entities. If True, the returned entity will | ||
have its columns attribute populated with Column objects containing | ||
schema information (name, column_type, etc.). If False, the columns | ||
attribute will be an empty dict. Including columns may result in | ||
additional API calls but provides complete table structure information. | ||
Default is True. | ||
Example: | ||
Configure table column inclusion: | ||
```python | ||
from synapseclient.models import TableOptions | ||
# Include column schema information | ||
with_columns = TableOptions(include_columns=True) | ||
# Skip column information (faster retrieval) | ||
without_columns = TableOptions(include_columns=False) | ||
``` | ||
""" | ||
|
||
include_columns: bool = True | ||
|
||
|
||
@dataclass | ||
class LinkOptions: | ||
""" | ||
Configuration options specific to Link entities when using the factory methods. | ||
This dataclass controls how Link entities are handled during retrieval, | ||
particularly whether the link should be followed to return the target entity | ||
or if the Link entity itself should be returned. | ||
Attributes: | ||
follow_link: Whether to follow the link and return the target entity | ||
instead of the Link entity itself. If True, the factory method will | ||
return the entity that the Link points to (e.g., if a Link points | ||
to a File, a File object will be returned). If False, the Link | ||
entity itself will be returned, allowing you to inspect the link's | ||
properties such as target_id, target_version, etc. Default is False. | ||
Example: | ||
Configure link following behavior: | ||
```python | ||
from synapseclient.models import LinkOptions | ||
# Follow the link and return the target entity | ||
follow_target = LinkOptions(follow_link=True) | ||
# Return the Link entity itself | ||
return_link = LinkOptions(follow_link=False) | ||
``` | ||
Note: | ||
- When follow_link=True, the returned entity type depends on what the | ||
Link points to (could be File, Project, Folder, etc.) | ||
- When follow_link=False, a Link entity is always returned | ||
""" | ||
|
||
follow_link: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage of these options is up for debate if this is how we want to implement them. The thought behind this is that if we add more options that fall under these buckets it won't change the signature of the generic get
function, it remains relatively stable.
Additionally, it helps to group "like" functionality like the File settings you can change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to consider is that these options seem specific to the get
factory operation. We may have different options for storing and etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point that in it's current implementation these are for the get
operation only, what we could do is reuse these classes during the store/delete
operation as well, but note in the docstrings of the Options class what will be used during get/store/delete
, that way it's consistent how we are defining the behavior of the operation.
Something to note here if we look at the new File class for example.
When you store the file all the arguments that you would use in the Synapse class store are actually attributes on the File class instance:
# To be deprecated store method:
def store(
self,
obj,
*,
createOrUpdate=True,
forceVersion=True,
versionLabel=None,
isRestricted=False,
activity=None,
used=None,
executed=None,
activityName=None,
activityDescription=None,
set_annotations=True,
):
Related attributes on the File class:
activity: Optional[Activity] = field(default=None, compare=False)
"""The Activity model represents the main record of Provenance in Synapse. It is
analygous to the Activity defined in the
[W3C Specification](https://www.w3.org/TR/prov-n/) on Provenance. Activity cannot
be removed during a store operation by setting it to None. You must use:
[synapseclient.models.Activity.delete_async][] or
[synapseclient.models.Activity.disassociate_from_entity_async][].
"""
annotations: Optional[
Dict[
str,
Union[
List[str],
List[bool],
List[float],
List[int],
List[date],
List[datetime],
],
]
] = field(default_factory=dict, compare=False)
"""Additional metadata associated with the folder. The key is the name of your
desired annotations. The value is an object containing a list of values
(use empty list to represent no values for key) and the value type associated with
all values in the list. To remove all annotations set this to an empty dict `{}`."""
create_or_update: bool = field(default=True, repr=False, compare=False)
"""
(Store only)
Indicates whether the method should automatically perform an update if the file
conflicts with an existing Synapse object.
"""
force_version: bool = field(default=True, repr=False, compare=False)
"""
(Store only)
Indicates whether the method should increment the version of the object if something
within the entity has changed. For example updating the description or name.
You may set this to False and an update to the entity will not increment the
version.
Updating the `version_label` attribute will also cause a version update regardless
of this flag.
An update to the MD5 of the file will force a version update regardless of this
flag.
"""
is_restricted: bool = field(default=False, repr=False)
"""
(Store only)
If set to true, an email will be sent to the Synapse access control team to start
the process of adding terms-of-use or review board approval for this entity.
You will be contacted with regards to the specific data being restricted and the
requirements of access.
This may be used only by an administrator of the specified file.
"""
So rather than these be arguments of the new store
factory operation, it's a part of the data model of the class that you pass into the function. Depending on if you fill that data in will it determine the behavior.
from synapseclient import Synapse | ||
|
||
client = Synapse.get_client(synapse_client=synapse_client) | ||
client.logger.warning( | ||
"Unknown entity type: %s. Falling back to returning %s as a dictionary bundle matching " | ||
"https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/entitybundle/v2/EntityBundle.html", | ||
entity_type, | ||
synapse_id, | ||
) | ||
|
||
# This allows the function to handle new entity types that may be added in the future | ||
if version_number is not None: | ||
return await get_entity_id_version_bundle2( | ||
entity_id=synapse_id, | ||
version=version_number, | ||
synapse_client=synapse_client, | ||
) | ||
else: | ||
return await get_entity_id_bundle2( | ||
entity_id=synapse_id, synapse_client=synapse_client | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fallback behavior so when more entity types get added to Synapse this function will still return something that folks can use.
activity_options: Optional[ActivityOptions] = None, | ||
file_options: Optional[FileOptions] = None, | ||
table_options: Optional[TableOptions] = None, | ||
link_options: Optional[LinkOptions] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what these option arguments look like. In the case of FileOptions you'd create an instance of it, change what you want to change and then pass it into the get/get_async
function call.
entity = await loop.run_in_executor( | ||
None, | ||
lambda: Synapse.get_client(synapse_client=synapse_client).store( | ||
obj=synapse_folder, | ||
set_annotations=False, | ||
isRestricted=self.is_restricted, | ||
createOrUpdate=False, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This run_in_executor code is stuff I am trying to get rid it since it was the original hack used to wrap the original Synapse client.
synapseclient/models/link.py
Outdated
|
||
@dataclass() | ||
@async_to_sync | ||
class Link: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For `csv`, you can specify: | ||
- `quote_character`: Character used for quoting fields. Default is double quote ("). | ||
- `escape_character`: Character used for escaping special characters. Default is backslash (\). | ||
- `escape_character`: Character used for escaping special characters. Default is backslash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was seeing a warning about this since it's a special escape character, so I just removed it for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a factory function for getting and creating entities in the Synapse Python client, with the addition of the Link entity as the last entity type to be supported. The factory function provides a unified interface for retrieving any Synapse entity by ID or name without knowing the specific entity type beforehand.
- Factory operations for generic entity retrieval with
get()
andget_async()
functions - Addition of Link entity support with creation, storage, and retrieval capabilities
- Cleanup of the
wrap_async_to_sync
function signature by removing the redundantsyn
parameter
Reviewed Changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
tests/integration/synapseclient/models/synchronous/test_factory_operations.py | Comprehensive integration tests for the synchronous factory operations across all entity types |
tests/integration/synapseclient/models/async/test_factory_operations.py | Comprehensive integration tests for the asynchronous factory operations across all entity types |
synapseclient/models/factory_operations.py | Implementation of the factory functions and options classes for entity retrieval |
synapseclient/models/link.py | New Link entity model with full CRUD operations |
synapseclient/models/init.py | Exports for factory operations and Link entity |
Various other files | Updates to support Link entity and remove syn parameter from wrap_async_to_sync |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
```python | ||
from synapseclient import Synapse | ||
from synapseclient.models import get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For discussion: I had always thought that models should contain the classes and the "services" and/or factory methods could live elsewhere. That said, I'm not opposed to this, just wanted to be mindful of the "models" module growing a lot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent point. I like the idea of splitting this out into a different place, right now we have:
synapseclient/api
synapseclient/models
that new code is actively being developed in.
We have the opportunity here to use:
synapseclient/services
(Already exists, and just contains the oldjson_schema.py
module)synapseclient/operations
synapseclient/factory
@Sage-Bionetworks/synapse-python-developers It would be nice to get your thoughts here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be another team discussion - I added to the agenda
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this to synapseclient/operations
file_options = FileOptions( | ||
download_file=True, | ||
download_location=temp_dir, | ||
if_collision="overwrite.local", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for discussion. I like this and what was your rationale in having these as one class, instead of parameters in get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is that all of the related options are all "together" in a single spot that folks can see "These are the flags I can specify that are specific to getting a file", rather than it being a list in the get factory function.
What this will mean is that we can add more options in the future to this Options class as well without overloading the argument section of the get function itself.
Column(name="id", column_type=ColumnType.ENTITYID), | ||
Column(name="name", column_type=ColumnType.STRING, maximum_size=256), | ||
] | ||
entity_view = EntityView( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there is a need to test the get
of a entity view with different snapshot versions?
…ttribute existence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My vote would be to default to follow_link=True
and to default to downloading so that naive users can access the original files without needing to perform extra steps and developers can set follow_link=False
if they need to work on the link
itself and make updates
- Introduced LinkSynchronousProtocol to define synchronous interface for Link operations. - Updated Link class to implement LinkSynchronousProtocol, adding get and store methods with detailed docstrings. - Modified get_async method in Link class to include file_options parameter for enhanced file retrieval options. - Updated storable_entity_components to include DatasetCollection and MaterializedView in type hints. - Added comprehensive tests for Link retrieval, ensuring default behavior follows links and supports custom file options.
…e can logically group items differently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your hard work, Bryan! I tested getting and storing links, and they both worked for me overall. The only issue I ran into was that if a link has already been created, the step of running merge_dataclass_entities
throws an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me overall! Just two small comments.
View the API docs here:
https://synapsepythonclient--1243.org.readthedocs.build/en/1243/reference/experimental/sync/factory_operations/
https://synapsepythonclient--1243.org.readthedocs.build/en/1243/reference/experimental/sync/link_entity/
async:
https://synapsepythonclient--1243.org.readthedocs.build/en/1243/reference/experimental/async/factory_operations/
https://synapsepythonclient--1243.org.readthedocs.build/en/1243/reference/experimental/async/link_entity/
Problem:
Solution:
Testing: