-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/gcf tree #75
Feature/gcf tree #75
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good so far, just some minor things.
Potential refactor idea in the future is to split figuring out the comparable region from calculating the scores, so that there is not a large amount of data being returned in the yield. the workflow is a bit complicated to do this but still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, but to optimize performance can you change this to send the numbers directly instead of the entire pair object? See inline comments.
And to keep the order consistent everywhere, let's do:
(
pair_a_id,
pair_b_id,
dist,
jaccard,
adjacency,
dss,
[lcs stuff],
[start and stops after extend],
[reverse],
[alignment mode]
)
distance, jaccard, adjacency, dss = results[idx] | ||
# for idx, pair in enumerate(task_batch): | ||
# distance, jaccard, adjacency, dss = results[idx] | ||
for pair, distance, jaccard, adjacency, dss in results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another branch I discovered this can be simplified:
for a, b, c, d in some_tuple:
yield (a, b, c, d)
is equivalent to
yield from some_tuple
@@ -251,13 +269,13 @@ def calculate_scores_pair( | |||
|
|||
for pair in pairs: | |||
if pair.region_a.parent_gbk == pair.region_b.parent_gbk: | |||
results.append((0.0, 1.0, 1.0, 1.0)) | |||
results.append((pair, 0.0, 1.0, 1.0, 1.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
results.append(0,0, 1.0, 1.0, 1.0, [lcs stuff], [comparable distance starts and stops], [reverse], [alignment mode])
big_scape/network/network.py
Outdated
@@ -96,7 +96,7 @@ def get_edge( | |||
if edge is None: | |||
return None | |||
|
|||
return cast(tuple[int, int, float, float, float, float, str], edge) | |||
return cast(tuple[int, int, float, float, float, float, str], edge[0:7]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're here, can you refactor this a bit to select columns in the select_statement on 86 instead of getting everything and returning a slice of the result. e.g.
distance_table = DB.metadata.tables["distance"]
select_columns = (
distance_table.c.region_a_id,
distance_table.c.region_b_id, # note the last comma; this is a tuple
)
select_statement = (
distance_table
.select(columns)
something like that. not essential if you want to skip it.
|
||
Returns: | ||
Written file containing protein domain alignment | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a sneaky TODO: refactor to this docstring? just to indicate that this is not our code 😎
Contains GCF alignment and tree building, as well as highlights of compared region (i.e. protocluster/core).