feat!: add assembly registry utilities #63
Merged
jsstevenson merged 4 commits into 0.2.0 on Oct 21, 2025
korikuzma requested changes on Oct 21, 2025
Diff context under review:
    HG19 = "hg19"
    ID_TO_REFERENCE_MAP = {
Member
Consider changing from a set to a list. I'm thinking of the case where we want to do a quick lookup without SeqRepo, where we know both the assembly and chromosome, so we could do something like ID_TO_REFERENCE_MAP["hg19"][11-1]
Member
Author
Yeah that seems ok. I was wondering about whether we'd need to add more IDs in the future, but now that I think about it, I'm not sure what those would be.
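The list-based lookup suggested in this thread could look roughly like the sketch below. It assumes the map is keyed by assembly name and holds sequence IDs ordered by chromosome number, which is only inferred from the `ID_TO_REFERENCE_MAP["hg19"][11-1]` example. The ID strings are hypothetical placeholders, and `refget_id_for_chromosome` is an illustrative helper, not a function from this PR.

```python
# Assumed shape: assembly name -> list of sequence IDs ordered by chromosome.
# The IDs are hypothetical placeholders, not real refget accessions.
ID_TO_REFERENCE_MAP = {
    "hg19": [f"SQ.hg19-chr{n}-placeholder" for n in range(1, 23)],
}


def refget_id_for_chromosome(assembly: str, chromosome: int) -> str:
    """Return the sequence ID for a 1-based chromosome number (hypothetical helper)."""
    ids = ID_TO_REFERENCE_MAP[assembly]
    # Guard explicitly: a bare ids[chromosome - 1] would silently wrap around
    # for chromosome 0 because Python accepts negative indices.
    if not 1 <= chromosome <= len(ids):
        raise ValueError(f"chromosome must be between 1 and {len(ids)}")
    return ids[chromosome - 1]
```

A set cannot support this because it has no defined order or indexing; a list keeps membership checks cheap at this size while also allowing positional access.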
jsstevenson added a commit that referenced this pull request on Oct 29, 2025
* Add get_assembly_from_refget_id(), which takes a "SQ.ABCDEFG"-type string and checks against a static mapping of known sequence IDs and their corresponding reference assemblies. In our other use cases, we use SeqRepo for this, but it seems silly to do so when there's a pretty small number of possible associations and they rarely (never?) change. This will enable applications like AnyVar to unit-test liftover protocols without needing to mock or stage a seqrepo interface.
* Tighten requirements on input. I think it's more complicated than it's worth to be flexible about whether something is a string or an enum. Previously we might've left this in to be compatible with pyliftover but I think we can afford to be stricter.
* update various docs/docstring type things

This breaks some interface elements. I'm merging it into a 0.2.0 branch because I think there are one or two other small changes I'd like to make before assessing the impact of a release on places like coolseqtool.
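As a rough illustration of the two behavioral changes described above, here is a minimal sketch of a static refget-ID lookup plus a strict enum check. Only the function name `get_assembly_from_refget_id` and the `HG19 = "hg19"` member come from this PR; the `Assembly` class name, the `HG38` member, the placeholder IDs, the `None` return for unknown IDs, and the `require_assembly` helper are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Assembly(str, Enum):
    """Reference assemblies (class name and HG38 member are assumptions)."""

    HG19 = "hg19"
    HG38 = "hg38"


# Hypothetical placeholder IDs; the real mapping would hold actual refget accessions.
_ASSEMBLY_TO_IDS = {
    Assembly.HG19: ["SQ.placeholder-hg19-chr1", "SQ.placeholder-hg19-chr2"],
    Assembly.HG38: ["SQ.placeholder-hg38-chr1", "SQ.placeholder-hg38-chr2"],
}

# Invert once at import time so each lookup is a single dict access.
_ID_TO_ASSEMBLY = {
    seq_id: assembly
    for assembly, seq_ids in _ASSEMBLY_TO_IDS.items()
    for seq_id in seq_ids
}


def get_assembly_from_refget_id(refget_id: str) -> Optional[Assembly]:
    """Return the assembly for a known sequence ID, or None (assumed behavior)."""
    return _ID_TO_ASSEMBLY.get(refget_id)


def require_assembly(value: object) -> Assembly:
    """Accept only enum members, rejecting plain strings (hypothetical helper)."""
    if not isinstance(value, Assembly):
        raise TypeError(f"expected an Assembly member, got {type(value).__name__}")
    return value
```

Because the mapping is a plain module-level dictionary, a unit test can call the lookup directly with no SeqRepo instance to mock or stage.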
close #62