feat!: add assembly registry utilities #63

Merged
jsstevenson merged 4 commits into 0.2.0 from feat/62-seq-ids
Oct 21, 2025

jsstevenson (Member) commented Oct 20, 2025

close #62

  • Add get_assembly_from_refget_id(), which takes a refget sequence ID string such as "SQ.ABCDEFG" and looks it up in a static mapping of known sequence IDs to their reference assemblies. In our other use cases we use SeqRepo for this, but that seems excessive when there are only a small number of possible associations and they rarely (never?) change. This will let applications like AnyVar unit-test liftover protocols without needing to mock or stage a SeqRepo interface.
  • Tighten input requirements. Accepting either a string or an enum for the same argument is more complicated than it's worth. We may previously have allowed this for compatibility with pyliftover, but I think we can afford to be stricter now.
  • Update various docs and docstrings.
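As a rough illustration of the first two bullets (not the code merged in this PR), the sketch below assumes a str-valued Assembly enum and a flat dict from refget ID to assembly. Real refget IDs are "SQ." followed by a base64url-encoded, truncated SHA-512 digest; the keys here are placeholders, and chain_file_name() is a hypothetical helper included only to show enum-only argument checking.

```python
from enum import Enum


class Assembly(str, Enum):
    """Reference assemblies (illustrative subset)."""

    HG19 = "hg19"
    HG38 = "hg38"


# Hypothetical static registry: refget ID -> assembly.
# The keys are placeholders, NOT real sequence digests.
KNOWN_REFGET_IDS: dict[str, Assembly] = {
    "SQ.placeholder_hg19_chr1": Assembly.HG19,
    "SQ.placeholder_hg38_chr1": Assembly.HG38,
}


def get_assembly_from_refget_id(refget_id: str) -> Assembly:
    """Look up the assembly for a refget ID without querying SeqRepo."""
    try:
        return KNOWN_REFGET_IDS[refget_id]
    except KeyError:
        raise ValueError(f"Unrecognized refget ID: {refget_id!r}") from None


def chain_file_name(source: Assembly, target: Assembly) -> str:
    """Hypothetical helper that accepts only Assembly members, not raw strings."""
    for arg in (source, target):
        if not isinstance(arg, Assembly):
            raise TypeError(f"Expected Assembly, got {type(arg).__name__}: {arg!r}")
    # UCSC-style chain file naming, e.g. "hg19ToHg38.over.chain.gz"
    return f"{source.value}To{target.value.capitalize()}.over.chain.gz"
```

With this shape, get_assembly_from_refget_id("SQ.placeholder_hg38_chr1") returns Assembly.HG38, while chain_file_name("hg19", Assembly.HG38) raises TypeError instead of silently coercing the string.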

This breaks parts of the existing interface. I'm merging it into a 0.2.0 branch because there are one or two other small changes I'd like to make before assessing how a release would affect downstream projects like coolseqtool.

jsstevenson added the priority:medium (Medium priority) label Oct 21, 2025
jsstevenson changed the base branch from main to 0.2.0 October 21, 2025 00:28
jsstevenson requested a review from korikuzma October 21, 2025 00:53
jsstevenson marked this pull request as ready for review October 21, 2025 00:54
HG19 = "hg19"


ID_TO_REFERENCE_MAP = {

korikuzma (Member):

Consider changing this from a set to a list. I'm thinking of the case where we want to do a quick lookup without SeqRepo, where we know both the assembly and the chromosome, so we could do something like ID_TO_REFERENCE_MAP["hg19"][11-1].

jsstevenson (Member, Author):

Yeah, that seems OK. I was wondering whether we'd need to add more IDs in the future, but now that I think about it, I'm not sure what those would be.
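The review suggestion above, switching each assembly's IDs from a set to a list so they can be indexed by chromosome, might look like the following sketch. The chromosome order (chr1 to chr22, then chrX and chrY), the placeholder IDs, and the get_refget_id() helper are all assumptions for illustration, not the layout that was merged.

```python
from enum import Enum


class Assembly(str, Enum):
    """Reference assemblies (illustrative subset)."""

    HG19 = "hg19"
    HG38 = "hg38"


# Assumed chromosome order for the list layout: chr1..chr22, then chrX, chrY,
# so numbered chromosome n lives at index n - 1.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]

# Hypothetical registry: assembly -> refget IDs in CHROMOSOMES order.
# The strings are placeholders, NOT real sequence digests.
ID_TO_REFERENCE_MAP: dict[Assembly, list[str]] = {
    assembly: [f"SQ.placeholder_{assembly.value}_chr{c}" for c in CHROMOSOMES]
    for assembly in Assembly
}


def get_refget_id(assembly: Assembly, chromosome: str) -> str:
    """Positional lookup that the list layout enables (hypothetical helper)."""
    return ID_TO_REFERENCE_MAP[assembly][CHROMOSOMES.index(chromosome)]


def get_assembly_from_refget_id(refget_id: str) -> Assembly:
    """Reverse lookup via a linear scan over each assembly's list."""
    for assembly, ids in ID_TO_REFERENCE_MAP.items():
        if refget_id in ids:
            return assembly
    raise ValueError(f"Unrecognized refget ID: {refget_id!r}")
```

Sets give constant-time membership checks, while ordered lists make positional lookups like [11 - 1] possible; with only about two dozen IDs per assembly, the linear scan in the reverse lookup costs essentially nothing.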

jsstevenson requested a review from korikuzma October 21, 2025 11:53

korikuzma (Member) left a comment

🚀

jsstevenson merged commit 0e98274 into 0.2.0 Oct 21, 2025
17 checks passed
jsstevenson deleted the feat/62-seq-ids branch October 21, 2025 12:24
jsstevenson added a commit that referenced this pull request Oct 29, 2025

Labels

priority:medium Medium priority

2 participants