-
-
Notifications
You must be signed in to change notification settings - Fork 72
Genbank parser rewrite #437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| wantErr bool | ||
| }{{ | ||
| name: "parses example header from spec", | ||
| data: `GBBCT1.SEQ Genetic Sequence Data Bank |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the cases where wantErr is nil, perhaps it could be helpful to define data in terms of want.
So for instance, Header could have a method like
func (h Header) bytes() []bytes {
strbuilder := strings.Builder{}
for _, r := range h.FileName {
strbuilder.WriteRune(r)
}
for i := 0; i < paddingBeforeTitle-len(h.FileName); i++ {
strbuilder.WriteByte(" "[0]) // maybe hard code some common byte values for these :shrug:
}
for _, r := range h.Title {
strbuilder.WriteRune(r)
}
strbuilder.WriteByte("\n"[0])
// etc.
}
for _, tc := range testcases {
t.Run(tc.name, func(t *testing.T) {
var p *Parser
if data == "" {
p = NewParser(strings.NewReader(tc.want)
} else {
p = NewParser(strings.NewReader(tc.data))
}
got, err := p.parseHeader()
if gotErr := err != nil; gotErr != tc.wantErr {
t.Fatalf("incorrect error returned from parseHeader, wantErr: %v, err: %v", tc.wantErr, err)
}
if diff := cmp.Diff(tc.want, got); !tc.wantErr && diff != "" {
t.Fatalf("parseHeader returned incorrect header, (-want, +got): %v", diff)
}
})
}This should do 2 very nice things:
- Contributors don't need to to spend time aligning multi line string literals in accordance to the spec. All they have to do is define a Struct. More complex structs of the Genbank could have some constructors with some nice defaults to help build up the test data if it is too complex to define the struct in place or if we find that it takes too much boiler plate to define over and over again. This should make it easier to add and reason about test data.
- This gets us part of the way to implementing writing to disk!
This is probably overkill for header and locus, but maybe good to keep in mind for the future. Having golden tests would be super helpful once we're ready to test some complete Genbank files. This would make it trivial to add test cases and work through any problems if bugs reported.
Also, maybe it's good to have a few different test tables instead of the one for header? That way the test case itself is a little simpler. For instance, TestParseHeaderHappy and TestParseHeaderSad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. We need write ability - makes sense to implement this along the way so we can just do round-trip tests! Will go back and do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest commit implements these suggestions for the header:
- separate "happy" and "sad" tests into
TestXXXRoundtripandTestXXXErr - make tests roundtrip instead
Still need to do this for the rest of the spec though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm! I'd say ship it!
|
This PR has had no activity in the past 2 months. Marking as |
|
This PR has had no activity in the past 2 months. Marking as |
* generate codon table in constructor rather than at buildtime * update tests to handle constructor error, since tables are no longer generated at buildtime * update examples to handle constructor error, since tables are no longer generated at buildtime * write a proper deepcopy func + test
* basic bitvector * jacobsons start and refactor to uint for accurate machine words * confident that jacobson rank is working * reusing the incoming bitvector instead of copying everyithing for jacobson rank * access and bounds checking * just do uint64 for simplicity. bound checking and access * bit vector fixes, rsa good enough, wavelet start * Simple wavelet tree with access * wavelet fix access, add select, fix rsa bitvector select * got count working, but had to throw out jacobsons * rsa fixes and refactors * bwt locate * extract * doc BWT, refactor, and return a possible error during construction * add TODO about sorting and the nullChar * bwt examples * wavelet tree doc * wavelet tree explanation * doc and note for waveletTree * add bwt high level. move wavelet tree's some rsa bv docs * simplify bitvector, docs for bitvector and rsaBitvector * Cite Ben Langmead. --------- Co-authored-by: Willow Carretero Chavez <[email protected]> Co-authored-by: Timothy Stiles <[email protected]>
* moved align, bwt, and mash into new search package. * updated import paths.
* run length bwt * better clarify mapping from original sequnce space to run space --------- Co-authored-by: Timothy Stiles <[email protected]>
* added package level doc strings.
* lint fix to use string interpolation. * more string interpolation. * more string interpolation.
* fixed typo. * updated comment * reorderd tutorial chapters. * created short tutorial on primer design * deleted dna parts tutorial wireframe. * explained PCR cycles and caveats. * renamed primer design tutorial. * disabled whitespace linting. * now using output of append.
* Separated minimums between design and simulate * Added check on primer length when using simulate * Added CHANGELOG.md
|
This PR has had no activity in the past 2 months. Marking as |
Changes in this PR
Rewrite of the Genbank parser. Currently in a draft state.
Why are you making these changes?
See #434 .
Are any changes breaking? (IMPORTANT)
Yes. Changes to the structure and methods of the
Genbankstruct, as well as the API for the parser.Pre-merge checklist
All of these must be satisfied before this PR is considered
ready for merging. Mergeable PRs will be prioritized for review.
primers/primers_test.gofor what this might look like.CHANGELOG.mdin the[Unreleased]section.