Refactor NN code #111

Merged
michelole merged 16 commits into bst-mug:master from michelole:cleanup
Jun 4, 2019

michelole
Member

This refactors the NN code so that pieces of it can be reused more easily in #107 and #110.

michelole added 13 commits May 29, 2019 13:40
Introduce the new interface `InputRepresentation` to separate the logic of input representation (e.g. word embeddings, character trigrams) from iterators and classifiers. This allows the new combinations required by bst-mug#107 and bst-mug#110.
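
For readers who have not looked at the diff, the separation described here could look roughly like the sketch below. Only the name `InputRepresentation` comes from this PR. The method names (`getVector`, `getDimensions`), the two toy implementations, and the `Featurizer` helper are assumptions made for illustration and are not the project's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of the interface; method names are assumptions.
interface InputRepresentation {
    /** Returns the vector for one unit (e.g. a token); never null. */
    double[] getVector(String unit);

    /** Length of every vector returned by getVector. */
    int getDimensions();
}

// Toy stand-in for word embeddings: a plain lookup table.
class LookupRepresentation implements InputRepresentation {
    private final Map<String, double[]> table = new HashMap<>();
    private final int dimensions;

    LookupRepresentation(int dimensions) {
        this.dimensions = dimensions;
    }

    void put(String unit, double[] vector) {
        if (vector.length != dimensions) {
            throw new IllegalArgumentException("expected " + dimensions + " dimensions");
        }
        table.put(unit, vector.clone());
    }

    @Override
    public double[] getVector(String unit) {
        double[] v = table.get(unit);
        // Unknown units map to a zero vector in this toy version.
        return v != null ? v.clone() : new double[dimensions];
    }

    @Override
    public int getDimensions() {
        return dimensions;
    }
}

// Toy character-trigram representation: counts trigrams in hashed buckets.
class TrigramRepresentation implements InputRepresentation {
    private final int dimensions;

    TrigramRepresentation(int dimensions) {
        this.dimensions = dimensions;
    }

    @Override
    public double[] getVector(String unit) {
        double[] v = new double[dimensions];
        String padded = "#" + unit + "#";
        for (int i = 0; i + 3 <= padded.length(); i++) {
            int bucket = Math.floorMod(padded.substring(i, i + 3).hashCode(), dimensions);
            v[bucket] += 1.0;
        }
        return v;
    }

    @Override
    public int getDimensions() {
        return dimensions;
    }
}

// Consumer code depends only on the interface, not on a concrete representation.
final class Featurizer {
    private Featurizer() {}

    static double[][] toMatrix(InputRepresentation rep, List<String> units) {
        double[][] matrix = new double[units.size()][];
        for (int i = 0; i < units.size(); i++) {
            matrix[i] = rep.getVector(units.get(i));
        }
        return matrix;
    }
}
```

Because `Featurizer` only sees the interface, the same iterator or classifier code could in principle be paired with either representation, which is the kind of recombination the commit message refers to.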

Move data-dependent methods such as `initializeTruncateLength` and `loadFeaturesForNarrative` to the iterators.

Remove public and duplicate attributes to reduce complexity.
This allows other combinations as required by bst-mug#107 and bst-mug#110.
So that it can be reused by other NN tests.
Pull-up `loadFeaturesForNarrative` with a call to an abstract method `getUnits`.
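
A minimal sketch of this pull-up, assuming a simplified setting: `loadFeaturesForNarrative` and `getUnits` are the names used in this PR, but the class names (`BaseNarrativeIterator`, `ToyTokenIterator`, `ToySentenceIterator`), the `List<String>` return type, and the splitting rules are hypothetical. The real method presumably builds feature arrays rather than string lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical base class holding the shared logic.
abstract class BaseNarrativeIterator {
    protected final int truncateLength;

    BaseNarrativeIterator(int truncateLength) {
        this.truncateLength = truncateLength;
    }

    /** Subclasses decide what a unit is (token, sentence, ...). */
    protected abstract List<String> getUnits(String narrative);

    /** Shared logic: split into units, then truncate to truncateLength. */
    List<String> loadFeaturesForNarrative(String narrative) {
        List<String> units = getUnits(narrative);
        int end = Math.min(units.size(), truncateLength);
        return new ArrayList<>(units.subList(0, end));
    }
}

// Toy subclass: units are whitespace-separated tokens.
class ToyTokenIterator extends BaseNarrativeIterator {
    ToyTokenIterator(int truncateLength) {
        super(truncateLength);
    }

    @Override
    protected List<String> getUnits(String narrative) {
        String trimmed = narrative.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }
}

// Toy subclass: units are sentences, split after ., ! or ?.
class ToySentenceIterator extends BaseNarrativeIterator {
    ToySentenceIterator(int truncateLength) {
        super(truncateLength);
    }

    @Override
    protected List<String> getUnits(String narrative) {
        String trimmed = narrative.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("(?<=[.!?])\\s+"));
    }
}
```

With this shape, each subclass supplies only the unit-splitting rule, and the truncation logic lives in one place.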
Make code closer to the original example and to the `SentenceIterator`.
Both `SentenceIterator` (`calculateMaxSentences`) and `TokenIterator` (`initializeTruncateLength`) implemented a method to calculate the longest sequence in training data and used it to initialize `truncateLength`.
Now that we have `getUnits`, DRY and pull up this method.

Note that the `TokenIterator` method cleaned the text before tokenization, but no such cleaning was done at training/test time, so this was a bug.

Note also that `TokenIterator` calculated token/type coverage in the same method. This was used only for debugging before the introduction of `fasttext`, which returns an approximate vector for out-of-dictionary tokens.
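
The shared length calculation from this commit can be illustrated with the sketch below. It is not the code from the PR: the class `TruncateLength` and the method `longestSequence` are hypothetical names, and a `Function` parameter stands in for the abstract `getUnits` method to keep the example short. It applies the same unit splitter that would be used at training/test time and does no extra cleaning, which is the consistency point made in the note above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; in the PR this logic lives in the iterator hierarchy.
final class TruncateLength {
    private TruncateLength() {}

    /**
     * Returns the size of the longest unit sequence over all training texts,
     * or 0 if there are no texts. The getUnits function plays the role of
     * the abstract method implemented by each iterator.
     */
    static int longestSequence(List<String> trainingTexts,
                               Function<String, List<String>> getUnits) {
        int max = 0;
        for (String text : trainingTexts) {
            max = Math.max(max, getUnits.apply(text).size());
        }
        return max;
    }
}
```

A token-based caller might pass `t -> Arrays.asList(t.split("\\s+"))`, while a sentence-based caller would pass its own splitter. The loop itself stays the same, which is what makes the method a candidate for pulling up.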

coveralls commented Jun 4, 2019

Coverage decreased (-0.7%) to 71.951% when pulling f075134 on michelole:cleanup into af901da on bst-mug:master.

@bst-mug deleted 11 comments on Jun 4, 2019
@michelole michelole merged commit 2e599c3 into bst-mug:master Jun 4, 2019
@michelole michelole deleted the cleanup branch June 4, 2019 10:23
2 participants