Basic text processing #82

michelole · 2017-07-04T12:52:20Z

Ensure that Elasticsearch is performing basic text processing, such as:

Matching plural/singular forms
Removing parentheses

michelole · 2017-07-10T10:52:37Z

We're using the default standard analyzer. According to the documentation:

It splits the text on word boundaries, as defined by the Unicode Consortium, and removes most punctuation. Finally, it lowercases all terms.
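To make that concrete, here is a rough sketch of what this kind of analysis does to a query. It is only an approximation written with a regular expression, not Elasticsearch's actual Unicode segmentation, and the function name `approx_standard_analyze` and the sample text are made up for illustration. It suggests that parentheses are already dropped by the standard analyzer, while plurals are left untouched.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer (an approximation, not the real algorithm)."""
    # \w+ keeps runs of letters, digits and underscores, so punctuation
    # such as parentheses and hyphens acts as a token separator.
    tokens = re.findall(r"\w+", text)
    # The analyzer lowercases every term as its last step.
    return [token.lower() for token in tokens]


# Hypothetical sample text: parentheses disappear, "patients" stays plural.
print(approx_standard_analyze("Breast Cancer (HER2-positive) patients"))
```

The real analyzer differs in details (for example in how it treats apostrophes and underscores), so the `_analyze` API remains the reliable way to inspect actual tokens.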

We could switch to the english analyzer, which performs stemming and stop word removal. Plural and singular forms would then be reduced to the same stem and match each other.

We need to reindex to apply these changes.
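A minimal sketch of the two request bodies this would involve, built as plain Python dictionaries so they can be inspected without a running cluster. The field names (`title`, `abstract`) and index names (`trec_standard`, `trec_english`) are hypothetical placeholders, not the project's real ones. The mapping uses the typeless layout of Elasticsearch 7 and later; older versions nest `properties` under a mapping type name.

```python
import json


def english_index_body(text_fields):
    """Index-creation body mapping each given field to the built-in `english` analyzer."""
    return {
        "mappings": {
            "properties": {
                name: {"type": "text", "analyzer": "english"}
                for name in text_fields
            }
        }
    }


def reindex_body(source_index, dest_index):
    """Request body for the _reindex API, copying documents into the new index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# Hypothetical field and index names, for illustration only.
print(json.dumps(english_index_body(["title", "abstract"]), indent=2))
print(json.dumps(reindex_body("trec_standard", "trec_english"), indent=2))
```

The first body would go to the index-creation endpoint of the new index and the second to `POST _reindex`. The analyzer of an existing field cannot be changed in place, which is why a new index plus a reindex is needed.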

This refs bst-mug#107, bst-mug#82 and bst-mug#70.

michelole · 2017-07-29T17:07:08Z

English stemming worsened results from 0.7693 to 0.6884.

It could still benefit from #97, however.

michelole added the P0 label Jul 4, 2017

michelole self-assigned this Jul 4, 2017

michelole mentioned this issue Jul 10, 2017

Most fields #97

Open

michelole added the reindex label Jul 10, 2017

michelole pushed a commit to michelole/trec2017 that referenced this issue Jul 29, 2017

Enables new experiments

074ae82

This refs bst-mug#107, bst-mug#82 and bst-mug#70.

michelole closed this as completed Jul 29, 2017

michelole added the experiment label Oct 24, 2017
