Regarding the pre-trained vectors for some of the corpora on the HistWords website:
For specific decades, a handful of word vectors are 0.0 across all 300 dimensions. Note that the corresponding words are still present in the corpus for that decade.
However, they do not seem to get any representation and have been assigned zero values throughout. For example, the vector for the word 'autism' in the 1800s decade of the Google n-grams eng-all vectors is [0.0 ... 0.0] for all 300 dimensions.
Would it be apt to treat these words as simply 'missing' from the corpus for that decade?
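
For reference, here is a minimal sketch of how such words could be flagged once a decade's vocabulary and vectors are in memory. The function name `find_zero_vectors` and the toy data are my own assumptions for illustration (3 dimensions instead of 300, made-up values) and are not part of the HistWords code. How the files are loaded is deliberately left out.

```python
def find_zero_vectors(vocab, vectors):
    """Return the words whose vector is 0.0 in every dimension.

    `vocab` is a sequence of words and `vectors` a parallel sequence of
    rows (lists of floats, or the rows of a 2-D array).
    """
    return [word for word, vec in zip(vocab, vectors) if not any(vec)]


# Toy stand-in for one decade's data (hypothetical values, 3 dimensions).
vocab = ["autism", "house", "river"]
vectors = [
    [0.0, 0.0, 0.0],      # all-zero row: no learned representation
    [0.12, -0.40, 0.03],
    [-0.25, 0.08, 0.31],
]

missing = find_zero_vectors(vocab, vectors)
print(missing)
```

Words returned by this check could then be excluded before computing similarities, since the cosine similarity of an all-zero vector is undefined.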