Zero-valued vectors? #12

@nishanthsanjeev

Regarding the pre-trained vectors for some of the corpora on the HistWords website:

For specific decades, a handful of word vectors are 0.0 across all 300 dimensions. Note that the corresponding words are still present in the corpus for that decade.

However, these words do not appear to receive any representation and are assigned zeros in every dimension. For example, the vector for the word 'autism' in the 1800s decade of the Google n-grams eng-all vectors is [0.0 ... 0.0] across all 300 dimensions.

Would it be appropriate to treat these words as simply 'missing' from the corpus for that decade?
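
If these words were treated as missing, the all-zero rows could be separated out before any analysis. Below is a minimal sketch of that filtering step, assuming a decade's vocabulary is available as a list of words and its embeddings as a NumPy array with one row per word. The function name `split_zero_vectors` and the toy data are hypothetical and only for illustration; they are not part of HistWords, and the real vectors have 300 dimensions rather than 4.

```python
import numpy as np


def split_zero_vectors(vocab, matrix):
    """Split a vocabulary into words with a nonzero vector and words whose vector is all zeros."""
    matrix = np.asarray(matrix)
    if matrix.shape[0] != len(vocab):
        raise ValueError("vocab and matrix must have the same number of rows")
    # A row counts as "zero" when none of its entries is nonzero.
    is_zero = ~matrix.any(axis=1)
    present = [word for word, zero in zip(vocab, is_zero) if not zero]
    missing = [word for word, zero in zip(vocab, is_zero) if zero]
    return present, missing


# Toy stand-in for one decade's embeddings (made-up values, 4 dimensions).
toy_vocab = ["autism", "river", "engine"]
toy_matrix = np.array([
    [0.0, 0.0, 0.0, 0.0],   # all zeros: would be treated as missing
    [0.1, -0.2, 0.3, 0.0],
    [0.0, 0.5, 0.0, -0.1],
])

present, missing = split_zero_vectors(toy_vocab, toy_matrix)
print("present:", present)
print("missing:", missing)
```

Whether the zero rows should really be interpreted as "missing" is exactly the open question above; the sketch only shows how such rows could be identified and set aside.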
