
Released Chinese-corpus vectors give a different result from the paper #11

@zhicongchen


Hi, I'm working with the Chinese-corpus vectors downloaded from HistWords.

I loaded the vectors for 病毒 ("virus") and 电脑 ("computer") and got the following cosine similarities:

('病毒', '电脑')
1950, cosine similarity=0.000
1960, cosine similarity=0.000
1970, cosine similarity=0.000
1980, cosine similarity=0.360
1990, cosine similarity=0.263
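For reference, here is a minimal sketch of the per-decade cosine similarity computation. It assumes each word's vector for a decade is a plain list of floats. The exactly-zero values for 1950 to 1970 are consistent with one of the words being stored as an all-zero vector in those decades, but that is an assumption, not something confirmed here. The helper `cosine_similarity` is a hypothetical name, not a HistWords API.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors.

    Assumption: a word missing from a decade is represented by an all-zero
    vector, which has no direction, so we return 0.0 instead of dividing by zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy vectors for illustration only (not real HistWords data).
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ~0.7071
print(cosine_similarity([0.0, 0.0], [1.0, 1.0]))  # 0.0
```

Returning 0.0 for a zero vector mirrors the output above. Note that treating "word absent" as "similarity 0" creates ties, which affects the rank correlation computed next.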

The Spearman correlation between the similarities [0, 0, 0, 0.36, 0.26] and the decades [1950, 1960, 1970, 1980, 1990] is 0.78. However, the paper reports this correlation as 0.89 (at the end of Section 3.2).
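To double-check the 0.78 figure, here is a dependency-free sketch of Spearman's rho: rank both sequences (tied values get the average of their positions), then take the Pearson correlation of the ranks. As far as I know, this is the same tie handling that `scipy.stats.spearmanr` uses. The function names are illustrative, not from HistWords.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

similarities = [0.0, 0.0, 0.0, 0.360, 0.263]
decades = [1950, 1960, 1970, 1980, 1990]
rho = spearman(similarities, decades)
print(f"{rho:.2f}")  # 0.78
```

Here the three tied zeros all get rank 2, so the similarity ranks are [2, 2, 2, 5, 4] against decade ranks [1, 2, 3, 4, 5], giving rho = 7 / sqrt(80) ≈ 0.78.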

Is something wrong with my data processing? Thank you for your attention.
