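According to the repository README, the training data is https://huggingface.co/datasets/shibing624/chinese_text_correction and the test data are SIGHAN-2015 (sighan2015_test.tsv), EC-LAW (ec_law_test.tsv), and MCSC (mcsc_test.tsv).
On inspection, I found that the EC-LAW and MCSC test data overlap with the training data. This matches the reported results on the three test sets: EC-LAW and MCSC score close to 1, while SIGHAN-2015 is oddly low at only 0.4917.
My question: were the examples that also appear in the test sets removed before training?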
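For anyone who wants to reproduce this kind of check, here is a minimal sketch of measuring train/test overlap and filtering it out. It assumes the sentences have already been loaded into plain Python lists, and it treats two sentences as the same when they match exactly after Unicode NFKC normalization and whitespace removal. The function names `normalize`, `overlap_ratio`, and `remove_test_overlap` are hypothetical and are not part of the repository; the maintainers may define overlap differently.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold full-width/half-width variants (NFKC) and drop all whitespace."""
    return "".join(unicodedata.normalize("NFKC", text).split())


def overlap_ratio(train_sentences, test_sentences):
    """Fraction of test sentences that also occur in the training data."""
    train_set = {normalize(s) for s in train_sentences}
    test_norm = [normalize(s) for s in test_sentences]
    if not test_norm:
        return 0.0
    hits = sum(1 for s in test_norm if s in train_set)
    return hits / len(test_norm)


def remove_test_overlap(train_sentences, test_sentences):
    """Return the training sentences that do not occur in the test data."""
    test_set = {normalize(s) for s in test_sentences}
    return [s for s in train_sentences if normalize(s) not in test_set]


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the real datasets.
    train = ["今天天气很好", "我喜欢读书"]
    test = ["今天 天气很好", "他去了学校"]
    print(overlap_ratio(train, test))
    print(remove_test_overlap(train, test))
```

Exact matching after normalization only catches verbatim leakage; near-duplicates (for example, the same sentence with one character changed) would need a fuzzier comparison.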
The training data did include the test-set data.
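You could take a look at https://github.com/TW-NLP/ChineseErrorCorrector. It keeps the training and test sets separate during training, and it provides 7B and 32B text-correction LLMs. Hope this helps.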