#Bert增量预训练的方法可以参考我知乎的
https://zhuanlan.zhihu.com/p/346952104
#bert 预训练模型下载链接
BERT-Large, Uncased (Whole Word Masking)
: 24-layer, 1024-hidden, 16-heads, 340M parametersBERT-Large, Cased (Whole Word Masking)
: 24-layer, 1024-hidden, 16-heads, 340M parametersBERT-Base, Uncased
: 12-layer, 768-hidden, 12-heads, 110M parametersBERT-Large, Uncased
: 24-layer, 1024-hidden, 16-heads, 340M parametersBERT-Base, Cased
: 12-layer, 768-hidden, 12-heads , 110M parametersBERT-Large, Cased
: 24-layer, 1024-hidden, 16-heads, 340M parametersBERT-Base, Multilingual Cased (New, recommended)
: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parametersBERT-Base, Multilingual Uncased (Orig, not recommended)
(Not recommended, useMultilingual Cased
instead): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parametersBERT-Base, Chinese
: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters