5.3 Feature-based Approach with BERT
We use the representation of the first sub-token as the input to the token-level classifier over the NER label set.
In the middle of Section 5.3, the paper says that the label is assigned only to the first sub-token of each word. Since "X" is no longer mentioned anywhere, it seems "X" is not used any more.
The following issue comment also states that, with fine-tuning, the model can learn the pattern from only the first sub-token of each word, without using "X": kamalkraj/BERT-NER#1 (comment)
Are there any plans to avoid using the special token X?
I think the special token X was used in v1 of the BERT paper but is not used in v2. The v2 paper only describes labeling the first sub-token of each word (the Section 5.3 passage quoted above).
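For concreteness, here is a minimal sketch (not from this repository's code) of the "first sub-token only" labeling that the v2 paper describes: the word-level label is kept on the first word piece, and the remaining pieces are masked out of the loss instead of being labeled "X". It assumes the Hugging Face transformers tokenizer; the example sentence, label set, and the align_labels helper are made up for illustration.

```python
from transformers import BertTokenizer

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips this target value by default

# hypothetical label set, for illustration only
label2id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-LOC": 3, "I-LOC": 4}

def align_labels(words, word_labels, tokenizer):
    """Tokenize each word with WordPiece and keep the label only on the
    first sub-token; continuation sub-tokens get IGNORE_INDEX instead of "X"."""
    sub_tokens, sub_label_ids = [], []
    for word, label in zip(words, word_labels):
        pieces = tokenizer.tokenize(word)  # a single word may split into several word pieces
        if not pieces:
            continue
        sub_tokens.extend(pieces)
        sub_label_ids.append(label2id[label])                     # real label on the first piece
        sub_label_ids.extend([IGNORE_INDEX] * (len(pieces) - 1))  # mask the continuation pieces
    return sub_tokens, sub_label_ids

if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    words = ["Jim", "Henson", "works", "in", "London"]
    labels = ["B-PER", "I-PER", "O", "O", "B-LOC"]
    print(align_labels(words, labels, tokenizer))
```

At prediction time, only the first sub-token of each word would be read off; the masked positions contribute nothing to training, so no "X" label is needed.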