Could you elaborate more on the extrinsic evaluation? #4

@kaniblu

Description

@kaniblu

You mentioned in the paper that you randomly sampled 1% of the training set, plus 5 examples per class for the validation set. I tried to replicate the baseline results on SST-2 by fine-tuning bert-base-uncased (as mentioned in the paper), but my results are much higher than the reported numbers.

Your Paper: 59.08 (5.59) [15 trials]
My Attempt: 72.89 (6.36) [9 trials]

I could increase the number of trials to rule out bad luck, but it seems unlikely that statistical variance alone could shift the numbers that much. Could you provide more details about your experiments? For example, did you sample the datasets with a different seed for each trial?

BTW I am using the dataset provided by the authors of CBERT (training set size 6,228). Thanks in advance.
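For concreteness, here is the kind of per-trial subsampling routine I have in mind (just a sketch of my own setup, not your code; the `subsample` helper and the toy data are my illustration):

```python
import random
from collections import defaultdict

def subsample(train, frac=0.01, val_per_class=5, seed=0):
    """Draw `frac` of `train` for fine-tuning and `val_per_class`
    examples per class (from the remainder) for validation.
    `train` is a list of (text, label) pairs; `seed` varies per trial."""
    rng = random.Random(seed)
    pool = train[:]
    rng.shuffle(pool)
    n_train = max(1, int(len(pool) * frac))
    sub_train = pool[:n_train]
    # group the leftover examples by class, then take 5 of each
    by_class = defaultdict(list)
    for ex in pool[n_train:]:
        by_class[ex[1]].append(ex)
    val = []
    for label in sorted(by_class):
        val.extend(by_class[label][:val_per_class])
    return sub_train, val

# toy stand-in for the CBERT SST-2 split: 6,228 binary-labeled examples
toy = [(f"sentence {i}", i % 2) for i in range(6228)]
for trial_seed in range(15):  # a fresh seed per trial
    tr, va = subsample(toy, seed=trial_seed)
    # 1% of 6,228 -> 62 training examples; 5 per class -> 10 validation
```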
