Automatic Label Error Correction Without Human Labor

LICENSE 996.icu

Doc

https://guotong1988.github.io/core_research/2024/02/01/auto-re-label/

Run

Step-1, Train the model on the original training dataset, train.py

Step-2, Predict on the training/dev datasets, predict.py

Step-3, Prepare the candidate training datasets, get_dataset_list.py

Step-4, Find the best dataset by dev accuracy, explore_train.py
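
The four steps above can be sketched end to end on toy data. This is only an illustrative sketch, not the code in train.py, predict.py, get_dataset_list.py or explore_train.py. It makes several assumptions: a nearest-centroid classifier on 1-D features stands in for the transformer model, and a candidate dataset is built by replacing a training label with the model's prediction when the two disagree and the prediction margin reaches a threshold. The relabeling rule, the thresholds, and all function names here are hypothetical.

```python
from statistics import mean


def train(xs, ys):
    """Step-1 stand-in: fit a nearest-centroid classifier (one centroid per label)."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in sorted(set(ys))}


def predict(model, xs):
    """Step-2 stand-in: return (predicted label, margin) for every example."""
    results = []
    for x in xs:
        dists = sorted((abs(x - centroid), c) for c, centroid in model.items())
        # Margin = distance gap between the closest and second-closest centroid.
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
        results.append((dists[0][1], margin))
    return results


def candidate_datasets(ys, preds, thresholds):
    """Step-3 stand-in (assumed rule): one relabeled copy of the labels per threshold."""
    return {
        t: [p if (p != y and m >= t) else y for y, (p, m) in zip(ys, preds)]
        for t in thresholds
    }


def accuracy(model, xs, ys):
    """Fraction of examples whose predicted label matches the given label."""
    return sum(p == y for (p, _), y in zip(predict(model, xs), ys)) / len(ys)


def explore(train_x, train_y, dev_x, dev_y, thresholds):
    """Step-4 stand-in: retrain on each candidate, keep the best by dev accuracy."""
    base = train(train_x, train_y)
    preds = predict(base, train_x)
    # Start from the original labels; threshold None means "no relabeling".
    best = (accuracy(base, dev_x, dev_y), None, list(train_y))
    for t, labels in candidate_datasets(train_y, preds, thresholds).items():
        acc = accuracy(train(train_x, labels), dev_x, dev_y)
        if acc > best[0]:
            best = (acc, t, labels)
    return best


# Toy data: the true label is 0 for x < 5 and 1 otherwise.
train_x = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
noisy_y = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]  # the first two labels are wrong
dev_x = [1, 4.6, 4.8, 5.4, 9]
dev_y = [0, 0, 0, 1, 1]

best_acc, best_threshold, best_labels = explore(
    train_x, noisy_y, dev_x, dev_y, thresholds=[5.0, 1.0]
)
print(best_acc, best_threshold, best_labels)
```

In this sketch the dev set is the only judge: a candidate is kept only when retraining on it strictly improves dev accuracy, so the original labels remain the fallback if no candidate helps.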

Requirements

transformers 4.38.2 or 4.26.1

torch 2.2.1 or 1.11.0

scikit-learn 1.3.2

datasets 2.18.0

accelerate 0.27.2
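
A small check like the following can report whether the installed packages match the versions listed above. It is a sketch under assumptions: the helper name `check_pins` is hypothetical, either listed version is accepted where two are given, and the list does not say whether the two transformers and torch versions must be paired in a particular way, so no pairing is enforced.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; where two are listed, either one is accepted.
PINS = {
    "transformers": ("4.38.2", "4.26.1"),
    "torch": ("2.2.1", "1.11.0"),
    "scikit-learn": ("1.3.2",),
    "datasets": ("2.18.0",),
    "accelerate": ("0.27.2",),
}


def check_pins(pins):
    """Return {package: (installed_version_or_None, matches_a_listed_version)}."""
    report = {}
    for name, accepted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Drop a local build suffix such as "+cu121" before comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base in accepted)
    return report


for name, (installed, ok) in check_pins(PINS).items():
    status = "ok" if ok else "mismatch or missing"
    print(f"{name}: installed={installed} -> {status}")
```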

Experiment Results

table1

Related Work

Label Error Correction With Human Labor: The Re-Label Method For Data-Centric Machine Learning

Controllable Label Error Fixing: Re-Label By Data Pattern For Controllable Deep Learning

More Info

The method proposed in this project (and its related works) can be applied to any deep learning task whose dataset is manually annotated (or annotated by LLMs). It is not limited to NLP tasks and can be extended efficiently to CV tasks, speech recognition tasks, text-to-speech tasks, and more.