The ArguAI framework uses fine-grained argumentation mining models and NLP techniques to analyze comments on online participation portals. It includes Flair-based label (argument type) classifiers trained on CeR, CMV, AM2, and PE datasets, with optional eNRC emotion features.
- Recommended: use the `enrc` conda environment (from ArgStanceNRC or your own), which provides `flair`, `nrclex`, `enrc`, and related dependencies.
- Alternative: from the project root, install with Poetry (see `src/pyproject.toml`): `cd src && poetry install`

Then activate the environment before running scripts:

    conda activate enrc   # if using conda
    cd /path/to/ArguAI/src

Place or link the JSON datasets under `ArguAI/datasets/`:

| Dataset | File | Text column | Label column |
|---|---|---|---|
| CeR | `df_CeR.json` | `text` | `type` |
| CMV | `dfCMV-v2.json` | `op_EDU` | `Label` |
| AM2 | `dfAM2-v1.json` | `texts` | `types` |
| PE | `dfPE_stance-v1.json` | `EDU` | `semanticType` |

These are used by train_flair.py (and by argfine.py when called from notebooks).
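
The table above can also be written as a small lookup, which may help when writing your own loading code. This is a minimal sketch, not code from the repository: `DATASETS`, `dataset_file`, and `missing_columns` are hypothetical names, and the record layout (a list of dicts) is an assumption about how you load the JSON.

```python
from pathlib import Path

# File name, text column, and label column per dataset (taken from the table above).
DATASETS = {
    "CeR": ("df_CeR.json", "text", "type"),
    "CMV": ("dfCMV-v2.json", "op_EDU", "Label"),
    "AM2": ("dfAM2-v1.json", "texts", "types"),
    "PE": ("dfPE_stance-v1.json", "EDU", "semanticType"),
}


def dataset_file(name, datasets_dir="../datasets"):
    """Return the expected JSON path for a dataset name (hypothetical helper)."""
    file_name, _, _ = DATASETS[name]
    return Path(datasets_dir) / file_name


def missing_columns(name, records):
    """List the expected columns that are absent from the first record."""
    _, text_col, label_col = DATASETS[name]
    first = records[0] if records else {}
    return [col for col in (text_col, label_col) if col not in first]


if __name__ == "__main__":
    for name in DATASETS:
        print(name, dataset_file(name))
```

A check like `missing_columns("CMV", records)` before training can catch a mismatched file early, since each dataset uses different column names.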
`train_flair.py` trains a Flair `TextClassifier` for argument-type (label) classification on the datasets above. It can optionally add NRC or eNRC emotion features (8 dimensions) to the transformer embedding.
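
To illustrate the idea of emotion features, the sketch below counts words per emotion using a tiny made-up lexicon and concatenates the resulting 8-dimensional vector to a placeholder embedding. Everything here is an assumption for illustration: `TOY_LEXICON`, `emotion_vector`, and `with_emotions` are hypothetical, the eight categories follow the NRC emotion lexicon, and plain concatenation is assumed rather than confirmed as what `train_flair.py` does.

```python
# The eight emotion categories of the NRC emotion lexicon.
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Toy word-to-emotions lexicon (made up for this example, not real NRC data).
TOY_LEXICON = {
    "unsafe": ["fear"],
    "great": ["joy", "trust"],
    "waste": ["anger", "disgust"],
}


def emotion_vector(text):
    """Return normalized emotion counts as an 8-dim list of floats."""
    counts = [0.0] * len(EMOTIONS)
    for word in text.lower().split():
        for emotion in TOY_LEXICON.get(word.strip(".,!?"), []):
            counts[EMOTIONS.index(emotion)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def with_emotions(embedding, text):
    """Concatenate the emotion vector to an existing embedding (assumed scheme)."""
    return list(embedding) + emotion_vector(text)


if __name__ == "__main__":
    fake_embedding = [0.1, -0.3, 0.7]  # stands in for a transformer embedding
    print(with_emotions(fake_embedding, "This crossing is unsafe."))
```

With a 768-dimensional transformer embedding, this scheme would yield a 776-dimensional input to the classifier head.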
Basic usage (no emotion features):

    cd src
    python train_flair.py --dataset CeR --model roberta-base --epochs 10

With eNRC features (requires the ArgStanceNRC repo and the `enrc` package; uses a cache under `ArguAI/nrc_cache/`):

    python train_flair.py --dataset CeR --use-enrc --threshold 0.4 --epochs 10

All datasets, one model per dataset:

    python train_flair.py --dataset all --use-enrc --threshold 0.4

Single model on all four datasets combined:

    python train_flair.py --dataset all --combined --use-enrc --output-dir ../resources/taggers/flair_label_all_combined_enrc

Main options:

| Option | Description | Default |
|---|---|---|
| `--dataset` | `AM2`, `CeR`, `CMV`, `PE`, or `all` | `AM2` |
| `--model` | HuggingFace transformer name | `roberta-base` |
| `--epochs` | Training epochs | `10` |
| `--lr` | Learning rate | `5e-5` |
| `--batch-size` | Mini-batch size | `4` |
| `--use-nrc` | Use NRC emotion features (`nrclex`) | off |
| `--use-enrc` | Use expandNRC (eNRC) from ArgStanceNRC | off |
| `--threshold` | eNRC threshold | `0.8` |
| `--output-dir` | Directory for saved model | `../resources/taggers/flair_label_<dataset>_enrc` |
| `--combined` | With `--dataset all`, train one combined model | off |

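
For reference, the options above map naturally onto an `argparse` parser. The following is a sketch reconstructed from the table only, not the parser in `train_flair.py`; the names `build_parser` and `resolve_output_dir`, and the way the default output directory is filled in, are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical reconstruction)."""
    p = argparse.ArgumentParser(description="Train a Flair label classifier")
    p.add_argument("--dataset", default="AM2",
                   choices=["AM2", "CeR", "CMV", "PE", "all"])
    p.add_argument("--model", default="roberta-base")
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--lr", type=float, default=5e-5)
    p.add_argument("--batch-size", type=int, default=4)
    p.add_argument("--use-nrc", action="store_true")
    p.add_argument("--use-enrc", action="store_true")
    p.add_argument("--threshold", type=float, default=0.8)
    p.add_argument("--output-dir", default=None)
    p.add_argument("--combined", action="store_true")
    return p


def resolve_output_dir(args):
    """Fill in the documented default path when --output-dir is not given."""
    if args.output_dir is not None:
        return args.output_dir
    return f"../resources/taggers/flair_label_{args.dataset}_enrc"


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--dataset", "CeR", "--use-enrc", "--threshold", "0.4"]
    )
    print(args.threshold, resolve_output_dir(args))
```

Note that the documented default for `--threshold` is 0.8, while the usage examples pass 0.4 explicitly.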
Trained models are written under ArguAI/resources/taggers/ (or --output-dir), including final-model.pt.
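
A saved `final-model.pt` can typically be loaded back with Flair's standard API for inference. The sketch below assumes a model trained without emotion features; models trained with `--use-nrc` or `--use-enrc` may need the project's custom embedding code to be importable, which is not covered here. `model_file` and `predict_label` are hypothetical helper names, and the example directory is only illustrative.

```python
from pathlib import Path


def model_file(output_dir):
    """Return the path of the final model inside a training output directory."""
    return Path(output_dir) / "final-model.pt"


def predict_label(text, output_dir):
    """Load a trained Flair classifier and return (label, score) pairs for one text."""
    # Imported lazily so the path helper works without Flair installed.
    from flair.data import Sentence
    from flair.models import TextClassifier

    classifier = TextClassifier.load(str(model_file(output_dir)))
    sentence = Sentence(text)
    classifier.predict(sentence)
    return [(label.value, label.score) for label in sentence.labels]


if __name__ == "__main__":
    # Illustrative directory; point this at your own --output-dir.
    out_dir = "../resources/taggers/flair_label_CeR_enrc"
    if model_file(out_dir).exists():
        print(predict_label("The city should add more bike lanes.", out_dir))
    else:
        print("No trained model found at", model_file(out_dir))
```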
`argfine.py` is a library used from notebooks (e.g. `Demo_argfine_public-participants.ipynb`). It provides:

- `visualize_column_counts`, `modify_flat`, `ready_for_flair`, `split_df` for data prep
- `train_argfine(train_s, dev_s, test_s, model_name, learning_rate, mini_batch_size, max_epoch, label_type, base_path)` to train a Flair text classifier

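
For readers unfamiliar with this kind of data prep, the sketch below shows the general idea of a label-stratified train/dev/test split and of FastText-style lines (`__label__<label> <text>`), a format that Flair's `ClassificationCorpus` can read. It is not the implementation in `argfine.py`: `stratified_split` and `to_fasttext_line` are hypothetical, and the 80/10/10 ratio is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, dev=0.1, test=0.1, seed=42):
    """Split rows into train/dev/test while keeping label proportions roughly equal."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)
    train, dev_rows, test_rows = [], [], []
    for group in by_label.values():
        rng.shuffle(group)  # shuffle within each label before slicing
        n_dev = int(len(group) * dev)
        n_test = int(len(group) * test)
        dev_rows += group[:n_dev]
        test_rows += group[n_dev:n_dev + n_test]
        train += group[n_dev + n_test:]
    return train, dev_rows, test_rows


def to_fasttext_line(text, label):
    """Format one example as a FastText-style line with whitespace collapsed."""
    return f"__label__{label} {' '.join(text.split())}"


if __name__ == "__main__":
    rows = [{"text": f"comment {i}", "type": "fact" if i % 2 else "value"}
            for i in range(40)]
    train, dev_rows, test_rows = stratified_split(rows, "type")
    print(len(train), len(dev_rows), len(test_rows))
    print(to_fasttext_line(train[0]["text"], train[0]["type"]))
```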
Example pattern from a notebook or script:

    import pandas as pd

    from argfine import modify_flat, ready_for_flair, split_df, train_argfine

    # Load data, filter to valid labels (e.g. fact/value/policy), flatten
    df = pd.read_json("../datasets/df_CeR.json")
    valid = {"fact", "value", "policy"}
    df_flat = modify_flat(df, valid, text_col="text", type_col="type")
    df_ready = ready_for_flair(df_flat, valid, "text", "type")

    # Split into train/dev/test sets and train the classifier
    train_s, dev_s, test_s = split_df(df_ready, "text", "type")
    train_argfine(
        train_s, dev_s, test_s,
        model_name="roberta-base",
        learning_rate=5e-5,
        mini_batch_size=4,
        max_epoch=10,
        label_type="label",
        base_path="../resources/taggers/argfine_CeR",
    )

Run the demo notebook for a full pipeline:

    cd src
    jupyter notebook Demo_argfine_public-participants.ipynb

Project layout:

    ArguAI/
    ├── README.md
    ├── datasets/            # df_CeR.json, dfCMV-v2.json, dfAM2-v1.json, dfPE_stance-v1.json
    ├── nrc_cache/           # Cached NRC/eNRC features (created by train_flair.py)
    ├── resources/taggers/   # Saved Flair models (final-model.pt, etc.)
    └── src/
        ├── argfine.py       # Data prep + train_argfine()
        ├── train_flair.py   # CLI for label classification (with optional eNRC)
        ├── pyproject.toml
        └── Demo_argfine_public-participants.ipynb