myeghaneh/ArguAI

ArgFine

The ArguAI framework uses fine-grained argumentation mining models and NLP techniques to analyze comments on online participation portals. It includes Flair-based label (argument type) classifiers trained on CeR, CMV, AM2, and PE datasets, with optional eNRC emotion features.

Environment

  • Recommended: use the enrc conda environment (from ArgStanceNRC or your own), which provides flair, nrclex, enrc, and related dependencies.

  • Alternative: from project root, install with Poetry (see src/pyproject.toml):

    cd src && poetry install

Then activate the environment and change to the src directory before running scripts:

conda activate enrc   # if using conda
cd /path/to/ArguAI/src

Datasets

Place or link JSON datasets under ArguAI/datasets/:

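| Dataset | File                | Text column | Label column |
|---------|---------------------|-------------|--------------|
| CeR     | df_CeR.json         | text        | type         |
| CMV     | dfCMV-v2.json       | op_EDU      | Label        |
| AM2     | dfAM2-v1.json       | texts       | types        |
| PE      | dfPE_stance-v1.json | EDU         | semanticType |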

These are used by train_flair.py (and by argfine.py when called from notebooks).
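
As a quick sanity check before training, the dataset table can be turned into a small lookup that loads one file and verifies its columns. This is a minimal sketch, not code from the repository: `DATASETS` and `load_dataset` are hypothetical names, and it assumes each file is readable by `pandas.read_json` with default settings.

```python
from pathlib import Path

import pandas as pd

# File name, text column, and label column per dataset, copied from the table.
DATASETS = {
    "CeR": ("df_CeR.json", "text", "type"),
    "CMV": ("dfCMV-v2.json", "op_EDU", "Label"),
    "AM2": ("dfAM2-v1.json", "texts", "types"),
    "PE": ("dfPE_stance-v1.json", "EDU", "semanticType"),
}


def load_dataset(name, root="../datasets"):
    """Hypothetical helper: load one dataset and keep only its text and label columns."""
    file_name, text_col, label_col = DATASETS[name]
    df = pd.read_json(Path(root) / file_name)
    # Fail early with a readable message if the expected columns are absent.
    missing = {text_col, label_col} - set(df.columns)
    if missing:
        raise KeyError(f"{file_name} is missing columns: {sorted(missing)}")
    return df[[text_col, label_col]]
```

Calling `load_dataset("CeR")` from `src/` would then return a two-column frame, or raise a `KeyError` naming the missing columns.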

Running the scripts

1. Train Flair label classifier (train_flair.py)

Trains a Flair TextClassifier for argument-type (label) classification on the datasets above. Optionally adds NRC or eNRC emotion features (8-dim) to the transformer embedding.
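
The README does not spell out how the 8 emotion values are combined with the transformer embedding. One common approach, assumed here for illustration only, is to append them to the end of the text embedding. `add_emotion_features` is a hypothetical name, and 768 is the hidden size of roberta-base.

```python
import numpy as np

EMOTION_DIM = 8  # the README states that the emotion vector has 8 dimensions


def add_emotion_features(text_embedding, emotion_scores):
    """Append an 8-dim emotion vector to a 1-D text embedding (assumed scheme)."""
    text_embedding = np.asarray(text_embedding, dtype=np.float32)
    emotion_scores = np.asarray(emotion_scores, dtype=np.float32)
    if emotion_scores.shape != (EMOTION_DIM,):
        raise ValueError(
            f"expected {EMOTION_DIM} emotion scores, got shape {emotion_scores.shape}"
        )
    return np.concatenate([text_embedding, emotion_scores])


# A 768-dim roberta-base vector plus 8 emotion scores gives 776 entries.
combined = add_emotion_features(np.zeros(768), np.full(8, 0.5))
print(combined.shape)  # (776,)
```

Under this assumption the classifier head sees a slightly wider input, while the transformer itself is unchanged.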

Basic usage (no emotion features):

cd src
python train_flair.py --dataset CeR --model roberta-base --epochs 10

With eNRC features (requires ArgStanceNRC repo and enrc package; uses cache under ArguAI/nrc_cache/):

python train_flair.py --dataset CeR --use-enrc --threshold 0.4 --epochs 10

All datasets, one model per dataset:

python train_flair.py --dataset all --use-enrc --threshold 0.4

Single model on all four datasets combined:

python train_flair.py --dataset all --combined --use-enrc --output-dir ../resources/taggers/flair_label_all_combined_enrc

Main options:

| Option       | Description                                 | Default                                         |
|--------------|---------------------------------------------|-------------------------------------------------|
| --dataset    | AM2, CeR, CMV, PE, or all                   | AM2                                             |
| --model      | HuggingFace transformer name                | roberta-base                                    |
| --epochs     | Training epochs                             | 10                                              |
| --lr         | Learning rate                               | 5e-5                                            |
| --batch-size | Mini-batch size                             | 4                                               |
| --use-nrc    | Use NRC emotion features (nrclex)           | off                                             |
| --use-enrc   | Use expandNRC (eNRC) from ArgStanceNRC      | off                                             |
| --threshold  | eNRC threshold                              | 0.8                                             |
| --output-dir | Directory for saved model                   | ../resources/taggers/flair_label_<dataset>_enrc |
| --combined   | With --dataset all, train one combined model | off                                             |
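
To make the defaults concrete, the documented options can be mirrored in a small `argparse` parser. This is a sketch reconstructed from the option list only, not the parser in train_flair.py; `build_parser` and `parse_args` are hypothetical names, and filling in the `<dataset>` placeholder after parsing is an assumption about how the default path is derived.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(description="Train a Flair label classifier (sketch)")
    p.add_argument("--dataset", default="AM2", choices=["AM2", "CeR", "CMV", "PE", "all"])
    p.add_argument("--model", default="roberta-base")
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--lr", type=float, default=5e-5)
    p.add_argument("--batch-size", type=int, default=4)
    p.add_argument("--use-nrc", action="store_true")
    p.add_argument("--use-enrc", action="store_true")
    p.add_argument("--threshold", type=float, default=0.8)
    p.add_argument("--output-dir", default=None)
    p.add_argument("--combined", action="store_true")
    return p


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    if args.output_dir is None:
        # Documented default pattern, with <dataset> filled in (assumed behavior).
        args.output_dir = f"../resources/taggers/flair_label_{args.dataset}_enrc"
    return args


args = parse_args(["--dataset", "CeR", "--use-enrc", "--threshold", "0.4"])
print(args.epochs, args.threshold, args.output_dir)
# 10 0.4 ../resources/taggers/flair_label_CeR_enrc
```

Note that the examples above pass `--threshold 0.4` explicitly; without it, the documented default of 0.8 applies.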

Trained models are written under ArguAI/resources/taggers/ (or --output-dir), including final-model.pt.
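
A saved model can then be loaded for prediction with Flair's standard API. The sketch below is not part of the repository: `model_file` and `predict_label` are hypothetical helpers, and it assumes a model trained without the extra emotion features. A model trained with --use-nrc or --use-enrc may need the same feature pipeline at inference time.

```python
from pathlib import Path


def model_file(output_dir):
    """Path of the saved classifier inside a training output directory."""
    return Path(output_dir) / "final-model.pt"


def predict_label(text, output_dir):
    """Load the saved classifier and return (label, confidence) for one text."""
    # Imported lazily so the path helper above works without Flair installed.
    from flair.data import Sentence
    from flair.models import TextClassifier

    classifier = TextClassifier.load(str(model_file(output_dir)))
    sentence = Sentence(text)
    classifier.predict(sentence)
    if not sentence.labels:
        return None, 0.0
    top = sentence.labels[0]
    return top.value, top.score
```

For example, `predict_label("We should build more bike lanes.", "../resources/taggers/flair_label_CeR_enrc")` would return the predicted argument type and its score, provided that directory contains a trained model.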

2. ArgFine utilities (argfine.py)

argfine.py is a library used from notebooks (e.g. Demo_argfine_public-participants.ipynb). It provides:

  • visualize_column_counts, modify_flat, ready_for_flair, split_df for data prep
  • train_argfine(train_s, dev_s, test_s, model_name, learning_rate, mini_batch_size, max_epoch, label_type, base_path) to train a Flair text classifier

Example pattern from a notebook or script:

import pandas as pd

from argfine import modify_flat, ready_for_flair, split_df, train_argfine

# Load data, filter to valid labels (e.g. fact/value/policy), flatten
df = pd.read_json("../datasets/df_CeR.json")
valid = {"fact", "value", "policy"}
df_flat = modify_flat(df, valid, text_col="text", type_col="type")
df_ready = ready_for_flair(df_flat, valid, "text", "type")
train_s, dev_s, test_s = split_df(df_ready, "text", "type")

train_argfine(
    train_s, dev_s, test_s,
    model_name="roberta-base",
    learning_rate=5e-5,
    mini_batch_size=4,
    max_epoch=10,
    label_type="label",
    base_path="../resources/taggers/argfine_CeR",
)

Run the demo notebook for a full pipeline:

cd src
jupyter notebook Demo_argfine_public-participants.ipynb

Project layout

ArguAI/
├── README.md
├── datasets/          # df_CeR.json, dfCMV-v2.json, dfAM2-v1.json, dfPE_stance-v1.json
├── nrc_cache/         # Cached NRC/eNRC features (created by train_flair.py)
├── resources/taggers/ # Saved Flair models (final-model.pt, etc.)
└── src/
    ├── argfine.py         # Data prep + train_argfine()
    ├── train_flair.py     # CLI for label classification (with optional eNRC)
    ├── pyproject.toml
    └── Demo_argfine_public-participants.ipynb

About

The ArgumentationAI Framework uses Argumentation Mining models, combined with NLP techniques and language models, to analyze participants' comments on online participation portals.
