Don’t Patronize Me

Overview

This repository contains the code and experiments for the COMP70016 Natural Language Processing coursework at Imperial College London. The coursework involves developing a binary classification model that predicts whether a text contains patronizing and condescending language (PCL), a task based on SemEval 2022 Task 4 (Subtask 1).
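
For context, here is a minimal sketch of loading the raw dataset and deriving the binary PCL label. The column layout is an assumption based on the standard "Don't Patronize Me!" release; the repository's actual pipeline lives in scripts/preprocessing.py.

import pandas as pd

# Assumed column layout of the standard "Don't Patronize Me!" TSV;
# adjust if this repository's copy differs (e.g. extra header rows).
COLUMNS = ["par_id", "art_id", "keyword", "country_code", "text", "label"]

df = pd.read_csv("data/raw/dontpatronizeme.tsv", sep="\t", names=COLUMNS)

# Original labels range from 0 to 4; Subtask 1 treats labels of 2 or
# above as PCL-positive and 0-1 as negative.
df["pcl"] = (df["label"] >= 2).astype(int)

print(df["pcl"].value_counts())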

Project Structure

dont-patronize-me
├── baselines                            Simple models against which we benchmark our model (see sketch below)
│   ├── bow.ipynb
│   └── tfidf.ipynb
├── data                                 Data used for training / evaluation
│   ├── raw
│   │   ├── dev-parids.csv               Paragraph IDs for the official dev split
│   │   ├── dontpatronizeme.tsv          "Don't patronize me!" dataset
│   │   └── train-parids.csv             Paragraph IDs for the official train split
│   ├── complete.csv                     Full preprocessed dataset
│   ├── dev.csv                          Preprocessed dev set
│   ├── dev.txt                          Final model predictions for the dev set
│   ├── reworded.csv                     LLM-augmented samples, based on the train set
│   └── train.csv                        Preprocessed train set
├── scripts
│   ├── preprocessing.py                 Preprocesses the raw data into the train / dev CSVs
│   └── rewording.py                     Rewords train samples with an LLM, producing reworded.csv
├── analysis.ipynb                       Final analysis of the model's performance
├── dataset.ipynb                        Initial analysis of the composition of the dataset
├── experiments.ipynb                    Trialing model improvements / hyperparameter tuning
├── modeleval.py                         Library to evaluate different models / hyperparameters
└── README.org
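
For reference, here is a minimal sketch of what the tfidf.ipynb baseline listed above might look like: TF-IDF features feeding a logistic regression classifier, scored by F1 on the dev set. The column names "text" and "label" in the preprocessed CSVs, and the choice of classifier, are assumptions for illustration.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Assumed column names in the preprocessed CSVs: "text" and "label".
train = pd.read_csv("data/train.csv")
dev = pd.read_csv("data/dev.csv")

# Fit TF-IDF features on the training vocabulary only.
vectorizer = TfidfVectorizer(max_features=10_000)
X_train = vectorizer.fit_transform(train["text"])
X_dev = vectorizer.transform(dev["text"])

# class_weight="balanced" compensates for the class imbalance:
# PCL-positive paragraphs are a small minority of the dataset.
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train, train["label"])

print(f"Dev F1: {f1_score(dev['label'], clf.predict(X_dev)):.3f}")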
