This project implements a Text Classifier using an Encoder-only Transformer model. It is trained on the AG_NEWS dataset, which consists of news articles that can be classified into four categories:
- World
- Sports
- Business
- Science/Technology
The model reads an article's text and assigns it to one of these four categories.
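To make the four categories concrete, here is a minimal sketch of turning per-class output scores into a category name. It assumes the usual AG_NEWS convention of 1-based labels in the order listed above. The names `AG_NEWS_LABELS` and `predict_category` are hypothetical and are not taken from this repository's model.py.

```python
# Illustrative only: these names are hypothetical, not from model.py.
from typing import Sequence

# Assumption: labels are 1-based and follow the order of the list above.
AG_NEWS_LABELS = {
    1: "World",
    2: "Sports",
    3: "Business",
    4: "Science/Technology",
}


def predict_category(scores: Sequence[float]) -> str:
    """Return the category whose score is highest.

    `scores` holds one value per class, in label order (index 0 -> label 1).
    """
    if len(scores) != len(AG_NEWS_LABELS):
        raise ValueError(
            f"expected {len(AG_NEWS_LABELS)} scores, got {len(scores)}"
        )
    # Argmax over the scores, then shift to the 1-based label.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return AG_NEWS_LABELS[best_index + 1]


if __name__ == "__main__":
    print(predict_category([0.1, 2.3, -0.5, 0.7]))  # prints "Sports"
```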
Follow these steps to set up the project:
- Clone the repository:
  git clone https://github.com/Red-RobinHood/Text-Classifier.git
  cd Text-Classifier
- Install the required dependencies:
  pip install -r requirements.txt
- Ensure that you are using Python 3.8 or higher.
To use the pretrained model for text classification, follow these steps:
- Change the custominput flag in the model.py file (on line 368):
  - Set it to True to give input via the command line interface (CLI).
  - Set it to False to use the validation data from the val.csv file.
- Once the flag is set, run the script:
  python model.py
  This will use the pretrained model to classify news articles.
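The effect of the flag can be sketched as follows. This is an assumption about the behavior, not the code in model.py: the function name `load_texts`, the default path, and the CSV layout (label, title, description, with no header row) are all hypothetical.

```python
# Sketch of the custominput switch; names and CSV layout are assumptions.
import csv
from typing import Iterable, List


def load_texts(custominput: bool,
               cli_lines: Iterable[str] = (),
               csv_path: str = "val.csv") -> List[str]:
    """Collect the texts to classify.

    custominput=True  -> use the lines supplied by the user (cli_lines).
    custominput=False -> read title + description from each row of csv_path.
    """
    if custominput:
        # Drop blank lines so empty input is never sent to the model.
        return [line.strip() for line in cli_lines if line.strip()]

    texts = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip malformed rows
            title, description = row[1], row[2]
            texts.append(f"{title} {description}")
    return texts
```

With the flag set to True, a caller could pass `sys.stdin` as `cli_lines`, since iterating over it yields one line at a time.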
To train the model on your own dataset:
- Delete the existing weights file from the weights folder or change the model_name parameter.
- Add your custom dataset to the appropriate subfolder within the Dataset folder.
- After this, run the training script to start training on your dataset.
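The first step presumably matters because the script reuses saved weights when it finds them. A minimal sketch of such a check, assuming the weights are stored as a single file named after model_name inside the weights folder. The `.pt` extension and the helper name `should_train` are hypothetical and not confirmed by the repository.

```python
# Hypothetical helper; the file naming scheme is an assumption.
from pathlib import Path


def should_train(model_name: str, weights_dir: str = "weights") -> bool:
    """Return True when no saved weights exist for model_name."""
    weights_path = Path(weights_dir) / f"{model_name}.pt"
    return not weights_path.is_file()
```

Under this assumption, either deleting the file or choosing a new model_name makes the check return True, so training starts from scratch.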
- This project uses the AG_NEWS dataset for training and validation.
- The approach is inspired by research on Transformer architectures, particularly for text classification tasks.
Feel free to contribute by opening issues or submitting pull requests!