This project implements a fake news detection system using BERT and PyTorch. The model classifies news articles as either FAKE or REAL.
- Utilizes the `bert-base-uncased` model from HuggingFace Transformers.
- Custom `NewsDataset` class for efficient data loading and tokenization.
- 5-Fold Cross Validation for robust evaluation.
- Training and validation loops with accuracy reporting.
- `ui_bert.py`: Main script for data loading, model definition, training, and evaluation.
- `full_dataset.csv`: CSV file containing news articles and labels.
- Install dependencies: `pip install pandas torch scikit-learn transformers`
- Prepare your dataset:
  - Ensure `full_dataset.csv` is in the project directory.
  - The CSV should have columns: `text` (news content) and `label` (`FAKE` or `REAL`).
- Run the script: `python ui_bert.py`
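The dataset layout described in the preparation step can be checked before a run. The following is a minimal sketch, not code from `ui_bert.py`: the in-memory sample stands in for `full_dataset.csv`, and the `label_to_id` mapping (`FAKE` to 0, `REAL` to 1) and the `label_id` column name are assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in for full_dataset.csv so the sketch runs on its own.
sample_csv = io.StringIO(
    "text,label\n"
    "City council approves new bus routes,REAL\n"
    "Moon landing was filmed in a basement,FAKE\n"
)
df = pd.read_csv(sample_csv)

# The README requires these two columns.
required = {"text", "label"}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"CSV is missing columns: {sorted(missing)}")

# Assumed mapping from string labels to class indices.
label_to_id = {"FAKE": 0, "REAL": 1}
unknown = set(df["label"]) - set(label_to_id)
if unknown:
    raise ValueError(f"Unexpected label values: {sorted(unknown)}")

df["label_id"] = df["label"].map(label_to_id)
print(df[["label", "label_id"]])
```

To check the real file, replace `sample_csv` with the path `"full_dataset.csv"`.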
- Tokenizer: BERT tokenizer (`bert-base-uncased`)
- Model: BERT with a frozen encoder and a linear classification head
- Loss: CrossEntropyLoss
- Optimizer: Adam
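The components listed above could fit together roughly as follows. This is a sketch under stated assumptions, not the code in `ui_bert.py`: it builds a tiny, randomly initialized `BertModel` from a `BertConfig` so it runs without downloading weights (the actual script would presumably load `bert-base-uncased` with `BertModel.from_pretrained`). Using `pooler_output` as the sentence representation, the learning rate, and the toy batch are also assumptions.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class FakeNewsClassifier(nn.Module):
    """Frozen BERT encoder followed by a trainable linear head."""

    def __init__(self, encoder, num_classes=2):
        super().__init__()
        self.encoder = encoder
        # Freeze every encoder weight so only the head is trained.
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.head = nn.Linear(encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Assumption: classify from the pooled [CLS] representation.
        return self.head(outputs.pooler_output)


# Tiny random-weight encoder used only to keep the sketch offline and fast.
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = FakeNewsClassifier(BertModel(tiny_config))

criterion = nn.CrossEntropyLoss()
# Only the head has trainable parameters, so only those go to Adam.
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)

# Toy batch: 4 sequences of 16 token ids, labels 0 (FAKE) / 1 (REAL).
input_ids = torch.randint(0, 100, (4, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 0, 1])

encoder_before = model.encoder.embeddings.word_embeddings.weight.clone()
head_before = model.head.weight.clone()

# One training step.
logits = model(input_ids, attention_mask)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(logits.shape, loss.item())
```

Because the encoder is frozen, the backward pass only computes gradients for the linear head, which is what makes this setup cheap to train.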
The script uses 5-fold cross validation to split the dataset and reports accuracy for each fold.
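The per-fold loop can be sketched with scikit-learn's `KFold`. This is an illustration under assumptions rather than the script's actual loop: the toy data, `shuffle=True`, and `random_state=42` are made up here, and a majority-class baseline stands in for training and evaluating the BERT classifier on each fold.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data standing in for the articles and labels in full_dataset.csv.
texts = np.array([f"article {i}" for i in range(10)])
labels = np.array([0, 1] * 5)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
fold_accuracies = []
seen_val_indices = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(texts), start=1):
    # Placeholder for "train the classifier on train_idx":
    # predict the most frequent training label for every validation item.
    majority_label = np.bincount(labels[train_idx]).argmax()
    predictions = np.full(len(val_idx), majority_label)

    accuracy = float(np.mean(predictions == labels[val_idx]))
    fold_sizes.append((len(train_idx), len(val_idx)))
    fold_accuracies.append(accuracy)
    seen_val_indices.extend(val_idx.tolist())
    print(f"Fold {fold}: train={len(train_idx)} val={len(val_idx)} accuracy={accuracy:.2f}")

print(f"Mean accuracy: {np.mean(fold_accuracies):.2f}")
```

Each sample lands in the validation split exactly once across the five folds, so the mean of the fold accuracies uses every article for evaluation.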
- NewsDataset: Handles tokenization and formatting for BERT input.
- FakeNewsClassifier: Wraps BERT and adds a linear layer for classification.
- BERT parameters are frozen during training for efficiency.
- Adjust `n_epochs` and `batch_size` in `ui_bert.py` as needed.
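To illustrate how a dataset class like `NewsDataset` and the `batch_size` setting interact, here is a minimal sketch. It is not the class from `ui_bert.py`: the constructor signature, the `max_length` default, and the returned dictionary keys are assumptions, and `toy_tokenize` is a hypothetical whitespace tokenizer that stands in for the BERT tokenizer so the example runs without downloading a vocabulary.

```python
import torch
from torch.utils.data import DataLoader, Dataset


def toy_tokenize(text, max_length):
    """Hypothetical stand-in tokenizer (NOT BERT's): whitespace tokens to small ids."""
    ids = [1 + (sum(map(ord, tok)) % 99) for tok in text.lower().split()]
    ids = ids[:max_length]                      # truncate
    pad = max_length - len(ids)                 # pad with id 0
    return {
        "input_ids": ids + [0] * pad,
        "attention_mask": [1] * len(ids) + [0] * pad,
    }


class NewsDataset(Dataset):
    """Tokenizes one article per access and returns fixed-length tensors."""

    def __init__(self, texts, labels, tokenize, max_length=16):
        self.texts = texts
        self.labels = labels
        self.tokenize = tokenize
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        enc = self.tokenize(self.texts[idx], self.max_length)
        return {
            "input_ids": torch.tensor(enc["input_ids"], dtype=torch.long),
            "attention_mask": torch.tensor(enc["attention_mask"], dtype=torch.long),
            "label": torch.tensor(self.labels[idx], dtype=torch.long),
        }


texts = [
    "city council approves new bus routes",
    "aliens endorse local mayor",
    "school opens new science lab",
    "miracle pill cures everything overnight",
]
labels = [1, 0, 1, 0]  # assumed encoding: 0 = FAKE, 1 = REAL

batch_size = 2  # the setting the README suggests tuning
loader = DataLoader(NewsDataset(texts, labels, toy_tokenize), batch_size=batch_size)

batch = next(iter(loader))
print(batch["input_ids"].shape)  # one row per article in the batch
print(batch["label"])
```

With the real tokenizer, `tokenize` could be a small wrapper such as `lambda text, max_length: tokenizer(text, padding="max_length", truncation=True, max_length=max_length)`, where `tokenizer` is the loaded BERT tokenizer. Larger `batch_size` values speed up each epoch at the cost of more memory.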
This project is for educational purposes.