
The Fake News Detection system features a user-friendly Tkinter-based GUI that allows users to input a news article, title, and author. Users can select from six machine learning models to instantly classify the news as Real or Fake. The interface provides quick and interactive predictions, making it ideal for real-time demonstrations.


uditjain100/Deception-Decoder


📰 Deception Decoder: Fake News Detection Using Machine Learning

📌 Project Overview

In the digital age, misinformation has emerged as a critical global issue, deeply affecting politics, public health, financial systems, and societal harmony. With the rapid dissemination of news on social media platforms like Twitter, Facebook, and WhatsApp, identifying fake news in real-time has become both a necessity and a technological challenge. This project, titled "Deception Decoder", addresses this growing concern by building a comprehensive fake news detection system powered by Natural Language Processing (NLP) and machine learning.

The core of this system is a multi-model machine learning framework that evaluates news articles for authenticity. We preprocess news content using standard NLP techniques such as tokenization, lemmatization, and TF-IDF vectorization to extract meaningful features. A total of 29 different ML models, ranging from simple linear classifiers to advanced ensemble learners, are applied and compared across multiple datasets. These include classifiers such as Random Forest, AdaBoost, Perceptron, SVM, and Naive Bayes, as well as models trained with different optimization algorithms such as L-BFGS and SGD. Each model is tested and evaluated using a suite of performance metrics including accuracy, precision, recall, and F1-score.

The objective of this framework is not only high detection accuracy but also scalability, interpretability, and robustness in real-world deployment. This solution can be integrated by news agencies, fact-checking organizations, and government platforms to automate misinformation filtering at scale. With future enhancements like transformer-based models (e.g., BERT), multilingual support, and real-time detection capabilities, Deception Decoder can become a vital tool in the global fight against fake news.

🔟 Key Features of This Project:

  1. ✅ 29 machine learning algorithms benchmarked for performance.
  2. 🧠 Uses TF-IDF and NLP preprocessing for accurate feature extraction.
  3. 📊 Evaluates models on accuracy, precision, recall, and F1-score.
  4. 📁 Structured pipeline with clean dataset management and notebooks.
  5. 🚀 Supports real-time fake news classification via trained models.
  6. 🧪 Includes comparative analysis for model selection.
  7. 🔍 Analyzes linguistic cues and semantic patterns in text.
  8. 🌐 Designed for scalability and integration into social platforms.
  9. 🔧 Easily extendable to incorporate BERT, GPT, or LLMs.
  10. 🛡️ Built with a focus on information integrity and cybersecurity.

🧠 Problem Statement

The rise of social media has enabled rapid dissemination of fake or misleading information that can influence elections, harm public health, and distort reality. Our system aims to solve this by:

  • Automating fake news classification using AI models
  • Evaluating multiple classifiers to determine optimal performance
  • Supporting real-world deployment for digital content moderation

🧪 Models and Algorithms Used

We implemented and compared 29 supervised ML models, including:

  • Linear Models: Logistic Regression, Perceptron, Ridge Classifier, Passive Aggressive
  • Tree-Based Models: Decision Tree, Random Forest, Extra Tree, Decision Stump
  • Ensemble Methods: AdaBoost, Gradient Boosting, Voting Classifier, Random Patches
  • Neural Nets: MLPClassifier, BernoulliRBM
  • SVMs: Linear SVC, SVC (RBF kernel, with gamma as the kernel coefficient)
  • Naive Bayes: MultinomialNB, ComplementNB
  • Solver and Optimizer Variants: SGD, L-BFGS, Newton-CG, SAG
  • Gradient Boosters: XGBoost, CatBoost

🗂️ Project Structure

The repository is organized into clearly defined directories for code, datasets, documentation, and supporting research material.

.
├── code/                        # All core code and notebooks
│   ├── app.ipynb               # Main implementation notebook
│   ├── First_DateSet_Kaggle.ipynb
│   ├── Fourth_DataSet_Liar.ipynb
│   ├── model.pkl               # Saved model
│   ├── README.md               # Code directory-specific readme
│   ├── Second_DataSet_ISOT.ipynb
│   ├── test.csv                # Test data
│   ├── Third_DataSet_WELFake.ipynb
│   └── train.csv               # Training data
│
├── dataset/                    # Raw datasets (archived)
│   ├── fake-news-kaggleTrump.rar
│   ├── ISOT_News_dataset.rar
│   ├── liar_dataset.rar
│   ├── readme.md
│   └── WELFake_dataset.rar
│
├── documentation/              # Reports, PPTs, and screenshots
│   ├── 24CSM1R23_DSF_Report_Final.docx
│   ├── 24CSM1R23_DSF_Report.pdf
│   ├── 24CSM1R23_Report_Formatted_DSF.pdf
│   ├── Advance algorithm Assignment.pdf
│   ├── Advance algorithm PPT of Fake News Detection System 24CSM2S01 24CSM2R10 24CSM1R23 (1).pptx
│   ├── Advance algorithm Report of Fake News Detection System 24CSM2S01 24CSM2R10 24CSM1R23.pdf
│   ├── Project_Report_SS.docx
│   ├── User_Interface_Result.jpg
│   └── User_Interface.jpg
│
├── README.md                   # Main project README
├── requirements.txt            # Python dependencies
└── researchpaper/              # Reference research articles
    ├── A Comprehensive Review on Fake News.pdf
    ├── MEFaND_A_Multimodel_Framework_for_Early_Fake_News_Detection.pdf
    └── sensors-24-05817-v2.pdf

Each component in the directory structure plays a specific role to keep the project modular, reproducible, and well-documented.


🖥️ User Interface

The system includes a simple yet effective Graphical User Interface (GUI) for real-time fake news prediction. This interface is built using Tkinter (Python's standard GUI package) and allows users to input a news article along with the title and author. Users can then choose from various pre-trained models to predict whether the news is Real or Fake.

Fake News Detection GUI


💡 Features of the Interface

  • Title Field: Enter the headline of the news article.
  • Author Field: Enter the name of the author.
  • News Content Field: Paste or write the full article content for evaluation.
  • Model Buttons: Select from six machine learning models:
    • Decision Tree
    • Random Forest
    • AdaBoost
    • SGD Classifier
    • Perceptron
    • Logistic Regression
  • Prediction Output: After selecting a model, a popup message appears with the result, either:
    • ✅ "The news is predicted to be: Real News"
    • ❌ "The news is predicted to be: Fake News"
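
The interface described above could be wired up roughly as follows. This is a minimal sketch, not the project's actual GUI code: the label encoding (1 = fake, 0 = real), the way the three fields are joined into one string, and the `predictors` mapping of button names to prediction functions are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Assumption: label 1 means fake and 0 means real; the notebooks may encode this differently.
LABEL_NAMES = {0: "Real News", 1: "Fake News"}


def format_prediction(label: int) -> str:
    """Build the popup message shown after a model button is clicked."""
    return f"The news is predicted to be: {LABEL_NAMES[label]}"


def combine_fields(title: str, author: str, content: str) -> str:
    """Join the non-empty input fields into one string (an assumed input format)."""
    return " ".join(part.strip() for part in (title, author, content) if part.strip())


def build_app(predictors: Dict[str, Callable[[str], int]]):
    """Create the window; `predictors` maps a button label to a function returning 0 or 1."""
    # Imported here so the helpers above can be used on machines without a display.
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.title("Fake News Detection")

    tk.Label(root, text="Title").pack()
    title_entry = tk.Entry(root, width=80)
    title_entry.pack()

    tk.Label(root, text="Author").pack()
    author_entry = tk.Entry(root, width=80)
    author_entry.pack()

    tk.Label(root, text="News Content").pack()
    content_box = tk.Text(root, width=80, height=12)
    content_box.pack()

    def on_click(predict: Callable[[str], int]) -> None:
        text = combine_fields(
            title_entry.get(), author_entry.get(), content_box.get("1.0", tk.END)
        )
        messagebox.showinfo("Prediction", format_prediction(predict(text)))

    for name, predict in predictors.items():
        # Bind `predict` as a default argument so each button keeps its own model.
        tk.Button(root, text=name, command=lambda p=predict: on_click(p)).pack()

    return root
```

To try it, pass one prediction function per button and start the event loop, for example `build_app({"SGD Classifier": my_predict}).mainloop()`, where `my_predict` is a hypothetical function that vectorizes the text and returns the model's label.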

🧪 Example UI in Action

When a user enters the content and clicks on a model button (e.g., SGD Classifier), a message box pops up with the prediction result:

The news is predicted to be: Fake News

This makes the system interactive, user-friendly, and ideal for:

  • Classroom demonstrations
  • Real-time academic projects
  • Integration into larger misinformation detection tools

⚙️ Methodology

This section outlines the comprehensive pipeline followed in the Fake News Detection framework, from raw data handling to final predictions and evaluation.


🧼 1. Data Preprocessing

Raw news articles are noisy and unstructured. To ensure optimal input for machine learning algorithms, the following NLP techniques are applied:

  • Lowercasing: Converts all text to lowercase so that variants such as "News" and "news" are treated as the same token.
  • Tokenization: Breaks down text into smaller units like words or phrases using nltk.word_tokenize() for structured analysis.
  • Stopword Removal: Eliminates common but uninformative words like "the", "is", "and" using standard English stopword lists.
  • Lemmatization: Converts words to their dictionary base form using WordNetLemmatizer (e.g., "running" → "run" when the word is lemmatized as a verb), ensuring semantic consistency.
  • Punctuation & Special Character Removal: Strips out symbols, digits, and special characters that do not contribute to classification.
  • TF-IDF Vectorization: Converts text into numerical features using Term Frequency-Inverse Document Frequency, which gives higher weight to words that are frequent in an article but rare across the corpus.

The result is a clean, structured, and meaningful feature set ready for model training.
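
The cleaning steps listed above can be sketched with the standard library alone. This is a simplified stand-in rather than the project's code: it splits on whitespace instead of calling nltk.word_tokenize(), skips lemmatization, and uses a tiny made-up stopword set where the project uses a standard English list.

```python
import re

# Tiny illustrative stopword set (assumption); a full English list would be used in practice.
STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "in"}


def clean_text(text: str) -> list[str]:
    """Lowercase, strip non-letter characters, split on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # replace digits, punctuation, and symbols with spaces
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_text("The NEWS is, 100% REAL!"))  # ['news', 'real']
```

Running removal of special characters before tokenization keeps the sketch short; with NLTK the tokenizer and WordNetLemmatizer would slot in between the regex step and the stopword filter.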


🧠 2. Model Training

Multiple supervised learning models are trained using labeled datasets with Real and Fake news categories. Each model learns linguistic patterns, word distributions, and semantic clues from the vectorized features:

  • Training involves splitting the dataset into training and test sets.
  • Cross-validation is used for fair evaluation.
  • Models used include Random Forest, Logistic Regression, Perceptron, AdaBoost, SGD, among others.
  • Hyperparameter tuning is applied (e.g., using GridSearchCV) for optimal performance.

Each algorithm builds its internal structure (such as decision trees, weights, or ensembles) to minimize classification error.
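
A minimal sketch of the training steps above, using scikit-learn. The toy corpus, the label encoding (1 = fake, 0 = real), the choice of Logistic Regression, and the small grid over `C` are assumptions for illustration; the notebooks train on the real datasets with their own settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Made-up toy corpus; 1 = fake, 0 = real is an assumed encoding.
texts = [
    "shocking miracle cure doctors hate revealed",
    "aliens secretly control the world government",
    "celebrity clone spotted in secret moon base",
    "miracle pill melts fat overnight shocking truth",
    "secret plot revealed shocking hoax exposed",
    "you will not believe this shocking secret cure",
    "city council approves budget for road repairs",
    "central bank holds interest rates steady",
    "university publishes study on crop yields",
    "parliament passes budget after long debate",
    "local council opens new public library",
    "bank reports quarterly earnings in line with forecasts",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Putting TF-IDF inside the pipeline means it is refit on each cross-validation fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Cross-validated grid search over the regularization strength C.
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

predictions = search.predict(X_test)
print("best C:", search.best_params_["clf__C"])
print("held-out accuracy:", search.score(X_test, y_test))
```

Keeping the vectorizer inside the pipeline avoids leaking vocabulary and IDF statistics from validation folds into training.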


🔮 3. Prediction Phase

Once trained, each model is used to predict the label of unseen news articles:

  • Input articles are passed through the same preprocessing pipeline.
  • Features are vectorized using the same TF-IDF model used during training.
  • Predictions are made on test data or new, real-time data.
  • Multiple models may be compared or combined using an ensemble (e.g., Voting Classifier) to improve robustness.

This ensures generalization and practical applicability to real-world social media content.
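
The prediction steps can be sketched as below. The sample texts, the label encoding (1 = fake, 0 = real), and the three models chosen for the ensemble are assumptions for illustration. The point it shows is that new articles go through `transform` on the already fitted vectorizer, never `fit_transform`.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, Perceptron, SGDClassifier

# Made-up training data; 1 = fake, 0 = real is an assumed encoding.
train_texts = [
    "shocking miracle cure doctors hate revealed",
    "aliens secretly control the world government",
    "secret plot revealed shocking hoax exposed",
    "city council approves budget for road repairs",
    "central bank holds interest rates steady",
    "university publishes study on crop yields",
]
train_labels = [1, 1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # vocabulary and IDF weights are learned here only

ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("perceptron", Perceptron(random_state=0)),
        ("sgd", SGDClassifier(random_state=0)),
    ],
    voting="hard",  # majority vote; Perceptron and hinge-loss SGD expose no predict_proba
)
ensemble.fit(X_train, train_labels)

new_articles = ["council approves new budget", "shocking secret cure revealed"]
X_new = vectorizer.transform(new_articles)  # reuse the fitted vectorizer
print(ensemble.predict(X_new))
```

Refitting the vectorizer on new text would change the feature columns, so the trained models would receive inputs they were never fitted on.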


📊 4. Evaluation

Each model's effectiveness is evaluated using standard classification metrics:

  • Accuracy: Measures overall correctness of predictions.
  • Precision: Proportion of correctly predicted fake news out of all predicted fake news.
  • Recall: Ability to detect actual fake news instances.
  • F1-Score: Harmonic mean of precision and recall, balancing false positives and false negatives.
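  • Confusion Matrix: Breakdown of true positives, false positives, true negatives, and false negatives.
  • ROC-AUC: Area under the ROC curve, which summarizes the trade-off between true positive rate and false positive rate across decision thresholds.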

The evaluation phase allows comparison of models and helps in selecting the most reliable and accurate one.
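
The first four metrics follow directly from the confusion-matrix counts. The sketch below computes them in plain Python, treating fake news as the positive class (label 1); that encoding and the sample labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) with `positive` as the fake-news label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false negative, 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example every metric comes out to 0.75, because the single false positive and single false negative balance each other.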


📈 Results

| Model | Accuracy (%) | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| SGD Classifier | 97.62 | 0.97 | 0.97 | 0.97 |
| Perceptron | 97.01 | 0.98 | 0.95 | 0.96 |
| AdaBoost | 96.39 | 0.96 | 0.96 | 0.96 |
| Logistic Regression | 96.15 | 0.96 | 0.95 | 0.96 |
| Random Forest | 96.08 | 0.96 | 0.95 | 0.96 |
| Decision Tree | 94.20 | 0.96 | 0.92 | 0.94 |

➡️ Full evaluations and visualizations are available in the Jupyter notebooks.


🚀 How to Run

Follow these step-by-step instructions to set up, install, and run the Fake News Detection project locally on your system.


1. 🔁 Clone the Repository

Use Git to clone the repository from GitHub to your local machine.

git clone https://github.com/uditjain100/Deception-Decoder.git
cd Deception-Decoder

If you are working from a fork, replace uditjain100 with your own GitHub username.


2. 📦 Set Up a Python Virtual Environment (Recommended)

It's best to isolate project dependencies using a virtual environment:

python -m venv venv
source venv/bin/activate        # On Unix or MacOS
venv\Scripts\activate         # On Windows

3. 📚 Install Required Dependencies

Install all necessary Python packages using the provided requirements.txt file.

pip install -r requirements.txt

This installs essential libraries such as:

  • scikit-learn
  • pandas
  • nltk
  • xgboost
  • catboost
  • matplotlib
  • jupyter

4. 📂 Open Jupyter Notebook

Start the Jupyter Notebook environment to run and explore the notebooks interactively.

jupyter notebook

Then, navigate to the code/ directory and open:

code/app.ipynb

5. ▶️ Run the Notebook

In app.ipynb, run each cell in order:

  • Data Loading & Preprocessing
  • TF-IDF Vectorization
  • Model Training (choose from 29 models)
  • Evaluation & Visualization
  • Model Saving (optional: outputs model.pkl)

6. 📊 View Results

After execution, you will see:

  • Accuracy, Precision, Recall, F1-score for each model
  • Confusion matrix and ROC curves
  • Visual summaries and model comparison

7. 💾 (Optional) Load Trained Model

If you've already trained a model and saved it as model.pkl, load it as:

import pickle

with open("code/model.pkl", "rb") as f:  # the context manager closes the file after loading
    model = pickle.load(f)  # only unpickle files you trust

You can then use it to classify new incoming news articles.
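
A loaded classifier can only score raw text if the fitted TF-IDF vectorizer is restored with it. Whether the repository's model.pkl already bundles the vectorizer is not stated here, so the sketch below shows one assumed approach: pickle a scikit-learn pipeline that contains both, then reload it and predict. The toy texts, labels, and temporary file path are made up for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Made-up training data; 1 = fake, 0 = real is an assumed encoding.
texts = [
    "shocking miracle cure doctors hate revealed",
    "city council approves budget for road repairs",
    "aliens secretly control the world government",
    "central bank holds interest rates steady",
]
labels = [1, 0, 1, 0]

# The pipeline keeps the vectorizer and the classifier together.
pipeline = make_pipeline(TfidfVectorizer(), SGDClassifier(random_state=0))
pipeline.fit(texts, labels)

# Save to a temporary location (hypothetical path, not the repository's code/model.pkl).
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(pipeline, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored pipeline accepts raw text directly.
print(restored.predict(["council approves road budget"]))
```

If only the classifier is pickled, the vectorizer has to be saved and loaded separately, and new text must pass through its `transform` method before prediction.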


This setup enables local experimentation, training, evaluation, and inference on news articles using a variety of models.


👨‍👩‍👧‍👦 Team Members

| Name | Roll No. | Email | College |
| --- | --- | --- | --- |
| Udit Jain | 24CSM1R23 | [email protected] | NIT Warangal, Dept. of CSE |
| Deepaka Hebbar | 24CSM2S01 | [email protected] | NIT Warangal, Dept. of CSE |
| Digvijay Singh | 24CSM2R04 | [email protected] | NIT Warangal, Dept. of CSE |
| Manish Bajpai (Supervisor) | - | - | NIT Warangal, Dept. of CSE |

🛣️ Future Scope

  • ✅ BERT and GPT-based transformer models
  • ✅ Multimodal data integration (images + text)
  • ✅ Real-time detection systems
  • ✅ Behavioral pattern analysis
  • ✅ Cross-language and cross-platform support
