Chat Harassment Detection Hackathon

This repository contains the code and a Persian-language report for the Divar platform's Health Hackathon, which aimed to detect harassment in user chats using NLP techniques. Divar is an online classified-ads and e-commerce platform in Iran with 30 million monthly active users. The chat data used for this project belongs to Divar.

I placed 6th out of 50 competitors in this hackathon, with a model accuracy of 90%.

The notebook is also available on Google Colab.

Notebook Table of Contents

Setup Environment: Preparing the necessary environment for the project.
Load and Read Data: Loading the chat data for analysis.
Data Preprocessing and EDA:
- Extract Messages: Extracting messages from the dataset.
- Post Title WordCloud: Visualizing frequent words in post titles.
- Post Categories: Analyzing the categories of posts.
- Normalize Texts: Normalizing text data for consistency.
- Word Tokenize: Tokenizing words for further analysis.
- Swear Extraction: Identifying swear words in the dataset.
- Remove Stopwords: Eliminating common stopwords.
- Stemming & Lemmatization: Reducing words to their base forms.
- Check #Words Distribution: Analyzing the distribution of word counts.
- Choosing Initial Features: Selecting initial features for the model.
- Split Data: Splitting the data into training and testing sets.
- TF-IDF Vectorizer: Converting text data into TF-IDF features.
- Feature Selection with Chi-Squared: Selecting relevant features using the Chi-Squared test.
Model Data:
- Machine Learning Algorithm Selection and Initialization: Choosing and initializing the appropriate machine learning algorithms.
- Tune Model with Hyper-Parameters: Fine-tuning the model with hyperparameter optimization.
- Final Model and Competition Submission: Finalizing the model and preparing for competition submission.
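
The text-cleaning steps listed under Data Preprocessing and EDA (normalize, tokenize, remove stopwords, check the word-count distribution) can be illustrated with a minimal standard-library sketch. This is not the notebook's code: the function names, the tiny stopword list, and the specific normalization rules (mapping Arabic Yeh/Kaf to their Persian forms, stripping diacritics, collapsing whitespace) are assumptions chosen for illustration, and stemming/lemmatization is left out.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only; not the list used in the notebook.
STOPWORDS = frozenset({"و", "در", "به", "از", "که"})

# Map Arabic Yeh (U+064A) and Kaf (U+0643) to Persian Yeh (U+06CC) and Keheh (U+06A9).
CHAR_MAP = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})

# Arabic-script diacritics from Fathatan (U+064B) to Sukun (U+0652).
DIACRITICS = re.compile(r"[\u064b-\u0652]")


def normalize(text):
    """Unify letter variants, drop diacritics, and collapse whitespace."""
    text = text.translate(CHAR_MAP)
    text = DIACRITICS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split text into word tokens (runs of Unicode word characters)."""
    return re.findall(r"\w+", text)


def preprocess(text, stopwords=STOPWORDS):
    """Normalize, tokenize, and remove stopwords from one message."""
    return [tok for tok in tokenize(normalize(text)) if tok not in stopwords]


def word_count_distribution(messages, stopwords=STOPWORDS):
    """Count how many messages have each number of remaining tokens."""
    return Counter(len(preprocess(msg, stopwords)) for msg in messages)
```

Calling `preprocess(message)` on each chat message yields token lists that could then be joined back into strings and passed to a TF-IDF vectorizer. Note that the simple `\w+` tokenizer splits on the zero-width non-joiner, so a Persian-aware tokenizer would be a better choice for real data.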

