vic-comm/Cyberbullying-bot

🛡️ AntiBully Bot

AI-powered content moderation with explainable decisions and continuous learning.

Python · FastAPI · discord.py · PostgreSQL

Demo: https://youtu.be/H17_tdERf5g


What It Does

Most moderation tools rely on keyword lists that are easy to bypass, or black-box AI that offers no transparency. AntiBully combines hybrid ML, explainable AI (LIME), and a human-in-the-loop feedback system that improves the model over time.

Key capabilities:

  • Hybrid ML: DistilBERT embeddings + XGBoost, enriched with user context (violation history, channel toxicity) from a Redis feature store
  • Explainability: LIME highlights which words triggered a flag; users see exactly why they were moderated
  • Feedback loop: Users dispute false positives → admins review → corrected labels retrain the model monthly
  • Drift detection: Evidently monitors weekly; automated retraining triggers on significant drift or on a monthly schedule
  • Multi-platform: One ML backend serving Discord, Slack, and WhatsApp

🎯 Overview

AntiBully Bot is an intelligent content moderation system that combines:

  • Hybrid ML Model: DistilBERT (text embeddings) + XGBoost (user context features)
  • Explainable AI: LIME-generated explanations for every decision
  • Admin Feedback Loop: Human-in-the-loop corrections improve model accuracy
  • Multi-Platform: Single ML backend serves Discord, Slack, WhatsApp bots
  • Production MLOps: Automated drift detection, retraining, and deployment

Why This Exists

Traditional moderation tools tend to fail in one of three ways:

  1. They rely on simple keyword filters, which are easy to bypass.
  2. They use black-box AI that offers no transparency.
  3. They ship frozen models that can't learn from mistakes.

AntiBully addresses all three by combining transformer-based NLP (DistilBERT), explainable AI (LIME), and continuous learning from admin-reviewed feedback.


✨ Key Features

🤖 Intelligent Moderation

  • Context-Aware: Uses user history (violation rate, tenure) + channel toxicity
  • Multi-Level Severity: LOW/MEDIUM/HIGH classification with configurable actions
  • Strike System: Graduated penalties (warn → timeout → kick → ban)
  • Configurable: Per-server thresholds, actions, and message templates
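
The graduated strike system can be pictured as a lookup from a user's strike count to an action. The sketch below is illustrative only: the numeric thresholds (1, 3, 5, 7) and the names `StrikePolicy` and `action_for` are assumptions, since the README states only the order of penalties and that thresholds are configurable per server.

```python
from dataclasses import dataclass, field

# Hypothetical escalation ladder: (minimum strikes, action).
# Only the order warn -> timeout -> kick -> ban comes from the README;
# the thresholds are assumed for illustration.
DEFAULT_LADDER = [(1, "warn"), (3, "timeout"), (5, "kick"), (7, "ban")]


@dataclass
class StrikePolicy:
    ladder: list = field(default_factory=lambda: list(DEFAULT_LADDER))

    def action_for(self, strikes: int) -> str:
        """Return the harshest action whose threshold the strike count meets."""
        action = "none"
        for threshold, name in sorted(self.ladder):
            if strikes >= threshold:
                action = name
        return action


policy = StrikePolicy()
print([policy.action_for(n) for n in (0, 1, 4, 5, 9)])
```

A server that wants a stricter policy would pass its own ladder, for example `StrikePolicy(ladder=[(1, "timeout"), (2, "ban")])`.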

🔍 Explainable AI

  • LIME Integration: Shows which words contributed to toxicity score
  • User Dashboard: Users can see why they were flagged via !explain command
  • Admin Dashboard: Admins review uncertain cases with full context
  • Dispute System: Users dispute → admin reviews → model learns

📊 Production MLOps

  • Drift Detection: Evidently monitors data/model drift weekly
  • Automated Retraining: Triggers on high drift or monthly schedule
  • Feature Store: Redis caches user features for <5ms inference
  • Versioning: DVC tracks data, MLflow tracks models
  • CI/CD Ready: Docker containers, Railway.app deployment
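
As a rough sketch of how the feature store lookup might work, the snippet below reads user and channel features and falls back to defaults for unseen users. The key patterns `user_toxicity:prod:{id}` and `channel_stats:{id}` and the feature names come from this README; storing them as Redis hashes, the default values, and the `get_features` signature are assumptions. `FakeStore` stands in for a real client so the example runs without a Redis server.

```python
from typing import Mapping, Protocol

# Assumed fallback values for users or channels with no cached history.
DEFAULTS = {
    "user_bad_ratio_7d": 0.0,
    "violation_count_7d": 0.0,
    "channel_toxicity_ratio": 0.0,
}


class HashStore(Protocol):
    """Anything exposing a Redis-style hgetall(key) -> field/value mapping."""

    def hgetall(self, key: str) -> Mapping[str, str]: ...


def get_features(store: HashStore, user_id: str, channel_id: str) -> dict:
    """Merge cached user and channel features, falling back to defaults."""
    user = store.hgetall(f"user_toxicity:prod:{user_id}")
    channel = store.hgetall(f"channel_stats:{channel_id}")
    merged = dict(DEFAULTS)
    for source in (user, channel):
        for name, raw in source.items():
            if name in merged:  # ignore fields the model does not use
                merged[name] = float(raw)
    return merged


class FakeStore:
    """In-memory stand-in for a Redis client."""

    def __init__(self, data):
        self.data = data

    def hgetall(self, key):
        return self.data.get(key, {})


store = FakeStore(
    {"user_toxicity:prod:u123": {"user_bad_ratio_7d": "0.25", "violation_count_7d": "2"}}
)
print(get_features(store, "u123", "c456"))
```

With redis-py, a client created with `decode_responses=True` returns string keys and values from `hgetall`, which is the shape this sketch expects.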

🔐 Security & Privacy

  • Double-Gated Feedback: User disputes require admin approval
  • Spam Protection: Detects coordinated attacks, flooding, repetition
  • Protected Patterns: Slurs never overridden by user feedback
  • Audit Trail: Every action logged with timestamps and admin IDs

πŸ—οΈ System Architecture

High-Level Architecture

=======

Architecture

33177d6 (Updated readme)

┌────────────────────────────────────────────────────────────────────┐
│                        USER INTERFACES                             │
│   Discord Bot  │  Slack Bot  │  WhatsApp Bot  │  Telegram (WIP)    │
└────────────┬───────────────────────────────────────────────────────┘
             │ WebSocket/REST
             ▼
┌────────────────────────────────────────────────────────────────────┐
│                       BOT SERVICE LAYER                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │
│  │  Moderation  │  │    Admin     │  │   Feedback   │              │
│  │  (on_message)│  │   Commands   │  │  (!explain)  │              │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘              │
└─────────┼─────────────────┼─────────────────┼──────────────────────┘
          │ POST /predict   │ GET /config     │ POST /explain
          ▼                 ▼                 ▼
┌────────────────────────────────────────────────────────────────────┐
│                      INFERENCE API (FastAPI)                       │
│  ┌──────────────────┐  ┌──────────────────┐  ┌─────────────────┐   │
│  │ Toxicity Detector│  │ LIME Explainer   │  │ Feature Enricher│   │
│  │ DistilBERT+XGB   │  │ (word importance)│  │ (Redis lookup)  │   │
│  └──────┬───────────┘  └──────────────────┘  └────────┬────────┘   │
└─────────┼─────────────────────────────────────────────┼────────────┘
          │ log_event()                                 │ get_features()
          ▼                                             ▼
┌────────────────────────────────────────────────────────────────────┐
│                        DATA LAYER                                  │
│  ┌─────────────────────────────┐  ┌──────────────────────────────┐ │
│  │   Supabase PostgreSQL       │  │      Redis (Feature Store)   │ │
│  │  ├─ logs                    │  │  ├─ user_toxicity:prod:{id}  │ │
│  │  ├─ server_configs          │  │  ├─ channel_stats:{id}       │ │
│  │  ├─ server_user_violations  │  │  └─ (5ms lookups)            │ │
│  │  ├─ feedback (disputes)     │  └──────────────────────────────┘ │
│  │  └─ admin_review_queue      │                                   │
│  └─────────────────────────────┘                                   │
└──────────────────────────┬─────────────────────────────────────────┘
                           │ Nightly Sync
                           ▼
┌────────────────────────────────────────────────────────────────────┐
│                  MLOPS ORCHESTRATOR (Prefect)                      │
│                                                                    │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ STEP 1: Drift Detection                                      │  │
│  │  KS-test + PSI (statistical) + Evidently report              │  │
│  │  → severity: none / low / medium / high / critical           │  │
│  └───────────────────────────┬──────────────────────────────────┘  │
│                              │                                     │
│                              ▼                                     │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ STEP 2: Data Ingestion                                       │  │
│  │  Stratified sample (5% anchors) + 100% admin feedback        │  │
│  │  Quality checks → merge → DVC push to S3 (Parquet)           │  │
│  │  → status: success / skipped / failed (aborts pipeline)      │  │
│  └───────────────────────────┬──────────────────────────────────┘  │
│                              │                                     │
│                              ▼                                     │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ STEP 3: Retraining Decision                                  │  │
│  │  high/critical drift → trigger main_flow() immediately       │  │
│  │  medium drift        → flag for next scheduled run           │  │
│  │  low/none            → skip, model is stable                 │  │
│  │  Experiments tracked in MLflow (DagsHub)                     │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘
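
To make the drift steps more concrete, here is a minimal, dependency-free sketch of a Population Stability Index (PSI) calculation and the Step 3 decision rule. It is not the project's implementation: the function names `psi` and `retraining_action`, the equal-width binning, and the returned action strings are assumptions. The README does not say how PSI or KS-test values map to a severity level, so that mapping is deliberately left out.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index of `actual` against the `expected` baseline."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps log() defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def retraining_action(severity: str) -> str:
    """Map a drift severity to the Step 3 decision shown in the diagram."""
    if severity in ("high", "critical"):
        return "retrain_now"
    if severity == "medium":
        return "flag_for_next_run"
    if severity in ("none", "low"):
        return "skip"
    raise ValueError(f"unknown drift severity: {severity!r}")


baseline = [0.1, 0.2, 0.3, 0.4]
print(psi(baseline, baseline))                   # identical samples
print(psi(baseline, [0.4, 0.4, 0.4, 0.4]))       # shifted sample scores higher
print(retraining_action("critical"))
```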

Model Pipeline

Input: "you are trash" + user_id + channel_id
    ↓
┌─────────────────────────────────────────────────────────┐
│ STAGE 1: TEXT FEATURES                                  │
├─────────────────────────────────────────────────────────┤
│ DistilBERT Embeddings (768 dims)                        │
│ + Static Features:                                      │
│   ├─ msg_len, caps_ratio, slur_count                    │
│   ├─ personal_pronoun_count, question_count             │
│   └─ char_repetition, exclamation_count                 │
└───────────────────┬─────────────────────────────────────┘
                    │ (768 + 15 features)
                    ▼
┌─────────────────────────────────────────────────────────┐
│ STAGE 2: USER CONTEXT ENRICHMENT (Redis)                │
├─────────────────────────────────────────────────────────┤
│   ├─ user_bad_ratio_7d (% toxic messages)               │
│   ├─ violation_count_7d                                 │
│   ├─ channel_toxicity_ratio                             │
│   ├─ hours_since_last_msg                               │
│   └─ is_new_to_channel                                  │
└───────────────────┬─────────────────────────────────────┘
                    │ (768 + 15 + 5 = 788 features)
                    ▼
┌─────────────────────────────────────────────────────────┐
│ STAGE 3: XGBoost Classifier                             │
├─────────────────────────────────────────────────────────┤
│ Output: P(toxic) ∈ [0, 1]                               │
│   ├─ < 0.3  → SAFE                                      │
│   ├─ 0.3–0.5 → LOW                                      │
│   ├─ 0.5–0.7 → MEDIUM                                   │
│   └─ > 0.7  → HIGH                                      │
└───────────────────┬─────────────────────────────────────┘
                    ▼
  { "is_toxic": true, "confidence": 0.85, "severity": "HIGH" }
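
The pipeline's last two steps, concatenating the feature groups and bucketing the probability, can be sketched as follows. The group sizes and band edges come from the diagram; the function names are hypothetical. The diagram leaves the exact boundaries partly open, so this sketch assumes 0.3 falls in LOW, 0.5 in MEDIUM, and 0.7 in MEDIUM (HIGH being strictly above 0.7).

```python
EMBEDDING_DIMS, STATIC_DIMS, CONTEXT_DIMS = 768, 15, 5


def build_feature_vector(embedding, static, context):
    """Concatenate the three feature groups into the 788-value classifier input."""
    expected = (EMBEDDING_DIMS, STATIC_DIMS, CONTEXT_DIMS)
    actual = (len(embedding), len(static), len(context))
    if actual != expected:
        raise ValueError(f"expected group sizes {expected}, got {actual}")
    return [*embedding, *static, *context]


def severity(p_toxic: float) -> str:
    """Bucket P(toxic) into the four bands from the diagram."""
    if not 0.0 <= p_toxic <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    if p_toxic < 0.3:
        return "SAFE"
    if p_toxic < 0.5:   # assumed: 0.3 itself is LOW
        return "LOW"
    if p_toxic <= 0.7:  # assumed: 0.5 and 0.7 are MEDIUM
        return "MEDIUM"
    return "HIGH"


vector = build_feature_vector([0.0] * 768, [0.0] * 15, [0.0] * 5)
print(len(vector), severity(0.85))
```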

How the Feedback Loop Works

1. Model flags message → User gets DM with LIME explanation
2. User clicks ❌ Wrong → submits dispute reason
3. Admin runs /review → approves or overrides model decision
4. Month-end: corrected labels + high-confidence anchors retrain model
5. New model deployed → false positive rate drops

Admins can review disputes one by one or bulk-approve by user pattern, which is useful when the model systematically misclassifies gaming slang, regional expressions, and similar language.
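
One way to picture step 4 is as a label-selection rule: an admin-reviewed label always wins, unreviewed high-confidence predictions serve as anchors, and everything else is left out of the training set. This is a sketch under assumptions, not the project's code. The `LoggedMessage` fields, the `training_label` helper, and the 0.95 anchor threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoggedMessage:
    message_id: str
    model_label: int                   # 1 = toxic, 0 = safe
    confidence: float                  # model confidence in model_label
    admin_label: Optional[int] = None  # set only after an admin review


def training_label(msg: LoggedMessage, anchor_confidence: float = 0.95) -> Optional[int]:
    """Return the label to train on, or None to exclude the message."""
    if msg.admin_label is not None:
        return msg.admin_label          # human correction overrides the model
    if msg.confidence >= anchor_confidence:
        return msg.model_label          # high-confidence anchor
    return None                         # uncertain and unreviewed: skip


batch = [
    LoggedMessage("m1", model_label=1, confidence=0.99, admin_label=0),
    LoggedMessage("m2", model_label=1, confidence=0.97),
    LoggedMessage("m3", model_label=1, confidence=0.60),
]
print([training_label(m) for m in batch])
```

Because only admin-set labels override the model, a user dispute that has not been reviewed has no effect on retraining, which matches the double-gated feedback described earlier.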


Setup Guide

Prerequisites

  • Python 3.11+
  • Node.js 18+ (for some ML tooling)
  • PostgreSQL 15+ or a Supabase account
  • Redis 7+
  • Discord bot token
  • S3-compatible storage (AWS S3, MinIO, or Supabase Storage)
  • MLflow tracking server (free via DagsHub)

1. Clone & Install

git clone https://github.com/yourusername/antibully-bot.git
cd antibully-bot

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt

2. Configure Environment

cp .env.example .env

Edit .env:

# Database
DATABASE_URL=postgresql://user:pass@host:5432/db

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=          # leave blank if none

# Discord
DISCORD_TOKEN=your_bot_token
DISCORD_APPLICATION_ID=your_app_id

# API
API_BASE_URL=http://localhost:8000

# MLOps
MLFLOW_TRACKING_URI=https://dagshub.com/user/repo.mlflow
S3_BUCKET=your-bucket
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret

# Model
MODEL_LOCAL_PATH=./baked_model
EXPERIMENT_NAME=toxicity-detector
STAGE=Production
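
For reference, a service could read these variables along the following lines. This is a hedged sketch using only the standard library: the `Settings` class, the `load_settings` helper, the choice of which variables are required, and the fallback defaults are assumptions rather than the project's actual config code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DATABASE_URL", "DISCORD_TOKEN")  # assumed to be mandatory


@dataclass(frozen=True)
class Settings:
    database_url: str
    discord_token: str
    redis_host: str
    redis_port: int
    redis_password: Optional[str]
    api_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        discord_token=env["DISCORD_TOKEN"],
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_password=env.get("REDIS_PASSWORD") or None,  # blank means no password
        api_base_url=env.get("API_BASE_URL", "http://localhost:8000"),
    )


example = load_settings(
    {"DATABASE_URL": "postgresql://user:pass@host:5432/db", "DISCORD_TOKEN": "your_bot_token", "REDIS_PASSWORD": ""}
)
print(example.redis_host, example.redis_port, example.redis_password)
```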

3. Initialize Database

python scripts/run_migrations.py

This creates the logs, server_configs, server_user_violations, and feedback tables, plus indexes.

4. Run Services

Open three terminals:

# Terminal 1: Inference API
uvicorn api_service.app:app --reload --port 8000

# Terminal 2: Discord Bot
python -m bot_service.bot

# Terminal 3: Redis (if running locally)
redis-server

5. Invite the Bot to Your Server

https://discord.com/api/oauth2/authorize?client_id=YOUR_APP_ID&permissions=1099780063238&scope=bot%20applications.commands
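
If you prefer to generate the invite link rather than edit it by hand, a small helper like the one below reproduces the URL above for a given application ID. The `invite_url` name is hypothetical; the permissions value and scopes are copied from the link shown.

```python
from urllib.parse import quote, urlencode


def invite_url(app_id: str, permissions: int = 1099780063238) -> str:
    """Build the bot invite URL with the permissions and scopes shown above."""
    query = urlencode(
        {
            "client_id": app_id,
            "permissions": permissions,
            "scope": "bot applications.commands",
        },
        quote_via=quote,  # encode the space in the scope as %20 rather than +
    )
    return f"https://discord.com/api/oauth2/authorize?{query}"


print(invite_url("YOUR_APP_ID"))
```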

6. Configure Moderation Settings

In your Discord server, run:

/config

This opens an interactive menu to set strike thresholds, timeout durations, severity actions, log channels, and more. To use safe defaults immediately:

/quickset preset:balanced

Usage

For Admins

/review                  # Work through dispute queue
/review view:stats       # See model accuracy and review counts
/review view:by_user     # Group disputes by user for bulk review
/review_user user:@name  # Bulk approve/override all disputes from one user
/pardon user:@name reason:"false positive"
/strikes user:@name      # View violation history

For Users

!explain    # Get a DM showing which words triggered the flag, with a dispute button
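
As an illustration of what the explanation DM might contain, the helper below turns (word, weight) pairs into a short message. LIME's `Explanation.as_list()` returns pairs in this shape; treating positive weights as pushing toward the toxic class, the message wording, and the `format_explanation` name are assumptions.

```python
def format_explanation(word_weights, top_n=3):
    """Render the top positive (word, weight) pairs as plain DM text."""
    flagged = sorted(
        (pair for pair in word_weights if pair[1] > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )[:top_n]
    if not flagged:
        return "No individual words pushed this message toward a toxic score."
    lines = [f'- "{word}" (+{weight:.2f})' for word, weight in flagged]
    return "Words that contributed most to the flag:\n" + "\n".join(lines)


# Example weights are made up for illustration.
print(format_explanation([("you", 0.05), ("trash", 0.61), ("are", -0.02)]))
```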

Deployment (Railway.app)

Estimated cost: ~$19/month. Setup time: ~30 minutes.

  1. Create an account at railway.app and connect your GitHub repo.
  2. Create four services:

     | Service       | Dockerfile       | Start Command               | Notes                      |
     |---------------|------------------|-----------------------------|----------------------------|
     | Discord Bot   | Dockerfile.bot   | python -m bot_service.bot   | 512 MB RAM                 |
     | Inference API | Dockerfile.api   | uvicorn api_service.app:app | 2 GB RAM, expose port 8000 |
     | Redis         | Plugin           | -                           | 1-click provision          |
     | MLOps Worker  | Dockerfile.mlops | -                           | Cron: 0 3 * * *            |

  3. Add environment variables to each service.
  4. Push to main; Railway auto-deploys.

API Reference

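| Endpoint  | Method | Description                   | Latency  |
|-----------|--------|-------------------------------|----------|
| /predict  | POST   | Classify message toxicity     | 50–100ms |
| /explain  | POST   | Generate LIME word importance | 2–4s     |
| /feedback | POST   | Record user dispute           | <50ms    |
| /health   | GET    | Service status                | <10ms    |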

POST /predict example:

// Request
{ "text": "you are trash", "user_id": "u123", "channel_id": "c456" }

// Response
{ "is_toxic": true, "confidence": 0.85, "severity": "HIGH", "features_used": { ... } }
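
A client call to `/predict` might look like the sketch below, which uses only the standard library. The request body fields match the example above; the helper names `build_predict_request` and `predict`, the JSON content type, and the 5-second timeout are assumptions.

```python
import json
from urllib import request


def build_predict_request(base_url: str, text: str, user_id: str, channel_id: str) -> request.Request:
    """Prepare a POST /predict request with a JSON body."""
    payload = json.dumps(
        {"text": text, "user_id": user_id, "channel_id": channel_id}
    ).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, text: str, user_id: str, channel_id: str, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response."""
    req = build_predict_request(base_url, text, user_id, channel_id)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# With the API running locally:
#   result = predict("http://localhost:8000", "you are trash", "u123", "c456")
#   print(result["severity"])
req = build_predict_request("http://localhost:8000", "you are trash", "u123", "c456")
print(req.get_method(), req.full_url)
```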

Roadmap

  • Discord bot with LIME explanations
  • Admin feedback loop and review dashboard
  • Automated MLOps pipeline with drift detection
  • Slack integration
  • WhatsApp integration
  • Multi-language support
  • Mobile admin app

Tech Stack

DistilBERT · XGBoost · LIME · FastAPI · discord.py · Supabase · Redis · MLflow · DVC · Evidently · Prefect · Railway.app


MIT License · Built for safer online communities ⭐
