Feature Request: Add a Section to Visualize Data Preprocessing for ML Algorithms #41

@SrishtiSonam

Hi @manasvi-0,
I had an idea to add a Data Preprocessing Visualization feature for ML algorithms to enhance the learning experience on AlgoLab. I believe this could be a valuable addition to the project.
Looking forward to hearing your thoughts and discussing it further!

Problem Statement

AlgoLab offers visual explanations of ML algorithms, but most real-world machine learning projects start with data preprocessing, which strongly affects model performance. Learners often struggle to see how each preprocessing step changes the data, especially when working with CSV files.

AlgoLab currently has no module for visualizing these preprocessing steps, which limits its usefulness as a complete learning tool for beginners.


Proposed Solution

Introduce a new section to the platform that allows users to:

  • Upload a CSV dataset
  • Select various data preprocessing operations
  • Visualize how these steps transform the data
  • Export the preprocessed data if needed

This will bridge the gap between raw data and algorithm training, making AlgoLab a more end-to-end ML learning experience.


Key Features

1. CSV Upload

  • Allow users to upload their dataset (.csv format)
  • Show basic metadata: shape, column types, missing values
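
As a rough sketch of what the metadata step might compute, assuming pandas is used to read the file (`summarize_csv` is a hypothetical helper name, not existing AlgoLab code, and the sample rows are made up for illustration):

```python
import io

import pandas as pd


def summarize_csv(csv_source) -> dict:
    """Read a CSV and collect shape, column types, and missing-value counts."""
    df = pd.read_csv(csv_source)
    return {
        "shape": df.shape,  # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),  # column -> dtype name
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
    }


# Tiny in-memory CSV standing in for an uploaded file (illustrative data).
sample = io.StringIO("age,city,income\n25,Delhi,50000\n,Mumbai,62000\n31,,58000\n")
summary = summarize_csv(sample)
print(summary)
```

In the app, the same function could receive the file object returned by the upload widget instead of the `StringIO` stand-in.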

2. Preprocessing Options

Each option should show its effect on the data in real time:

  • Missing Value Handling
    • Drop missing rows/columns
    • Fill with mean/median/mode
  • Encoding
    • Label Encoding
    • One-Hot Encoding
  • Scaling
    • StandardScaler
    • MinMaxScaler
    • RobustScaler
  • Outlier Detection
    • Z-score or IQR-based visualization
  • Feature Selection
    • Variance Threshold
    • Correlation matrix heatmap
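
To make the options above more concrete, here is a minimal sketch of one representative operation from each of the first four groups, written with plain pandas. The column names and values are invented for the example, and the scaling is computed by hand rather than through scikit-learn, so this is an illustration of the idea and not a proposed implementation:

```python
import pandas as pd

# Illustrative data: one numeric column with a gap and an extreme value,
# and one categorical column.
df = pd.DataFrame({
    "age": [27.0, None, 31.0, 29.0, 95.0],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai"],
})

# Missing value handling: fill numeric gaps with the column median.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())

# Encoding: one-hot encode the categorical column.
encoded = pd.get_dummies(filled, columns=["city"])

# Scaling: standardize to zero mean and unit variance.
# ddof=0 (population std) is the convention scikit-learn's StandardScaler uses.
age = encoded["age"]
encoded["age_scaled"] = (age - age.mean()) / age.std(ddof=0)

# Outlier detection: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = age.quantile(0.25), age.quantile(0.75)
iqr = q3 - q1
encoded["age_outlier"] = (age < q1 - 1.5 * iqr) | (age > q3 + 1.5 * iqr)

print(encoded)
```

Keeping each step as a separate, small transformation like this would make it straightforward to show the data before and after every individual choice.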

3. Visualization

  • Before vs After comparison plots
  • Distribution plots (histogram, boxplot) for numeric columns
  • Pairplot before and after scaling
  • Heatmaps for correlations and missing data
  • Preview of encoded/scaled tables
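
The data behind a before vs after comparison could be prepared roughly as follows. This is a sketch assuming pandas and NumPy; `compare_summary` and `histogram_counts` are hypothetical helper names, the income values are made up, and the actual plotting library is left open:

```python
import numpy as np
import pandas as pd


def compare_summary(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Side-by-side summary statistics for the numeric columns of two frames."""
    stats = ["mean", "std", "min", "max"]
    return pd.concat(
        {"before": before.describe().T[stats], "after": after.describe().T[stats]},
        axis=1,
    )


def histogram_counts(values: pd.Series, bins: int = 5):
    """Bin counts and bin edges that a histogram plot would draw."""
    return np.histogram(values.dropna(), bins=bins)


before = pd.DataFrame({"income": [50000.0, 62000.0, 58000.0, 71000.0]})
# Min-max scaling to [0, 1], written out by hand for the example.
after = (before - before.min()) / (before.max() - before.min())

table = compare_summary(before, after)
counts, edges = histogram_counts(after["income"], bins=4)
print(table)
```

The summary table could back the "preview" view, while the counts and edges could feed whichever histogram widget the app ends up using.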

4. Export Option

  • Let users download the final preprocessed dataset for use in their own ML models
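
For the export step, one possible approach is to serialize the processed frame to CSV bytes that a download widget can serve. A minimal sketch, assuming pandas (`to_csv_bytes` is a hypothetical helper and the frame contents are illustrative):

```python
import io

import pandas as pd


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a frame as UTF-8 CSV bytes, without the pandas index column."""
    return df.to_csv(index=False).encode("utf-8")


processed = pd.DataFrame({"age_scaled": [0.0, 0.5, 1.0], "city_Delhi": [1, 0, 1]})
payload = to_csv_bytes(processed)

# Round-trip check: reading the bytes back should reproduce the frame.
restored = pd.read_csv(io.BytesIO(payload))
print(restored.equals(processed))
```

Dropping the index keeps the exported file free of an extra unnamed column when users load it into their own ML code.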

UI Flow (Using Streamlit)

  • Tab/Section: "Data Preprocessing Visualizer"
  • Step 1: Upload CSV
  • Step 2: Preview & Select preprocessing steps
  • Step 3: View visualizations
  • Step 4: Download transformed dataset
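
A very rough sketch of how this flow might look in Streamlit is below. The step labels, `apply_steps`, and `render` are all hypothetical names chosen for illustration, only three preprocessing options are wired in, and the layout is just one possibility:

```python
import pandas as pd

FILL = "Fill missing (median)"
ENCODE = "One-hot encode"
SCALE = "Min-max scale"
STEPS = [FILL, ENCODE, SCALE]


def apply_steps(df: pd.DataFrame, steps: list[str]) -> pd.DataFrame:
    """Apply the selected preprocessing steps in a fixed order."""
    out = df.copy()
    numeric = out.select_dtypes("number").columns
    if FILL in steps:
        for col in numeric:
            out[col] = out[col].fillna(out[col].median())
    if ENCODE in steps:
        out = pd.get_dummies(out)
    if SCALE in steps:
        for col in numeric:
            span = out[col].max() - out[col].min()
            # Constant columns have zero span; map them to 0.0 instead of dividing by zero.
            out[col] = (out[col] - out[col].min()) / span if span else 0.0
    return out


def render() -> None:
    # Imported lazily so apply_steps can be reused without Streamlit installed.
    import streamlit as st

    st.header("Data Preprocessing Visualizer")
    uploaded = st.file_uploader("Step 1: Upload CSV", type=["csv"])
    if uploaded is None:
        return
    df = pd.read_csv(uploaded)
    st.dataframe(df.head())
    chosen = st.multiselect("Step 2: Select preprocessing steps", STEPS)
    result = apply_steps(df, chosen)
    st.subheader("Step 3: Before vs after")
    left, right = st.columns(2)
    left.dataframe(df.describe())
    right.dataframe(result.describe())
    st.download_button(
        "Step 4: Download transformed dataset",
        data=result.to_csv(index=False).encode("utf-8"),
        file_name="preprocessed.csv",
        mime="text/csv",
    )
```

Saved as a script with a final `render()` call, this could be launched with `streamlit run`. Keeping `apply_steps` free of UI code means the transformations can be tested on their own.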
