Skip to content

suesu1204/MLB_TEAM

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MLB_EWHA Machine Learning Project

This repository contains the implementation of the Machine Learning course project. The project explores Closed-world and Open-world classification tasks using datasets of monitored and unmonitored websites.


📂 Repository Structure

  • mon.csv: Features extracted from the mon_standard.pkl file.

  • unmonmon.csv: Features extracted from the unmon_standard10_3000.pkl & mon_standard.pkl file.

  • load_pickle_code.ipynb: The datasets were preprocessed and converted into features.

  • Task Process/: Includes all the code used during project development.

  • Optimized Models/: Contains optimized code for evaluation purposes.


📊 Project Overview

1️⃣ Closed-World Classification

  • Objective: Classify 95 monitored websites into 95 unique labels.
  • Dataset: mon.csv (processed from mon_standard.pkl).

2️⃣ Open-World Classification

  • Objective: Include both monitored and unmonitored datasets for the following tasks:
    • Binary Classification: Predict whether a trace is monitored (1) or unmonitored (-1).
    • Multi-Class Classification: Classify 95 monitored websites (0-94) and unmonitored websites (-1) into a total of 96 classes.

📚 Datasets

Download Links:

Dataset Details:

Dataset Instances Classes Description
mon_standard.pkl 19,000 95 (0-94) Monitored website traces. Each website has 10 subpages, observed 20 times each.
unmon_standard10_3000.pkl 3,000 (unmonitored only) Unmonitored website traces.

🚀 Setup and Execution

1️⃣ Environment

The code was executed in the following environment:

  • Python Version: 3.8.10

2️⃣ Required Libraries

To run the code, the following Python libraries need to be installed:

  • pandas
  • numpy
  • matplotlib
  • scikit-learn
  • xgboost
  • imbalanced-learn

3️⃣ Steps to Run

  1. Clone the repository and navigate to the project directory.

    git clone https://github.com/MLB-EWHA/MLB_TEAM.git
    cd MLB_TEAM
  2. Install the required libraries listed above.

    pip install pandas numpy matplotlib scikit-learn xgboost imbalanced-learn
  3. Run the desired classification tasks:

    • Navigate to the Optimized Models/ folder.
    1. Closed-World Classification:

      • Run the script:
        Closed_Multi_Random Forest.ipynb
      • This script builds a model to classify the 95 monitored websites into unique classes.
    2. Open-World Binary Classification:

      • Run the script:
        Open_Binary_XGBoost.ipynb
      • This script builds a binary classification model to distinguish monitored (1) from unmonitored (-1) website traffic.
    3. Open-World Multi-Class Classification:

      • Run the script:
        Open_Multi_Random Forest.ipynb
      • This script builds a multi-class classification model to classify 95 monitored websites (0-94) and unmonitored websites (-1) into 96 total classes.

📋 Additional Information

  • load_pickle_code.ipynb demonstrates how the datasets were preprocessed and converted into features.
  • Task Process/ folder includes all exploratory code used during development.
  • Optimized Models/ folder contains finalized, optimized code for evaluation.

👨‍💻 Team

Team Name: MLB_EWHA

  • Team Members:
    • Juyoung Yoon
    • Yoonjae Oh
    • Sunghyun Shin
    • Eunwoo Choi
    • Yusun Choi

For any questions or clarifications, please contact us via email

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 100.0%