This repository contains the implementation of the Machine Learning course project. The project explores Closed-world and Open-world classification tasks using datasets of monitored and unmonitored websites.
- `mon.csv`: Features extracted from the `mon_standard.pkl` file.
- `unmonmon.csv`: Features extracted from the `unmon_standard10_3000.pkl` and `mon_standard.pkl` files.
- `load_pickle_code.ipynb`: Notebook in which the datasets were preprocessed and converted into features.
- `Task Process/`: Includes all the code used during project development.
- `Optimized Models/`: Contains optimized code for evaluation purposes.
- Objective: Classify 95 monitored websites into 95 unique labels.
- Dataset: `mon.csv` (processed from `mon_standard.pkl`).
- Objective: Include both monitored and unmonitored datasets for the following tasks:
  - Binary Classification: Predict whether a trace is monitored (`1`) or unmonitored (`-1`).
  - Multi-Class Classification: Classify 95 monitored websites (`0`-`94`) and unmonitored websites (`-1`) into a total of 96 classes.
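The two open-world label schemes are related by a simple mapping: every monitored site label (`0`-`94`) collapses to `1`, and `-1` stays `-1`. The following is a minimal sketch of that mapping, not the project's own code; the function name `to_binary_label` and the sample labels are hypothetical.

```python
UNMONITORED = -1          # label used for unmonitored traces
MONITORED_SITES = 95      # monitored sites are labelled 0-94


def to_binary_label(site_label: int) -> int:
    """Map a multi-class label (0-94 or -1) to the binary label (1 or -1)."""
    if site_label == UNMONITORED:
        return -1
    if 0 <= site_label < MONITORED_SITES:
        return 1
    raise ValueError(f"unexpected label: {site_label}")


# Hypothetical multi-class labels for five traces
multi_class_labels = [0, 94, -1, 17, -1]
binary_labels = [to_binary_label(label) for label in multi_class_labels]
print(binary_labels)  # [1, 1, -1, 1, -1]

# The multi-class task has one class per monitored site plus one for -1
n_multi_classes = MONITORED_SITES + 1
print(n_multi_classes)  # 96
```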
| Dataset | Instances | Classes | Description |
|---|---|---|---|
| `mon_standard.pkl` | 19,000 | 95 (0-94) | Monitored website traces. Each website has 10 subpages, observed 20 times each. |
| `unmon_standard10_3000.pkl` | 3,000 | (unmonitored only) | Unmonitored website traces. |
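The instance counts in the table follow from the collection setup, and they also show how the classes are balanced in the open-world tasks. The short calculation below is a sketch that assumes the open-world set combines every trace from both files; the README does not state this explicitly.

```python
# Counts taken from the dataset table
MONITORED_SITES = 95
SUBPAGES_PER_SITE = 10
VISITS_PER_SUBPAGE = 20
UNMONITORED_TRACES = 3_000

traces_per_site = SUBPAGES_PER_SITE * VISITS_PER_SUBPAGE
monitored_traces = MONITORED_SITES * traces_per_site

# Assumption: the open-world set uses all traces from both files
open_world_traces = monitored_traces + UNMONITORED_TRACES
unmonitored_share = UNMONITORED_TRACES / open_world_traces

print(traces_per_site)              # 200
print(monitored_traces)             # 19000
print(open_world_traces)            # 22000
print(f"{unmonitored_share:.1%}")   # 13.6%
```

Under that assumption, the binary task is skewed toward the monitored class, while in the multi-class task the unmonitored class (3,000 traces) is much larger than any single monitored site (200 traces).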
The code was executed in the following environment:
- Python Version: 3.8.10
To run the code, the following Python libraries need to be installed:
- pandas
- numpy
- matplotlib
- scikit-learn
- xgboost
- imbalanced-learn
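Two of these packages are imported under a different name from the one used with `pip` (`scikit-learn` as `sklearn`, `imbalanced-learn` as `imblearn`). As a quick sanity check before opening the notebooks, a small script along these lines can report which libraries are not importable; the helper name `missing_packages` is hypothetical and not part of the repository.

```python
import importlib.util
import sys

# pip package name -> module name used in `import` statements
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
    "imbalanced-learn": "imblearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


print("Python", sys.version.split()[0])
missing = missing_packages()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required libraries are importable.")
```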
- Clone the repository and navigate to the project directory: `git clone https://github.com/MLB-EWHA/MLB_TEAM.git`, then `cd MLB_TEAM`.
- Install the required libraries listed above: `pip install pandas numpy matplotlib scikit-learn xgboost imbalanced-learn`
- Run the desired classification tasks:
  - Navigate to the `Optimized Models/` folder.
  - Closed-World Classification:
    - Run the notebook: `Closed_Multi_Random Forest.ipynb`
    - This notebook builds a model to classify the 95 monitored websites into unique classes.
  - Open-World Binary Classification:
    - Run the notebook: `Open_Binary_XGBoost.ipynb`
    - This notebook builds a binary classification model to distinguish monitored (`1`) from unmonitored (`-1`) website traffic.
  - Open-World Multi-Class Classification:
    - Run the notebook: `Open_Multi_Random Forest.ipynb`
    - This notebook builds a multi-class classification model to classify 95 monitored websites (`0`-`94`) and unmonitored websites (`-1`) into 96 total classes.
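For readers who want a feel for the closed-world task before opening the notebooks, here is a minimal sketch of a 95-class Random Forest workflow with scikit-learn. It is not the code from `Optimized Models/`: the data is synthetic (the column layout of `mon.csv` is not documented here), and the feature count, split ratio, and hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature table: 95 classes, 20 samples each,
# 8 made-up features. Real features would be loaded from mon.csv instead.
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 95, 20, 8
y = np.repeat(np.arange(n_classes), per_class)

# Give each class its own random centre so the toy problem is learnable
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
X = centers[y] + rng.normal(size=(y.size, n_features))

# Stratify so every class appears in both the training and the test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Toy closed-world accuracy: {accuracy:.3f}")
```

The accuracy printed here says nothing about the real dataset; it only confirms that the pipeline runs end to end on 95 labels.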
- `load_pickle_code.ipynb` demonstrates how the datasets were preprocessed and converted into features.
- The `Task Process/` folder includes all exploratory code used during development.
- The `Optimized Models/` folder contains finalized, optimized code for evaluation.
- Team Name: MLB_EWHA
- Team Members:
  - Juyoung Yoon
  - Yoonjae Oh
  - Sunghyun Shin
  - Eunwoo Choi
  - Yusun Choi
For any questions or clarifications, please contact us via email.