This repository contains a high-performance machine learning solution for classifying sensor-based data. The approach uses LightGBM with custom windowed features and decision-threshold tuning to handle highly imbalanced classes.
The objective is to predict binary targets from a time-series sensor dataset (
To improve the model's predictive power, the following transformations were applied:
- Time Features: Extracted `hour`, `day_of_week`, and `is_weekend` from timestamps.
- Sensor Aggregates: Computed row-wise `mean`, `std`, and `max` across all sensors.
- Windowing & Smoothing:
  - Rolling Mean & Variance: Captured local trends and volatility.
  - EWMA: Exponentially Weighted Moving Averages for noise reduction.
- Differencing: Calculated step-to-step changes ($X_t - X_{t-1}$).
- Signal Ratios: Created $X1/X2$ and $X3/X4$ interaction features.
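
The transformations listed above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `build_features`, the column name `timestamp`, the window size of 5, the EWMA span, and the output column names are all assumptions made for the example. Only the sensor names `X1` to `X4` come from the ratios described above.

```python
import numpy as np
import pandas as pd


def build_features(df, sensor_cols, time_col="timestamp", window=5):
    """Hypothetical helper: add time, aggregate, window, diff and ratio features."""
    # Sort chronologically so rolling, EWMA and diff features follow time order.
    out = df.sort_values(time_col).reset_index(drop=True).copy()

    # Time features.
    ts = pd.to_datetime(out[time_col])
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek            # Monday=0 ... Sunday=6
    out["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)

    # Row-wise aggregates across all sensors.
    sensors = out[sensor_cols]
    out["sensor_mean"] = sensors.mean(axis=1)
    out["sensor_std"] = sensors.std(axis=1)
    out["sensor_max"] = sensors.max(axis=1)

    for col in sensor_cols:
        roll = out[col].rolling(window, min_periods=1)
        out[f"{col}_roll_mean"] = roll.mean()
        out[f"{col}_roll_var"] = roll.var()          # NaN on the first row
        # span=window is an assumed smoothing setting.
        out[f"{col}_ewma"] = out[col].ewm(span=window, adjust=False).mean()
        out[f"{col}_diff"] = out[col].diff()         # X_t - X_{t-1}

    # Signal ratios; zero denominators become NaN instead of +/-inf.
    out["X1_over_X2"] = out["X1"] / out["X2"].replace(0, np.nan)
    out["X3_over_X4"] = out["X3"] / out["X4"].replace(0, np.nan)
    return out


# Small synthetic example (2024-01-01 is a Monday).
demo = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=10, freq="D"),
    "X1": np.arange(1.0, 11.0),
    "X2": 2.0,
    "X3": np.arange(1.0, 11.0) * 2,
    "X4": 4.0,
})
feats = build_features(demo, ["X1", "X2", "X3", "X4"])
```

All window features here look only backward in time, which keeps them compatible with a chronological validation split. The leading `NaN` values produced by `diff()` and `var()` are left in place, since LightGBM handles missing values natively.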
- Validation: A 73/27 Chronological Split was used to simulate real-world forecasting and prevent data leakage from future timestamps.
- Scaling: Features were normalized using `StandardScaler`.
- Model: `LGBMClassifier` with `scale_pos_weight` to address the heavy class imbalance.
- Optimization: Custom thresholding was implemented. Instead of the default $0.5$ cutoff, candidate thresholds are searched and the one that maximizes the F1-Score is used (found at 0.97).
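
As a rough sketch of the validation and thresholding steps above, the snippet below shows a 73/27 chronological split, a scaler fitted on the training portion only, and a grid search for the F1-maximizing threshold. The helper names `chronological_split` and `best_f1_threshold`, the `timestamp` column, and the 0.01 grid step are assumptions for illustration. The probabilities in the demo are made up, so the threshold it finds is unrelated to the 0.97 reported here.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler


def chronological_split(df, time_col, train_frac=0.73):
    """Sort by time and cut once, so no validation row precedes a training row."""
    ordered = df.sort_values(time_col).reset_index(drop=True)
    cut = int(round(len(ordered) * train_frac))
    return ordered.iloc[:cut], ordered.iloc[cut:]


def best_f1_threshold(y_true, proba, grid=None):
    """Return (threshold, f1) for the cutoff that maximizes F1 on the given data."""
    proba = np.asarray(proba)
    if grid is None:
        grid = np.arange(0.01, 1.00, 0.01)  # assumed search grid
    scores = [
        f1_score(y_true, (proba >= t).astype(int), zero_division=0)
        for t in grid
    ]
    best = int(np.argmax(scores))           # first maximum wins on ties
    return float(grid[best]), float(scores[best])


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="h"),
    "X1": rng.normal(size=100),
})
train, valid = chronological_split(demo, "timestamp")

# Fit the scaler on training rows only, then apply it to both parts.
scaler = StandardScaler().fit(train[["X1"]])
X_train = scaler.transform(train[["X1"]])
X_valid = scaler.transform(valid[["X1"]])

# Made-up labels and predicted probabilities for the threshold search.
y_demo = np.array([0, 0, 1, 1])
p_demo = np.array([0.1, 0.4, 0.6, 0.9])
thr, best_f1 = best_f1_threshold(y_demo, p_demo)
```

Fitting the scaler on the training rows alone follows the same no-leakage reasoning as the chronological split. A threshold chosen on the validation set makes the validation F1 an optimistic estimate, so a separate held-out period would give a cleaner number if one is available.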
- Validation F1-Score: ~0.618
- Optimal Threshold: 0.97
- Model Params: 250 estimators, 0.03 learning rate, max depth of 5.
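
The parameters above could be collected as shown below. The source does not state the value used for `scale_pos_weight`; the negative-to-positive count ratio used here is a common heuristic and is an assumption, as is the helper name `lgbm_params`.

```python
import numpy as np


def lgbm_params(y_train):
    """Hypothetical helper: reported hyperparameters plus an assumed class weight."""
    y = np.asarray(y_train)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    if n_pos == 0:
        raise ValueError("training labels contain no positive samples")
    return {
        "n_estimators": 250,
        "learning_rate": 0.03,
        "max_depth": 5,
        # Assumption: weight positives by the negative/positive count ratio.
        "scale_pos_weight": n_neg / n_pos,
    }


# Example with 95 negatives and 5 positives.
params = lgbm_params([0] * 95 + [1] * 5)

# With lightgbm installed, the dict can be passed straight to the estimator:
# model = LGBMClassifier(**params)
```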
import pandas as pd
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score