ASTRAL_TUAI

ASTRAL_TUAI is a research project developed within the framework of the TUAI Innovation Grant, funded by the Italian National Recovery and Resilience Plan (PNRR) CN1 - Spoke 6 - CUP: I53C22000690001.

Description

The project develops Machine Learning (ML) algorithms for diagnostics, prognostics, and health management (PHM) of complex cyber-physical systems (CPSs), responding to the demand for more affordable and reliable systems suited to complex life-cycle conditions.
It also aims to develop Digital Twins (DTs) of the CPSs involved, which will feed AI-based (Artificial Intelligence) algorithms that monitor the reliability and trustworthiness of the models.

The project collects use cases transversally and promotes synergy among different industrial domains (e.g., Space, Aerospace, Energy, Smart Grids, Critical Infrastructures), working toward a common national research strategy and shared AI-based and HPC (High-Performance Computing) technological platforms for PHM applications.
The ML-based algorithms are expected to:

  • monitor and analyze time series data (e.g., telemetries) to detect and predict failures and anomalies in the systems;
  • identify failure causes in components, with the goal of preventing their recurrence (Root Cause Analysis, Fault Tree Analysis);
  • analyze system reliability through data exploitation, estimating the Remaining Useful Life (RUL) to enable proactive failure prevention.


Structure of the Repository

├── A_preprocessing.ipynb               # Raw data preprocessing
├── B_train.ipynb                       # Machine Learning model training
├── C_Deploy.ipynb                    	# Inference and deployment phase
├── C_Deploy.py                       	# Python version of inference (faster than notebook)
├── config.py                         	# Configuration file
├── functions.py                      	# Common utility functions
├── functions_plant_tagging.py        	# Plant-specific tagging functions
├── functions_tagging.py              	# General tagging functions
├── requirements.txt                  	# Minimal dependencies
├── __init__.py                       	# Package initializer

├── models/                           	# Trained model directory

├── data/                             	# Data folder 
│   ├── input/                        	# Raw input data
│   │   └── df_input.csv              	# Original dataset
│   └── preprocessed/                 	# Preprocessed data
│       ├── blacklist_*.csv           	# Generated blacklist file
│       └── preprocessed_*.parquet    	# Preprocessed dataset

├── Anomaly_20TT_427 [°C]_5/          	# Anomaly detection output
│   └── 2022-12-01_2025-03-07/
│       ├── overview.png              	# Anomaly overview
│       └── A_1/                      	# Specific anomaly case
│           ├── *.png, *.csv
│           └── CC/                   	# Control Chart plots

Requirements

The system has been tested on a workstation with the following specifications:

  • OS: Windows 11
  • CPU: Intel Core i7-12700H (14 cores total)
  • RAM: 64 GB
  • GPU: NVIDIA GeForce RTX 3070 (8 GB VRAM)
  • Python: 3.11.8

Setting up a virtual environment (with venv)

To create and activate a virtual environment using Python's built-in venv module:

# Create the virtual environment
python -m venv venv

# Activate the environment (on Windows)
.\venv\Scripts\activate

# Activate the environment (on Unix/Mac)
source venv/bin/activate

# Install required packages
pip install -r requirements.txt

Once activated, you can run the notebooks or scripts within the virtual environment.


Usage Guide

After setting up and activating the environment, the workflow is organized into three main steps:


A_preprocessing.ipynb — Preprocessing Step

Purpose:
Cleans and prepares raw time series data for training and inference. Handles timestamp gaps, missing values, feature filtering, and blacklist generation.

Inputs:

  • data/input/df_input.csv — raw plant telemetry data with a DatetimeIndex and measurement columns.

Configuration (from config.py):

  • features_to_be_removed – list of columns to exclude
  • COLUMNS_TO_KEEP_FOR_BLACKLIST – subset of features monitored for abnormality

Outputs:

  • data/preprocessed/preprocessed_<date>.parquet – clean and sorted dataset
  • data/preprocessed/blacklist_<date>.csv – optional file listing invalid timestamps

To run:
Open and execute A_preprocessing.ipynb.
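As an illustration, here is a minimal sketch of what the preprocessing step does, using synthetic telemetry in place of data/input/df_input.csv. The column names, the `preprocess` helper, and the interpolation policy are assumptions for illustration, not the notebook's actual logic:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for config.py's features_to_be_removed
FEATURES_TO_BE_REMOVED = ["sensor_unused"]

def preprocess(df: pd.DataFrame, features_to_remove: list) -> pd.DataFrame:
    """Sort by timestamp, drop excluded columns, interpolate short gaps."""
    df = df.sort_index()
    df = df.drop(columns=[c for c in features_to_remove if c in df.columns])
    return df.interpolate(method="time", limit=3)  # fill short gaps only

# Synthetic telemetry with a DatetimeIndex, shuffled to mimic unsorted raw data
idx = pd.date_range("2022-12-01", periods=6, freq="h")
raw = pd.DataFrame(
    {"20TT_425 [°C]": [20.0, np.nan, 21.0, 21.5, 22.0, 22.5],
     "sensor_unused": range(6)},
    index=idx,
).sample(frac=1, random_state=0)

clean = preprocess(raw, FEATURES_TO_BE_REMOVED)
# Step A would now write clean to data/preprocessed/preprocessed_<date>.parquet
```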


B_train.ipynb — Training Step

Purpose:
Trains a Machine Learning model (e.g., Random Forest) to predict a target variable based on historical telemetry.

Inputs:

  • Preprocessed .parquet file from step A

Configuration (from config.py):

  • TARGET: feature to predict (e.g., "20TT_425 [°C]")
  • TRAINING_START, TRAINING_END: training data window
  • VALIDATION_START, VALIDATION_END: validation window
  • N_CORE: CPU cores to use (-1 = all available)

Outputs:

  • Trained model saved in models/ (e.g., RF_*.sav)
  • Performance metrics shown in the notebook

To run:
Open and execute B_train.ipynb.
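The training step can be sketched roughly as follows. Everything here is illustrative: the feature name, the window bounds, and the synthetic data stand in for the real parquet file from step A and the values in config.py, and saving via pickle to a .sav file is an assumption:

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-ins for config.py values
TARGET = "20TT_425 [°C]"
TRAINING_START, TRAINING_END = "2022-12-01", "2022-12-03"
VALIDATION_START, VALIDATION_END = "2022-12-04", "2022-12-05"
N_CORE = -1  # -1 = use all available CPU cores

# Synthetic hourly telemetry standing in for step A's preprocessed output
idx = pd.date_range("2022-12-01", periods=120, freq="h")
rng = np.random.default_rng(0)
df = pd.DataFrame({"feat_a": rng.normal(size=120)}, index=idx)
df[TARGET] = 2.0 * df["feat_a"] + rng.normal(scale=0.1, size=120)

train = df.loc[TRAINING_START:TRAINING_END]
val = df.loc[VALIDATION_START:VALIDATION_END]

model = RandomForestRegressor(n_estimators=50, n_jobs=N_CORE, random_state=0)
model.fit(train[["feat_a"]], train[TARGET])
score = model.score(val[["feat_a"]], val[TARGET])  # R² on the validation window

# The notebook saves the model under models/; pickling to .sav is an assumption
with open("RF_example.sav", "wb") as f:
    pickle.dump(model, f)
```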

MLflow Tracking (Optional)

Purpose:
Enables experiment tracking (parameters, metrics, artifacts) during model training via MLflow.

Requirements:

  • MLflow must be installed in the Python environment
    (e.g., pip install mlflow)
  • If a remote tracking backend is used, it must be configured accordingly (by default MLflow logs to a local store)

Configuration (in B_train.ipynb):
At the beginning of the notebook, enable MLflow by setting:

USE_MLFLOW = True

Optionally, set the experiment name:

MLFLOW_EXPERIMENT = "RandomForestTemperature"

Once enabled, each training run (parameters, metrics, model artifacts, etc.) is logged automatically to the configured MLflow tracking server and can be inspected in the MLflow UI.
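A hedged sketch of how such a toggle might wrap MLflow's standard API (mlflow.set_experiment, start_run, log_params, log_metrics); the helper function, its arguments, and the logged values are illustrative, not the notebook's actual code:

```python
USE_MLFLOW = False  # flip to True once MLflow is installed and configured
MLFLOW_EXPERIMENT = "RandomForestTemperature"

def log_training_run(params: dict, metrics: dict) -> bool:
    """Log one training run to MLflow if enabled; return whether logging ran."""
    if not USE_MLFLOW:
        return False
    import mlflow  # imported lazily so the pipeline also works without MLflow
    mlflow.set_experiment(MLFLOW_EXPERIMENT)
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
    return True

# Example call with placeholder values
logged = log_training_run({"n_estimators": 50}, {"val_r2": 0.95})
```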


C_Deploy.ipynb / C_Deploy.py — Inference and Anomaly Detection

Purpose:
Loads the trained model, performs predictions on new telemetry data, and detects anomalies based on error thresholds.

Inputs:

  • Preprocessed .parquet file from step A
  • Model .sav file from step B

Configuration (from config.py):

  • model_name: name of the model file
  • INFERENCE_START, INFERENCE_END: inference time range
  • threshold: max error for normal behavior
  • MIN_ANOMALY_LENGTH: minimum duration for an anomaly event

Outputs:

  • Anomaly plots and CSVs in Anomaly_*/ folders
  • Detected anomaly time intervals

To run:
Use C_Deploy.ipynb (interactive) or C_Deploy.py (scripted).
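The threshold-and-minimum-length logic can be illustrated roughly as follows. The constant values and the run-length scan are assumptions about the approach, not the project's exact implementation:

```python
import pandas as pd

# Hypothetical stand-ins for config.py's threshold and MIN_ANOMALY_LENGTH
THRESHOLD = 1.0          # max |actual - predicted| considered normal
MIN_ANOMALY_LENGTH = 3   # minimum consecutive out-of-threshold samples

def detect_anomalies(actual: pd.Series, predicted: pd.Series) -> list:
    """Return (start, end) timestamp pairs of runs where |error| > THRESHOLD
    lasts at least MIN_ANOMALY_LENGTH consecutive samples."""
    flag = (actual - predicted).abs() > THRESHOLD
    events, run = [], []
    for ts, is_anomalous in flag.items():
        if is_anomalous:
            run.append(ts)
        else:
            if len(run) >= MIN_ANOMALY_LENGTH:
                events.append((run[0], run[-1]))
            run = []
    if len(run) >= MIN_ANOMALY_LENGTH:  # run reaching the end of the series
        events.append((run[0], run[-1]))
    return events

# Synthetic example: one sustained deviation plus one isolated spike
idx = pd.date_range("2022-12-01", periods=10, freq="h")
actual = pd.Series([0, 0, 5, 5, 5, 0, 5, 0, 0, 0], index=idx, dtype=float)
predicted = pd.Series(0.0, index=idx)
events = detect_anomalies(actual, predicted)
# one 3-sample event; the single spike is too short to count as an anomaly
```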


About config.py

This file centralizes shared parameters across the workflow:

  • Preprocessing filters
  • Model target and training/validation ranges
  • Inference time windows and detection thresholds

Each script loads config.py to ensure consistency.
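For orientation, a hypothetical config.py might look like the sketch below. The parameter names follow this README, while every value is a placeholder, not the project's real configuration:

```python
# config.py — illustrative sketch; all values are placeholders

# Preprocessing filters
features_to_be_removed = ["sensor_unused"]
COLUMNS_TO_KEEP_FOR_BLACKLIST = ["20TT_425 [°C]"]

# Training
TARGET = "20TT_425 [°C]"
TRAINING_START, TRAINING_END = "2022-12-01", "2023-12-01"
VALIDATION_START, VALIDATION_END = "2023-12-02", "2024-03-01"
N_CORE = -1  # -1 = all available cores

# Inference and anomaly detection
model_name = "RF_example.sav"
INFERENCE_START, INFERENCE_END = "2024-03-02", "2025-03-07"
threshold = 1.5
MIN_ANOMALY_LENGTH = 3
```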

Authors

Contributors' names and contact information.

License

This software is released for research, experimentation, and educational purposes only.

Use, modification, and distribution are permitted only for non-commercial purposes and with proper attribution to the University of Pisa.
Any commercial use, in whole or in part, is strictly prohibited without written authorization from the project holders.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

© 2025 University of Pisa



Acknowledgments

This work was developed by the Artificial Intelligence R&D Group at the Department of Information Engineering, University of Pisa, as part of the activities carried out within the ASTRAL_TUAI project.
