ASTRAL_TUAI is a research project developed within the framework of the TUAI Innovation Grant, funded by the Italian National Recovery and Resilience Plan (PNRR) CN1 - Spoke 6 - CUP: I53C22000690001.
The project proposes the development of Machine Learning (ML) algorithms for diagnostics, prognostics, and health management (PHM) solutions for complex cyber-physical systems (CPS), in response to demands for more affordable and reliable systems suitable for complex life cycle conditions.
The project aims at the development of Digital Twins (DT) of the involved CPSs, which will feed Artificial Intelligence (AI) based algorithms and support monitoring of the reliability and trustworthiness of the models.
The project transversally collects use cases and promotes synergy among different industrial domains (e.g., Space, Aerospace, Energy, Smart Grid, Critical Infrastructures) to achieve a common national research strategy and shared AI-based and HPC (High Performance Computing) technological platforms for PHM applications.
The ML-based algorithms are expected to support the monitoring and analysis of time series data (e.g., telemetries) to detect and predict possible failures and anomalies in the systems; identify failure causes in components, with the goal of preventing their recurrence (Root Cause Analysis, Fault Tree Analysis); and analyze system reliability through data exploitation, aiming to estimate the Remaining Useful Life (RUL) and thus enable proactive failure prevention.
```
├── A_preprocessing.ipynb        # Raw data preprocessing
├── B_train.ipynb                # Machine Learning model training
├── C_Deploy.ipynb               # Inference and deployment phase
├── C_Deploy.py                  # Python version of inference (faster than the notebook)
├── config.py                    # Configuration file
├── functions.py                 # Common utility functions
├── functions_plant_tagging.py   # Plant-specific tagging functions
├── functions_tagging.py         # General tagging functions
├── requirements.txt             # Minimal dependencies
├── __init__.py                  # Package initializer
├── models/                      # Trained model directory
├── data/                        # Data folder
│   ├── input/                   # Raw input data
│   │   └── df_input.csv         # Original dataset
│   └── preprocessed/            # Preprocessed data
│       ├── blacklist_*.csv      # Generated blacklist file
│       └── preprocessed_*.parquet  # Preprocessed dataset
├── Anomaly_20TT_427 [°C]_5/     # Anomaly detection output
│   └── 2022-12-01_2025-03-07/
│       ├── overview.png         # Anomaly overview
│       └── A_1/                 # Specific anomaly case
│           ├── *.png, *.csv
│           └── CC/              # Control Chart plots
```
The system has been tested on a workstation with the following specifications:
- OS: Windows 11
- CPU: Intel Core i7-12700H (14 cores total)
- RAM: 64 GB
- GPU: NVIDIA GeForce RTX 3070 (8 GB VRAM)
- Python: 3.11.8
To create and activate a virtual environment using Python's built-in venv module:
```bash
# Create the virtual environment
python -m venv venv

# Activate the environment (on Windows)
.\venv\Scripts\activate

# Activate the environment (on Unix/Mac)
source venv/bin/activate

# Install required packages
pip install -r requirements.txt
```

Once activated, you can run the notebooks or scripts within the virtual environment.
After setting up the environment and activating it, the project is organized into three main steps:
Purpose:
Cleans and prepares raw time series data for training and inference. Handles timestamp gaps, missing values, feature filtering, and blacklist generation.
Inputs:
- `data/input/df_input.csv` – raw plant telemetry data with a `DatetimeIndex` and measurement columns
Configuration (from config.py):
- `features_to_be_removed` – list of columns to exclude
- `COLUMNS_TO_KEEP_FOR_BLACKLIST` – subset of features monitored for abnormality
Outputs:
- `data/preprocessed/preprocessed_<date>.parquet` – clean and sorted dataset
- `data/preprocessed/blacklist_<date>.csv` – optional file listing invalid timestamps
To run:
Open and execute `A_preprocessing.ipynb`.
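The preprocessing logic described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the one-minute resampling frequency, and the interpolation limit are assumptions chosen for the example.

```python
import pandas as pd

def preprocess(df, features_to_be_removed, blacklist_columns):
    """Sketch of step A: filter features, regularize the time index,
    fill short gaps, and flag remaining invalid timestamps."""
    # Feature filtering: drop excluded columns (features_to_be_removed)
    df = df.drop(columns=[c for c in features_to_be_removed if c in df.columns])
    df = df.sort_index()
    # Reindex to a regular frequency so timestamp gaps become explicit NaNs
    # (assumed 1-minute sampling for illustration)
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="1min")
    df = df.reindex(full_index)
    # Fill short gaps by interpolation; longer runs of NaNs stay missing
    df = df.interpolate(limit=5)
    # Blacklist: timestamps where any monitored feature is still missing
    blacklist = df.index[df[blacklist_columns].isna().any(axis=1)]
    return df, blacklist
```

The returned `blacklist` corresponds conceptually to the `blacklist_<date>.csv` output, while the cleaned frame would be saved as the `.parquet` dataset.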
Purpose:
Trains a Machine Learning model (e.g., Random Forest) to predict a target variable based on historical telemetry.
Inputs:
- Preprocessed `.parquet` file from step A
Configuration (from config.py):
- `TARGET`: feature to predict (e.g., `"20TT_425 [°C]"`)
- `TRAINING_START`, `TRAINING_END`: training data window
- `VALIDATION_START`, `VALIDATION_END`: validation window
- `N_CORE`: CPU cores to use (`-1` = all available)
Outputs:
- Trained model saved in `models/` (e.g., `RF_*.sav`)
- Performance metrics shown in the notebook
To run:
Open and execute `B_train.ipynb`.
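A minimal sketch of the training step, under the assumptions that the other telemetry channels serve as predictors and that string date slicing selects the training window; the function name `train_model` and the forest size are illustrative, not the notebook's actual code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def train_model(df, target, training_start, training_end, n_core=-1):
    """Sketch of step B: fit a Random Forest to predict TARGET from the
    remaining telemetry columns over the configured training window."""
    train = df.loc[training_start:training_end]   # TRAINING_START..TRAINING_END
    X = train.drop(columns=[target])
    y = train[target]
    model = RandomForestRegressor(n_estimators=100, n_jobs=n_core, random_state=0)
    model.fit(X, y)
    return model

# The fitted model would then be persisted, e.g. with
# joblib.dump(model, "models/RF_<date>.sav"), matching the models/ layout.
```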
Purpose:
Enables experiment tracking (parameters, metrics, artifacts) during model training via MLflow.
Requirements:
- MLflow must be installed in the Python environment (e.g., `pip install mlflow`)
- The backend (local or remote) must be properly configured, if needed
Configuration (in B_train.ipynb):
At the beginning of the notebook, enable MLflow by setting:
`USE_MLFLOW = True`

Optionally, set the experiment name:

`MLFLOW_EXPERIMENT = "RandomForestTemperature"`

Once enabled, training runs will be logged automatically to the configured MLflow tracking server (parameters, metrics, model artifacts, etc.) and can be browsed in the MLflow UI.
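The guard pattern could look like the sketch below. The helper `log_training_run` is hypothetical (not part of the notebook); it only illustrates how the `USE_MLFLOW` flag keeps the workflow runnable when MLflow is absent or disabled.

```python
USE_MLFLOW = False  # set to True in B_train.ipynb to enable tracking
MLFLOW_EXPERIMENT = "RandomForestTemperature"

def log_training_run(params, metrics):
    """Log one training run's parameters and metrics when MLflow is
    enabled; a silent no-op otherwise."""
    if not USE_MLFLOW:
        return None
    import mlflow  # local import so the project runs without mlflow installed
    mlflow.set_experiment(MLFLOW_EXPERIMENT)
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        return run.info.run_id
```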
Purpose:
Loads the trained model, performs predictions on new telemetry data, and detects anomalies based on error thresholds.
Inputs:
- Preprocessed `.parquet` file from step A
- Model `.sav` file from step B
Configuration (from config.py):
- `model_name`: name of the model file
- `INFERENCE_START`, `INFERENCE_END`: inference time range
- `threshold`: maximum error for normal behavior
- `MIN_ANOMALY_LENGTH`: minimum duration for an anomaly event
Outputs:
- Anomaly plots and CSVs in `Anomaly_*/` folders
- Detected anomaly time intervals
To run:
Use `C_Deploy.ipynb` (interactive) or `C_Deploy.py` (scripted).
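The threshold-based detection described above can be sketched as follows; the function name `detect_anomalies` is an assumption for illustration, and `min_anomaly_length` is expressed in samples rather than wall-clock time.

```python
import pandas as pd

def detect_anomalies(y_true, y_pred, threshold, min_anomaly_length):
    """Sketch of step C: flag timestamps where the prediction error exceeds
    `threshold`, then keep only runs of consecutive flagged samples at
    least `min_anomaly_length` long."""
    error = (y_true - y_pred).abs()
    flagged = error > threshold
    # Label runs of consecutive equal values (True/False) with an id
    run_id = (flagged != flagged.shift()).cumsum()
    anomalies = []
    for _, run in flagged.groupby(run_id):
        if run.iloc[0] and len(run) >= min_anomaly_length:
            anomalies.append((run.index[0], run.index[-1]))
    return anomalies  # list of (start, end) anomaly intervals
```

Each returned `(start, end)` pair corresponds to one detected anomaly interval, i.e. one `A_*/` output folder.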
This file centralizes shared parameters across the workflow:
- Preprocessing filters
- Model target and training/validation ranges
- Inference time windows and detection thresholds
Each script loads `config.py` to ensure consistency.
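Put together, `config.py` might look like the sketch below. The parameter names follow the sections above; all values are placeholders, not the project's actual settings.

```python
# --- Preprocessing (step A) ---
features_to_be_removed = ["sensor_to_drop"]        # columns excluded from the dataset
COLUMNS_TO_KEEP_FOR_BLACKLIST = ["20TT_427 [°C]"]  # features monitored for abnormality

# --- Training (step B) ---
TARGET = "20TT_425 [°C]"                 # feature the model predicts
TRAINING_START, TRAINING_END = "2022-12-01", "2024-06-30"
VALIDATION_START, VALIDATION_END = "2024-07-01", "2024-12-31"
N_CORE = -1                              # -1 = all available CPU cores

# --- Inference (step C) ---
model_name = "RF_latest.sav"             # model file under models/
INFERENCE_START, INFERENCE_END = "2025-01-01", "2025-03-07"
threshold = 5.0                          # max error for normal behavior
MIN_ANOMALY_LENGTH = 10                  # minimum duration for an anomaly event
```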
Contributors' names and contact info:
- Alessio Bechini alessio.bechini@unipi.it
- Pietro Ducange pietro.ducange@unipi.it
- Francesco Marcelloni francesco.marcelloni@unipi.it
- Giustino Claudio Miglionico giustino.miglionico@phd.unipi.it
- Fabrizio Ruffini fabrizio.ruffini@unipi.it
This software is released for research, experimentation, and educational purposes only.
Use, modification, and distribution are permitted only for non-commercial purposes and with proper attribution to the University of Pisa.
Any commercial use, in whole or in part, is strictly prohibited without written authorization from the project holders.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
© 2025 University of Pisa
Copyright (c) 2025 Università di Pisa
This work was developed by the Artificial Intelligence R&D Group at the Department of Information Engineering, University of Pisa, as part of the activities carried out within the ASTRAL_TUAI project.