ASTRAL_TUAI is a research project developed within the framework of the TUAI Innovation Grant, funded by the Italian National Recovery and Resilience Plan (PNRR) CN1 - Spoke 6 - CUP: I53C22000690001.
The project proposes the development of Machine Learning (ML) algorithms for diagnostics, prognostics, and health management (PHM) solutions for complex cyber-physical systems (CPS), in response to demands for more affordable and reliable systems suitable for complex life cycle conditions.
The project aims at the development of Digital Twins (DT) of the involved CPSs, which will feed Artificial Intelligence (AI) based algorithms and support monitoring of the reliability and trustworthiness of the models.
The project transversally collects use cases and promotes synergy among different industrial domains (e.g., Space, Aerospace, Energy, Smart Grid, Critical Infrastructures) to achieve a common national research strategy and shared AI-based and HPC (High Performance Computing) technological platforms for PHM applications.
The ML-based algorithms are expected to support the monitoring and analysis of time series data (e.g., telemetries) to detect and predict possible failures and anomalies in the systems; identify failure causes in components, with the goal of preventing their recurrence (Root Cause Analysis, Fault Tree Analysis); and analyze system reliability through data exploitation, aiming to estimate the Remaining Useful Life (RUL) and thus enable proactive failure prevention.
```
├── A_preprocessing.ipynb        # Raw data preprocessing
├── B_train.ipynb                # Machine Learning model training
├── C_Deploy.ipynb               # Inference and deployment phase
├── C_Deploy.py                  # Python version of inference (faster than the notebook)
├── config.py                    # Configuration file
├── functions.py                 # Common utility functions
├── functions_plant_tagging.py   # Plant-specific tagging functions
├── functions_tagging.py         # General tagging functions
├── requirements.txt             # Minimal dependencies
├── __init__.py                  # Package initializer
├── models/                      # Trained model directory
├── data/                        # Data folder
│   ├── input/                   # Raw input data
│   │   └── df_input.csv         # Original dataset
│   └── preprocessed/            # Preprocessed data
│       ├── blacklist_*.csv      # Generated blacklist file
│       └── preprocessed_*.parquet  # Preprocessed dataset
├── Anomaly_20TT_427 [°C]_5/     # Anomaly detection output
│   └── 2022-12-01_2025-03-07/
│       ├── overview.png         # Anomaly overview
│       └── A_1/                 # Specific anomaly case
│           ├── *.png, *.csv
│           └── CC/              # Control Chart plots
```
The system has been tested on a workstation with the following specifications:
- OS: Windows 11
- CPU: Intel Core i7-12700H (14 cores total)
- RAM: 64 GB
- GPU: NVIDIA GeForce RTX 3070 (8 GB VRAM)
- Python: 3.11.8
To create and activate a virtual environment using Python's built-in venv module:
```bash
# Create the virtual environment
python -m venv venv

# Activate the environment (on Windows)
.\venv\Scripts\activate

# Activate the environment (on Unix/Mac)
source venv/bin/activate

# Install required packages
pip install -r requirements.txt
```

Once activated, you can run the notebooks or scripts within the virtual environment.
After setting up the environment and activating it, the project is organized into three main steps:
Purpose:
Cleans and prepares raw time series data for training and inference. Handles timestamp gaps, missing values, feature filtering, and blacklist generation.
Inputs:
- `data/input/df_input.csv` – raw plant telemetry data with a `DatetimeIndex` and measurement columns
Configuration (from config.py):
- `features_to_be_removed` – list of columns to exclude
- `COLUMNS_TO_KEEP_FOR_BLACKLIST` – subset of features monitored for abnormality
Outputs:
- `data/preprocessed/preprocessed_<date>.parquet` – clean and sorted dataset
- `data/preprocessed/blacklist_<date>.csv` – optional file listing invalid timestamps
To run:
Open and execute `A_preprocessing.ipynb`.
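The preprocessing logic described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the one-minute resampling frequency, and the interpolation limit are assumptions chosen for the example.

```python
import pandas as pd

def preprocess(df, features_to_be_removed, blacklist_columns):
    """Sketch of step A: filter features, regularize the time index,
    fill short gaps, and flag remaining invalid timestamps."""
    # Feature filtering: drop excluded columns (features_to_be_removed)
    df = df.drop(columns=[c for c in features_to_be_removed if c in df.columns])
    df = df.sort_index()
    # Reindex to a regular frequency so timestamp gaps become explicit NaNs
    # (assumed 1-minute sampling for illustration)
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="1min")
    df = df.reindex(full_index)
    # Fill short gaps by interpolation; longer runs of NaNs stay missing
    df = df.interpolate(limit=5)
    # Blacklist: timestamps where any monitored feature is still missing
    blacklist = df.index[df[blacklist_columns].isna().any(axis=1)]
    return df, blacklist
```

The returned `blacklist` corresponds conceptually to the `blacklist_<date>.csv` output, while the cleaned frame would be saved as the `.parquet` dataset.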
Purpose:
Trains a Machine Learning model (e.g., Random Forest) to predict a target variable based on historical telemetry.
Inputs:
- Preprocessed `.parquet` file from step A
Configuration (from config.py):
- `TARGET`: feature to predict (e.g., `"20TT_425 [°C]"`)
- `TRAINING_START`, `TRAINING_END`: training data window
- `VALIDATION_START`, `VALIDATION_END`: validation window
- `N_CORE`: CPU cores to use (`-1` = all available)
Outputs:
- Trained model saved in `models/` (e.g., `RF_*.sav`)
- Performance metrics shown in the notebook
To run:
Open and execute `B_train.ipynb`.
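A minimal sketch of the training step, under the assumptions that the other telemetry channels serve as predictors and that string date slicing selects the training window; the function name `train_model` and the forest size are illustrative, not the notebook's actual code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def train_model(df, target, training_start, training_end, n_core=-1):
    """Sketch of step B: fit a Random Forest to predict TARGET from the
    remaining telemetry columns over the configured training window."""
    train = df.loc[training_start:training_end]   # TRAINING_START..TRAINING_END
    X = train.drop(columns=[target])
    y = train[target]
    model = RandomForestRegressor(n_estimators=100, n_jobs=n_core, random_state=0)
    model.fit(X, y)
    return model

# The fitted model would then be persisted, e.g. with
# joblib.dump(model, "models/RF_<date>.sav"), matching the models/ layout.
```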
Purpose:
Enables experiment tracking (parameters, metrics, artifacts) during model training via MLflow.
Requirements:
- MLflow must be installed in the Python environment (e.g., `pip install mlflow`)
- The backend (local or remote) must be properly configured, if needed
Configuration (in B_train.ipynb):
At the beginning of the notebook, enable MLflow by setting:
`USE_MLFLOW = True`

Optionally, set the experiment name:

`MLFLOW_EXPERIMENT = "RandomForestTemperature"`

Once enabled, training runs will be logged automatically to the configured MLflow tracking server (parameters, metrics, model artifacts, etc.) and can be browsed in the MLflow UI.
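The guard pattern could look like the sketch below. The helper `log_training_run` is hypothetical (not part of the notebook); it only illustrates how the `USE_MLFLOW` flag keeps the workflow runnable when MLflow is absent or disabled.

```python
USE_MLFLOW = False  # set to True in B_train.ipynb to enable tracking
MLFLOW_EXPERIMENT = "RandomForestTemperature"

def log_training_run(params, metrics):
    """Log one training run's parameters and metrics when MLflow is
    enabled; a silent no-op otherwise."""
    if not USE_MLFLOW:
        return None
    import mlflow  # local import so the project runs without mlflow installed
    mlflow.set_experiment(MLFLOW_EXPERIMENT)
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        return run.info.run_id
```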
Purpose:
Loads the trained model, performs predictions on new telemetry data, and detects anomalies based on error thresholds.
Inputs:
- Preprocessed `.parquet` file from step A
- Model `.sav` file from step B
Configuration (from config.py):
- `model_name`: name of the model file
- `INFERENCE_START`, `INFERENCE_END`: inference time range
- `threshold`: maximum error for normal behavior
- `MIN_ANOMALY_LENGTH`: minimum duration for an anomaly event
Outputs:
- Anomaly plots and CSVs in `Anomaly_*/` folders
- Detected anomaly time intervals
To run:
Use `C_Deploy.ipynb` (interactive) or `C_Deploy.py` (scripted).
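The threshold-based detection described above can be sketched as follows; the function name `detect_anomalies` is an assumption for illustration, and `min_anomaly_length` is expressed in samples rather than wall-clock time.

```python
import pandas as pd

def detect_anomalies(y_true, y_pred, threshold, min_anomaly_length):
    """Sketch of step C: flag timestamps where the prediction error exceeds
    `threshold`, then keep only runs of consecutive flagged samples at
    least `min_anomaly_length` long."""
    error = (y_true - y_pred).abs()
    flagged = error > threshold
    # Label runs of consecutive equal values (True/False) with an id
    run_id = (flagged != flagged.shift()).cumsum()
    anomalies = []
    for _, run in flagged.groupby(run_id):
        if run.iloc[0] and len(run) >= min_anomaly_length:
            anomalies.append((run.index[0], run.index[-1]))
    return anomalies  # list of (start, end) anomaly intervals
```

Each returned `(start, end)` pair corresponds to one detected anomaly interval, i.e. one `A_*/` output folder.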
This file centralizes shared parameters across the workflow:
- Preprocessing filters
- Model target and training/validation ranges
- Inference time windows and detection thresholds
Each script loads `config.py` to ensure consistency.
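Put together, `config.py` might look like the sketch below. The parameter names follow the sections above; all values are placeholders, not the project's actual settings.

```python
# --- Preprocessing (step A) ---
features_to_be_removed = ["sensor_to_drop"]        # columns excluded from the dataset
COLUMNS_TO_KEEP_FOR_BLACKLIST = ["20TT_427 [°C]"]  # features monitored for abnormality

# --- Training (step B) ---
TARGET = "20TT_425 [°C]"                 # feature the model predicts
TRAINING_START, TRAINING_END = "2022-12-01", "2024-06-30"
VALIDATION_START, VALIDATION_END = "2024-07-01", "2024-12-31"
N_CORE = -1                              # -1 = all available CPU cores

# --- Inference (step C) ---
model_name = "RF_latest.sav"             # model file under models/
INFERENCE_START, INFERENCE_END = "2025-01-01", "2025-03-07"
threshold = 5.0                          # max error for normal behavior
MIN_ANOMALY_LENGTH = 10                  # minimum duration for an anomaly event
```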
Contributors' names and contact info:
- Alessio Bechini alessio.bechini@unipi.it
- Pietro Ducange pietro.ducange@unipi.it
- Francesco Marcelloni francesco.marcelloni@unipi.it
- Giustino Claudio Miglionico giustino.miglionico@phd.unipi.it
- Fabrizio Ruffini fabrizio.ruffini@unipi.it
This software is released for research, experimentation, and educational purposes only.
Use, modification, and distribution are permitted only for non-commercial purposes and with proper attribution to the University of Pisa.
Any commercial use, in whole or in part, is strictly prohibited without written authorization from the project holders.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
© 2025 University of Pisa
Copyright (c) 2025 Università di Pisa
This work was developed by the Artificial Intelligence R&D Group at the Department of Information Engineering, University of Pisa, as part of the activities carried out within the ASTRAL_TUAI project.