Welcome to my GitHub! I'm a Data Engineer and Data Analyst with strong expertise in Python, MATLAB, and Geospatial Data Analysis. I'm passionate about Computer Vision, Remote Sensing, and using spatial data to build impactful environmental and urban solutions.
- Programming: Python, R, MATLAB, SQL
- Data Engineering & Cloud: ETL pipelines, Apache Airflow, Docker, AWS, PostgreSQL, MariaDB, PostGIS
- Big Data & Streaming: Apache Spark, Apache Flink, Apache Kafka, ClickHouse, Elasticsearch
- Geospatial Tools: GIS (QGIS), GDAL, Remote Sensing
- Computer Vision & Machine Learning: OpenCV, TensorFlow, PyTorch, scikit-learn
- Data Analysis & Visualization: pandas, numpy, Tableau, matplotlib, seaborn
- Version Control & CI/CD: Git, GitHub Actions
- Other Tools: SGeMS
Real-Time Fraud Detection & Analytics Pipeline (GitHub Repo)
- Designed and implemented a scalable, containerized data pipeline to process high-velocity retail transactions. The system uses Change Data Capture (CDC) via Debezium to stream row-level changes from sharded PostgreSQL and MySQL databases into Apache Kafka. A Spark Structured Streaming job consumes these streams, performs real-time ETL and stateful aggregations to detect sales anomalies, and writes the final metrics to Redis for low-latency access and visualization.
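
The stateful aggregation step can be illustrated with a small pure-Python sketch. It is a stand-in for the idea, not the project's Spark job: transactions are grouped into tumbling windows per store, and a window is flagged when its total deviates strongly from that store's earlier windows. The function names, the `(timestamp, store, amount)` record layout, the 60-second window, and the 3-sigma threshold are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def window_totals(transactions, window_seconds=60):
    """Sum amounts per (store, tumbling-window start).

    `transactions` is an iterable of (timestamp_seconds, store, amount) tuples
    (a hypothetical record layout chosen for this sketch).
    """
    totals = defaultdict(float)
    for ts, store, amount in transactions:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        totals[(store, window_start)] += amount
    return dict(totals)


def flag_anomalies(totals, threshold=3.0, min_history=3):
    """Flag windows whose total is far from the store's earlier window totals."""
    history = defaultdict(list)  # per-store state: totals of windows seen so far
    anomalies = []
    # Process windows in time order so each one is compared only with its past.
    for (store, window_start), total in sorted(totals.items(), key=lambda item: item[0][1]):
        past = history[store]
        if len(past) >= min_history:
            mu, sigma = mean(past), pstdev(past)
            if sigma > 0 and abs(total - mu) > threshold * sigma:
                anomalies.append((store, window_start, total))
        past.append(total)
    return anomalies
```

In a streaming engine the per-store history would live in managed state rather than in a local dictionary, but the comparison logic is the same.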
TravelPulse: Real-Time Tourism Analytics Platform (GitHub Repo)
- TravelPulse is a real-time data analytics platform built with Kafka, Spark Structured Streaming, Prometheus, and Grafana, designed to simulate and monitor tourism activity across Italy. It collects live data from flights, hotel bookings, and weather sources, processes them in Spark to compute KPIs such as flight delays, booking trends, and tourism season scores, and visualizes everything through Grafana dashboards. By turning streaming data into actionable insights, TravelPulse helps city planners, airlines, and hotels make faster, data-driven decisions in the tourism ecosystem.
TOP: Turin Open Platform (GitHub Repo)
- A near real-time open data platform for Turin, integrating weather, air quality, traffic, and social sentiment into actionable KPIs for smarter urban planning and decision-making.
GDPR-Aware Reddit Data ETL Workflow Implemented with Apache Airflow and PostgreSQL (GitHub Repo)
- This project implements an automated data pipeline using Apache Airflow to extract data from Reddit, store it as CSV files, transform and combine the data, and finally load it into PostgreSQL. The workflow is designed to be GDPR-aware, with explicit handling and archiving of raw and processed data.
Computer Vision: DeepLabV3 with ResNet Backbone for Image Segmentation (GitHub Repo)
- Applied deep learning models such as DeepLabV3 with a ResNet50 backbone to extract building footprints from satellite imagery for urban planning.
Geospatial Data Preparation for Deep Learning (GitHub Repo)
- A collection of Python scripts for preprocessing and postprocessing geospatial imagery, designed to prepare satellite and aerial data for deep learning models. Includes tools for raster clipping, merging, tiling, CRS adjustment, format conversion, and vectorization of model outputs, bridging Remote Sensing and Computer Vision workflows.
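
As an illustration of the tiling step, here is a minimal sketch that computes pixel windows covering a raster, with optional overlap between neighbouring tiles. It is not taken from the repository: `tile_windows`, its parameters, and the default tile size of 512 are assumptions for this example. The tuples use the (column offset, row offset, width, height) order, which matches windowed-read APIs such as rasterio's `Window(col_off, row_off, width, height)`.

```python
def _offsets(length, tile_size, stride):
    """Start offsets along one axis; stops once a tile reaches the far edge."""
    offsets = []
    off = 0
    while True:
        offsets.append(off)
        if off + tile_size >= length:
            break
        off += stride
    return offsets


def tile_windows(width, height, tile_size=512, overlap=0):
    """Yield (col_off, row_off, tile_width, tile_height) windows covering a raster.

    Edge tiles are clipped to the raster bounds, so they may be smaller
    than `tile_size`.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if tile_size <= 0 or not 0 <= overlap < tile_size:
        raise ValueError("need tile_size > 0 and 0 <= overlap < tile_size")
    stride = tile_size - overlap
    for row_off in _offsets(height, tile_size, stride):
        for col_off in _offsets(width, tile_size, stride):
            yield (col_off, row_off,
                   min(tile_size, width - col_off),
                   min(tile_size, height - row_off))
```

Clipping edge tiles keeps every pixel covered exactly where the raster ends; a model that needs fixed-size inputs would instead pad those tiles, which is a separate design choice.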
Geostatistical Modeling and Environmental Data Analysis (GitHub Repo)
- Conducted a study of O3 density in 5 European countries using EEA data. The analysis was done in RStudio and covered distance calculations, variogram modeling (linear, spherical, Gaussian, exponential), and model comparison via cross-validation to select the best-fitting model. Kriging maps were then produced in SGeMS.
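
The four variogram families named above have simple closed forms, sketched here in Python (the study itself used R). The parameter names `nugget`, `sill` (total sill), `rng` (range), and `slope` are assumptions for this sketch. The exponential and Gaussian models use the practical-range convention, where the factor of 3 makes the curve reach about 95% of the sill at `h = rng`; other software may define the range parameter differently.

```python
import math


def linear(h, nugget, slope):
    """Linear model: grows without bound, so it has no sill."""
    return 0.0 if h <= 0 else nugget + slope * h


def spherical(h, nugget, sill, rng):
    """Spherical model: reaches the sill exactly at h = rng."""
    if h <= 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, rng):
    """Exponential model: approaches the sill asymptotically."""
    if h <= 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))


def gaussian(h, nugget, sill, rng):
    """Gaussian model: parabolic near the origin, asymptotic to the sill."""
    if h <= 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * (h / rng) ** 2))
```

Cross-validation then amounts to fitting each model to the empirical variogram, kriging each held-out station from the rest, and comparing the prediction errors across models.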
Movie Library Desktop Application (GitHub Repo)
- Movie Library is a simple desktop app to organize and track your movies. Easily add, edit, and categorize films you've watched or want to watch. Built with Python and includes an easy Windows installer for quick setup.
COVID-19 Data Web Scraping and Analysis (GitHub Repo)
- Scrapes a global COVID-19 dataset from a public Wikipedia page, then performs data analysis on the collected data.