Cyprien Kelma — Data Engineer

Final-year software engineering student at ISEN Lille, currently in a Data Engineer apprenticeship at Decathlon Digital. Building production-grade data platforms on GCP, AWS and Databricks. Available for a permanent contract from September 2026.

Main Projects

1. DroneFleet Optimizer — Real-Time Fleet Management Infrastructure

Full Website Documentation

A complete event-driven system for coordinating emergency medical delivery drone fleets in real time.

Architected a polyglot microservices system (Python FastAPI, Java Spring Boot) with Pub/Sub messaging, achieving sub-500ms latency for critical delivery operations
Modelled and solved a Vehicle Routing Problem with Pickup and Delivery (VRPPD) using Google OR-Tools, with optimistic concurrency control to handle concurrent optimization cycles safely
Deployed entirely on GCP with Terraform (Cloud Run, Firestore, Artifact Registry), with a local development environment based on Docker Compose and emulators for zero cloud cost during development
Built an ELT analytics pipeline (dbt, BigQuery) to process drone telemetry and power a real-time Streamlit monitoring dashboard
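A minimal sketch of the optimistic concurrency idea mentioned above, assuming a single shared delivery plan guarded by a version counter. Each optimization cycle reads a versioned snapshot, runs the (slow) solver without holding a lock, and commits only if no other cycle has written in the meantime, retrying otherwise. `VersionedStore`, `compare_and_set` and `run_optimization_cycle` are hypothetical names, the in-memory store stands in for whatever database the real service uses, and this is not the project's actual code.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VersionedStore:
    """Hypothetical in-memory store: every successful write bumps a version number."""
    plan: dict = field(default_factory=dict)
    version: int = 0
    _lock: threading.Lock = field(
        default_factory=threading.Lock, init=False, repr=False, compare=False
    )

    def read(self) -> tuple[dict, int]:
        # Return a copy of the current plan and the version it was read at.
        with self._lock:
            return dict(self.plan), self.version

    def compare_and_set(self, new_plan: dict, expected_version: int) -> bool:
        # Commit only if nobody else has written since `expected_version` was read.
        with self._lock:
            if self.version != expected_version:
                return False
            self.plan = new_plan
            self.version += 1
            return True


def run_optimization_cycle(
    store: VersionedStore,
    optimize: Callable[[dict], dict],
    max_retries: int = 3,
) -> bool:
    """Read a snapshot, optimize it, commit optimistically; re-read and retry on conflict."""
    for _ in range(max_retries):
        snapshot, version = store.read()
        new_plan = optimize(snapshot)  # the solver runs without holding any lock
        if store.compare_and_set(new_plan, version):
            return True
    return False


if __name__ == "__main__":
    store = VersionedStore(plan={"drone-1": []})
    snapshot, version = store.read()
    store.compare_and_set({"drone-1": ["order-42"]}, version)  # a competing cycle commits first
    print(store.compare_and_set({"drone-1": ["order-7"]}, version))  # stale version: False
```

The design choice is that a long solver run never blocks other cycles: conflicts are detected at commit time and resolved by re-reading and re-solving, which pays off when conflicts are rare compared with the cost of holding a lock for the whole solve.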

Stack: Python, Java, GCP, Terraform, Pub/Sub, Firestore, dbt, BigQuery, Docker, GitHub Actions

Repository

2. Cloud ELT Platform — Reusable Data Infrastructure Template

A production-ready ELT pipeline template deployable on any GCP account in under 20 minutes. Built as part of a Cloud Computing course (M2), graded 18/20.

Provisioned the full infrastructure with Terraform (IaC): GCS buckets for dev and prod, BigQuery datasets with lifecycle policies, IAM service accounts following least-privilege principles
Orchestrated daily ingestion and transformation workflows with Prefect Cloud, including retry logic and support for both local and cloud execution modes
Modelled data in BigQuery using dbt with a full medallion architecture (staging, intermediate, marts) and a star schema exposed to Power BI
Set up CI/CD with GitHub Actions to automatically regenerate and publish the dbt documentation site on every merge to production

Stack: GCP (GCS, BigQuery, Cloud Run), Terraform, Prefect Cloud, dbt, Python, GitHub Actions, Power BI

Repository

3. Distributed Data Architecture — Scalable Backend System

A distributed, multi-model storage architecture designed to handle TB-scale data volumes, developed as a fourth-year engineering project and graded 19/20.

Designed for horizontal scaling on Kubernetes (K3s) with Helm, spanning PostgreSQL (relational), Cassandra (high-write NoSQL), Neo4j (graph) and Redis/KeyDB (cache)
Built a daily ELT pipeline orchestrated by Airflow on Kubernetes: Bronze (raw extraction) → Silver (Parquet, cleaning) → Gold (DuckDB analytical warehouse)
Developed the backend API in NestJS, integrating each storage backend and using MinIO as an S3-compatible data lake

Stack: Kubernetes, Spark, Airflow, PostgreSQL, Cassandra, Neo4j, Redis, MinIO, DuckDB, NestJS

Repository - Concept Paper (French)

Experience

Data Engineer, Apprenticeship — Decathlon Digital, Lille (Sep 2025 – Aug 2026)

Built a self-service analytics platform on Databricks, Cloud Run and BigQuery that lets data analysts ship production-grade data apps, cutting delivery time from days to hours through AI-assisted development workflows
Drove self-service data adoption by building core dbt models and mentoring analysts on software engineering best practices (Git workflows, CI/CD, automated testing)
Industrialized an ML forecasting pipeline with MLflow and Airflow orchestration, serving 10+ sports business analysts and finance team members
Optimized production pipeline performance on Delta tables by applying auto compaction, partitioning, VACUUM, idempotent writes and incremental processing, reducing execution time by 40% and improving downstream read efficiency
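On Delta Lake, idempotent incremental loads are typically written as a `MERGE` keyed on a business key and driven by a watermark. The following is a plain-Python sketch of that idea only, not the Delta Lake API and not the pipeline's code, assuming each record carries a hypothetical `key` and `updated_at` timestamp. The load processes only records newer than the last watermark and upserts by key, so re-running a batch leaves the target unchanged.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Record:
    key: str              # hypothetical business key used for the upsert
    value: float
    updated_at: datetime  # source change timestamp, used as the watermark


def incremental_upsert(
    target: dict[str, Record],
    source: list[Record],
    watermark: Optional[datetime],
) -> Optional[datetime]:
    """Merge records newer than `watermark` into `target`, keyed by `key`.

    Upserting by key (instead of appending) means replaying the same batch
    cannot create duplicates, which is what makes the load idempotent.
    Returns the new watermark.
    """
    # Incremental: skip everything already covered by the previous run.
    new_rows = [r for r in source if watermark is None or r.updated_at > watermark]
    # Apply in timestamp order so the latest version of each key wins.
    for row in sorted(new_rows, key=lambda r: r.updated_at):
        current = target.get(row.key)
        if current is None or row.updated_at >= current.updated_at:
            target[row.key] = row
    if not new_rows:
        return watermark
    return max(r.updated_at for r in new_rows)
```

Keying the write on the business key rather than appending is the design choice that matters: a failed run can simply be retried, and a replayed batch converges to the same table state.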

Data Engineer, Internship — Decathlon Belgium, Brussels (May 2025 – Aug 2025)

Built and deployed a production-grade ETL pipeline on Databricks that ingests high-volume logistics data from a REST API into BigQuery, enforcing data integrity with Great Expectations and Pydantic across thousands of daily deliveries
Led the development of a reporting pipeline on AWS, using Airflow (MWAA) to orchestrate Docker containers on Kubernetes (EKS) that automatically generate weekly BI reports of 100+ slides, eliminating hours of manual work
Maintained and improved the PySpark transformation codebase across 5+ Databricks jobs, developed and refactored dbt models, resolved critical Airflow DAG bugs, and contributed a fix to an open-source library within a CI/CD workflow

Stack

Data Engineering: PySpark, Databricks, Delta Lake, dbt, Airflow, Prefect, ETL/ELT, Great Expectations, Pydantic
Warehouses & Storage: BigQuery, PostgreSQL, Snowflake, Cassandra, Redis, MongoDB
Cloud: GCP (BigQuery, Cloud Run, Pub/Sub, GCS, Firestore), AWS (S3, EKS, MWAA, ECR)
Infrastructure: Terraform, Docker, Kubernetes, GitHub Actions
Languages: Python, SQL, Java (Spring Boot)

Education

Master of Computer Science — ISEN Lille, French Engineering School (2021 – 2026)

Data Structures & Algorithms, Java, Distributed Systems, Machine Learning, Deep Learning, DevOps, Cloud Computing, Database Management (SQL & NoSQL), Operations Research, Metaheuristics, Spark.

Contact:

LinkedIn: cyprien-kelma
Email: cyprien.kelma@gmail.com
