Data Engineering ELT Pipeline Project


This repository contains a custom ELT (Extract, Load, Transform) pipeline that I built as part of my data engineering learning journey. The project demonstrates how to combine Docker, Docker Compose, PostgreSQL, and Python into a complete data pipeline.

Note: This project follows the excellent ELT tutorial by Justin Chau.

Prerequisite Knowledge

Before working with this ELT project, I recommend completing Docker's official getting-started materials. They provide the fundamental Docker knowledge that this project builds upon.

Project Overview

This implementation serves as a hands-on learning exercise for key data engineering technologies:

  • Containerization with Docker
  • Database operations with PostgreSQL
  • ETL/ELT processes with Python scripts
  • Orchestration with Docker Compose

The pipeline extracts data from a source database, loads it into a destination database, and prepares it for transformation.
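
As a rough illustration, here is a minimal sketch of what the core of elt_script.py might look like, assuming the script shells out to the standard pg_dump and psql client tools. The service names, database names, and credentials below are illustrative placeholders, not the actual values from this repo:

    import os
    import subprocess

    # Illustrative settings; the real values live in docker-compose.yaml.
    SOURCE = {"host": "source_postgres", "port": "5432",
              "db": "source_db", "user": "postgres", "password": "secret"}
    DEST = {"host": "destination_postgres", "port": "5432",
            "db": "destination_db", "user": "postgres", "password": "secret"}

    def run_pg_tool(tool, cfg, dump_file):
        """Run pg_dump or psql against one database, passing the password via env."""
        subprocess.run(
            [tool, "-h", cfg["host"], "-p", cfg["port"],
             "-U", cfg["user"], "-d", cfg["db"], "-f", dump_file],
            env=dict(os.environ, PGPASSWORD=cfg["password"]),
            check=True,  # raise if the tool exits non-zero
        )

    if __name__ == "__main__":
        run_pg_tool("pg_dump", SOURCE, "data_dump.sql")  # Extract: dump the source
        run_pg_tool("psql", DEST, "data_dump.sql")       # Load: replay into the destination
        print("Extract and load complete; data is staged for transformation.")

Dumping to a SQL file and replaying it keeps the extract and load steps decoupled, which is the usual shape of a simple database-to-database ELT job.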

System Architecture

The solution consists of three main components:

  1. Source Database: PostgreSQL container with sample movie rental data
  2. Destination Database: PostgreSQL container for transformed data
  3. ELT Processor: Python service that manages the data pipeline

Repository Structure

├── docker-compose.yaml     # Container orchestration configuration
├── elt_script/
│   ├── Dockerfile          # Python environment setup
│   └── elt_script.py       # Main ELT processing logic
├── source_db_init/
│   └── init.sql            # Source database schema and sample data
└── README.md               # This documentation

Key Features

  • Containerized PostgreSQL databases for source and destination
  • Automated data pipeline execution
  • Sample database schema for movie rental domain
  • Health checks and dependency management between services (see the sketch below)
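
Service startup order is handled in docker-compose.yaml, but the ELT script can also defend itself by polling the source database before it starts. A small sketch of such a wait loop, assuming the pg_isready client tool is available inside the container (the host name and port here are illustrative):

    import subprocess
    import time

    def wait_for_postgres(host="source_postgres", port="5432",
                          retries=10, delay=2):
        """Poll pg_isready until the server accepts connections or we give up."""
        for attempt in range(1, retries + 1):
            result = subprocess.run(["pg_isready", "-h", host, "-p", port])
            if result.returncode == 0:  # 0 means the server accepts connections
                print(f"{host}:{port} ready after {attempt} attempt(s)")
                return
            time.sleep(delay)
        raise RuntimeError(f"{host}:{port} not ready after {retries} attempts")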

Getting Started

Prerequisites

  • Docker Engine
  • Docker Compose
  • Git (optional)

Installation

  1. Clone this repository:
    git clone https://github.com/mkd/custom-elt-project.git
    cd custom-elt-project
    
  2. Start the containers:
    docker-compose up
    
  3. The ELT process will run automatically once services are healthy

Accessing Databases

  • Source Database: localhost:5433
  • Destination Database: localhost:5434

Use your preferred PostgreSQL client to connect (default credentials in docker-compose.yaml).
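
For a scripted sanity check from the host, something like the following works, assuming the psycopg2 driver is installed; the database name and credentials are placeholders to be replaced with the values from docker-compose.yaml:

    import psycopg2

    # Source database is published on host port 5433 (destination on 5434).
    conn = psycopg2.connect(host="localhost", port=5433, dbname="source_db",
                            user="postgres", password="secret")
    with conn.cursor() as cur:
        cur.execute("SELECT version();")  # harmless query to confirm connectivity
        print(cur.fetchone()[0])
    conn.close()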

Learning Objectives

Through this project, I aimed to gain practical experience with:

  • Docker containerization and multi-service coordination
  • Database-to-database data pipelines
  • Python scripting for ETL/ELT processes
  • Data engineering workflow orchestration

Acknowledgments

Special thanks to Justin Chau for creating the original tutorial that inspired this implementation. This project follows his instructional materials while allowing me to build the solution myself as a learning exercise.

Next Steps

Future enhancements may include:

  • Adding dbt for transformation layer
  • Integrating Airflow for workflow orchestration
  • Implementing data quality checks
  • Extending the data model
