This repository contains a custom ELT (Extract, Load, Transform) pipeline that I built as part of my data engineering learning journey. The project demonstrates how to combine Docker, PostgreSQL, and Python into a complete data pipeline.
Note: This project follows the excellent tutorial by Justin Chau.
Before working with this ELT project, I recommend completing Docker's official getting-started materials; they provide the fundamental Docker knowledge that this project builds upon.
This implementation serves as a hands-on learning exercise for key data engineering technologies:
- Containerization with Docker
- Database operations with PostgreSQL
- ETL/ELT processes with Python scripts
- Orchestration with Docker Compose
The pipeline extracts data from a source database, loads it into a destination database, and prepares it for transformation.
The solution consists of three main components:
- Source Database: PostgreSQL container with sample movie rental data
- Destination Database: PostgreSQL container that receives the loaded data for transformation
- ELT Processor: Python service that manages the data pipeline (sketched after the project layout below)
```
├── docker-compose.yaml   # Container orchestration configuration
├── elt_script/
│   ├── Dockerfile        # Python environment setup
│   └── elt_script.py     # Main ELT processing logic
├── source_db_init/
│   └── init.sql          # Source database schema and sample data
└── README.md             # This documentation
```
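A natural shape for elt_script.py, and the one this sketch assumes, is a dump-and-restore pass: wait for both databases to accept connections, dump the source with pg_dump, then replay the dump into the destination with psql. This is a condensed illustration rather than the exact script; the hostnames, database names, and credentials are placeholder assumptions that would come from docker-compose.yaml.

```python
import os
import subprocess
import time

# Placeholder connection settings -- the real values live in docker-compose.yaml.
SOURCE = {"host": "source_postgres", "port": "5432", "db": "source_db",
          "user": "postgres", "password": "secret"}
DEST = {"host": "destination_postgres", "port": "5432", "db": "destination_db",
        "user": "postgres", "password": "secret"}


def wait_for_postgres(cfg, retries=10, delay=3):
    """Poll with pg_isready until the database accepts connections."""
    for _ in range(retries):
        check = subprocess.run(
            ["pg_isready", "-h", cfg["host"], "-p", cfg["port"], "-U", cfg["user"]],
            capture_output=True,
        )
        if check.returncode == 0:
            return
        time.sleep(delay)
    raise RuntimeError(f"{cfg['host']} never became ready")


def run_elt():
    wait_for_postgres(SOURCE)
    wait_for_postgres(DEST)

    # Extract: dump the entire source database as plain SQL.
    subprocess.run(
        ["pg_dump", "-h", SOURCE["host"], "-p", SOURCE["port"],
         "-U", SOURCE["user"], "-d", SOURCE["db"], "-f", "data_dump.sql"],
        env={**os.environ, "PGPASSWORD": SOURCE["password"]},
        check=True,
    )

    # Load: replay the dump into the destination database.
    subprocess.run(
        ["psql", "-h", DEST["host"], "-p", DEST["port"],
         "-U", DEST["user"], "-d", DEST["db"], "-f", "data_dump.sql"],
        env={**os.environ, "PGPASSWORD": DEST["password"]},
        check=True,
    )


if __name__ == "__main__":
    run_elt()
```

Shelling out to pg_dump and psql keeps the load step schema-agnostic: the script never needs to know the table definitions in init.sql.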
- Containerized PostgreSQL databases for source and destination
- Automated data pipeline execution
- Sample database schema for movie rental domain
- Health checks and dependency management between services (illustrated in the compose sketch below)
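To make the health checks and service dependencies concrete, here is a minimal docker-compose.yaml sketch. The image tag, database names, and password are illustrative assumptions, not the repository's actual values; only the host ports 5433 and 5434 come from the access instructions further down.

```yaml
services:
  source_postgres:
    image: postgres:15
    ports:
      - "5433:5432"
    environment:
      POSTGRES_DB: source_db
      POSTGRES_PASSWORD: secret
    volumes:
      # Seed the source database with the sample schema and data.
      - ./source_db_init/init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5

  destination_postgres:
    image: postgres:15
    ports:
      - "5434:5432"
    environment:
      POSTGRES_DB: destination_db
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5

  elt_script:
    build: ./elt_script   # builds from elt_script/Dockerfile
    depends_on:
      source_postgres:
        condition: service_healthy
      destination_postgres:
        condition: service_healthy
```

The service_healthy conditions are what let the ELT process start automatically only after both databases pass their pg_isready checks.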
- Docker Engine
- Docker Compose
- Git (optional)
- Clone this repository:
  ```bash
  git clone <repository-url> custom-elt-project
  cd custom-elt-project
  ```
- Start the containers:
  ```bash
  docker-compose up
  ```
- The ELT process will run automatically once services are healthy
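To confirm the run completed, you can follow the pipeline container's logs (the service name elt_script matches the compose sketch above; adjust it if your docker-compose.yaml names it differently):

```bash
docker-compose logs -f elt_script
```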
- Source Database: localhost:5433
- Destination Database: localhost:5434
Use your preferred PostgreSQL client to connect (default credentials in docker-compose.yaml).
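For example, with psql (the user and database names here are the illustrative defaults from the compose sketch above; substitute the real values from docker-compose.yaml):

```bash
# Source database (published on host port 5433)
psql -h localhost -p 5433 -U postgres -d source_db

# Destination database (published on host port 5434)
psql -h localhost -p 5434 -U postgres -d destination_db
```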
Through this project, I aimed to gain practical experience with:
- Docker containerization and multi-service coordination
- Database-to-database data pipelines
- Python scripting for ETL/ELT processes
- Data engineering workflow orchestration
Special thanks to Justin Chau for creating the original tutorial that inspired this implementation. This project follows his instructional materials while allowing me to build the solution myself as a learning exercise.
Future enhancements may include:
- Adding dbt for transformation layer
- Integrating Airflow for workflow orchestration
- Implementing data quality checks
- Extending the data model