This repository contains a custom Extract, Load, Transform (ELT) project that uses Docker and PostgreSQL to demonstrate a simple ELT process.
- docker-compose.yaml: This file contains the configuration for Docker Compose, which is used to orchestrate multiple Docker containers. It defines three services:
  - source_postgres: The source PostgreSQL database.
  - destination_postgres: The destination PostgreSQL database.
  - elt_script: The service that runs the ELT script.
- elt_script/Dockerfile: This Dockerfile sets up a Python environment and installs the PostgreSQL client. It also copies the ELT script into the container and sets it as the default command.
- elt_script/elt_script.py: This Python script performs the ELT process. It waits for the source PostgreSQL database to become available, then dumps its data to a SQL file and loads this data into the destination PostgreSQL database.
- source_db_init/init.sql: This SQL script initialises the source database with sample data. It creates tables for users, films, film categories, actors, and film actors, and inserts sample data into these tables.
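
The waiting step described for elt_script.py can be sketched as a retry loop. This is a minimal illustration, not the repository's actual code: it assumes the `pg_isready` utility from the PostgreSQL client is on the container's PATH, and the function names, retry count, and delay are hypothetical.

```python
import subprocess
import time


def postgres_is_ready(host, port=5432):
    """Return True if pg_isready reports that the server accepts connections."""
    result = subprocess.run(
        ["pg_isready", "-h", host, "-p", str(port)],
        capture_output=True,
        text=True,
    )
    # pg_isready exits with status 0 only when the server accepts connections.
    return result.returncode == 0


def wait_for_postgres(host, max_retries=5, delay_seconds=5.0,
                      is_ready=postgres_is_ready):
    """Poll the server until it is ready or the retries run out.

    The retry count and delay are assumed defaults, not values from the repo.
    `is_ready` is injectable so the loop can be exercised without a database.
    """
    for attempt in range(1, max_retries + 1):
        if is_ready(host):
            return True
        print(f"Attempt {attempt}/{max_retries}: {host} is not ready yet")
        if attempt < max_retries:
            time.sleep(delay_seconds)
    return False
```

Inside the elt_script container, a call such as `wait_for_postgres("source_postgres")` would block until the source service answers or the retries are exhausted.
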
- Docker Compose: Using the docker-compose.yaml file, three Docker containers are spun up:
  - A source PostgreSQL database with sample data.
  - A destination PostgreSQL database.
  - A Python environment that runs the ELT script.
- ELT Process: The elt_script.py script waits for the source PostgreSQL database to become available. Once it is available, the script uses pg_dump to dump the source database to a SQL file. Then, it uses psql to load this SQL file into the destination PostgreSQL database.
- Database Initialisation: The init.sql script initialises the source database with sample data. It creates several tables and populates them with sample data.
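
The dump-and-load step can be sketched as two subprocess calls. This is an illustration under stated assumptions rather than the repository's exact script: the database names, users, passwords, and dump file name below are hypothetical placeholders, and only the host names come from the service names in docker-compose.yaml.

```python
import os
import subprocess

# Hypothetical connection settings; only the host names match the
# service names described above.
SOURCE = {"host": "source_postgres", "dbname": "source_db",
          "user": "postgres", "password": "secret"}
DESTINATION = {"host": "destination_postgres", "dbname": "destination_db",
               "user": "postgres", "password": "secret"}


def build_dump_command(config, dump_file):
    """Build a pg_dump call that writes a plain SQL dump to dump_file."""
    return ["pg_dump", "-h", config["host"], "-U", config["user"],
            "-d", config["dbname"], "-f", dump_file, "-w"]


def build_load_command(config, dump_file):
    """Build a psql call that replays dump_file, stopping on the first error."""
    return ["psql", "-h", config["host"], "-U", config["user"],
            "-d", config["dbname"], "-v", "ON_ERROR_STOP=1", "-f", dump_file]


def run_elt(source, destination, dump_file="data_dump.sql",
            runner=subprocess.run):
    """Dump the source database, then load the dump into the destination."""
    # PGPASSWORD lets pg_dump and psql authenticate without a prompt.
    dump_env = dict(os.environ, PGPASSWORD=source["password"])
    runner(build_dump_command(source, dump_file), env=dump_env, check=True)

    load_env = dict(os.environ, PGPASSWORD=destination["password"])
    runner(build_load_command(destination, dump_file), env=load_env, check=True)
```

Passing `check=True` makes a failed dump raise `subprocess.CalledProcessError` before the load is attempted, so a partial or missing dump file is never replayed into the destination.
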
- Ensure you have Docker and Docker Compose installed on your machine.
- Clone this repository.
- Navigate to the repository directory and run docker-compose up.
- Once all containers are up and running, the ELT process will start automatically.
- After the ELT process completes, you can access the source and destination PostgreSQL databases on ports 5433 and 5434, respectively.
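
One way to check the result is to compare row counts on both ports from the host machine. The sketch below assumes psql is installed on the host and that a table named `users` exists; the database names, user, and password are hypothetical placeholders, so substitute the values from your docker-compose.yaml.

```python
import os
import subprocess


def build_count_command(port, dbname, user, table, host="localhost"):
    """Build a psql call that prints only the row count of one table."""
    # The table name is interpolated directly, so pass trusted names only.
    query = f"SELECT count(*) FROM {table};"
    return ["psql", "-h", host, "-p", str(port), "-U", user, "-d", dbname,
            "-t", "-A", "-c", query]


def row_count(port, dbname, user, password, table, runner=subprocess.run):
    """Run the count query and return the result as an integer."""
    env = dict(os.environ, PGPASSWORD=password)
    result = runner(build_count_command(port, dbname, user, table),
                    env=env, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def counts_match(table, source_db, destination_db, user, password,
                 runner=subprocess.run):
    """Compare one table's row count on ports 5433 (source) and 5434 (destination)."""
    source = row_count(5433, source_db, user, password, table, runner)
    destination = row_count(5434, destination_db, user, password, table, runner)
    return source == destination
```

For example, `counts_match("users", "source_db", "destination_db", "postgres", "secret")` would return `True` when both databases hold the same number of rows in that table.
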