Data Warehouse and ETL Process for Olympic and Economic Data Analysis

Overview

This project is designed to demonstrate the complete process of building a Data Warehouse, which involves several key stages:

Data Collection and Settlement (OLTP - Online Transaction Processing): This involves gathering and storing raw data, typically in an operational database.
Data Transformation and Integration (ETL - Extract, Transform, Load): The raw data is transformed and integrated into a unified format, making it suitable for analysis.
Data Warehouse Design and Implementation (OLAP - Online Analytical Processing): The data warehouse is designed, usually before the ETL process, to support complex queries and reporting.
Data Analysis and Visualisation (BI - Business Intelligence): The final stage aims to provide actionable insights, answer business questions, and support decision-making processes.

To illustrate these concepts, we use the fictitious Adventure Works Cycles database, representing a multinational manufacturing company that sells bicycles and accessories.

Environment Setup

Step 1: Cloning the Repository

To begin, clone the project repository:

git clone https://github.com/PascalSun/DW_2024.git

Familiarity with Git and GitHub is essential for software engineers. If you’re new to Git, here are some resources to get started:

Additionally, if you do not have a code editor, we recommend installing VS Code.

Step 2: Database Environment Setup

Ensure that Docker and Docker Compose are installed on your machine. These tools are necessary to set up the required databases and Jupyter Lab environment.

Start the services with the following command:

docker compose up -d

This will start the containers necessary for the project. You should have four running containers: pgdb, pgadmin, sqlserver, and jupyterlab. You can check this with:

docker ps

If any container is missing, check the logs for that container using:

docker logs -f <container_name>

Debug Hint

To confirm that all four containers are running, use:

docker ps

You should see four containers: pgdb, pgadmin, sqlserver, and jupyterlab. If one is missing, check the logs for that container:

docker logs -f <container_name>

Special Instructions for Windows Users

If you run the docker compose up command in PowerShell or the terminal provided by Windows, you might encounter issues with the sqlserver container due to line ending conflicts between Unix-based and Windows-based systems. To resolve this:

Open the project in VS Code.
Navigate to OLTP/sqlserver/install.sh and OLTP/sqlserver/startup.sh.
Change the line endings from CRLF to LF (in the bottom-right corner of VS Code).
Save the files and run:
```
docker compose up --build -d
```

Special Instructions for Mac M1/M2/M3 Users

For users with ARM architecture, the sqlserver and jupyterlab containers may not start properly due to the mcr.microsoft.com/mssql/server image not supporting ARM architecture. To resolve this:

Enable Use Rosetta for x86_64/amd64 emulation on Apple Silicon in Docker settings:
- Open Docker Desktop settings.
- Go to the General tab.
- Check the Use Rosetta for x86_64/amd64 emulation on Apple Silicon option.
- Click Apply & Restart.
Clean the images and run:
```
docker compose up --build -d
```

Step 3: Local Python Development Environment Setup

While the project provides a jupyterlab container within Docker for development, you may also choose to set up a local Python development environment:

Install Graphviz and its Python package for drawing ER diagrams:
- On macOS: brew install graphviz
- Install the Python package: pip install graphviz

Create a virtual environment with Python 3.9 or higher:

Using venv:

python3 -m venv venv
source venv/bin/activate

Using conda:

conda create -n dw python=3.9
conda activate dw

Install the project dependencies:
```
pip install -r requirements.txt
```
Run Jupyter Lab in the project directory:
```
jupyter lab
```

Step 4: Explore and Understand

This project encourages you to explore both the Docker and local development environments, understand their setups, and be prepared to discuss the advantages and disadvantages of each approach. This exploration is valuable for understanding the full lifecycle of a data warehouse project, from data collection to analysis and visualisation.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Final Submission		Final Submission
OLAP/pgdb		OLAP/pgdb
OLTP		OLTP
Project 1		Project 1
Week 3 demo		Week 3 demo
data		data
imgs		imgs
notebooks		notebooks
.gitattributes		.gitattributes
.gitignore		.gitignore
Dockerfile_JupyterLab		Dockerfile_JupyterLab
Dockerfile_MS		Dockerfile_MS
Dockerfile_PG		Dockerfile_PG
Final Submission.zip		Final Submission.zip
OLAP.ipynb		OLAP.ipynb
README.md		README.md
docker-compose.yml		docker-compose.yml
local_install.sql		local_install.sql
local_installDW.sql		local_installDW.sql
local_installdemo.sql		local_installdemo.sql
products.csv		products.csv
requirements.txt		requirements.txt
sales.csv		sales.csv
shops.csv		shops.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Warehouse and ETL Process for Olympic and Economic Data Analysis

Overview

Environment Setup

Step 1: Cloning the Repository

Step 2: Database Environment Setup

Debug Hint

Special Instructions for Windows Users

Special Instructions for Mac M1/M2/M3 Users

Step 3: Local Python Development Environment Setup

Step 4: Explore and Understand

About

Releases

Packages

Languages

iheathers/Olympic-Economic-Data-Warehouse-ETL-Analysis

Folders and files

Latest commit

History

Repository files navigation

Data Warehouse and ETL Process for Olympic and Economic Data Analysis

Overview

Environment Setup

Step 1: Cloning the Repository

Step 2: Database Environment Setup

Debug Hint

Special Instructions for Windows Users

Special Instructions for Mac M1/M2/M3 Users

Step 3: Local Python Development Environment Setup

Step 4: Explore and Understand

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages