This project serves as a comprehensive guide to building an end-to-end data engineering pipeline. It covers each stage from data ingestion to processing and finally to storage, utilizing a robust tech stack that includes Python, Apache Kafka, Apache Zookeeper, and Apache Spark. Everything is containerized using Docker for ease of deployment and scalability.
The project is designed with the following components:
- Data Source: We use the OpenWeather API to fetch weather data for our pipeline (see the producer sketch below this list).
- Apache Kafka and Zookeeper: Used for streaming data to the processing engine.
- Control Center and Schema Registry: Help with monitoring and schema management of our Kafka streams.
- Apache Spark: For data processing with its master and worker nodes.
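As a rough sketch of the ingestion step (not the exact code in the notebook), the snippet below polls the OpenWeather current-weather endpoint and publishes each response to a Kafka topic using the kafka-python client. The topic name `weather`, the broker address `localhost:9092`, and the city list are illustrative assumptions; the actual values live in the project's notebooks.

```python
import json
import os
import time

import requests
from kafka import KafkaProducer  # pip install kafka-python

API_KEY = os.environ["OPENWEATHER_API_KEY"]  # your OpenWeather API key
URL = "https://api.openweathermap.org/data/2.5/weather"
CITIES = ["London", "Hanoi", "New York"]     # assumed city list for illustration

# Assumed broker address and topic name; match these to docker-compose.yml.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    for city in CITIES:
        # Fetch the current weather for one city in metric units.
        resp = requests.get(URL, params={"q": city, "appid": API_KEY, "units": "metric"})
        resp.raise_for_status()
        producer.send("weather", resp.json())
    producer.flush()   # make sure the batch reaches the broker
    time.sleep(60)     # poll once a minute
```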
Technologies used:

- Python
- Apache Kafka
- Apache Zookeeper
- Apache Spark
- Docker
- Run Docker Compose to spin up the services:

  ```bash
  docker-compose up
  ```
- Run the data pipeline via the Jupyter notebook `OpenWeatherAPI_Kafka.ipynb`.
- Two alternative data pipelines, built with Spark and Pandas, are provided in `OpenWeatherAPI_Spark.ipynb` and `OpenWeatherAPI_Pandas.ipynb`; a Spark consumer sketch follows this list.
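For orientation, here is a minimal sketch of what the Spark side of the pipeline might look like: a Structured Streaming job that subscribes to the Kafka topic and parses the JSON payload. The topic name `weather`, the broker address, the connector package version, and the two example fields are assumptions; the actual schema and processing logic are in `OpenWeatherAPI_Spark.ipynb`.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# The Kafka connector is pulled in via the spark-sql-kafka package;
# the version below assumes Spark 3.x with Scala 2.12.
spark = (
    SparkSession.builder
    .appName("OpenWeatherKafkaConsumer")
    .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0")
    .getOrCreate()
)

# Assumed subset of the OpenWeather payload; extend to match the real JSON.
schema = StructType([
    StructField("name", StringType()),              # city name
    StructField("main", StructType([
        StructField("temp", DoubleType()),          # temperature
        StructField("humidity", DoubleType()),      # humidity %
    ])),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
    .option("subscribe", "weather")                       # assumed topic
    .load()
)

# Kafka values arrive as bytes; decode and flatten the JSON.
weather = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("w"))
       .select("w.name", "w.main.temp", "w.main.humidity")
)

# Write the parsed stream to the console for a quick smoke test.
query = weather.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```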