
DataPipeline-Kafka-OpenWeatherAPI

Realtime Data Streaming Data Engineering Project | OpenWeather API | Kafka + Spark

Table of Contents

  • Introduction
  • System Architecture
  • Technologies
  • Getting Started

Introduction

This project serves as a comprehensive guide to building an end-to-end data engineering pipeline. It covers each stage from data ingestion to processing and finally to storage, utilizing a robust tech stack that includes Python, Apache Kafka, Apache Zookeeper, and Apache Spark. Everything is containerized using Docker for ease of deployment and scalability.

System Architecture

(Architecture diagram: OpenWeatherAPI-kafka)

The project is designed with the following components:

  • Data Source: The OpenWeather API supplies the weather data that feeds the pipeline.
  • Apache Kafka and Zookeeper: Used for streaming data to the processing engine.
  • Control Center and Schema Registry: Helps in monitoring and schema management of our Kafka streams.
  • Apache Spark: For data processing with its master and worker nodes.
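The ingestion step described above — pulling weather data from the OpenWeather API and streaming it into Kafka — can be sketched roughly as follows. The field names come from OpenWeather's documented current-weather response; the topic name `weather`, the broker address, and the `kafka-python` package are assumptions, not taken from this repository:

```python
import json

def flatten_weather(raw: dict) -> dict:
    """Flatten an OpenWeather current-weather JSON payload into a
    Kafka-ready record (field names follow OpenWeather's response format)."""
    return {
        "city": raw["name"],
        "description": raw["weather"][0]["description"],
        "temperature": raw["main"]["temp"],   # Kelvin by default
        "humidity": raw["main"]["humidity"],
        "timestamp": raw["dt"],
    }

# Producing to Kafka would then look roughly like this (assumes the
# kafka-python package and a broker on localhost:9092, as in a typical
# docker-compose setup -- adjust to your environment):
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(
#       bootstrap_servers="localhost:9092",
#       value_serializer=lambda v: json.dumps(v).encode("utf-8"),
#   )
#   producer.send("weather", flatten_weather(response.json()))

if __name__ == "__main__":
    sample = {
        "name": "London",
        "weather": [{"description": "light rain"}],
        "main": {"temp": 283.15, "humidity": 81},
        "dt": 1700000000,
    }
    print(json.dumps(flatten_weather(sample)))
```

Flattening before producing keeps each Kafka message a small, schema-friendly record, which also makes Schema Registry integration straightforward.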

Technologies

  • Python
  • Apache Kafka
  • Apache Zookeeper
  • Apache Spark
  • Docker

Getting Started

  1. Run Docker Compose to spin up the services:

    docker-compose up
  2. Run the main data pipeline via the Jupyter notebook OpenWeatherAPI_Kafka.ipynb.

  3. Two alternative pipelines, built with Spark and Pandas respectively, are in OpenWeatherAPI_Spark.ipynb and OpenWeatherAPI_Pandas.ipynb.
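Downstream of Kafka, the processing notebooks consume the streamed records and aggregate them. A minimal consumer-side sketch of that kind of aggregation — the record fields and topic name here are illustrative, not taken from the notebooks:

```python
from statistics import mean

def summarize(records: list[dict]) -> dict:
    """Average temperature per city over a batch of flattened weather
    records, the kind of aggregation a Spark or Pandas pipeline might
    run over messages from the weather topic."""
    by_city: dict[str, list[float]] = {}
    for rec in records:
        by_city.setdefault(rec["city"], []).append(rec["temperature"])
    return {city: mean(temps) for city, temps in by_city.items()}

# Reading the batch from Kafka would look roughly like this
# (kafka-python assumed):
#
#   import json
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer("weather", bootstrap_servers="localhost:9092",
#                            value_deserializer=lambda v: json.loads(v))
#   batch = [msg.value for msg in consumer]

if __name__ == "__main__":
    batch = [
        {"city": "London", "temperature": 283.0},
        {"city": "London", "temperature": 285.0},
        {"city": "Paris", "temperature": 290.0},
    ]
    print(summarize(batch))  # {'London': 284.0, 'Paris': 290.0}
```

In the Spark notebook the same grouping would map onto `groupBy("city").avg("temperature")` over a DataFrame; the sketch above shows the logic with the standard library only.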