🚄 IRCTC Real-Time Data Pipeline using Google Cloud (GCP)

📢 Overview

The IRCTC Real-Time Data Pipeline is a cloud-based data processing system that ingests, transforms, and stores streaming passenger data modeled on IRCTC (Indian Railway Catering and Tourism Corporation). The input is simulated mock data rather than a live IRCTC feed. The project uses Google Cloud Platform (GCP) services: Pub/Sub for ingestion, Dataflow (Apache Beam) for processing, BigQuery for storage and analysis, and Cloud Storage for UDF files.

📊 Project Flowchart

The pipeline flowchart is provided as the image IRCTC Flowchart.png in the repository.

📁 Architecture Overview

🔹 Data Flow Pipeline

Data Ingestion: Simulated IRCTC Mock Data is published to Google Pub/Sub.
Data Processing: A Dataflow pipeline (Apache Beam) reads messages from Pub/Sub and applies Python UDFs to transform them, with error handling for fault tolerance.
Data Storage: The transformed data is stored in Google BigQuery for analytics.
UDF Registration: User-defined functions (transform_udf.py) are stored in Google Cloud Storage and registered from there to BigQuery.
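
The steps above can be sketched in plain Python, without any GCP dependency, to show the shape of the data at each stage. This is a minimal illustration and not the code in dataflow_pipeline.py or transform_udf.py: the function names, the 500-point loyalty threshold, the status labels, and the dead-letter list are all assumptions made for the example. In the real pipeline, the encoded bytes would be published to a Pub/Sub topic and the transform would run inside an Apache Beam step.

```python
import json
import random
from datetime import date, datetime, timezone


def make_mock_record(seq: int, today: date) -> dict:
    """Build one simulated passenger record (field names follow the BigQuery schema)."""
    return {
        "row_key": f"row-{seq}",
        "name": f"Passenger {seq}",
        "age": random.randint(18, 80),
        "email": f"passenger{seq}@example.com",
        "join_date": date(today.year - 1, 1, 1).isoformat(),
        "last_login": datetime.now(timezone.utc).isoformat(),
        "loyalty_points": random.randint(0, 1000),
        "account_balance": round(random.uniform(0, 5000), 2),
        "is_active": random.choice([True, False]),
    }


def encode_message(record: dict) -> bytes:
    """Pub/Sub payloads are bytes, so serialize the record as UTF-8 JSON."""
    return json.dumps(record).encode("utf-8")


def transform(record: dict, today: date) -> dict:
    """Add the derived columns. The threshold and labels are hypothetical."""
    out = dict(record)
    joined = date.fromisoformat(record["join_date"])
    out["account_age_days"] = (today - joined).days
    out["loyalty_status"] = "Platinum" if record["loyalty_points"] > 500 else "Standard"
    now = datetime.now(timezone.utc).isoformat()
    out["inserted_at"] = now
    out["updated_at"] = now
    return out


def process(messages, today: date):
    """Parse and transform each message; route failures to a dead-letter list."""
    good, dead = [], []
    for payload in messages:
        try:
            record = json.loads(payload.decode("utf-8"))
            good.append(transform(record, today))
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, missing fields, or wrong types must not stop the stream.
            dead.append({"payload": payload, "error": repr(exc)})
    return good, dead
```

Keeping failed messages in a separate collection, instead of raising, is one common way to get the fault tolerance mentioned above: a single bad message is set aside for inspection while the rest of the stream continues.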

⚙️ Tech Stack

Google Cloud Pub/Sub → Real-time message streaming
Google Dataflow (Apache Beam) → Data processing and transformation
Google BigQuery → Data warehouse for analytics
Google Cloud Storage → Stores UDF files
Python → Apache Beam pipeline & UDF implementation
SQL → Data transformation & querying in BigQuery
Terraform (Optional) → Infrastructure as Code (IaC) for GCP setup

🚀 Features

✔️ Real-time data ingestion using Pub/Sub
✔️ Serverless & scalable processing via Dataflow
✔️ Custom transformations using Python UDFs
✔️ Fault tolerance & error handling
✔️ Data warehousing for analytics using BigQuery
✔️ Optimized SQL queries for analysis and reporting

🗄️ BigQuery Schema

| Column Name | Data Type | Description |
| --- | --- | --- |
| `row_key` | STRING | Unique identifier for each record |
| `name` | STRING | Passenger's name |
| `age` | INT64 | Passenger's age |
| `email` | STRING | Passenger's email address |
| `join_date` | DATE | Date when the passenger joined |
| `last_login` | TIMESTAMP | Last login timestamp |
| `loyalty_points` | INT64 | Loyalty points earned |
| `account_balance` | FLOAT64 | Account balance in INR |
| `is_active` | BOOL | Indicates if the account is active |
| `inserted_at` | TIMESTAMP | Timestamp when the record was inserted |
| `updated_at` | TIMESTAMP | Last updated timestamp |
| `loyalty_status` | STRING | Loyalty membership status |
| `account_age_days` | INT64 | Total days since account creation |
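
As an illustration of how this schema could be kept in one place and reused, the sketch below holds the columns as Python data and derives a BigQuery `CREATE TABLE` statement from them. It is not the content of bigquery_create_table.sql; the helper names and the table identifier passed in are hypothetical.

```python
# Columns copied from the schema table: (name, BigQuery type, description).
SCHEMA = [
    ("row_key", "STRING", "Unique identifier for each record"),
    ("name", "STRING", "Passenger's name"),
    ("age", "INT64", "Passenger's age"),
    ("email", "STRING", "Passenger's email address"),
    ("join_date", "DATE", "Date when the passenger joined"),
    ("last_login", "TIMESTAMP", "Last login timestamp"),
    ("loyalty_points", "INT64", "Loyalty points earned"),
    ("account_balance", "FLOAT64", "Account balance in INR"),
    ("is_active", "BOOL", "Indicates if the account is active"),
    ("inserted_at", "TIMESTAMP", "Timestamp when the record was inserted"),
    ("updated_at", "TIMESTAMP", "Last updated timestamp"),
    ("loyalty_status", "STRING", "Loyalty membership status"),
    ("account_age_days", "INT64", "Total days since account creation"),
]


def build_create_table(table: str) -> str:
    """Render a CREATE TABLE statement for the given fully qualified table name."""
    columns = ",\n".join(
        f'  {name} {dtype} OPTIONS(description="{desc}")'
        for name, dtype, desc in SCHEMA
    )
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n{columns}\n);"


def missing_columns(record: dict) -> list:
    """List schema columns absent from a record, e.g. before loading it."""
    return [name for name, _, _ in SCHEMA if name not in record]


if __name__ == "__main__":
    # The project, dataset, and table names here are placeholders.
    print(build_create_table("my_project.irctc.passengers"))
```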

🎯 Use Cases

📊 Passenger Behavior Analysis: Using real-time & historical data to understand customer trends.
🎁 Loyalty Program Management: Enhancing customer engagement through data-driven rewards.
🔍 Operational Monitoring: Identifying active/inactive users for improved service efficiency.
📈 Trend Analysis: Leveraging BigQuery for actionable business insights.
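
To make the loyalty and monitoring use cases concrete, the sketch below computes, for a handful of in-memory records, the kind of per-status summary that a BigQuery `GROUP BY loyalty_status` query would return. It is an assumption-laden illustration, not one of the queries in irctc_queries.sql: the `summarize` function and the status labels in the sample data are hypothetical.

```python
from collections import defaultdict


def summarize(records):
    """Per loyalty status: passenger count, active/inactive split, mean points."""
    totals = defaultdict(lambda: {"passengers": 0, "active": 0, "points": 0})
    for rec in records:
        bucket = totals[rec["loyalty_status"]]
        bucket["passengers"] += 1
        bucket["active"] += int(rec["is_active"])
        bucket["points"] += rec["loyalty_points"]
    return {
        status: {
            "passengers": t["passengers"],
            "active": t["active"],
            "inactive": t["passengers"] - t["active"],
            "avg_points": t["points"] / t["passengers"],
        }
        for status, t in totals.items()
    }


if __name__ == "__main__":
    # Hypothetical sample rows using column names from the schema.
    sample = [
        {"loyalty_status": "Platinum", "is_active": True, "loyalty_points": 900},
        {"loyalty_status": "Platinum", "is_active": False, "loyalty_points": 700},
        {"loyalty_status": "Standard", "is_active": True, "loyalty_points": 100},
    ]
    for status, stats in summarize(sample).items():
        print(status, stats)
```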

📝 Future Enhancements

✅ Integrate Cloud Functions for event-driven triggers.
✅ Implement Dataflow Streaming Mode for real-time analytics.
✅ Optimize BigQuery Queries to enhance cost efficiency and performance.

👨‍💻 Author

Sujit Mahapatra
📧 Email | 🔗 LinkedIn

⭐ Contribute

Contributions are welcome! If you’d like to improve the project, feel free to fork the repository and submit a pull request.
