
[DMP 2025]: Scalable, Decoupled Data Ingestion Framework for Real-Time and Batch Integration #544

@tanishk2907


Ticket Contents

Description

Project Context
MedPlat processes high volumes of data originating from a variety of sources—mobile apps, web portals, legacy systems, and third-party APIs. These data streams come in both real-time and batch formats, leading to growing challenges in integration, consistency, and timely availability for analytics and decision-making.

Currently, MedPlat uses tightly coupled, manually managed data integrations. This creates:

  • High maintenance overhead
  • Delays in data availability for dashboards and alerts
  • Inflexibility when onboarding new systems or supporting real-time use cases

Objective
Design and implement a scalable, loosely coupled data ingestion framework that can efficiently integrate heterogeneous sources and deliver consistent, timely data to storage and analytics pipelines.

The solution should be resilient, modular, and extensible, with support for:

  • Real-time ingestion via Kafka (or alternatives)
  • Batch ingestion via APIs or file uploads
  • A pluggable, contract-driven interface for onboarding sources (see the sketch after this list)
  • Observability and fault tolerance
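
As a rough illustration of the pluggable, contract-driven interface above, a minimal connector abstraction in Python (one of the candidate languages under Implementation Details) might look like this; `SourceConnector`, `Record`, and `CsvFileSource` are hypothetical names, not existing MedPlat code:

```python
# Hypothetical sketch of a pluggable source interface; all names are illustrative.
import csv
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Record:
    """One ingested datum plus the contract version it claims to conform to."""
    source: str
    schema_version: str
    payload: dict


class SourceConnector(ABC):
    """Contract every source (real-time or batch) implements to be onboarded."""

    @abstractmethod
    def fetch(self) -> Iterator[Record]:
        """Yield records; a streaming source blocks, a batch source is finite."""


class CsvFileSource(SourceConnector):
    """Example batch connector: reads rows from an uploaded CSV file."""

    def __init__(self, path: str):
        self.path = path

    def fetch(self) -> Iterator[Record]:
        with open(self.path, newline="") as f:
            for row in csv.DictReader(f):
                yield Record(source="csv-upload", schema_version="1.0", payload=row)
```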

Goals & Mid-Point Milestone

Goals

  • Provide a unified data ingestion layer capable of handling both streaming and batch data
  • Reduce system complexity and eliminate hardcoded integrations via a plug-and-play architecture
  • Centralize data flow through a message queue or dispatcher service (see the sketch after this list)
  • Enable clean data routing to downstream storage, analytics engines, and alerting systems
  • Design system with fault tolerance, retry logic, and monitoring
  • Create documentation for extending the pipeline to new modules or partner systems
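
A minimal sketch of the dispatcher-with-retries idea from the goals above, assuming Python and in-process sinks; the class name, retry count, and backoff policy are placeholder choices, not a committed design:

```python
# Illustrative dispatcher with simple retry; all names are assumptions, not MedPlat code.
import logging
import time
from typing import Callable, Dict, List

logger = logging.getLogger("ingest.dispatcher")

Sink = Callable[[dict], None]  # a sink is any callable that persists one record


class Dispatcher:
    """Routes each record to every sink registered for its source, with retries."""

    def __init__(self, max_retries: int = 3, backoff_s: float = 1.0):
        self.routes: Dict[str, List[Sink]] = {}
        self.max_retries = max_retries
        self.backoff_s = backoff_s

    def register(self, source: str, sink: Sink) -> None:
        self.routes.setdefault(source, []).append(sink)

    def dispatch(self, source: str, record: dict) -> None:
        for sink in self.routes.get(source, []):
            for attempt in range(1, self.max_retries + 1):
                try:
                    sink(record)
                    break
                except Exception:
                    logger.warning("sink failed (attempt %d/%d)", attempt, self.max_retries)
                    time.sleep(self.backoff_s * attempt)  # linear backoff
            else:
                logger.error("record from %s dropped after all retries", source)
```

In practice the retry and dead-letter behaviour would likely be pushed into the broker itself (e.g., consumer offsets and dead-letter topics in Kafka) rather than application code; the sketch only shows the routing shape.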

Setup/Installation

No response

Expected Outcome

  • A centralized, scalable ingestion system that improves reliability, onboarding speed, and downstream analytics
  • Near real-time or scheduled availability of data for dashboards, alerts, and models
  • Reduced manual effort in maintaining integrations
  • Well-documented architecture and data contracts to guide future expansion

Acceptance Criteria

✅ Ingestion layer supports at least 2 batch and 2 real-time sources
✅ Plug-and-play design allows new sources/sinks to be configured easily (example configuration shown below)
✅ Messages/data can be routed to multiple downstream systems (e.g., PostgreSQL, ElasticSearch, S3)
✅ Retry mechanisms, logging, and status monitoring are in place
✅ Architecture diagrams, setup scripts, and integration guides are included in the repo
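
One possible shape for the plug-and-play routing configuration, written as a Python dict purely for illustration; the sink names come from the criteria above, while the source names and structure are assumptions:

```python
# Hypothetical routing table; source names and structure are invented for the example.
ROUTING = {
    "mobile-app-events": {      # real-time source (e.g., a Kafka topic)
        "mode": "stream",
        "sinks": ["postgresql", "elasticsearch"],
    },
    "partner-csv-upload": {     # batch source (file upload)
        "mode": "batch",
        "sinks": ["s3", "postgresql"],
    },
}
```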

Implementation Details

Event Broker (Real-time): Apache Kafka / Apache Pulsar / RabbitMQ
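
If Kafka is chosen, a minimal real-time consumer sketch using the kafka-python client could look like the following; the topic name, broker address, and group id are placeholders:

```python
# Minimal consumer sketch (pip install kafka-python); all identifiers are placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "medplat.ingest.events",                 # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="ingestion-framework",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # hand off to contract validation and the dispatcher here
    print(record)
```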

Batch Ingestion: Scheduled pull via API, FTP, or manual CSV ingest via CLI
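
The manual CSV ingest via CLI could start as small as this standard-library sketch; the flag names and source-naming convention are illustrative:

```python
# CSV ingest CLI sketch using only the standard library; flags are illustrative.
import argparse
import csv


def main() -> None:
    parser = argparse.ArgumentParser(description="Ingest a CSV file as one batch")
    parser.add_argument("path", help="CSV file to ingest")
    parser.add_argument("--source", default="manual-upload", help="logical source name")
    args = parser.parse_args()

    with open(args.path, newline="") as f:
        rows = list(csv.DictReader(f))
    # in the real framework each row would pass through contract validation
    # and the dispatcher; here we only report the batch size
    print(f"read {len(rows)} rows from {args.path} for source {args.source}")


if __name__ == "__main__":
    main()
```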

Language: Python / Node.js / Golang

Data Schema: JSON, Avro, or Parquet with versioned contract registry
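
A versioned contract registry could begin as a simple mapping from (source, version) to required fields before graduating to a full schema registry; the `vitals` contract below is entirely hypothetical:

```python
# Illustrative in-memory contract registry; sources, versions, and fields are invented.
CONTRACTS = {
    ("vitals", "1.0"): {"required": ["patient_id", "recorded_at", "heart_rate"]},
    ("vitals", "1.1"): {"required": ["patient_id", "recorded_at", "heart_rate", "spo2"]},
}


def validate(source: str, version: str, payload: dict) -> None:
    """Reject payloads missing fields required by their declared contract version."""
    contract = CONTRACTS.get((source, version))
    if contract is None:
        raise KeyError(f"no contract registered for {source} v{version}")
    missing = [f for f in contract["required"] if f not in payload]
    if missing:
        raise ValueError(f"payload missing required fields: {missing}")
```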

Observability: Logs via ELK/FluentD, Prometheus metrics for system health
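
On the Prometheus side, instrumenting the pipeline with the official prometheus_client package might look like this; the metric names and scrape port are placeholders:

```python
# Metrics sketch (pip install prometheus-client); names and port are placeholders.
from prometheus_client import Counter, Histogram, start_http_server

RECORDS_IN = Counter("ingest_records_total", "Records received", ["source"])
FAILURES = Counter("ingest_failures_total", "Records failing all retries", ["source"])
LATENCY = Histogram("ingest_dispatch_seconds", "Time to route one record")

start_http_server(8000)  # expose /metrics for Prometheus to scrape

with LATENCY.time():                                   # time one dispatch
    RECORDS_IN.labels(source="mobile-app-events").inc()
```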

Scalability: Modular microservices or container-based deployment (Docker + Kubernetes optional)

Mockups/Wireframes

Flow diagrams and data contracts to be created collaboratively with MedPlat data engineering team.

Product Name

Unified Data Ingestion Framework

Organisation Name

Bandhu

Domain

Healthcare

Tech Skills Needed

Angular

Mentor(s)

@mvadodariya

Category

Backend
