Ticket Contents
Description
Project Context
MedPlat processes high volumes of data originating from a variety of sources—mobile apps, web portals, legacy systems, and third-party APIs. These data streams come in both real-time and batch formats, leading to growing challenges in integration, consistency, and timely availability for analytics and decision-making.
Currently, MedPlat uses tightly coupled, manually managed data integrations. This creates:
- High maintenance overhead
- Delays in data availability for dashboards and alerts
- Inflexibility to onboard new systems or support real-time use cases
Objective
Design and implement a scalable, loosely coupled data ingestion framework that can efficiently integrate heterogeneous sources and deliver consistent, timely data to storage and analytics pipelines.
The solution should be resilient, modular, and extensible, with support for:
- Real-time ingestion via Kafka (or alternatives)
- Batch ingestion via APIs or file uploads
- A pluggable, contract-driven interface for onboarding sources (a minimal interface sketch follows this list)
- Observability and fault tolerance
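As a rough illustration of what the contract-driven interface could look like, here is a minimal sketch in Python (one of the candidate languages under Implementation Details). The `SourceConnector` base class, the `CsvUploadConnector` example, and the contract identifier are hypothetical names, not a settled design:

```python
import csv
import os
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator


class SourceConnector(ABC):
    """Contract every ingestion source implements before it can be onboarded."""

    #: identifier of the data contract (schema + version) this source emits
    contract_id: str

    @abstractmethod
    def fetch(self) -> Iterator[Dict[str, Any]]:
        """Yield records from the source (a batch pull or a streaming poll)."""

    @abstractmethod
    def health_check(self) -> bool:
        """Report whether the source is reachable, for monitoring."""


class CsvUploadConnector(SourceConnector):
    """Example batch source: rows from an uploaded CSV file."""

    contract_id = "patient_visits.v1"  # hypothetical contract name/version

    def __init__(self, path: str):
        self.path = path

    def fetch(self) -> Iterator[Dict[str, Any]]:
        with open(self.path, newline="") as handle:
            yield from csv.DictReader(handle)

    def health_check(self) -> bool:
        return os.path.exists(self.path)
```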
Goals & Mid-Point Milestone
Goals
- Provide a unified data ingestion layer capable of handling both streaming and batch data
- Reduce system complexity and eliminate hardcoded integrations via a plug-and-play architecture
- Centralize data flow through a message queue or dispatcher service (see the routing sketch after this list)
- Enable clean data routing to downstream storage, analytics engines, and alerting systems
- Design the system with fault tolerance, retry logic, and monitoring
- Create documentation for extending the pipeline to new modules or partner systems
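One way to read the plug-and-play and routing goals is a small config-driven dispatcher. The sketch below is a minimal illustration only, assuming Python; the `ROUTES` table, sink names, and `register_sink` helper are hypothetical placeholders for whatever configuration mechanism the team chooses:

```python
from typing import Any, Callable, Dict, List

# Hypothetical routing table: which downstream sinks receive each topic.
# In practice this would live in a config file so new routes need no code changes.
ROUTES: Dict[str, List[str]] = {
    "lab_results": ["postgres", "elasticsearch"],
    "device_vitals": ["s3", "alerting"],
}

# Sink registry: sink name -> callable that persists/forwards one record.
SINKS: Dict[str, Callable[[Dict[str, Any]], None]] = {}


def register_sink(name: str):
    """Decorator so new sinks can be plugged in without touching the dispatcher."""
    def wrapper(func: Callable[[Dict[str, Any]], None]):
        SINKS[name] = func
        return func
    return wrapper


@register_sink("postgres")
def write_to_postgres(record: Dict[str, Any]) -> None:
    print(f"postgres <- {record}")  # stand-in for a real INSERT


def dispatch(topic: str, record: Dict[str, Any]) -> None:
    """Fan one record out to every sink configured for its topic."""
    for sink_name in ROUTES.get(topic, []):
        sink = SINKS.get(sink_name)
        if sink is not None:
            sink(record)
```

Keeping the route table in configuration rather than code is what would let new sources or sinks be onboarded without redeploying the dispatcher.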
Setup/Installation
No response
Expected Outcome
- A centralized, scalable ingestion system that improves reliability, onboarding speed, and downstream analytics
- Near real-time or scheduled availability of data for dashboards, alerts, and models
- Reduced manual effort in maintaining integrations
- Well-documented architecture and data contracts to guide future expansion
Acceptance Criteria
✅ Ingestion layer supports at least 2 batch and 2 real-time sources
✅ Plug-and-play design allows new sources/sinks to be configured easily
✅ Messages/data can be routed to multiple downstream systems (e.g., PostgreSQL, Elasticsearch, S3)
✅ Retry mechanisms, logging, and status monitoring are in place (a retry sketch follows this list)
✅ Architecture diagrams, setup scripts, and integration guides are included in the repo
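To make the retry/logging criterion concrete, here is a minimal sketch of a retry wrapper with exponential backoff, assuming Python; the `with_retries` decorator and the Elasticsearch stand-in are illustrative only:

```python
import logging
import time
from functools import wraps

logger = logging.getLogger("ingestion")


def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky sink/source call with exponential backoff, logging each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:  # report and retry any failure
                    logger.warning("attempt %d/%d of %s failed: %s",
                                   attempt, max_attempts, func.__name__, exc)
                    if attempt == max_attempts:
                        raise
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


@with_retries(max_attempts=3)
def write_to_elasticsearch(record: dict) -> None:
    ...  # stand-in for the real indexing call
```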
Implementation Details
Event Broker (Real-time): Apache Kafka / Apache Pulsar / RabbitMQ
Batch Ingestion: Scheduled pull via API, FTP, or manual CSV ingest via CLI
Language: Python / Node.js / Golang
Data Schema: JSON, Avro, or Parquet with versioned contract registry
Observability: Logs via ELK/Fluentd, Prometheus metrics for system health (see the consumer sketch below)
Scalability: Modular microservices or container-based deployment (Docker + Kubernetes optional)
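Tying the broker and observability options together, the following is a minimal sketch assuming Kafka with the `kafka-python` client and `prometheus_client` for metrics; the topic name, port, and consumer group are placeholders, and any of the other listed brokers could take Kafka's place:

```python
import json

from kafka import KafkaConsumer  # kafka-python client
from prometheus_client import Counter, start_http_server

RECORDS_CONSUMED = Counter(
    "ingestion_records_total", "Records consumed from the event broker", ["topic"]
)


def run_consumer() -> None:
    # Expose /metrics for Prometheus scraping.
    start_http_server(8000)

    consumer = KafkaConsumer(
        "lab_results",                      # hypothetical topic name
        bootstrap_servers="localhost:9092",
        group_id="ingestion-framework",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        enable_auto_commit=False,
    )
    for message in consumer:
        record = message.value
        # route/persist the record here (e.g. via the dispatcher sketched above)
        RECORDS_CONSUMED.labels(topic=message.topic).inc()
        consumer.commit()


if __name__ == "__main__":
    run_consumer()
```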
Mockups/Wireframes
Flow diagrams and data contracts to be created collaboratively with MedPlat data engineering team.
Product Name
Unified Data Ingestion Framework
Organisation Name
Bandhu
Domain
Healthcare
Tech Skills Needed
Angular
Mentor(s)
Category
Backend