This project develops a Health Monitoring System that analyzes patient health parameters such as Blood Pressure, Sugar Level, Cholesterol, and Hemoglobin. It uses big data technologies (Apache Spark, optionally Hadoop) for processing, and Matplotlib/Seaborn visualizations to identify trends and potential health risks.
- Synthetic Patient Data Generation (10,000 patient records)
- Big Data Processing using Apache Spark & Pandas
- Statistical Analysis (Mean, Standard Deviation, Correlation)
- Data Visualization Dashboard (Matplotlib, Seaborn)
- Identification of High-Risk Patients
- Scalable for Real-World Integration
- Programming Language: Python (Pandas, NumPy, SciPy, Matplotlib, Seaborn)
- Big Data Frameworks: Apache Spark, Hadoop (Optional for distributed storage)
- Data Generation: Faker Library (Synthetic data simulation)
- Visualization Tools: Matplotlib, Seaborn
├── data/ # Sample patient dataset (CSV format)
├── src/ # Source code for data processing & analysis
│ ├── data_generator.py # Generates synthetic patient data
│ ├── data_processing.py # Data cleaning, transformation, and Spark processing
│ ├── analysis.py # Statistical analysis & health risk identification
│ └── visualization.py # Dashboard & data visualization
├── notebooks/ # Jupyter Notebooks for testing
├── README.md # Project documentation
└── requirements.txt # Python dependencies
The dataset includes 10,000 synthetic patient profiles, each with the following attributes:
- Patient ID (Unique Identifier)
- Name, Age, Gender, Contact Information
- Medical History
- Blood Pressure (BP), Sugar Level, Cholesterol, Hemoglobin
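
As a rough illustration of the record layout above, the sketch below generates synthetic patients with Python's standard library only. It is not the project's `data_generator.py`: the project lists Faker for data simulation, and the `random` module is used here purely as a dependency-free stand-in. The function names (`generate_patients`, `write_csv`), the column names, and every mean, spread, and range are assumptions chosen for illustration. Name, contact information, and medical history are left out.

```python
import csv
import random

# Hypothetical column names; the real dataset may label these differently.
FIELDS = ["patient_id", "age", "gender", "systolic_bp",
          "sugar_level", "cholesterol", "hemoglobin"]


def generate_patients(n, seed=42):
    """Return n synthetic patient records as a list of dicts."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    patients = []
    for i in range(1, n + 1):
        patients.append({
            "patient_id": f"P{i:05d}",
            "age": rng.randint(18, 90),
            "gender": rng.choice(["F", "M"]),
            # Means and spreads below are illustrative assumptions.
            "systolic_bp": round(rng.gauss(125, 15)),      # mmHg
            "sugar_level": round(rng.gauss(105, 25)),      # mg/dL
            "cholesterol": round(rng.gauss(200, 35)),      # mg/dL
            "hemoglobin": round(rng.gauss(14.0, 1.5), 1),  # g/dL
        })
    return patients


def write_csv(patients, file_obj):
    """Write the records to an open text file (or any file-like object)."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(patients)
```

To produce a file comparable in size to the project's dataset, one could call `write_csv(generate_patients(10_000), f)` with `f` opened via `open("patients.csv", "w", newline="")`; the file name is likewise an assumption.
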
- Load Dataset: Import CSV into Apache Spark DataFrame
- Data Cleaning: Handle missing values, remove duplicates, and normalize health parameters
- Aggregation: Compute mean, standard deviation, and statistical distributions
- Health Risk Identification: Flag patients with abnormal health parameters
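
The cleaning, aggregation, and flagging steps above can be sketched in plain Python. This is a small stand-in for the Spark pipeline, not the project's `data_processing.py` or `analysis.py`. The column names, the choice to drop rows with missing values rather than impute them, and the cut-off values are all assumptions; the thresholds are illustrative only and are not clinical guidance. Normalization is omitted for brevity.

```python
from statistics import mean, stdev

# Illustrative cut-offs only (assumptions, not clinical guidance).
THRESHOLDS = {
    "systolic_bp": lambda v: v >= 140,  # mmHg
    "sugar_level": lambda v: v >= 126,  # mg/dL
    "cholesterol": lambda v: v >= 240,  # mg/dL
    "hemoglobin": lambda v: v < 12.0,   # g/dL
}


def clean(records):
    """Drop rows with a missing parameter and repeated patient IDs."""
    seen, cleaned = set(), []
    for r in records:
        if r["patient_id"] in seen:
            continue  # duplicate of a row that was already kept
        if any(r.get(k) is None for k in THRESHOLDS):
            continue  # missing value: dropped here, imputation is an alternative
        seen.add(r["patient_id"])
        cleaned.append(r)
    return cleaned


def summarize(records):
    """Return {parameter: (mean, sample standard deviation)}.

    Needs at least two records, because stdev is undefined for one value.
    """
    return {
        k: (mean(r[k] for r in records), stdev(r[k] for r in records))
        for k in THRESHOLDS
    }


def flag_high_risk(records):
    """Return (patient_id, [abnormal parameters]) for each flagged patient."""
    flagged = []
    for r in records:
        abnormal = [k for k, is_abnormal in THRESHOLDS.items() if is_abnormal(r[k])]
        if abnormal:
            flagged.append((r["patient_id"], abnormal))
    return flagged
```

Keeping the cut-offs in one dictionary means the same table drives both the missing-value check and the risk flags, so adding a parameter is a one-line change.
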
- Histograms: Distribution of BP, Sugar, Cholesterol, Hemoglobin
- Scatter Plots: Identifying relationships (e.g., Age vs. Cholesterol)
- Box Plots: Comparing BP across gender
- Correlation Heatmap: Relationships between all health parameters
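
A correlation heatmap is a plot of a matrix of pairwise correlation coefficients. The sketch below computes that matrix with the Pearson coefficient in plain Python, as an illustration of what sits behind such a plot rather than the project's `visualization.py`. The function names and the idea of passing records as dicts are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(records, columns):
    """Return a square nested list; entry [i][j] correlates columns i and j."""
    series = {c: [r[c] for r in records] for c in columns}
    return [[pearson(series[a], series[b]) for b in columns] for a in columns]
```

The resulting nested list can be handed to a plotting call such as `seaborn.heatmap(matrix, xticklabels=columns, yticklabels=columns, annot=True)` to draw the heatmap. With pandas available, `DataFrame.corr()` gives the same Pearson matrix in one call.
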
- Machine Learning Models for disease prediction (Diabetes, Hypertension, etc.)
- Real-Time Data Monitoring using IoT & streaming pipelines (Kafka, Spark Streaming)
- Integration with Wearable Devices for continuous health tracking
git clone https://github.com/juned-k786/health-monitoring-system.git
cd health-monitoring-system
pip install -r requirements.txt
python src/data_generator.py
python src/data_processing.py
python src/visualization.py
💡 Contributions are welcome! Feel free to open issues and submit pull requests. 🚀
This project is open-source and available under the MIT License.
© 2025 Sumit Kapadia.