Commit 0a6a3e8

fix
1 parent 64c4388 commit 0a6a3e8

6 files changed

+53
-57
lines changed

docs/posts/blogs/blog1.md

Lines changed: 1 addition & 1 deletion
@@ -76,4 +76,4 @@ Satellite imagery, IoT sensors, and big data analytics are helping predict natur
 The power of data science lies in its ability to solve problems, create opportunities, and transform the world. Whether you’re a student, a professional, or an enthusiast, now is the time to embrace data science and contribute to this evolving field.


-Thank you for reading! 🚀✨
+Thank you for reading!

docs/posts/blogs/blog2.md

Lines changed: 6 additions & 8 deletions
@@ -1,4 +1,4 @@
-# Data Science in Entertainment: Lights, Camera, Algorithms! 🎥
+# Data Science in Entertainment: Lights, Camera, Algorithms!

 ## Introduction: When Data Meets Showbiz

@@ -7,7 +7,7 @@ Have you ever wondered how Netflix knows exactly what you want to watch next, or
 Let’s dive into the world where data science meets the spotlight, and explore how algorithms are transforming the entertainment industry.


-## 1. Personalized Recommendations: Your Digital BFF 🎯
+## 1. Personalized Recommendations

 ### A. The Netflix Effect

@@ -21,7 +21,7 @@ Fun Fact: Over 80% of Netflix views are driven by recommendations!



-### B. Spotify’s Musical Genius 🎵
+### B. Spotify’s Musical Genius

 ![Spotify Data Science](../../images/blogs/ds_spotify.png)

@@ -33,7 +33,7 @@ Real-World Impact: Spotify’s recommendation system boosts user engagement by o



-## 2. Box Office Predictions: Data Behind the Blockbusters 🎥
+## 2. Box Office Predictions: Data Behind the Blockbusters

 Before the first ticket is sold, data science predicts whether a movie will be a flop or a blockbuster.

@@ -46,7 +46,7 @@ Case Study: Predicting Marvel’s *Avengers: Endgame* would surpass $1 billion i



-## 3. Gaming Analytics: Leveling Up the Experience 🎮
+## 3. Gaming Analytics: Leveling Up the Experience

 ![Games](../../images/blogs/ds_games.png)

@@ -76,7 +76,7 @@ Fun Fact: Epic Games processes over 2 petabytes of data daily to improve gamepla
 3. Sentiment-Driven Content: Movies and songs dynamically changing based on your mood, tracked through wearables or devices.


-## 5. Challenges: The Dark Side of the Spotlight 🌑
+## 5. Challenges: The Dark Side of the Spotlight

 1. Privacy Concerns: With so much personal data being collected, maintaining user trust is critical.
 2. Algorithmic Bias: Ensuring diverse and inclusive recommendations instead of reinforcing existing biases.
@@ -88,5 +88,3 @@ Fun Fact: Epic Games processes over 2 petabytes of data daily to improve gamepla
 Data science has turned entertainment into an experience that’s personal, predictive, and magical. From the playlists that understand your mood to the games that adapt to your skills, it’s an era where creativity and computation come together.

 The future of entertainment is not just about watching or listening; it’s about living the experience – and data science is leading the way.
-
-

docs/posts/open_source/gsoc.md

Lines changed: 10 additions & 10 deletions
@@ -1,4 +1,4 @@
-# My Google Summer of Code Journey – Extending Data Structures, Algorithms, and the C++ Backend 🚀
+# My Google Summer of Code Journey – Extending Data Structures, Algorithms, and the C++ Backend

 ## Introduction: A Summer of Growth, Code, and Algorithms

@@ -8,7 +8,7 @@ I’m Sakshi Oza, a Master's student passionate about data-science, open-source
 If you’ve ever wondered how Python libraries can achieve C++-level performance while implementing advanced algorithms, this blog is for you. I’ll share my GSoC journey, including challenges, solutions, and learnings.


-## About the Project 📚
+## About the Project

 The project focused on enhancing a Python-based data structures library, `pydatastructs` by:
 1. Extending existing data structures and algorithms.
@@ -18,7 +18,7 @@ The project focused on enhancing a Python-based data structures library, `pydata
 This combination bridges the gap between Python’s developer-friendliness and C++’s computational efficiency.


-## Breaking It Down: My GSoC Timeline
+## Breaking It Down: My GSoC Timeline

 ### 🔹 Community Bonding Period
 Before coding began, I focused on:
@@ -57,7 +57,7 @@ I explored the gaps in the current implementation and set milestones for my cont
 In Phase 2, I focused on backend optimization by implementing a C++ backend for performance-critical algorithms. Why? Python is fantastic for development, but when it comes to heavy computation, C++ shines. By combining the two, we achieve the best of both worlds.


-### Key Contributions in this Phase 🛠️
+### Key Contributions in this Phase

 #### 1. Sorting Algorithms
 - Added bubble_sort, selection_sort, and insertion_sort with C++ backend support.
@@ -82,7 +82,7 @@ Introduced a lazy segment tree to handle range-based queries and updates efficie
 Segment trees are invaluable for applications like interval management and range sum/count queries.


-## Challenges and Learnings 🤓
+## Challenges and Learnings

 ### 1. Network Flow Complexity
 Implementing Edmond-Karp and Dinic’s algorithms required a deep understanding of graph theory, BFS, and DFS.
@@ -97,7 +97,7 @@ I explored Cython and other backend options to ensure seamless interoperability
 - Wrote comprehensive test cases for every addition to ensure robustness.


-## Memes from the Journey 🎭
+## Memes from the Journey

 1. When Network Flow Started Making Sense
 *“When theory meets implementation, and it finally clicks.”*
@@ -109,15 +109,15 @@ I explored Cython and other backend options to ensure seamless interoperability
 *“Months of hard work, countless commits, and it all comes down to one button: Merge PR.”*


-## Impact: Why This Matters 🌍
+## Impact: Why This Matters

 The contributions I made during GSoC 2023 will:
 1. Enhance library performance through optimized algorithms.
 2. Expand library utility with new data structures and methods.
 3. Improve scalability with a C++ backend for heavy computations.


-## Gratitude 🙏
+## Gratitude

 I’m incredibly grateful to my mentors:
 - Gagandeep Singh
@@ -127,11 +127,11 @@ I’m incredibly grateful to my mentors:
 Their guidance, patience, and feedback were invaluable throughout this project. I also want to thank the GSoC community for fostering such a collaborative and supportive environment.


-## Conclusion: A Summer to Remember 🌟
+## Conclusion: A Summer to Remember

 GSoC 2023 was a transformative experience. From grappling with network flows to integrating C++ backends, I grew as a developer and problem solver. This project has not only strengthened my technical skills but also deepened my appreciation for open-source contributions.

 I’m excited to continue my open-source journey, contribute more, and keep growing as a developer.

-Thank you for reading! 💻✨
+Thank you for reading!
 “Keep coding, keep learning, and let’s build something amazing together!”

docs/posts/research/project1.md

Lines changed: 12 additions & 12 deletions
@@ -1,15 +1,15 @@

-# 🚀 Predicting Cardiovascular Risk Using Machine Learning 🩺
+# Predicting Cardiovascular Risk Using Machine Learning


-## Introduction: A Fight Against the Silent Killer 💔
+## Introduction: A Fight Against the Silent Killer

 Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for millions of lives lost annually. Identifying individuals at risk early can help implement preventive strategies and save lives. The challenge? Finding the right tools to predict this risk effectively.

 In this project, I used machine learning techniques—Principal Component Analysis (PCA), K-Means Clustering, and LASSO Logistic Regression—to identify high-risk individuals based on health data from the NHANES dataset.


-## Data Overview 📊
+## Data Overview

 I used the National Health and Nutrition Examination Survey (NHANES) dataset, which includes key demographic, physiological, and dietary features:

@@ -27,7 +27,7 @@ Participants were classified as High Risk (1) if they met at least one of the fo
 Otherwise, they were labeled as Low Risk (0).


-## Methodology 🛠️
+## Methodology

 I applied three key machine learning techniques to analyze and predict CVD risk:

@@ -52,7 +52,7 @@ Result:
 - PC2 explained 20.4% of variance, focusing on Systolic BP and Diastolic BP.


-### 2. K-Means Clustering 🤖
+### 2. K-Means Clustering
 K-Means was applied to PCA components to identify subgroups within the data. The objective was to minimize within-cluster variance:

 $$
@@ -77,7 +77,7 @@ Insights:
 - Cluster 3: Middle-aged individuals with moderate BMI → Medium Risk


-### 3. LASSO Logistic Regression 📉
+### 3. LASSO Logistic Regression
 LASSO regression shrinks insignificant predictors to zero, focusing only on the most influential features. The loss function includes a penalty term:

 $$
@@ -104,7 +104,7 @@ The model achieved excellent predictive accuracy:
 - Specificity: 93.6%


-## Results and Discussion 📈
+## Results and Discussion

 ### Key Takeaways

@@ -117,21 +117,21 @@ The model achieved excellent predictive accuracy:
 3. The LASSO model’s high accuracy proves its reliability for predicting CVD risk.


-## Visual Results 📊
+## Visual Results

 1. PCA Biplot: Visualize variable contributions to CVD risk.
 2. K-Means Cluster Plot: Show the distinct clusters based on PCA components.
 3. LASSO Coefficient Table: Highlight the importance of each predictor.
 4. ROC Curve: Demonstrates the model’s high predictive performance.


-## Challenges Faced 💡
+## Challenges Faced
 1. Multicollinearity: Addressed using PCA for dimensionality reduction.
 2. Optimal Clustering: Achieved using the Elbow Method.
 3. Model Tuning: Finding the best regularization parameter $\lambda$ for LASSO.


-## Conclusion: Insights for Public Health 🌍
+## Conclusion: Insights for Public Health

 This study demonstrates the power of machine learning in predicting cardiovascular risk. By combining PCA, K-Means Clustering, and LASSO Regression, I:
 - identified key predictors of CVD risk: BMI, Systolic BP, and Age.
@@ -140,12 +140,12 @@ This study demonstrates the power of machine learning in predicting cardiovascul
 These findings can guide public health strategies to focus resources on high-risk individuals and promote preventive healthcare.


-## Future Directions 🚀
+## Future Directions

 1. Use longitudinal data to monitor CVD risk over time.
 2. Explore advanced models like Deep Learning for complex interactions.
 3. Apply this framework to global datasets for broader impact.


-## Thank You! 💻✨
+## Thank You!
 *"Let’s use data to solve real-world problems and create a healthier world!"*

docs/posts/research/project2.md

Lines changed: 12 additions & 12 deletions
@@ -1,6 +1,6 @@
-# 🧪 Analyzing the Impact of Occupational Radiation Exposure on Cancer Mortality
+# Analyzing the Impact of Occupational Radiation Exposure on Cancer Mortality

-## 1. Introduction: The Shadow of Radiation ☢️
+## 1. Introduction: The Shadow of Radiation

 Occupational radiation exposure is a significant public health concern, particularly for workers in industries like nuclear power, metal processing, and energy production. Long-term exposure to radiation has been linked to increased cancer risks, making this an important area of occupational health research.

@@ -13,9 +13,9 @@ This retrospective study investigates the impact of occupational radiation expos



-## 3. Methods and Materials 🛠️
+## 3. Methods and Materials

-### A. Data Sources 📊
+### A. Data Sources

 Two datasets were analyzed:

@@ -33,15 +33,15 @@ Note: The study focuses on white male workers due to demographic consistency, re



-### B. Workflow of the Study 🔄
+### B. Workflow of the Study

 The data analysis process followed a systematic workflow as shown below:

 ![flowchart](../../images/project2/flowchart.png)



-### C. Data Preprocessing 🧹
+### C. Data Preprocessing

 1. Cancer Classification:
 Workers were categorized into two groups based on ICD-8 codes:
@@ -57,7 +57,7 @@ Table 1: Example Rows from Merged Dataset



-### D. Statistical Analysis 🧮
+### D. Statistical Analysis

 1. Descriptive Statistics:
 Calculated measures like mean, standard deviation, skewness, and kurtosis for photon dose levels.
@@ -71,7 +71,7 @@ Table 1: Example Rows from Merged Dataset



-## 4. Results 📈
+## 4. Results

 ### A. Descriptive Analysis

@@ -113,26 +113,26 @@ Result: The test showed a statistically significant difference in photon dose le



-## 5. Discussion: Key Insights 🧐
+## 5. Discussion: Key Insights

 1. Workers who died from cancer-related causes had significantly higher radiation doses than those who died from other causes.
 2. The skewed distribution of photon doses highlights the importance of non-parametric tests in exposure data analysis.
 3. These findings align with the hypothesis that long-term radiation exposure increases the risk of cancer mortality.



-## 6. Conclusion: What This Means for Public Health 🌍
+## 6. Conclusion: What This Means for Public Health

 Our analysis suggests a strong association between occupational radiation exposure and cancer mortality among FMPC workers. These results underscore the importance of:
 1. Dose Monitoring: Implementing strict monitoring protocols for radiation levels.
 2. Safety Regulations: Ensuring safety standards to minimize exposure in occupational settings.
 3. Future Research: Conducting studies with larger, more diverse datasets to validate these findings.

-## 7. References 📚
+## 7. References

 1. Cragle, D. L., Watkins, J. P., Ingle, J. N., et al. *Mortality Among White Male Workers at a Uranium Processing Plant.* (1996).
 2. CEDR (1994). *Fernald Retrospective Cancer Mortality Study.*


-Thank you for reading! 🚀
+Thank you for reading!
 *“Science is not only about understanding the world; it’s about protecting it.”*
