isg75 · DelmarB · Jul 15, 2025
diff --git a/.virtual_documents/notebooks/Untitled.ipynb b/.virtual_documents/notebooks/Untitled.ipynb
@@ -0,0 +1,12 @@
+import pandas as pd
+
+pd.read_csv('Users/Dejmen/Desktop/Ironhack/week5/Day1/vanguard-ab-test/data/raw/df_final_demo.txt', sep="\t")
+
+
+df = pd.read_csv("../data/raw/df_final_demo.txt") 
+
+
+df.head(20)
+
+
+
diff --git a/README.md b/README.md
@@ -1,77 +1,139 @@
-# Project overview
-...
+# 🌍 Vanguard A/B Testing: Website Redesign Performance Analysis
 
-# Installation
+## Objective
+This project uses A/B testing to evaluate the performance of a new website design against the existing version. Our goal is to determine, through formal statistical hypothesis testing, whether the new design improves key user behavior metrics such as completion rate and time efficiency. We also aim to uncover potential usability issues in the new design for further refinement.
 
-1. **Clone the repository**:
+---
 
-```bash
-git clone https://github.com/YourUsername/repository_name.git
-```
+## 🔍 Hypothesis
 
-2. **Install UV**
+We hypothesize that the new website design improves user performance across several key indicators, including:
 
-If you're a MacOS/Linux user type:
+### A higher completion rate
 
-```bash
-curl -LsSf https://astral.sh/uv/install.sh | sh
-```
+### Lower error (reversal) rates
 
-If you're a Windows user open an Anaconda Powershell Prompt and type :
+### Shorter time spent on steps, indicating better usability
 
-```bash
-powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-```
+We will apply statistical hypothesis testing (two-sample t-tests, plus a z-test for completion rates) to compare performance metrics between users assigned to the old design (control group) and the new design (test group).
 
-3. **Create an environment**
+---
 
-```bash
-uv venv 
-```
+## Funnel Structure
+The user journey consists of three sequential steps, followed by a final "Confirm" step, representing successful completion. Users may proceed forward or move backward in the process. Backward navigation (step reversal) may indicate confusion or design inefficiencies.
 
-3. **Activate the environment**
+This funnel structure provides the framework for defining and analyzing all metrics.
 
-If you're a MacOS/Linux user type (if you're using a bash shell):
+---
 
-```bash
-source ./venv/bin/activate
-```
+## Primary Metric
+- Quality Visits Leading to Confirmation
 
-If you're a MacOS/Linux user type (if you're using a csh/tcsh shell):
+A "quality visit" is defined as a session in which the user completes all steps and reaches the final "Confirm" stage.
+
+### KPIs
 
-```bash
-source ./venv/bin/activate.csh
-```
+#### Key Performance Indicators (KPIs)
 
-If you're a Windows user type:
+Each KPI below was subjected to hypothesis testing to assess whether the differences between the old and new designs are statistically significant.
 
-```bash
-.\venv\Scripts\activate
-```
+##### Completion Rate Without Errors (Regardless of Prior Error Visits)
+The proportion of users who reach the final "Confirm" step in one visit. The test group is compared against the control group's rate plus a 5% relative-lift threshold, the minimum improvement considered cost-effective.
+Hypothesis test: Z-test
 
-4. **Install dependencies**:
+##### Time Spent on Each Step
+The average time users spend at each step of the funnel, analyzed by age group.
+Hypothesis test: T-Test
 
-```bash
-uv pip install -r requirements.txt
-```
+##### Error Rate (Step Reversals)
+The proportion of users, by age group, who move backward in the process flow (from a later step to an earlier one) and fail to complete the final step, computed separately for the control and test groups.
+Hypothesis test: T-Test
 
-# Questions 
-...
+#### Age Group Engagement
+Comparison of the average session duration (in seconds) across defined age groups (<30, 30–39, 40–49, 50–59, 60–69, 70–79, 80+) between the control group and the test group.
+Visualization: bar plot
 
-# Dataset 
-...
+#### Expected Outcomes
+- Statistically confirm whether the new design improves user engagement and conversion.
+- Ensure that engagement remains consistent across age groups.
+- Identify steps in the funnel where users experience friction (e.g., high reversal rates or time delays) and provide actionable redesign suggestions.
 
-## Main dataset issues
 
-- ...
-- ...
-- ...
+## 🧾 Dataset Description
 
-## Solutions for the dataset issues
-...
+### 🧱 Raw Datasets:
+
+- **df_final_demo.txt**
+- **df_final_experiment_clients.txt**
+- **df_final_web_data_pt_1.txt**
+- **df_final_web_data_pt_2.txt**
+
+### Dataset obstacles:
+
+- **df_final_experiment_clients.txt**
+  - ~20,000 rows were deleted due to multiple NaN values
+
+- **df_final_demo.txt**
+  - 14 rows were deleted because all columns contained NaN values
+
+### Final dataset
+> Note: Each txt file was first exported to its own csv file. After cleaning, the individual csv files were merged into one table, which made querying easier.
+- **merged_df_clean.csv**
+
+---
+
+## 💻 Technologies Used
+
+| Area                 | Tools/Technologies                                      |
+|----------------------|---------------------------------------------------------|
+| Data Manipulation    | Python (Pandas, NumPy)                                  |
+| Data Visualization   | Matplotlib (pyplot), Seaborn                            |
+| Documentation        | Jupyter Notebook, Markdown, GitHub                      |
+| Version Control      | Git, GitHub, Anaconda PowerShell                        |
+| Statistical Analysis | SciPy, statsmodels                                      |
+
+
+---
+
+## 📦 Deliverables
+
+- ✅ [Repository "vanguard-ab-test" on GitHub](https://github.com/Brenvillag/vanguard-ab-test) 
+- ✅ [Raw dataset](https://github.com/data-bootcamp-v4/lessons/tree/main/5_6_eda_inf_stats_tableau/project/files_for_project)
+- ✅ Jupyter Notebook with cleaned and documented dataset (`merged_df_clean.csv`)
+- ✅ Jupyter Notebook that calls the functions
+- ✅ Python `.py` file with functions
+- ✅ Tableau file
+- ✅ [Group 1 Trello Project Page](https://trello.com/b/xIrQ1kK7/vanguard-ab-test)
+- ✅ README documentation: README.md
+- ✅ [Group 1 Presentation](https://docs.google.com/presentation/d/1Z9yE8gTMzNdZwtDIAucWzqTXzSzvAsn52Qk6IsR0oF4/edit?usp=sharing) 
+
+
+---
+
+## 👨‍💼 Target Audience
+
+- **Target Market**: Age groups. The design needs to help older clients reach the "Confirm" step
+- **Stakeholders**: Decide whether to suggest changes or accept the new design
+- **Analysts / Web designers**: Receive a clean dataset for further projects or optimizations
+
+---
+
+## 🛠️ Future Work
+- **Assist in web design optimizations**: The notebooks and the Python functions file are ready to reuse after later design changes
+
+---
+
+## 👥 Contributors
+
+- Brenda Villaverde
+- Damian Witkowski
+- Sherin Kuruvilla
+- Delmar Bumanglag
+
+---
+
+## 🌐 We have proven our hypothesis
+### WE NEED TO CHANGE THIS 
+📢 *The web design does contribute to a faster confirm rate per visit.*
 
-# Conclussions
-...
 
-# Next steps
-...
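
Note on the README's "Dataset obstacles" and "Final" sections: the cleaning and merge steps described there could look roughly like the sketch below. The toy frames, their values, and the column names `client_id`, `age`, and `balance` are assumptions for illustration only; the project's actual schema and cleaning code are not shown in this diff.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_final_demo.txt (hypothetical columns and values).
demo = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "age": [34.0, np.nan, 61.5, np.nan],
    "balance": [1200.0, np.nan, 530.0, 75.0],
})

# Toy stand-in for df_final_experiment_clients.txt (hypothetical values).
experiment = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "Variation": ["Test", "Control", None, "Control"],
})

# Drop demo rows where every column except the key is NaN.
value_cols = [c for c in demo.columns if c != "client_id"]
demo_clean = demo.dropna(how="all", subset=value_cols)

# Drop rows with a missing group assignment.
experiment_clean = experiment.dropna(subset=["Variation"])

# Keep only clients present in both cleaned tables.
merged = demo_clean.merge(experiment_clean, on="client_id", how="inner")
print(merged)
```

Using `how="all"` removes only rows that carry no information besides the key, so partially filled rows (client 4 in the toy data) survive for later analysis.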
diff --git a/data/clean/age_group_error.csv b/data/clean/age_group_error.csv
@@ -0,0 +1,15 @@
+Variation,age_group,error_rate
+Control,0-30,0.15558953697647318
+Control,30-39,0.15940402768893042
+Control,40-49,0.17310098148834152
+Control,50-59,0.19123746897912958
+Control,60-69,0.19960180306261735
+Control,70-79,0.22200764133244416
+Control,80+,0.24773730196068816
+Test,0-30,0.16564013002027975
+Test,30-39,0.17899731432486105
+Test,40-49,0.19186831800873064
+Test,50-59,0.22495122646676183
+Test,60-69,0.23618682235425859
+Test,70-79,0.27431886823911966
+Test,80+,0.28537193092450375
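
Note on `age_group_error.csv`: one quick way to read this file is to put Control and Test side by side and look at the gap per age group. The sketch below copies the values from the diff above; the `diff_pp` column name is a hypothetical label introduced here, and this is not necessarily how the authors summarized the data.

```python
import io

import pandas as pd

# Values copied from data/clean/age_group_error.csv in this diff.
CSV_TEXT = """Variation,age_group,error_rate
Control,0-30,0.15558953697647318
Control,30-39,0.15940402768893042
Control,40-49,0.17310098148834152
Control,50-59,0.19123746897912958
Control,60-69,0.19960180306261735
Control,70-79,0.22200764133244416
Control,80+,0.24773730196068816
Test,0-30,0.16564013002027975
Test,30-39,0.17899731432486105
Test,40-49,0.19186831800873064
Test,50-59,0.22495122646676183
Test,60-69,0.23618682235425859
Test,70-79,0.27431886823911966
Test,80+,0.28537193092450375
"""

errors = pd.read_csv(io.StringIO(CSV_TEXT))

# One row per age group, one column per variation.
wide = errors.pivot(index="age_group", columns="Variation", values="error_rate")

# Gap between Test and Control in percentage points (hypothetical column name).
wide["diff_pp"] = (wide["Test"] - wide["Control"]) * 100
print(wide.round(4))
```

In this file every age group has a higher error rate in the Test group, with the largest gap in the 70-79 group (about 5.2 percentage points). This is a descriptive comparison only and says nothing about statistical significance.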
diff --git a/data/clean/avg_step_duration_for_tableau.csv b/data/clean/avg_step_duration_for_tableau.csv
@@ -0,0 +1,71 @@
+process_step,Variation,age_group,avg_step_duration_seconds
+confirm,Control,<30,88.78953626634959
+confirm,Control,30–39,98.81361892583121
+confirm,Control,40–49,119.84432809773124
+confirm,Control,50–59,133.71443708609272
+confirm,Control,60–69,165.97391304347826
+confirm,Control,70–79,194.9187165775401
+confirm,Control,80+,206.316091954023
+confirm,Test,<30,96.64970145009951
+confirm,Test,30–39,95.47562371252003
+confirm,Test,40–49,108.9019344438474
+confirm,Test,50–59,139.2750573036049
+confirm,Test,60–69,170.72394881170018
+confirm,Test,70–79,193.96587301587303
+confirm,Test,80+,193.47602739726028
+start,Control,<30,124.17042939353696
+start,Control,30–39,123.16122082585278
+start,Control,40–49,168.35644310474754
+start,Control,50–59,164.11613406079502
+start,Control,60–69,164.76671289875173
+start,Control,70–79,177.45447705041386
+start,Control,80+,162.98187311178248
+start,Test,<30,126.47676837725382
+start,Test,30–39,131.79662423907027
+start,Test,40–49,151.10570005534035
+start,Test,50–59,149.25849762066622
+start,Test,60–69,156.62193095809488
+start,Test,70–79,182.8992684299381
+start,Test,80+,153.2340425531915
+step_1,Control,<30,29.697157267308572
+step_1,Control,30–39,34.593635486981675
+step_1,Control,40–49,37.48498031903874
+step_1,Control,50–59,45.53220648698036
+step_1,Control,60–69,52.142279845091764
+step_1,Control,70–79,66.15290669272106
+step_1,Control,80+,65.53932584269663
+step_1,Test,<30,31.161000179888468
+step_1,Test,30–39,30.739241265557055
+step_1,Test,40–49,33.76131687242798
+step_1,Test,50–59,39.101942305482126
+step_1,Test,60–69,44.34618217530076
+step_1,Test,70–79,50.809257185516984
+step_1,Test,80+,45.43069306930693
+step_2,Control,<30,24.26086956521739
+step_2,Control,30–39,28.077482876712327
+step_2,Control,40–49,34.75361653272101
+step_2,Control,50–59,44.883597883597886
+step_2,Control,60–69,49.07524752475248
+step_2,Control,70–79,57.26813655761024
+step_2,Control,80+,67.81428571428572
+step_2,Test,<30,38.98378254910918
+step_2,Test,30–39,38.70398338682273
+step_2,Test,40–49,38.680789798436855
+step_2,Test,50–59,51.74283293320153
+step_2,Test,60–69,56.66799265605875
+step_2,Test,70–79,65.04033379694019
+step_2,Test,80+,84.8649193548387
+step_3,Control,<30,85.31499429874573
+step_3,Control,30–39,92.84223664503246
+step_3,Control,40–49,104.78492893537141
+step_3,Control,50–59,110.91185112634672
+step_3,Control,60–69,75.60493827160494
+step_3,Control,70–79,73.74145616641901
+step_3,Control,80+,84.42574257425743
+step_3,Test,<30,92.35365853658537
+step_3,Test,30–39,93.17980022197558
+step_3,Test,40–49,109.72405251424325
+step_3,Test,50–59,111.91771708683473
+step_3,Test,60–69,82.81814901677792
+step_3,Test,70–79,82.87806205770278
+step_3,Test,80+,92.08994708994709
diff --git a/data/clean/clean_flow.csv b/data/clean/clean_flow.csv
@@ -0,0 +1,3 @@
+Group,users_completed,total_users,clean_completion_rate
+Control,7683,26271,29.25
+Test,8944,29908,29.91
diff --git a/data/clean/cleaned_data_file.csv b/data/clean/cleaned_data_file.csv
diff --git a/data/clean/completions_stat_summary_clean.csv b/data/clean/completions_stat_summary_clean.csv
@@ -0,0 +1,5 @@
+Scenario,Group,Completion Rate (0 errors),Completion Rate (Confirmed),Observed Difference,Required Difference for 5% Lift,Z-statistic,P-value (one-sided),Statistical Conclusion,Interpretation
+0 errors,Control,29.2452%,,n/a,n/a,n/a,n/a,n/a,n/a
+0 errors,Test,29.9050%,,0.6599%,1.4623%,-2.0788,0.9812,Fail to reject null,No cost-effective improvement
+Confirm,Control,,59.2288%,n/a,n/a,n/a,n/a,n/a,n/a
+Confirm,Test,,65.1966%,5.9678%,2.9614%,7.3403,0.0,Reject the null hypothesis,Test shows cost-effective improvement (lift > 5%).
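
Note on the "0 errors" row of `completions_stat_summary_clean.csv`: the reported figures are consistent with a one-sided z-test whose null hypothesis is shifted by a 5% relative lift over the control rate. The sketch below is an assumption about how such a test could be set up, not the authors' code: the function name `lift_z_test` is hypothetical, and the pooled standard error is one plausible choice. The counts come from `clean_flow.csv` above.

```python
import math


def lift_z_test(x_control, n_control, x_test, n_test, lift=0.05):
    """One-sided z-test of H0: p_test - p_control <= lift * p_control."""
    p_c = x_control / n_control
    p_t = x_test / n_test
    required = lift * p_c  # minimum absolute difference for the relative lift

    # Pooled standard error (an assumption; an unpooled SE is also common).
    pooled = (x_control + x_test) / (n_control + n_test)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))

    z = (p_t - p_c - required) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail P(Z >= z)
    return z, p_value


# Counts from data/clean/clean_flow.csv.
z, p_value = lift_z_test(7683, 26271, 8944, 29908)
print(f"z = {z:.4f}, one-sided p = {p_value:.4f}")
```

With these counts the sketch gives roughly z ≈ -2.08 and p ≈ 0.98, in line with the summary row, so the observed 0.66-point gain falls short of the 1.46-point difference the 5% threshold requires.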