DATA SCIENCE | Project 22: Parking Utilisation and Management | 50% Completion #1715
AtishayDeakin wants to merge 8 commits into master
Data Science - Sprint 1 (Data Cleaning and Ingestion Pipeline) - 100%
Sprint 2 - Task 2 Done - Baseline Predictive Model Development
Hey Atishay, I have gone through the notebook and the PR. Good progress on the pipeline and baseline model; the logic is solid. However, there are a few things that need to be addressed before I can approve, based on the PR checklist:
- Dataset access: The notebook loads data from a local CSV. As per the checklist, datasets need to be accessed via API v2.1, and API keys must not be visible.
- File naming: The notebook is currently called task.ipynb. It needs to follow the required naming convention.
- Australian English: There are a few American English spellings that need updating:
  - "Standardizing" → "Standardising"
  - "Visualizing" → "Visualising"
- Libraries not at the top: matplotlib, seaborn and sklearn are imported partway through the notebook. The checklist asks for all library imports to be at the top.
- No markdown cells: The notebook consists of 4 code cells with no markdown. The checklist says the use case should read as a clear step-by-step tutorial that follows the Use Case Template. Adding markdown sections that explain what each step does and why would bring it in line.
- Visualisation interpretation: The occupancy plot has a title and labels, which is good, but the checklist also requires a written interpretation of what the visualisation shows. A markdown cell below the chart explaining the key takeaways would cover this.
- Citation artefact: Cell 1 has [cite: 317, 349] in a code comment. This looks like a leftover and should be removed.
- Completion level: The PR title says 50%, but the notebook is currently 4 fairly short code cells with minimal elaboration. For 50%, I'd expect more depth in each section: data quality checks, observations on results, more detailed EDA with visualisations, and discussion of model performance beyond the accuracy score.
Once these are sorted, I'm happy to re-review. Tag me when you push the updates.
This PR merges the completed Sprint 1 (Data Setup) and the initial phase of Sprint 2 (Model Build). It establishes the end-to-end data pipeline, from raw ingestion to a functional baseline predictive model.
List of Changes: