
DATA SCIENCE | Project 22: Parking Utilisation and Management | 50% Completion #1715

Open
AtishayDeakin wants to merge 8 commits into master from project_22_master

@AtishayDeakin (Collaborator)

This PR merges the completed Sprint 1 (Data Setup) and the initial phase of Sprint 2 (Model Build). It establishes the end-to-end data pipeline, from raw ingestion to a functional baseline predictive model.

List of Changes:

  1. Data Standardisation: Implemented a Python-based ingestion pipeline that converts raw sensor-ping timestamps into a consistent ISO 8601 format.
  2. Occupancy Mapping: Developed logic to map categorical sensor states into a binary format (1/0) for machine learning readiness.
  3. Strategic Filtering: Added a custom filter to handle COVID-19 lockdown anomalies (March–May 2020) to maintain baseline model integrity.
  4. Exploratory Data Visualisation: Created time-series and spatial plots to identify high-congestion hotspots in the Melbourne CBD (Lonsdale and Bourke Streets).
  5. Baseline Model Implementation: Developed and trained a Random Forest classifier to forecast bay availability.
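Changes 1 to 3 can be sketched as a single preparation step. This is a minimal sketch, not the notebook's code. The column names status_timestamp and status_description and the state labels "Present" and "Unoccupied" are hypothetical. Treating March to May 2020 as an exclusion window in Melbourne local time is also an assumption, because the PR says only that the filter "handles" the lockdown period.

```python
import pandas as pd

# Hypothetical schema: adjust column names and state labels to the real sensor feed.
OCCUPANCY_MAP = {"Present": 1, "Unoccupied": 0}

# Assumed lockdown window (March to May 2020, Melbourne local time), converted to UTC.
LOCKDOWN_START = pd.Timestamp("2020-03-01", tz="Australia/Melbourne").tz_convert("UTC")
LOCKDOWN_END = pd.Timestamp("2020-06-01", tz="Australia/Melbourne").tz_convert("UTC")  # exclusive


def prepare_sensor_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardise timestamps, binarise occupancy and drop the lockdown window."""
    df = raw.copy()

    # 1. Parse timestamps as timezone-aware UTC; unparseable values become NaT and are dropped.
    df["timestamp"] = pd.to_datetime(df["status_timestamp"], utc=True, errors="coerce")
    df = df.dropna(subset=["timestamp"])
    # Keep an ISO 8601 string copy alongside the datetime column.
    df["timestamp_iso"] = df["timestamp"].dt.strftime("%Y-%m-%dT%H:%M:%SZ")

    # 2. Map categorical sensor states to 1/0; unknown states become NaN and are dropped.
    df["occupied"] = df["status_description"].map(OCCUPANCY_MAP)
    df = df.dropna(subset=["occupied"])
    df["occupied"] = df["occupied"].astype(int)

    # 3. Exclude rows that fall inside the assumed lockdown window.
    in_lockdown = (df["timestamp"] >= LOCKDOWN_START) & (df["timestamp"] < LOCKDOWN_END)
    return df.loc[~in_lockdown].reset_index(drop=True)
```

The sketch drops the lockdown rows outright. A flag column is an alternative if the team later wants to model the anomaly rather than hide it.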
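For change 5, the sketch below shows one way to train a Random Forest baseline and report more than accuracy, which also relates to the review's request for deeper performance discussion. It assumes a tz-aware timestamp column and a binary occupied column. The hour/day-of-week features and the chronological 75/25 split are illustrative choices, not the notebook's method. The demo data is synthetic and is not project data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score

FEATURES = ["hour", "day_of_week"]  # hypothetical feature set


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive simple calendar features from a tz-aware 'timestamp' column."""
    out = df.copy()
    out["hour"] = out["timestamp"].dt.hour
    out["day_of_week"] = out["timestamp"].dt.dayofweek
    return out


def train_baseline(df: pd.DataFrame, test_fraction: float = 0.25, seed: int = 42) -> dict:
    """Fit on the earliest rows and evaluate on the most recent ones (no future leakage)."""
    df = df.sort_values("timestamp").reset_index(drop=True)
    cutoff = int(len(df) * (1 - test_fraction))
    train, test = df.iloc[:cutoff], df.iloc[cutoff:]

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(train[FEATURES], train["occupied"])

    pred = model.predict(test[FEATURES])
    proba = model.predict_proba(test[FEATURES])[:, 1]  # probability of class 1 (occupied)
    return {
        "model": model,
        "accuracy": accuracy_score(test["occupied"], pred),
        "f1_occupied": f1_score(test["occupied"], pred),
        "roc_auc": roc_auc_score(test["occupied"], proba),
        "confusion_matrix": confusion_matrix(test["occupied"], pred, labels=[0, 1]),
    }


# Synthetic demo data (NOT project data): bays are busier between 08:00 and 18:00.
rng = np.random.default_rng(0)
timestamps = pd.date_range("2021-01-01", periods=2000, freq=pd.Timedelta(hours=1), tz="UTC")
busy = (timestamps.hour >= 8) & (timestamps.hour <= 18)
demo = pd.DataFrame({
    "timestamp": timestamps,
    "occupied": rng.binomial(1, np.where(busy, 0.8, 0.2)),
})

results = train_baseline(add_time_features(demo))
for name in ("accuracy", "f1_occupied", "roc_auc"):
    print(f"{name}: {results[name]:.3f}")
print("confusion matrix [[TN, FP], [FN, TP]]:")
print(results["confusion_matrix"])
```

A random train_test_split would mix future and past readings, which tends to overstate how well a forecasting model performs. The chronological split avoids that. The confusion matrix and F1 show whether errors are concentrated on occupied or free bays, which accuracy alone hides.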

@manya0033 (Collaborator) left a comment

Hey Atishay, I have gone through the notebook and the PR. Good progress on the pipeline and baseline model; the logic is solid. However, a few items from the PR checklist need to be addressed before I can approve:

  1. Dataset access - The notebook loads data from a local CSV. As per the checklist, datasets need to be accessed via API v2.1, and API keys must not be visible.

  2. File naming - The notebook is currently called task.ipynb. It needs to follow the required naming convention.

  3. Australian English - There are a few American English spellings that need updating:

    • "Standardizing" should be "Standardising"
    • "Visualizing" should be "Visualising"
  4. Libraries not at the top - matplotlib, seaborn, and sklearn are imported halfway through the notebook. The checklist asks for all library imports at the top.

  5. No markdown cells - The notebook consists of 4 code cells with no markdown. The checklist says the use case should read as a clear step-by-step tutorial that follows the Use Case Template. Adding markdown sections that explain what each step does and why would bring it in line.

  6. Visualisation interpretation - The occupancy plot has a title and labels, which is good, but the checklist also requires a written interpretation of what the visualisation shows. A markdown cell below the chart explaining the key takeaways would cover this.

  7. Citation artefact - Cell 1 has [cite: 317, 349] in a code comment, which looks like a leftover and should be removed.

  8. Completion level - The PR title says 50% but the notebook is currently 4 fairly short code cells with minimal elaboration. For 50% I’d expect to see more depth in each section - data quality checks, result observations, more detailed EDA with visualisations, and discussion of model performance beyond just the accuracy score.
    Once these are sorted, I'm happy to re-review. Tag me when you push the updates.
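Review point 1 asks for data to come from API v2.1 without a visible key. Below is a minimal sketch of that pattern. It assumes the dataset is hosted on the City of Melbourne Open Data portal, which runs the Opendatasoft Explore API v2.1. It also assumes the key is stored in a hypothetical MELBOURNE_API_KEY environment variable and sent in an "Authorization: Apikey ..." header. The CSV export is parsed with a ";" delimiter, as Opendatasoft exports use by default.

```python
import io
import os
import urllib.parse
import urllib.request

import pandas as pd

# Assumption: City of Melbourne Open Data portal (Opendatasoft Explore API v2.1).
BASE_URL = "https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets"


def build_export_request(dataset_id: str) -> urllib.request.Request:
    """Build a full CSV export request; the key comes from the environment, never the notebook."""
    query = urllib.parse.urlencode({"limit": -1, "timezone": "UTC"})
    url = f"{BASE_URL}/{dataset_id}/exports/csv?{query}"
    headers = {}
    api_key = os.environ.get("MELBOURNE_API_KEY")  # hypothetical variable name
    if api_key:
        # Sent as a header so the key never appears in the URL or in printed output.
        headers["Authorization"] = f"Apikey {api_key}"
    return urllib.request.Request(url, headers=headers)


def fetch_dataset(dataset_id: str) -> pd.DataFrame:
    """Download the export and parse it into a DataFrame."""
    with urllib.request.urlopen(build_export_request(dataset_id), timeout=60) as resp:
        text = resp.read().decode("utf-8")
    return pd.read_csv(io.StringIO(text), sep=";")
```

Usage: df = fetch_dataset("<dataset-id>"), with MELBOURNE_API_KEY set in the shell or in a .env file that is excluded from version control. The dataset id is left as a placeholder because the PR does not name one.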
