
add readme and created seperate folder #1730

Open
IshikaAnand7 wants to merge 1 commit into Chameleon-company:master from IshikaAnand7:master

@IshikaAnand7 (Collaborator)

Add SDNET2018 Data Exploration, Balancing, and Augmentation Notebook

Overview

This PR introduces a comprehensive data exploration notebook (02_sdnet2018_data_exploration.ipynb) that provides end-to-end analysis and preparation of the SDNET2018 crack detection dataset.

Key Features

  • Dataset Discovery: Automatically scans the local SDNET2018-style folder structure (Deck, Pavement, Wall surfaces with Cracked/Uncracked subclasses) and builds a complete manifest
  • Class Imbalance Analysis: Visualizes the distribution of cracked vs. uncracked images across surface types to identify imbalance issues
  • Image Geometry Profiling: Samples and analyzes image dimensions to understand dataset characteristics
  • Train/Validation/Test Splits: Prepares stratified dataset splits that preserve the cracked/uncracked class proportions in each split
  • Augmentation Previews: Demonstrates Albumentations-based transformations (rotation, brightness, distortion, etc.) useful for robustness during training
  • Balanced Training Manifests: Creates curated training manifests with class-balanced sampling strategies
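The split and balancing steps listed above can be sketched in plain Python. This is a minimal sketch, not the notebook's code: it assumes each manifest row is a dict with hypothetical `surface` and `cracked` keys, stratifies on whatever key function the caller passes, and uses random oversampling as one possible class-balancing strategy. The 70/15/15 ratios and the helper names `stratified_split` and `oversample_to_balance` are illustrative.

```python
import random
from collections import defaultdict


def _group(rows, key):
    """Bucket rows by the stratum/class returned by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return groups


def stratified_split(rows, key, train_frac=0.7, val_frac=0.15, seed=42):
    """Split rows into train/val/test, keeping each stratum's proportions.

    The test split receives whatever remains after train and val.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    train, val, test = [], [], []
    groups = _group(rows, key)
    for stratum in sorted(groups):  # fixed order keeps the split deterministic
        items = list(groups[stratum])
        rng.shuffle(items)
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])
    return train, val, test


def oversample_to_balance(rows, key, seed=42):
    """Duplicate random minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group(rows, key)
    if not groups:
        return []
    target = max(len(items) for items in groups.values())
    balanced = []
    for label in sorted(groups):
        items = groups[label]
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced
```

For example, `train, val, test = stratified_split(rows, lambda r: (r["surface"], r["cracked"]))` stratifies on surface and class together, and `oversample_to_balance(train, lambda r: r["cracked"])` balances only the training split, so validation and test keep the real class distribution. Undersampling the majority class is the usual alternative when duplicating images risks overfitting.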

Dataset Structure Supported

dataset/
├── D/  (Deck)
│   ├── CD/ (Cracked Deck)
│   └── UD/ (Uncracked Deck)
├── P/  (Pavement)
│   ├── CP/ (Cracked Pavement)
│   └── UP/ (Uncracked Pavement)
└── W/  (Wall)
    ├── CW/ (Cracked Wall)
    └── UW/ (Uncracked Wall)
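A manifest-building pass over the tree above might look like the following sketch. It assumes the folder codes shown (D/CD, D/UD, P/CP and so on), treats `.jpg`, `.jpeg` and `.png` files as images, and records one row per image with hypothetical `path`, `surface` and `cracked` fields; `build_manifest` is an illustrative name, not the notebook's function.

```python
from pathlib import Path

# Folder codes from the SDNET2018 layout above, mapped to readable names.
SURFACES = {"D": "Deck", "P": "Pavement", "W": "Wall"}
# Assumed image extensions; compared case-insensitively below.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_manifest(root):
    """Return one dict per image with its path, surface type and cracked flag."""
    rows = []
    root = Path(root)
    for surface_code, surface_name in SURFACES.items():
        # Subfolders are named C<code> (cracked) and U<code> (uncracked).
        for class_prefix, cracked in (("C", True), ("U", False)):
            class_dir = root / surface_code / f"{class_prefix}{surface_code}"
            if not class_dir.is_dir():
                continue  # tolerate a partial local copy instead of failing
            for path in sorted(class_dir.iterdir()):
                if path.suffix.lower() in IMAGE_EXTENSIONS:
                    rows.append({
                        "path": str(path),
                        "surface": surface_name,
                        "cracked": cracked,
                    })
    return rows
```

Calling `build_manifest("dataset")` on the layout above would return a list of rows that can be counted per surface and class to reproduce the imbalance analysis.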

Outputs

  • Manifests and visualization plots saved to artifacts/manifests/ and artifacts/plots/ respectively
  • Ready-to-use dataframes for training and evaluation pipelines
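Writing a manifest to `artifacts/manifests/` could be as simple as the sketch below, which uses the standard-library `csv` module. The helper name `save_manifest` and file names such as `train.csv` are hypothetical; the notebook may use a different format or library.

```python
import csv
from pathlib import Path


def save_manifest(rows, path):
    """Write manifest rows (dicts sharing the same keys) to a CSV file.

    Parent folders such as artifacts/manifests/ are created if missing.
    """
    if not rows:
        raise ValueError("refusing to write an empty manifest")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        # Column order follows the keys of the first row.
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

For example, `save_manifest(train_rows, "artifacts/manifests/train.csv")`. Note that CSV stores every value as text, so a boolean `cracked` column reads back as the string `"True"` or `"False"`.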

Use Case

Ideal for crack detection model development, dataset validation, and understanding class imbalance challenges before model training.

@kavita57 (Collaborator) left a comment

good

@manya0033 (Collaborator) left a comment

Hey Ishika, I've reviewed the notebook and the PR. The data exploration work is solid, with good structure: class balance analysis, pixel intensity profiling, stratified splits, and augmentation previews. A few things need to be addressed before I can approve:

  1. PR title format - The title needs to follow the standard: team name, project name (matching the Trello card), and completion percentage. "add readme and created seperate folder" doesn't meet this. Something like: "AI | Project 6a: Crack Detection - Data Exploration | X% Completion".

  2. PR source - This is coming from IshikaAnand7:master (your personal fork). PRs should come from a dedicated branch after cloning the company repository, not from a forked repo's master branch.

  3. Australian English - Found "summarize" in the notebook, should be "summarise".

  4. Dataset access - The notebook loads from a local ./dataset directory. The checklist requires datasets to be accessed via API v2.1.

  5. Review quality - To be honest, the approval from Kavita was a single word ("good") and came through immediately. A proper review should catch things like the points above. I'd also flag that this is the same pattern as Kavita's PR (#1728), where approval also came very quickly. The review process is there to help each other improve the work, so please make sure your second reviewer does a thorough pass against the checklist.

  6. Folder structure - Your notebook is under Playground/project_6a_ishika and Kavita's work is under Playground/project_6a. Since you're both working on the same project, the notebooks should be in a single shared project folder rather than split into separate personal ones. The numbering already suggests a sequence (01, 02, 03), so they belong together.

The notebook content is genuinely good; once these process items are sorted, it should be ready. Tag me when you push the updates.
