@loicguillois
Collaborator

Add comprehensive analysis scripts and improve DPE import to support updating existing buildings with newer DPE data.

Analysis tools:

  • Add analyze_dpe_distribution.py: temporal distribution analysis
  • Add analyze_dpe_prod_comparison.py: compare RAW vs PROD data with
    quarterly breakdown, BAN address matching analysis, and PDF reports
  • Add ban_lookup.py: fast BAN address lookup using Parquet format
  • Add extract_missing_dpe.py: extract DPE not imported to production

Import improvements:

  • Remove dpe_id IS NULL filter to allow updating existing DPE
  • Add date comparison logic: only import if new DPE is more recent
  • Add id_rnb and provenance_id_rnb fields to dpe_raw import
  • Centralize logs in logs/ directory for all import scripts
  • Add --after parameter to import-ademe.py for date filtering
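
The date comparison rule and the `--after` filter can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the helper names `should_import` and `build_parser`, the ISO `YYYY-MM-DD` date format, and the strict "more recent" comparison are all hypothetical; only the `--after` flag name comes from the description above.

```python
import argparse
from datetime import date
from typing import Optional


def should_import(new_dpe_date: date, existing_dpe_date: Optional[date]) -> bool:
    """Return True when the incoming DPE should replace the stored one.

    Hypothetical helper: the real import script may compare other fields.
    """
    # A building with no DPE yet always accepts the incoming one.
    if existing_dpe_date is None:
        return True
    # Otherwise import only if the incoming DPE is strictly more recent.
    return new_dpe_date > existing_dpe_date


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser exposing an --after date filter (format assumed)."""
    parser = argparse.ArgumentParser(description="Illustrative DPE import CLI")
    parser.add_argument(
        "--after",
        type=date.fromisoformat,  # parses YYYY-MM-DD into a datetime.date
        default=None,
        help="Only consider DPE dated after this day (YYYY-MM-DD).",
    )
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--after", "2025-01-01"])
print(args.after)                                          # 2025-01-01
print(should_import(date(2025, 6, 1), date(2024, 1, 1)))   # True
print(should_import(date(2024, 1, 1), date(2024, 1, 1)))   # False
```

Using a strict comparison means a DPE with the same date as the stored one is skipped, which keeps repeated imports from rewriting unchanged rows.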

Analysis features:

  • Quarterly breakdown of DPE imports with RNB ID tracking
  • BAN address matching analysis (found vs missing in ban_addresses)
  • Detection of buildings that should be updated but weren't
  • PDF export with matplotlib charts for visual reporting
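
A quarterly breakdown with RNB ID tracking could look roughly like the sketch below. The record shape (a date paired with an optional `id_rnb`) and the function names are assumptions made for illustration; the actual analysis scripts may read these values from the database instead.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


def quarter_label(d: date) -> str:
    """Map a date to a label such as '2025-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def quarterly_breakdown(
    records: Iterable[Tuple[date, Optional[str]]],
) -> Dict[str, Dict[str, int]]:
    """Count DPE per quarter, and how many of them carry an RNB ID.

    Each record is a (dpe_date, id_rnb) pair; id_rnb may be None.
    """
    totals: Counter = Counter()
    with_rnb: Counter = Counter()
    for dpe_date, id_rnb in records:
        label = quarter_label(dpe_date)
        totals[label] += 1
        if id_rnb:
            with_rnb[label] += 1
    # Sort by label so quarters come out in chronological order.
    return {
        label: {"total": totals[label], "with_rnb": with_rnb[label]}
        for label in sorted(totals)
    }


sample = [
    (date(2025, 1, 15), "RNB-A"),
    (date(2025, 3, 31), None),
    (date(2025, 4, 1), "RNB-B"),
]
print(quarterly_breakdown(sample))
```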

This addresses the low 2025 DPE import rate (0.6%) by:

  1. Enabling updates of existing buildings with newer DPE
  2. Identifying that ~81.6% of BAN addresses are in the full BAN dataset,
    while only ~45% match the ban_addresses table in production
  3. Providing tools to analyze and monitor import effectiveness
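
The gap between the two match rates can be measured by checking the same set of BAN identifiers against two reference sets. The sketch below is an assumption about how such a comparison might be written; `ban_match_rate` and the sample identifiers are hypothetical, and the real scripts may match on a different key.

```python
from typing import Iterable, Set


def ban_match_rate(dpe_ban_ids: Iterable[str], known_ban_ids: Set[str]) -> float:
    """Return the share of DPE BAN identifiers found in a reference set."""
    ids = list(dpe_ban_ids)
    if not ids:
        # Avoid dividing by zero when there is nothing to match.
        return 0.0
    found = sum(1 for ban_id in ids if ban_id in known_ban_ids)
    return found / len(ids)


# Made-up identifiers: the same DPE sample measured against two references.
dpe_ids = ["ban-1", "ban-2", "ban-3", "ban-4"]
full_ban = {"ban-1", "ban-2", "ban-3"}   # stands in for the full BAN dataset
prod_ban = {"ban-1", "ban-2"}            # stands in for ban_addresses in production

print(f"full BAN: {ban_match_rate(dpe_ids, full_ban):.0%}")
print(f"production: {ban_match_rate(dpe_ids, prod_ban):.0%}")
```

Addresses found in the first set but not the second are the ones an import restricted to the production table would drop.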

@gitguardian

gitguardian bot commented Nov 20, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id | GitGuardian status | Secret | Commit | Filename
9429425 | Triggered | Generic Password | 36c39ba | server/src/scripts/import-dpe/analyze_dpe_distribution.py
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, following best practices for secret storage.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
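
One common way to apply step 2 in a Python script is to read the credential from the environment instead of embedding it in the source. The sketch below is illustrative only: the variable name `DPE_DB_PASSWORD` and the helper `get_db_password` are hypothetical and do not come from this repository.

```python
import os


def get_db_password(env_var: str = "DPE_DB_PASSWORD") -> str:
    """Read the database password from the environment.

    The variable name is an assumption; use whatever the deployment defines.
    """
    password = os.environ.get(env_var)
    if not password:
        # Fail early with a clear message rather than connecting without credentials.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return password
```

Failing when the variable is missing avoids silently falling back to a default value committed to the repository.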

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@tristanrobert
Contributor

tristanrobert commented Nov 20, 2025

Snyk checks have passed. No issues have been found so far.

Scanner | Critical | High | Medium | Low | Total
Open Source Security | 0 | 0 | 0 | 0 | 0 issues

@loicguillois loicguillois added the documentation and enhancement labels Nov 20, 2025
@loicguillois loicguillois self-assigned this Dec 9, 2025
@loicguillois loicguillois force-pushed the feat/dpe-import-analysis-and-improvements branch from 9bbf561 to 36c39ba December 9, 2025 13:37
@loicguillois loicguillois marked this pull request as ready for review December 9, 2025 13:45
Update test file to use new method signature with `existing_dpe_id`
and `building` parameters instead of the old `existing_dpe` dict.
@sonarqubecloud

sonarqubecloud bot commented Dec 9, 2025

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 35.16%. Comparing base (a7720e1) to head (155bed9).

❗ There is a different number of reports uploaded between BASE (a7720e1) and HEAD (155bed9).

HEAD has 2 uploads less than BASE

Flag | BASE (a7720e1) | HEAD (155bed9)
packages | 1 | 0
server | 1 | 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1471       +/-   ##
===========================================
- Coverage   63.61%   35.16%   -28.46%     
===========================================
  Files         356        4      -352     
  Lines       21734     1166    -20568     
  Branches     2048        0     -2048     
===========================================
- Hits        13826      410    -13416     
+ Misses       7875      756     -7119     
+ Partials       33        0       -33     
Flag | Coverage Δ
packages | ?
python-scripts | 35.16% <ø> (ø)
server | ?

Flags with carried forward coverage won't be shown.

Labels

documentation, enhancement

4 participants