Update test_network.py #619
Closed
bosd wants to merge 1 commit into
47e78f4 to 56537cc
Collaborator
Author
Triage update. The substance here is not obsolete; the parser changes are legitimate:
Why CI is fully red:
Action items if you want to resurrect this:
I won't auto-rebase or close: this is your work and the parser side has nuance. Happy to do the rebase + fixture update pass if you confirm the direction; just say so. Leaving open for now.
…eparator confusion

Rebase of the older bosd-fix585-attempt onto current master, keeping only the network parser fixes + the new test fixture (the old branch had also gone behind master enough that its 'parsing_report' delta was a straight revert of #739's confidence metric; that delta is dropped).

Substance preserved from the original branch:
* compute_plausible_gaps: use the gap between text *edges* (h_textlines[i].x0 - h_textlines[i-1].x1) instead of between text *starts* (x0 - x0). The new formula is the right one for column spacing detection; text-width-dependent biases disappear.
* search_table_body: handle 'no aligned textline found' explicitly by returning None instead of dereferencing it downstream.
* search_header_from_body_bbox: add 'default=0' to the merged-zones max() so an empty iterator doesn't raise.
* _generate_table_bbox: refactor for two distinct cases: user-provided table_areas (run network detection on the area's textlines only) vs discovery mode (scan whole page).
* drop noqa: C901 on find_closest_tls; the refactor brought cyclomatic complexity back under the limit.

New fixture / tests:
* tests/files/good_energy.pdf: a reproducer for the column-separator confusion that motivated the original #585 issue.
* tests/test_network.py: test_issue_585 + test_issue_585_network_flavor_with_table_areas exercising the table_areas + columns path on multiple_tables.pdf and good_energy.pdf.

If existing test_network fixtures now produce slightly different column boundaries (the gap-edge change can do that on edge cases), update them per-fixture rather than narrowing the formula.
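The difference between the two gap formulas in the commit message above can be seen on a toy row. This is a minimal sketch, not camelot's code: `TextLine`, `start_gaps`, and `edge_gaps` are hypothetical names introduced for illustration, and only the `x0`/`x1` attributes mirror the commit message.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical stand-in for a horizontal textline (not camelot's class)."""

    x0: float  # left edge of the text
    x1: float  # right edge of the text


def start_gaps(lines):
    """Old formula: distance between consecutive text starts (x0 - x0)."""
    return [b.x0 - a.x0 for a, b in zip(lines, lines[1:])]


def edge_gaps(lines):
    """New formula: whitespace between consecutive text edges (x0 - previous x1)."""
    return [b.x0 - a.x1 for a, b in zip(lines, lines[1:])]


# Three cells on one row; the first cell is much wider than the second.
row = [TextLine(10, 60), TextLine(70, 80), TextLine(90, 120)]

print(start_gaps(row))  # [60, 20]: dominated by how wide each cell's text is
print(edge_gaps(row))   # [10, 10]: the actual whitespace between cells
```

In this toy row the whitespace between cells is uniform, yet the start-to-start distances differ by a factor of three because they include the text width. That is the text-width-dependent bias the commit message refers to.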
This was referenced May 20, 2026
bosd
pushed a commit
to bosd/camelot
that referenced
this pull request
May 20, 2026
…mal)

Two real crash-path bugs from the year-old camelot-dev#619 (bosd-fix585-attempt):
* TextNetworks.search_table_body: return None early when most_connected_textline() has nothing to return. Previously dereferenced None on the next line.
* TextNetworks.search_header_from_body_bbox: max(..., default=0) on the merged-zones generator so an empty zones list no longer raises ValueError: max() arg is an empty sequence.

The _generate_table_bbox refactor was included in an earlier attempt of this PR but turned out to depend on a _get_user_provided_bboxes helper that didn't exist on master, breaking every test_network. It also pushed cyclomatic complexity from 11 to 16 (past the flake8 C901 limit). Dropping it; the refactor's value can be revisited as its own focused PR.

The gap-edge formula change (.x0 - .x1 instead of .x0 - .x0) stays held back for the planned camelot-dev#619 split 3/3 with fixture recalibration.
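The two defensive patterns described above can be sketched in isolation. The function names below (`widest_zone`, `most_connected`, `search_body_sketch`) and the data shapes are assumptions made for illustration; they are not the `TextNetworks` methods themselves.

```python
def widest_zone(zones):
    """Width of the widest (start, end) zone, or 0 when there are no zones.

    Without default=0, max() over an empty generator raises ValueError.
    """
    return max((end - start for start, end in zones), default=0)


def most_connected(textlines):
    """Hypothetical helper: pick the longest string, or None when empty."""
    return max(textlines, key=len, default=None)


def search_body_sketch(textlines):
    """Return None early instead of dereferencing a missing seed textline."""
    seed = most_connected(textlines)
    if seed is None:
        return None
    return {"seed": seed, "width": len(seed)}


print(widest_zone([(0, 5), (10, 12)]))  # 5
print(widest_zone([]))                  # 0
print(search_body_sketch([]))           # None

# For comparison, the unguarded call fails on an empty input.
try:
    max(end - start for start, end in [])
except ValueError as exc:
    print("without default:", type(exc).__name__)
```

Callers of the early-return variant have to handle `None` themselves, which moves the failure from an attribute error deep in the parser to an explicit check at the call site.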
bosd
added a commit
that referenced
this pull request
May 20, 2026
…1/3) (#744) Co-authored-by: bosd <ebo@stefcy.com>
bosd
added a commit
that referenced
this pull request
May 20, 2026
…t 2/3) (#745) Co-authored-by: bosd <ebo@stefcy.com>
Collaborator
Author
Closing as superseded by a clean 3-PR split:
The defensive bits no longer need this branch as their staging ground; closing for hygiene. Will reopen or supersede with a new PR for 3/3 once the local validation against the #585 reproducer is done.
bosd
pushed a commit
to bosd/camelot
that referenced
this pull request
May 20, 2026
…ormula
A single-line marker at the deferred change site so the work doesn't
slip out of memory when the 2.0 release window closes. The patch
itself is one character per gap line ('.x0 - .x0' -> '.x0 - .x1')
but needs the two xfailed regression tests in tests/test_network.py
(merged in camelot-dev#745) to flip to xpass before it can ship — and that
requires local-run validation against the camelot-dev#585 reproducers
(multiple_tables.pdf, good_energy.pdf).
Pairs with a separate tracking issue opened alongside this commit.
5 tasks
bosd
pushed a commit
to bosd/camelot
that referenced
this pull request
May 21, 2026
…ter (camelot-dev#770)

Local validation (debug/619 notebook) showed the naive .x0->.x1 gap-formula swap fixes camelot-dev#585 but regresses 12 network/hybrid fixtures because the gap consumers are stride-calibrated. Replace the 'TODO: apply this change' comment with an 'NB: do NOT apply this naively, here's why' pointer to camelot-dev#770's analysis, so the next person doesn't re-attempt the same dead end. The two xfailed camelot-dev#585 tests remain as the documented limitation.
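One plausible reading of "stride-calibrated" is that downstream code compares gaps against thresholds tuned for start-to-start distances, so feeding it edge-to-edge gaps changes its decisions. The sketch below illustrates that reading only; `count_column_breaks`, the threshold value, and the cell coordinates are all hypothetical and do not come from camelot or from the camelot-dev#770 analysis.

```python
def count_column_breaks(gaps, threshold):
    """Hypothetical gap consumer: count gaps large enough to be column breaks."""
    return sum(1 for gap in gaps if gap >= threshold)


# (x0, x1) pairs for three cells on one row.
cells = [(10, 60), (70, 80), (90, 120)]

# Start-to-start distances ("strides") and edge-to-edge gaps for the same row.
strides = [b[0] - a[0] for a, b in zip(cells, cells[1:])]    # [60, 20]
edge_gaps = [b[0] - a[1] for a, b in zip(cells, cells[1:])]  # [10, 10]

# Assumed threshold, chosen so that it works for stride-sized values.
STRIDE_TUNED_THRESHOLD = 15

print(count_column_breaks(strides, STRIDE_TUNED_THRESHOLD))    # 2
print(count_column_breaks(edge_gaps, STRIDE_TUNED_THRESHOLD))  # 0
```

Under this assumption, swapping the formula without retuning the consumer makes both column breaks disappear, which is consistent with the commit's advice not to apply the change naively.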
bosd
added a commit
that referenced
this pull request
May 21, 2026
Co-authored-by: bosd <ebo@stefcy.com>