Update test_network.py by bosd · Pull Request #619 · camelot-dev/camelot

bosd · 2025-08-14T14:54:13Z

No description provided.

bosd · 2026-05-20T12:58:21Z

Triage update

The substance here is not obsolete — the parser changes are legitimate:

compute_plausible_gaps: h_textlines[i].x0 - h_textlines[i-1].x1 (gap between text edges) replaces x0 - x0 (gap between text starts). The new formula is the right one for column-spacing detection — text-width-dependent biases disappear.
max(..., default=0) on the merged-zones loop in search_header_from_body_bbox is a real safety fix for the empty-iterator case.
New good_energy.pdf fixture is a genuine reproducer for Unable to specify column separators for only one table when there are multiple tables #585-style column-separator confusion.

Why CI is fully red:

pre-commit 3.10 fail = branch predates a couple of pre-commit-config.yaml / black / flake8 changes that landed since June.
Every test row failing in lockstep = the network gap-formula change shifts what counts as a column boundary on the existing test fixtures (foo.pdf, budget.pdf, …). Either those fixtures need their pinned column/row coords updated, or the formula change needs to be opt-in.

Action items if you want to resurrect this:

Rebase onto current master (mechanical).
Run tests/test_network.py locally and decide for each diff between old and new output: is the new output correct (then update the fixture)? Or did we lose information (then narrow the formula change)?
Add a test specifically exercising good_energy.pdf (the regression that prompted the PR).

I won't auto-rebase or close — this is your work and the parser side has nuance. Happy to do the rebase + fixture update pass if you confirm the direction; just say so. Leaving open for now.

…eparator confusion Rebase of the older bosd-fix585-attempt onto current master, keeping only the network parser fixes + the new test fixture (the old branch had also gone behind master enough that its 'parsing_report' delta was a straight revert of #739's confidence metric — dropped). Substance preserved from the original branch: * compute_plausible_gaps: use the gap between text *edges* (h_textlines[i].x0 - h_textlines[i-1].x1) instead of between text *starts* (x0 - x0). The new formula is the right one for column spacing detection; text-width-dependent biases disappear. * search_table_body: handle 'no aligned textline found' explicitly by returning None instead of dereferencing it downstream. * search_header_from_body_bbox: add 'default=0' to the merged-zones max() so an empty iterator doesn't raise. * _generate_table_bbox: refactor for two distinct cases — user- provided table_areas (run network detection on the area's textlines only) vs discovery mode (scan whole page). * drop noqa: C901 on find_closest_tls — the refactor brought cyclomatic complexity back under the limit. New fixture / tests: * tests/files/good_energy.pdf: a reproducer for the column-separator confusion that motivated the original #585 issue. * tests/test_network.py: test_issue_585 + test_issue_585_network_flavor_with_table_areas exercising the table_areas + columns path on multiple_tables.pdf and good_energy.pdf. If existing test_network fixtures now produce slightly different column boundaries (the gap-edge change can do that on edge cases), update them per-fixture rather than narrowing the formula.

…mal) Two real crash-path bugs from the year-old camelot-dev#619 (bosd-fix585-attempt): * TextNetworks.search_table_body: return None early when most_connected_textline() has nothing to return. Previously dereferenced None on the next line. * TextNetworks.search_header_from_body_bbox: max(..., default=0) on the merged-zones generator so an empty zones list no longer raises ValueError: max() arg is an empty sequence. The _generate_table_bbox refactor was included in an earlier attempt of this PR but turned out to depend on a _get_user_provided_bboxes helper that didn't exist on master, breaking every test_network. It also pushed cyclomatic complexity from 11 to 16 (past the flake8 C901 limit). Dropping it; the refactor's value can be revisited as its own focused PR. The gap-edge formula change (.x0 - .x1 instead of .x0 - .x0) stays held back for the planned camelot-dev#619 split 3/3 with fixture recalibration.

…1/3) (#744) Co-authored-by: bosd <ebo@stefcy.com>

…t 2/3) (#745) Co-authored-by: bosd <ebo@stefcy.com>

bosd · 2026-05-20T21:17:49Z

Closing as superseded by a clean 3-PR split:

fix(network): defensive guards in TextNetworks / Network (#619 split 1/3) #744 (merged) — the two safe defensive guards in TextNetworks (return None on missing connected textline, max(..., default=0) on empty zones).
test(network): #585 fixture + two xfailed regression tests (#619 split 2/3) #745 (merged) — the good_energy.pdf fixture + two xfailed regression tests pinned to Unable to specify column separators for only one table when there are multiple tables #585's specific symptom.
3/3 (not yet opened) — the gap-edge formula change (x0 - x1 instead of x0 - x0). Held back because it shifts the column-spread math in search_header_from_body_bbox and needs the xfailed tests of test(network): #585 fixture + two xfailed regression tests (#619 split 2/3) #745 to flip to xpass before it can be merged with confidence.

The defensive bits no longer need this branch as their staging ground; closing for hygiene. Will reopen or supersede with a new PR for 3/3 once the local validation against the #585 reproducer is done.

…ormula A single-line marker at the deferred change site so the work doesn't slip out of memory when the 2.0 release window closes. The patch itself is one character per gap line ('.x0 - .x0' -> '.x0 - .x1') but needs the two xfailed regression tests in tests/test_network.py (merged in camelot-dev#745) to flip to xpass before it can ship — and that requires local-run validation against the camelot-dev#585 reproducers (multiple_tables.pdf, good_energy.pdf). Pairs with a separate tracking issue opened alongside this commit.

…ter (camelot-dev#770) Local validation (debug/619 notebook) showed the naive .x0->.x1 gap-formula swap fixes camelot-dev#585 but regresses 12 network/hybrid fixtures because the gap consumers are stride-calibrated. Replace the 'TODO: apply this change' comment with an 'NB: do NOT apply this naively, here's why' pointer to camelot-dev#770's analysis, so the next person doesn't re-attempt the same dead end. The two xfailed camelot-dev#585 tests remain as the documented limitation.

…ormula A single-line marker at the deferred change site so the work doesn't slip out of memory when the 2.0 release window closes. The patch itself is one character per gap line ('.x0 - .x0' -> '.x0 - .x1') but needs the two xfailed regression tests in tests/test_network.py (merged in camelot-dev#745) to flip to xpass before it can ship — and that requires local-run validation against the camelot-dev#585 reproducers (multiple_tables.pdf, good_energy.pdf). Pairs with a separate tracking issue opened alongside this commit.

…ter (camelot-dev#770) Local validation (debug/619 notebook) showed the naive .x0->.x1 gap-formula swap fixes camelot-dev#585 but regresses 12 network/hybrid fixtures because the gap consumers are stride-calibrated. Replace the 'TODO: apply this change' comment with an 'NB: do NOT apply this naively, here's why' pointer to camelot-dev#770's analysis, so the next person doesn't re-attempt the same dead end. The two xfailed camelot-dev#585 tests remain as the documented limitation.

…ormula A single-line marker at the deferred change site so the work doesn't slip out of memory when the 2.0 release window closes. The patch itself is one character per gap line ('.x0 - .x0' -> '.x0 - .x1') but needs the two xfailed regression tests in tests/test_network.py (merged in camelot-dev#745) to flip to xpass before it can ship — and that requires local-run validation against the camelot-dev#585 reproducers (multiple_tables.pdf, good_energy.pdf). Pairs with a separate tracking issue opened alongside this commit.

…ter (camelot-dev#770) Local validation (debug/619 notebook) showed the naive .x0->.x1 gap-formula swap fixes camelot-dev#585 but regresses 12 network/hybrid fixtures because the gap consumers are stride-calibrated. Replace the 'TODO: apply this change' comment with an 'NB: do NOT apply this naively, here's why' pointer to camelot-dev#770's analysis, so the next person doesn't re-attempt the same dead end. The two xfailed camelot-dev#585 tests remain as the documented limitation.

Co-authored-by: bosd <ebo@stefcy.com>

bosd force-pushed the bosd-fix585-attempt branch 3 times, most recently from 47e78f4 to 56537cc Compare August 14, 2025 15:07

bosd force-pushed the bosd-fix585-attempt branch from ec30867 to b6bed22 Compare May 20, 2026 12:41

bosd force-pushed the bosd-fix585-attempt branch from b6bed22 to 9119be3 Compare May 20, 2026 13:14

This was referenced May 20, 2026

fix(network): defensive guards in TextNetworks / Network (#619 split 1/3) #744

Merged

test(network): #585 fixture + two xfailed regression tests (#619 split 2/3) #745

Merged

bosd added a commit that referenced this pull request May 20, 2026

fix(network): defensive guards in TextNetworks / Network (#619 split …

2b3c7c4

…1/3) (#744) Co-authored-by: bosd <ebo@stefcy.com>

bosd added a commit that referenced this pull request May 20, 2026

test(network): #585 fixture + two xfailed regression tests (#619 spli…

ac349b3

…t 2/3) (#745) Co-authored-by: bosd <ebo@stefcy.com>

bosd closed this May 20, 2026

bosd mentioned this pull request May 20, 2026

#619 split 3/3 — gap-edge formula change in compute_plausible_gaps #770

Closed

5 tasks

bosd added a commit that referenced this pull request May 21, 2026

chore(network): add TODO marker for #619 split 3/3 (#771)

4b5c1da

Co-authored-by: bosd <ebo@stefcy.com>

bosd mentioned this pull request May 22, 2026

bench: FinTabNet harness for the borderless/network parser path #798

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update test_network.py#619

Update test_network.py#619
bosd wants to merge 1 commit into
masterfrom
bosd-fix585-attempt

bosd commented Aug 14, 2025

Uh oh!

bosd commented May 20, 2026 •

edited

Loading

Uh oh!

bosd commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bosd commented Aug 14, 2025

Uh oh!

bosd commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bosd commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bosd commented May 20, 2026 •

edited

Loading