Extractor can misidentify long abbreviation sections as tables #8

@johnzed0102

Description

The parser/extractor can sometimes misidentify abbreviation sections near the end of a paper as tables, especially when the abbreviation list is long or visually table-like.

Observed behavior

  • Long abbreviation sections on the final pages may be extracted as table candidates.
  • These false-positive candidates can then appear in downstream parser artifacts as if they were real tables.
  • This is most noticeable when the abbreviation list has aligned terms and definitions or multiple short lines that resemble a table grid.

Expected behavior

  • Abbreviation sections should generally not be treated as scientific tables.
  • If they are extracted as table-like structures anyway, they should be routed to a non-Table-1 / unknown table family, or otherwise marked, so they do not pollute Table 1 parsing outputs.
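
One possible way to get this behavior is a post-extraction heuristic that flags candidates resembling abbreviation lists and routes them to an unknown family. The sketch below is illustrative only: `TableCandidate`, `looks_like_abbreviation_list`, `route_candidate`, the family labels, and the thresholds are all hypothetical names and values, not the project's actual API, and the rules would need tuning against real papers.

```python
import re
from dataclasses import dataclass, replace
from typing import List

# All names here are hypothetical and exist only for illustration.

# Section headings that typically introduce an abbreviation list.
ABBREV_HEADING = re.compile(
    r"^\s*(?:list of\s+)?(?:abbreviations?|acronyms?|nomenclature)\b",
    re.IGNORECASE,
)
# A short single token such as "BMI", "HbA1c", or "CI" (no spaces or commas).
ABBREV_TERM = re.compile(r"^[A-Za-z0-9][A-Za-z0-9\-/+.]{0,11}$")


@dataclass
class TableCandidate:
    rows: List[List[str]]          # extracted cell text, one list per row
    heading: str = ""              # nearest heading or caption, if known
    family: str = "unclassified"   # assumed label consumed downstream


def _is_abbreviation_row(row: List[str]) -> bool:
    """True for a two-cell row shaped like 'TERM | textual definition'."""
    cells = [cell.strip() for cell in row if cell.strip()]
    if len(cells) != 2:
        return False
    term, definition = cells
    return (
        ABBREV_TERM.match(term) is not None
        and sum(ch.isupper() for ch in term) >= 2
        # Definitions are prose; numeric cells suggest a real data table.
        and not any(ch.isdigit() for ch in definition)
    )


def looks_like_abbreviation_list(
    candidate: TableCandidate, min_rows: int = 5, min_ratio: float = 0.8
) -> bool:
    """Heuristic: an abbreviation heading, or mostly term/definition rows."""
    if ABBREV_HEADING.match(candidate.heading):
        return True
    rows = [r for r in candidate.rows if any(cell.strip() for cell in r)]
    if len(rows) < min_rows:
        return False
    hits = sum(_is_abbreviation_row(r) for r in rows)
    return hits / len(rows) >= min_ratio


def route_candidate(candidate: TableCandidate) -> TableCandidate:
    """Mark abbreviation-like candidates so Table 1 parsing can skip them."""
    if looks_like_abbreviation_list(candidate):
        return replace(candidate, family="unknown")
    return candidate
```

Returning a marked copy rather than dropping the candidate keeps it visible in downstream artifacts for debugging, while letting the Table 1 parser filter on `family`.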
