From ef2e5e415f96fd623deb252d1e4e566ba649936b Mon Sep 17 00:00:00 2001 From: Michele Dolfi Date: Mon, 3 Feb 2025 08:53:26 +0100 Subject: [PATCH] update test results Signed-off-by: Michele Dolfi --- .../docling_v2/2203.01017v2.doctags.txt | 8 ++-- .../groundtruth/docling_v2/2203.01017v2.json | 2 +- .../groundtruth/docling_v2/2203.01017v2.md | 22 ++++----- .../data/groundtruth/docling_v2/2206.01062.md | 10 ++-- .../groundtruth/docling_v2/2305.03393v1.md | 8 ++-- .../docling_v2/code_and_formula.doctags.txt | 2 +- .../docling_v2/code_and_formula.json | 2 +- .../docling_v2/code_and_formula.md | 2 +- .../groundtruth/docling_v2/elife-56337.xml.md | 24 +++++----- .../groundtruth/docling_v2/example_04.html.md | 6 +-- .../groundtruth/docling_v2/example_05.html.md | 6 +-- .../groundtruth/docling_v2/ipa20180000016.md | 20 ++++---- .../docling_v2/pntd.0008301.xml.md | 32 ++++++------- .../docling_v2/pone.0234687.xml.md | 4 +- .../docling_v2/redp5110_sampled.md | 46 +++++++++---------- .../groundtruth/docling_v2/wiki_duck.html.md | 22 ++++----- 16 files changed, 108 insertions(+), 108 deletions(-) diff --git a/tests/data/groundtruth/docling_v2/2203.01017v2.doctags.txt b/tests/data/groundtruth/docling_v2/2203.01017v2.doctags.txt index edc5c84ba..eaee84482 100644 --- a/tests/data/groundtruth/docling_v2/2203.01017v2.doctags.txt +++ b/tests/data/groundtruth/docling_v2/2203.01017v2.doctags.txt @@ -106,12 +106,12 @@ The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer. Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets. The loss used to train the TableFormer can be defined as following: -l$_{box}$ = λ$_{iou}$l$_{iou}$ + λ$_{l}$$_{1}$ l = λl$_{s}$ + (1 - λ ) l$_{box}$ (1) + where λ ∈ [0, 1], and λ$_{iou}$, λ$_{l}$$_{1}$ ∈$_{R}$ are hyper-parameters. 5. Experimental Results 5.1. Implementation Details TableFormer uses ResNet-18 as the CNN Backbone Network . The input images are resized to 448*448 pixels and the feature map has a dimension of 28*28. Additionally, we enforce the following input constraints: -Image width and height ≤ 1024 pixels Structural tags length ≤ 512 tokens. (2) + Although input constraints are used also by other methods, such as EDD, ours are less restrictive due to the improved runtime performance and lower memory footprint of TableFormer. This allows to utilize input samples with longer sequences and images with larger dimensions. The Transformer Encoder consists of two "Transformer Encoder Layers", with an input feature size of 512, feed forward network of 1024, and 4 attention heads. As for the Transformer Decoder it is composed of four "Transformer Decoder Layers" with similar input and output dimensions as the "Transformer Encoder Layers". Even though our model uses fewer layers and heads than the default implementation parameters, our extensive experimentation has proved this setup to be more suitable for table images. We attribute this finding to the inherent design of table images, which contain mostly lines and text, unlike the more elaborate content present in other scopes (e.g. the COCO dataset). Moreover, we have added ResNet blocks to the inputs of the Structure Decoder and Cell BBox Decoder. This prevents a decoder having a stronger influence over the learned weights which would damage the other prediction task (structure vs bounding boxes), but learn task specific weights instead. Lastly our dropout layers are set to 0.5. @@ -122,7 +122,7 @@ We also share our baseline results on the challenging SynthTabNet dataset. Throughout our experiments, the same parameters stated in Sec. 5.1 are utilized. 5.3. Datasets and Metrics The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It represents the prediction, and ground-truth as a tree structure of HTML tags. This similarity is calculated as: -TEDS ( T$_{a}$, T$_{b}$ ) = 1 - EditDist ( T$_{a}$, T$_{b}$ ) max ( | T$_{a}$ | , | T$_{b}$ | ) (3) + where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T . 5.4. Quantitative Analysis Structure. As shown in Tab. 2, TableFormer outperforms all SOTA methods across different datasets by a large margin for predicting the table structure from an image. All the more, our model outperforms pre-trained methods. During the evaluation we do not apply any table filtering. We also provide our baseline results on the SynthTabNet dataset. It has been observed that large tables (e.g. tables that occupy half of the page or more) yield poor predictions. We attribute this issue to the image resizing during the preprocessing step, that produces downsampled images with indistinguishable features. This problem can be addressed by treating such big tables with a separate model which accepts a large input image size. @@ -304,7 +304,7 @@ 3.a. If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column. 4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. The alignment of the column can be identified by the following formula: -alignment = arg min c { D$_{c}$ } D$_{c}$ = max { x$_{c}$ } - min { x$_{c}$ } (4) + where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point. 5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me- diff --git a/tests/data/groundtruth/docling_v2/2203.01017v2.json b/tests/data/groundtruth/docling_v2/2203.01017v2.json index 5e4cac038..7fc631657 100644 --- a/tests/data/groundtruth/docling_v2/2203.01017v2.json +++ b/tests/data/groundtruth/docling_v2/2203.01017v2.json @@ -1 +1 @@ -{"schema_name": "DoclingDocument", "version": "1.0.0", "name": "2203.01017v2", "origin": {"mimetype": "application/pdf", "binary_hash": 10763566541725197878, "filename": "2203.01017v2.pdf", "uri": null}, "furniture": {"self_ref": "#/furniture", "parent": null, "children": [], "name": "_root_", "label": "unspecified"}, "body": {"self_ref": "#/body", "parent": null, "children": [{"cref": "#/texts/0"}, {"cref": "#/texts/1"}, {"cref": "#/texts/2"}, {"cref": "#/groups/0"}, {"cref": "#/texts/4"}, {"cref": "#/texts/5"}, {"cref": "#/texts/6"}, {"cref": "#/texts/7"}, {"cref": "#/pictures/0"}, {"cref": "#/texts/11"}, {"cref": "#/tables/0"}, {"cref": "#/groups/1"}, {"cref": "#/pictures/1"}, {"cref": "#/groups/2"}, {"cref": "#/pictures/2"}, {"cref": "#/texts/63"}, {"cref": "#/tables/1"}, {"cref": "#/texts/64"}, {"cref": "#/texts/65"}, {"cref": "#/texts/66"}, {"cref": "#/texts/67"}, {"cref": "#/texts/68"}, {"cref": "#/texts/69"}, {"cref": "#/texts/70"}, {"cref": "#/groups/3"}, {"cref": "#/texts/75"}, {"cref": "#/texts/76"}, {"cref": "#/texts/77"}, {"cref": "#/texts/78"}, {"cref": "#/texts/79"}, {"cref": "#/texts/80"}, {"cref": "#/texts/81"}, {"cref": "#/texts/82"}, {"cref": "#/texts/83"}, {"cref": "#/texts/84"}, {"cref": "#/texts/85"}, {"cref": "#/texts/86"}, {"cref": "#/texts/87"}, {"cref": "#/texts/88"}, {"cref": "#/texts/89"}, {"cref": "#/texts/90"}, {"cref": "#/pictures/3"}, {"cref": "#/texts/124"}, {"cref": "#/texts/125"}, {"cref": "#/texts/126"}, {"cref": "#/texts/127"}, {"cref": "#/texts/128"}, {"cref": "#/texts/129"}, {"cref": "#/texts/130"}, {"cref": "#/texts/131"}, {"cref": "#/texts/132"}, {"cref": "#/texts/133"}, {"cref": "#/tables/2"}, {"cref": "#/texts/134"}, {"cref": "#/texts/135"}, {"cref": "#/texts/136"}, {"cref": "#/texts/137"}, {"cref": "#/texts/138"}, {"cref": "#/texts/139"}, {"cref": "#/texts/140"}, {"cref": "#/texts/141"}, {"cref": "#/pictures/4"}, {"cref": "#/texts/201"}, {"cref": "#/pictures/5"}, {"cref": "#/texts/246"}, {"cref": "#/texts/247"}, {"cref": "#/texts/248"}, {"cref": "#/texts/249"}, {"cref": "#/texts/250"}, {"cref": "#/texts/251"}, {"cref": "#/texts/252"}, {"cref": "#/texts/253"}, {"cref": "#/texts/254"}, {"cref": "#/texts/255"}, {"cref": "#/texts/256"}, {"cref": "#/texts/257"}, {"cref": "#/texts/258"}, {"cref": "#/texts/259"}, {"cref": "#/texts/260"}, {"cref": "#/texts/261"}, {"cref": "#/texts/262"}, {"cref": "#/texts/263"}, {"cref": "#/texts/264"}, {"cref": "#/texts/265"}, {"cref": "#/texts/266"}, {"cref": "#/texts/267"}, {"cref": "#/texts/268"}, {"cref": "#/texts/269"}, {"cref": "#/texts/270"}, {"cref": "#/texts/271"}, {"cref": "#/texts/272"}, {"cref": "#/texts/273"}, {"cref": "#/texts/274"}, {"cref": "#/texts/275"}, {"cref": "#/texts/276"}, {"cref": "#/texts/277"}, {"cref": "#/tables/3"}, {"cref": "#/texts/278"}, {"cref": "#/texts/279"}, {"cref": "#/texts/280"}, {"cref": "#/texts/281"}, {"cref": "#/texts/282"}, {"cref": "#/tables/4"}, {"cref": "#/texts/283"}, {"cref": "#/texts/284"}, {"cref": "#/tables/5"}, {"cref": "#/groups/4"}, {"cref": "#/texts/287"}, {"cref": "#/texts/288"}, {"cref": "#/pictures/6"}, {"cref": "#/texts/289"}, {"cref": "#/pictures/7"}, {"cref": "#/tables/6"}, {"cref": "#/texts/290"}, {"cref": "#/tables/7"}, {"cref": "#/texts/291"}, {"cref": "#/pictures/8"}, {"cref": "#/pictures/9"}, {"cref": "#/texts/348"}, {"cref": "#/pictures/10"}, {"cref": "#/texts/350"}, {"cref": "#/texts/351"}, {"cref": "#/texts/352"}, {"cref": "#/texts/353"}, {"cref": "#/texts/354"}, {"cref": "#/groups/5"}, {"cref": "#/texts/356"}, {"cref": "#/groups/6"}, {"cref": "#/texts/372"}, {"cref": "#/groups/7"}, {"cref": "#/texts/383"}, {"cref": "#/groups/8"}, {"cref": "#/texts/396"}, {"cref": "#/groups/9"}, {"cref": "#/texts/399"}, {"cref": "#/texts/400"}, {"cref": "#/texts/401"}, {"cref": "#/texts/402"}, {"cref": "#/texts/403"}, {"cref": "#/texts/404"}, {"cref": "#/texts/405"}, {"cref": "#/texts/406"}, {"cref": "#/texts/407"}, {"cref": "#/texts/408"}, {"cref": "#/groups/10"}, {"cref": "#/texts/414"}, {"cref": "#/texts/415"}, {"cref": "#/texts/416"}, {"cref": "#/texts/417"}, {"cref": "#/pictures/11"}, {"cref": "#/groups/11"}, {"cref": "#/texts/479"}, {"cref": "#/texts/480"}, {"cref": "#/groups/12"}, {"cref": "#/texts/486"}, {"cref": "#/texts/487"}, {"cref": "#/groups/13"}, {"cref": "#/texts/489"}, {"cref": "#/groups/14"}, {"cref": "#/texts/494"}, {"cref": "#/groups/15"}, {"cref": "#/texts/499"}, {"cref": "#/texts/500"}, {"cref": "#/texts/501"}, {"cref": "#/texts/502"}, {"cref": "#/tables/8"}, {"cref": "#/tables/9"}, {"cref": "#/tables/10"}, {"cref": "#/texts/503"}, {"cref": "#/tables/11"}, {"cref": "#/texts/504"}, {"cref": "#/tables/12"}, {"cref": "#/tables/13"}, {"cref": "#/tables/14"}, {"cref": "#/pictures/12"}, {"cref": "#/texts/505"}, {"cref": "#/tables/15"}, {"cref": "#/tables/16"}, {"cref": "#/tables/17"}, {"cref": "#/tables/18"}, {"cref": "#/pictures/13"}, {"cref": "#/texts/506"}, {"cref": "#/tables/19"}, {"cref": "#/tables/20"}, {"cref": "#/texts/507"}, {"cref": "#/pictures/14"}, {"cref": "#/tables/21"}, {"cref": "#/tables/22"}, {"cref": "#/tables/23"}, {"cref": "#/texts/508"}, {"cref": "#/pictures/15"}, {"cref": "#/texts/509"}, {"cref": "#/tables/24"}, {"cref": "#/tables/25"}, {"cref": "#/tables/26"}, {"cref": "#/texts/510"}, {"cref": "#/pictures/16"}, {"cref": "#/tables/27"}, {"cref": "#/tables/28"}, {"cref": "#/tables/29"}, {"cref": "#/texts/511"}, {"cref": "#/tables/30"}, {"cref": "#/pictures/17"}, {"cref": "#/tables/31"}, {"cref": "#/pictures/18"}, {"cref": "#/tables/32"}, {"cref": "#/pictures/19"}, {"cref": "#/pictures/20"}, {"cref": "#/texts/512"}, {"cref": "#/tables/33"}, {"cref": "#/texts/513"}, {"cref": "#/tables/34"}, {"cref": "#/tables/35"}, {"cref": "#/pictures/21"}, {"cref": "#/tables/36"}, {"cref": "#/pictures/22"}, {"cref": "#/texts/514"}, {"cref": "#/tables/37"}, {"cref": "#/texts/515"}, {"cref": "#/pictures/23"}, {"cref": "#/texts/516"}], "name": "_root_", "label": "unspecified"}, "groups": [{"self_ref": "#/groups/0", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/3"}], "name": "group", "label": "key_value_area"}, {"self_ref": "#/groups/1", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/12"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/2", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/38"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/3", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/71"}, {"cref": "#/texts/72"}, {"cref": "#/texts/73"}, {"cref": "#/texts/74"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/4", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/285"}, {"cref": "#/texts/286"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/5", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/355"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/6", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/357"}, {"cref": "#/texts/358"}, {"cref": "#/texts/359"}, {"cref": "#/texts/360"}, {"cref": "#/texts/361"}, {"cref": "#/texts/362"}, {"cref": "#/texts/363"}, {"cref": "#/texts/364"}, {"cref": "#/texts/365"}, {"cref": "#/texts/366"}, {"cref": "#/texts/367"}, {"cref": "#/texts/368"}, {"cref": "#/texts/369"}, {"cref": "#/texts/370"}, {"cref": "#/texts/371"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/7", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/373"}, {"cref": "#/texts/374"}, {"cref": "#/texts/375"}, {"cref": "#/texts/376"}, {"cref": "#/texts/377"}, {"cref": "#/texts/378"}, {"cref": "#/texts/379"}, {"cref": "#/texts/380"}, {"cref": "#/texts/381"}, {"cref": "#/texts/382"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/8", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/384"}, {"cref": "#/texts/385"}, {"cref": "#/texts/386"}, {"cref": "#/texts/387"}, {"cref": "#/texts/388"}, {"cref": "#/texts/389"}, {"cref": "#/texts/390"}, {"cref": "#/texts/391"}, {"cref": "#/texts/392"}, {"cref": "#/texts/393"}, {"cref": "#/texts/394"}, {"cref": "#/texts/395"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/9", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/397"}, {"cref": "#/texts/398"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/10", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/409"}, {"cref": "#/texts/410"}, {"cref": "#/texts/411"}, {"cref": "#/texts/412"}, {"cref": "#/texts/413"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/11", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/477"}, {"cref": "#/texts/478"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/12", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/481"}, {"cref": "#/texts/482"}, {"cref": "#/texts/483"}, {"cref": "#/texts/484"}, {"cref": "#/texts/485"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/13", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/488"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/14", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/490"}, {"cref": "#/texts/491"}, {"cref": "#/texts/492"}, {"cref": "#/texts/493"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/15", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/495"}, {"cref": "#/texts/496"}, {"cref": "#/texts/497"}, {"cref": "#/texts/498"}], "name": "list", "label": "list"}], "texts": [{"self_ref": "#/texts/0", "parent": {"cref": "#/body"}, "children": [], "label": "page_header", "prov": [{"page_no": 1, "bbox": {"l": 18.340221405029297, "t": 584.1799926757812, "r": 36.339778900146484, "b": 231.99996948242188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 38]}], "orig": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022", "text": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022"}, {"self_ref": "#/texts/1", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 96.3010025024414, "t": 684.9658813476562, "r": 498.9270935058594, "b": 672.0686645507812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 61]}], "orig": "TableFormer: Table Structure Understanding with Transformers.", "text": "TableFormer: Table Structure Understanding with Transformers.", "level": 1}, {"self_ref": "#/texts/2", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 142.4770050048828, "t": 645.3146362304688, "r": 452.7502746582031, "b": 620.6796264648438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 73]}], "orig": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar IBM Research", "text": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar IBM Research", "level": 1}, {"self_ref": "#/texts/3", "parent": {"cref": "#/groups/0"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 208.123, "t": 616.03876, "r": 378.73257, "b": 607.57446, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 35]}], "orig": "{ ahn,nli,mly,taa } @zurich.ibm.com", "text": "{ ahn,nli,mly,taa } @zurich.ibm.com"}, {"self_ref": "#/texts/4", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 145.99497985839844, "t": 576.5170288085938, "r": 190.48028564453125, "b": 565.769287109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 8]}], "orig": "Abstract", "text": "Abstract", "level": 1}, {"self_ref": "#/texts/5", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 315.5670166015625, "t": 573.9931640625, "r": 408.4407043457031, "b": 565.2451782226562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 22]}], "orig": "a. Picture of a table:", "text": "a. Picture of a table:", "level": 1}, {"self_ref": "#/texts/6", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 50.111976623535156, "t": 252.05723571777344, "r": 126.94803619384766, "b": 241.30950927734375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "1. Introduction", "text": "1. Introduction", "level": 1}, {"self_ref": "#/texts/7", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 50.111976623535156, "t": 231.216796875, "r": 286.3650817871094, "b": 78.84822082519531, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 712]}], "orig": "The occurrence of tables in documents is ubiquitous. They often summarise quantitative or factual data, which is cumbersome to describe in verbose text but nevertheless extremely valuable. Unfortunately, this compact representation is often not easy to parse by machines. There are many implicit conventions used to obtain a compact table representation. For example, tables often have complex columnand row-headers in order to reduce duplicated cell content. Lines of different shapes and sizes are leveraged to separate content or indicate a tree structure. Additionally, tables can also have empty/missing table-entries or multi-row textual table-entries. Fig. 1 shows a table which presents all these issues.", "text": "The occurrence of tables in documents is ubiquitous. They often summarise quantitative or factual data, which is cumbersome to describe in verbose text but nevertheless extremely valuable. Unfortunately, this compact representation is often not easy to parse by machines. There are many implicit conventions used to obtain a compact table representation. For example, tables often have complex columnand row-headers in order to reduce duplicated cell content. Lines of different shapes and sizes are leveraged to separate content or indicate a tree structure. Additionally, tables can also have empty/missing table-entries or multi-row textual table-entries. Fig. 1 shows a table which presents all these issues."}, {"self_ref": "#/texts/8", "parent": {"cref": "#/pictures/0"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 451.9457100000001, "t": 556.65295, "r": 457.95050000000003, "b": 546.52252, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/9", "parent": {"cref": "#/pictures/0"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 331.19681, "t": 522.64734, "r": 337.2016, "b": 512.51691, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/10", "parent": {"cref": "#/pictures/0"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 384.0329, "t": 539.32104, "r": 390.03769, "b": 529.19061, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/11", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 1, "bbox": {"l": 50.111976623535156, "t": 550.6049194335938, "r": 286.3651123046875, "b": 279.00335693359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1320]}], "orig": "Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines, Knowledge Graph's, etc, since they enhance their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multiline rows, different variety of separation lines, missing entries, etc. As such, the correct identification of the table-structure from an image is a nontrivial task. In this paper, we present a new table-structure identification model. The latter improves the latest end-toend deep learning model (i.e. encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object detection decoder for table-cells. In this way, we can obtain the content of the table-cells from programmatic PDF's directly from the PDF source and avoid the training of the custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-english tables. Second, we replace the LSTM decoders with transformer based decoders. This upgrade improves significantly the previous state-of-the-art tree-editing-distance-score (TEDS) from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables.", "text": "Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines, Knowledge Graph's, etc, since they enhance their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multiline rows, different variety of separation lines, missing entries, etc. As such, the correct identification of the table-structure from an image is a nontrivial task. In this paper, we present a new table-structure identification model. The latter improves the latest end-toend deep learning model (i.e. encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object detection decoder for table-cells. In this way, we can obtain the content of the table-cells from programmatic PDF's directly from the PDF source and avoid the training of the custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-english tables. Second, we replace the LSTM decoders with transformer based decoders. This upgrade improves significantly the previous state-of-the-art tree-editing-distance-score (TEDS) from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables."}, {"self_ref": "#/texts/12", "parent": {"cref": "#/groups/1"}, "children": [], "label": "list_item", "prov": [{"page_no": 1, "bbox": {"l": 315.5670166015625, "t": 478.3052062988281, "r": 486.4019470214844, "b": 458.7572021484375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 68]}], "orig": "b. Red-annotation of bounding boxes, Blue-predictions by TableFormer", "text": "b. Red-annotation of bounding boxes, Blue-predictions by TableFormer", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/13", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 408.14752, "t": 449.17172, "r": 412.54001, "b": 440.38678, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/14", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 356.11011, "t": 450.42783, "r": 360.50259, "b": 441.64288, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/15", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 500.6777, "t": 451.06232, "r": 505.0701900000001, "b": 442.2773700000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/16", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 356.13382, "t": 440.25211, "r": 360.52631, "b": 431.46716, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/17", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 402.53992, "t": 436.1235, "r": 406.9324, "b": 427.33856, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/18", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 448.58178999999996, "t": 439.15982, "r": 452.97427, "b": 430.37488, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/19", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 491.65161000000006, "t": 438.29343, "r": 496.0441, "b": 429.50848, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/20", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 535.13843, "t": 438.66031, "r": 539.53088, "b": 429.87537, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/21", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 348.82822, "t": 404.90219, "r": 353.2207, "b": 396.11725, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/22", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 389.27151, "t": 416.62772, "r": 393.664, "b": 407.84277, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/23", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 442.67479999999995, "t": 416.35379, "r": 451.45889000000005, "b": 407.56885, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/24", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 477.4382299999999, "t": 416.466, "r": 485.90167, "b": 407.68105999999995, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "11", "text": "11"}, {"self_ref": "#/texts/25", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 522.57263, "t": 416.35379, "r": 531.35669, "b": 407.56885, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "12", "text": "12"}, {"self_ref": "#/texts/26", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 400.22992, "t": 404.88571, "r": 409.01401, "b": 396.10077, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "13", "text": "13"}, {"self_ref": "#/texts/27", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 442.30792, "t": 405.01018999999997, "r": 451.0920100000001, "b": 396.22524999999996, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "14", "text": "14"}, {"self_ref": "#/texts/28", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 478.21941999999996, "t": 404.62531, "r": 487.00351000000006, "b": 395.84036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "15", "text": "15"}, {"self_ref": "#/texts/29", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 523.2287, "t": 405.01018999999997, "r": 532.01276, "b": 396.22524999999996, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "16", "text": "16"}, {"self_ref": "#/texts/30", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 411.57233, "t": 392.57523, "r": 415.96481, "b": 383.79028, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/31", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 415.96393, "t": 392.57523, "r": 420.35641, "b": 383.79028, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/32", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 442.30521, "t": 392.9628000000001, "r": 451.08929, "b": 384.17786000000007, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "18", "text": "18"}, {"self_ref": "#/texts/33", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 478.77893, "t": 393.00360000000006, "r": 487.56302, "b": 384.21866000000006, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "19", "text": "19"}, {"self_ref": "#/texts/34", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 523.97241, "t": 393.3885200000001, "r": 532.75647, "b": 384.60358, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "20", "text": "20"}, {"self_ref": "#/texts/35", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 385.09399, "t": 434.23969000000005, "r": 391.09879, "b": 424.10928, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/36", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 333.43451, "t": 411.2735, "r": 339.4393, "b": 401.14310000000006, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/37", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 478.07210999999995, "t": 450.9631999999999, "r": 484.0769, "b": 440.83279000000005, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/38", "parent": {"cref": "#/groups/2"}, "children": [], "label": "list_item", "prov": [{"page_no": 1, "bbox": {"l": 315.5670166015625, "t": 371.81719970703125, "r": 491.1912536621094, "b": 363.0691833496094, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 38]}], "orig": "c. Structure predicted by TableFormer:", "text": "c. Structure predicted by TableFormer:", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/39", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 354.31412, "r": 351.6412, "b": 345.52917, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/40", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 318.88071, "t": 354.31412, "r": 323.27319, "b": 345.52917, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/41", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 354.31412, "r": 398.4967, "b": 345.52917, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/42", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 318.77316, "t": 342.4545, "r": 323.16565, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/43", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 342.4545, "r": 351.6412, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/44", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 342.4545, "r": 398.4967, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/45", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 440.95941000000005, "t": 342.4545, "r": 445.3519, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/46", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 487.81491, "t": 342.4545, "r": 492.2074, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/47", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 318.77316, "t": 318.29575, "r": 323.16565, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/48", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 330.1554, "r": 351.6412, "b": 321.37045, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/49", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 330.1554, "r": 402.88831, "b": 321.37045, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/50", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 440.95941000000005, "t": 330.1554, "r": 449.42285, "b": 321.37045, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "11", "text": "11"}, {"self_ref": "#/texts/51", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 487.81491, "t": 330.1554, "r": 496.599, "b": 321.37045, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "12", "text": "12"}, {"self_ref": "#/texts/52", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 318.29575, "r": 356.03281, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "13", "text": "13"}, {"self_ref": "#/texts/53", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 318.29575, "r": 402.88831, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "14", "text": "14"}, {"self_ref": "#/texts/54", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 440.95941000000005, "t": 318.29575, "r": 449.7435, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "15", "text": "15"}, {"self_ref": "#/texts/55", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 487.81491, "t": 318.29575, "r": 496.599, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "16", "text": "16"}, {"self_ref": "#/texts/56", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 306.87531, "r": 356.03281, "b": 298.09036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "17", "text": "17"}, {"self_ref": "#/texts/57", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 306.87531, "r": 402.88831, "b": 298.09036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "18", "text": "18"}, {"self_ref": "#/texts/58", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 440.95941000000005, "t": 306.87531, "r": 449.7435, "b": 298.09036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "19", "text": "19"}, {"self_ref": "#/texts/59", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 487.81491, "t": 306.87531, "r": 496.599, "b": 298.09036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "20", "text": "20"}, {"self_ref": "#/texts/60", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 366.70102, "t": 342.87918, "r": 372.70581, "b": 332.74878, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/61", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 331.90424, "t": 318.67709, "r": 337.90903, "b": 308.54669, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/62", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 459.87621999999993, "t": 354.4064, "r": 465.88101, "b": 344.276, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/63", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 1, "bbox": {"l": 308.86199951171875, "t": 277.4996337890625, "r": 545.1151733398438, "b": 232.7270965576172, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 220]}], "orig": "Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.", "text": "Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'."}, {"self_ref": "#/texts/64", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 308.86199951171875, "t": 207.59063720703125, "r": 545.1151733398438, "b": 126.95307159423828, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 363]}], "orig": "Recently, significant progress has been made with vision based approaches to extract tables in documents. For the sake of completeness, the issue of table extraction from documents is typically decomposed into two separate challenges, i.e. (1) finding the location of the table(s) on a document-page and (2) finding the structure of a given table in the document.", "text": "Recently, significant progress has been made with vision based approaches to extract tables in documents. For the sake of completeness, the issue of table extraction from documents is typically decomposed into two separate challenges, i.e. (1) finding the location of the table(s) on a document-page and (2) finding the structure of a given table in the document."}, {"self_ref": "#/texts/65", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 308.86199951171875, "t": 123.61963653564453, "r": 545.1151123046875, "b": 78.84806823730469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 229]}], "orig": "The first problem is called table-location and has been previously addressed [30, 38, 19, 21, 23, 26, 8] with stateof-the-art object-detection networks (e.g. YOLO and later on Mask-RCNN [9]). For all practical purposes, it can be", "text": "The first problem is called table-location and has been previously addressed [30, 38, 19, 21, 23, 26, 8] with stateof-the-art object-detection networks (e.g. YOLO and later on Mask-RCNN [9]). For all practical purposes, it can be"}, {"self_ref": "#/texts/66", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 1, "bbox": {"l": 295.1210021972656, "t": 57.866634368896484, "r": 300.102294921875, "b": 48.9600715637207, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/67", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 286.36505126953125, "b": 695.9300537109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 75]}], "orig": "considered as a solved problem, given enough ground-truth data to train on.", "text": "considered as a solved problem, given enough ground-truth data to train on."}, {"self_ref": "#/texts/68", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11199951171875, "t": 692.4285888671875, "r": 286.3651428222656, "b": 563.9699096679688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 626]}], "orig": "The second problem is called table-structure decomposition. The latter is a long standing problem in the community of document understanding [6, 4, 14]. Contrary to the table-location problem, there are no commonly used approaches that can easily be re-purposed to solve this problem. Lately, a set of new model-architectures has been proposed by the community to address table-structure decomposition [37, 36, 18, 20]. All these models have some weaknesses (see Sec. 2). The common denominator here is the reliance on textual features and/or the inability to provide the bounding box of each table-cell in the original image.", "text": "The second problem is called table-structure decomposition. The latter is a long standing problem in the community of document understanding [6, 4, 14]. Contrary to the table-location problem, there are no commonly used approaches that can easily be re-purposed to solve this problem. Lately, a set of new model-architectures has been proposed by the community to address table-structure decomposition [37, 36, 18, 20]. All these models have some weaknesses (see Sec. 2). The common denominator here is the reliance on textual features and/or the inability to provide the bounding box of each table-cell in the original image."}, {"self_ref": "#/texts/69", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11199951171875, "t": 560.4684448242188, "r": 286.3651123046875, "b": 420.054931640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 643]}], "orig": "In this paper, we want to address these weaknesses and present a robust table-structure decomposition algorithm. The design criteria for our model are the following. First, we want our algorithm to be language agnostic. In this way, we can obtain the structure of any table, irregardless of the language. Second, we want our algorithm to leverage as much data as possible from the original PDF document. For programmatic PDF documents, the text-cells can often be extracted much faster and with higher accuracy compared to OCR methods. Last but not least, we want to have a direct link between the table-cell and its bounding box in the image.", "text": "In this paper, we want to address these weaknesses and present a robust table-structure decomposition algorithm. The design criteria for our model are the following. First, we want our algorithm to be language agnostic. In this way, we can obtain the structure of any table, irregardless of the language. Second, we want our algorithm to leverage as much data as possible from the original PDF document. For programmatic PDF documents, the text-cells can often be extracted much faster and with higher accuracy compared to OCR methods. Last but not least, we want to have a direct link between the table-cell and its bounding box in the image."}, {"self_ref": "#/texts/70", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11199951171875, "t": 416.5534973144531, "r": 286.3665771484375, "b": 359.8269958496094, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 242]}], "orig": "To meet the design criteria listed above, we developed a new model called TableFormer and a synthetically generated table structure dataset called SynthTabNet $^{1}$. In particular, our contributions in this work can be summarised as follows:", "text": "To meet the design criteria listed above, we developed a new model called TableFormer and a synthetically generated table structure dataset called SynthTabNet $^{1}$. In particular, our contributions in this work can be summarised as follows:"}, {"self_ref": "#/texts/71", "parent": {"cref": "#/groups/3"}, "children": [], "label": "list_item", "prov": [{"page_no": 2, "bbox": {"l": 61.56901550292969, "t": 347.568115234375, "r": 286.3648986816406, "b": 302.6770324707031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 166]}], "orig": "\u00b7 We propose TableFormer , a transformer based model that predicts tables structure and bounding boxes for the table content simultaneously in an end-to-end approach.", "text": "\u00b7 We propose TableFormer , a transformer based model that predicts tables structure and bounding boxes for the table content simultaneously in an end-to-end approach.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/72", "parent": {"cref": "#/groups/3"}, "children": [], "label": "list_item", "prov": [{"page_no": 2, "bbox": {"l": 61.56901550292969, "t": 289.9661560058594, "r": 286.3648986816406, "b": 245.0740509033203, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 181]}], "orig": "\u00b7 Across all benchmark datasets TableFormer significantly outperforms existing state-of-the-art metrics, while being much more efficient in training and inference to existing works.", "text": "\u00b7 Across all benchmark datasets TableFormer significantly outperforms existing state-of-the-art metrics, while being much more efficient in training and inference to existing works.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/73", "parent": {"cref": "#/groups/3"}, "children": [], "label": "list_item", "prov": [{"page_no": 2, "bbox": {"l": 61.569000244140625, "t": 232.3631591796875, "r": 286.36492919921875, "b": 199.4270477294922, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 106]}], "orig": "\u00b7 We present SynthTabNet a synthetically generated dataset, with various appearance styles and complexity.", "text": "\u00b7 We present SynthTabNet a synthetically generated dataset, with various appearance styles and complexity.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/74", "parent": {"cref": "#/groups/3"}, "children": [], "label": "list_item", "prov": [{"page_no": 2, "bbox": {"l": 61.569007873535156, "t": 186.5966033935547, "r": 286.3650817871094, "b": 153.779052734375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 131]}], "orig": "\u00b7 An augmented dataset based on PubTabNet [37], FinTabNet [36], and TableBank [17] with generated ground-truth for reproducibility.", "text": "\u00b7 An augmented dataset based on PubTabNet [37], FinTabNet [36], and TableBank [17] with generated ground-truth for reproducibility.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/75", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11200714111328, "t": 141.401611328125, "r": 286.3651123046875, "b": 96.63004302978516, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 231]}], "orig": "The paper is structured as follows. In Sec. 2, we give a brief overview of the current state-of-the-art. In Sec. 3, we describe the datasets on which we train. In Sec. 4, we introduce the TableFormer model-architecture and describe", "text": "The paper is structured as follows. In Sec. 2, we give a brief overview of the current state-of-the-art. In Sec. 3, we describe the datasets on which we train. In Sec. 4, we introduce the TableFormer model-architecture and describe"}, {"self_ref": "#/texts/76", "parent": {"cref": "#/body"}, "children": [], "label": "footnote", "prov": [{"page_no": 2, "bbox": {"l": 60.97100067138672, "t": 86.40372467041016, "r": 183.7305450439453, "b": 79.27845764160156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 40]}], "orig": "$^{1}$https://github.com/IBM/SynthTabNet", "text": "$^{1}$https://github.com/IBM/SynthTabNet"}, {"self_ref": "#/texts/77", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 2, "bbox": {"l": 295.1210021972656, "t": 57.86671829223633, "r": 300.102294921875, "b": 48.96015548706055, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/78", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 308.86199951171875, "t": 716.7916259765625, "r": 545.1151123046875, "b": 683.9750366210938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 166]}], "orig": "its results & performance in Sec. 5. As a conclusion, we describe how this new model-architecture can be re-purposed for other tasks in the computer-vision community.", "text": "its results & performance in Sec. 5. As a conclusion, we describe how this new model-architecture can be re-purposed for other tasks in the computer-vision community."}, {"self_ref": "#/texts/79", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 2, "bbox": {"l": 308.86199951171875, "t": 670.26806640625, "r": 498.28021240234375, "b": 659.5203247070312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 37]}], "orig": "2. Previous work and State of the Art", "text": "2. Previous work and State of the Art", "level": 1}, {"self_ref": "#/texts/80", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 308.86199951171875, "t": 649.7786254882812, "r": 545.1151733398438, "b": 461.54498291015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 901]}], "orig": "Identifying the structure of a table has been an outstanding problem in the document-parsing community, that motivates many organised public challenges [6, 4, 14]. The difficulty of the problem can be attributed to a number of factors. First, there is a large variety in the shapes and sizes of tables. Such large variety requires a flexible method. This is especially true for complex column- and row headers, which can be extremely intricate and demanding. A second factor of complexity is the lack of data with regard to table-structure. Until the publication of PubTabNet [37], there were no large datasets (i.e. > 100 K tables) that provided structure information. This happens primarily due to the fact that tables are notoriously time-consuming to annotate by hand. However, this has definitely changed in recent years with the deliverance of PubTabNet [37], FinTabNet [36], TableBank [17] etc.", "text": "Identifying the structure of a table has been an outstanding problem in the document-parsing community, that motivates many organised public challenges [6, 4, 14]. The difficulty of the problem can be attributed to a number of factors. First, there is a large variety in the shapes and sizes of tables. Such large variety requires a flexible method. This is especially true for complex column- and row headers, which can be extremely intricate and demanding. A second factor of complexity is the lack of data with regard to table-structure. Until the publication of PubTabNet [37], there were no large datasets (i.e. > 100 K tables) that provided structure information. This happens primarily due to the fact that tables are notoriously time-consuming to annotate by hand. However, this has definitely changed in recent years with the deliverance of PubTabNet [37], FinTabNet [36], TableBank [17] etc."}, {"self_ref": "#/texts/81", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 308.86199951171875, "t": 458.4305419921875, "r": 545.115234375, "b": 341.9270935058594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 552]}], "orig": "Before the rising popularity of deep neural networks, the community relied heavily on heuristic and/or statistical methods to do table structure identification [3, 7, 11, 5, 13, 28]. Although such methods work well on constrained tables [12], a more data-driven approach can be applied due to the advent of convolutional neural networks (CNNs) and the availability of large datasets. To the best-of-our knowledge, there are currently two different types of network architecture that are being pursued for state-of-the-art tablestructure identification.", "text": "Before the rising popularity of deep neural networks, the community relied heavily on heuristic and/or statistical methods to do table structure identification [3, 7, 11, 5, 13, 28]. Although such methods work well on constrained tables [12], a more data-driven approach can be applied due to the advent of convolutional neural networks (CNNs) and the availability of large datasets. To the best-of-our knowledge, there are currently two different types of network architecture that are being pursued for state-of-the-art tablestructure identification."}, {"self_ref": "#/texts/82", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 308.8619689941406, "t": 338.9322204589844, "r": 545.1168823242188, "b": 78.84815216064453, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1262]}], "orig": "Image-to-Text networks : In this type of network, one predicts a sequence of tokens starting from an encoded image. Such sequences of tokens can be HTML table tags [37, 17] or LaTeX symbols[10]. The choice of symbols is ultimately not very important, since one can be transformed into the other. There are however subtle variations in the Image-to-Text networks. The easiest network architectures are \"image-encoder \u2192 text-decoder\" (IETD), similar to network architectures that try to provide captions to images [32]. In these IETD networks, one expects as output the LaTeX/HTML string of the entire table, i.e. the symbols necessary for creating the table with the content of the table. Another approach is the \"image-encoder \u2192 dual decoder\" (IEDD) networks. In these type of networks, one has two consecutive decoders with different purposes. The first decoder is the tag-decoder , i.e. it only produces the HTML/LaTeX tags which construct an empty table. The second content-decoder uses the encoding of the image in combination with the output encoding of each cell-tag (from the tag-decoder ) to generate the textual content of each table cell. The network architecture of IEDD is certainly more elaborate, but it has the advantage that one can pre-train the", "text": "Image-to-Text networks : In this type of network, one predicts a sequence of tokens starting from an encoded image. Such sequences of tokens can be HTML table tags [37, 17] or LaTeX symbols[10]. The choice of symbols is ultimately not very important, since one can be transformed into the other. There are however subtle variations in the Image-to-Text networks. The easiest network architectures are \"image-encoder \u2192 text-decoder\" (IETD), similar to network architectures that try to provide captions to images [32]. In these IETD networks, one expects as output the LaTeX/HTML string of the entire table, i.e. the symbols necessary for creating the table with the content of the table. Another approach is the \"image-encoder \u2192 dual decoder\" (IEDD) networks. In these type of networks, one has two consecutive decoders with different purposes. The first decoder is the tag-decoder , i.e. it only produces the HTML/LaTeX tags which construct an empty table. The second content-decoder uses the encoding of the image in combination with the output encoding of each cell-tag (from the tag-decoder ) to generate the textual content of each table cell. The network architecture of IEDD is certainly more elaborate, but it has the advantage that one can pre-train the"}, {"self_ref": "#/texts/83", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 250.15101623535156, "b": 707.8850708007812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 51]}], "orig": "tag-decoder which is constrained to the table-tags.", "text": "tag-decoder which is constrained to the table-tags."}, {"self_ref": "#/texts/84", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11199951171875, "t": 704.7806396484375, "r": 286.3651428222656, "b": 516.5458984375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 864]}], "orig": "In practice, both network architectures (IETD and IEDD) require an implicit, custom trained object-characterrecognition (OCR) to obtain the content of the table-cells. In the case of IETD, this OCR engine is implicit in the decoder similar to [24]. For the IEDD, the OCR is solely embedded in the content-decoder. This reliance on a custom, implicit OCR decoder is of course problematic. OCR is a well known and extremely tough problem, that often needs custom training for each individual language. However, the limited availability for non-english content in the current datasets, makes it impractical to apply the IETD and IEDD methods on tables with other languages. Additionally, OCR can be completely omitted if the tables originate from programmatic PDF documents with known positions of each cell. The latter was the inspiration for the work of this paper.", "text": "In practice, both network architectures (IETD and IEDD) require an implicit, custom trained object-characterrecognition (OCR) to obtain the content of the table-cells. In the case of IETD, this OCR engine is implicit in the decoder similar to [24]. For the IEDD, the OCR is solely embedded in the content-decoder. This reliance on a custom, implicit OCR decoder is of course problematic. OCR is a well known and extremely tough problem, that often needs custom training for each individual language. However, the limited availability for non-english content in the current datasets, makes it impractical to apply the IETD and IEDD methods on tables with other languages. Additionally, OCR can be completely omitted if the tables originate from programmatic PDF documents with known positions of each cell. The latter was the inspiration for the work of this paper."}, {"self_ref": "#/texts/85", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11199188232422, "t": 513.56103515625, "r": 286.3651123046875, "b": 301.297119140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1007]}], "orig": "Graph Neural networks : Graph Neural networks (GNN's) take a radically different approach to tablestructure extraction. Note that one table cell can constitute out of multiple text-cells. To obtain the table-structure, one creates an initial graph, where each of the text-cells becomes a node in the graph similar to [33, 34, 2]. Each node is then associated with en embedding vector coming from the encoded image, its coordinates and the encoded text. Furthermore, nodes that represent adjacent text-cells are linked. Graph Convolutional Networks (GCN's) based methods take the image as an input, but also the position of the text-cells and their content [18]. The purpose of a GCN is to transform the input graph into a new graph, which replaces the old links with new ones. The new links then represent the table-structure. With this approach, one can avoid the need to build custom OCR decoders. However, the quality of the reconstructed structure is not comparable to the current state-of-the-art [18].", "text": "Graph Neural networks : Graph Neural networks (GNN's) take a radically different approach to tablestructure extraction. Note that one table cell can constitute out of multiple text-cells. To obtain the table-structure, one creates an initial graph, where each of the text-cells becomes a node in the graph similar to [33, 34, 2]. Each node is then associated with en embedding vector coming from the encoded image, its coordinates and the encoded text. Furthermore, nodes that represent adjacent text-cells are linked. Graph Convolutional Networks (GCN's) based methods take the image as an input, but also the position of the text-cells and their content [18]. The purpose of a GCN is to transform the input graph into a new graph, which replaces the old links with new ones. The new links then represent the table-structure. With this approach, one can avoid the need to build custom OCR decoders. However, the quality of the reconstructed structure is not comparable to the current state-of-the-art [18]."}, {"self_ref": "#/texts/86", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11198425292969, "t": 298.3112487792969, "r": 286.36627197265625, "b": 169.733154296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 619]}], "orig": "Hybrid Deep Learning-Rule-Based approach : A popular current model for table-structure identification is the use of a hybrid Deep Learning-Rule-Based approach similar to [27, 29]. In this approach, one first detects the position of the table-cells with object detection (e.g. YoloVx or MaskRCNN), then classifies the table into different types (from its images) and finally uses different rule-sets to obtain its table-structure. Currently, this approach achieves stateof-the-art results, but is not an end-to-end deep-learning method. As such, new rules need to be written if different types of tables are encountered.", "text": "Hybrid Deep Learning-Rule-Based approach : A popular current model for table-structure identification is the use of a hybrid Deep Learning-Rule-Based approach similar to [27, 29]. In this approach, one first detects the position of the table-cells with object detection (e.g. YoloVx or MaskRCNN), then classifies the table into different types (from its images) and finally uses different rule-sets to obtain its table-structure. Currently, this approach achieves stateof-the-art results, but is not an end-to-end deep-learning method. As such, new rules need to be written if different types of tables are encountered."}, {"self_ref": "#/texts/87", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 3, "bbox": {"l": 50.11198425292969, "t": 156.05516052246094, "r": 105.22545623779297, "b": 145.30743408203125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 11]}], "orig": "3. Datasets", "text": "3. Datasets", "level": 1}, {"self_ref": "#/texts/88", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11198425292969, "t": 135.57470703125, "r": 286.3650817871094, "b": 78.84813690185547, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 281]}], "orig": "We rely on large-scale datasets such as PubTabNet [37], FinTabNet [36], and TableBank [17] datasets to train and evaluate our models. These datasets span over various appearance styles and content. We also introduce our own synthetically generated SynthTabNet dataset to fix an im-", "text": "We rely on large-scale datasets such as PubTabNet [37], FinTabNet [36], and TableBank [17] datasets to train and evaluate our models. These datasets span over various appearance styles and content. We also introduce our own synthetically generated SynthTabNet dataset to fix an im-"}, {"self_ref": "#/texts/89", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 3, "bbox": {"l": 295.1210021972656, "t": 57.86680221557617, "r": 300.102294921875, "b": 48.96023941040039, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/90", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 3, "bbox": {"l": 308.86199951171875, "t": 524.1636352539062, "r": 545.1151123046875, "b": 503.3020935058594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 104]}], "orig": "Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets", "text": "Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets"}, {"self_ref": "#/texts/91", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "section_header", "prov": [{"page_no": 3, "bbox": {"l": 380.79849, "t": 712.1882300000001, "r": 486.84909, "b": 703.44025, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 21]}], "orig": "PubTabNet + FinTabNet", "text": "PubTabNet + FinTabNet", "level": 1}, {"self_ref": "#/texts/92", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 396.76776, "t": 549.97302, "r": 469.78748, "b": 541.22504, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Rows / Columns", "text": "Rows / Columns"}, {"self_ref": "#/texts/93", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 320.97653, "t": 558.57703, "r": 324.79254, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/94", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 410.483, "t": 558.57703, "r": 418.11319, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "20", "text": "20"}, {"self_ref": "#/texts/95", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 500.84949, "t": 558.57703, "r": 508.47968000000003, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "40", "text": "40"}, {"self_ref": "#/texts/96", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 365.29999, "t": 558.57703, "r": 372.93018, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/97", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 455.66626, "t": 558.57703, "r": 463.29645, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "30", "text": "30"}, {"self_ref": "#/texts/98", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 542.03528, "t": 558.57703, "r": 549.66547, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "50", "text": "50"}, {"self_ref": "#/texts/99", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.04474, "t": 561.55383, "r": 319.86075, "b": 555.7218, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/100", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.62521, "t": 593.30927, "r": 316.44122, "b": 587.47723, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/101", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.43942, "t": 593.30927, "r": 320.2554, "b": 587.47723, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/102", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 313.14951, "t": 623.90204, "r": 316.96552, "b": 618.07001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/103", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.96371, "t": 623.90204, "r": 320.77969, "b": 618.07001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/104", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.92972, "t": 655.41229, "r": 316.74573, "b": 649.58026, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/105", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.74393, "t": 655.41229, "r": 320.55991, "b": 649.58026, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/106", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.48227, "t": 686.39825, "r": 316.29828, "b": 680.56622, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/107", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.29648, "t": 686.39825, "r": 320.11246, "b": 680.56622, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/108", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.48227, "t": 579.74078, "r": 316.29828, "b": 573.90875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/109", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.29648, "t": 579.74078, "r": 320.11246, "b": 573.90875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/110", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 313.07639, "t": 608.27802, "r": 316.8924, "b": 602.44598, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/111", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.89059, "t": 608.27802, "r": 320.70657, "b": 602.44598, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/112", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.76321, "t": 639.526, "r": 316.57922, "b": 633.69397, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/113", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.57742, "t": 639.526, "r": 320.3934, "b": 633.69397, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/114", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.19775, "t": 671.4295, "r": 316.01376, "b": 665.59747, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/115", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.01196, "t": 671.4295, "r": 319.82794, "b": 665.59747, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/116", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.8165, "t": 701.8913, "r": 316.63251, "b": 696.05927, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/117", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.63071, "t": 701.8913, "r": 320.44669, "b": 696.05927, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/118", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.17426, "t": 569.27271, "r": 536.94427, "b": 561.98273, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/119", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.87952, "t": 683.7329700000001, "r": 547.61249, "b": 676.44299, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "10K", "text": "10K"}, {"self_ref": "#/texts/120", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.7735, "t": 661.21899, "r": 542.73877, "b": 653.92902, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "8K", "text": "8K"}, {"self_ref": "#/texts/121", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.79901, "t": 638.07648, "r": 542.76428, "b": 630.7865, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "6K", "text": "6K"}, {"self_ref": "#/texts/122", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.5705, "t": 615.242, "r": 542.53577, "b": 607.95203, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "4K", "text": "4K"}, {"self_ref": "#/texts/123", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.14551, "t": 592.3537, "r": 542.11078, "b": 585.06372, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "2K", "text": "2K"}, {"self_ref": "#/texts/124", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 308.86199951171875, "t": 474.5266418457031, "r": 437.27001953125, "b": 465.6200866699219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 33]}], "orig": "balance in the previous datasets.", "text": "balance in the previous datasets."}, {"self_ref": "#/texts/125", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 308.86199951171875, "t": 460.4686279296875, "r": 545.1151733398438, "b": 164.6382598876953, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1400]}], "orig": "The PubTabNet dataset contains 509k tables delivered as annotated PNG images. The annotations consist of the table structure represented in HTML format, the tokenized text and its bounding boxes per table cell. Fig. 1 shows the appearance style of PubTabNet. Depending on its complexity, a table is characterized as \"simple\" when it does not contain row spans or column spans, otherwise it is \"complex\". The dataset is divided into Train and Val splits (roughly 98% and 2%). The Train split consists of 54% simple and 46% complex tables and the Val split of 51% and 49% respectively. The FinTabNet dataset contains 112k tables delivered as single-page PDF documents with mixed table structures and text content. Similarly to the PubTabNet, the annotations of FinTabNet include the table structure in HTML, the tokenized text and the bounding boxes on a table cell basis. The dataset is divided into Train, Test and Val splits (81%, 9.5%, 9.5%), and each one is almost equally divided into simple and complex tables (Train: 48% simple, 52% complex, Test: 48% simple, 52% complex, Test: 53% simple, 47% complex). Finally the TableBank dataset consists of 145k tables provided as JPEG images. The latter has annotations for the table structure, but only few with bounding boxes of the table cells. The entire dataset consists of simple tables and it is divided into 90% Train, 3% Test and 7% Val splits.", "text": "The PubTabNet dataset contains 509k tables delivered as annotated PNG images. The annotations consist of the table structure represented in HTML format, the tokenized text and its bounding boxes per table cell. Fig. 1 shows the appearance style of PubTabNet. Depending on its complexity, a table is characterized as \"simple\" when it does not contain row spans or column spans, otherwise it is \"complex\". The dataset is divided into Train and Val splits (roughly 98% and 2%). The Train split consists of 54% simple and 46% complex tables and the Val split of 51% and 49% respectively. The FinTabNet dataset contains 112k tables delivered as single-page PDF documents with mixed table structures and text content. Similarly to the PubTabNet, the annotations of FinTabNet include the table structure in HTML, the tokenized text and the bounding boxes on a table cell basis. The dataset is divided into Train, Test and Val splits (81%, 9.5%, 9.5%), and each one is almost equally divided into simple and complex tables (Train: 48% simple, 52% complex, Test: 48% simple, 52% complex, Test: 53% simple, 47% complex). Finally the TableBank dataset consists of 145k tables provided as JPEG images. The latter has annotations for the table structure, but only few with bounding boxes of the table cells. The entire dataset consists of simple tables and it is divided into 90% Train, 3% Test and 7% Val splits."}, {"self_ref": "#/texts/126", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 308.86199951171875, "t": 159.48580932617188, "r": 545.1151123046875, "b": 78.84823608398438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 406]}], "orig": "Due to the heterogeneity across the dataset formats, it was necessary to combine all available data into one homogenized dataset before we could train our models for practical purposes. Given the size of PubTabNet, we adopted its annotation format and we extracted and converted all tables as PNG images with a resolution of 72 dpi. Additionally, we have filtered out tables with extreme sizes due to small", "text": "Due to the heterogeneity across the dataset formats, it was necessary to combine all available data into one homogenized dataset before we could train our models for practical purposes. Given the size of PubTabNet, we adopted its annotation format and we extracted and converted all tables as PNG images with a resolution of 72 dpi. Additionally, we have filtered out tables with extreme sizes due to small"}, {"self_ref": "#/texts/127", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 286.3651123046875, "b": 695.9300537109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 93]}], "orig": "amount of such tables, and kept only those ones ranging between 1*1 and 20*10 (rows/columns).", "text": "amount of such tables, and kept only those ones ranging between 1*1 and 20*10 (rows/columns)."}, {"self_ref": "#/texts/128", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11199951171875, "t": 691.0396118164062, "r": 286.3651428222656, "b": 478.8949279785156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 983]}], "orig": "The availability of the bounding boxes for all table cells is essential to train our models. In order to distinguish between empty and non-empty bounding boxes, we have introduced a binary class in the annotation. Unfortunately, the original datasets either omit the bounding boxes for whole tables (e.g. TableBank) or they narrow their scope only to non-empty cells. Therefore, it was imperative to introduce a data pre-processing procedure that generates the missing bounding boxes out of the annotation information. This procedure first parses the provided table structure and calculates the dimensions of the most fine-grained grid that covers the table structure. Notice that each table cell may occupy multiple grid squares due to row or column spans. In case of PubTabNet we had to compute missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes.", "text": "The availability of the bounding boxes for all table cells is essential to train our models. In order to distinguish between empty and non-empty bounding boxes, we have introduced a binary class in the annotation. Unfortunately, the original datasets either omit the bounding boxes for whole tables (e.g. TableBank) or they narrow their scope only to non-empty cells. Therefore, it was imperative to introduce a data pre-processing procedure that generates the missing bounding boxes out of the annotation information. This procedure first parses the provided table structure and calculates the dimensions of the most fine-grained grid that covers the table structure. Notice that each table cell may occupy multiple grid squares due to row or column spans. In case of PubTabNet we had to compute missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes."}, {"self_ref": "#/texts/129", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11199951171875, "t": 474.0044860839844, "r": 286.3651123046875, "b": 357.50103759765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 571]}], "orig": "As it is illustrated in Fig. 2, the table distributions from all datasets are skewed towards simpler structures with fewer number of rows/columns. Additionally, there is very limited variance in the table styles, which in case of PubTabNet and FinTabNet means one styling format for the majority of the tables. Similar limitations appear also in the type of table content, which in some cases (e.g. FinTabNet) is restricted to a certain domain. Ultimately, the lack of diversity in the training dataset damages the ability of the models to generalize well on unseen data.", "text": "As it is illustrated in Fig. 2, the table distributions from all datasets are skewed towards simpler structures with fewer number of rows/columns. Additionally, there is very limited variance in the table styles, which in case of PubTabNet and FinTabNet means one styling format for the majority of the tables. Similar limitations appear also in the type of table content, which in some cases (e.g. FinTabNet) is restricted to a certain domain. Ultimately, the lack of diversity in the training dataset damages the ability of the models to generalize well on unseen data."}, {"self_ref": "#/texts/130", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11199951171875, "t": 352.610595703125, "r": 286.3665466308594, "b": 164.37611389160156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 941]}], "orig": "Motivated by those observations we aimed at generating a synthetic table dataset named SynthTabNet . This approach offers control over: 1) the size of the dataset, 2) the table structure, 3) the table style and 4) the type of content. The complexity of the table structure is described by the size of the table header and the table body, as well as the percentage of the table cells covered by row spans and column spans. A set of carefully designed styling templates provides the basis to build a wide range of table appearances. Lastly, the table content is generated out of a curated collection of text corpora. By controlling the size and scope of the synthetic datasets we are able to train and evaluate our models in a variety of different conditions. For example, we can first generate a highly diverse dataset to train our models and then evaluate their performance on other synthetic datasets which are focused on a specific domain.", "text": "Motivated by those observations we aimed at generating a synthetic table dataset named SynthTabNet . This approach offers control over: 1) the size of the dataset, 2) the table structure, 3) the table style and 4) the type of content. The complexity of the table structure is described by the size of the table header and the table body, as well as the percentage of the table cells covered by row spans and column spans. A set of carefully designed styling templates provides the basis to build a wide range of table appearances. Lastly, the table content is generated out of a curated collection of text corpora. By controlling the size and scope of the synthetic datasets we are able to train and evaluate our models in a variety of different conditions. For example, we can first generate a highly diverse dataset to train our models and then evaluate their performance on other synthetic datasets which are focused on a specific domain."}, {"self_ref": "#/texts/131", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11201477050781, "t": 159.4856719970703, "r": 286.3651123046875, "b": 78.84810638427734, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 405]}], "orig": "In this regard, we have prepared four synthetic datasets, each one containing 150k examples. The corpora to generate the table text consists of the most frequent terms appearing in PubTabNet and FinTabNet together with randomly generated text. The first two synthetic datasets have been fine-tuned to mimic the appearance of the original datasets but encompass more complicated table structures. The third", "text": "In this regard, we have prepared four synthetic datasets, each one containing 150k examples. The corpora to generate the table text consists of the most frequent terms appearing in PubTabNet and FinTabNet together with randomly generated text. The first two synthetic datasets have been fine-tuned to mimic the appearance of the original datasets but encompass more complicated table structures. The third"}, {"self_ref": "#/texts/132", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 4, "bbox": {"l": 295.1209716796875, "t": 57.86674880981445, "r": 300.1022644042969, "b": 48.96018600463867, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/133", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 624.338623046875, "r": 545.1150512695312, "b": 567.6110229492188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 267]}], "orig": "Table 1: Both \"Combined-Tabnet\" and \"CombinedTabnet\" are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank.", "text": "Table 1: Both \"Combined-Tabnet\" and \"CombinedTabnet\" are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank."}, {"self_ref": "#/texts/134", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 542.3795776367188, "r": 545.1151733398438, "b": 497.6080322265625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 210]}], "orig": "one adopts a colorful appearance with high contrast and the last one contains tables with sparse content. Lastly, we have combined all synthetic datasets into one big unified synthetic dataset of 600k examples.", "text": "one adopts a colorful appearance with high contrast and the last one contains tables with sparse content. Lastly, we have combined all synthetic datasets into one big unified synthetic dataset of 600k examples."}, {"self_ref": "#/texts/135", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 320.8169860839844, "t": 494.22760009765625, "r": 542.7439575195312, "b": 485.321044921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 57]}], "orig": "Tab. 1 summarizes the various attributes of the datasets.", "text": "Tab. 1 summarizes the various attributes of the datasets."}, {"self_ref": "#/texts/136", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 470.8160400390625, "r": 444.9360656738281, "b": 460.0683288574219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 24]}], "orig": "4. The TableFormer model", "text": "4. The TableFormer model", "level": 1}, {"self_ref": "#/texts/137", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 450.06060791015625, "r": 545.115234375, "b": 345.5131530761719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 504]}], "orig": "Given the image of a table, TableFormer is able to predict: 1) a sequence of tokens that represent the structure of a table, and 2) a bounding box coupled to a subset of those tokens. The conversion of an image into a sequence of tokens is a well-known task [35, 16]. While attention is often used as an implicit method to associate each token of the sequence with a position in the original image, an explicit association between the individual table-cells and the image bounding boxes is also required.", "text": "Given the image of a table, TableFormer is able to predict: 1) a sequence of tokens that represent the structure of a table, and 2) a bounding box coupled to a subset of those tokens. The conversion of an image into a sequence of tokens is a well-known task [35, 16]. While attention is often used as an implicit method to associate each token of the sequence with a position in the original image, an explicit association between the individual table-cells and the image bounding boxes is also required."}, {"self_ref": "#/texts/138", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 334.30572509765625, "r": 420.16058349609375, "b": 324.45367431640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 24]}], "orig": "4.1. Model architecture.", "text": "4.1. Model architecture.", "level": 1}, {"self_ref": "#/texts/139", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 308.8619689941406, "t": 315.2347106933594, "r": 545.11572265625, "b": 127.00019073486328, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 907]}], "orig": "We now describe in detail the proposed method, which is composed of three main components, see Fig. 4. Our CNN Backbone Network encodes the input as a feature vector of predefined length. The input feature vector of the encoded image is passed to the Structure Decoder to produce a sequence of HTML tags that represent the structure of the table. With each prediction of an HTML standard data cell (' < td > ') the hidden state of that cell is passed to the Cell BBox Decoder. As for spanning cells, such as row or column span, the tag is broken down to ' < ', 'rowspan=' or 'colspan=', with the number of spanning cells (attribute), and ' > '. The hidden state attached to ' < ' is passed to the Cell BBox Decoder. A shared feed forward network (FFN) receives the hidden states from the Structure Decoder, to provide the final detection predictions of the bounding box coordinates and their classification.", "text": "We now describe in detail the proposed method, which is composed of three main components, see Fig. 4. Our CNN Backbone Network encodes the input as a feature vector of predefined length. The input feature vector of the encoded image is passed to the Structure Decoder to produce a sequence of HTML tags that represent the structure of the table. With each prediction of an HTML standard data cell (' < td > ') the hidden state of that cell is passed to the Cell BBox Decoder. As for spanning cells, such as row or column span, the tag is broken down to ' < ', 'rowspan=' or 'colspan=', with the number of spanning cells (attribute), and ' > '. The hidden state attached to ' < ' is passed to the Cell BBox Decoder. A shared feed forward network (FFN) receives the hidden states from the Structure Decoder, to provide the final detection predictions of the bounding box coordinates and their classification."}, {"self_ref": "#/texts/140", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 308.8619689941406, "t": 123.73930358886719, "r": 545.1151123046875, "b": 78.84818267822266, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 223]}], "orig": "CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table image and encodes it as a vector of predefined length. The network has been modified by removing the linear and pooling layer, as we are not per-", "text": "CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table image and encodes it as a vector of predefined length. The network has been modified by removing the linear and pooling layer, as we are not per-"}, {"self_ref": "#/texts/141", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 5, "bbox": {"l": 50.11199188232422, "t": 588.0142211914062, "r": 545.1084594726562, "b": 567.0330810546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 212]}], "orig": "Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure.", "text": "Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure."}, {"self_ref": "#/texts/142", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 81.688072, "t": 669.5603, "r": 84.927567, "b": 666.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "1.", "text": "1."}, {"self_ref": "#/texts/143", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 86.54731, "t": 669.5603, "r": 93.026291, "b": 666.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "Item", "text": "Item"}, {"self_ref": "#/texts/144", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 102.50498, "t": 676.74786, "r": 115.3461, "b": 673.55865, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Amount", "text": "Amount"}, {"self_ref": "#/texts/145", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 82.140205, "t": 676.7851, "r": 93.291527, "b": 673.59589, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 5]}], "orig": "Names", "text": "Names"}, {"self_ref": "#/texts/146", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 96.748268, "t": 669.5603, "r": 104.3119, "b": 666.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "1000", "text": "1000"}, {"self_ref": "#/texts/147", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 96.748268, "t": 664.2562900000001, "r": 102.42083, "b": 661.06708, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "500", "text": "500"}, {"self_ref": "#/texts/148", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 96.748268, "t": 658.54431, "r": 104.3119, "b": 655.3551, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "3500", "text": "3500"}, {"self_ref": "#/texts/149", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 96.748268, "t": 652.83228, "r": 102.42083, "b": 649.64307, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "150", "text": "150"}, {"self_ref": "#/texts/150", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 110.66107, "t": 669.5603, "r": 116.14391, "b": 666.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "unit", "text": "unit"}, {"self_ref": "#/texts/151", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 110.66107, "t": 664.2562900000001, "r": 116.14391, "b": 661.06708, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "unit", "text": "unit"}, {"self_ref": "#/texts/152", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 110.66107, "t": 658.54431, "r": 116.14391, "b": 655.3551, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "unit", "text": "unit"}, {"self_ref": "#/texts/153", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 110.66107, "t": 652.83228, "r": 116.14391, "b": 649.64307, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "unit", "text": "unit"}, {"self_ref": "#/texts/154", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 81.688072, "t": 664.2562900000001, "r": 84.927567, "b": 661.06708, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "2.", "text": "2."}, {"self_ref": "#/texts/155", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 86.54731, "t": 664.2562900000001, "r": 93.026291, "b": 661.06708, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "Item", "text": "Item"}, {"self_ref": "#/texts/156", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 81.688072, "t": 658.54431, "r": 84.927567, "b": 655.3551, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "3.", "text": "3."}, {"self_ref": "#/texts/157", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 86.54731, "t": 658.54431, "r": 93.026291, "b": 655.3551, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "Item", "text": "Item"}, {"self_ref": "#/texts/158", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 81.688072, "t": 652.83228, "r": 84.927567, "b": 649.64307, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "4.", "text": "4."}, {"self_ref": "#/texts/159", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 86.54731, "t": 652.83228, "r": 93.026291, "b": 649.64307, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "Item", "text": "Item"}, {"self_ref": "#/texts/160", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 88.084389, "t": 701.50262, "r": 113.93649, "b": 695.76202, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "Extracted", "text": "Extracted"}, {"self_ref": "#/texts/161", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 82.81002, "t": 694.36261, "r": 119.21240000000002, "b": 688.62201, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "Table Images", "text": "Table Images"}, {"self_ref": "#/texts/162", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 143.94247, "t": 691.39764, "r": 180.01131, "b": 685.65704, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "Standardized", "text": "Standardized"}, {"self_ref": "#/texts/163", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 151.94064, "t": 684.25763, "r": 172.0118, "b": 678.5170299999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Images", "text": "Images"}, {"self_ref": "#/texts/164", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 251.76939000000002, "t": 711.0690300000001, "r": 266.39557, "b": 705.32843, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "BBox", "text": "BBox"}, {"self_ref": "#/texts/165", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 247.51601, "t": 705.96899, "r": 270.65021, "b": 700.22839, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Decoder", "text": "Decoder"}, {"self_ref": "#/texts/166", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 331.03699, "t": 713.44019, "r": 352.12589, "b": 707.69958, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "BBoxes", "text": "BBoxes"}, {"self_ref": "#/texts/167", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 390.56421, "t": 695.96777, "r": 431.7261, "b": 690.2271700000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 13]}], "orig": "BBoxes can be", "text": "BBoxes can be"}, {"self_ref": "#/texts/168", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 386.82422, "t": 689.8477199999999, "r": 435.46966999999995, "b": 684.10712, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 18]}], "orig": "traced back to the", "text": "traced back to the"}, {"self_ref": "#/texts/169", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 388.69589, "t": 683.72772, "r": 433.6032400000001, "b": 677.9871199999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "original image to", "text": "original image to"}, {"self_ref": "#/texts/170", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 391.07761, "t": 677.60773, "r": 431.22542999999996, "b": 671.8671300000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "extract content", "text": "extract content"}, {"self_ref": "#/texts/171", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 431.22650000000004, "t": 640.31488, "r": 498.82068, "b": 634.57428, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 23]}], "orig": "Structure Tags sequence", "text": "Structure Tags sequence"}, {"self_ref": "#/texts/172", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 431.1738, "t": 634.19482, "r": 498.87753000000004, "b": 628.45422, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 27]}], "orig": "provide full description of", "text": "provide full description of"}, {"self_ref": "#/texts/173", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 440.5289, "t": 628.07483, "r": 489.51827999999995, "b": 622.33423, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 19]}], "orig": "the table structure", "text": "the table structure"}, {"self_ref": "#/texts/174", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 328.37479, "t": 613.74615, "r": 367.72333, "b": 608.00555, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Structure Tags", "text": "Structure Tags"}, {"self_ref": "#/texts/175", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 331.84451, "t": 668.09113, "r": 373.67963, "b": 662.3505199999998, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "BBoxes in sync", "text": "BBoxes in sync"}, {"self_ref": "#/texts/176", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 331.84451, "t": 662.9911499999998, "r": 381.17786, "b": 657.25055, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "with tag sequence", "text": "with tag sequence"}, {"self_ref": "#/texts/177", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 196.62633, "t": 703.88379, "r": 219.42332, "b": 698.14319, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Encoder", "text": "Encoder"}, {"self_ref": "#/texts/178", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 246.66771, "t": 662.5053099999999, "r": 271.49899, "b": 656.76471, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "Structure", "text": "Structure"}, {"self_ref": "#/texts/179", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 247.51601, "t": 657.40527, "r": 270.65021, "b": 651.66467, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Decoder", "text": "Decoder"}, {"self_ref": "#/texts/180", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 330.63071, "t": 702.98077, "r": 365.55347, "b": 697.24017, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 16]}], "orig": "[x1, y2, x2, y2]", "text": "[x1, y2, x2, y2]"}, {"self_ref": "#/texts/181", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 330.63071, "t": 694.82074, "r": 370.22717, "b": 689.08014, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "[x1', y2', x2', y2']", "text": "[x1', y2', x2', y2']"}, {"self_ref": "#/texts/182", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 330.63071, "t": 686.6607700000001, "r": 374.51157, "b": 680.92017, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 24]}], "orig": "[x1'', y2'', x2'', y2'']", "text": "[x1'', y2'', x2'', y2'']"}, {"self_ref": "#/texts/183", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 330.63071, "t": 678.5007300000001, "r": 335.73233, "b": 672.76013, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "...", "text": "..."}, {"self_ref": "#/texts/184", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 650.20764, "r": 335.05988, "b": 645.42383, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "", "text": ""}, {"self_ref": "#/texts/185", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 643.06769, "r": 335.05988, "b": 638.28387, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "", "text": ""}, {"self_ref": "#/texts/186", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 337.54971, "t": 643.44421, "r": 340.95242, "b": 637.70361, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/187", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 343.56262, "t": 643.06769, "r": 398.91446, "b": 638.28387, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 21]}], "orig": "", "text": ""}, {"self_ref": "#/texts/188", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 407.41718, "t": 643.06769, "r": 421.58801, "b": 638.28387, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 5]}], "orig": "", "text": ""}, {"self_ref": "#/texts/189", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 635.92767, "r": 349.23022, "b": 631.14386, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "", "text": ""}, {"self_ref": "#/texts/190", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 628.78766, "r": 335.05988, "b": 624.00385, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "", "text": ""}, {"self_ref": "#/texts/191", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 343.56155, "t": 628.78766, "r": 374.73685, "b": 624.00385, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "...", "text": "..."}, {"self_ref": "#/texts/192", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 621.64764, "r": 326.55716, "b": 616.86383, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "...", "text": "..."}, {"self_ref": "#/texts/193", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 323.51111, "t": 702.33032, "r": 326.91382, "b": 696.58972, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/194", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 323.71509, "t": 694.21112, "r": 327.1178, "b": 688.47052, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/195", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 323.71509, "t": 686.01031, "r": 327.1178, "b": 680.2697099999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/196", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 401.4816, "t": 643.45374, "r": 404.88431, "b": 637.71313, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/197", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 337.6976, "t": 629.31549, "r": 341.10031, "b": 623.57489, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/198", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 454.46378, "t": 687.45416, "r": 457.86648999999994, "b": 681.7135599999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/199", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 493.32580999999993, "t": 700.90454, "r": 496.72852, "b": 695.16394, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/200", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 454.08298, "t": 701.4312099999999, "r": 457.48569000000003, "b": 695.69061, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/201", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 5, "bbox": {"l": 50.11199951171875, "t": 264.2171936035156, "r": 286.365966796875, "b": 111.72905731201172, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 745]}], "orig": "Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes.", "text": "Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes."}, {"self_ref": "#/texts/202", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 74.253464, "t": 533.78528, "r": 101.75846, "b": 527.82526, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 11]}], "orig": "Input Image", "text": "Input Image"}, {"self_ref": "#/texts/203", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 122.29972, "t": 533.65479, "r": 157.83972, "b": 527.69476, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Tokenised Tags", "text": "Tokenised Tags"}, {"self_ref": "#/texts/204", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 78.549347, "t": 420.61420000000004, "r": 125.68359000000001, "b": 414.95218, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "Multi-Head Attention", "text": "Multi-Head Attention"}, {"self_ref": "#/texts/205", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 78.513298, "t": 400.68143, "r": 84.644547, "b": 395.01941, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Add", "text": "Add"}, {"self_ref": "#/texts/206", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 116.52705, "t": 400.68143, "r": 125.11079999999998, "b": 395.01941, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "& Normalisation", "text": "& Normalisation"}, {"self_ref": "#/texts/207", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 76.024773, "t": 367.54691, "r": 127.92327000000002, "b": 361.88489, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "Feed Forward Network", "text": "Feed Forward Network"}, {"self_ref": "#/texts/208", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 78.382828, "t": 347.11044, "r": 84.514076, "b": 341.44843, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Add", "text": "Add"}, {"self_ref": "#/texts/209", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 116.39658, "t": 347.11044, "r": 124.98033, "b": 341.44843, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "& Normalisation", "text": "& Normalisation"}, {"self_ref": "#/texts/210", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 167.46945, "t": 329.55676, "r": 181.6292, "b": 323.89474, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Linear", "text": "Linear"}, {"self_ref": "#/texts/211", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 165.61292, "t": 313.52893, "r": 184.43242, "b": 307.86691, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Softmax", "text": "Softmax"}, {"self_ref": "#/texts/212", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 65.319511, "t": 467.73764000000006, "r": 132.9245, "b": 461.77764999999994, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "CNN BACKBONE ENCODER", "text": "CNN BACKBONE ENCODER"}, {"self_ref": "#/texts/213", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 119.51457, "t": 522.33606, "r": 162.98782, "b": 517.27008, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 21]}], "orig": "[30, 1, 2, 3, 4, \u2026 3,", "text": "[30, 1, 2, 3, 4, \u2026 3,"}, {"self_ref": "#/texts/214", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 128.72858, "t": 517.08606, "r": 151.41083, "b": 512.02008, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "4, 5, 8, 31]", "text": "4, 5, 8, 31]"}, {"self_ref": "#/texts/215", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 60.434211999999995, "t": 453.04007, "r": 80.27021, "b": 447.73007, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "Positional", "text": "Positional"}, {"self_ref": "#/texts/216", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 60.598457, "t": 448.61395, "r": 78.854958, "b": 443.30396, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 8]}], "orig": "Encoding", "text": "Encoding"}, {"self_ref": "#/texts/217", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 134.82877, "t": 498.62238, "r": 154.66476, "b": 493.31238, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "Positional", "text": "Positional"}, {"self_ref": "#/texts/218", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 134.99303, "t": 494.19629000000003, "r": 153.24953, "b": 488.88629, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 8]}], "orig": "Encoding", "text": "Encoding"}, {"self_ref": "#/texts/219", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 150.55193, "t": 446.64139, "r": 197.14943, "b": 440.97937, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 19]}], "orig": "Add & Normalisation", "text": "Add & Normalisation"}, {"self_ref": "#/texts/220", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 150.55193, "t": 397.5766, "r": 156.68318, "b": 391.91458, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Add", "text": "Add"}, {"self_ref": "#/texts/221", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 188.56567, "t": 397.5766, "r": 197.14943, "b": 391.91458, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "& Normalisation", "text": "& Normalisation"}, {"self_ref": "#/texts/222", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 150.18539, "t": 416.33157, "r": 197.31964, "b": 410.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "Multi-Head Attention", "text": "Multi-Head Attention"}, {"self_ref": "#/texts/223", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 150.55193, "t": 351.75152999999995, "r": 156.68318, "b": 346.08951, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Add", "text": "Add"}, {"self_ref": "#/texts/224", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 188.56567, "t": 351.75152999999995, "r": 197.14943, "b": 346.08951, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "& Normalisation", "text": "& Normalisation"}, {"self_ref": "#/texts/225", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 147.86377, "t": 369.90665, "r": 199.76227, "b": 364.24463, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "Feed Forward Network", "text": "Feed Forward Network"}, {"self_ref": "#/texts/226", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 241.56567000000004, "t": 477.73714999999993, "r": 255.72542, "b": 472.07513, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Linear", "text": "Linear"}, {"self_ref": "#/texts/227", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 241.91730000000004, "t": 430.63507, "r": 256.07706, "b": 424.97305, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Linear", "text": "Linear"}, {"self_ref": "#/texts/228", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 228.054, "t": 455.38070999999997, "r": 248.72363000000004, "b": 449.71869, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "Attention", "text": "Attention"}, {"self_ref": "#/texts/229", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 246.2919, "t": 455.38070999999997, "r": 269.39325, "b": 449.71869, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Network", "text": "Network"}, {"self_ref": "#/texts/230", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 228.44568000000004, "t": 386.85318, "r": 238.73892, "b": 381.19116, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "MLP", "text": "MLP"}, {"self_ref": "#/texts/231", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 256.29767, "t": 386.7967499999999, "r": 271.77792, "b": 381.13474, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Linear", "text": "Linear"}, {"self_ref": "#/texts/232", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 239.54543, "t": 409.78656, "r": 258.08942, "b": 404.12454, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Sigmoid", "text": "Sigmoid"}, {"self_ref": "#/texts/233", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 54.14704100000001, "t": 407.12817, "r": 59.51152, "b": 342.21674, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 27]}], "orig": "Transformer Encoder Network", "text": "Transformer Encoder Network"}, {"self_ref": "#/texts/234", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 54.235424, "t": 418.18768, "r": 59.30449699999999, "b": 413.54578000000004, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "x2", "text": "x2"}, {"self_ref": "#/texts/235", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 85.295891, "t": 307.46811, "r": 122.16431, "b": 301.63312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Encoded Output", "text": "Encoded Output"}, {"self_ref": "#/texts/236", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 229.66599, "t": 512.45392, "r": 265.3194, "b": 506.54427999999996, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Encoded Output", "text": "Encoded Output"}, {"self_ref": "#/texts/237", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 157.17369, "t": 291.6969, "r": 190.41711, "b": 285.87057, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Predicted Tags", "text": "Predicted Tags"}, {"self_ref": "#/texts/238", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 227.81598999999997, "t": 353.94458, "r": 270.78442, "b": 348.10794, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 16]}], "orig": "Bounding Boxes &", "text": "Bounding Boxes &"}, {"self_ref": "#/texts/239", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 233.70262, "t": 347.93817, "r": 263.51105, "b": 342.1095000000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Classification", "text": "Classification"}, {"self_ref": "#/texts/240", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 184.74655, "t": 498.60498, "r": 212.16055, "b": 493.24097, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 11]}], "orig": "Transformer", "text": "Transformer"}, {"self_ref": "#/texts/241", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 178.91229, "t": 492.85498, "r": 216.74378999999996, "b": 487.49097, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "Decoder Network", "text": "Decoder Network"}, {"self_ref": "#/texts/242", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 194.24574, "t": 509.2178, "r": 198.89099, "b": 504.15182000000004, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "x4", "text": "x4"}, {"self_ref": "#/texts/243", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 221.45587, "t": 520.13086, "r": 276.47089, "b": 514.17084, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "CELL BBOX DECODER", "text": "CELL BBOX DECODER"}, {"self_ref": "#/texts/244", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 151.65219, "t": 468.55759, "r": 197.29019, "b": 462.89557, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "Masked Multi-Head", "text": "Masked Multi-Head"}, {"self_ref": "#/texts/245", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 163.43277, "t": 462.55759, "r": 184.19028, "b": 456.89557, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "Attention", "text": "Attention"}, {"self_ref": "#/texts/246", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.86199951171875, "t": 542.465576171875, "r": 545.1150512695312, "b": 497.69305419921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 227]}], "orig": "forming classification, and adding an adaptive pooling layer of size 28*28. ResNet by default downsamples the image resolution by 32 and then the encoded image is provided to both the Structure Decoder , and Cell BBox Decoder .", "text": "forming classification, and adding an adaptive pooling layer of size 28*28. ResNet by default downsamples the image resolution by 32 and then the encoded image is provided to both the Structure Decoder , and Cell BBox Decoder ."}, {"self_ref": "#/texts/247", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.8619384765625, "t": 494.6601867675781, "r": 545.1151123046875, "b": 378.0381774902344, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 563]}], "orig": "Structure Decoder. The transformer architecture of this component is based on the work proposed in [31]. After extensive experimentation, the Structure Decoder is modeled as a transformer encoder with two encoder layers and a transformer decoder made from a stack of 4 decoder layers that comprise mainly of multi-head attention and feed forward layers. This configuration uses fewer layers and heads in comparison to networks applied to other problems (e.g. \"Scene Understanding\", \"Image Captioning\"), something which we relate to the simplicity of table images.", "text": "Structure Decoder. The transformer architecture of this component is based on the work proposed in [31]. After extensive experimentation, the Structure Decoder is modeled as a transformer encoder with two encoder layers and a transformer decoder made from a stack of 4 decoder layers that comprise mainly of multi-head attention and feed forward layers. This configuration uses fewer layers and heads in comparison to networks applied to other problems (e.g. \"Scene Understanding\", \"Image Captioning\"), something which we relate to the simplicity of table images."}, {"self_ref": "#/texts/248", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.8619689941406, "t": 374.8857421875, "r": 545.1151123046875, "b": 246.4272918701172, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 592]}], "orig": "The transformer encoder receives an encoded image from the CNN Backbone Network and refines it through a multi-head dot-product attention layer, followed by a Feed Forward Network. During training, the transformer decoder receives as input the output feature produced by the transformer encoder, and the tokenized input of the HTML ground-truth tags. Using a stack of multi-head attention layers, different aspects of the tag sequence could be inferred. This is achieved by each attention head on a layer operating in a different subspace, and then combining altogether their attention score.", "text": "The transformer encoder receives an encoded image from the CNN Backbone Network and refines it through a multi-head dot-product attention layer, followed by a Feed Forward Network. During training, the transformer decoder receives as input the output feature produced by the transformer encoder, and the tokenized input of the HTML ground-truth tags. Using a stack of multi-head attention layers, different aspects of the tag sequence could be inferred. This is achieved by each attention head on a layer operating in a different subspace, and then combining altogether their attention score."}, {"self_ref": "#/texts/249", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.8619384765625, "t": 243.39540100097656, "r": 545.1151123046875, "b": 138.727294921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 483]}], "orig": "Cell BBox Decoder. Our architecture allows to simultaneously predict HTML tags and bounding boxes for each table cell without the need of a separate object detector end to end. This approach is inspired by DETR [1] which employs a Transformer Encoder, and Decoder that looks for a specific number of object queries (potential object detections). As our model utilizes a transformer architecture, the hidden state of the < td > ' and ' < ' HTML structure tags become the object query.", "text": "Cell BBox Decoder. Our architecture allows to simultaneously predict HTML tags and bounding boxes for each table cell without the need of a separate object detector end to end. This approach is inspired by DETR [1] which employs a Transformer Encoder, and Decoder that looks for a specific number of object queries (potential object detections). As our model utilizes a transformer architecture, the hidden state of the < td > ' and ' < ' HTML structure tags become the object query."}, {"self_ref": "#/texts/250", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.8619384765625, "t": 135.57484436035156, "r": 545.1150512695312, "b": 78.84827423095703, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 286]}], "orig": "The encoding generated by the CNN Backbone Network along with the features acquired for every data cell from the Transformer Decoder are then passed to the attention network. The attention network takes both inputs and learns to provide an attention weighted encoding. This weighted at-", "text": "The encoding generated by the CNN Backbone Network along with the features acquired for every data cell from the Transformer Decoder are then passed to the attention network. The attention network takes both inputs and learns to provide an attention weighted encoding. This weighted at-"}, {"self_ref": "#/texts/251", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 5, "bbox": {"l": 295.1209411621094, "t": 57.86684036254883, "r": 300.10223388671875, "b": 48.96027755737305, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/252", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 286.3651428222656, "b": 636.1539916992188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 380]}], "orig": "tention encoding is then multiplied to the encoded image to produce a feature for each table cell. Notice that this is different than the typical object detection problem where imbalances between the number of detections and the amount of objects may exist. In our case, we know up front that the produced detections always match with the table cells in number and correspondence.", "text": "tention encoding is then multiplied to the encoded image to produce a feature for each table cell. Notice that this is different than the typical object detection problem where imbalances between the number of detections and the amount of objects may exist. In our case, we know up front that the produced detections always match with the table cells in number and correspondence."}, {"self_ref": "#/texts/253", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.11199951171875, "t": 632.3755493164062, "r": 286.3651123046875, "b": 551.7369384765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 371]}], "orig": "The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer.", "text": "The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer."}, {"self_ref": "#/texts/254", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.11199951171875, "t": 548.0780639648438, "r": 286.36572265625, "b": 347.76910400390625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 985]}], "orig": "Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets.", "text": "Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets."}, {"self_ref": "#/texts/255", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.112022399902344, "t": 343.9896545410156, "r": 286.364990234375, "b": 323.12811279296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 67]}], "orig": "The loss used to train the TableFormer can be defined as following:", "text": "The loss used to train the TableFormer can be defined as following:"}, {"self_ref": "#/texts/256", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 6, "bbox": {"l": 124.33001708984375, "t": 298.71905517578125, "r": 286.3624267578125, "b": 274.92828369140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 84]}], "orig": "l$_{box}$ = \u03bb$_{iou}$l$_{iou}$ + \u03bb$_{l}$$_{1}$ l = \u03bbl$_{s}$ + (1 - \u03bb ) l$_{box}$ (1)", "text": "l$_{box}$ = \u03bb$_{iou}$l$_{iou}$ + \u03bb$_{l}$$_{1}$ l = \u03bbl$_{s}$ + (1 - \u03bb ) l$_{box}$ (1)"}, {"self_ref": "#/texts/257", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.112030029296875, "t": 261.4079895019531, "r": 281.596923828125, "b": 251.78411865234375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 76]}], "orig": "where \u03bb \u2208 [0, 1], and \u03bb$_{iou}$, \u03bb$_{l}$$_{1}$ \u2208$_{R}$ are hyper-parameters.", "text": "where \u03bb \u2208 [0, 1], and \u03bb$_{iou}$, \u03bb$_{l}$$_{1}$ \u2208$_{R}$ are hyper-parameters."}, {"self_ref": "#/texts/258", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 6, "bbox": {"l": 50.11204528808594, "t": 236.08311462402344, "r": 171.9833526611328, "b": 225.33538818359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 23]}], "orig": "5. Experimental Results", "text": "5. Experimental Results", "level": 1}, {"self_ref": "#/texts/259", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 6, "bbox": {"l": 50.11204528808594, "t": 215.7356719970703, "r": 179.17501831054688, "b": 205.8836212158203, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 27]}], "orig": "5.1. Implementation Details", "text": "5.1. Implementation Details", "level": 1}, {"self_ref": "#/texts/260", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.11204528808594, "t": 196.2656707763672, "r": 286.36517333984375, "b": 151.4931182861328, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 207]}], "orig": "TableFormer uses ResNet-18 as the CNN Backbone Network . The input images are resized to 448*448 pixels and the feature map has a dimension of 28*28. Additionally, we enforce the following input constraints:", "text": "TableFormer uses ResNet-18 as the CNN Backbone Network . The input images are resized to 448*448 pixels and the feature map has a dimension of 28*28. Additionally, we enforce the following input constraints:"}, {"self_ref": "#/texts/261", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 6, "bbox": {"l": 91.66104888916016, "t": 138.1719970703125, "r": 286.3624572753906, "b": 113.60411834716797, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 77]}], "orig": "Image width and height \u2264 1024 pixels Structural tags length \u2264 512 tokens. (2)", "text": "Image width and height \u2264 1024 pixels Structural tags length \u2264 512 tokens. (2)"}, {"self_ref": "#/texts/262", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.112060546875, "t": 99.70968627929688, "r": 286.3651428222656, "b": 78.8481216430664, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 117]}], "orig": "Although input constraints are used also by other methods, such as EDD, ours are less restrictive due to the improved", "text": "Although input constraints are used also by other methods, such as EDD, ours are less restrictive due to the improved"}, {"self_ref": "#/texts/263", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 6, "bbox": {"l": 295.12103271484375, "t": 57.86667251586914, "r": 300.1023254394531, "b": 48.96010971069336, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/264", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.862060546875, "t": 716.7916870117188, "r": 545.115234375, "b": 683.97509765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 156]}], "orig": "runtime performance and lower memory footprint of TableFormer. This allows to utilize input samples with longer sequences and images with larger dimensions.", "text": "runtime performance and lower memory footprint of TableFormer. This allows to utilize input samples with longer sequences and images with larger dimensions."}, {"self_ref": "#/texts/265", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.862060546875, "t": 675.7706298828125, "r": 545.1152954101562, "b": 463.6259460449219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1024]}], "orig": "The Transformer Encoder consists of two \"Transformer Encoder Layers\", with an input feature size of 512, feed forward network of 1024, and 4 attention heads. As for the Transformer Decoder it is composed of four \"Transformer Decoder Layers\" with similar input and output dimensions as the \"Transformer Encoder Layers\". Even though our model uses fewer layers and heads than the default implementation parameters, our extensive experimentation has proved this setup to be more suitable for table images. We attribute this finding to the inherent design of table images, which contain mostly lines and text, unlike the more elaborate content present in other scopes (e.g. the COCO dataset). Moreover, we have added ResNet blocks to the inputs of the Structure Decoder and Cell BBox Decoder. This prevents a decoder having a stronger influence over the learned weights which would damage the other prediction task (structure vs bounding boxes), but learn task specific weights instead. Lastly our dropout layers are set to 0.5.", "text": "The Transformer Encoder consists of two \"Transformer Encoder Layers\", with an input feature size of 512, feed forward network of 1024, and 4 attention heads. As for the Transformer Decoder it is composed of four \"Transformer Decoder Layers\" with similar input and output dimensions as the \"Transformer Encoder Layers\". Even though our model uses fewer layers and heads than the default implementation parameters, our extensive experimentation has proved this setup to be more suitable for table images. We attribute this finding to the inherent design of table images, which contain mostly lines and text, unlike the more elaborate content present in other scopes (e.g. the COCO dataset). Moreover, we have added ResNet blocks to the inputs of the Structure Decoder and Cell BBox Decoder. This prevents a decoder having a stronger influence over the learned weights which would damage the other prediction task (structure vs bounding boxes), but learn task specific weights instead. Lastly our dropout layers are set to 0.5."}, {"self_ref": "#/texts/266", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 455.4224853515625, "r": 545.1151733398438, "b": 362.83001708984375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 419]}], "orig": "For training, TableFormer is trained with 3 Adam optimizers, each one for the CNN Backbone Network , Structure Decoder , and Cell BBox Decoder . Taking the PubTabNet as an example for our parameter set up, the initializing learning rate is 0.001 for 12 epochs with a batch size of 24, and \u03bb set to 0.5. Afterwards, we reduce the learning rate to 0.0001, the batch size to 18 and train for 12 more epochs or convergence.", "text": "For training, TableFormer is trained with 3 Adam optimizers, each one for the CNN Backbone Network , Structure Decoder , and Cell BBox Decoder . Taking the PubTabNet as an example for our parameter set up, the initializing learning rate is 0.001 for 12 epochs with a batch size of 24, and \u03bb set to 0.5. Afterwards, we reduce the learning rate to 0.0001, the batch size to 18 and train for 12 more epochs or convergence."}, {"self_ref": "#/texts/267", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 354.6255798339844, "r": 545.115234375, "b": 238.12310791015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 528]}], "orig": "TableFormer is implemented with PyTorch and Torchvision libraries [22]. To speed up the inference, the image undergoes a single forward pass through the CNN Backbone Network and transformer encoder. This eliminates the overhead of generating the same features for each decoding step. Similarly, we employ a 'caching' technique to preform faster autoregressive decoding. This is achieved by storing the features of decoded tokens so we can reuse them for each time step. Therefore, we only compute the attention for each new tag.", "text": "TableFormer is implemented with PyTorch and Torchvision libraries [22]. To speed up the inference, the image undergoes a single forward pass through the CNN Backbone Network and transformer encoder. This eliminates the overhead of generating the same features for each decoding step. Similarly, we employ a 'caching' technique to preform faster autoregressive decoding. This is achieved by storing the features of decoded tokens so we can reuse them for each time step. Therefore, we only compute the attention for each new tag."}, {"self_ref": "#/texts/268", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 212.4456787109375, "r": 397.44281005859375, "b": 202.5936279296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 19]}], "orig": "5.2. Generalization", "text": "5.2. Generalization", "level": 1}, {"self_ref": "#/texts/269", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 188.55067443847656, "r": 545.1151733398438, "b": 119.86811065673828, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 299]}], "orig": "TableFormer is evaluated on three major publicly available datasets of different nature to prove the generalization and effectiveness of our model. The datasets used for evaluation are the PubTabNet, FinTabNet and TableBank which stem from the scientific, financial and general domains respectively.", "text": "TableFormer is evaluated on three major publicly available datasets of different nature to prove the generalization and effectiveness of our model. The datasets used for evaluation are the PubTabNet, FinTabNet and TableBank which stem from the scientific, financial and general domains respectively."}, {"self_ref": "#/texts/270", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 111.6646728515625, "r": 545.115234375, "b": 78.84710693359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 155]}], "orig": "We also share our baseline results on the challenging SynthTabNet dataset. Throughout our experiments, the same parameters stated in Sec. 5.1 are utilized.", "text": "We also share our baseline results on the challenging SynthTabNet dataset. Throughout our experiments, the same parameters stated in Sec. 5.1 are utilized."}, {"self_ref": "#/texts/271", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 717.5986328125, "r": 167.89825439453125, "b": 707.74658203125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 25]}], "orig": "5.3. Datasets and Metrics", "text": "5.3. Datasets and Metrics", "level": 1}, {"self_ref": "#/texts/272", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 698.6495971679688, "r": 286.3651123046875, "b": 653.8770141601562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 192]}], "orig": "The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It represents the prediction, and ground-truth as a tree structure of HTML tags. This similarity is calculated as:", "text": "The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It represents the prediction, and ground-truth as a tree structure of HTML tags. This similarity is calculated as:"}, {"self_ref": "#/texts/273", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 7, "bbox": {"l": 86.218994140625, "t": 641.6820068359375, "r": 286.3623962402344, "b": 619.26123046875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 99]}], "orig": "TEDS ( T$_{a}$, T$_{b}$ ) = 1 - EditDist ( T$_{a}$, T$_{b}$ ) max ( | T$_{a}$ | , | T$_{b}$ | ) (3)", "text": "TEDS ( T$_{a}$, T$_{b}$ ) = 1 - EditDist ( T$_{a}$, T$_{b}$ ) max ( | T$_{a}$ | , | T$_{b}$ | ) (3)"}, {"self_ref": "#/texts/274", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11198425292969, "t": 610.9970092773438, "r": 286.36285400390625, "b": 578.02099609375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 162]}], "orig": "where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T .", "text": "where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T ."}, {"self_ref": "#/texts/275", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 567.1805419921875, "r": 170.45169067382812, "b": 557.3284912109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 26]}], "orig": "5.4. Quantitative Analysis", "text": "5.4. Quantitative Analysis", "level": 1}, {"self_ref": "#/texts/276", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 548.35009765625, "r": 286.3651428222656, "b": 395.862060546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 723]}], "orig": "Structure. As shown in Tab. 2, TableFormer outperforms all SOTA methods across different datasets by a large margin for predicting the table structure from an image. All the more, our model outperforms pre-trained methods. During the evaluation we do not apply any table filtering. We also provide our baseline results on the SynthTabNet dataset. It has been observed that large tables (e.g. tables that occupy half of the page or more) yield poor predictions. We attribute this issue to the image resizing during the preprocessing step, that produces downsampled images with indistinguishable features. This problem can be addressed by treating such big tables with a separate model which accepts a large input image size.", "text": "Structure. As shown in Tab. 2, TableFormer outperforms all SOTA methods across different datasets by a large margin for predicting the table structure from an image. All the more, our model outperforms pre-trained methods. During the evaluation we do not apply any table filtering. We also provide our baseline results on the SynthTabNet dataset. It has been observed that large tables (e.g. tables that occupy half of the page or more) yield poor predictions. We attribute this issue to the image resizing during the preprocessing step, that produces downsampled images with indistinguishable features. This problem can be addressed by treating such big tables with a separate model which accepts a large input image size."}, {"self_ref": "#/texts/277", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 199.56663513183594, "r": 286.3651123046875, "b": 178.705078125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 101]}], "orig": "Table 2: Structure results on PubTabNet (PTN), FinTabNet (FTN), TableBank (TB) and SynthTabNet (STN).", "text": "Table 2: Structure results on PubTabNet (PTN), FinTabNet (FTN), TableBank (TB) and SynthTabNet (STN)."}, {"self_ref": "#/texts/278", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 175.65663146972656, "r": 261.7873229980469, "b": 166.7500762939453, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 50]}], "orig": "FT: Model was trained on PubTabNet then finetuned.", "text": "FT: Model was trained on PubTabNet then finetuned."}, {"self_ref": "#/texts/279", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11201477050781, "t": 147.6501922607422, "r": 286.3659973144531, "b": 78.84806823730469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 346]}], "orig": "Cell Detection. Like any object detector, our Cell BBox Detector provides bounding boxes that can be improved with post-processing during inference. We make use of the grid-like structure of tables to refine the predictions. A detailed explanation on the post-processing is available in the supplementary material. As shown in Tab. 3, we evaluate", "text": "Cell Detection. Like any object detector, our Cell BBox Detector provides bounding boxes that can be improved with post-processing during inference. We make use of the grid-like structure of tables to refine the predictions. A detailed explanation on the post-processing is available in the supplementary material. As shown in Tab. 3, we evaluate"}, {"self_ref": "#/texts/280", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 7, "bbox": {"l": 295.1210021972656, "t": 57.866641998291016, "r": 300.102294921875, "b": 48.960079193115234, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/281", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 308.86199951171875, "t": 716.7916259765625, "r": 545.1151733398438, "b": 564.4229125976562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 737]}], "orig": "our Cell BBox Decoder accuracy for cells with a class label of 'content' only using the PASCAL VOC mAP metric for pre-processing and post-processing. Note that we do not have post-processing results for SynthTabNet as images are only provided. To compare the performance of our proposed approach, we've integrated TableFormer's Cell BBox Decoder into EDD architecture. As mentioned previously, the Structure Decoder provides the Cell BBox Decoder with the features needed to predict the bounding box predictions. Therefore, the accuracy of the Structure Decoder directly influences the accuracy of the Cell BBox Decoder . If the Structure Decoder predicts an extra column, this will result in an extra column of predicted bounding boxes.", "text": "our Cell BBox Decoder accuracy for cells with a class label of 'content' only using the PASCAL VOC mAP metric for pre-processing and post-processing. Note that we do not have post-processing results for SynthTabNet as images are only provided. To compare the performance of our proposed approach, we've integrated TableFormer's Cell BBox Decoder into EDD architecture. As mentioned previously, the Structure Decoder provides the Cell BBox Decoder with the features needed to predict the bounding box predictions. Therefore, the accuracy of the Structure Decoder directly influences the accuracy of the Cell BBox Decoder . If the Structure Decoder predicts an extra column, this will result in an extra column of predicted bounding boxes."}, {"self_ref": "#/texts/282", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 7, "bbox": {"l": 308.86199951171875, "t": 475.5506896972656, "r": 545.1151733398438, "b": 454.68914794921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 94]}], "orig": "Table 3: Cell Bounding Box detection results on PubTabNet, and FinTabNet. PP: Post-processing.", "text": "Table 3: Cell Bounding Box detection results on PubTabNet, and FinTabNet. PP: Post-processing."}, {"self_ref": "#/texts/283", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 308.8619689941406, "t": 424.3202819824219, "r": 545.1156616210938, "b": 271.8323059082031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 715]}], "orig": "Cell Content. In this section, we evaluate the entire pipeline of recovering a table with content. Here we put our approach to test by capitalizing on extracting content from the PDF cells rather than decoding from images. Tab. 4 shows the TEDs score of HTML code representing the structure of the table along with the content inserted in the data cell and compared with the ground-truth. Our method achieved a 5.3% increase over the state-of-the-art, and commercial solutions. We believe our scores would be higher if the HTML ground-truth matched the extracted PDF cell content. Unfortunately, there are small discrepancies such as spacings around words or special characters with various unicode representations.", "text": "Cell Content. In this section, we evaluate the entire pipeline of recovering a table with content. Here we put our approach to test by capitalizing on extracting content from the PDF cells rather than decoding from images. Tab. 4 shows the TEDs score of HTML code representing the structure of the table along with the content inserted in the data cell and compared with the ground-truth. Our method achieved a 5.3% increase over the state-of-the-art, and commercial solutions. We believe our scores would be higher if the HTML ground-truth matched the extracted PDF cell content. Unfortunately, there are small discrepancies such as spacings around words or special characters with various unicode representations."}, {"self_ref": "#/texts/284", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 7, "bbox": {"l": 308.86199951171875, "t": 135.13864135742188, "r": 545.1151733398438, "b": 102.32206726074219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 148]}], "orig": "Table 4: Results of structure with content retrieved using cell detection on PubTabNet. In all cases the input is PDF documents with cropped tables.", "text": "Table 4: Results of structure with content retrieved using cell detection on PubTabNet. In all cases the input is PDF documents with cropped tables."}, {"self_ref": "#/texts/285", "parent": {"cref": "#/groups/4"}, "children": [], "label": "list_item", "prov": [{"page_no": 8, "bbox": {"l": 53.28603744506836, "t": 713.3124389648438, "r": 61.550289154052734, "b": 705.4392700195312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "a.", "text": "a.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/286", "parent": {"cref": "#/groups/4"}, "children": [], "label": "list_item", "prov": [{"page_no": 8, "bbox": {"l": 65.68241882324219, "t": 713.3124389648438, "r": 499.5556335449219, "b": 705.4392700195312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 105]}], "orig": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells", "text": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/287", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 53.81178283691406, "t": 697.7188720703125, "r": 284.3459167480469, "b": 689.845703125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 53]}], "orig": "Japanese language (previously unseen by TableFormer):", "text": "Japanese language (previously unseen by TableFormer):", "level": 1}, {"self_ref": "#/texts/288", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 304.830810546875, "t": 697.7188720703125, "r": 431.0911865234375, "b": 689.845703125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 29]}], "orig": "Example table from FinTabNet:", "text": "Example table from FinTabNet:", "level": 1}, {"self_ref": "#/texts/289", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 8, "bbox": {"l": 53.81178283691406, "t": 583.7667236328125, "r": 385.93450927734375, "b": 575.8935546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 79]}], "orig": "b. Structure predicted by TableFormer, with superimposed matched PDF cell text:", "text": "b. Structure predicted by TableFormer, with superimposed matched PDF cell text:"}, {"self_ref": "#/texts/290", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 8, "bbox": {"l": 380.42730712890625, "t": 499.69573974609375, "r": 549.4217529296875, "b": 493.39715576171875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 53]}], "orig": "Text is aligned to match original for ease of viewing", "text": "Text is aligned to match original for ease of viewing"}, {"self_ref": "#/texts/291", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 8, "bbox": {"l": 50.11199951171875, "t": 471.1226501464844, "r": 545.11376953125, "b": 426.3501281738281, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 397]}], "orig": "Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset.", "text": "Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset."}, {"self_ref": "#/texts/292", "parent": {"cref": "#/pictures/8"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 53.715248, "t": 410.22278, "r": 85.657333, "b": 405.55719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "Ground Truth", "text": "Ground Truth"}, {"self_ref": "#/texts/293", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.37939, "t": 391.44705, "r": 443.69870000000003, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "16", "text": "16"}, {"self_ref": "#/texts/294", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33203, "t": 391.44705, "r": 456.6513100000001, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "17", "text": "17"}, {"self_ref": "#/texts/295", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.28464, "t": 391.44705, "r": 469.60394, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "18", "text": "18"}, {"self_ref": "#/texts/296", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 476.23724000000004, "t": 391.44705, "r": 482.5565500000001, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "19", "text": "19"}, {"self_ref": "#/texts/297", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 489.18988, "t": 391.44705, "r": 495.50916, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "20", "text": "20"}, {"self_ref": "#/texts/298", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.14251999999993, "t": 391.44705, "r": 508.46178999999995, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "21", "text": "21"}, {"self_ref": "#/texts/299", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 515.09509, "t": 391.44705, "r": 521.41443, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "22", "text": "22"}, {"self_ref": "#/texts/300", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 380.96163999999993, "r": 391.60071, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "23", "text": "23"}, {"self_ref": "#/texts/301", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 380.96163999999993, "r": 404.84271, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "24", "text": "24"}, {"self_ref": "#/texts/302", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.47604, "t": 380.96163999999993, "r": 417.79535, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "25", "text": "25"}, {"self_ref": "#/texts/303", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.37939, "t": 380.96163999999993, "r": 443.69870000000003, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "26", "text": "26"}, {"self_ref": "#/texts/304", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33203, "t": 380.96163999999993, "r": 456.6513100000001, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "27", "text": "27"}, {"self_ref": "#/texts/305", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.28464, "t": 380.96163999999993, "r": 469.60394, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "28", "text": "28"}, {"self_ref": "#/texts/306", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 370.9303, "r": 391.60071, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "30", "text": "30"}, {"self_ref": "#/texts/307", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 370.9303, "r": 404.84271, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "31", "text": "31"}, {"self_ref": "#/texts/308", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.47604, "t": 370.9303, "r": 417.79532, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "32", "text": "32"}, {"self_ref": "#/texts/309", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 424.42865, "t": 370.9303, "r": 430.74796, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "33", "text": "33"}, {"self_ref": "#/texts/310", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.38129, "t": 370.9303, "r": 443.70056, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "34", "text": "34"}, {"self_ref": "#/texts/311", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33389000000005, "t": 370.9303, "r": 456.65319999999997, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "35", "text": "35"}, {"self_ref": "#/texts/312", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.2865, "t": 370.9303, "r": 469.6058, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "36", "text": "36"}, {"self_ref": "#/texts/313", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 476.23914, "t": 370.9303, "r": 482.55841, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "37", "text": "37"}, {"self_ref": "#/texts/314", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 489.1917700000001, "t": 370.9303, "r": 495.51105, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "38", "text": "38"}, {"self_ref": "#/texts/315", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.14438, "t": 370.9303, "r": 508.46368, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "39", "text": "39"}, {"self_ref": "#/texts/316", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 515.09705, "t": 370.9303, "r": 521.41632, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "40", "text": "40"}, {"self_ref": "#/texts/317", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 528.04962, "t": 370.9303, "r": 534.3689, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "41", "text": "41"}, {"self_ref": "#/texts/318", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 359.95569, "r": 391.60071, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "42", "text": "42"}, {"self_ref": "#/texts/319", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 359.95569, "r": 404.84271, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "43", "text": "43"}, {"self_ref": "#/texts/320", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.47604, "t": 359.95569, "r": 417.79532, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "44", "text": "44"}, {"self_ref": "#/texts/321", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 424.42865, "t": 359.95569, "r": 430.74796, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "45", "text": "45"}, {"self_ref": "#/texts/322", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.38129, "t": 359.95569, "r": 443.70056, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "46", "text": "46"}, {"self_ref": "#/texts/323", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33389000000005, "t": 359.95569, "r": 456.65319999999997, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "47", "text": "47"}, {"self_ref": "#/texts/324", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.2865, "t": 359.95569, "r": 469.6058, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "48", "text": "48"}, {"self_ref": "#/texts/325", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 476.23914, "t": 359.95569, "r": 482.55841, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "49", "text": "49"}, {"self_ref": "#/texts/326", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 489.1917700000001, "t": 359.95569, "r": 495.51105, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "50", "text": "50"}, {"self_ref": "#/texts/327", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.14438, "t": 359.95569, "r": 508.46368, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "51", "text": "51"}, {"self_ref": "#/texts/328", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 515.09705, "t": 359.95569, "r": 521.41632, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "52", "text": "52"}, {"self_ref": "#/texts/329", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 528.04962, "t": 359.95569, "r": 534.3689, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "53", "text": "53"}, {"self_ref": "#/texts/330", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 402.79996, "r": 388.44073, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/331", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 402.79996, "r": 401.68274, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/332", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.4754, "t": 402.79996, "r": 414.63474, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/333", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 424.4274, "t": 402.79996, "r": 427.58673, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/334", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.37939, "t": 402.79996, "r": 440.53870000000006, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/335", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33136, "t": 402.79996, "r": 453.49069000000003, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/336", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.28336, "t": 402.79996, "r": 466.44269, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/337", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 476.23535, "t": 402.79996, "r": 479.39468, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/338", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 489.18735, "t": 402.79996, "r": 492.34668, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/339", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.13933999999995, "t": 402.79996, "r": 505.29868000000005, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/340", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 515.09131, "t": 402.79996, "r": 521.41064, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/341", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 528.04364, "t": 402.79996, "r": 534.13104, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "11", "text": "11"}, {"self_ref": "#/texts/342", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 393.02536, "r": 391.60071, "b": 386.70673, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "12", "text": "12"}, {"self_ref": "#/texts/343", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 393.02536, "r": 404.84271, "b": 386.70673, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "13", "text": "13"}, {"self_ref": "#/texts/344", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.47604, "t": 393.02536, "r": 417.79535, "b": 386.70673, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "14", "text": "14"}, {"self_ref": "#/texts/345", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 424.42719, "t": 385.22536999999994, "r": 430.74648999999994, "b": 378.90674, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "15", "text": "15"}, {"self_ref": "#/texts/346", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.86941999999993, "t": 381.00562, "r": 509.18871999999993, "b": 374.68698, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "29", "text": "29"}, {"self_ref": "#/texts/347", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 384.35437, "t": 410.22278, "r": 430.99261, "b": 405.55719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 19]}], "orig": "Predicted Structure", "text": "Predicted Structure"}, {"self_ref": "#/texts/348", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 8, "bbox": {"l": 62.595001220703125, "t": 333.2716369628906, "r": 532.6304931640625, "b": 324.3650817871094, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 112]}], "orig": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table.", "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table."}, {"self_ref": "#/texts/349", "parent": {"cref": "#/pictures/10"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 220.26282, "t": 410.22278, "r": 342.07819, "b": 405.55719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 49]}], "orig": "Red - PDF cells, Green - predicted bounding boxes", "text": "Red - PDF cells, Green - predicted bounding boxes"}, {"self_ref": "#/texts/350", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 50.11199951171875, "t": 300.6046447753906, "r": 163.75579833984375, "b": 290.7525939941406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 25]}], "orig": "5.5. Qualitative Analysis", "text": "5.5. Qualitative Analysis", "level": 1}, {"self_ref": "#/texts/351", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 50.11199951171875, "t": 255.1266326904297, "r": 286.3651123046875, "b": 78.84805297851562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 866]}], "orig": "We showcase several visualizations for the different components of our network on various \"complex\" tables within datasets presented in this work in Fig. 5 and Fig. 6 As it is shown, our model is able to predict bounding boxes for all table cells, even for the empty ones. Additionally, our post-processing techniques can extract the cell content by matching the predicted bounding boxes to the PDF cells based on their overlap and spatial proximity. The left part of Fig. 5 demonstrates also the adaptability of our method to any language, as it can successfully extract Japanese text, although the training set contains only English content. We provide more visualizations including the intermediate steps in the supplementary material. Overall these illustrations justify the versatility of our method across a diverse range of table appearances and content type.", "text": "We showcase several visualizations for the different components of our network on various \"complex\" tables within datasets presented in this work in Fig. 5 and Fig. 6 As it is shown, our model is able to predict bounding boxes for all table cells, even for the empty ones. Additionally, our post-processing techniques can extract the cell content by matching the predicted bounding boxes to the PDF cells based on their overlap and spatial proximity. The left part of Fig. 5 demonstrates also the adaptability of our method to any language, as it can successfully extract Japanese text, although the training set contains only English content. We provide more visualizations including the intermediate steps in the supplementary material. Overall these illustrations justify the versatility of our method across a diverse range of table appearances and content type."}, {"self_ref": "#/texts/352", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 308.86199951171875, "t": 301.29107666015625, "r": 460.8484802246094, "b": 290.5433654785156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 27]}], "orig": "6. Future Work & Conclusion", "text": "6. Future Work & Conclusion", "level": 1}, {"self_ref": "#/texts/353", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 308.86199951171875, "t": 279.10662841796875, "r": 545.1151733398438, "b": 138.69407653808594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 640]}], "orig": "In this paper, we presented TableFormer an end-to-end transformer based approach to predict table structures and bounding boxes of cells from an image. This approach enables us to recreate the table structure, and extract the cell content from PDF or OCR by using bounding boxes. Additionally, it provides the versatility required in real-world scenarios when dealing with various types of PDF documents, and languages. Furthermore, our method outperforms all state-of-the-arts with a wide margin. Finally, we introduce \"SynthTabNet\" a challenging synthetically generated dataset that reinforces missing characteristics from other datasets.", "text": "In this paper, we presented TableFormer an end-to-end transformer based approach to predict table structures and bounding boxes of cells from an image. This approach enables us to recreate the table structure, and extract the cell content from PDF or OCR by using bounding boxes. Additionally, it provides the versatility required in real-world scenarios when dealing with various types of PDF documents, and languages. Furthermore, our method outperforms all state-of-the-arts with a wide margin. Finally, we introduce \"SynthTabNet\" a challenging synthetically generated dataset that reinforces missing characteristics from other datasets."}, {"self_ref": "#/texts/354", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 308.86199951171875, "t": 119.90107727050781, "r": 364.4058532714844, "b": 109.15335845947266, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "References", "text": "References", "level": 1}, {"self_ref": "#/texts/355", "parent": {"cref": "#/groups/5"}, "children": [], "label": "list_item", "prov": [{"page_no": 8, "bbox": {"l": 313.3450012207031, "t": 98.0382080078125, "r": 545.1134033203125, "b": 79.06324768066406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 121]}], "orig": "[1] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-", "text": "[1] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/356", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 8, "bbox": {"l": 295.1210021972656, "t": 57.866634368896484, "r": 300.102294921875, "b": 48.9600715637207, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/357", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 70.03099822998047, "t": 716.1162109375, "r": 286.36334228515625, "b": 675.2242431640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 212]}], "orig": "end object detection with transformers. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision - ECCV 2020 , pages 213-229, Cham, 2020. Springer International Publishing. 5", "text": "end object detection with transformers. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision - ECCV 2020 , pages 213-229, Cham, 2020. Springer International Publishing. 5", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/358", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.59500503540039, "t": 671.96826171875, "r": 286.36334228515625, "b": 642.0343017578125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 165]}], "orig": "[2] Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian-Ling Mao. Complicated table structure recognition. arXiv preprint arXiv:1908.04729 , 2019. 3", "text": "[2] Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian-Ling Mao. Complicated table structure recognition. arXiv preprint arXiv:1908.04729 , 2019. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/359", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.595001220703125, "t": 638.7783203125, "r": 286.3630065917969, "b": 608.8453369140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 125]}], "orig": "[3] Bertrand Couasnon and Aurelie Lemaitre. Recognition of Tables and Forms , pages 647-677. Springer London, London, 2014. 2", "text": "[3] Bertrand Couasnon and Aurelie Lemaitre. Recognition of Tables and Forms , pages 647-677. Springer London, London, 2014. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/360", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.59498977661133, "t": 605.58935546875, "r": 286.364013671875, "b": 564.6964111328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 216]}], "orig": "[4] Herv'e D'ejean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), Apr. 2019. http://sac.founderit.com/. 2", "text": "[4] Herv'e D'ejean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), Apr. 2019. http://sac.founderit.com/. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/361", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.5949821472168, "t": 561.4404296875, "r": 286.36334228515625, "b": 520.5484619140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 236]}], "orig": "[5] Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and Stavros J Perantonis. Automatic table detection in document images. In International Conference on Pattern Recognition and Image Analysis , pages 609-618. Springer, 2005. 2", "text": "[5] Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and Stavros J Perantonis. Automatic table detection in document images. In International Conference on Pattern Recognition and Image Analysis , pages 609-618. Springer, 2005. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/362", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.594970703125, "t": 517.2924194335938, "r": 286.36676025390625, "b": 476.3995056152344, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 193]}], "orig": "[6] Max Gobel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013. 2", "text": "[6] Max Gobel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/363", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.59498977661133, "t": 473.1434631347656, "r": 286.3631896972656, "b": 443.2104797363281, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 165]}], "orig": "[7] EA Green and M Krishnamoorthy. Recognition of tables using table grammars. procs. In Symposium on Document Analysis and Recognition (SDAIR'95) , pages 261-277. 2", "text": "[7] EA Green and M Krishnamoorthy. Recognition of tables using table grammars. procs. In Symposium on Document Analysis and Recognition (SDAIR'95) , pages 261-277. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/364", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.59498596191406, "t": 439.9544372558594, "r": 286.3633117675781, "b": 388.1025085449219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 273]}], "orig": "[8] Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. Castabdetectors: Cascade network for table detection in document images with recursive feature pyramid and switchable atrous convolution. Journal of Imaging , 7(10), 2021. 1", "text": "[8] Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. Castabdetectors: Cascade network for table detection in document images with recursive feature pyramid and switchable atrous convolution. Journal of Imaging , 7(10), 2021. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/365", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.595001220703125, "t": 384.84747314453125, "r": 286.3598937988281, "b": 354.9135437011719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 170]}], "orig": "[9] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) , Oct 2017. 1", "text": "[9] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) , Oct 2017. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/366", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11199951171875, "t": 351.6575012207031, "r": 286.36334228515625, "b": 310.7645568847656, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 226]}], "orig": "[10] Yelin He, X. Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, and Rong Xiao. Pingan-vcgroup's solution for icdar 2021 competition on scientific table image recognition to latex. ArXiv , abs/2105.01846, 2021. 2", "text": "[10] Yelin He, X. Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, and Rong Xiao. Pingan-vcgroup's solution for icdar 2021 competition on scientific table image recognition to latex. ArXiv , abs/2105.01846, 2021. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/367", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11199951171875, "t": 307.509521484375, "r": 286.3633117675781, "b": 255.65762329101562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 239]}], "orig": "[11] Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and Gordon Wilfong. Medium-independent table detection. In Document Recognition and Retrieval VII , volume 3967, pages 291-302. International Society for Optics and Photonics, 1999. 2", "text": "[11] Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and Gordon Wilfong. Medium-independent table detection. In Document Recognition and Retrieval VII , volume 3967, pages 291-302. International Society for Optics and Photonics, 1999. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/368", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11200714111328, "t": 252.40158081054688, "r": 286.36334228515625, "b": 200.55062866210938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 240]}], "orig": "[12] Matthew Hurst. A constraint-based approach to table structure derivation. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2 , ICDAR '03, page 911, USA, 2003. IEEE Computer Society. 2", "text": "[12] Matthew Hurst. A constraint-based approach to table structure derivation. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2 , ICDAR '03, page 911, USA, 2003. IEEE Computer Society. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/369", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11200714111328, "t": 197.29458618164062, "r": 286.3633117675781, "b": 145.442626953125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 283]}], "orig": "[13] Thotreingam Kasar, Philippine Barlas, Sebastien Adam, Cl'ement Chatelain, and Thierry Paquet. Learning to detect tables in scanned document images using line information. In 2013 12th International Conference on Document Analysis and Recognition , pages 1185-1189. IEEE, 2013. 2", "text": "[13] Thotreingam Kasar, Philippine Barlas, Sebastien Adam, Cl'ement Chatelain, and Thierry Paquet. Learning to detect tables in scanned document images using line information. In 2013 12th International Conference on Document Analysis and Recognition , pages 1185-1189. IEEE, 2013. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/370", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11199188232422, "t": 142.18658447265625, "r": 286.36334228515625, "b": 112.25361633300781, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 142]}], "orig": "[14] Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank Singh. Icdar 2021 competition on scientific table image recognition to latex, 2021. 2", "text": "[14] Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank Singh. Icdar 2021 competition on scientific table image recognition to latex, 2021. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/371", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11199188232422, "t": 108.99756622314453, "r": 286.35931396484375, "b": 79.06361389160156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 127]}], "orig": "[15] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly , 2(1-2):83-97, 1955. 6", "text": "[15] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly , 2(1-2):83-97, 1955. 6", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/372", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 9, "bbox": {"l": 295.12103271484375, "t": 57.86741256713867, "r": 300.1023254394531, "b": 48.96084976196289, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/373", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.8619689941406, "t": 716.1165771484375, "r": 545.11474609375, "b": 653.306640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 287]}], "orig": "[16] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence , 35(12):2891-2903, 2013. 4", "text": "[16] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence , 35(12):2891-2903, 2013. 4", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/374", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 649.8766479492188, "r": 545.1134033203125, "b": 619.9436645507812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 156]}], "orig": "[17] Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, and Zhoujun Li. Tablebank: A benchmark dataset for table detection and recognition, 2019. 2, 3", "text": "[17] Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, and Zhoujun Li. Tablebank: A benchmark dataset for table detection and recognition, 2019. 2, 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/375", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 616.513671875, "r": 545.113525390625, "b": 531.7857666015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 407]}], "orig": "[18] Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and Xianhui Liu. Gfte: Graph-based financial table extraction. In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, and Roberto Vezzani, editors, Pattern Recognition. ICPR International Workshops and Challenges , pages 644-658, Cham, 2021. Springer International Publishing. 2, 3", "text": "[18] Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and Xianhui Liu. Gfte: Graph-based financial table extraction. In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, and Roberto Vezzani, editors, Pattern Recognition. ICPR International Workshops and Challenges , pages 644-658, Cham, 2021. Springer International Publishing. 2, 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/376", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 528.3557739257812, "r": 545.1141967773438, "b": 465.5458679199219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 328]}], "orig": "[19] Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Robust pdf document conversion using recurrent neural networks. Proceedings of the AAAI Conference on Artificial Intelligence , 35(17):15137-15145, May 2021. 1", "text": "[19] Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Robust pdf document conversion using recurrent neural networks. Proceedings of the AAAI Conference on Artificial Intelligence , 35(17):15137-15145, May 2021. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/377", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 462.1158142089844, "r": 545.1160888671875, "b": 421.2228698730469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 229]}], "orig": "[20] Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, and Gui-Song Xia. Parsing table structures in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 944-952, 2021. 2", "text": "[20] Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, and Gui-Song Xia. Parsing table structures in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 944-952, 2021. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/378", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 417.7938232421875, "r": 545.1134643554688, "b": 354.9829406738281, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 315]}], "orig": "[21] Shubham Singh Paliwal, D Vishwanath, Rohit Rahul, Monika Sharma, and Lovekesh Vig. Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 128-133. IEEE, 2019. 1", "text": "[21] Shubham Singh Paliwal, D Vishwanath, Rohit Rahul, Monika Sharma, and Lovekesh Vig. Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 128-133. IEEE, 2019. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/379", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 351.55389404296875, "r": 545.11474609375, "b": 233.94903564453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 592]}], "orig": "[22] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch'e-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32 , pages 8024-8035. Curran Associates, Inc., 2019. 6", "text": "[22] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch'e-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32 , pages 8024-8035. Curran Associates, Inc., 2019. 6", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/380", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 230.5189971923828, "r": 545.1134033203125, "b": 167.7090301513672, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 322]}], "orig": "[23] Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. Cascadetabnet: An approach for end to end table detection and structure recognition from image-based documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 572-573, 2020. 1", "text": "[23] Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. Cascadetabnet: An approach for end to end table detection and structure recognition from image-based documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 572-573, 2020. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/381", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 164.27899169921875, "r": 545.1162109375, "b": 123.38601684570312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 224]}], "orig": "[24] Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. Rethinking table recognition using graph neural networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 142-147. IEEE, 2019. 3", "text": "[24] Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. Rethinking table recognition using graph neural networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 142-147. IEEE, 2019. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/382", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.8620300292969, "t": 119.95699310302734, "r": 545.1134033203125, "b": 79.06402587890625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 229]}], "orig": "[25] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on", "text": "[25] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/383", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 10, "bbox": {"l": 70.03099822998047, "t": 716.1162109375, "r": 286.36175537109375, "b": 697.1412353515625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 64]}], "orig": "Computer Vision and Pattern Recognition , pages 658-666, 2019. 6", "text": "Computer Vision and Pattern Recognition , pages 658-666, 2019. 6"}, {"self_ref": "#/texts/384", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 693.834228515625, "r": 286.36578369140625, "b": 631.0233154296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 302]}], "orig": "[26] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 11621167, 2017. 1", "text": "[26] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 11621167, 2017. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/385", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 627.71533203125, "r": 286.3633728027344, "b": 564.9053955078125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 308]}], "orig": "[27] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) , volume 1, pages 1162-1167. IEEE, 2017. 3", "text": "[27] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) , volume 1, pages 1162-1167. IEEE, 2017. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/386", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 561.597412109375, "r": 286.36578369140625, "b": 520.7044677734375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 183]}], "orig": "[28] Faisal Shafait and Ray Smith. Table detection in heterogeneous documents. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems , pages 6572, 2010. 2", "text": "[28] Faisal Shafait and Ray Smith. Table detection in heterogeneous documents. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems , pages 6572, 2010. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/387", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 517.3964233398438, "r": 286.36627197265625, "b": 465.5455017089844, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 275]}], "orig": "[29] Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. Deeptabstr: Deep learning based table structure recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1403-1409. IEEE, 2019. 3", "text": "[29] Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. Deeptabstr: Deep learning based table structure recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1403-1409. IEEE, 2019. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/388", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 462.2374572753906, "r": 286.36334228515625, "b": 410.3855285644531, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 251]}], "orig": "[30] Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD , KDD '18, pages 774-782, New York, NY, USA, 2018. ACM. 1", "text": "[30] Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD , KDD '18, pages 774-782, New York, NY, USA, 2018. ACM. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/389", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 407.0774841308594, "r": 286.3638916015625, "b": 333.3085632324219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 366]}], "orig": "[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30 , pages 5998-6008. Curran Associates, Inc., 2017. 5", "text": "[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30 , pages 5998-6008. Curran Associates, Inc., 2017. 5", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/390", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 330.0005187988281, "r": 286.36334228515625, "b": 289.1075744628906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 221]}], "orig": "[32] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2015. 2", "text": "[32] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2015. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/391", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11201477050781, "t": 285.7995300292969, "r": 286.3633728027344, "b": 244.90756225585938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 217]}], "orig": "[33] Wenyuan Xue, Qingyong Li, and Dacheng Tao. Res2tim: reconstruct syntactic structures from table images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 749-755. IEEE, 2019. 3", "text": "[33] Wenyuan Xue, Qingyong Li, and Dacheng Tao. Res2tim: reconstruct syntactic structures from table images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 749-755. IEEE, 2019. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/392", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.112022399902344, "t": 241.59951782226562, "r": 286.3633728027344, "b": 200.70655822753906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 190]}], "orig": "[34] Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. Tgrnet: A table graph reconstruction network for table structure recognition. arXiv preprint arXiv:2106.10598 , 2021. 3", "text": "[34] Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. Tgrnet: A table graph reconstruction network for table structure recognition. arXiv preprint arXiv:2106.10598 , 2021. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/393", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.112030029296875, "t": 197.3985137939453, "r": 286.3634033203125, "b": 156.50555419921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 220]}], "orig": "[35] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4651-4659, 2016. 4", "text": "[35] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4651-4659, 2016. 4", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/394", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.112022399902344, "t": 153.197509765625, "r": 286.3633728027344, "b": 101.34652709960938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 280]}], "orig": "[36] Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and Nancy Xin Ru Wang. Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. Winter Conference for Applications in Computer Vision (WACV) , 2021. 2, 3", "text": "[36] Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and Nancy Xin Ru Wang. Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. Winter Conference for Applications in Computer Vision (WACV) , 2021. 2, 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/395", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11201477050781, "t": 98.03849792480469, "r": 286.36334228515625, "b": 79.06353759765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 106]}], "orig": "[37] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. Image-based table recognition: Data, model,", "text": "[37] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. Image-based table recognition: Data, model,", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/396", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 10, "bbox": {"l": 292.6300048828125, "t": 57.867008209228516, "r": 302.59259033203125, "b": 48.960445404052734, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/397", "parent": {"cref": "#/groups/9"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 328.781005859375, "t": 716.1165161132812, "r": 545.1145629882812, "b": 675.2245483398438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 192]}], "orig": "and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision ECCV 2020 , pages 564-580, Cham, 2020. Springer International Publishing. 2, 3, 7", "text": "and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision ECCV 2020 , pages 564-580, Cham, 2020. Springer International Publishing. 2, 3, 7", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/398", "parent": {"cref": "#/groups/9"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 308.86199951171875, "t": 671.2855224609375, "r": 545.1133422851562, "b": 630.392578125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 221]}], "orig": "[38] Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Publaynet: Largest dataset ever for document layout analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1015-1022, 2019. 1", "text": "[38] Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Publaynet: Largest dataset ever for document layout analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1015-1022, 2019. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/399", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 132.8419952392578, "t": 681.4251098632812, "r": 465.37591552734375, "b": 656.4699096679688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 83]}], "orig": "TableFormer: Table Structure Understanding with Transformers Supplementary Material", "text": "TableFormer: Table Structure Understanding with Transformers Supplementary Material", "level": 1}, {"self_ref": "#/texts/400", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 630.839111328125, "r": 175.96437072753906, "b": 620.0913696289062, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 26]}], "orig": "1. Details on the datasets", "text": "1. Details on the datasets", "level": 1}, {"self_ref": "#/texts/401", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 611.0206909179688, "r": 150.364013671875, "b": 601.1686401367188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 21]}], "orig": "1.1. Data preparation", "text": "1.1. Data preparation", "level": 1}, {"self_ref": "#/texts/402", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 592.0797119140625, "r": 286.3651428222656, "b": 403.8451843261719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 931]}], "orig": "As a first step of our data preparation process, we have calculated statistics over the datasets across the following dimensions: (1) table size measured in the number of rows and columns, (2) complexity of the table, (3) strictness of the provided HTML structure and (4) completeness (i.e. no omitted bounding boxes). A table is considered to be simple if it does not contain row spans or column spans. Additionally, a table has a strict HTML structure if every row has the same number of columns after taking into account any row or column spans. Therefore a strict HTML structure looks always rectangular. However, HTML is a lenient encoding format, i.e. tables with rows of different sizes might still be regarded as correct due to implicit display rules. These implicit rules leave room for ambiguity, which we want to avoid. As such, we prefer to have \"strict\" tables, i.e. tables where every row has exactly the same length.", "text": "As a first step of our data preparation process, we have calculated statistics over the datasets across the following dimensions: (1) table size measured in the number of rows and columns, (2) complexity of the table, (3) strictness of the provided HTML structure and (4) completeness (i.e. no omitted bounding boxes). A table is considered to be simple if it does not contain row spans or column spans. Additionally, a table has a strict HTML structure if every row has the same number of columns after taking into account any row or column spans. Therefore a strict HTML structure looks always rectangular. However, HTML is a lenient encoding format, i.e. tables with rows of different sizes might still be regarded as correct due to implicit display rules. These implicit rules leave room for ambiguity, which we want to avoid. As such, we prefer to have \"strict\" tables, i.e. tables where every row has exactly the same length."}, {"self_ref": "#/texts/403", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 400.5947265625, "r": 286.3651123046875, "b": 164.54029846191406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1149]}], "orig": "We have developed a technique that tries to derive a missing bounding box out of its neighbors. As a first step, we use the annotation data to generate the most fine-grained grid that covers the table structure. In case of strict HTML tables, all grid squares are associated with some table cell and in the presence of table spans a cell extends across multiple grid squares. When enough bounding boxes are known for a rectangular table, it is possible to compute the geometrical border lines between the grid rows and columns. Eventually this information is used to generate the missing bounding boxes. Additionally, the existence of unused grid squares indicates that the table rows have unequal number of columns and the overall structure is non-strict. The generation of missing bounding boxes for non-strict HTML tables is ambiguous and therefore quite challenging. Thus, we have decided to simply discard those tables. In case of PubTabNet we have computed missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes.", "text": "We have developed a technique that tries to derive a missing bounding box out of its neighbors. As a first step, we use the annotation data to generate the most fine-grained grid that covers the table structure. In case of strict HTML tables, all grid squares are associated with some table cell and in the presence of table spans a cell extends across multiple grid squares. When enough bounding boxes are known for a rectangular table, it is possible to compute the geometrical border lines between the grid rows and columns. Eventually this information is used to generate the missing bounding boxes. Additionally, the existence of unused grid squares indicates that the table rows have unequal number of columns and the overall structure is non-strict. The generation of missing bounding boxes for non-strict HTML tables is ambiguous and therefore quite challenging. Thus, we have decided to simply discard those tables. In case of PubTabNet we have computed missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes."}, {"self_ref": "#/texts/404", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 161.28985595703125, "r": 286.3649597167969, "b": 140.42730712890625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 92]}], "orig": "Figure 7 illustrates the distribution of the tables across different dimensions per dataset.", "text": "Figure 7 illustrates the distribution of the tables across different dimensions per dataset."}, {"self_ref": "#/texts/405", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 129.60986328125, "r": 153.60784912109375, "b": 119.7578125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 23]}], "orig": "1.2. Synthetic datasets", "text": "1.2. Synthetic datasets", "level": 1}, {"self_ref": "#/texts/406", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 110.66886901855469, "r": 286.36505126953125, "b": 77.852294921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 167]}], "orig": "Aiming to train and evaluate our models in a broader spectrum of table data we have synthesized four types of datasets. Each one contains tables with different appear-", "text": "Aiming to train and evaluate our models in a broader spectrum of table data we have synthesized four types of datasets. Each one contains tables with different appear-"}, {"self_ref": "#/texts/407", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 629.3448486328125, "r": 545.1151123046875, "b": 584.572265625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 221]}], "orig": "ances in regard to their size, structure, style and content. Every synthetic dataset contains 150k examples, summing up to 600k synthetic examples. All datasets are divided into Train, Test and Val splits (80%, 10%, 10%).", "text": "ances in regard to their size, structure, style and content. Every synthetic dataset contains 150k examples, summing up to 600k synthetic examples. All datasets are divided into Train, Test and Val splits (80%, 10%, 10%)."}, {"self_ref": "#/texts/408", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 580.7648315429688, "r": 545.1150512695312, "b": 559.9032592773438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 89]}], "orig": "The process of generating a synthetic dataset can be decomposed into the following steps:", "text": "The process of generating a synthetic dataset can be decomposed into the following steps:"}, {"self_ref": "#/texts/409", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 556.0947875976562, "r": 545.1151123046875, "b": 475.45721435546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 373]}], "orig": "1. Prepare styling and content templates: The styling templates have been manually designed and organized into groups of scope specific appearances (e.g. financial data, marketing data, etc.) Additionally, we have prepared curated collections of content templates by extracting the most frequently used terms out of non-synthetic datasets (e.g. PubTabNet, FinTabNet, etc.).", "text": "1. Prepare styling and content templates: The styling templates have been manually designed and organized into groups of scope specific appearances (e.g. financial data, marketing data, etc.) Additionally, we have prepared curated collections of content templates by extracting the most frequently used terms out of non-synthetic datasets (e.g. PubTabNet, FinTabNet, etc.).", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/410", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 471.6497802734375, "r": 545.1151733398438, "b": 343.19134521484375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 573]}], "orig": "2. Generate table structures: The structure of each synthetic dataset assumes a horizontal table header which potentially spans over multiple rows and a table body that may contain a combination of row spans and column spans. However, spans are not allowed to cross the header - body boundary. The table structure is described by the parameters: Total number of table rows and columns, number of header rows, type of spans (header only spans, row only spans, column only spans, both row and column spans), maximum span size and the ratio of the table area covered by spans.", "text": "2. Generate table structures: The structure of each synthetic dataset assumes a horizontal table header which potentially spans over multiple rows and a table body that may contain a combination of row spans and column spans. However, spans are not allowed to cross the header - body boundary. The table structure is described by the parameters: Total number of table rows and columns, number of header rows, type of spans (header only spans, row only spans, column only spans, both row and column spans), maximum span size and the ratio of the table area covered by spans.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/411", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 339.3839111328125, "r": 545.1151733398438, "b": 294.61138916015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 195]}], "orig": "3. Generate content: Based on the dataset theme , a set of suitable content templates is chosen first. Then, this content can be combined with purely random text to produce the synthetic content.", "text": "3. Generate content: Based on the dataset theme , a set of suitable content templates is chosen first. Then, this content can be combined with purely random text to produce the synthetic content.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/412", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 290.803955078125, "r": 545.1152954101562, "b": 246.0314178466797, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 218]}], "orig": "4. Apply styling templates: Depending on the domain of the synthetic dataset, a set of styling templates is first manually selected. Then, a style is randomly selected to format the appearance of the synthesized table.", "text": "4. Apply styling templates: Depending on the domain of the synthetic dataset, a set of styling templates is first manually selected. Then, a style is randomly selected to format the appearance of the synthesized table.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/413", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 242.22396850585938, "r": 545.1151733398438, "b": 185.4964141845703, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 238]}], "orig": "5. Render the complete tables: The synthetic table is finally rendered by a web browser engine to generate the bounding boxes for each table cell. A batching technique is utilized to optimize the runtime overhead of the rendering process.", "text": "5. Render the complete tables: The synthetic table is finally rendered by a web browser engine to generate the bounding boxes for each table cell. A batching technique is utilized to optimize the runtime overhead of the rendering process.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/414", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 169.70941162109375, "r": 545.1087646484375, "b": 145.01368713378906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 47]}], "orig": "2. Prediction post-processing for PDF documents", "text": "2. Prediction post-processing for PDF documents", "level": 1}, {"self_ref": "#/texts/415", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 308.8620300292969, "t": 134.57896423339844, "r": 545.1151733398438, "b": 77.85139465332031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 247]}], "orig": "Although TableFormer can predict the table structure and the bounding boxes for tables recognized inside PDF documents, this is not enough when a full reconstruction of the original table is required. This happens mainly due the following reasons:", "text": "Although TableFormer can predict the table structure and the bounding boxes for tables recognized inside PDF documents, this is not enough when a full reconstruction of the original table is required. This happens mainly due the following reasons:"}, {"self_ref": "#/texts/416", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 11, "bbox": {"l": 292.63104248046875, "t": 57.86696243286133, "r": 302.5936279296875, "b": 48.96039962768555, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "11", "text": "11"}, {"self_ref": "#/texts/417", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 626.4976196289062, "r": 545.1137084960938, "b": 605.6360473632812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 245]}], "orig": "Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.", "text": "Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity."}, {"self_ref": "#/texts/418", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 119.39108, "t": 714.68945, "r": 151.94641, "b": 708.74078, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "PubTabNet", "text": "PubTabNet"}, {"self_ref": "#/texts/419", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 53.345978, "t": 716.80847, "r": 59.327053, "b": 710.8598, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "b.", "text": "b."}, {"self_ref": "#/texts/420", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 289.5791, "t": 714.54169, "r": 319.8266, "b": 708.5930199999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "FinTabNet", "text": "FinTabNet"}, {"self_ref": "#/texts/421", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 448.37271, "t": 714.7460300000001, "r": 481.75916, "b": 708.79736, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "Table Bank", "text": "Table Bank"}, {"self_ref": "#/texts/422", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 82.553436, "t": 650.72382, "r": 94.976013, "b": 645.7666, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 5]}], "orig": "Train", "text": "Train"}, {"self_ref": "#/texts/423", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 63.03878399999999, "t": 690.89587, "r": 85.290085, "b": 685.9386600000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/424", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 67.76786, "t": 667.60468, "r": 85.231277, "b": 662.64746, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/425", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 227.55121, "t": 689.46008, "r": 249.80251, "b": 684.50287, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/426", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 232.19898999999998, "t": 665.0142200000001, "r": 249.66241, "b": 660.05701, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/427", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 396.2337, "t": 677.95477, "r": 413.69711, "b": 672.99756, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/428", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 97.382202, "t": 650.72382, "r": 105.08014, "b": 645.7666, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Val", "text": "Val"}, {"self_ref": "#/texts/429", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 60.93763400000001, "t": 706.26678, "r": 76.151443, "b": 701.30957, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "100%", "text": "100%"}, {"self_ref": "#/texts/430", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 82.304901, "t": 705.77649, "r": 106.99162, "b": 700.8192699999998, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 8]}], "orig": "500K 10K", "text": "500K 10K"}, {"self_ref": "#/texts/431", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 246.20530999999997, "t": 650.39392, "r": 281.88013, "b": 645.43671, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Train Test Val", "text": "Train Test Val"}, {"self_ref": "#/texts/432", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 226.69780000000003, "t": 706.26678, "r": 241.91161, "b": 701.30957, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "100%", "text": "100%"}, {"self_ref": "#/texts/433", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 249.93848999999997, "t": 705.91199, "r": 282.49384, "b": 700.95477, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 11]}], "orig": "91K 10K 10K", "text": "91K 10K 10K"}, {"self_ref": "#/texts/434", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 410.19409, "t": 650.72382, "r": 444.68915, "b": 645.7666, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Train Test Val", "text": "Train Test Val"}, {"self_ref": "#/texts/435", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 391.37341, "t": 706.26678, "r": 432.6716599999999, "b": 701.30957, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "100% 130K 5K", "text": "100% 130K 5K"}, {"self_ref": "#/texts/436", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 435.60571000000004, "t": 705.73859, "r": 445.62414999999993, "b": 700.78137, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "10K", "text": "10K"}, {"self_ref": "#/texts/437", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 113.94921, "t": 650.71155, "r": 136.20052, "b": 645.75433, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/438", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 116.91554000000001, "t": 697.18146, "r": 127.05433999999998, "b": 692.22424, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Non", "text": "Non"}, {"self_ref": "#/texts/439", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 113.3146, "t": 691.06146, "r": 127.05298, "b": 686.10425, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/440", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 112.94112, "t": 684.9414699999999, "r": 127.05537, "b": 679.98425, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/441", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 113.22738999999999, "t": 669.38477, "r": 126.96577, "b": 664.42755, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/442", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 112.85390000000001, "t": 663.26477, "r": 126.96814999999998, "b": 658.30756, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/443", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 138.57864, "t": 650.5636, "r": 156.04207, "b": 645.60638, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/444", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 122.03101, "t": 705.7287, "r": 151.04185, "b": 700.77148, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "230K 280K", "text": "230K 280K"}, {"self_ref": "#/texts/445", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 311.65359, "t": 705.44501, "r": 321.67203, "b": 700.4877899999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "65K", "text": "65K"}, {"self_ref": "#/texts/446", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 287.89441, "t": 650.28937, "r": 310.14572, "b": 645.33215, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/447", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 289.23572, "t": 698.92023, "r": 299.37451, "b": 693.96301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Non", "text": "Non"}, {"self_ref": "#/texts/448", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 285.63513, "t": 692.80023, "r": 299.3735, "b": 687.8430199999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/449", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 285.26111, "t": 686.68024, "r": 299.37537, "b": 681.72302, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/450", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 285.43109, "t": 671.61005, "r": 299.16946, "b": 666.65283, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/451", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 285.05713, "t": 665.49005, "r": 299.17139, "b": 660.53284, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/452", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 311.34592, "t": 650.28937, "r": 328.80933, "b": 645.33215, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/453", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 299.58362, "t": 705.30646, "r": 309.60205, "b": 700.34924, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "47K", "text": "47K"}, {"self_ref": "#/texts/454", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 466.04077000000007, "t": 650.32831, "r": 483.50418, "b": 645.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/455", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 459.02151, "t": 698.23883, "r": 469.16031000000004, "b": 693.28162, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Non", "text": "Non"}, {"self_ref": "#/texts/456", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 455.4209, "t": 692.11884, "r": 469.15927000000005, "b": 687.16162, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/457", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 455.04691, "t": 685.9988399999999, "r": 469.16115999999994, "b": 681.04163, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/458", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 467.39401, "t": 706.42761, "r": 480.6545100000001, "b": 701.4704, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "145K", "text": "145K"}, {"self_ref": "#/texts/459", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 160.37672, "t": 650.41614, "r": 182.62802, "b": 645.45892, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/460", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 153.74265, "t": 697.13519, "r": 173.32664, "b": 692.17798, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Contain", "text": "Contain"}, {"self_ref": "#/texts/461", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 154.50967, "t": 691.0152, "r": 173.3246, "b": 686.0579799999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Missing", "text": "Missing"}, {"self_ref": "#/texts/462", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 155.27162, "t": 684.8952, "r": 173.32664, "b": 679.9379900000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "bboxes", "text": "bboxes"}, {"self_ref": "#/texts/463", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 326.41302, "t": 684.76752, "r": 345.99701, "b": 679.8103, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Contain", "text": "Contain"}, {"self_ref": "#/texts/464", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 327.17972, "t": 678.64752, "r": 345.99463, "b": 673.69031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Missing", "text": "Missing"}, {"self_ref": "#/texts/465", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 327.94131, "t": 672.52753, "r": 345.99634, "b": 667.57031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "bboxes", "text": "bboxes"}, {"self_ref": "#/texts/466", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 488.9942, "t": 687.8462500000002, "r": 508.76384999999993, "b": 682.88904, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Dataset", "text": "Dataset"}, {"self_ref": "#/texts/467", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 490.1893, "t": 681.72626, "r": 508.76349000000005, "b": 676.7690399999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "doesn't", "text": "doesn't"}, {"self_ref": "#/texts/468", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 489.72009, "t": 675.60626, "r": 508.76758, "b": 670.6490499999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "provide", "text": "provide"}, {"self_ref": "#/texts/469", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 490.71121, "t": 669.48627, "r": 508.76624, "b": 664.52905, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "bboxes", "text": "bboxes"}, {"self_ref": "#/texts/470", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 185.37759, "t": 650.28882, "r": 202.84102, "b": 645.3316, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/471", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 168.50357, "t": 705.86389, "r": 197.52699, "b": 700.90668, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "230K 280K", "text": "230K 280K"}, {"self_ref": "#/texts/472", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 357.3768, "t": 706.00293, "r": 367.39523, "b": 701.04572, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "65K", "text": "65K"}, {"self_ref": "#/texts/473", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 333.73151, "t": 650.37677, "r": 374.92862, "b": 645.41956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Complex Simple", "text": "Complex Simple"}, {"self_ref": "#/texts/474", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 345.69101, "t": 705.94409, "r": 355.70944, "b": 700.9868799999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "47K", "text": "47K"}, {"self_ref": "#/texts/475", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 508.54248, "t": 650.62317, "r": 526.00592, "b": 645.66595, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/476", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 510.44653000000005, "t": 705.9074100000001, "r": 523.70703, "b": 700.9502, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "145K", "text": "145K"}, {"self_ref": "#/texts/477", "parent": {"cref": "#/groups/11"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 61.569000244140625, "t": 581.068603515625, "r": 286.3651123046875, "b": 560.20703125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 61]}], "orig": "\u00b7 TableFormer output does not include the table cell content.", "text": "\u00b7 TableFormer output does not include the table cell content.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/478", "parent": {"cref": "#/groups/11"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 61.569000244140625, "t": 547.9285888671875, "r": 286.3651428222656, "b": 527.0670166015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 77]}], "orig": "\u00b7 There are occasional inaccuracies in the predictions of the bounding boxes.", "text": "\u00b7 There are occasional inaccuracies in the predictions of the bounding boxes.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/479", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 512.7965698242188, "r": 286.3651123046875, "b": 396.2931213378906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 545]}], "orig": "However, it is possible to mitigate those limitations by combining the TableFormer predictions with the information already present inside a programmatic PDF document. More specifically, PDF documents can be seen as a sequence of PDF cells where each cell is described by its content and bounding box. If we are able to associate the PDF cells with the predicted table cells, we can directly link the PDF cell content to the table cell structure and use the PDF bounding boxes to correct misalignments in the predicted table cell bounding boxes.", "text": "However, it is possible to mitigate those limitations by combining the TableFormer predictions with the information already present inside a programmatic PDF document. More specifically, PDF documents can be seen as a sequence of PDF cells where each cell is described by its content and bounding box. If we are able to associate the PDF cells with the predicted table cells, we can directly link the PDF cell content to the table cell structure and use the PDF bounding boxes to correct misalignments in the predicted table cell bounding boxes."}, {"self_ref": "#/texts/480", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 392.9306640625, "r": 286.3649597167969, "b": 372.068115234375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 68]}], "orig": "Here is a step-by-step description of the prediction postprocessing:", "text": "Here is a step-by-step description of the prediction postprocessing:"}, {"self_ref": "#/texts/481", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 368.7046813964844, "r": 286.3650817871094, "b": 335.8881530761719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 173]}], "orig": "1. Get the minimal grid dimensions - number of rows and columns for the predicted table structure. This represents the most granular grid for the underlying table structure.", "text": "1. Get the minimal grid dimensions - number of rows and columns for the predicted table structure. This represents the most granular grid for the underlying table structure.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/482", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 332.52471923828125, "r": 286.36505126953125, "b": 287.7532043457031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 187]}], "orig": "2. Generate pair-wise matches between the bounding boxes of the PDF cells and the predicted cells. The Intersection Over Union (IOU) metric is used to evaluate the quality of the matches.", "text": "2. Generate pair-wise matches between the bounding boxes of the PDF cells and the predicted cells. The Intersection Over Union (IOU) metric is used to evaluate the quality of the matches.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/483", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 284.3897705078125, "r": 286.36492919921875, "b": 263.5272216796875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 97]}], "orig": "3. Use a carefully selected IOU threshold to designate the matches as \"good\" ones and \"bad\" ones.", "text": "3. Use a carefully selected IOU threshold to designate the matches as \"good\" ones and \"bad\" ones.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/484", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 260.164794921875, "r": 286.3651123046875, "b": 227.34722900390625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 131]}], "orig": "3.a. If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column.", "text": "3.a. If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/485", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 223.98377990722656, "r": 286.3650817871094, "b": 191.16722106933594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 169]}], "orig": "4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. The alignment of the column can be identified by the following formula:", "text": "4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. The alignment of the column can be identified by the following formula:", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/486", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 12, "bbox": {"l": 110.70498657226562, "t": 168.5640869140625, "r": 286.3623962402344, "b": 137.89439392089844, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 81]}], "orig": "alignment = arg min c { D$_{c}$ } D$_{c}$ = max { x$_{c}$ } - min { x$_{c}$ } (4)", "text": "alignment = arg min c { D$_{c}$ } D$_{c}$ = max { x$_{c}$ } - min { x$_{c}$ } (4)"}, {"self_ref": "#/texts/487", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 124.6520767211914, "r": 286.36199951171875, "b": 103.07321166992188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 103]}], "orig": "where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point.", "text": "where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point."}, {"self_ref": "#/texts/488", "parent": {"cref": "#/groups/13"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 99.70977783203125, "r": 286.3649597167969, "b": 78.84821319580078, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 110]}], "orig": "5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-", "text": "5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/489", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 308.86199951171875, "t": 581.0687866210938, "r": 545.1151733398438, "b": 536.2962036132812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 183]}], "orig": "dian cell size for all table cells. The usage of median during the computations, helps to eliminate outliers caused by occasional column spans which are usually wider than the normal.", "text": "dian cell size for all table cells. The usage of median during the computations, helps to eliminate outliers caused by occasional column spans which are usually wider than the normal."}, {"self_ref": "#/texts/490", "parent": {"cref": "#/groups/14"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.86199951171875, "t": 532.8977661132812, "r": 545.114990234375, "b": 512.0361938476562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 91]}], "orig": "6. Snap all cells with bad IOU to their corresponding median x -coordinates and cell sizes.", "text": "6. Snap all cells with bad IOU to their corresponding median x -coordinates and cell sizes.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/491", "parent": {"cref": "#/groups/14"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.8620300292969, "t": 508.6367492675781, "r": 545.1151123046875, "b": 404.08929443359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 471]}], "orig": "7. Generate a new set of pair-wise matches between the corrected bounding boxes and PDF cells. This time use a modified version of the IOU metric, where the area of the intersection between the predicted and PDF cells is divided by the PDF cell area. In case there are multiple matches for the same PDF cell, the prediction with the higher score is preferred. This covers the cases where the PDF cells are smaller than the area of predicted or corrected prediction cells.", "text": "7. Generate a new set of pair-wise matches between the corrected bounding boxes and PDF cells. This time use a modified version of the IOU metric, where the area of the intersection between the predicted and PDF cells is divided by the PDF cell area. In case there are multiple matches for the same PDF cell, the prediction with the higher score is preferred. This covers the cases where the PDF cells are smaller than the area of predicted or corrected prediction cells.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/492", "parent": {"cref": "#/groups/14"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.8620300292969, "t": 400.6898498535156, "r": 545.1151733398438, "b": 332.00836181640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 311]}], "orig": "8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score.", "text": "8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/493", "parent": {"cref": "#/groups/14"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.8620300292969, "t": 328.6089172363281, "r": 545.1151733398438, "b": 224.06141662597656, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 503]}], "orig": "9. Pick up the remaining orphan cells. There could be cases, when after applying all the previous post-processing steps, some PDF cells could still remain without any match to predicted cells. However, it is still possible to deduce the correct matching for an orphan PDF cell by mapping its bounding box on the geometry of the grid. This mapping decides if the content of the orphan cell will be appended to an already matched table cell, or a new table cell should be created to match with the orphan.", "text": "9. Pick up the remaining orphan cells. There could be cases, when after applying all the previous post-processing steps, some PDF cells could still remain without any match to predicted cells. However, it is still possible to deduce the correct matching for an orphan PDF cell by mapping its bounding box on the geometry of the grid. This mapping decides if the content of the orphan cell will be appended to an already matched table cell, or a new table cell should be created to match with the orphan.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/494", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 308.8620300292969, "t": 220.66197204589844, "r": 545.1168823242188, "b": 187.8454132080078, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 113]}], "orig": "9a. Compute the top and bottom boundary of the horizontal band for each grid row (min/max y coordinates per row).", "text": "9a. Compute the top and bottom boundary of the horizontal band for each grid row (min/max y coordinates per row)."}, {"self_ref": "#/texts/495", "parent": {"cref": "#/groups/15"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.862060546875, "t": 184.44696044921875, "r": 545.1150512695312, "b": 163.58441162109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 101]}], "orig": "9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.", "text": "9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/496", "parent": {"cref": "#/groups/15"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.862060546875, "t": 160.18597412109375, "r": 545.1150512695312, "b": 127.3694076538086, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 117]}], "orig": "9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column).", "text": "9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column).", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/497", "parent": {"cref": "#/groups/15"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.862060546875, "t": 123.969970703125, "r": 545.114990234375, "b": 103.10841369628906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 107]}], "orig": "9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column.", "text": "9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/498", "parent": {"cref": "#/groups/15"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.862060546875, "t": 99.70997619628906, "r": 545.1151733398438, "b": 78.84840393066406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 118]}], "orig": "9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-", "text": "9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/499", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 12, "bbox": {"l": 292.6310729980469, "t": 57.86697006225586, "r": 302.5936584472656, "b": 48.96040725708008, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "12", "text": "12"}, {"self_ref": "#/texts/500", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 13, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 88.84658813476562, "b": 707.8850708007812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "phan cell.", "text": "phan cell."}, {"self_ref": "#/texts/501", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 13, "bbox": {"l": 50.11199951171875, "t": 704.8366088867188, "r": 286.3649597167969, "b": 683.9750366210938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 76]}], "orig": "9f. Otherwise create a new structural cell and match it wit the orphan cell.", "text": "9f. Otherwise create a new structural cell and match it wit the orphan cell."}, {"self_ref": "#/texts/502", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 13, "bbox": {"l": 50.11199951171875, "t": 680.8369140625, "r": 286.364990234375, "b": 660.2941284179688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 97]}], "orig": "Aditional images with examples of TableFormer predictions and post-processing can be found below.", "text": "Aditional images with examples of TableFormer predictions and post-processing can be found below."}, {"self_ref": "#/texts/503", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 13, "bbox": {"l": 63.340999603271484, "t": 289.9436340332031, "r": 273.1334228515625, "b": 281.0370788574219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 52]}], "orig": "Figure 8: Example of a table with multi-line header.", "text": "Figure 8: Example of a table with multi-line header."}, {"self_ref": "#/texts/504", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 13, "bbox": {"l": 292.6309814453125, "t": 57.866641998291016, "r": 302.59356689453125, "b": 48.960079193115234, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "13", "text": "13"}, {"self_ref": "#/texts/505", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 13, "bbox": {"l": 308.86199951171875, "t": 485.4016418457031, "r": 545.1151123046875, "b": 464.54010009765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 67]}], "orig": "Figure 9: Example of a table with big empty distance between cells.", "text": "Figure 9: Example of a table with big empty distance between cells."}, {"self_ref": "#/texts/506", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 13, "bbox": {"l": 312.3429870605469, "t": 111.50663757324219, "r": 541.63232421875, "b": 102.60006713867188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 55]}], "orig": "Figure 10: Example of a complex table with empty cells.", "text": "Figure 10: Example of a complex table with empty cells."}, {"self_ref": "#/texts/507", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 14, "bbox": {"l": 50.11199951171875, "t": 435.2296447753906, "r": 286.3650817871094, "b": 414.36810302734375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 61]}], "orig": "Figure 11: Simple table with different style and empty cells.", "text": "Figure 11: Simple table with different style and empty cells."}, {"self_ref": "#/texts/508", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 14, "bbox": {"l": 54.61899948120117, "t": 120.181640625, "r": 281.85589599609375, "b": 111.27507781982422, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 56]}], "orig": "Figure 12: Simple table predictions and post processing.", "text": "Figure 12: Simple table predictions and post processing."}, {"self_ref": "#/texts/509", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 14, "bbox": {"l": 292.6309814453125, "t": 57.86663818359375, "r": 302.59356689453125, "b": 48.96007537841797, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "14", "text": "14"}, {"self_ref": "#/texts/510", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 14, "bbox": {"l": 315.7900085449219, "t": 420.3156433105469, "r": 538.1852416992188, "b": 411.4090881347656, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 55]}], "orig": "Figure 13: Table predictions example on colorful table.", "text": "Figure 13: Table predictions example on colorful table."}, {"self_ref": "#/texts/511", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 14, "bbox": {"l": 344.9849853515625, "t": 108.45364379882812, "r": 508.9893493652344, "b": 99.54707336425781, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 40]}], "orig": "Figure 14: Example with multi-line text.", "text": "Figure 14: Example with multi-line text."}, {"self_ref": "#/texts/512", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 15, "bbox": {"l": 84.23300170898438, "t": 147.64862060546875, "r": 252.24224853515625, "b": 138.7420654296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 41]}], "orig": "Figure 15: Example with triangular table.", "text": "Figure 15: Example with triangular table."}, {"self_ref": "#/texts/513", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 15, "bbox": {"l": 292.6309814453125, "t": 57.86665725708008, "r": 302.59356689453125, "b": 48.9600944519043, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "15", "text": "15"}, {"self_ref": "#/texts/514", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 15, "bbox": {"l": 308.8619689941406, "t": 139.0646514892578, "r": 545.1151123046875, "b": 118.20308685302734, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 106]}], "orig": "Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.", "text": "Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact."}, {"self_ref": "#/texts/515", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 16, "bbox": {"l": 50.11199951171875, "t": 283.6626281738281, "r": 545.1138305664062, "b": 262.80108642578125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 153]}], "orig": "Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure.", "text": "Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure."}, {"self_ref": "#/texts/516", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 16, "bbox": {"l": 292.6309814453125, "t": 57.866641998291016, "r": 302.59356689453125, "b": 48.960079193115234, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "16", "text": "16"}], "pictures": [{"self_ref": "#/pictures/0", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/8"}, {"cref": "#/texts/9"}, {"cref": "#/texts/10"}], "label": "picture", "prov": [{"page_no": 1, "bbox": {"l": 315.65362548828125, "t": 563.276611328125, "r": 537.1475219726562, "b": 489.1985778808594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/1", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/13"}, {"cref": "#/texts/14"}, {"cref": "#/texts/15"}, {"cref": "#/texts/16"}, {"cref": "#/texts/17"}, {"cref": "#/texts/18"}, {"cref": "#/texts/19"}, {"cref": "#/texts/20"}, {"cref": "#/texts/21"}, {"cref": "#/texts/22"}, {"cref": "#/texts/23"}, {"cref": "#/texts/24"}, {"cref": "#/texts/25"}, {"cref": "#/texts/26"}, {"cref": "#/texts/27"}, {"cref": "#/texts/28"}, {"cref": "#/texts/29"}, {"cref": "#/texts/30"}, {"cref": "#/texts/31"}, {"cref": "#/texts/32"}, {"cref": "#/texts/33"}, {"cref": "#/texts/34"}, {"cref": "#/texts/35"}, {"cref": "#/texts/36"}, {"cref": "#/texts/37"}], "label": "picture", "prov": [{"page_no": 1, "bbox": {"l": 314.78173828125, "t": 453.9347229003906, "r": 539.1802978515625, "b": 381.9505615234375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/2", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/39"}, {"cref": "#/texts/40"}, {"cref": "#/texts/41"}, {"cref": "#/texts/42"}, {"cref": "#/texts/43"}, {"cref": "#/texts/44"}, {"cref": "#/texts/45"}, {"cref": "#/texts/46"}, {"cref": "#/texts/47"}, {"cref": "#/texts/48"}, {"cref": "#/texts/49"}, {"cref": "#/texts/50"}, {"cref": "#/texts/51"}, {"cref": "#/texts/52"}, {"cref": "#/texts/53"}, {"cref": "#/texts/54"}, {"cref": "#/texts/55"}, {"cref": "#/texts/56"}, {"cref": "#/texts/57"}, {"cref": "#/texts/58"}, {"cref": "#/texts/59"}, {"cref": "#/texts/60"}, {"cref": "#/texts/61"}, {"cref": "#/texts/62"}], "label": "picture", "prov": [{"page_no": 1, "bbox": {"l": 315.7172546386719, "t": 358.176513671875, "r": 536.835693359375, "b": 295.9709777832031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/3", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/91"}, {"cref": "#/texts/92"}, {"cref": "#/texts/93"}, {"cref": "#/texts/94"}, {"cref": "#/texts/95"}, {"cref": "#/texts/96"}, {"cref": "#/texts/97"}, {"cref": "#/texts/98"}, {"cref": "#/texts/99"}, {"cref": "#/texts/100"}, {"cref": "#/texts/101"}, {"cref": "#/texts/102"}, {"cref": "#/texts/103"}, {"cref": "#/texts/104"}, {"cref": "#/texts/105"}, {"cref": "#/texts/106"}, {"cref": "#/texts/107"}, {"cref": "#/texts/108"}, {"cref": "#/texts/109"}, {"cref": "#/texts/110"}, {"cref": "#/texts/111"}, {"cref": "#/texts/112"}, {"cref": "#/texts/113"}, {"cref": "#/texts/114"}, {"cref": "#/texts/115"}, {"cref": "#/texts/116"}, {"cref": "#/texts/117"}, {"cref": "#/texts/118"}, {"cref": "#/texts/119"}, {"cref": "#/texts/120"}, {"cref": "#/texts/121"}, {"cref": "#/texts/122"}, {"cref": "#/texts/123"}], "label": "picture", "prov": [{"page_no": 3, "bbox": {"l": 312.10369873046875, "t": 713.5591430664062, "r": 550.38916015625, "b": 541.39013671875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 104]}], "captions": [{"cref": "#/texts/90"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/4", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/142"}, {"cref": "#/texts/143"}, {"cref": "#/texts/144"}, {"cref": "#/texts/145"}, {"cref": "#/texts/146"}, {"cref": "#/texts/147"}, {"cref": "#/texts/148"}, {"cref": "#/texts/149"}, {"cref": "#/texts/150"}, {"cref": "#/texts/151"}, {"cref": "#/texts/152"}, {"cref": "#/texts/153"}, {"cref": "#/texts/154"}, {"cref": "#/texts/155"}, {"cref": "#/texts/156"}, {"cref": "#/texts/157"}, {"cref": "#/texts/158"}, {"cref": "#/texts/159"}, {"cref": "#/texts/160"}, {"cref": "#/texts/161"}, {"cref": "#/texts/162"}, {"cref": "#/texts/163"}, {"cref": "#/texts/164"}, {"cref": "#/texts/165"}, {"cref": "#/texts/166"}, {"cref": "#/texts/167"}, {"cref": "#/texts/168"}, {"cref": "#/texts/169"}, {"cref": "#/texts/170"}, {"cref": "#/texts/171"}, {"cref": "#/texts/172"}, {"cref": "#/texts/173"}, {"cref": "#/texts/174"}, {"cref": "#/texts/175"}, {"cref": "#/texts/176"}, {"cref": "#/texts/177"}, {"cref": "#/texts/178"}, {"cref": "#/texts/179"}, {"cref": "#/texts/180"}, {"cref": "#/texts/181"}, {"cref": "#/texts/182"}, {"cref": "#/texts/183"}, {"cref": "#/texts/184"}, {"cref": "#/texts/185"}, {"cref": "#/texts/186"}, {"cref": "#/texts/187"}, {"cref": "#/texts/188"}, {"cref": "#/texts/189"}, {"cref": "#/texts/190"}, {"cref": "#/texts/191"}, {"cref": "#/texts/192"}, {"cref": "#/texts/193"}, {"cref": "#/texts/194"}, {"cref": "#/texts/195"}, {"cref": "#/texts/196"}, {"cref": "#/texts/197"}, {"cref": "#/texts/198"}, {"cref": "#/texts/199"}, {"cref": "#/texts/200"}], "label": "picture", "prov": [{"page_no": 5, "bbox": {"l": 74.30525970458984, "t": 714.0888061523438, "r": 519.9801025390625, "b": 608.2984619140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 212]}], "captions": [{"cref": "#/texts/141"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/5", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/202"}, {"cref": "#/texts/203"}, {"cref": "#/texts/204"}, {"cref": "#/texts/205"}, {"cref": "#/texts/206"}, {"cref": "#/texts/207"}, {"cref": "#/texts/208"}, {"cref": "#/texts/209"}, {"cref": "#/texts/210"}, {"cref": "#/texts/211"}, {"cref": "#/texts/212"}, {"cref": "#/texts/213"}, {"cref": "#/texts/214"}, {"cref": "#/texts/215"}, {"cref": "#/texts/216"}, {"cref": "#/texts/217"}, {"cref": "#/texts/218"}, {"cref": "#/texts/219"}, {"cref": "#/texts/220"}, {"cref": "#/texts/221"}, {"cref": "#/texts/222"}, {"cref": "#/texts/223"}, {"cref": "#/texts/224"}, {"cref": "#/texts/225"}, {"cref": "#/texts/226"}, {"cref": "#/texts/227"}, {"cref": "#/texts/228"}, {"cref": "#/texts/229"}, {"cref": "#/texts/230"}, {"cref": "#/texts/231"}, {"cref": "#/texts/232"}, {"cref": "#/texts/233"}, {"cref": "#/texts/234"}, {"cref": "#/texts/235"}, {"cref": "#/texts/236"}, {"cref": "#/texts/237"}, {"cref": "#/texts/238"}, {"cref": "#/texts/239"}, {"cref": "#/texts/240"}, {"cref": "#/texts/241"}, {"cref": "#/texts/242"}, {"cref": "#/texts/243"}, {"cref": "#/texts/244"}, {"cref": "#/texts/245"}], "label": "picture", "prov": [{"page_no": 5, "bbox": {"l": 53.03328323364258, "t": 534.3346557617188, "r": 285.3731689453125, "b": 284.3311462402344, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 745]}], "captions": [{"cref": "#/texts/201"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/6", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 49.97503662109375, "t": 688.287353515625, "r": 301.6335754394531, "b": 604.4210815429688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/7", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 305.5836486816406, "t": 693.3458251953125, "r": 554.8258666992188, "b": 611.3732299804688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 79]}], "captions": [{"cref": "#/texts/289"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/8", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/292"}], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 51.736167907714844, "t": 411.51934814453125, "r": 211.83778381347656, "b": 348.3419189453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 397]}], "captions": [{"cref": "#/texts/291"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/9", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/293"}, {"cref": "#/texts/294"}, {"cref": "#/texts/295"}, {"cref": "#/texts/296"}, {"cref": "#/texts/297"}, {"cref": "#/texts/298"}, {"cref": "#/texts/299"}, {"cref": "#/texts/300"}, {"cref": "#/texts/301"}, {"cref": "#/texts/302"}, {"cref": "#/texts/303"}, {"cref": "#/texts/304"}, {"cref": "#/texts/305"}, {"cref": "#/texts/306"}, {"cref": "#/texts/307"}, {"cref": "#/texts/308"}, {"cref": "#/texts/309"}, {"cref": "#/texts/310"}, {"cref": "#/texts/311"}, {"cref": "#/texts/312"}, {"cref": "#/texts/313"}, {"cref": "#/texts/314"}, {"cref": "#/texts/315"}, {"cref": "#/texts/316"}, {"cref": "#/texts/317"}, {"cref": "#/texts/318"}, {"cref": "#/texts/319"}, {"cref": "#/texts/320"}, {"cref": "#/texts/321"}, {"cref": "#/texts/322"}, {"cref": "#/texts/323"}, {"cref": "#/texts/324"}, {"cref": "#/texts/325"}, {"cref": "#/texts/326"}, {"cref": "#/texts/327"}, {"cref": "#/texts/328"}, {"cref": "#/texts/329"}, {"cref": "#/texts/330"}, {"cref": "#/texts/331"}, {"cref": "#/texts/332"}, {"cref": "#/texts/333"}, {"cref": "#/texts/334"}, {"cref": "#/texts/335"}, {"cref": "#/texts/336"}, {"cref": "#/texts/337"}, {"cref": "#/texts/338"}, {"cref": "#/texts/339"}, {"cref": "#/texts/340"}, {"cref": "#/texts/341"}, {"cref": "#/texts/342"}, {"cref": "#/texts/343"}, {"cref": "#/texts/344"}, {"cref": "#/texts/345"}, {"cref": "#/texts/346"}, {"cref": "#/texts/347"}], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 383.1364440917969, "t": 410.7686767578125, "r": 542.1132202148438, "b": 349.2250671386719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/10", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/349"}], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 216.76925659179688, "t": 411.5093688964844, "r": 375.7829284667969, "b": 348.65301513671875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 112]}], "captions": [{"cref": "#/texts/348"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/11", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/418"}, {"cref": "#/texts/419"}, {"cref": "#/texts/420"}, {"cref": "#/texts/421"}, {"cref": "#/texts/422"}, {"cref": "#/texts/423"}, {"cref": "#/texts/424"}, {"cref": "#/texts/425"}, {"cref": "#/texts/426"}, {"cref": "#/texts/427"}, {"cref": "#/texts/428"}, {"cref": "#/texts/429"}, {"cref": "#/texts/430"}, {"cref": "#/texts/431"}, {"cref": "#/texts/432"}, {"cref": "#/texts/433"}, {"cref": "#/texts/434"}, {"cref": "#/texts/435"}, {"cref": "#/texts/436"}, {"cref": "#/texts/437"}, {"cref": "#/texts/438"}, {"cref": "#/texts/439"}, {"cref": "#/texts/440"}, {"cref": "#/texts/441"}, {"cref": "#/texts/442"}, {"cref": "#/texts/443"}, {"cref": "#/texts/444"}, {"cref": "#/texts/445"}, {"cref": "#/texts/446"}, {"cref": "#/texts/447"}, {"cref": "#/texts/448"}, {"cref": "#/texts/449"}, {"cref": "#/texts/450"}, {"cref": "#/texts/451"}, {"cref": "#/texts/452"}, {"cref": "#/texts/453"}, {"cref": "#/texts/454"}, {"cref": "#/texts/455"}, {"cref": "#/texts/456"}, {"cref": "#/texts/457"}, {"cref": "#/texts/458"}, {"cref": "#/texts/459"}, {"cref": "#/texts/460"}, {"cref": "#/texts/461"}, {"cref": "#/texts/462"}, {"cref": "#/texts/463"}, {"cref": "#/texts/464"}, {"cref": "#/texts/465"}, {"cref": "#/texts/466"}, {"cref": "#/texts/467"}, {"cref": "#/texts/468"}, {"cref": "#/texts/469"}, {"cref": "#/texts/470"}, {"cref": "#/texts/471"}, {"cref": "#/texts/472"}, {"cref": "#/texts/473"}, {"cref": "#/texts/474"}, {"cref": "#/texts/475"}, {"cref": "#/texts/476"}], "label": "picture", "prov": [{"page_no": 12, "bbox": {"l": 53.54227066040039, "t": 717.25146484375, "r": 544.938232421875, "b": 644.4090576171875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 245]}], "captions": [{"cref": "#/texts/417"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/12", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 13, "bbox": {"l": 309.79150390625, "t": 538.0946044921875, "r": 425.9603271484375, "b": 499.60601806640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/13", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 13, "bbox": {"l": 333.9573669433594, "t": 198.8865966796875, "r": 518.4768676757812, "b": 126.5096435546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/14", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 14, "bbox": {"l": 51.15378952026367, "t": 687.6914672851562, "r": 282.8598937988281, "b": 447.09332275390625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 61]}], "captions": [{"cref": "#/texts/507"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/15", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 14, "bbox": {"l": 50.40477752685547, "t": 180.99615478515625, "r": 177.0564422607422, "b": 135.83905029296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 56]}], "captions": [{"cref": "#/texts/508"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/16", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 14, "bbox": {"l": 318.6332092285156, "t": 701.1157836914062, "r": 534.73583984375, "b": 432.9424133300781, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 55]}], "captions": [{"cref": "#/texts/510"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/17", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 55.116363525390625, "t": 655.7449951171875, "r": 279.370849609375, "b": 542.6654663085938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/18", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 54.28135299682617, "t": 531.7384033203125, "r": 279.2568359375, "b": 418.4729309082031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/19", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 55.423954010009766, "t": 407.4449462890625, "r": 280.2310791015625, "b": 294.436279296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/20", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 50.64818572998047, "t": 286.01953125, "r": 319.9103088378906, "b": 160.736328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/21", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 323.46868896484375, "t": 429.5491638183594, "r": 525.9569091796875, "b": 327.739501953125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/22", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 353.6920471191406, "t": 304.594970703125, "r": 495.4288024902344, "b": 156.22674560546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/23", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 16, "bbox": {"l": 66.79948425292969, "t": 538.3836669921875, "r": 528.5565795898438, "b": 293.8616027832031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 153]}], "captions": [{"cref": "#/texts/515"}], "references": [], "footnotes": [], "image": null, "annotations": []}], "tables": [{"self_ref": "#/tables/0", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 1, "bbox": {"l": 315.65362548828125, "t": 563.276611328125, "r": 537.1475219726562, "b": 489.1985778808594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/11"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 384.03289794921875, "t": 539.321044921875, "r": 390.0376892089844, "b": 529.1906127929688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 451.9457092285156, "t": 556.6529541015625, "r": 457.95050048828125, "b": 546.5225219726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1", "column_header": true, "row_header": false, "row_section": false}], "num_rows": 1, "num_cols": 2, "grid": [[{"bbox": {"l": 384.03289794921875, "t": 539.321044921875, "r": 390.0376892089844, "b": 529.1906127929688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 451.9457092285156, "t": 556.6529541015625, "r": 457.95050048828125, "b": 546.5225219726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1", "column_header": true, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/1", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 1, "bbox": {"l": 315.7172546386719, "t": 358.176513671875, "r": 536.835693359375, "b": 295.9709777832031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/63"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 318.8807067871094, "t": 354.3141174316406, "r": 323.273193359375, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "0", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 354.3141174316406, "r": 351.6412048339844, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 354.4064025878906, "r": 465.8810119628906, "b": 344.2760009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 318.7731628417969, "t": 342.4544982910156, "r": 323.1656494140625, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 342.4544982910156, "r": 351.6412048339844, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 366.7010192871094, "t": 342.8791809082031, "r": 398.4967041015625, "b": 332.748779296875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5 3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 342.4544982910156, "r": 445.3518981933594, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 342.4544982910156, "r": 492.2073974609375, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 318.7731628417969, "t": 318.2957458496094, "r": 323.1656494140625, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 330.1553955078125, "r": 351.6412048339844, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 330.1553955078125, "r": 402.8883056640625, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "10", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 330.1553955078125, "r": 449.4228515625, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "11", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 330.1553955078125, "r": 496.5989990234375, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "12", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90423583984375, "t": 318.6770935058594, "r": 337.9090270996094, "b": 308.54669189453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 318.2957458496094, "r": 356.0328063964844, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "13", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 318.2957458496094, "r": 402.8883056640625, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 318.2957458496094, "r": 449.7434997558594, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "15", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 318.2957458496094, "r": 496.5989990234375, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "16", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 306.87530517578125, "r": 356.0328063964844, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "17", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 306.87530517578125, "r": 402.8883056640625, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 306.87530517578125, "r": 449.7434997558594, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 306.87530517578125, "r": 496.5989990234375, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "20", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 5, "num_cols": 6, "grid": [[{"bbox": {"l": 318.8807067871094, "t": 354.3141174316406, "r": 323.273193359375, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "0", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 354.3141174316406, "r": 351.6412048339844, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 354.3141174316406, "r": 351.6412048339844, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 354.4064025878906, "r": 465.8810119628906, "b": 344.2760009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 354.4064025878906, "r": 465.8810119628906, "b": 344.2760009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 318.7731628417969, "t": 342.4544982910156, "r": 323.1656494140625, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 342.4544982910156, "r": 351.6412048339844, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 366.7010192871094, "t": 342.8791809082031, "r": 398.4967041015625, "b": 332.748779296875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5 3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 342.4544982910156, "r": 445.3518981933594, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 342.4544982910156, "r": 492.2073974609375, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 318.7731628417969, "t": 318.2957458496094, "r": 323.1656494140625, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 330.1553955078125, "r": 351.6412048339844, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 330.1553955078125, "r": 402.8883056640625, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "10", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 330.1553955078125, "r": 449.4228515625, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "11", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 330.1553955078125, "r": 496.5989990234375, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "12", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90423583984375, "t": 318.6770935058594, "r": 337.9090270996094, "b": 308.54669189453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 318.2957458496094, "r": 356.0328063964844, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "13", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 318.2957458496094, "r": 402.8883056640625, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 318.2957458496094, "r": 449.7434997558594, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "15", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 318.2957458496094, "r": 496.5989990234375, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "16", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90423583984375, "t": 318.6770935058594, "r": 337.9090270996094, "b": 308.54669189453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 306.87530517578125, "r": 356.0328063964844, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "17", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 306.87530517578125, "r": 402.8883056640625, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 306.87530517578125, "r": 449.7434997558594, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 306.87530517578125, "r": 496.5989990234375, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "20", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90423583984375, "t": 318.6770935058594, "r": 337.9090270996094, "b": 308.54669189453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/2", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 4, "bbox": {"l": 310.67584228515625, "t": 718.8060913085938, "r": 542.9547119140625, "b": 636.7794799804688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/133"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 412.3320007324219, "t": 718.3856201171875, "r": 430.9023132324219, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Tags", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 442.857421875, "t": 718.3856201171875, "r": 464.4463806152344, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Bbox", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 477.78631591796875, "t": 718.3856201171875, "r": 494.9419250488281, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "Size", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 508.2818603515625, "t": 718.3856201171875, "r": 536.9143676757812, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "Format", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 706.0326538085938, "r": 361.64263916015625, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 706.33154296875, "r": 425.37774658203125, "b": 697.1161499023438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 706.33154296875, "r": 457.4174499511719, "b": 697.1161499023438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 706.0326538085938, "r": 496.3262023925781, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "509k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 706.0326538085938, "r": 532.5601196289062, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 694.07763671875, "r": 359.4309387207031, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 694.3765258789062, "r": 425.37774658203125, "b": 685.1611328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 694.3765258789062, "r": 457.4174499511719, "b": 685.1611328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 694.07763671875, "r": 496.3262023925781, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "112k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.4618530273438, "t": 694.07763671875, "r": 531.7332763671875, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PDF", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 682.1216430664062, "r": 359.9788818359375, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableBank", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 682.4205322265625, "r": 425.37774658203125, "b": 673.2051391601562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 450.812255859375, "t": 682.4205322265625, "r": 456.50091552734375, "b": 673.2051391601562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 682.1216430664062, "r": 496.3262023925781, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "145k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 511.25018310546875, "t": 682.1216430664062, "r": 533.9450073242188, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "JPEG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 670.1666259765625, "r": 400.3772277832031, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined-Tabnet(*)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 670.4655151367188, "r": 425.37774658203125, "b": 661.2501220703125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 670.4655151367188, "r": 457.4174499511719, "b": 661.2501220703125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 670.1666259765625, "r": 496.3262023925781, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "400k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 670.1666259765625, "r": 532.5601196289062, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 658.2116088867188, "r": 375.1718444824219, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined(**)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 658.510498046875, "r": 425.37774658203125, "b": 649.2951049804688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 658.510498046875, "r": 457.4174499511719, "b": 649.2951049804688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 658.2116088867188, "r": 496.3262023925781, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "500k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 658.2116088867188, "r": 532.5601196289062, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 646.256591796875, "r": 369.3935241699219, "b": 637.3500366210938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "SynthTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 646.5555419921875, "r": 425.37774658203125, "b": 637.3401489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 646.5555419921875, "r": 457.4174499511719, "b": 637.3401489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 646.2566528320312, "r": 496.3262023925781, "b": 637.35009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "600k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 646.2566528320312, "r": 532.5601196289062, "b": 637.35009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 7, "num_cols": 5, "grid": [[{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 412.3320007324219, "t": 718.3856201171875, "r": 430.9023132324219, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Tags", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 442.857421875, "t": 718.3856201171875, "r": 464.4463806152344, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Bbox", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 477.78631591796875, "t": 718.3856201171875, "r": 494.9419250488281, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "Size", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 508.2818603515625, "t": 718.3856201171875, "r": 536.9143676757812, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "Format", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 706.0326538085938, "r": 361.64263916015625, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 706.33154296875, "r": 425.37774658203125, "b": 697.1161499023438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 706.33154296875, "r": 457.4174499511719, "b": 697.1161499023438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 706.0326538085938, "r": 496.3262023925781, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "509k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 706.0326538085938, "r": 532.5601196289062, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 694.07763671875, "r": 359.4309387207031, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 694.3765258789062, "r": 425.37774658203125, "b": 685.1611328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 694.3765258789062, "r": 457.4174499511719, "b": 685.1611328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 694.07763671875, "r": 496.3262023925781, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "112k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.4618530273438, "t": 694.07763671875, "r": 531.7332763671875, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PDF", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 682.1216430664062, "r": 359.9788818359375, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableBank", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 682.4205322265625, "r": 425.37774658203125, "b": 673.2051391601562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 450.812255859375, "t": 682.4205322265625, "r": 456.50091552734375, "b": 673.2051391601562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 682.1216430664062, "r": 496.3262023925781, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "145k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 511.25018310546875, "t": 682.1216430664062, "r": 533.9450073242188, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "JPEG", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 670.1666259765625, "r": 400.3772277832031, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined-Tabnet(*)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 670.4655151367188, "r": 425.37774658203125, "b": 661.2501220703125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 670.4655151367188, "r": 457.4174499511719, "b": 661.2501220703125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 670.1666259765625, "r": 496.3262023925781, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "400k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 670.1666259765625, "r": 532.5601196289062, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 658.2116088867188, "r": 375.1718444824219, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined(**)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 658.510498046875, "r": 425.37774658203125, "b": 649.2951049804688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 658.510498046875, "r": 457.4174499511719, "b": 649.2951049804688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 658.2116088867188, "r": 496.3262023925781, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "500k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 658.2116088867188, "r": 532.5601196289062, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 646.256591796875, "r": 369.3935241699219, "b": 637.3500366210938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "SynthTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 646.5555419921875, "r": 425.37774658203125, "b": 637.3401489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 646.5555419921875, "r": 457.4174499511719, "b": 637.3401489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 646.2566528320312, "r": 496.3262023925781, "b": 637.35009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "600k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 646.2566528320312, "r": 532.5601196289062, "b": 637.35009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/3", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 7, "bbox": {"l": 53.368526458740234, "t": 382.8642272949219, "r": 283.0443420410156, "b": 209.60223388671875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/277"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 78.84300231933594, "t": 371.30963134765625, "r": 104.8553466796875, "b": 362.403076171875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 129.33799743652344, "t": 365.3326416015625, "r": 159.21583557128906, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 171.17095947265625, "t": 365.3326416015625, "r": 199.40496826171875, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 211.1999969482422, "t": 377.2876281738281, "r": 247.74349975585938, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 264.5404357910156, "t": 365.3326416015625, "r": 277.27264404296875, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 81.61199951171875, "t": 348.3756408691406, "r": 102.08513641357422, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87205505371094, "t": 348.3756408691406, "r": 153.69140625, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 348.3756408691406, "r": 194.00009155273438, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "91.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.82937622070312, "t": 348.3756408691406, "r": 238.26393127441406, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18414306640625, "t": 348.3756408691406, "r": 279.6186828613281, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.16500091552734, "t": 336.4196472167969, "r": 101.53230285644531, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86715698242188, "t": 336.4196472167969, "r": 153.68650817871094, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411499023438, "t": 336.4196472167969, "r": 186.94166564941406, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795471191406, "t": 336.4196472167969, "r": 231.20550537109375, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69854736328125, "t": 336.4196472167969, "r": 282.1144104003906, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "93.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.31500244140625, "t": 323.86663818359375, "r": 117.38329315185547, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 323.86663818359375, "r": 153.68701171875, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 323.86663818359375, "r": 194.0056610107422, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "98.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 323.86663818359375, "r": 238.26950073242188, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.697998046875, "t": 323.9862060546875, "r": 282.1138610839844, "b": 315.0298156738281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.75", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.61199951171875, "t": 308.67364501953125, "r": 102.08513641357422, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87205505371094, "t": 308.67364501953125, "r": 153.69140625, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 308.67364501953125, "r": 194.00009155273438, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "88.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 218.33871459960938, "t": 308.67364501953125, "r": 240.7545623779297, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "92.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1841125488281, "t": 308.67364501953125, "r": 279.61865234375, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "90.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.16500091552734, "t": 296.7186584472656, "r": 101.53230285644531, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86715698242188, "t": 296.7186584472656, "r": 153.68650817871094, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411499023438, "t": 296.7186584472656, "r": 186.94166564941406, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795471191406, "t": 296.7186584472656, "r": 231.20550537109375, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69854736328125, "t": 296.7186584472656, "r": 282.1144104003906, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "87.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 71.78900146484375, "t": 284.763671875, "r": 111.90838623046875, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE (FT)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86221313476562, "t": 284.763671875, "r": 153.6815643310547, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62913513183594, "t": 284.763671875, "r": 186.94668579101562, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89297485351562, "t": 284.763671875, "r": 231.2105255126953, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.693603515625, "t": 284.763671875, "r": 282.1094665527344, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "91.02", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.31500244140625, "t": 272.8086853027344, "r": 117.38329315185547, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 272.8086853027344, "r": 153.68701171875, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 272.8086853027344, "r": 194.0056610107422, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "97.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 272.8086853027344, "r": 238.26950073242188, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1889953613281, "t": 272.9282531738281, "r": 279.62353515625, "b": 263.97186279296875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.61199951171875, "t": 255.5016326904297, "r": 102.08513641357422, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.91064453125, "t": 255.5016326904297, "r": 150.64285278320312, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 255.5016326904297, "r": 194.00009155273438, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89285278320312, "t": 255.5016326904297, "r": 231.2104034423828, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1841125488281, "t": 255.5016326904297, "r": 279.61865234375, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.31500244140625, "t": 243.54563903808594, "r": 117.38329315185547, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.90625, "t": 243.54563903808594, "r": 150.63845825195312, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 243.54563903808594, "r": 194.0056610107422, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88845825195312, "t": 243.54563903808594, "r": 231.2060089111328, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1889953613281, "t": 243.66519165039062, "r": 279.62353515625, "b": 234.7088165283203, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.31500244140625, "t": 223.9976348876953, "r": 117.38329315185547, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 223.9976348876953, "r": 153.68701171875, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "STN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 223.9976348876953, "r": 194.0056610107422, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "96.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 223.9976348876953, "r": 238.26950073242188, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189697265625, "t": 223.9976348876953, "r": 279.6242370605469, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.7", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 11, "num_cols": 5, "grid": [[{"bbox": {"l": 78.84300231933594, "t": 371.30963134765625, "r": 104.8553466796875, "b": 362.403076171875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 129.33799743652344, "t": 365.3326416015625, "r": 159.21583557128906, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 171.17095947265625, "t": 365.3326416015625, "r": 199.40496826171875, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 211.1999969482422, "t": 377.2876281738281, "r": 247.74349975585938, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 264.5404357910156, "t": 365.3326416015625, "r": 277.27264404296875, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "All", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 81.61199951171875, "t": 348.3756408691406, "r": 102.08513641357422, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87205505371094, "t": 348.3756408691406, "r": 153.69140625, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 348.3756408691406, "r": 194.00009155273438, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "91.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.82937622070312, "t": 348.3756408691406, "r": 238.26393127441406, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18414306640625, "t": 348.3756408691406, "r": 279.6186828613281, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.9", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 82.16500091552734, "t": 336.4196472167969, "r": 101.53230285644531, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86715698242188, "t": 336.4196472167969, "r": 153.68650817871094, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411499023438, "t": 336.4196472167969, "r": 186.94166564941406, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795471191406, "t": 336.4196472167969, "r": 231.20550537109375, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69854736328125, "t": 336.4196472167969, "r": 282.1144104003906, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "93.01", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 66.31500244140625, "t": 323.86663818359375, "r": 117.38329315185547, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 323.86663818359375, "r": 153.68701171875, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 323.86663818359375, "r": 194.0056610107422, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "98.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 323.86663818359375, "r": 238.26950073242188, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.697998046875, "t": 323.9862060546875, "r": 282.1138610839844, "b": 315.0298156738281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.75", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 81.61199951171875, "t": 308.67364501953125, "r": 102.08513641357422, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87205505371094, "t": 308.67364501953125, "r": 153.69140625, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 308.67364501953125, "r": 194.00009155273438, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "88.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 218.33871459960938, "t": 308.67364501953125, "r": 240.7545623779297, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "92.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1841125488281, "t": 308.67364501953125, "r": 279.61865234375, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "90.6", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 82.16500091552734, "t": 296.7186584472656, "r": 101.53230285644531, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86715698242188, "t": 296.7186584472656, "r": 153.68650817871094, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411499023438, "t": 296.7186584472656, "r": 186.94166564941406, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795471191406, "t": 296.7186584472656, "r": 231.20550537109375, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69854736328125, "t": 296.7186584472656, "r": 282.1144104003906, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "87.14", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 71.78900146484375, "t": 284.763671875, "r": 111.90838623046875, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE (FT)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86221313476562, "t": 284.763671875, "r": 153.6815643310547, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62913513183594, "t": 284.763671875, "r": 186.94668579101562, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89297485351562, "t": 284.763671875, "r": 231.2105255126953, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.693603515625, "t": 284.763671875, "r": 282.1094665527344, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "91.02", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 66.31500244140625, "t": 272.8086853027344, "r": 117.38329315185547, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 272.8086853027344, "r": 153.68701171875, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 272.8086853027344, "r": 194.0056610107422, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "97.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 272.8086853027344, "r": 238.26950073242188, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1889953613281, "t": 272.9282531738281, "r": 279.62353515625, "b": 263.97186279296875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.8", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 81.61199951171875, "t": 255.5016326904297, "r": 102.08513641357422, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.91064453125, "t": 255.5016326904297, "r": 150.64285278320312, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 255.5016326904297, "r": 194.00009155273438, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89285278320312, "t": 255.5016326904297, "r": 231.2104034423828, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1841125488281, "t": 255.5016326904297, "r": 279.61865234375, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 66.31500244140625, "t": 243.54563903808594, "r": 117.38329315185547, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.90625, "t": 243.54563903808594, "r": 150.63845825195312, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 243.54563903808594, "r": 194.0056610107422, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88845825195312, "t": 243.54563903808594, "r": 231.2060089111328, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1889953613281, "t": 243.66519165039062, "r": 279.62353515625, "b": 234.7088165283203, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 66.31500244140625, "t": 223.9976348876953, "r": 117.38329315185547, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 223.9976348876953, "r": 153.68701171875, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "STN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 223.9976348876953, "r": 194.0056610107422, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "96.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 223.9976348876953, "r": 238.26950073242188, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189697265625, "t": 223.9976348876953, "r": 279.6242370605469, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.7", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/4", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 7, "bbox": {"l": 308.4068603515625, "t": 544.1236572265625, "r": 533.6419677734375, "b": 488.1943359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/282"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 339.322998046875, "t": 538.3356323242188, "r": 365.3353576660156, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 401.04132080078125, "t": 538.3356323242188, "r": 430.9191589355469, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 454.1021423339844, "t": 538.3356323242188, "r": 474.5852355957031, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 486.54034423828125, "t": 538.3356323242188, "r": 527.2276000976562, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "mAP (PP)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 327.656005859375, "t": 521.378662109375, "r": 377.0007629394531, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD+BBox", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.6980895996094, "t": 521.378662109375, "r": 438.2807312011719, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.6355895996094, "t": 521.378662109375, "r": 473.07012939453125, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "79.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1659240722656, "t": 521.378662109375, "r": 515.6004638671875, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "82.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.7950134277344, "t": 509.4236755371094, "r": 377.8633117675781, "b": 500.5171203613281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.6938781738281, "t": 509.4236755371094, "r": 438.2765197753906, "b": 500.5171203613281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.6310119628906, "t": 509.5432434082031, "r": 473.0655517578125, "b": 500.58685302734375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "82.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1712951660156, "t": 509.5432434082031, "r": 515.6058349609375, "b": 500.58685302734375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "86.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.7950134277344, "t": 497.46868896484375, "r": 377.8633117675781, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 389.81842041015625, "t": 497.46868896484375, "r": 442.1519470214844, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "SynthTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63134765625, "t": 497.46868896484375, "r": 473.0658874511719, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "87.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 505.22515869140625, "t": 497.46868896484375, "r": 508.5426940917969, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 4, "num_cols": 4, "grid": [[{"bbox": {"l": 339.322998046875, "t": 538.3356323242188, "r": 365.3353576660156, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 401.04132080078125, "t": 538.3356323242188, "r": 430.9191589355469, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 454.1021423339844, "t": 538.3356323242188, "r": 474.5852355957031, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 486.54034423828125, "t": 538.3356323242188, "r": 527.2276000976562, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "mAP (PP)", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 327.656005859375, "t": 521.378662109375, "r": 377.0007629394531, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD+BBox", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.6980895996094, "t": 521.378662109375, "r": 438.2807312011719, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.6355895996094, "t": 521.378662109375, "r": 473.07012939453125, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "79.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1659240722656, "t": 521.378662109375, "r": 515.6004638671875, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "82.7", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 326.7950134277344, "t": 509.4236755371094, "r": 377.8633117675781, "b": 500.5171203613281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.6938781738281, "t": 509.4236755371094, "r": 438.2765197753906, "b": 500.5171203613281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.6310119628906, "t": 509.5432434082031, "r": 473.0655517578125, "b": 500.58685302734375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "82.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1712951660156, "t": 509.5432434082031, "r": 515.6058349609375, "b": 500.58685302734375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "86.8", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 326.7950134277344, "t": 497.46868896484375, "r": 377.8633117675781, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 389.81842041015625, "t": 497.46868896484375, "r": 442.1519470214844, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "SynthTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63134765625, "t": 497.46868896484375, "r": 473.0658874511719, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "87.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 505.22515869140625, "t": 497.46868896484375, "r": 508.5426940917969, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/5", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 7, "bbox": {"l": 332.9688720703125, "t": 251.7164306640625, "r": 520.942138671875, "b": 148.73028564453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/284"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 358.010986328125, "t": 239.76663208007812, "r": 384.0233459472656, "b": 230.86007690429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 408.5059814453125, "t": 233.7896270751953, "r": 436.739990234375, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 448.6950988769531, "t": 245.74462890625, "r": 485.0784912109375, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 499.3847961425781, "t": 233.7896270751953, "r": 512.1170043945312, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 357.6820068359375, "t": 216.8326416015625, "r": 384.3518981933594, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Tabula", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9009704589844, "t": 216.8326416015625, "r": 431.33551025390625, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "78.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.164794921875, "t": 216.8326416015625, "r": 475.5993347167969, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "57.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0289001464844, "t": 216.8326416015625, "r": 514.4634399414062, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "67.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 350.7229919433594, "t": 204.8776397705078, "r": 391.3106384277344, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Traprange", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90582275390625, "t": 204.8776397705078, "r": 431.3403625488281, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "60.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1696472167969, "t": 204.8776397705078, "r": 475.60418701171875, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "49.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03375244140625, "t": 204.8776397705078, "r": 514.4683227539062, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "55.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 354.135986328125, "t": 192.92164611816406, "r": 387.89923095703125, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Camelot", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.901611328125, "t": 192.92164611816406, "r": 431.3361511230469, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "80.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1654357910156, "t": 192.92164611816406, "r": 475.5999755859375, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "66.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.029541015625, "t": 192.92164611816406, "r": 514.464111328125, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "73.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 346.5589904785156, "t": 180.96664428710938, "r": 395.475341796875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Acrobat Pro", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9061584472656, "t": 180.96664428710938, "r": 431.3406982421875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "68.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998291015625, "t": 180.96664428710938, "r": 475.6045227050781, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "61.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0340881347656, "t": 180.96664428710938, "r": 514.4686279296875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "65.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 360.781005859375, "t": 169.0116424560547, "r": 381.254150390625, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9015808105469, "t": 169.0116424560547, "r": 431.33612060546875, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "91.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1654052734375, "t": 169.0116424560547, "r": 475.5999450683594, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "85.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0295104980469, "t": 169.0116424560547, "r": 514.4640502929688, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 345.4830017089844, "t": 157.056640625, "r": 396.5513000488281, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9061584472656, "t": 157.056640625, "r": 431.3406982421875, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "95.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998291015625, "t": 157.056640625, "r": 475.6045227050781, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "90.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03399658203125, "t": 157.1761932373047, "r": 514.4685668945312, "b": 148.21981811523438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "93.6", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 7, "num_cols": 4, "grid": [[{"bbox": {"l": 358.010986328125, "t": 239.76663208007812, "r": 384.0233459472656, "b": 230.86007690429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 408.5059814453125, "t": 233.7896270751953, "r": 436.739990234375, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 448.6950988769531, "t": 245.74462890625, "r": 485.0784912109375, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 499.3847961425781, "t": 233.7896270751953, "r": 512.1170043945312, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "All", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 357.6820068359375, "t": 216.8326416015625, "r": 384.3518981933594, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Tabula", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9009704589844, "t": 216.8326416015625, "r": 431.33551025390625, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "78.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.164794921875, "t": 216.8326416015625, "r": 475.5993347167969, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "57.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0289001464844, "t": 216.8326416015625, "r": 514.4634399414062, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "67.9", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 350.7229919433594, "t": 204.8776397705078, "r": 391.3106384277344, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Traprange", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90582275390625, "t": 204.8776397705078, "r": 431.3403625488281, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "60.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1696472167969, "t": 204.8776397705078, "r": 475.60418701171875, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "49.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03375244140625, "t": 204.8776397705078, "r": 514.4683227539062, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "55.4", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 354.135986328125, "t": 192.92164611816406, "r": 387.89923095703125, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Camelot", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.901611328125, "t": 192.92164611816406, "r": 431.3361511230469, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "80.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1654357910156, "t": 192.92164611816406, "r": 475.5999755859375, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "66.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.029541015625, "t": 192.92164611816406, "r": 514.464111328125, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "73.0", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 346.5589904785156, "t": 180.96664428710938, "r": 395.475341796875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Acrobat Pro", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9061584472656, "t": 180.96664428710938, "r": 431.3406982421875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "68.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998291015625, "t": 180.96664428710938, "r": 475.6045227050781, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "61.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0340881347656, "t": 180.96664428710938, "r": 514.4686279296875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "65.3", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 360.781005859375, "t": 169.0116424560547, "r": 381.254150390625, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9015808105469, "t": 169.0116424560547, "r": 431.33612060546875, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "91.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1654052734375, "t": 169.0116424560547, "r": 475.5999450683594, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "85.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0295104980469, "t": 169.0116424560547, "r": 514.4640502929688, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.3", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 345.4830017089844, "t": 157.056640625, "r": 396.5513000488281, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9061584472656, "t": 157.056640625, "r": 431.3406982421875, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "95.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998291015625, "t": 157.056640625, "r": 475.6045227050781, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "90.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03399658203125, "t": 157.1761932373047, "r": 514.4685668945312, "b": 148.21981811523438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "93.6", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/6", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 8, "bbox": {"l": 53.62853240966797, "t": 573.0513916015625, "r": 298.5574951171875, "b": 499.60003662109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.93284606933594, "t": 569.8192749023438, "r": 241.04458618164062, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.764892578125, "t": 569.8192749023438, "r": 284.5058898925781, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 110.24990844726562, "t": 562.3340454101562, "r": 120.62017822265625, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u51fa\u5178", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 175.3660888671875, "t": 562.3340454101562, "r": 201.29246520996094, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "\u30d5\u30a1\u30a4\u30eb \u6570", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.62408447265625, "t": 562.3340454101562, "r": 219.99435424804688, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 229.19813537597656, "t": 562.3340454101562, "r": 244.75376892089844, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 256.11419677734375, "t": 562.3340454101562, "r": 266.4844665527344, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 278.38433837890625, "t": 562.3340454101562, "r": 293.9399719238281, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 555.5741577148438, "r": 162.71310424804688, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Association for Computational Linguistics(ACL2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39730834960938, "t": 555.5741577148438, "r": 189.56455993652344, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026489257812, "t": 555.5741577148438, "r": 214.1575164794922, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.8751678466797, "t": 555.5741577148438, "r": 237.4583282470703, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 555.5741577148438, "r": 264.63580322265625, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134033203125, "t": 555.5741577148438, "r": 286.6445007324219, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 549.3795166015625, "r": 139.7225341796875, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Computational Linguistics(COLING2002)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 549.3795166015625, "r": 190.85670471191406, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.6983184814453, "t": 549.3795166015625, "r": 215.4496612548828, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.8751678466797, "t": 549.3795166015625, "r": 237.4583282470703, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 549.3795166015625, "r": 264.63580322265625, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134033203125, "t": 549.3795166015625, "r": 286.6445007324219, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 542.4105834960938, "r": 128.96026611328125, "b": 538.0201416015625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a 2003 \u5e74\u7dcf\u5408\u5927\u4f1a", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 543.1849365234375, "r": 190.85670471191406, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 543.1849365234375, "r": 212.86538696289062, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 543.1849365234375, "r": 240.04287719726562, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "142", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 543.1849365234375, "r": 264.63580322265625, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "223", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 543.1849365234375, "r": 289.228759765625, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 534.9253540039062, "r": 129.88177490234375, "b": 530.534912109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c 65 \u56de\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 535.69970703125, "r": 190.85670471191406, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "177", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 535.69970703125, "r": 212.86538696289062, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 535.69970703125, "r": 240.04287719726562, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "176", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 535.69970703125, "r": 264.63580322265625, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 535.69970703125, "r": 289.228759765625, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "236", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 527.6982421875, "r": 129.88177490234375, "b": 523.3078002929688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u7b2c 17 \u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 528.4725952148438, "r": 190.85670471191406, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "208", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 528.4725952148438, "r": 212.86538696289062, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 528.4725952148438, "r": 240.04287719726562, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "203", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 528.4725952148438, "r": 264.63580322265625, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "152", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 528.4725952148438, "r": 289.228759765625, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "244", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 520.47119140625, "r": 127.32453918457031, "b": 516.0807495117188, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c 146 \u301c 155 \u56de", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39730834960938, "t": 521.2455444335938, "r": 189.56455993652344, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "98", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 521.2455444335938, "r": 212.86538696289062, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348083496094, "t": 521.2455444335938, "r": 238.750732421875, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 521.2455444335938, "r": 264.63580322265625, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 521.2455444335938, "r": 289.228759765625, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "232", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 512.986083984375, "r": 110.16829681396484, "b": 508.59564208984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "WWW \u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 514.0184326171875, "r": 190.85670471191406, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "107", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026489257812, "t": 514.0184326171875, "r": 214.1575164794922, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348083496094, "t": 514.0184326171875, "r": 238.750732421875, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "34", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 514.0184326171875, "r": 264.63580322265625, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 282.7693786621094, "t": 514.0184326171875, "r": 287.9366149902344, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 506.5333251953125, "r": 190.85670471191406, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.6983184814453, "t": 506.5333251953125, "r": 215.4496612548828, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "294", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 506.5333251953125, "r": 240.04287719726562, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "651", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 255.7650604248047, "t": 506.5333251953125, "r": 265.7520446777344, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "1122", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 506.5333251953125, "r": 289.228759765625, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "955", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 10, "num_cols": 6, "grid": [[{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.93284606933594, "t": 569.8192749023438, "r": 241.04458618164062, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.93284606933594, "t": 569.8192749023438, "r": 241.04458618164062, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.764892578125, "t": 569.8192749023438, "r": 284.5058898925781, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.764892578125, "t": 569.8192749023438, "r": 284.5058898925781, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 110.24990844726562, "t": 562.3340454101562, "r": 120.62017822265625, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u51fa\u5178", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 175.3660888671875, "t": 562.3340454101562, "r": 201.29246520996094, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "\u30d5\u30a1\u30a4\u30eb \u6570", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.62408447265625, "t": 562.3340454101562, "r": 219.99435424804688, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 229.19813537597656, "t": 562.3340454101562, "r": 244.75376892089844, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 256.11419677734375, "t": 562.3340454101562, "r": 266.4844665527344, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 278.38433837890625, "t": 562.3340454101562, "r": 293.9399719238281, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 555.5741577148438, "r": 162.71310424804688, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Association for Computational Linguistics(ACL2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39730834960938, "t": 555.5741577148438, "r": 189.56455993652344, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026489257812, "t": 555.5741577148438, "r": 214.1575164794922, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.8751678466797, "t": 555.5741577148438, "r": 237.4583282470703, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 555.5741577148438, "r": 264.63580322265625, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134033203125, "t": 555.5741577148438, "r": 286.6445007324219, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 549.3795166015625, "r": 139.7225341796875, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Computational Linguistics(COLING2002)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 549.3795166015625, "r": 190.85670471191406, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.6983184814453, "t": 549.3795166015625, "r": 215.4496612548828, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.8751678466797, "t": 549.3795166015625, "r": 237.4583282470703, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 549.3795166015625, "r": 264.63580322265625, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134033203125, "t": 549.3795166015625, "r": 286.6445007324219, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 542.4105834960938, "r": 128.96026611328125, "b": 538.0201416015625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a 2003 \u5e74\u7dcf\u5408\u5927\u4f1a", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 543.1849365234375, "r": 190.85670471191406, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 543.1849365234375, "r": 212.86538696289062, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 543.1849365234375, "r": 240.04287719726562, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "142", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 543.1849365234375, "r": 264.63580322265625, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "223", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 543.1849365234375, "r": 289.228759765625, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "147", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 534.9253540039062, "r": 129.88177490234375, "b": 530.534912109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c 65 \u56de\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 535.69970703125, "r": 190.85670471191406, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "177", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 535.69970703125, "r": 212.86538696289062, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 535.69970703125, "r": 240.04287719726562, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "176", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 535.69970703125, "r": 264.63580322265625, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 535.69970703125, "r": 289.228759765625, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "236", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 527.6982421875, "r": 129.88177490234375, "b": 523.3078002929688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u7b2c 17 \u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 528.4725952148438, "r": 190.85670471191406, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "208", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 528.4725952148438, "r": 212.86538696289062, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 528.4725952148438, "r": 240.04287719726562, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "203", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 528.4725952148438, "r": 264.63580322265625, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "152", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 528.4725952148438, "r": 289.228759765625, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "244", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 520.47119140625, "r": 127.32453918457031, "b": 516.0807495117188, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c 146 \u301c 155 \u56de", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39730834960938, "t": 521.2455444335938, "r": 189.56455993652344, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "98", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 521.2455444335938, "r": 212.86538696289062, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348083496094, "t": 521.2455444335938, "r": 238.750732421875, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 521.2455444335938, "r": 264.63580322265625, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 521.2455444335938, "r": 289.228759765625, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "232", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 512.986083984375, "r": 110.16829681396484, "b": 508.59564208984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "WWW \u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 514.0184326171875, "r": 190.85670471191406, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "107", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026489257812, "t": 514.0184326171875, "r": 214.1575164794922, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348083496094, "t": 514.0184326171875, "r": 238.750732421875, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "34", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 514.0184326171875, "r": 264.63580322265625, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 282.7693786621094, "t": 514.0184326171875, "r": 287.9366149902344, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "96", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 506.5333251953125, "r": 190.85670471191406, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.6983184814453, "t": 506.5333251953125, "r": 215.4496612548828, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "294", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 506.5333251953125, "r": 240.04287719726562, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "651", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 255.7650604248047, "t": 506.5333251953125, "r": 265.7520446777344, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "1122", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 506.5333251953125, "r": 289.228759765625, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "955", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/7", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 8, "bbox": {"l": 304.9219970703125, "t": 573.485107421875, "r": 550.2321166992188, "b": 504.09930419921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/290"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 392.0967102050781, "t": 570.425537109375, "r": 438.0144958496094, "b": 565.3603515625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 459.0486145019531, "t": 570.3758544921875, "r": 542.0001831054688, "b": 559.1006469726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.24420166015625, "t": 555.2528686523438, "r": 407.3463134765625, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "RS U s", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.1832275390625, "t": 555.2528686523438, "r": 440.98779296875, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 468.3825378417969, "t": 555.2528686523438, "r": 482.4846496582031, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "RSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 516.92578125, "t": 555.2528686523438, "r": 530.7303466796875, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 547.38916015625, "r": 364.65606689453125, "b": 542.323974609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on Janua ry 1", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 547.0867309570312, "r": 403.75531005859375, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1. 1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.8183898925781, "t": 547.0867309570312, "r": 437.32708740234375, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.5285949707031, "t": 547.0867309570312, "r": 483.5500183105469, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "90.10 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.4482421875, "t": 547.0867309570312, "r": 531.4696655273438, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 91.19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 538.3154907226562, "r": 325.6267395019531, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Granted", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 538.3154907226562, "r": 403.75531005859375, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "0. 5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.8183898925781, "t": 538.3154907226562, "r": 437.32708740234375, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 466.435791015625, "t": 538.3154907226562, "r": 482.5483093261719, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "117.44", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 514.2906494140625, "t": 538.3154907226562, "r": 530.809814453125, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "122.41", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 530.4517822265625, "r": 322.628662109375, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Vested", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.4322204589844, "t": 530.4517822265625, "r": 405.5362548828125, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 5 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 427.70159912109375, "t": 530.4517822265625, "r": 438.8056335449219, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "(0.1)", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 468.5553283691406, "t": 530.4517822265625, "r": 482.0704345703125, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "87.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.0186157226562, "t": 530.4517822265625, "r": 529.5337524414062, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "81.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 522.3585205078125, "r": 356.2477111816406, "b": 517.2933349609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Canceled or forfeited", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.4322204589844, "t": 521.6805419921875, "r": 405.5362548828125, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 1 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 431.02801513671875, "t": 521.6805419921875, "r": 436.4280090332031, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.83099365234375, "t": 521.6805419921875, "r": 482.3501281738281, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "102.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.0186157226562, "t": 521.6805419921875, "r": 529.5337524414062, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "92.18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 513.5142822265625, "r": 373.3576354980469, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on December 31", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 513.5142822265625, "r": 403.75531005859375, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.5159912109375, "t": 513.5142822265625, "r": 437.0246887207031, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 463.7142028808594, "t": 513.5142822265625, "r": 484.7396545410156, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "104.85 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.99462890625, "t": 513.5142822265625, "r": 534.0200805664062, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 104.51", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 7, "num_cols": 5, "grid": [[{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 392.0967102050781, "t": 570.425537109375, "r": 438.0144958496094, "b": 565.3603515625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 392.0967102050781, "t": 570.425537109375, "r": 438.0144958496094, "b": 565.3603515625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 459.0486145019531, "t": 570.3758544921875, "r": 542.0001831054688, "b": 559.1006469726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 459.0486145019531, "t": 570.3758544921875, "r": 542.0001831054688, "b": 559.1006469726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.24420166015625, "t": 555.2528686523438, "r": 407.3463134765625, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "RS U s", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.1832275390625, "t": 555.2528686523438, "r": 440.98779296875, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 468.3825378417969, "t": 555.2528686523438, "r": 482.4846496582031, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "RSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 516.92578125, "t": 555.2528686523438, "r": 530.7303466796875, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 547.38916015625, "r": 364.65606689453125, "b": 542.323974609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on Janua ry 1", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 547.0867309570312, "r": 403.75531005859375, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1. 1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.8183898925781, "t": 547.0867309570312, "r": 437.32708740234375, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.5285949707031, "t": 547.0867309570312, "r": 483.5500183105469, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "90.10 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.4482421875, "t": 547.0867309570312, "r": 531.4696655273438, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 91.19", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 538.3154907226562, "r": 325.6267395019531, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Granted", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 538.3154907226562, "r": 403.75531005859375, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "0. 5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.8183898925781, "t": 538.3154907226562, "r": 437.32708740234375, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 466.435791015625, "t": 538.3154907226562, "r": 482.5483093261719, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "117.44", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 514.2906494140625, "t": 538.3154907226562, "r": 530.809814453125, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "122.41", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 530.4517822265625, "r": 322.628662109375, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Vested", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.4322204589844, "t": 530.4517822265625, "r": 405.5362548828125, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 5 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 427.70159912109375, "t": 530.4517822265625, "r": 438.8056335449219, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "(0.1)", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 468.5553283691406, "t": 530.4517822265625, "r": 482.0704345703125, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "87.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.0186157226562, "t": 530.4517822265625, "r": 529.5337524414062, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "81.14", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 522.3585205078125, "r": 356.2477111816406, "b": 517.2933349609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Canceled or forfeited", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.4322204589844, "t": 521.6805419921875, "r": 405.5362548828125, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 1 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 431.02801513671875, "t": 521.6805419921875, "r": 436.4280090332031, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.83099365234375, "t": 521.6805419921875, "r": 482.3501281738281, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "102.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.0186157226562, "t": 521.6805419921875, "r": 529.5337524414062, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "92.18", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 513.5142822265625, "r": 373.3576354980469, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on December 31", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 513.5142822265625, "r": 403.75531005859375, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.5159912109375, "t": 513.5142822265625, "r": 437.0246887207031, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 463.7142028808594, "t": 513.5142822265625, "r": 484.7396545410156, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "104.85 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.99462890625, "t": 513.5142822265625, "r": 534.0200805664062, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 104.51", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/8", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 84.0283203125, "t": 635.6664428710938, "r": 239.1690673828125, "b": 577.606689453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/9", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 82.92001342773438, "t": 558.2236938476562, "r": 239.1903533935547, "b": 500.716064453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/10", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 83.94786071777344, "t": 482.9522705078125, "r": 239.17135620117188, "b": 424.0904235839844, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/11", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 83.31756591796875, "t": 395.9864501953125, "r": 248.873046875, "b": 304.7430114746094, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/503"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/12", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 310.3294372558594, "t": 690.8223266601562, "r": 555.8338623046875, "b": 655.8524780273438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/13", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 309.9566345214844, "t": 637.385498046875, "r": 555.7466430664062, "b": 607.2774658203125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/14", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 309.9635314941406, "t": 596.2945556640625, "r": 555.7054443359375, "b": 558.4485473632812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/15", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 309.79150390625, "t": 538.0946044921875, "r": 425.9603271484375, "b": 499.60601806640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/505"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/16", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 335.2694091796875, "t": 403.53253173828125, "r": 490.081787109375, "b": 354.97760009765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/17", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 334.9334716796875, "t": 338.0523681640625, "r": 490.0914306640625, "b": 289.2789001464844, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/18", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 335.2545471191406, "t": 272.92431640625, "r": 490.22369384765625, "b": 224.31207275390625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/19", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 333.9573669433594, "t": 198.8865966796875, "r": 518.4768676757812, "b": 126.5096435546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/506"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/20", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 51.72642135620117, "t": 518.3907470703125, "r": 283.114013671875, "b": 447.7554931640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/21", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 51.434879302978516, "t": 338.51251220703125, "r": 310.7267150878906, "b": 300.17974853515625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/22", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 50.86823654174805, "t": 287.90374755859375, "r": 310.6080017089844, "b": 249.55401611328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/23", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 51.27280807495117, "t": 238.271484375, "r": 311.0897216796875, "b": 200.086669921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/24", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 318.9809265136719, "t": 630.765380859375, "r": 534.6229248046875, "b": 577.3739624023438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/25", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.0057678222656, "t": 565.8936767578125, "r": 534.408935546875, "b": 512.142333984375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/26", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 328.1381530761719, "t": 503.3182067871094, "r": 523.8916015625, "b": 433.7275695800781, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/27", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.4707946777344, "t": 361.09698486328125, "r": 518.5693359375, "b": 314.05645751953125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/28", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.982666015625, "t": 302.7562561035156, "r": 519.0963745117188, "b": 256.30419921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/29", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.8287658691406, "t": 245.5906982421875, "r": 519.6065673828125, "b": 198.8935546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/30", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.06494140625, "t": 182.1591796875, "r": 533.77392578125, "b": 122.80792236328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/511"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/31", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 55.116363525390625, "t": 655.7449951171875, "r": 279.370849609375, "b": 542.6654663085938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/32", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 54.28135299682617, "t": 531.7384033203125, "r": 279.2568359375, "b": 418.4729309082031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/33", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 50.64818572998047, "t": 286.01953125, "r": 319.9103088378906, "b": 160.736328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/512"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/34", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 323.0059509277344, "t": 670.452880859375, "r": 525.95166015625, "b": 569.088623046875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/35", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 323.384765625, "t": 550.0270385742188, "r": 526.1268920898438, "b": 447.90789794921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/36", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 323.46868896484375, "t": 429.5491638183594, "r": 525.9569091796875, "b": 327.739501953125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/37", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 353.6920471191406, "t": 304.594970703125, "r": 495.4288024902344, "b": 156.22674560546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/514"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}], "key_value_items": [], "pages": {"1": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 1}, "2": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 2}, "3": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 3}, "4": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 4}, "5": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 5}, "6": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 6}, "7": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 7}, "8": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 8}, "9": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 9}, "10": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 10}, "11": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 11}, "12": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 12}, "13": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 13}, "14": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 14}, "15": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 15}, "16": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 16}}} \ No newline at end of file +{"schema_name": "DoclingDocument", "version": "1.0.0", "name": "2203.01017v2", "origin": {"mimetype": "application/pdf", "binary_hash": 10763566541725197878, "filename": "2203.01017v2.pdf", "uri": null}, "furniture": {"self_ref": "#/furniture", "parent": null, "children": [], "name": "_root_", "label": "unspecified"}, "body": {"self_ref": "#/body", "parent": null, "children": [{"cref": "#/texts/0"}, {"cref": "#/texts/1"}, {"cref": "#/texts/2"}, {"cref": "#/groups/0"}, {"cref": "#/texts/4"}, {"cref": "#/texts/5"}, {"cref": "#/texts/6"}, {"cref": "#/texts/7"}, {"cref": "#/pictures/0"}, {"cref": "#/texts/11"}, {"cref": "#/tables/0"}, {"cref": "#/groups/1"}, {"cref": "#/pictures/1"}, {"cref": "#/groups/2"}, {"cref": "#/pictures/2"}, {"cref": "#/texts/63"}, {"cref": "#/tables/1"}, {"cref": "#/texts/64"}, {"cref": "#/texts/65"}, {"cref": "#/texts/66"}, {"cref": "#/texts/67"}, {"cref": "#/texts/68"}, {"cref": "#/texts/69"}, {"cref": "#/texts/70"}, {"cref": "#/groups/3"}, {"cref": "#/texts/75"}, {"cref": "#/texts/76"}, {"cref": "#/texts/77"}, {"cref": "#/texts/78"}, {"cref": "#/texts/79"}, {"cref": "#/texts/80"}, {"cref": "#/texts/81"}, {"cref": "#/texts/82"}, {"cref": "#/texts/83"}, {"cref": "#/texts/84"}, {"cref": "#/texts/85"}, {"cref": "#/texts/86"}, {"cref": "#/texts/87"}, {"cref": "#/texts/88"}, {"cref": "#/texts/89"}, {"cref": "#/texts/90"}, {"cref": "#/pictures/3"}, {"cref": "#/texts/124"}, {"cref": "#/texts/125"}, {"cref": "#/texts/126"}, {"cref": "#/texts/127"}, {"cref": "#/texts/128"}, {"cref": "#/texts/129"}, {"cref": "#/texts/130"}, {"cref": "#/texts/131"}, {"cref": "#/texts/132"}, {"cref": "#/texts/133"}, {"cref": "#/tables/2"}, {"cref": "#/texts/134"}, {"cref": "#/texts/135"}, {"cref": "#/texts/136"}, {"cref": "#/texts/137"}, {"cref": "#/texts/138"}, {"cref": "#/texts/139"}, {"cref": "#/texts/140"}, {"cref": "#/texts/141"}, {"cref": "#/pictures/4"}, {"cref": "#/texts/201"}, {"cref": "#/pictures/5"}, {"cref": "#/texts/246"}, {"cref": "#/texts/247"}, {"cref": "#/texts/248"}, {"cref": "#/texts/249"}, {"cref": "#/texts/250"}, {"cref": "#/texts/251"}, {"cref": "#/texts/252"}, {"cref": "#/texts/253"}, {"cref": "#/texts/254"}, {"cref": "#/texts/255"}, {"cref": "#/texts/256"}, {"cref": "#/texts/257"}, {"cref": "#/texts/258"}, {"cref": "#/texts/259"}, {"cref": "#/texts/260"}, {"cref": "#/texts/261"}, {"cref": "#/texts/262"}, {"cref": "#/texts/263"}, {"cref": "#/texts/264"}, {"cref": "#/texts/265"}, {"cref": "#/texts/266"}, {"cref": "#/texts/267"}, {"cref": "#/texts/268"}, {"cref": "#/texts/269"}, {"cref": "#/texts/270"}, {"cref": "#/texts/271"}, {"cref": "#/texts/272"}, {"cref": "#/texts/273"}, {"cref": "#/texts/274"}, {"cref": "#/texts/275"}, {"cref": "#/texts/276"}, {"cref": "#/texts/277"}, {"cref": "#/tables/3"}, {"cref": "#/texts/278"}, {"cref": "#/texts/279"}, {"cref": "#/texts/280"}, {"cref": "#/texts/281"}, {"cref": "#/texts/282"}, {"cref": "#/tables/4"}, {"cref": "#/texts/283"}, {"cref": "#/texts/284"}, {"cref": "#/tables/5"}, {"cref": "#/groups/4"}, {"cref": "#/texts/287"}, {"cref": "#/texts/288"}, {"cref": "#/pictures/6"}, {"cref": "#/texts/289"}, {"cref": "#/pictures/7"}, {"cref": "#/tables/6"}, {"cref": "#/texts/290"}, {"cref": "#/tables/7"}, {"cref": "#/texts/291"}, {"cref": "#/pictures/8"}, {"cref": "#/pictures/9"}, {"cref": "#/texts/348"}, {"cref": "#/pictures/10"}, {"cref": "#/texts/350"}, {"cref": "#/texts/351"}, {"cref": "#/texts/352"}, {"cref": "#/texts/353"}, {"cref": "#/texts/354"}, {"cref": "#/groups/5"}, {"cref": "#/texts/356"}, {"cref": "#/groups/6"}, {"cref": "#/texts/372"}, {"cref": "#/groups/7"}, {"cref": "#/texts/383"}, {"cref": "#/groups/8"}, {"cref": "#/texts/396"}, {"cref": "#/groups/9"}, {"cref": "#/texts/399"}, {"cref": "#/texts/400"}, {"cref": "#/texts/401"}, {"cref": "#/texts/402"}, {"cref": "#/texts/403"}, {"cref": "#/texts/404"}, {"cref": "#/texts/405"}, {"cref": "#/texts/406"}, {"cref": "#/texts/407"}, {"cref": "#/texts/408"}, {"cref": "#/groups/10"}, {"cref": "#/texts/414"}, {"cref": "#/texts/415"}, {"cref": "#/texts/416"}, {"cref": "#/texts/417"}, {"cref": "#/pictures/11"}, {"cref": "#/groups/11"}, {"cref": "#/texts/479"}, {"cref": "#/texts/480"}, {"cref": "#/groups/12"}, {"cref": "#/texts/486"}, {"cref": "#/texts/487"}, {"cref": "#/groups/13"}, {"cref": "#/texts/489"}, {"cref": "#/groups/14"}, {"cref": "#/texts/494"}, {"cref": "#/groups/15"}, {"cref": "#/texts/499"}, {"cref": "#/texts/500"}, {"cref": "#/texts/501"}, {"cref": "#/texts/502"}, {"cref": "#/tables/8"}, {"cref": "#/tables/9"}, {"cref": "#/tables/10"}, {"cref": "#/texts/503"}, {"cref": "#/tables/11"}, {"cref": "#/texts/504"}, {"cref": "#/tables/12"}, {"cref": "#/tables/13"}, {"cref": "#/tables/14"}, {"cref": "#/pictures/12"}, {"cref": "#/texts/505"}, {"cref": "#/tables/15"}, {"cref": "#/tables/16"}, {"cref": "#/tables/17"}, {"cref": "#/tables/18"}, {"cref": "#/pictures/13"}, {"cref": "#/texts/506"}, {"cref": "#/tables/19"}, {"cref": "#/tables/20"}, {"cref": "#/texts/507"}, {"cref": "#/pictures/14"}, {"cref": "#/tables/21"}, {"cref": "#/tables/22"}, {"cref": "#/tables/23"}, {"cref": "#/texts/508"}, {"cref": "#/pictures/15"}, {"cref": "#/texts/509"}, {"cref": "#/tables/24"}, {"cref": "#/tables/25"}, {"cref": "#/tables/26"}, {"cref": "#/texts/510"}, {"cref": "#/pictures/16"}, {"cref": "#/tables/27"}, {"cref": "#/tables/28"}, {"cref": "#/tables/29"}, {"cref": "#/texts/511"}, {"cref": "#/tables/30"}, {"cref": "#/pictures/17"}, {"cref": "#/tables/31"}, {"cref": "#/pictures/18"}, {"cref": "#/tables/32"}, {"cref": "#/pictures/19"}, {"cref": "#/pictures/20"}, {"cref": "#/texts/512"}, {"cref": "#/tables/33"}, {"cref": "#/texts/513"}, {"cref": "#/tables/34"}, {"cref": "#/tables/35"}, {"cref": "#/pictures/21"}, {"cref": "#/tables/36"}, {"cref": "#/pictures/22"}, {"cref": "#/texts/514"}, {"cref": "#/tables/37"}, {"cref": "#/texts/515"}, {"cref": "#/pictures/23"}, {"cref": "#/texts/516"}], "name": "_root_", "label": "unspecified"}, "groups": [{"self_ref": "#/groups/0", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/3"}], "name": "group", "label": "key_value_area"}, {"self_ref": "#/groups/1", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/12"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/2", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/38"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/3", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/71"}, {"cref": "#/texts/72"}, {"cref": "#/texts/73"}, {"cref": "#/texts/74"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/4", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/285"}, {"cref": "#/texts/286"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/5", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/355"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/6", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/357"}, {"cref": "#/texts/358"}, {"cref": "#/texts/359"}, {"cref": "#/texts/360"}, {"cref": "#/texts/361"}, {"cref": "#/texts/362"}, {"cref": "#/texts/363"}, {"cref": "#/texts/364"}, {"cref": "#/texts/365"}, {"cref": "#/texts/366"}, {"cref": "#/texts/367"}, {"cref": "#/texts/368"}, {"cref": "#/texts/369"}, {"cref": "#/texts/370"}, {"cref": "#/texts/371"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/7", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/373"}, {"cref": "#/texts/374"}, {"cref": "#/texts/375"}, {"cref": "#/texts/376"}, {"cref": "#/texts/377"}, {"cref": "#/texts/378"}, {"cref": "#/texts/379"}, {"cref": "#/texts/380"}, {"cref": "#/texts/381"}, {"cref": "#/texts/382"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/8", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/384"}, {"cref": "#/texts/385"}, {"cref": "#/texts/386"}, {"cref": "#/texts/387"}, {"cref": "#/texts/388"}, {"cref": "#/texts/389"}, {"cref": "#/texts/390"}, {"cref": "#/texts/391"}, {"cref": "#/texts/392"}, {"cref": "#/texts/393"}, {"cref": "#/texts/394"}, {"cref": "#/texts/395"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/9", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/397"}, {"cref": "#/texts/398"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/10", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/409"}, {"cref": "#/texts/410"}, {"cref": "#/texts/411"}, {"cref": "#/texts/412"}, {"cref": "#/texts/413"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/11", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/477"}, {"cref": "#/texts/478"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/12", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/481"}, {"cref": "#/texts/482"}, {"cref": "#/texts/483"}, {"cref": "#/texts/484"}, {"cref": "#/texts/485"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/13", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/488"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/14", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/490"}, {"cref": "#/texts/491"}, {"cref": "#/texts/492"}, {"cref": "#/texts/493"}], "name": "list", "label": "list"}, {"self_ref": "#/groups/15", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/495"}, {"cref": "#/texts/496"}, {"cref": "#/texts/497"}, {"cref": "#/texts/498"}], "name": "list", "label": "list"}], "texts": [{"self_ref": "#/texts/0", "parent": {"cref": "#/body"}, "children": [], "label": "page_header", "prov": [{"page_no": 1, "bbox": {"l": 18.340221405029297, "t": 584.1799926757812, "r": 36.339778900146484, "b": 231.99996948242188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 38]}], "orig": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022", "text": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022"}, {"self_ref": "#/texts/1", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 96.3010025024414, "t": 684.9658813476562, "r": 498.9270935058594, "b": 672.0686645507812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 61]}], "orig": "TableFormer: Table Structure Understanding with Transformers.", "text": "TableFormer: Table Structure Understanding with Transformers.", "level": 1}, {"self_ref": "#/texts/2", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 142.4770050048828, "t": 645.3146362304688, "r": 452.7502746582031, "b": 620.6796264648438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 73]}], "orig": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar IBM Research", "text": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar IBM Research", "level": 1}, {"self_ref": "#/texts/3", "parent": {"cref": "#/groups/0"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 208.123, "t": 616.03876, "r": 378.73257, "b": 607.57446, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 35]}], "orig": "{ ahn,nli,mly,taa } @zurich.ibm.com", "text": "{ ahn,nli,mly,taa } @zurich.ibm.com"}, {"self_ref": "#/texts/4", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 145.99497985839844, "t": 576.5170288085938, "r": 190.48028564453125, "b": 565.769287109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 8]}], "orig": "Abstract", "text": "Abstract", "level": 1}, {"self_ref": "#/texts/5", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 315.5670166015625, "t": 573.9931640625, "r": 408.4407043457031, "b": 565.2451782226562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 22]}], "orig": "a. Picture of a table:", "text": "a. Picture of a table:", "level": 1}, {"self_ref": "#/texts/6", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 50.111976623535156, "t": 252.05723571777344, "r": 126.94803619384766, "b": 241.30950927734375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "1. Introduction", "text": "1. Introduction", "level": 1}, {"self_ref": "#/texts/7", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 50.111976623535156, "t": 231.216796875, "r": 286.3650817871094, "b": 78.84822082519531, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 712]}], "orig": "The occurrence of tables in documents is ubiquitous. They often summarise quantitative or factual data, which is cumbersome to describe in verbose text but nevertheless extremely valuable. Unfortunately, this compact representation is often not easy to parse by machines. There are many implicit conventions used to obtain a compact table representation. For example, tables often have complex columnand row-headers in order to reduce duplicated cell content. Lines of different shapes and sizes are leveraged to separate content or indicate a tree structure. Additionally, tables can also have empty/missing table-entries or multi-row textual table-entries. Fig. 1 shows a table which presents all these issues.", "text": "The occurrence of tables in documents is ubiquitous. They often summarise quantitative or factual data, which is cumbersome to describe in verbose text but nevertheless extremely valuable. Unfortunately, this compact representation is often not easy to parse by machines. There are many implicit conventions used to obtain a compact table representation. For example, tables often have complex columnand row-headers in order to reduce duplicated cell content. Lines of different shapes and sizes are leveraged to separate content or indicate a tree structure. Additionally, tables can also have empty/missing table-entries or multi-row textual table-entries. Fig. 1 shows a table which presents all these issues."}, {"self_ref": "#/texts/8", "parent": {"cref": "#/pictures/0"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 451.9457100000001, "t": 556.65295, "r": 457.95050000000003, "b": 546.52252, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/9", "parent": {"cref": "#/pictures/0"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 331.19681, "t": 522.64734, "r": 337.2016, "b": 512.51691, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/10", "parent": {"cref": "#/pictures/0"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 384.0329, "t": 539.32104, "r": 390.03769, "b": 529.19061, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/11", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 1, "bbox": {"l": 50.111976623535156, "t": 550.6049194335938, "r": 286.3651123046875, "b": 279.00335693359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1320]}], "orig": "Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines, Knowledge Graph's, etc, since they enhance their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multiline rows, different variety of separation lines, missing entries, etc. As such, the correct identification of the table-structure from an image is a nontrivial task. In this paper, we present a new table-structure identification model. The latter improves the latest end-toend deep learning model (i.e. encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object detection decoder for table-cells. In this way, we can obtain the content of the table-cells from programmatic PDF's directly from the PDF source and avoid the training of the custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-english tables. Second, we replace the LSTM decoders with transformer based decoders. This upgrade improves significantly the previous state-of-the-art tree-editing-distance-score (TEDS) from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables.", "text": "Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines, Knowledge Graph's, etc, since they enhance their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multiline rows, different variety of separation lines, missing entries, etc. As such, the correct identification of the table-structure from an image is a nontrivial task. In this paper, we present a new table-structure identification model. The latter improves the latest end-toend deep learning model (i.e. encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object detection decoder for table-cells. In this way, we can obtain the content of the table-cells from programmatic PDF's directly from the PDF source and avoid the training of the custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-english tables. Second, we replace the LSTM decoders with transformer based decoders. This upgrade improves significantly the previous state-of-the-art tree-editing-distance-score (TEDS) from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables."}, {"self_ref": "#/texts/12", "parent": {"cref": "#/groups/1"}, "children": [], "label": "list_item", "prov": [{"page_no": 1, "bbox": {"l": 315.5670166015625, "t": 478.3052062988281, "r": 486.4019470214844, "b": 458.7572021484375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 68]}], "orig": "b. Red-annotation of bounding boxes, Blue-predictions by TableFormer", "text": "b. Red-annotation of bounding boxes, Blue-predictions by TableFormer", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/13", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 408.14752, "t": 449.17172, "r": 412.54001, "b": 440.38678, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/14", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 356.11011, "t": 450.42783, "r": 360.50259, "b": 441.64288, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/15", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 500.6777, "t": 451.06232, "r": 505.0701900000001, "b": 442.2773700000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/16", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 356.13382, "t": 440.25211, "r": 360.52631, "b": 431.46716, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/17", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 402.53992, "t": 436.1235, "r": 406.9324, "b": 427.33856, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/18", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 448.58178999999996, "t": 439.15982, "r": 452.97427, "b": 430.37488, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/19", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 491.65161000000006, "t": 438.29343, "r": 496.0441, "b": 429.50848, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/20", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 535.13843, "t": 438.66031, "r": 539.53088, "b": 429.87537, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/21", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 348.82822, "t": 404.90219, "r": 353.2207, "b": 396.11725, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/22", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 389.27151, "t": 416.62772, "r": 393.664, "b": 407.84277, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/23", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 442.67479999999995, "t": 416.35379, "r": 451.45889000000005, "b": 407.56885, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/24", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 477.4382299999999, "t": 416.466, "r": 485.90167, "b": 407.68105999999995, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "11", "text": "11"}, {"self_ref": "#/texts/25", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 522.57263, "t": 416.35379, "r": 531.35669, "b": 407.56885, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "12", "text": "12"}, {"self_ref": "#/texts/26", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 400.22992, "t": 404.88571, "r": 409.01401, "b": 396.10077, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "13", "text": "13"}, {"self_ref": "#/texts/27", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 442.30792, "t": 405.01018999999997, "r": 451.0920100000001, "b": 396.22524999999996, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "14", "text": "14"}, {"self_ref": "#/texts/28", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 478.21941999999996, "t": 404.62531, "r": 487.00351000000006, "b": 395.84036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "15", "text": "15"}, {"self_ref": "#/texts/29", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 523.2287, "t": 405.01018999999997, "r": 532.01276, "b": 396.22524999999996, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "16", "text": "16"}, {"self_ref": "#/texts/30", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 411.57233, "t": 392.57523, "r": 415.96481, "b": 383.79028, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/31", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 415.96393, "t": 392.57523, "r": 420.35641, "b": 383.79028, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/32", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 442.30521, "t": 392.9628000000001, "r": 451.08929, "b": 384.17786000000007, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "18", "text": "18"}, {"self_ref": "#/texts/33", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 478.77893, "t": 393.00360000000006, "r": 487.56302, "b": 384.21866000000006, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "19", "text": "19"}, {"self_ref": "#/texts/34", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 523.97241, "t": 393.3885200000001, "r": 532.75647, "b": 384.60358, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "20", "text": "20"}, {"self_ref": "#/texts/35", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 385.09399, "t": 434.23969000000005, "r": 391.09879, "b": 424.10928, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/36", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 333.43451, "t": 411.2735, "r": 339.4393, "b": 401.14310000000006, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/37", "parent": {"cref": "#/pictures/1"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 478.07210999999995, "t": 450.9631999999999, "r": 484.0769, "b": 440.83279000000005, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/38", "parent": {"cref": "#/groups/2"}, "children": [], "label": "list_item", "prov": [{"page_no": 1, "bbox": {"l": 315.5670166015625, "t": 371.81719970703125, "r": 491.1912536621094, "b": 363.0691833496094, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 38]}], "orig": "c. Structure predicted by TableFormer:", "text": "c. Structure predicted by TableFormer:", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/39", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 354.31412, "r": 351.6412, "b": 345.52917, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/40", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 318.88071, "t": 354.31412, "r": 323.27319, "b": 345.52917, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/41", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 354.31412, "r": 398.4967, "b": 345.52917, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/42", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 318.77316, "t": 342.4545, "r": 323.16565, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/43", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 342.4545, "r": 351.6412, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/44", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 342.4545, "r": 398.4967, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/45", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 440.95941000000005, "t": 342.4545, "r": 445.3519, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/46", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 487.81491, "t": 342.4545, "r": 492.2074, "b": 333.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/47", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 318.77316, "t": 318.29575, "r": 323.16565, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/48", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 330.1554, "r": 351.6412, "b": 321.37045, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/49", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 330.1554, "r": 402.88831, "b": 321.37045, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/50", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 440.95941000000005, "t": 330.1554, "r": 449.42285, "b": 321.37045, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "11", "text": "11"}, {"self_ref": "#/texts/51", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 487.81491, "t": 330.1554, "r": 496.599, "b": 321.37045, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "12", "text": "12"}, {"self_ref": "#/texts/52", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 318.29575, "r": 356.03281, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "13", "text": "13"}, {"self_ref": "#/texts/53", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 318.29575, "r": 402.88831, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "14", "text": "14"}, {"self_ref": "#/texts/54", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 440.95941000000005, "t": 318.29575, "r": 449.7435, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "15", "text": "15"}, {"self_ref": "#/texts/55", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 487.81491, "t": 318.29575, "r": 496.599, "b": 309.5108, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "16", "text": "16"}, {"self_ref": "#/texts/56", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 347.24872, "t": 306.87531, "r": 356.03281, "b": 298.09036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "17", "text": "17"}, {"self_ref": "#/texts/57", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 394.10422, "t": 306.87531, "r": 402.88831, "b": 298.09036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "18", "text": "18"}, {"self_ref": "#/texts/58", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 440.95941000000005, "t": 306.87531, "r": 449.7435, "b": 298.09036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "19", "text": "19"}, {"self_ref": "#/texts/59", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 487.81491, "t": 306.87531, "r": 496.599, "b": 298.09036, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "20", "text": "20"}, {"self_ref": "#/texts/60", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 366.70102, "t": 342.87918, "r": 372.70581, "b": 332.74878, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/61", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 331.90424, "t": 318.67709, "r": 337.90903, "b": 308.54669, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/62", "parent": {"cref": "#/pictures/2"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 459.87621999999993, "t": 354.4064, "r": 465.88101, "b": 344.276, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/63", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 1, "bbox": {"l": 308.86199951171875, "t": 277.4996337890625, "r": 545.1151733398438, "b": 232.7270965576172, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 220]}], "orig": "Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.", "text": "Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'."}, {"self_ref": "#/texts/64", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 308.86199951171875, "t": 207.59063720703125, "r": 545.1151733398438, "b": 126.95307159423828, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 363]}], "orig": "Recently, significant progress has been made with vision based approaches to extract tables in documents. For the sake of completeness, the issue of table extraction from documents is typically decomposed into two separate challenges, i.e. (1) finding the location of the table(s) on a document-page and (2) finding the structure of a given table in the document.", "text": "Recently, significant progress has been made with vision based approaches to extract tables in documents. For the sake of completeness, the issue of table extraction from documents is typically decomposed into two separate challenges, i.e. (1) finding the location of the table(s) on a document-page and (2) finding the structure of a given table in the document."}, {"self_ref": "#/texts/65", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 308.86199951171875, "t": 123.61963653564453, "r": 545.1151123046875, "b": 78.84806823730469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 229]}], "orig": "The first problem is called table-location and has been previously addressed [30, 38, 19, 21, 23, 26, 8] with stateof-the-art object-detection networks (e.g. YOLO and later on Mask-RCNN [9]). For all practical purposes, it can be", "text": "The first problem is called table-location and has been previously addressed [30, 38, 19, 21, 23, 26, 8] with stateof-the-art object-detection networks (e.g. YOLO and later on Mask-RCNN [9]). For all practical purposes, it can be"}, {"self_ref": "#/texts/66", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 1, "bbox": {"l": 295.1210021972656, "t": 57.866634368896484, "r": 300.102294921875, "b": 48.9600715637207, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/67", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 286.36505126953125, "b": 695.9300537109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 75]}], "orig": "considered as a solved problem, given enough ground-truth data to train on.", "text": "considered as a solved problem, given enough ground-truth data to train on."}, {"self_ref": "#/texts/68", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11199951171875, "t": 692.4285888671875, "r": 286.3651428222656, "b": 563.9699096679688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 626]}], "orig": "The second problem is called table-structure decomposition. The latter is a long standing problem in the community of document understanding [6, 4, 14]. Contrary to the table-location problem, there are no commonly used approaches that can easily be re-purposed to solve this problem. Lately, a set of new model-architectures has been proposed by the community to address table-structure decomposition [37, 36, 18, 20]. All these models have some weaknesses (see Sec. 2). The common denominator here is the reliance on textual features and/or the inability to provide the bounding box of each table-cell in the original image.", "text": "The second problem is called table-structure decomposition. The latter is a long standing problem in the community of document understanding [6, 4, 14]. Contrary to the table-location problem, there are no commonly used approaches that can easily be re-purposed to solve this problem. Lately, a set of new model-architectures has been proposed by the community to address table-structure decomposition [37, 36, 18, 20]. All these models have some weaknesses (see Sec. 2). The common denominator here is the reliance on textual features and/or the inability to provide the bounding box of each table-cell in the original image."}, {"self_ref": "#/texts/69", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11199951171875, "t": 560.4684448242188, "r": 286.3651123046875, "b": 420.054931640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 643]}], "orig": "In this paper, we want to address these weaknesses and present a robust table-structure decomposition algorithm. The design criteria for our model are the following. First, we want our algorithm to be language agnostic. In this way, we can obtain the structure of any table, irregardless of the language. Second, we want our algorithm to leverage as much data as possible from the original PDF document. For programmatic PDF documents, the text-cells can often be extracted much faster and with higher accuracy compared to OCR methods. Last but not least, we want to have a direct link between the table-cell and its bounding box in the image.", "text": "In this paper, we want to address these weaknesses and present a robust table-structure decomposition algorithm. The design criteria for our model are the following. First, we want our algorithm to be language agnostic. In this way, we can obtain the structure of any table, irregardless of the language. Second, we want our algorithm to leverage as much data as possible from the original PDF document. For programmatic PDF documents, the text-cells can often be extracted much faster and with higher accuracy compared to OCR methods. Last but not least, we want to have a direct link between the table-cell and its bounding box in the image."}, {"self_ref": "#/texts/70", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11199951171875, "t": 416.5534973144531, "r": 286.3665771484375, "b": 359.8269958496094, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 242]}], "orig": "To meet the design criteria listed above, we developed a new model called TableFormer and a synthetically generated table structure dataset called SynthTabNet $^{1}$. In particular, our contributions in this work can be summarised as follows:", "text": "To meet the design criteria listed above, we developed a new model called TableFormer and a synthetically generated table structure dataset called SynthTabNet $^{1}$. In particular, our contributions in this work can be summarised as follows:"}, {"self_ref": "#/texts/71", "parent": {"cref": "#/groups/3"}, "children": [], "label": "list_item", "prov": [{"page_no": 2, "bbox": {"l": 61.56901550292969, "t": 347.568115234375, "r": 286.3648986816406, "b": 302.6770324707031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 166]}], "orig": "\u00b7 We propose TableFormer , a transformer based model that predicts tables structure and bounding boxes for the table content simultaneously in an end-to-end approach.", "text": "\u00b7 We propose TableFormer , a transformer based model that predicts tables structure and bounding boxes for the table content simultaneously in an end-to-end approach.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/72", "parent": {"cref": "#/groups/3"}, "children": [], "label": "list_item", "prov": [{"page_no": 2, "bbox": {"l": 61.56901550292969, "t": 289.9661560058594, "r": 286.3648986816406, "b": 245.0740509033203, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 181]}], "orig": "\u00b7 Across all benchmark datasets TableFormer significantly outperforms existing state-of-the-art metrics, while being much more efficient in training and inference to existing works.", "text": "\u00b7 Across all benchmark datasets TableFormer significantly outperforms existing state-of-the-art metrics, while being much more efficient in training and inference to existing works.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/73", "parent": {"cref": "#/groups/3"}, "children": [], "label": "list_item", "prov": [{"page_no": 2, "bbox": {"l": 61.569000244140625, "t": 232.3631591796875, "r": 286.36492919921875, "b": 199.4270477294922, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 106]}], "orig": "\u00b7 We present SynthTabNet a synthetically generated dataset, with various appearance styles and complexity.", "text": "\u00b7 We present SynthTabNet a synthetically generated dataset, with various appearance styles and complexity.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/74", "parent": {"cref": "#/groups/3"}, "children": [], "label": "list_item", "prov": [{"page_no": 2, "bbox": {"l": 61.569007873535156, "t": 186.5966033935547, "r": 286.3650817871094, "b": 153.779052734375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 131]}], "orig": "\u00b7 An augmented dataset based on PubTabNet [37], FinTabNet [36], and TableBank [17] with generated ground-truth for reproducibility.", "text": "\u00b7 An augmented dataset based on PubTabNet [37], FinTabNet [36], and TableBank [17] with generated ground-truth for reproducibility.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/75", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 50.11200714111328, "t": 141.401611328125, "r": 286.3651123046875, "b": 96.63004302978516, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 231]}], "orig": "The paper is structured as follows. In Sec. 2, we give a brief overview of the current state-of-the-art. In Sec. 3, we describe the datasets on which we train. In Sec. 4, we introduce the TableFormer model-architecture and describe", "text": "The paper is structured as follows. In Sec. 2, we give a brief overview of the current state-of-the-art. In Sec. 3, we describe the datasets on which we train. In Sec. 4, we introduce the TableFormer model-architecture and describe"}, {"self_ref": "#/texts/76", "parent": {"cref": "#/body"}, "children": [], "label": "footnote", "prov": [{"page_no": 2, "bbox": {"l": 60.97100067138672, "t": 86.40372467041016, "r": 183.7305450439453, "b": 79.27845764160156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 40]}], "orig": "$^{1}$https://github.com/IBM/SynthTabNet", "text": "$^{1}$https://github.com/IBM/SynthTabNet"}, {"self_ref": "#/texts/77", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 2, "bbox": {"l": 295.1210021972656, "t": 57.86671829223633, "r": 300.102294921875, "b": 48.96015548706055, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/78", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 308.86199951171875, "t": 716.7916259765625, "r": 545.1151123046875, "b": 683.9750366210938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 166]}], "orig": "its results & performance in Sec. 5. As a conclusion, we describe how this new model-architecture can be re-purposed for other tasks in the computer-vision community.", "text": "its results & performance in Sec. 5. As a conclusion, we describe how this new model-architecture can be re-purposed for other tasks in the computer-vision community."}, {"self_ref": "#/texts/79", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 2, "bbox": {"l": 308.86199951171875, "t": 670.26806640625, "r": 498.28021240234375, "b": 659.5203247070312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 37]}], "orig": "2. Previous work and State of the Art", "text": "2. Previous work and State of the Art", "level": 1}, {"self_ref": "#/texts/80", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 308.86199951171875, "t": 649.7786254882812, "r": 545.1151733398438, "b": 461.54498291015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 901]}], "orig": "Identifying the structure of a table has been an outstanding problem in the document-parsing community, that motivates many organised public challenges [6, 4, 14]. The difficulty of the problem can be attributed to a number of factors. First, there is a large variety in the shapes and sizes of tables. Such large variety requires a flexible method. This is especially true for complex column- and row headers, which can be extremely intricate and demanding. A second factor of complexity is the lack of data with regard to table-structure. Until the publication of PubTabNet [37], there were no large datasets (i.e. > 100 K tables) that provided structure information. This happens primarily due to the fact that tables are notoriously time-consuming to annotate by hand. However, this has definitely changed in recent years with the deliverance of PubTabNet [37], FinTabNet [36], TableBank [17] etc.", "text": "Identifying the structure of a table has been an outstanding problem in the document-parsing community, that motivates many organised public challenges [6, 4, 14]. The difficulty of the problem can be attributed to a number of factors. First, there is a large variety in the shapes and sizes of tables. Such large variety requires a flexible method. This is especially true for complex column- and row headers, which can be extremely intricate and demanding. A second factor of complexity is the lack of data with regard to table-structure. Until the publication of PubTabNet [37], there were no large datasets (i.e. > 100 K tables) that provided structure information. This happens primarily due to the fact that tables are notoriously time-consuming to annotate by hand. However, this has definitely changed in recent years with the deliverance of PubTabNet [37], FinTabNet [36], TableBank [17] etc."}, {"self_ref": "#/texts/81", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 308.86199951171875, "t": 458.4305419921875, "r": 545.115234375, "b": 341.9270935058594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 552]}], "orig": "Before the rising popularity of deep neural networks, the community relied heavily on heuristic and/or statistical methods to do table structure identification [3, 7, 11, 5, 13, 28]. Although such methods work well on constrained tables [12], a more data-driven approach can be applied due to the advent of convolutional neural networks (CNNs) and the availability of large datasets. To the best-of-our knowledge, there are currently two different types of network architecture that are being pursued for state-of-the-art tablestructure identification.", "text": "Before the rising popularity of deep neural networks, the community relied heavily on heuristic and/or statistical methods to do table structure identification [3, 7, 11, 5, 13, 28]. Although such methods work well on constrained tables [12], a more data-driven approach can be applied due to the advent of convolutional neural networks (CNNs) and the availability of large datasets. To the best-of-our knowledge, there are currently two different types of network architecture that are being pursued for state-of-the-art tablestructure identification."}, {"self_ref": "#/texts/82", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 308.8619689941406, "t": 338.9322204589844, "r": 545.1168823242188, "b": 78.84815216064453, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1262]}], "orig": "Image-to-Text networks : In this type of network, one predicts a sequence of tokens starting from an encoded image. Such sequences of tokens can be HTML table tags [37, 17] or LaTeX symbols[10]. The choice of symbols is ultimately not very important, since one can be transformed into the other. There are however subtle variations in the Image-to-Text networks. The easiest network architectures are \"image-encoder \u2192 text-decoder\" (IETD), similar to network architectures that try to provide captions to images [32]. In these IETD networks, one expects as output the LaTeX/HTML string of the entire table, i.e. the symbols necessary for creating the table with the content of the table. Another approach is the \"image-encoder \u2192 dual decoder\" (IEDD) networks. In these type of networks, one has two consecutive decoders with different purposes. The first decoder is the tag-decoder , i.e. it only produces the HTML/LaTeX tags which construct an empty table. The second content-decoder uses the encoding of the image in combination with the output encoding of each cell-tag (from the tag-decoder ) to generate the textual content of each table cell. The network architecture of IEDD is certainly more elaborate, but it has the advantage that one can pre-train the", "text": "Image-to-Text networks : In this type of network, one predicts a sequence of tokens starting from an encoded image. Such sequences of tokens can be HTML table tags [37, 17] or LaTeX symbols[10]. The choice of symbols is ultimately not very important, since one can be transformed into the other. There are however subtle variations in the Image-to-Text networks. The easiest network architectures are \"image-encoder \u2192 text-decoder\" (IETD), similar to network architectures that try to provide captions to images [32]. In these IETD networks, one expects as output the LaTeX/HTML string of the entire table, i.e. the symbols necessary for creating the table with the content of the table. Another approach is the \"image-encoder \u2192 dual decoder\" (IEDD) networks. In these type of networks, one has two consecutive decoders with different purposes. The first decoder is the tag-decoder , i.e. it only produces the HTML/LaTeX tags which construct an empty table. The second content-decoder uses the encoding of the image in combination with the output encoding of each cell-tag (from the tag-decoder ) to generate the textual content of each table cell. The network architecture of IEDD is certainly more elaborate, but it has the advantage that one can pre-train the"}, {"self_ref": "#/texts/83", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 250.15101623535156, "b": 707.8850708007812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 51]}], "orig": "tag-decoder which is constrained to the table-tags.", "text": "tag-decoder which is constrained to the table-tags."}, {"self_ref": "#/texts/84", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11199951171875, "t": 704.7806396484375, "r": 286.3651428222656, "b": 516.5458984375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 864]}], "orig": "In practice, both network architectures (IETD and IEDD) require an implicit, custom trained object-characterrecognition (OCR) to obtain the content of the table-cells. In the case of IETD, this OCR engine is implicit in the decoder similar to [24]. For the IEDD, the OCR is solely embedded in the content-decoder. This reliance on a custom, implicit OCR decoder is of course problematic. OCR is a well known and extremely tough problem, that often needs custom training for each individual language. However, the limited availability for non-english content in the current datasets, makes it impractical to apply the IETD and IEDD methods on tables with other languages. Additionally, OCR can be completely omitted if the tables originate from programmatic PDF documents with known positions of each cell. The latter was the inspiration for the work of this paper.", "text": "In practice, both network architectures (IETD and IEDD) require an implicit, custom trained object-characterrecognition (OCR) to obtain the content of the table-cells. In the case of IETD, this OCR engine is implicit in the decoder similar to [24]. For the IEDD, the OCR is solely embedded in the content-decoder. This reliance on a custom, implicit OCR decoder is of course problematic. OCR is a well known and extremely tough problem, that often needs custom training for each individual language. However, the limited availability for non-english content in the current datasets, makes it impractical to apply the IETD and IEDD methods on tables with other languages. Additionally, OCR can be completely omitted if the tables originate from programmatic PDF documents with known positions of each cell. The latter was the inspiration for the work of this paper."}, {"self_ref": "#/texts/85", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11199188232422, "t": 513.56103515625, "r": 286.3651123046875, "b": 301.297119140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1007]}], "orig": "Graph Neural networks : Graph Neural networks (GNN's) take a radically different approach to tablestructure extraction. Note that one table cell can constitute out of multiple text-cells. To obtain the table-structure, one creates an initial graph, where each of the text-cells becomes a node in the graph similar to [33, 34, 2]. Each node is then associated with en embedding vector coming from the encoded image, its coordinates and the encoded text. Furthermore, nodes that represent adjacent text-cells are linked. Graph Convolutional Networks (GCN's) based methods take the image as an input, but also the position of the text-cells and their content [18]. The purpose of a GCN is to transform the input graph into a new graph, which replaces the old links with new ones. The new links then represent the table-structure. With this approach, one can avoid the need to build custom OCR decoders. However, the quality of the reconstructed structure is not comparable to the current state-of-the-art [18].", "text": "Graph Neural networks : Graph Neural networks (GNN's) take a radically different approach to tablestructure extraction. Note that one table cell can constitute out of multiple text-cells. To obtain the table-structure, one creates an initial graph, where each of the text-cells becomes a node in the graph similar to [33, 34, 2]. Each node is then associated with en embedding vector coming from the encoded image, its coordinates and the encoded text. Furthermore, nodes that represent adjacent text-cells are linked. Graph Convolutional Networks (GCN's) based methods take the image as an input, but also the position of the text-cells and their content [18]. The purpose of a GCN is to transform the input graph into a new graph, which replaces the old links with new ones. The new links then represent the table-structure. With this approach, one can avoid the need to build custom OCR decoders. However, the quality of the reconstructed structure is not comparable to the current state-of-the-art [18]."}, {"self_ref": "#/texts/86", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11198425292969, "t": 298.3112487792969, "r": 286.36627197265625, "b": 169.733154296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 619]}], "orig": "Hybrid Deep Learning-Rule-Based approach : A popular current model for table-structure identification is the use of a hybrid Deep Learning-Rule-Based approach similar to [27, 29]. In this approach, one first detects the position of the table-cells with object detection (e.g. YoloVx or MaskRCNN), then classifies the table into different types (from its images) and finally uses different rule-sets to obtain its table-structure. Currently, this approach achieves stateof-the-art results, but is not an end-to-end deep-learning method. As such, new rules need to be written if different types of tables are encountered.", "text": "Hybrid Deep Learning-Rule-Based approach : A popular current model for table-structure identification is the use of a hybrid Deep Learning-Rule-Based approach similar to [27, 29]. In this approach, one first detects the position of the table-cells with object detection (e.g. YoloVx or MaskRCNN), then classifies the table into different types (from its images) and finally uses different rule-sets to obtain its table-structure. Currently, this approach achieves stateof-the-art results, but is not an end-to-end deep-learning method. As such, new rules need to be written if different types of tables are encountered."}, {"self_ref": "#/texts/87", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 3, "bbox": {"l": 50.11198425292969, "t": 156.05516052246094, "r": 105.22545623779297, "b": 145.30743408203125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 11]}], "orig": "3. Datasets", "text": "3. Datasets", "level": 1}, {"self_ref": "#/texts/88", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 50.11198425292969, "t": 135.57470703125, "r": 286.3650817871094, "b": 78.84813690185547, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 281]}], "orig": "We rely on large-scale datasets such as PubTabNet [37], FinTabNet [36], and TableBank [17] datasets to train and evaluate our models. These datasets span over various appearance styles and content. We also introduce our own synthetically generated SynthTabNet dataset to fix an im-", "text": "We rely on large-scale datasets such as PubTabNet [37], FinTabNet [36], and TableBank [17] datasets to train and evaluate our models. These datasets span over various appearance styles and content. We also introduce our own synthetically generated SynthTabNet dataset to fix an im-"}, {"self_ref": "#/texts/89", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 3, "bbox": {"l": 295.1210021972656, "t": 57.86680221557617, "r": 300.102294921875, "b": 48.96023941040039, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/90", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 3, "bbox": {"l": 308.86199951171875, "t": 524.1636352539062, "r": 545.1151123046875, "b": 503.3020935058594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 104]}], "orig": "Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets", "text": "Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets"}, {"self_ref": "#/texts/91", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "section_header", "prov": [{"page_no": 3, "bbox": {"l": 380.79849, "t": 712.1882300000001, "r": 486.84909, "b": 703.44025, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 21]}], "orig": "PubTabNet + FinTabNet", "text": "PubTabNet + FinTabNet", "level": 1}, {"self_ref": "#/texts/92", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 396.76776, "t": 549.97302, "r": 469.78748, "b": 541.22504, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Rows / Columns", "text": "Rows / Columns"}, {"self_ref": "#/texts/93", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 320.97653, "t": 558.57703, "r": 324.79254, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/94", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 410.483, "t": 558.57703, "r": 418.11319, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "20", "text": "20"}, {"self_ref": "#/texts/95", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 500.84949, "t": 558.57703, "r": 508.47968000000003, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "40", "text": "40"}, {"self_ref": "#/texts/96", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 365.29999, "t": 558.57703, "r": 372.93018, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/97", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 455.66626, "t": 558.57703, "r": 463.29645, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "30", "text": "30"}, {"self_ref": "#/texts/98", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 542.03528, "t": 558.57703, "r": 549.66547, "b": 552.745, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "50", "text": "50"}, {"self_ref": "#/texts/99", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.04474, "t": 561.55383, "r": 319.86075, "b": 555.7218, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/100", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.62521, "t": 593.30927, "r": 316.44122, "b": 587.47723, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/101", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.43942, "t": 593.30927, "r": 320.2554, "b": 587.47723, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/102", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 313.14951, "t": 623.90204, "r": 316.96552, "b": 618.07001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/103", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.96371, "t": 623.90204, "r": 320.77969, "b": 618.07001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/104", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.92972, "t": 655.41229, "r": 316.74573, "b": 649.58026, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/105", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.74393, "t": 655.41229, "r": 320.55991, "b": 649.58026, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/106", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.48227, "t": 686.39825, "r": 316.29828, "b": 680.56622, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/107", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.29648, "t": 686.39825, "r": 320.11246, "b": 680.56622, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/108", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.48227, "t": 579.74078, "r": 316.29828, "b": 573.90875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/109", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.29648, "t": 579.74078, "r": 320.11246, "b": 573.90875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/110", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 313.07639, "t": 608.27802, "r": 316.8924, "b": 602.44598, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/111", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.89059, "t": 608.27802, "r": 320.70657, "b": 602.44598, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/112", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.76321, "t": 639.526, "r": 316.57922, "b": 633.69397, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/113", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.57742, "t": 639.526, "r": 320.3934, "b": 633.69397, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/114", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.19775, "t": 671.4295, "r": 316.01376, "b": 665.59747, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/115", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.01196, "t": 671.4295, "r": 319.82794, "b": 665.59747, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/116", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 312.8165, "t": 701.8913, "r": 316.63251, "b": 696.05927, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/117", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 316.63071, "t": 701.8913, "r": 320.44669, "b": 696.05927, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/118", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.17426, "t": 569.27271, "r": 536.94427, "b": 561.98273, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/119", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.87952, "t": 683.7329700000001, "r": 547.61249, "b": 676.44299, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "10K", "text": "10K"}, {"self_ref": "#/texts/120", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.7735, "t": 661.21899, "r": 542.73877, "b": 653.92902, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "8K", "text": "8K"}, {"self_ref": "#/texts/121", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.79901, "t": 638.07648, "r": 542.76428, "b": 630.7865, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "6K", "text": "6K"}, {"self_ref": "#/texts/122", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.5705, "t": 615.242, "r": 542.53577, "b": 607.95203, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "4K", "text": "4K"}, {"self_ref": "#/texts/123", "parent": {"cref": "#/pictures/3"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 532.14551, "t": 592.3537, "r": 542.11078, "b": 585.06372, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "2K", "text": "2K"}, {"self_ref": "#/texts/124", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 308.86199951171875, "t": 474.5266418457031, "r": 437.27001953125, "b": 465.6200866699219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 33]}], "orig": "balance in the previous datasets.", "text": "balance in the previous datasets."}, {"self_ref": "#/texts/125", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 308.86199951171875, "t": 460.4686279296875, "r": 545.1151733398438, "b": 164.6382598876953, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1400]}], "orig": "The PubTabNet dataset contains 509k tables delivered as annotated PNG images. The annotations consist of the table structure represented in HTML format, the tokenized text and its bounding boxes per table cell. Fig. 1 shows the appearance style of PubTabNet. Depending on its complexity, a table is characterized as \"simple\" when it does not contain row spans or column spans, otherwise it is \"complex\". The dataset is divided into Train and Val splits (roughly 98% and 2%). The Train split consists of 54% simple and 46% complex tables and the Val split of 51% and 49% respectively. The FinTabNet dataset contains 112k tables delivered as single-page PDF documents with mixed table structures and text content. Similarly to the PubTabNet, the annotations of FinTabNet include the table structure in HTML, the tokenized text and the bounding boxes on a table cell basis. The dataset is divided into Train, Test and Val splits (81%, 9.5%, 9.5%), and each one is almost equally divided into simple and complex tables (Train: 48% simple, 52% complex, Test: 48% simple, 52% complex, Test: 53% simple, 47% complex). Finally the TableBank dataset consists of 145k tables provided as JPEG images. The latter has annotations for the table structure, but only few with bounding boxes of the table cells. The entire dataset consists of simple tables and it is divided into 90% Train, 3% Test and 7% Val splits.", "text": "The PubTabNet dataset contains 509k tables delivered as annotated PNG images. The annotations consist of the table structure represented in HTML format, the tokenized text and its bounding boxes per table cell. Fig. 1 shows the appearance style of PubTabNet. Depending on its complexity, a table is characterized as \"simple\" when it does not contain row spans or column spans, otherwise it is \"complex\". The dataset is divided into Train and Val splits (roughly 98% and 2%). The Train split consists of 54% simple and 46% complex tables and the Val split of 51% and 49% respectively. The FinTabNet dataset contains 112k tables delivered as single-page PDF documents with mixed table structures and text content. Similarly to the PubTabNet, the annotations of FinTabNet include the table structure in HTML, the tokenized text and the bounding boxes on a table cell basis. The dataset is divided into Train, Test and Val splits (81%, 9.5%, 9.5%), and each one is almost equally divided into simple and complex tables (Train: 48% simple, 52% complex, Test: 48% simple, 52% complex, Test: 53% simple, 47% complex). Finally the TableBank dataset consists of 145k tables provided as JPEG images. The latter has annotations for the table structure, but only few with bounding boxes of the table cells. The entire dataset consists of simple tables and it is divided into 90% Train, 3% Test and 7% Val splits."}, {"self_ref": "#/texts/126", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 3, "bbox": {"l": 308.86199951171875, "t": 159.48580932617188, "r": 545.1151123046875, "b": 78.84823608398438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 406]}], "orig": "Due to the heterogeneity across the dataset formats, it was necessary to combine all available data into one homogenized dataset before we could train our models for practical purposes. Given the size of PubTabNet, we adopted its annotation format and we extracted and converted all tables as PNG images with a resolution of 72 dpi. Additionally, we have filtered out tables with extreme sizes due to small", "text": "Due to the heterogeneity across the dataset formats, it was necessary to combine all available data into one homogenized dataset before we could train our models for practical purposes. Given the size of PubTabNet, we adopted its annotation format and we extracted and converted all tables as PNG images with a resolution of 72 dpi. Additionally, we have filtered out tables with extreme sizes due to small"}, {"self_ref": "#/texts/127", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 286.3651123046875, "b": 695.9300537109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 93]}], "orig": "amount of such tables, and kept only those ones ranging between 1*1 and 20*10 (rows/columns).", "text": "amount of such tables, and kept only those ones ranging between 1*1 and 20*10 (rows/columns)."}, {"self_ref": "#/texts/128", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11199951171875, "t": 691.0396118164062, "r": 286.3651428222656, "b": 478.8949279785156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 983]}], "orig": "The availability of the bounding boxes for all table cells is essential to train our models. In order to distinguish between empty and non-empty bounding boxes, we have introduced a binary class in the annotation. Unfortunately, the original datasets either omit the bounding boxes for whole tables (e.g. TableBank) or they narrow their scope only to non-empty cells. Therefore, it was imperative to introduce a data pre-processing procedure that generates the missing bounding boxes out of the annotation information. This procedure first parses the provided table structure and calculates the dimensions of the most fine-grained grid that covers the table structure. Notice that each table cell may occupy multiple grid squares due to row or column spans. In case of PubTabNet we had to compute missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes.", "text": "The availability of the bounding boxes for all table cells is essential to train our models. In order to distinguish between empty and non-empty bounding boxes, we have introduced a binary class in the annotation. Unfortunately, the original datasets either omit the bounding boxes for whole tables (e.g. TableBank) or they narrow their scope only to non-empty cells. Therefore, it was imperative to introduce a data pre-processing procedure that generates the missing bounding boxes out of the annotation information. This procedure first parses the provided table structure and calculates the dimensions of the most fine-grained grid that covers the table structure. Notice that each table cell may occupy multiple grid squares due to row or column spans. In case of PubTabNet we had to compute missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes."}, {"self_ref": "#/texts/129", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11199951171875, "t": 474.0044860839844, "r": 286.3651123046875, "b": 357.50103759765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 571]}], "orig": "As it is illustrated in Fig. 2, the table distributions from all datasets are skewed towards simpler structures with fewer number of rows/columns. Additionally, there is very limited variance in the table styles, which in case of PubTabNet and FinTabNet means one styling format for the majority of the tables. Similar limitations appear also in the type of table content, which in some cases (e.g. FinTabNet) is restricted to a certain domain. Ultimately, the lack of diversity in the training dataset damages the ability of the models to generalize well on unseen data.", "text": "As it is illustrated in Fig. 2, the table distributions from all datasets are skewed towards simpler structures with fewer number of rows/columns. Additionally, there is very limited variance in the table styles, which in case of PubTabNet and FinTabNet means one styling format for the majority of the tables. Similar limitations appear also in the type of table content, which in some cases (e.g. FinTabNet) is restricted to a certain domain. Ultimately, the lack of diversity in the training dataset damages the ability of the models to generalize well on unseen data."}, {"self_ref": "#/texts/130", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11199951171875, "t": 352.610595703125, "r": 286.3665466308594, "b": 164.37611389160156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 941]}], "orig": "Motivated by those observations we aimed at generating a synthetic table dataset named SynthTabNet . This approach offers control over: 1) the size of the dataset, 2) the table structure, 3) the table style and 4) the type of content. The complexity of the table structure is described by the size of the table header and the table body, as well as the percentage of the table cells covered by row spans and column spans. A set of carefully designed styling templates provides the basis to build a wide range of table appearances. Lastly, the table content is generated out of a curated collection of text corpora. By controlling the size and scope of the synthetic datasets we are able to train and evaluate our models in a variety of different conditions. For example, we can first generate a highly diverse dataset to train our models and then evaluate their performance on other synthetic datasets which are focused on a specific domain.", "text": "Motivated by those observations we aimed at generating a synthetic table dataset named SynthTabNet . This approach offers control over: 1) the size of the dataset, 2) the table structure, 3) the table style and 4) the type of content. The complexity of the table structure is described by the size of the table header and the table body, as well as the percentage of the table cells covered by row spans and column spans. A set of carefully designed styling templates provides the basis to build a wide range of table appearances. Lastly, the table content is generated out of a curated collection of text corpora. By controlling the size and scope of the synthetic datasets we are able to train and evaluate our models in a variety of different conditions. For example, we can first generate a highly diverse dataset to train our models and then evaluate their performance on other synthetic datasets which are focused on a specific domain."}, {"self_ref": "#/texts/131", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 50.11201477050781, "t": 159.4856719970703, "r": 286.3651123046875, "b": 78.84810638427734, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 405]}], "orig": "In this regard, we have prepared four synthetic datasets, each one containing 150k examples. The corpora to generate the table text consists of the most frequent terms appearing in PubTabNet and FinTabNet together with randomly generated text. The first two synthetic datasets have been fine-tuned to mimic the appearance of the original datasets but encompass more complicated table structures. The third", "text": "In this regard, we have prepared four synthetic datasets, each one containing 150k examples. The corpora to generate the table text consists of the most frequent terms appearing in PubTabNet and FinTabNet together with randomly generated text. The first two synthetic datasets have been fine-tuned to mimic the appearance of the original datasets but encompass more complicated table structures. The third"}, {"self_ref": "#/texts/132", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 4, "bbox": {"l": 295.1209716796875, "t": 57.86674880981445, "r": 300.1022644042969, "b": 48.96018600463867, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/133", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 624.338623046875, "r": 545.1150512695312, "b": 567.6110229492188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 267]}], "orig": "Table 1: Both \"Combined-Tabnet\" and \"CombinedTabnet\" are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank.", "text": "Table 1: Both \"Combined-Tabnet\" and \"CombinedTabnet\" are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank."}, {"self_ref": "#/texts/134", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 542.3795776367188, "r": 545.1151733398438, "b": 497.6080322265625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 210]}], "orig": "one adopts a colorful appearance with high contrast and the last one contains tables with sparse content. Lastly, we have combined all synthetic datasets into one big unified synthetic dataset of 600k examples.", "text": "one adopts a colorful appearance with high contrast and the last one contains tables with sparse content. Lastly, we have combined all synthetic datasets into one big unified synthetic dataset of 600k examples."}, {"self_ref": "#/texts/135", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 320.8169860839844, "t": 494.22760009765625, "r": 542.7439575195312, "b": 485.321044921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 57]}], "orig": "Tab. 1 summarizes the various attributes of the datasets.", "text": "Tab. 1 summarizes the various attributes of the datasets."}, {"self_ref": "#/texts/136", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 470.8160400390625, "r": 444.9360656738281, "b": 460.0683288574219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 24]}], "orig": "4. The TableFormer model", "text": "4. The TableFormer model", "level": 1}, {"self_ref": "#/texts/137", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 450.06060791015625, "r": 545.115234375, "b": 345.5131530761719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 504]}], "orig": "Given the image of a table, TableFormer is able to predict: 1) a sequence of tokens that represent the structure of a table, and 2) a bounding box coupled to a subset of those tokens. The conversion of an image into a sequence of tokens is a well-known task [35, 16]. While attention is often used as an implicit method to associate each token of the sequence with a position in the original image, an explicit association between the individual table-cells and the image bounding boxes is also required.", "text": "Given the image of a table, TableFormer is able to predict: 1) a sequence of tokens that represent the structure of a table, and 2) a bounding box coupled to a subset of those tokens. The conversion of an image into a sequence of tokens is a well-known task [35, 16]. While attention is often used as an implicit method to associate each token of the sequence with a position in the original image, an explicit association between the individual table-cells and the image bounding boxes is also required."}, {"self_ref": "#/texts/138", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 4, "bbox": {"l": 308.86199951171875, "t": 334.30572509765625, "r": 420.16058349609375, "b": 324.45367431640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 24]}], "orig": "4.1. Model architecture.", "text": "4.1. Model architecture.", "level": 1}, {"self_ref": "#/texts/139", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 308.8619689941406, "t": 315.2347106933594, "r": 545.11572265625, "b": 127.00019073486328, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 907]}], "orig": "We now describe in detail the proposed method, which is composed of three main components, see Fig. 4. Our CNN Backbone Network encodes the input as a feature vector of predefined length. The input feature vector of the encoded image is passed to the Structure Decoder to produce a sequence of HTML tags that represent the structure of the table. With each prediction of an HTML standard data cell (' < td > ') the hidden state of that cell is passed to the Cell BBox Decoder. As for spanning cells, such as row or column span, the tag is broken down to ' < ', 'rowspan=' or 'colspan=', with the number of spanning cells (attribute), and ' > '. The hidden state attached to ' < ' is passed to the Cell BBox Decoder. A shared feed forward network (FFN) receives the hidden states from the Structure Decoder, to provide the final detection predictions of the bounding box coordinates and their classification.", "text": "We now describe in detail the proposed method, which is composed of three main components, see Fig. 4. Our CNN Backbone Network encodes the input as a feature vector of predefined length. The input feature vector of the encoded image is passed to the Structure Decoder to produce a sequence of HTML tags that represent the structure of the table. With each prediction of an HTML standard data cell (' < td > ') the hidden state of that cell is passed to the Cell BBox Decoder. As for spanning cells, such as row or column span, the tag is broken down to ' < ', 'rowspan=' or 'colspan=', with the number of spanning cells (attribute), and ' > '. The hidden state attached to ' < ' is passed to the Cell BBox Decoder. A shared feed forward network (FFN) receives the hidden states from the Structure Decoder, to provide the final detection predictions of the bounding box coordinates and their classification."}, {"self_ref": "#/texts/140", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 4, "bbox": {"l": 308.8619689941406, "t": 123.73930358886719, "r": 545.1151123046875, "b": 78.84818267822266, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 223]}], "orig": "CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table image and encodes it as a vector of predefined length. The network has been modified by removing the linear and pooling layer, as we are not per-", "text": "CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table image and encodes it as a vector of predefined length. The network has been modified by removing the linear and pooling layer, as we are not per-"}, {"self_ref": "#/texts/141", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 5, "bbox": {"l": 50.11199188232422, "t": 588.0142211914062, "r": 545.1084594726562, "b": 567.0330810546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 212]}], "orig": "Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure.", "text": "Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure."}, {"self_ref": "#/texts/142", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 81.688072, "t": 669.5603, "r": 84.927567, "b": 666.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "1.", "text": "1."}, {"self_ref": "#/texts/143", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 86.54731, "t": 669.5603, "r": 93.026291, "b": 666.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "Item", "text": "Item"}, {"self_ref": "#/texts/144", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 102.50498, "t": 676.74786, "r": 115.3461, "b": 673.55865, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Amount", "text": "Amount"}, {"self_ref": "#/texts/145", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 82.140205, "t": 676.7851, "r": 93.291527, "b": 673.59589, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 5]}], "orig": "Names", "text": "Names"}, {"self_ref": "#/texts/146", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 96.748268, "t": 669.5603, "r": 104.3119, "b": 666.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "1000", "text": "1000"}, {"self_ref": "#/texts/147", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 96.748268, "t": 664.2562900000001, "r": 102.42083, "b": 661.06708, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "500", "text": "500"}, {"self_ref": "#/texts/148", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 96.748268, "t": 658.54431, "r": 104.3119, "b": 655.3551, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "3500", "text": "3500"}, {"self_ref": "#/texts/149", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 96.748268, "t": 652.83228, "r": 102.42083, "b": 649.64307, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "150", "text": "150"}, {"self_ref": "#/texts/150", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 110.66107, "t": 669.5603, "r": 116.14391, "b": 666.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "unit", "text": "unit"}, {"self_ref": "#/texts/151", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 110.66107, "t": 664.2562900000001, "r": 116.14391, "b": 661.06708, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "unit", "text": "unit"}, {"self_ref": "#/texts/152", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 110.66107, "t": 658.54431, "r": 116.14391, "b": 655.3551, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "unit", "text": "unit"}, {"self_ref": "#/texts/153", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 110.66107, "t": 652.83228, "r": 116.14391, "b": 649.64307, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "unit", "text": "unit"}, {"self_ref": "#/texts/154", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 81.688072, "t": 664.2562900000001, "r": 84.927567, "b": 661.06708, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "2.", "text": "2."}, {"self_ref": "#/texts/155", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 86.54731, "t": 664.2562900000001, "r": 93.026291, "b": 661.06708, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "Item", "text": "Item"}, {"self_ref": "#/texts/156", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 81.688072, "t": 658.54431, "r": 84.927567, "b": 655.3551, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "3.", "text": "3."}, {"self_ref": "#/texts/157", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 86.54731, "t": 658.54431, "r": 93.026291, "b": 655.3551, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "Item", "text": "Item"}, {"self_ref": "#/texts/158", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 81.688072, "t": 652.83228, "r": 84.927567, "b": 649.64307, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "4.", "text": "4."}, {"self_ref": "#/texts/159", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 86.54731, "t": 652.83228, "r": 93.026291, "b": 649.64307, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "Item", "text": "Item"}, {"self_ref": "#/texts/160", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 88.084389, "t": 701.50262, "r": 113.93649, "b": 695.76202, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "Extracted", "text": "Extracted"}, {"self_ref": "#/texts/161", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 82.81002, "t": 694.36261, "r": 119.21240000000002, "b": 688.62201, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "Table Images", "text": "Table Images"}, {"self_ref": "#/texts/162", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 143.94247, "t": 691.39764, "r": 180.01131, "b": 685.65704, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "Standardized", "text": "Standardized"}, {"self_ref": "#/texts/163", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 151.94064, "t": 684.25763, "r": 172.0118, "b": 678.5170299999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Images", "text": "Images"}, {"self_ref": "#/texts/164", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 251.76939000000002, "t": 711.0690300000001, "r": 266.39557, "b": 705.32843, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "BBox", "text": "BBox"}, {"self_ref": "#/texts/165", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 247.51601, "t": 705.96899, "r": 270.65021, "b": 700.22839, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Decoder", "text": "Decoder"}, {"self_ref": "#/texts/166", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 331.03699, "t": 713.44019, "r": 352.12589, "b": 707.69958, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "BBoxes", "text": "BBoxes"}, {"self_ref": "#/texts/167", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 390.56421, "t": 695.96777, "r": 431.7261, "b": 690.2271700000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 13]}], "orig": "BBoxes can be", "text": "BBoxes can be"}, {"self_ref": "#/texts/168", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 386.82422, "t": 689.8477199999999, "r": 435.46966999999995, "b": 684.10712, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 18]}], "orig": "traced back to the", "text": "traced back to the"}, {"self_ref": "#/texts/169", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 388.69589, "t": 683.72772, "r": 433.6032400000001, "b": 677.9871199999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "original image to", "text": "original image to"}, {"self_ref": "#/texts/170", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 391.07761, "t": 677.60773, "r": 431.22542999999996, "b": 671.8671300000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "extract content", "text": "extract content"}, {"self_ref": "#/texts/171", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 431.22650000000004, "t": 640.31488, "r": 498.82068, "b": 634.57428, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 23]}], "orig": "Structure Tags sequence", "text": "Structure Tags sequence"}, {"self_ref": "#/texts/172", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 431.1738, "t": 634.19482, "r": 498.87753000000004, "b": 628.45422, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 27]}], "orig": "provide full description of", "text": "provide full description of"}, {"self_ref": "#/texts/173", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 440.5289, "t": 628.07483, "r": 489.51827999999995, "b": 622.33423, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 19]}], "orig": "the table structure", "text": "the table structure"}, {"self_ref": "#/texts/174", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 328.37479, "t": 613.74615, "r": 367.72333, "b": 608.00555, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Structure Tags", "text": "Structure Tags"}, {"self_ref": "#/texts/175", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 331.84451, "t": 668.09113, "r": 373.67963, "b": 662.3505199999998, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "BBoxes in sync", "text": "BBoxes in sync"}, {"self_ref": "#/texts/176", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 331.84451, "t": 662.9911499999998, "r": 381.17786, "b": 657.25055, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "with tag sequence", "text": "with tag sequence"}, {"self_ref": "#/texts/177", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 196.62633, "t": 703.88379, "r": 219.42332, "b": 698.14319, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Encoder", "text": "Encoder"}, {"self_ref": "#/texts/178", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 246.66771, "t": 662.5053099999999, "r": 271.49899, "b": 656.76471, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "Structure", "text": "Structure"}, {"self_ref": "#/texts/179", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 247.51601, "t": 657.40527, "r": 270.65021, "b": 651.66467, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Decoder", "text": "Decoder"}, {"self_ref": "#/texts/180", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 330.63071, "t": 702.98077, "r": 365.55347, "b": 697.24017, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 16]}], "orig": "[x1, y2, x2, y2]", "text": "[x1, y2, x2, y2]"}, {"self_ref": "#/texts/181", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 330.63071, "t": 694.82074, "r": 370.22717, "b": 689.08014, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "[x1', y2', x2', y2']", "text": "[x1', y2', x2', y2']"}, {"self_ref": "#/texts/182", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 330.63071, "t": 686.6607700000001, "r": 374.51157, "b": 680.92017, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 24]}], "orig": "[x1'', y2'', x2'', y2'']", "text": "[x1'', y2'', x2'', y2'']"}, {"self_ref": "#/texts/183", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 330.63071, "t": 678.5007300000001, "r": 335.73233, "b": 672.76013, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "...", "text": "..."}, {"self_ref": "#/texts/184", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 650.20764, "r": 335.05988, "b": 645.42383, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "", "text": ""}, {"self_ref": "#/texts/185", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 643.06769, "r": 335.05988, "b": 638.28387, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "", "text": ""}, {"self_ref": "#/texts/186", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 337.54971, "t": 643.44421, "r": 340.95242, "b": 637.70361, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/187", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 343.56262, "t": 643.06769, "r": 398.91446, "b": 638.28387, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 21]}], "orig": "", "text": ""}, {"self_ref": "#/texts/188", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 407.41718, "t": 643.06769, "r": 421.58801, "b": 638.28387, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 5]}], "orig": "", "text": ""}, {"self_ref": "#/texts/189", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 635.92767, "r": 349.23022, "b": 631.14386, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "", "text": ""}, {"self_ref": "#/texts/190", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 628.78766, "r": 335.05988, "b": 624.00385, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "", "text": ""}, {"self_ref": "#/texts/191", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 343.56155, "t": 628.78766, "r": 374.73685, "b": 624.00385, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "...", "text": "..."}, {"self_ref": "#/texts/192", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 322.30579, "t": 621.64764, "r": 326.55716, "b": 616.86383, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "...", "text": "..."}, {"self_ref": "#/texts/193", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 323.51111, "t": 702.33032, "r": 326.91382, "b": 696.58972, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/194", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 323.71509, "t": 694.21112, "r": 327.1178, "b": 688.47052, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/195", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 323.71509, "t": 686.01031, "r": 327.1178, "b": 680.2697099999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/196", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 401.4816, "t": 643.45374, "r": 404.88431, "b": 637.71313, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/197", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 337.6976, "t": 629.31549, "r": 341.10031, "b": 623.57489, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/198", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 454.46378, "t": 687.45416, "r": 457.86648999999994, "b": 681.7135599999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/199", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 493.32580999999993, "t": 700.90454, "r": 496.72852, "b": 695.16394, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/200", "parent": {"cref": "#/pictures/4"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 454.08298, "t": 701.4312099999999, "r": 457.48569000000003, "b": 695.69061, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/201", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 5, "bbox": {"l": 50.11199951171875, "t": 264.2171936035156, "r": 286.365966796875, "b": 111.72905731201172, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 745]}], "orig": "Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes.", "text": "Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes."}, {"self_ref": "#/texts/202", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 74.253464, "t": 533.78528, "r": 101.75846, "b": 527.82526, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 11]}], "orig": "Input Image", "text": "Input Image"}, {"self_ref": "#/texts/203", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 122.29972, "t": 533.65479, "r": 157.83972, "b": 527.69476, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Tokenised Tags", "text": "Tokenised Tags"}, {"self_ref": "#/texts/204", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 78.549347, "t": 420.61420000000004, "r": 125.68359000000001, "b": 414.95218, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "Multi-Head Attention", "text": "Multi-Head Attention"}, {"self_ref": "#/texts/205", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 78.513298, "t": 400.68143, "r": 84.644547, "b": 395.01941, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Add", "text": "Add"}, {"self_ref": "#/texts/206", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 116.52705, "t": 400.68143, "r": 125.11079999999998, "b": 395.01941, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "& Normalisation", "text": "& Normalisation"}, {"self_ref": "#/texts/207", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 76.024773, "t": 367.54691, "r": 127.92327000000002, "b": 361.88489, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "Feed Forward Network", "text": "Feed Forward Network"}, {"self_ref": "#/texts/208", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 78.382828, "t": 347.11044, "r": 84.514076, "b": 341.44843, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Add", "text": "Add"}, {"self_ref": "#/texts/209", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 116.39658, "t": 347.11044, "r": 124.98033, "b": 341.44843, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "& Normalisation", "text": "& Normalisation"}, {"self_ref": "#/texts/210", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 167.46945, "t": 329.55676, "r": 181.6292, "b": 323.89474, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Linear", "text": "Linear"}, {"self_ref": "#/texts/211", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 165.61292, "t": 313.52893, "r": 184.43242, "b": 307.86691, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Softmax", "text": "Softmax"}, {"self_ref": "#/texts/212", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 65.319511, "t": 467.73764000000006, "r": 132.9245, "b": 461.77764999999994, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "CNN BACKBONE ENCODER", "text": "CNN BACKBONE ENCODER"}, {"self_ref": "#/texts/213", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 119.51457, "t": 522.33606, "r": 162.98782, "b": 517.27008, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 21]}], "orig": "[30, 1, 2, 3, 4, \u2026 3,", "text": "[30, 1, 2, 3, 4, \u2026 3,"}, {"self_ref": "#/texts/214", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 128.72858, "t": 517.08606, "r": 151.41083, "b": 512.02008, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "4, 5, 8, 31]", "text": "4, 5, 8, 31]"}, {"self_ref": "#/texts/215", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 60.434211999999995, "t": 453.04007, "r": 80.27021, "b": 447.73007, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "Positional", "text": "Positional"}, {"self_ref": "#/texts/216", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 60.598457, "t": 448.61395, "r": 78.854958, "b": 443.30396, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 8]}], "orig": "Encoding", "text": "Encoding"}, {"self_ref": "#/texts/217", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 134.82877, "t": 498.62238, "r": 154.66476, "b": 493.31238, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "Positional", "text": "Positional"}, {"self_ref": "#/texts/218", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 134.99303, "t": 494.19629000000003, "r": 153.24953, "b": 488.88629, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 8]}], "orig": "Encoding", "text": "Encoding"}, {"self_ref": "#/texts/219", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 150.55193, "t": 446.64139, "r": 197.14943, "b": 440.97937, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 19]}], "orig": "Add & Normalisation", "text": "Add & Normalisation"}, {"self_ref": "#/texts/220", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 150.55193, "t": 397.5766, "r": 156.68318, "b": 391.91458, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Add", "text": "Add"}, {"self_ref": "#/texts/221", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 188.56567, "t": 397.5766, "r": 197.14943, "b": 391.91458, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "& Normalisation", "text": "& Normalisation"}, {"self_ref": "#/texts/222", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 150.18539, "t": 416.33157, "r": 197.31964, "b": 410.66956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "Multi-Head Attention", "text": "Multi-Head Attention"}, {"self_ref": "#/texts/223", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 150.55193, "t": 351.75152999999995, "r": 156.68318, "b": 346.08951, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Add", "text": "Add"}, {"self_ref": "#/texts/224", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 188.56567, "t": 351.75152999999995, "r": 197.14943, "b": 346.08951, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "& Normalisation", "text": "& Normalisation"}, {"self_ref": "#/texts/225", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 147.86377, "t": 369.90665, "r": 199.76227, "b": 364.24463, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 20]}], "orig": "Feed Forward Network", "text": "Feed Forward Network"}, {"self_ref": "#/texts/226", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 241.56567000000004, "t": 477.73714999999993, "r": 255.72542, "b": 472.07513, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Linear", "text": "Linear"}, {"self_ref": "#/texts/227", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 241.91730000000004, "t": 430.63507, "r": 256.07706, "b": 424.97305, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Linear", "text": "Linear"}, {"self_ref": "#/texts/228", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 228.054, "t": 455.38070999999997, "r": 248.72363000000004, "b": 449.71869, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "Attention", "text": "Attention"}, {"self_ref": "#/texts/229", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 246.2919, "t": 455.38070999999997, "r": 269.39325, "b": 449.71869, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Network", "text": "Network"}, {"self_ref": "#/texts/230", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 228.44568000000004, "t": 386.85318, "r": 238.73892, "b": 381.19116, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "MLP", "text": "MLP"}, {"self_ref": "#/texts/231", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 256.29767, "t": 386.7967499999999, "r": 271.77792, "b": 381.13474, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Linear", "text": "Linear"}, {"self_ref": "#/texts/232", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 239.54543, "t": 409.78656, "r": 258.08942, "b": 404.12454, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Sigmoid", "text": "Sigmoid"}, {"self_ref": "#/texts/233", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 54.14704100000001, "t": 407.12817, "r": 59.51152, "b": 342.21674, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 27]}], "orig": "Transformer Encoder Network", "text": "Transformer Encoder Network"}, {"self_ref": "#/texts/234", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 54.235424, "t": 418.18768, "r": 59.30449699999999, "b": 413.54578000000004, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "x2", "text": "x2"}, {"self_ref": "#/texts/235", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 85.295891, "t": 307.46811, "r": 122.16431, "b": 301.63312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Encoded Output", "text": "Encoded Output"}, {"self_ref": "#/texts/236", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 229.66599, "t": 512.45392, "r": 265.3194, "b": 506.54427999999996, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Encoded Output", "text": "Encoded Output"}, {"self_ref": "#/texts/237", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 157.17369, "t": 291.6969, "r": 190.41711, "b": 285.87057, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Predicted Tags", "text": "Predicted Tags"}, {"self_ref": "#/texts/238", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 227.81598999999997, "t": 353.94458, "r": 270.78442, "b": 348.10794, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 16]}], "orig": "Bounding Boxes &", "text": "Bounding Boxes &"}, {"self_ref": "#/texts/239", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 233.70262, "t": 347.93817, "r": 263.51105, "b": 342.1095000000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Classification", "text": "Classification"}, {"self_ref": "#/texts/240", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 184.74655, "t": 498.60498, "r": 212.16055, "b": 493.24097, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 11]}], "orig": "Transformer", "text": "Transformer"}, {"self_ref": "#/texts/241", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 178.91229, "t": 492.85498, "r": 216.74378999999996, "b": 487.49097, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "Decoder Network", "text": "Decoder Network"}, {"self_ref": "#/texts/242", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 194.24574, "t": 509.2178, "r": 198.89099, "b": 504.15182000000004, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "x4", "text": "x4"}, {"self_ref": "#/texts/243", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 221.45587, "t": 520.13086, "r": 276.47089, "b": 514.17084, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "CELL BBOX DECODER", "text": "CELL BBOX DECODER"}, {"self_ref": "#/texts/244", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 151.65219, "t": 468.55759, "r": 197.29019, "b": 462.89557, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "Masked Multi-Head", "text": "Masked Multi-Head"}, {"self_ref": "#/texts/245", "parent": {"cref": "#/pictures/5"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 163.43277, "t": 462.55759, "r": 184.19028, "b": 456.89557, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "Attention", "text": "Attention"}, {"self_ref": "#/texts/246", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.86199951171875, "t": 542.465576171875, "r": 545.1150512695312, "b": 497.69305419921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 227]}], "orig": "forming classification, and adding an adaptive pooling layer of size 28*28. ResNet by default downsamples the image resolution by 32 and then the encoded image is provided to both the Structure Decoder , and Cell BBox Decoder .", "text": "forming classification, and adding an adaptive pooling layer of size 28*28. ResNet by default downsamples the image resolution by 32 and then the encoded image is provided to both the Structure Decoder , and Cell BBox Decoder ."}, {"self_ref": "#/texts/247", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.8619384765625, "t": 494.6601867675781, "r": 545.1151123046875, "b": 378.0381774902344, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 563]}], "orig": "Structure Decoder. The transformer architecture of this component is based on the work proposed in [31]. After extensive experimentation, the Structure Decoder is modeled as a transformer encoder with two encoder layers and a transformer decoder made from a stack of 4 decoder layers that comprise mainly of multi-head attention and feed forward layers. This configuration uses fewer layers and heads in comparison to networks applied to other problems (e.g. \"Scene Understanding\", \"Image Captioning\"), something which we relate to the simplicity of table images.", "text": "Structure Decoder. The transformer architecture of this component is based on the work proposed in [31]. After extensive experimentation, the Structure Decoder is modeled as a transformer encoder with two encoder layers and a transformer decoder made from a stack of 4 decoder layers that comprise mainly of multi-head attention and feed forward layers. This configuration uses fewer layers and heads in comparison to networks applied to other problems (e.g. \"Scene Understanding\", \"Image Captioning\"), something which we relate to the simplicity of table images."}, {"self_ref": "#/texts/248", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.8619689941406, "t": 374.8857421875, "r": 545.1151123046875, "b": 246.4272918701172, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 592]}], "orig": "The transformer encoder receives an encoded image from the CNN Backbone Network and refines it through a multi-head dot-product attention layer, followed by a Feed Forward Network. During training, the transformer decoder receives as input the output feature produced by the transformer encoder, and the tokenized input of the HTML ground-truth tags. Using a stack of multi-head attention layers, different aspects of the tag sequence could be inferred. This is achieved by each attention head on a layer operating in a different subspace, and then combining altogether their attention score.", "text": "The transformer encoder receives an encoded image from the CNN Backbone Network and refines it through a multi-head dot-product attention layer, followed by a Feed Forward Network. During training, the transformer decoder receives as input the output feature produced by the transformer encoder, and the tokenized input of the HTML ground-truth tags. Using a stack of multi-head attention layers, different aspects of the tag sequence could be inferred. This is achieved by each attention head on a layer operating in a different subspace, and then combining altogether their attention score."}, {"self_ref": "#/texts/249", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.8619384765625, "t": 243.39540100097656, "r": 545.1151123046875, "b": 138.727294921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 483]}], "orig": "Cell BBox Decoder. Our architecture allows to simultaneously predict HTML tags and bounding boxes for each table cell without the need of a separate object detector end to end. This approach is inspired by DETR [1] which employs a Transformer Encoder, and Decoder that looks for a specific number of object queries (potential object detections). As our model utilizes a transformer architecture, the hidden state of the < td > ' and ' < ' HTML structure tags become the object query.", "text": "Cell BBox Decoder. Our architecture allows to simultaneously predict HTML tags and bounding boxes for each table cell without the need of a separate object detector end to end. This approach is inspired by DETR [1] which employs a Transformer Encoder, and Decoder that looks for a specific number of object queries (potential object detections). As our model utilizes a transformer architecture, the hidden state of the < td > ' and ' < ' HTML structure tags become the object query."}, {"self_ref": "#/texts/250", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 5, "bbox": {"l": 308.8619384765625, "t": 135.57484436035156, "r": 545.1150512695312, "b": 78.84827423095703, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 286]}], "orig": "The encoding generated by the CNN Backbone Network along with the features acquired for every data cell from the Transformer Decoder are then passed to the attention network. The attention network takes both inputs and learns to provide an attention weighted encoding. This weighted at-", "text": "The encoding generated by the CNN Backbone Network along with the features acquired for every data cell from the Transformer Decoder are then passed to the attention network. The attention network takes both inputs and learns to provide an attention weighted encoding. This weighted at-"}, {"self_ref": "#/texts/251", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 5, "bbox": {"l": 295.1209411621094, "t": 57.86684036254883, "r": 300.10223388671875, "b": 48.96027755737305, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/252", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 286.3651428222656, "b": 636.1539916992188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 380]}], "orig": "tention encoding is then multiplied to the encoded image to produce a feature for each table cell. Notice that this is different than the typical object detection problem where imbalances between the number of detections and the amount of objects may exist. In our case, we know up front that the produced detections always match with the table cells in number and correspondence.", "text": "tention encoding is then multiplied to the encoded image to produce a feature for each table cell. Notice that this is different than the typical object detection problem where imbalances between the number of detections and the amount of objects may exist. In our case, we know up front that the produced detections always match with the table cells in number and correspondence."}, {"self_ref": "#/texts/253", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.11199951171875, "t": 632.3755493164062, "r": 286.3651123046875, "b": 551.7369384765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 371]}], "orig": "The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer.", "text": "The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer."}, {"self_ref": "#/texts/254", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.11199951171875, "t": 548.0780639648438, "r": 286.36572265625, "b": 347.76910400390625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 985]}], "orig": "Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets.", "text": "Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets."}, {"self_ref": "#/texts/255", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.112022399902344, "t": 343.9896545410156, "r": 286.364990234375, "b": 323.12811279296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 67]}], "orig": "The loss used to train the TableFormer can be defined as following:", "text": "The loss used to train the TableFormer can be defined as following:"}, {"self_ref": "#/texts/256", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 6, "bbox": {"l": 124.33001708984375, "t": 298.71905517578125, "r": 286.3624267578125, "b": 274.92828369140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 84]}], "orig": "l$_{box}$ = \u03bb$_{iou}$l$_{iou}$ + \u03bb$_{l}$$_{1}$ l = \u03bbl$_{s}$ + (1 - \u03bb ) l$_{box}$ (1)", "text": ""}, {"self_ref": "#/texts/257", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.112030029296875, "t": 261.4079895019531, "r": 281.596923828125, "b": 251.78411865234375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 76]}], "orig": "where \u03bb \u2208 [0, 1], and \u03bb$_{iou}$, \u03bb$_{l}$$_{1}$ \u2208$_{R}$ are hyper-parameters.", "text": "where \u03bb \u2208 [0, 1], and \u03bb$_{iou}$, \u03bb$_{l}$$_{1}$ \u2208$_{R}$ are hyper-parameters."}, {"self_ref": "#/texts/258", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 6, "bbox": {"l": 50.11204528808594, "t": 236.08311462402344, "r": 171.9833526611328, "b": 225.33538818359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 23]}], "orig": "5. Experimental Results", "text": "5. Experimental Results", "level": 1}, {"self_ref": "#/texts/259", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 6, "bbox": {"l": 50.11204528808594, "t": 215.7356719970703, "r": 179.17501831054688, "b": 205.8836212158203, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 27]}], "orig": "5.1. Implementation Details", "text": "5.1. Implementation Details", "level": 1}, {"self_ref": "#/texts/260", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.11204528808594, "t": 196.2656707763672, "r": 286.36517333984375, "b": 151.4931182861328, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 207]}], "orig": "TableFormer uses ResNet-18 as the CNN Backbone Network . The input images are resized to 448*448 pixels and the feature map has a dimension of 28*28. Additionally, we enforce the following input constraints:", "text": "TableFormer uses ResNet-18 as the CNN Backbone Network . The input images are resized to 448*448 pixels and the feature map has a dimension of 28*28. Additionally, we enforce the following input constraints:"}, {"self_ref": "#/texts/261", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 6, "bbox": {"l": 91.66104888916016, "t": 138.1719970703125, "r": 286.3624572753906, "b": 113.60411834716797, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 77]}], "orig": "Image width and height \u2264 1024 pixels Structural tags length \u2264 512 tokens. (2)", "text": ""}, {"self_ref": "#/texts/262", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 50.112060546875, "t": 99.70968627929688, "r": 286.3651428222656, "b": 78.8481216430664, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 117]}], "orig": "Although input constraints are used also by other methods, such as EDD, ours are less restrictive due to the improved", "text": "Although input constraints are used also by other methods, such as EDD, ours are less restrictive due to the improved"}, {"self_ref": "#/texts/263", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 6, "bbox": {"l": 295.12103271484375, "t": 57.86667251586914, "r": 300.1023254394531, "b": 48.96010971069336, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/264", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.862060546875, "t": 716.7916870117188, "r": 545.115234375, "b": 683.97509765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 156]}], "orig": "runtime performance and lower memory footprint of TableFormer. This allows to utilize input samples with longer sequences and images with larger dimensions.", "text": "runtime performance and lower memory footprint of TableFormer. This allows to utilize input samples with longer sequences and images with larger dimensions."}, {"self_ref": "#/texts/265", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.862060546875, "t": 675.7706298828125, "r": 545.1152954101562, "b": 463.6259460449219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1024]}], "orig": "The Transformer Encoder consists of two \"Transformer Encoder Layers\", with an input feature size of 512, feed forward network of 1024, and 4 attention heads. As for the Transformer Decoder it is composed of four \"Transformer Decoder Layers\" with similar input and output dimensions as the \"Transformer Encoder Layers\". Even though our model uses fewer layers and heads than the default implementation parameters, our extensive experimentation has proved this setup to be more suitable for table images. We attribute this finding to the inherent design of table images, which contain mostly lines and text, unlike the more elaborate content present in other scopes (e.g. the COCO dataset). Moreover, we have added ResNet blocks to the inputs of the Structure Decoder and Cell BBox Decoder. This prevents a decoder having a stronger influence over the learned weights which would damage the other prediction task (structure vs bounding boxes), but learn task specific weights instead. Lastly our dropout layers are set to 0.5.", "text": "The Transformer Encoder consists of two \"Transformer Encoder Layers\", with an input feature size of 512, feed forward network of 1024, and 4 attention heads. As for the Transformer Decoder it is composed of four \"Transformer Decoder Layers\" with similar input and output dimensions as the \"Transformer Encoder Layers\". Even though our model uses fewer layers and heads than the default implementation parameters, our extensive experimentation has proved this setup to be more suitable for table images. We attribute this finding to the inherent design of table images, which contain mostly lines and text, unlike the more elaborate content present in other scopes (e.g. the COCO dataset). Moreover, we have added ResNet blocks to the inputs of the Structure Decoder and Cell BBox Decoder. This prevents a decoder having a stronger influence over the learned weights which would damage the other prediction task (structure vs bounding boxes), but learn task specific weights instead. Lastly our dropout layers are set to 0.5."}, {"self_ref": "#/texts/266", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 455.4224853515625, "r": 545.1151733398438, "b": 362.83001708984375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 419]}], "orig": "For training, TableFormer is trained with 3 Adam optimizers, each one for the CNN Backbone Network , Structure Decoder , and Cell BBox Decoder . Taking the PubTabNet as an example for our parameter set up, the initializing learning rate is 0.001 for 12 epochs with a batch size of 24, and \u03bb set to 0.5. Afterwards, we reduce the learning rate to 0.0001, the batch size to 18 and train for 12 more epochs or convergence.", "text": "For training, TableFormer is trained with 3 Adam optimizers, each one for the CNN Backbone Network , Structure Decoder , and Cell BBox Decoder . Taking the PubTabNet as an example for our parameter set up, the initializing learning rate is 0.001 for 12 epochs with a batch size of 24, and \u03bb set to 0.5. Afterwards, we reduce the learning rate to 0.0001, the batch size to 18 and train for 12 more epochs or convergence."}, {"self_ref": "#/texts/267", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 354.6255798339844, "r": 545.115234375, "b": 238.12310791015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 528]}], "orig": "TableFormer is implemented with PyTorch and Torchvision libraries [22]. To speed up the inference, the image undergoes a single forward pass through the CNN Backbone Network and transformer encoder. This eliminates the overhead of generating the same features for each decoding step. Similarly, we employ a 'caching' technique to preform faster autoregressive decoding. This is achieved by storing the features of decoded tokens so we can reuse them for each time step. Therefore, we only compute the attention for each new tag.", "text": "TableFormer is implemented with PyTorch and Torchvision libraries [22]. To speed up the inference, the image undergoes a single forward pass through the CNN Backbone Network and transformer encoder. This eliminates the overhead of generating the same features for each decoding step. Similarly, we employ a 'caching' technique to preform faster autoregressive decoding. This is achieved by storing the features of decoded tokens so we can reuse them for each time step. Therefore, we only compute the attention for each new tag."}, {"self_ref": "#/texts/268", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 212.4456787109375, "r": 397.44281005859375, "b": 202.5936279296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 19]}], "orig": "5.2. Generalization", "text": "5.2. Generalization", "level": 1}, {"self_ref": "#/texts/269", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 188.55067443847656, "r": 545.1151733398438, "b": 119.86811065673828, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 299]}], "orig": "TableFormer is evaluated on three major publicly available datasets of different nature to prove the generalization and effectiveness of our model. The datasets used for evaluation are the PubTabNet, FinTabNet and TableBank which stem from the scientific, financial and general domains respectively.", "text": "TableFormer is evaluated on three major publicly available datasets of different nature to prove the generalization and effectiveness of our model. The datasets used for evaluation are the PubTabNet, FinTabNet and TableBank which stem from the scientific, financial and general domains respectively."}, {"self_ref": "#/texts/270", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 6, "bbox": {"l": 308.8620300292969, "t": 111.6646728515625, "r": 545.115234375, "b": 78.84710693359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 155]}], "orig": "We also share our baseline results on the challenging SynthTabNet dataset. Throughout our experiments, the same parameters stated in Sec. 5.1 are utilized.", "text": "We also share our baseline results on the challenging SynthTabNet dataset. Throughout our experiments, the same parameters stated in Sec. 5.1 are utilized."}, {"self_ref": "#/texts/271", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 717.5986328125, "r": 167.89825439453125, "b": 707.74658203125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 25]}], "orig": "5.3. Datasets and Metrics", "text": "5.3. Datasets and Metrics", "level": 1}, {"self_ref": "#/texts/272", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 698.6495971679688, "r": 286.3651123046875, "b": 653.8770141601562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 192]}], "orig": "The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It represents the prediction, and ground-truth as a tree structure of HTML tags. This similarity is calculated as:", "text": "The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It represents the prediction, and ground-truth as a tree structure of HTML tags. This similarity is calculated as:"}, {"self_ref": "#/texts/273", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 7, "bbox": {"l": 86.218994140625, "t": 641.6820068359375, "r": 286.3623962402344, "b": 619.26123046875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 99]}], "orig": "TEDS ( T$_{a}$, T$_{b}$ ) = 1 - EditDist ( T$_{a}$, T$_{b}$ ) max ( | T$_{a}$ | , | T$_{b}$ | ) (3)", "text": ""}, {"self_ref": "#/texts/274", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11198425292969, "t": 610.9970092773438, "r": 286.36285400390625, "b": 578.02099609375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 162]}], "orig": "where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T .", "text": "where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T ."}, {"self_ref": "#/texts/275", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 567.1805419921875, "r": 170.45169067382812, "b": 557.3284912109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 26]}], "orig": "5.4. Quantitative Analysis", "text": "5.4. Quantitative Analysis", "level": 1}, {"self_ref": "#/texts/276", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 548.35009765625, "r": 286.3651428222656, "b": 395.862060546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 723]}], "orig": "Structure. As shown in Tab. 2, TableFormer outperforms all SOTA methods across different datasets by a large margin for predicting the table structure from an image. All the more, our model outperforms pre-trained methods. During the evaluation we do not apply any table filtering. We also provide our baseline results on the SynthTabNet dataset. It has been observed that large tables (e.g. tables that occupy half of the page or more) yield poor predictions. We attribute this issue to the image resizing during the preprocessing step, that produces downsampled images with indistinguishable features. This problem can be addressed by treating such big tables with a separate model which accepts a large input image size.", "text": "Structure. As shown in Tab. 2, TableFormer outperforms all SOTA methods across different datasets by a large margin for predicting the table structure from an image. All the more, our model outperforms pre-trained methods. During the evaluation we do not apply any table filtering. We also provide our baseline results on the SynthTabNet dataset. It has been observed that large tables (e.g. tables that occupy half of the page or more) yield poor predictions. We attribute this issue to the image resizing during the preprocessing step, that produces downsampled images with indistinguishable features. This problem can be addressed by treating such big tables with a separate model which accepts a large input image size."}, {"self_ref": "#/texts/277", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 199.56663513183594, "r": 286.3651123046875, "b": 178.705078125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 101]}], "orig": "Table 2: Structure results on PubTabNet (PTN), FinTabNet (FTN), TableBank (TB) and SynthTabNet (STN).", "text": "Table 2: Structure results on PubTabNet (PTN), FinTabNet (FTN), TableBank (TB) and SynthTabNet (STN)."}, {"self_ref": "#/texts/278", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11199951171875, "t": 175.65663146972656, "r": 261.7873229980469, "b": 166.7500762939453, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 50]}], "orig": "FT: Model was trained on PubTabNet then finetuned.", "text": "FT: Model was trained on PubTabNet then finetuned."}, {"self_ref": "#/texts/279", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 50.11201477050781, "t": 147.6501922607422, "r": 286.3659973144531, "b": 78.84806823730469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 346]}], "orig": "Cell Detection. Like any object detector, our Cell BBox Detector provides bounding boxes that can be improved with post-processing during inference. We make use of the grid-like structure of tables to refine the predictions. A detailed explanation on the post-processing is available in the supplementary material. As shown in Tab. 3, we evaluate", "text": "Cell Detection. Like any object detector, our Cell BBox Detector provides bounding boxes that can be improved with post-processing during inference. We make use of the grid-like structure of tables to refine the predictions. A detailed explanation on the post-processing is available in the supplementary material. As shown in Tab. 3, we evaluate"}, {"self_ref": "#/texts/280", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 7, "bbox": {"l": 295.1210021972656, "t": 57.866641998291016, "r": 300.102294921875, "b": 48.960079193115234, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/281", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 308.86199951171875, "t": 716.7916259765625, "r": 545.1151733398438, "b": 564.4229125976562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 737]}], "orig": "our Cell BBox Decoder accuracy for cells with a class label of 'content' only using the PASCAL VOC mAP metric for pre-processing and post-processing. Note that we do not have post-processing results for SynthTabNet as images are only provided. To compare the performance of our proposed approach, we've integrated TableFormer's Cell BBox Decoder into EDD architecture. As mentioned previously, the Structure Decoder provides the Cell BBox Decoder with the features needed to predict the bounding box predictions. Therefore, the accuracy of the Structure Decoder directly influences the accuracy of the Cell BBox Decoder . If the Structure Decoder predicts an extra column, this will result in an extra column of predicted bounding boxes.", "text": "our Cell BBox Decoder accuracy for cells with a class label of 'content' only using the PASCAL VOC mAP metric for pre-processing and post-processing. Note that we do not have post-processing results for SynthTabNet as images are only provided. To compare the performance of our proposed approach, we've integrated TableFormer's Cell BBox Decoder into EDD architecture. As mentioned previously, the Structure Decoder provides the Cell BBox Decoder with the features needed to predict the bounding box predictions. Therefore, the accuracy of the Structure Decoder directly influences the accuracy of the Cell BBox Decoder . If the Structure Decoder predicts an extra column, this will result in an extra column of predicted bounding boxes."}, {"self_ref": "#/texts/282", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 7, "bbox": {"l": 308.86199951171875, "t": 475.5506896972656, "r": 545.1151733398438, "b": 454.68914794921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 94]}], "orig": "Table 3: Cell Bounding Box detection results on PubTabNet, and FinTabNet. PP: Post-processing.", "text": "Table 3: Cell Bounding Box detection results on PubTabNet, and FinTabNet. PP: Post-processing."}, {"self_ref": "#/texts/283", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 7, "bbox": {"l": 308.8619689941406, "t": 424.3202819824219, "r": 545.1156616210938, "b": 271.8323059082031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 715]}], "orig": "Cell Content. In this section, we evaluate the entire pipeline of recovering a table with content. Here we put our approach to test by capitalizing on extracting content from the PDF cells rather than decoding from images. Tab. 4 shows the TEDs score of HTML code representing the structure of the table along with the content inserted in the data cell and compared with the ground-truth. Our method achieved a 5.3% increase over the state-of-the-art, and commercial solutions. We believe our scores would be higher if the HTML ground-truth matched the extracted PDF cell content. Unfortunately, there are small discrepancies such as spacings around words or special characters with various unicode representations.", "text": "Cell Content. In this section, we evaluate the entire pipeline of recovering a table with content. Here we put our approach to test by capitalizing on extracting content from the PDF cells rather than decoding from images. Tab. 4 shows the TEDs score of HTML code representing the structure of the table along with the content inserted in the data cell and compared with the ground-truth. Our method achieved a 5.3% increase over the state-of-the-art, and commercial solutions. We believe our scores would be higher if the HTML ground-truth matched the extracted PDF cell content. Unfortunately, there are small discrepancies such as spacings around words or special characters with various unicode representations."}, {"self_ref": "#/texts/284", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 7, "bbox": {"l": 308.86199951171875, "t": 135.13864135742188, "r": 545.1151733398438, "b": 102.32206726074219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 148]}], "orig": "Table 4: Results of structure with content retrieved using cell detection on PubTabNet. In all cases the input is PDF documents with cropped tables.", "text": "Table 4: Results of structure with content retrieved using cell detection on PubTabNet. In all cases the input is PDF documents with cropped tables."}, {"self_ref": "#/texts/285", "parent": {"cref": "#/groups/4"}, "children": [], "label": "list_item", "prov": [{"page_no": 8, "bbox": {"l": 53.28603744506836, "t": 713.3124389648438, "r": 61.550289154052734, "b": 705.4392700195312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "a.", "text": "a.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/286", "parent": {"cref": "#/groups/4"}, "children": [], "label": "list_item", "prov": [{"page_no": 8, "bbox": {"l": 65.68241882324219, "t": 713.3124389648438, "r": 499.5556335449219, "b": 705.4392700195312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 105]}], "orig": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells", "text": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/287", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 53.81178283691406, "t": 697.7188720703125, "r": 284.3459167480469, "b": 689.845703125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 53]}], "orig": "Japanese language (previously unseen by TableFormer):", "text": "Japanese language (previously unseen by TableFormer):", "level": 1}, {"self_ref": "#/texts/288", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 304.830810546875, "t": 697.7188720703125, "r": 431.0911865234375, "b": 689.845703125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 29]}], "orig": "Example table from FinTabNet:", "text": "Example table from FinTabNet:", "level": 1}, {"self_ref": "#/texts/289", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 8, "bbox": {"l": 53.81178283691406, "t": 583.7667236328125, "r": 385.93450927734375, "b": 575.8935546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 79]}], "orig": "b. Structure predicted by TableFormer, with superimposed matched PDF cell text:", "text": "b. Structure predicted by TableFormer, with superimposed matched PDF cell text:"}, {"self_ref": "#/texts/290", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 8, "bbox": {"l": 380.42730712890625, "t": 499.69573974609375, "r": 549.4217529296875, "b": 493.39715576171875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 53]}], "orig": "Text is aligned to match original for ease of viewing", "text": "Text is aligned to match original for ease of viewing"}, {"self_ref": "#/texts/291", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 8, "bbox": {"l": 50.11199951171875, "t": 471.1226501464844, "r": 545.11376953125, "b": 426.3501281738281, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 397]}], "orig": "Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset.", "text": "Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset."}, {"self_ref": "#/texts/292", "parent": {"cref": "#/pictures/8"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 53.715248, "t": 410.22278, "r": 85.657333, "b": 405.55719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "Ground Truth", "text": "Ground Truth"}, {"self_ref": "#/texts/293", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.37939, "t": 391.44705, "r": 443.69870000000003, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "16", "text": "16"}, {"self_ref": "#/texts/294", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33203, "t": 391.44705, "r": 456.6513100000001, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "17", "text": "17"}, {"self_ref": "#/texts/295", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.28464, "t": 391.44705, "r": 469.60394, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "18", "text": "18"}, {"self_ref": "#/texts/296", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 476.23724000000004, "t": 391.44705, "r": 482.5565500000001, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "19", "text": "19"}, {"self_ref": "#/texts/297", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 489.18988, "t": 391.44705, "r": 495.50916, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "20", "text": "20"}, {"self_ref": "#/texts/298", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.14251999999993, "t": 391.44705, "r": 508.46178999999995, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "21", "text": "21"}, {"self_ref": "#/texts/299", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 515.09509, "t": 391.44705, "r": 521.41443, "b": 385.12842, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "22", "text": "22"}, {"self_ref": "#/texts/300", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 380.96163999999993, "r": 391.60071, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "23", "text": "23"}, {"self_ref": "#/texts/301", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 380.96163999999993, "r": 404.84271, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "24", "text": "24"}, {"self_ref": "#/texts/302", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.47604, "t": 380.96163999999993, "r": 417.79535, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "25", "text": "25"}, {"self_ref": "#/texts/303", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.37939, "t": 380.96163999999993, "r": 443.69870000000003, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "26", "text": "26"}, {"self_ref": "#/texts/304", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33203, "t": 380.96163999999993, "r": 456.6513100000001, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "27", "text": "27"}, {"self_ref": "#/texts/305", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.28464, "t": 380.96163999999993, "r": 469.60394, "b": 374.64301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "28", "text": "28"}, {"self_ref": "#/texts/306", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 370.9303, "r": 391.60071, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "30", "text": "30"}, {"self_ref": "#/texts/307", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 370.9303, "r": 404.84271, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "31", "text": "31"}, {"self_ref": "#/texts/308", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.47604, "t": 370.9303, "r": 417.79532, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "32", "text": "32"}, {"self_ref": "#/texts/309", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 424.42865, "t": 370.9303, "r": 430.74796, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "33", "text": "33"}, {"self_ref": "#/texts/310", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.38129, "t": 370.9303, "r": 443.70056, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "34", "text": "34"}, {"self_ref": "#/texts/311", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33389000000005, "t": 370.9303, "r": 456.65319999999997, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "35", "text": "35"}, {"self_ref": "#/texts/312", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.2865, "t": 370.9303, "r": 469.6058, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "36", "text": "36"}, {"self_ref": "#/texts/313", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 476.23914, "t": 370.9303, "r": 482.55841, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "37", "text": "37"}, {"self_ref": "#/texts/314", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 489.1917700000001, "t": 370.9303, "r": 495.51105, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "38", "text": "38"}, {"self_ref": "#/texts/315", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.14438, "t": 370.9303, "r": 508.46368, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "39", "text": "39"}, {"self_ref": "#/texts/316", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 515.09705, "t": 370.9303, "r": 521.41632, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "40", "text": "40"}, {"self_ref": "#/texts/317", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 528.04962, "t": 370.9303, "r": 534.3689, "b": 364.61166, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "41", "text": "41"}, {"self_ref": "#/texts/318", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 359.95569, "r": 391.60071, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "42", "text": "42"}, {"self_ref": "#/texts/319", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 359.95569, "r": 404.84271, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "43", "text": "43"}, {"self_ref": "#/texts/320", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.47604, "t": 359.95569, "r": 417.79532, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "44", "text": "44"}, {"self_ref": "#/texts/321", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 424.42865, "t": 359.95569, "r": 430.74796, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "45", "text": "45"}, {"self_ref": "#/texts/322", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.38129, "t": 359.95569, "r": 443.70056, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "46", "text": "46"}, {"self_ref": "#/texts/323", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33389000000005, "t": 359.95569, "r": 456.65319999999997, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "47", "text": "47"}, {"self_ref": "#/texts/324", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.2865, "t": 359.95569, "r": 469.6058, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "48", "text": "48"}, {"self_ref": "#/texts/325", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 476.23914, "t": 359.95569, "r": 482.55841, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "49", "text": "49"}, {"self_ref": "#/texts/326", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 489.1917700000001, "t": 359.95569, "r": 495.51105, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "50", "text": "50"}, {"self_ref": "#/texts/327", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.14438, "t": 359.95569, "r": 508.46368, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "51", "text": "51"}, {"self_ref": "#/texts/328", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 515.09705, "t": 359.95569, "r": 521.41632, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "52", "text": "52"}, {"self_ref": "#/texts/329", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 528.04962, "t": 359.95569, "r": 534.3689, "b": 353.63705, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "53", "text": "53"}, {"self_ref": "#/texts/330", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 402.79996, "r": 388.44073, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "0", "text": "0"}, {"self_ref": "#/texts/331", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 402.79996, "r": 401.68274, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/332", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.4754, "t": 402.79996, "r": 414.63474, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}, {"self_ref": "#/texts/333", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 424.4274, "t": 402.79996, "r": 427.58673, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "3", "text": "3"}, {"self_ref": "#/texts/334", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 437.37939, "t": 402.79996, "r": 440.53870000000006, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "4", "text": "4"}, {"self_ref": "#/texts/335", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 450.33136, "t": 402.79996, "r": 453.49069000000003, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "5", "text": "5"}, {"self_ref": "#/texts/336", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 463.28336, "t": 402.79996, "r": 466.44269, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "6", "text": "6"}, {"self_ref": "#/texts/337", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 476.23535, "t": 402.79996, "r": 479.39468, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "7", "text": "7"}, {"self_ref": "#/texts/338", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 489.18735, "t": 402.79996, "r": 492.34668, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/339", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.13933999999995, "t": 402.79996, "r": 505.29868000000005, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/340", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 515.09131, "t": 402.79996, "r": 521.41064, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/341", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 528.04364, "t": 402.79996, "r": 534.13104, "b": 396.48132, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "11", "text": "11"}, {"self_ref": "#/texts/342", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 385.2814, "t": 393.02536, "r": 391.60071, "b": 386.70673, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "12", "text": "12"}, {"self_ref": "#/texts/343", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 398.52341, "t": 393.02536, "r": 404.84271, "b": 386.70673, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "13", "text": "13"}, {"self_ref": "#/texts/344", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 411.47604, "t": 393.02536, "r": 417.79535, "b": 386.70673, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "14", "text": "14"}, {"self_ref": "#/texts/345", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 424.42719, "t": 385.22536999999994, "r": 430.74648999999994, "b": 378.90674, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "15", "text": "15"}, {"self_ref": "#/texts/346", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 502.86941999999993, "t": 381.00562, "r": 509.18871999999993, "b": 374.68698, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "29", "text": "29"}, {"self_ref": "#/texts/347", "parent": {"cref": "#/pictures/9"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 384.35437, "t": 410.22278, "r": 430.99261, "b": 405.55719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 19]}], "orig": "Predicted Structure", "text": "Predicted Structure"}, {"self_ref": "#/texts/348", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 8, "bbox": {"l": 62.595001220703125, "t": 333.2716369628906, "r": 532.6304931640625, "b": 324.3650817871094, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 112]}], "orig": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table.", "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table."}, {"self_ref": "#/texts/349", "parent": {"cref": "#/pictures/10"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 220.26282, "t": 410.22278, "r": 342.07819, "b": 405.55719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 49]}], "orig": "Red - PDF cells, Green - predicted bounding boxes", "text": "Red - PDF cells, Green - predicted bounding boxes"}, {"self_ref": "#/texts/350", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 50.11199951171875, "t": 300.6046447753906, "r": 163.75579833984375, "b": 290.7525939941406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 25]}], "orig": "5.5. Qualitative Analysis", "text": "5.5. Qualitative Analysis", "level": 1}, {"self_ref": "#/texts/351", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 50.11199951171875, "t": 255.1266326904297, "r": 286.3651123046875, "b": 78.84805297851562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 866]}], "orig": "We showcase several visualizations for the different components of our network on various \"complex\" tables within datasets presented in this work in Fig. 5 and Fig. 6 As it is shown, our model is able to predict bounding boxes for all table cells, even for the empty ones. Additionally, our post-processing techniques can extract the cell content by matching the predicted bounding boxes to the PDF cells based on their overlap and spatial proximity. The left part of Fig. 5 demonstrates also the adaptability of our method to any language, as it can successfully extract Japanese text, although the training set contains only English content. We provide more visualizations including the intermediate steps in the supplementary material. Overall these illustrations justify the versatility of our method across a diverse range of table appearances and content type.", "text": "We showcase several visualizations for the different components of our network on various \"complex\" tables within datasets presented in this work in Fig. 5 and Fig. 6 As it is shown, our model is able to predict bounding boxes for all table cells, even for the empty ones. Additionally, our post-processing techniques can extract the cell content by matching the predicted bounding boxes to the PDF cells based on their overlap and spatial proximity. The left part of Fig. 5 demonstrates also the adaptability of our method to any language, as it can successfully extract Japanese text, although the training set contains only English content. We provide more visualizations including the intermediate steps in the supplementary material. Overall these illustrations justify the versatility of our method across a diverse range of table appearances and content type."}, {"self_ref": "#/texts/352", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 308.86199951171875, "t": 301.29107666015625, "r": 460.8484802246094, "b": 290.5433654785156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 27]}], "orig": "6. Future Work & Conclusion", "text": "6. Future Work & Conclusion", "level": 1}, {"self_ref": "#/texts/353", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 8, "bbox": {"l": 308.86199951171875, "t": 279.10662841796875, "r": 545.1151733398438, "b": 138.69407653808594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 640]}], "orig": "In this paper, we presented TableFormer an end-to-end transformer based approach to predict table structures and bounding boxes of cells from an image. This approach enables us to recreate the table structure, and extract the cell content from PDF or OCR by using bounding boxes. Additionally, it provides the versatility required in real-world scenarios when dealing with various types of PDF documents, and languages. Furthermore, our method outperforms all state-of-the-arts with a wide margin. Finally, we introduce \"SynthTabNet\" a challenging synthetically generated dataset that reinforces missing characteristics from other datasets.", "text": "In this paper, we presented TableFormer an end-to-end transformer based approach to predict table structures and bounding boxes of cells from an image. This approach enables us to recreate the table structure, and extract the cell content from PDF or OCR by using bounding boxes. Additionally, it provides the versatility required in real-world scenarios when dealing with various types of PDF documents, and languages. Furthermore, our method outperforms all state-of-the-arts with a wide margin. Finally, we introduce \"SynthTabNet\" a challenging synthetically generated dataset that reinforces missing characteristics from other datasets."}, {"self_ref": "#/texts/354", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 8, "bbox": {"l": 308.86199951171875, "t": 119.90107727050781, "r": 364.4058532714844, "b": 109.15335845947266, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "References", "text": "References", "level": 1}, {"self_ref": "#/texts/355", "parent": {"cref": "#/groups/5"}, "children": [], "label": "list_item", "prov": [{"page_no": 8, "bbox": {"l": 313.3450012207031, "t": 98.0382080078125, "r": 545.1134033203125, "b": 79.06324768066406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 121]}], "orig": "[1] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-", "text": "[1] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/356", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 8, "bbox": {"l": 295.1210021972656, "t": 57.866634368896484, "r": 300.102294921875, "b": 48.9600715637207, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "8", "text": "8"}, {"self_ref": "#/texts/357", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 70.03099822998047, "t": 716.1162109375, "r": 286.36334228515625, "b": 675.2242431640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 212]}], "orig": "end object detection with transformers. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision - ECCV 2020 , pages 213-229, Cham, 2020. Springer International Publishing. 5", "text": "end object detection with transformers. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision - ECCV 2020 , pages 213-229, Cham, 2020. Springer International Publishing. 5", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/358", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.59500503540039, "t": 671.96826171875, "r": 286.36334228515625, "b": 642.0343017578125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 165]}], "orig": "[2] Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian-Ling Mao. Complicated table structure recognition. arXiv preprint arXiv:1908.04729 , 2019. 3", "text": "[2] Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian-Ling Mao. Complicated table structure recognition. arXiv preprint arXiv:1908.04729 , 2019. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/359", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.595001220703125, "t": 638.7783203125, "r": 286.3630065917969, "b": 608.8453369140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 125]}], "orig": "[3] Bertrand Couasnon and Aurelie Lemaitre. Recognition of Tables and Forms , pages 647-677. Springer London, London, 2014. 2", "text": "[3] Bertrand Couasnon and Aurelie Lemaitre. Recognition of Tables and Forms , pages 647-677. Springer London, London, 2014. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/360", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.59498977661133, "t": 605.58935546875, "r": 286.364013671875, "b": 564.6964111328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 216]}], "orig": "[4] Herv'e D'ejean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), Apr. 2019. http://sac.founderit.com/. 2", "text": "[4] Herv'e D'ejean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), Apr. 2019. http://sac.founderit.com/. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/361", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.5949821472168, "t": 561.4404296875, "r": 286.36334228515625, "b": 520.5484619140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 236]}], "orig": "[5] Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and Stavros J Perantonis. Automatic table detection in document images. In International Conference on Pattern Recognition and Image Analysis , pages 609-618. Springer, 2005. 2", "text": "[5] Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and Stavros J Perantonis. Automatic table detection in document images. In International Conference on Pattern Recognition and Image Analysis , pages 609-618. Springer, 2005. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/362", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.594970703125, "t": 517.2924194335938, "r": 286.36676025390625, "b": 476.3995056152344, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 193]}], "orig": "[6] Max Gobel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013. 2", "text": "[6] Max Gobel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/363", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.59498977661133, "t": 473.1434631347656, "r": 286.3631896972656, "b": 443.2104797363281, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 165]}], "orig": "[7] EA Green and M Krishnamoorthy. Recognition of tables using table grammars. procs. In Symposium on Document Analysis and Recognition (SDAIR'95) , pages 261-277. 2", "text": "[7] EA Green and M Krishnamoorthy. Recognition of tables using table grammars. procs. In Symposium on Document Analysis and Recognition (SDAIR'95) , pages 261-277. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/364", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.59498596191406, "t": 439.9544372558594, "r": 286.3633117675781, "b": 388.1025085449219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 273]}], "orig": "[8] Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. Castabdetectors: Cascade network for table detection in document images with recursive feature pyramid and switchable atrous convolution. Journal of Imaging , 7(10), 2021. 1", "text": "[8] Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. Castabdetectors: Cascade network for table detection in document images with recursive feature pyramid and switchable atrous convolution. Journal of Imaging , 7(10), 2021. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/365", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 54.595001220703125, "t": 384.84747314453125, "r": 286.3598937988281, "b": 354.9135437011719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 170]}], "orig": "[9] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) , Oct 2017. 1", "text": "[9] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) , Oct 2017. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/366", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11199951171875, "t": 351.6575012207031, "r": 286.36334228515625, "b": 310.7645568847656, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 226]}], "orig": "[10] Yelin He, X. Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, and Rong Xiao. Pingan-vcgroup's solution for icdar 2021 competition on scientific table image recognition to latex. ArXiv , abs/2105.01846, 2021. 2", "text": "[10] Yelin He, X. Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, and Rong Xiao. Pingan-vcgroup's solution for icdar 2021 competition on scientific table image recognition to latex. ArXiv , abs/2105.01846, 2021. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/367", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11199951171875, "t": 307.509521484375, "r": 286.3633117675781, "b": 255.65762329101562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 239]}], "orig": "[11] Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and Gordon Wilfong. Medium-independent table detection. In Document Recognition and Retrieval VII , volume 3967, pages 291-302. International Society for Optics and Photonics, 1999. 2", "text": "[11] Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and Gordon Wilfong. Medium-independent table detection. In Document Recognition and Retrieval VII , volume 3967, pages 291-302. International Society for Optics and Photonics, 1999. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/368", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11200714111328, "t": 252.40158081054688, "r": 286.36334228515625, "b": 200.55062866210938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 240]}], "orig": "[12] Matthew Hurst. A constraint-based approach to table structure derivation. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2 , ICDAR '03, page 911, USA, 2003. IEEE Computer Society. 2", "text": "[12] Matthew Hurst. A constraint-based approach to table structure derivation. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2 , ICDAR '03, page 911, USA, 2003. IEEE Computer Society. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/369", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11200714111328, "t": 197.29458618164062, "r": 286.3633117675781, "b": 145.442626953125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 283]}], "orig": "[13] Thotreingam Kasar, Philippine Barlas, Sebastien Adam, Cl'ement Chatelain, and Thierry Paquet. Learning to detect tables in scanned document images using line information. In 2013 12th International Conference on Document Analysis and Recognition , pages 1185-1189. IEEE, 2013. 2", "text": "[13] Thotreingam Kasar, Philippine Barlas, Sebastien Adam, Cl'ement Chatelain, and Thierry Paquet. Learning to detect tables in scanned document images using line information. In 2013 12th International Conference on Document Analysis and Recognition , pages 1185-1189. IEEE, 2013. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/370", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11199188232422, "t": 142.18658447265625, "r": 286.36334228515625, "b": 112.25361633300781, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 142]}], "orig": "[14] Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank Singh. Icdar 2021 competition on scientific table image recognition to latex, 2021. 2", "text": "[14] Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank Singh. Icdar 2021 competition on scientific table image recognition to latex, 2021. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/371", "parent": {"cref": "#/groups/6"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 50.11199188232422, "t": 108.99756622314453, "r": 286.35931396484375, "b": 79.06361389160156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 127]}], "orig": "[15] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly , 2(1-2):83-97, 1955. 6", "text": "[15] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly , 2(1-2):83-97, 1955. 6", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/372", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 9, "bbox": {"l": 295.12103271484375, "t": 57.86741256713867, "r": 300.1023254394531, "b": 48.96084976196289, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "9", "text": "9"}, {"self_ref": "#/texts/373", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.8619689941406, "t": 716.1165771484375, "r": 545.11474609375, "b": 653.306640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 287]}], "orig": "[16] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence , 35(12):2891-2903, 2013. 4", "text": "[16] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence , 35(12):2891-2903, 2013. 4", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/374", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 649.8766479492188, "r": 545.1134033203125, "b": 619.9436645507812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 156]}], "orig": "[17] Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, and Zhoujun Li. Tablebank: A benchmark dataset for table detection and recognition, 2019. 2, 3", "text": "[17] Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, and Zhoujun Li. Tablebank: A benchmark dataset for table detection and recognition, 2019. 2, 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/375", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 616.513671875, "r": 545.113525390625, "b": 531.7857666015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 407]}], "orig": "[18] Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and Xianhui Liu. Gfte: Graph-based financial table extraction. In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, and Roberto Vezzani, editors, Pattern Recognition. ICPR International Workshops and Challenges , pages 644-658, Cham, 2021. Springer International Publishing. 2, 3", "text": "[18] Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and Xianhui Liu. Gfte: Graph-based financial table extraction. In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, and Roberto Vezzani, editors, Pattern Recognition. ICPR International Workshops and Challenges , pages 644-658, Cham, 2021. Springer International Publishing. 2, 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/376", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 528.3557739257812, "r": 545.1141967773438, "b": 465.5458679199219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 328]}], "orig": "[19] Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Robust pdf document conversion using recurrent neural networks. Proceedings of the AAAI Conference on Artificial Intelligence , 35(17):15137-15145, May 2021. 1", "text": "[19] Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Robust pdf document conversion using recurrent neural networks. Proceedings of the AAAI Conference on Artificial Intelligence , 35(17):15137-15145, May 2021. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/377", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 462.1158142089844, "r": 545.1160888671875, "b": 421.2228698730469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 229]}], "orig": "[20] Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, and Gui-Song Xia. Parsing table structures in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 944-952, 2021. 2", "text": "[20] Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, and Gui-Song Xia. Parsing table structures in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 944-952, 2021. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/378", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 417.7938232421875, "r": 545.1134643554688, "b": 354.9829406738281, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 315]}], "orig": "[21] Shubham Singh Paliwal, D Vishwanath, Rohit Rahul, Monika Sharma, and Lovekesh Vig. Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 128-133. IEEE, 2019. 1", "text": "[21] Shubham Singh Paliwal, D Vishwanath, Rohit Rahul, Monika Sharma, and Lovekesh Vig. Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 128-133. IEEE, 2019. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/379", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 351.55389404296875, "r": 545.11474609375, "b": 233.94903564453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 592]}], "orig": "[22] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch'e-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32 , pages 8024-8035. Curran Associates, Inc., 2019. 6", "text": "[22] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch'e-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32 , pages 8024-8035. Curran Associates, Inc., 2019. 6", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/380", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 230.5189971923828, "r": 545.1134033203125, "b": 167.7090301513672, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 322]}], "orig": "[23] Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. Cascadetabnet: An approach for end to end table detection and structure recognition from image-based documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 572-573, 2020. 1", "text": "[23] Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. Cascadetabnet: An approach for end to end table detection and structure recognition from image-based documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 572-573, 2020. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/381", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.86199951171875, "t": 164.27899169921875, "r": 545.1162109375, "b": 123.38601684570312, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 224]}], "orig": "[24] Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. Rethinking table recognition using graph neural networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 142-147. IEEE, 2019. 3", "text": "[24] Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. Rethinking table recognition using graph neural networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 142-147. IEEE, 2019. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/382", "parent": {"cref": "#/groups/7"}, "children": [], "label": "list_item", "prov": [{"page_no": 9, "bbox": {"l": 308.8620300292969, "t": 119.95699310302734, "r": 545.1134033203125, "b": 79.06402587890625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 229]}], "orig": "[25] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on", "text": "[25] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/383", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 10, "bbox": {"l": 70.03099822998047, "t": 716.1162109375, "r": 286.36175537109375, "b": 697.1412353515625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 64]}], "orig": "Computer Vision and Pattern Recognition , pages 658-666, 2019. 6", "text": "Computer Vision and Pattern Recognition , pages 658-666, 2019. 6"}, {"self_ref": "#/texts/384", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 693.834228515625, "r": 286.36578369140625, "b": 631.0233154296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 302]}], "orig": "[26] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 11621167, 2017. 1", "text": "[26] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 11621167, 2017. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/385", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 627.71533203125, "r": 286.3633728027344, "b": 564.9053955078125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 308]}], "orig": "[27] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) , volume 1, pages 1162-1167. IEEE, 2017. 3", "text": "[27] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) , volume 1, pages 1162-1167. IEEE, 2017. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/386", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 561.597412109375, "r": 286.36578369140625, "b": 520.7044677734375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 183]}], "orig": "[28] Faisal Shafait and Ray Smith. Table detection in heterogeneous documents. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems , pages 6572, 2010. 2", "text": "[28] Faisal Shafait and Ray Smith. Table detection in heterogeneous documents. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems , pages 6572, 2010. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/387", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 517.3964233398438, "r": 286.36627197265625, "b": 465.5455017089844, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 275]}], "orig": "[29] Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. Deeptabstr: Deep learning based table structure recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1403-1409. IEEE, 2019. 3", "text": "[29] Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. Deeptabstr: Deep learning based table structure recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1403-1409. IEEE, 2019. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/388", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 462.2374572753906, "r": 286.36334228515625, "b": 410.3855285644531, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 251]}], "orig": "[30] Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD , KDD '18, pages 774-782, New York, NY, USA, 2018. ACM. 1", "text": "[30] Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD , KDD '18, pages 774-782, New York, NY, USA, 2018. ACM. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/389", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 407.0774841308594, "r": 286.3638916015625, "b": 333.3085632324219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 366]}], "orig": "[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30 , pages 5998-6008. Curran Associates, Inc., 2017. 5", "text": "[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30 , pages 5998-6008. Curran Associates, Inc., 2017. 5", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/390", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11200714111328, "t": 330.0005187988281, "r": 286.36334228515625, "b": 289.1075744628906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 221]}], "orig": "[32] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2015. 2", "text": "[32] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2015. 2", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/391", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11201477050781, "t": 285.7995300292969, "r": 286.3633728027344, "b": 244.90756225585938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 217]}], "orig": "[33] Wenyuan Xue, Qingyong Li, and Dacheng Tao. Res2tim: reconstruct syntactic structures from table images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 749-755. IEEE, 2019. 3", "text": "[33] Wenyuan Xue, Qingyong Li, and Dacheng Tao. Res2tim: reconstruct syntactic structures from table images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 749-755. IEEE, 2019. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/392", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.112022399902344, "t": 241.59951782226562, "r": 286.3633728027344, "b": 200.70655822753906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 190]}], "orig": "[34] Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. Tgrnet: A table graph reconstruction network for table structure recognition. arXiv preprint arXiv:2106.10598 , 2021. 3", "text": "[34] Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. Tgrnet: A table graph reconstruction network for table structure recognition. arXiv preprint arXiv:2106.10598 , 2021. 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/393", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.112030029296875, "t": 197.3985137939453, "r": 286.3634033203125, "b": 156.50555419921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 220]}], "orig": "[35] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4651-4659, 2016. 4", "text": "[35] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4651-4659, 2016. 4", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/394", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.112022399902344, "t": 153.197509765625, "r": 286.3633728027344, "b": 101.34652709960938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 280]}], "orig": "[36] Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and Nancy Xin Ru Wang. Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. Winter Conference for Applications in Computer Vision (WACV) , 2021. 2, 3", "text": "[36] Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and Nancy Xin Ru Wang. Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. Winter Conference for Applications in Computer Vision (WACV) , 2021. 2, 3", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/395", "parent": {"cref": "#/groups/8"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 50.11201477050781, "t": 98.03849792480469, "r": 286.36334228515625, "b": 79.06353759765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 106]}], "orig": "[37] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. Image-based table recognition: Data, model,", "text": "[37] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. Image-based table recognition: Data, model,", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/396", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 10, "bbox": {"l": 292.6300048828125, "t": 57.867008209228516, "r": 302.59259033203125, "b": 48.960445404052734, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "10", "text": "10"}, {"self_ref": "#/texts/397", "parent": {"cref": "#/groups/9"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 328.781005859375, "t": 716.1165161132812, "r": 545.1145629882812, "b": 675.2245483398438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 192]}], "orig": "and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision ECCV 2020 , pages 564-580, Cham, 2020. Springer International Publishing. 2, 3, 7", "text": "and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision ECCV 2020 , pages 564-580, Cham, 2020. Springer International Publishing. 2, 3, 7", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/398", "parent": {"cref": "#/groups/9"}, "children": [], "label": "list_item", "prov": [{"page_no": 10, "bbox": {"l": 308.86199951171875, "t": 671.2855224609375, "r": 545.1133422851562, "b": 630.392578125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 221]}], "orig": "[38] Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Publaynet: Largest dataset ever for document layout analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1015-1022, 2019. 1", "text": "[38] Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Publaynet: Largest dataset ever for document layout analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1015-1022, 2019. 1", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/399", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 132.8419952392578, "t": 681.4251098632812, "r": 465.37591552734375, "b": 656.4699096679688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 83]}], "orig": "TableFormer: Table Structure Understanding with Transformers Supplementary Material", "text": "TableFormer: Table Structure Understanding with Transformers Supplementary Material", "level": 1}, {"self_ref": "#/texts/400", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 630.839111328125, "r": 175.96437072753906, "b": 620.0913696289062, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 26]}], "orig": "1. Details on the datasets", "text": "1. Details on the datasets", "level": 1}, {"self_ref": "#/texts/401", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 611.0206909179688, "r": 150.364013671875, "b": 601.1686401367188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 21]}], "orig": "1.1. Data preparation", "text": "1.1. Data preparation", "level": 1}, {"self_ref": "#/texts/402", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 592.0797119140625, "r": 286.3651428222656, "b": 403.8451843261719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 931]}], "orig": "As a first step of our data preparation process, we have calculated statistics over the datasets across the following dimensions: (1) table size measured in the number of rows and columns, (2) complexity of the table, (3) strictness of the provided HTML structure and (4) completeness (i.e. no omitted bounding boxes). A table is considered to be simple if it does not contain row spans or column spans. Additionally, a table has a strict HTML structure if every row has the same number of columns after taking into account any row or column spans. Therefore a strict HTML structure looks always rectangular. However, HTML is a lenient encoding format, i.e. tables with rows of different sizes might still be regarded as correct due to implicit display rules. These implicit rules leave room for ambiguity, which we want to avoid. As such, we prefer to have \"strict\" tables, i.e. tables where every row has exactly the same length.", "text": "As a first step of our data preparation process, we have calculated statistics over the datasets across the following dimensions: (1) table size measured in the number of rows and columns, (2) complexity of the table, (3) strictness of the provided HTML structure and (4) completeness (i.e. no omitted bounding boxes). A table is considered to be simple if it does not contain row spans or column spans. Additionally, a table has a strict HTML structure if every row has the same number of columns after taking into account any row or column spans. Therefore a strict HTML structure looks always rectangular. However, HTML is a lenient encoding format, i.e. tables with rows of different sizes might still be regarded as correct due to implicit display rules. These implicit rules leave room for ambiguity, which we want to avoid. As such, we prefer to have \"strict\" tables, i.e. tables where every row has exactly the same length."}, {"self_ref": "#/texts/403", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 400.5947265625, "r": 286.3651123046875, "b": 164.54029846191406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1149]}], "orig": "We have developed a technique that tries to derive a missing bounding box out of its neighbors. As a first step, we use the annotation data to generate the most fine-grained grid that covers the table structure. In case of strict HTML tables, all grid squares are associated with some table cell and in the presence of table spans a cell extends across multiple grid squares. When enough bounding boxes are known for a rectangular table, it is possible to compute the geometrical border lines between the grid rows and columns. Eventually this information is used to generate the missing bounding boxes. Additionally, the existence of unused grid squares indicates that the table rows have unequal number of columns and the overall structure is non-strict. The generation of missing bounding boxes for non-strict HTML tables is ambiguous and therefore quite challenging. Thus, we have decided to simply discard those tables. In case of PubTabNet we have computed missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes.", "text": "We have developed a technique that tries to derive a missing bounding box out of its neighbors. As a first step, we use the annotation data to generate the most fine-grained grid that covers the table structure. In case of strict HTML tables, all grid squares are associated with some table cell and in the presence of table spans a cell extends across multiple grid squares. When enough bounding boxes are known for a rectangular table, it is possible to compute the geometrical border lines between the grid rows and columns. Eventually this information is used to generate the missing bounding boxes. Additionally, the existence of unused grid squares indicates that the table rows have unequal number of columns and the overall structure is non-strict. The generation of missing bounding boxes for non-strict HTML tables is ambiguous and therefore quite challenging. Thus, we have decided to simply discard those tables. In case of PubTabNet we have computed missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes."}, {"self_ref": "#/texts/404", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 161.28985595703125, "r": 286.3649597167969, "b": 140.42730712890625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 92]}], "orig": "Figure 7 illustrates the distribution of the tables across different dimensions per dataset.", "text": "Figure 7 illustrates the distribution of the tables across different dimensions per dataset."}, {"self_ref": "#/texts/405", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 129.60986328125, "r": 153.60784912109375, "b": 119.7578125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 23]}], "orig": "1.2. Synthetic datasets", "text": "1.2. Synthetic datasets", "level": 1}, {"self_ref": "#/texts/406", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 50.11198425292969, "t": 110.66886901855469, "r": 286.36505126953125, "b": 77.852294921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 167]}], "orig": "Aiming to train and evaluate our models in a broader spectrum of table data we have synthesized four types of datasets. Each one contains tables with different appear-", "text": "Aiming to train and evaluate our models in a broader spectrum of table data we have synthesized four types of datasets. Each one contains tables with different appear-"}, {"self_ref": "#/texts/407", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 629.3448486328125, "r": 545.1151123046875, "b": 584.572265625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 221]}], "orig": "ances in regard to their size, structure, style and content. Every synthetic dataset contains 150k examples, summing up to 600k synthetic examples. All datasets are divided into Train, Test and Val splits (80%, 10%, 10%).", "text": "ances in regard to their size, structure, style and content. Every synthetic dataset contains 150k examples, summing up to 600k synthetic examples. All datasets are divided into Train, Test and Val splits (80%, 10%, 10%)."}, {"self_ref": "#/texts/408", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 580.7648315429688, "r": 545.1150512695312, "b": 559.9032592773438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 89]}], "orig": "The process of generating a synthetic dataset can be decomposed into the following steps:", "text": "The process of generating a synthetic dataset can be decomposed into the following steps:"}, {"self_ref": "#/texts/409", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 556.0947875976562, "r": 545.1151123046875, "b": 475.45721435546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 373]}], "orig": "1. Prepare styling and content templates: The styling templates have been manually designed and organized into groups of scope specific appearances (e.g. financial data, marketing data, etc.) Additionally, we have prepared curated collections of content templates by extracting the most frequently used terms out of non-synthetic datasets (e.g. PubTabNet, FinTabNet, etc.).", "text": "1. Prepare styling and content templates: The styling templates have been manually designed and organized into groups of scope specific appearances (e.g. financial data, marketing data, etc.) Additionally, we have prepared curated collections of content templates by extracting the most frequently used terms out of non-synthetic datasets (e.g. PubTabNet, FinTabNet, etc.).", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/410", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 471.6497802734375, "r": 545.1151733398438, "b": 343.19134521484375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 573]}], "orig": "2. Generate table structures: The structure of each synthetic dataset assumes a horizontal table header which potentially spans over multiple rows and a table body that may contain a combination of row spans and column spans. However, spans are not allowed to cross the header - body boundary. The table structure is described by the parameters: Total number of table rows and columns, number of header rows, type of spans (header only spans, row only spans, column only spans, both row and column spans), maximum span size and the ratio of the table area covered by spans.", "text": "2. Generate table structures: The structure of each synthetic dataset assumes a horizontal table header which potentially spans over multiple rows and a table body that may contain a combination of row spans and column spans. However, spans are not allowed to cross the header - body boundary. The table structure is described by the parameters: Total number of table rows and columns, number of header rows, type of spans (header only spans, row only spans, column only spans, both row and column spans), maximum span size and the ratio of the table area covered by spans.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/411", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 339.3839111328125, "r": 545.1151733398438, "b": 294.61138916015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 195]}], "orig": "3. Generate content: Based on the dataset theme , a set of suitable content templates is chosen first. Then, this content can be combined with purely random text to produce the synthetic content.", "text": "3. Generate content: Based on the dataset theme , a set of suitable content templates is chosen first. Then, this content can be combined with purely random text to produce the synthetic content.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/412", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 290.803955078125, "r": 545.1152954101562, "b": 246.0314178466797, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 218]}], "orig": "4. Apply styling templates: Depending on the domain of the synthetic dataset, a set of styling templates is first manually selected. Then, a style is randomly selected to format the appearance of the synthesized table.", "text": "4. Apply styling templates: Depending on the domain of the synthetic dataset, a set of styling templates is first manually selected. Then, a style is randomly selected to format the appearance of the synthesized table.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/413", "parent": {"cref": "#/groups/10"}, "children": [], "label": "list_item", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 242.22396850585938, "r": 545.1151733398438, "b": 185.4964141845703, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 238]}], "orig": "5. Render the complete tables: The synthetic table is finally rendered by a web browser engine to generate the bounding boxes for each table cell. A batching technique is utilized to optimize the runtime overhead of the rendering process.", "text": "5. Render the complete tables: The synthetic table is finally rendered by a web browser engine to generate the bounding boxes for each table cell. A batching technique is utilized to optimize the runtime overhead of the rendering process.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/414", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 11, "bbox": {"l": 308.86199951171875, "t": 169.70941162109375, "r": 545.1087646484375, "b": 145.01368713378906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 47]}], "orig": "2. Prediction post-processing for PDF documents", "text": "2. Prediction post-processing for PDF documents", "level": 1}, {"self_ref": "#/texts/415", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 11, "bbox": {"l": 308.8620300292969, "t": 134.57896423339844, "r": 545.1151733398438, "b": 77.85139465332031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 247]}], "orig": "Although TableFormer can predict the table structure and the bounding boxes for tables recognized inside PDF documents, this is not enough when a full reconstruction of the original table is required. This happens mainly due the following reasons:", "text": "Although TableFormer can predict the table structure and the bounding boxes for tables recognized inside PDF documents, this is not enough when a full reconstruction of the original table is required. This happens mainly due the following reasons:"}, {"self_ref": "#/texts/416", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 11, "bbox": {"l": 292.63104248046875, "t": 57.86696243286133, "r": 302.5936279296875, "b": 48.96039962768555, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "11", "text": "11"}, {"self_ref": "#/texts/417", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 626.4976196289062, "r": 545.1137084960938, "b": 605.6360473632812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 245]}], "orig": "Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.", "text": "Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity."}, {"self_ref": "#/texts/418", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 119.39108, "t": 714.68945, "r": 151.94641, "b": 708.74078, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "PubTabNet", "text": "PubTabNet"}, {"self_ref": "#/texts/419", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 53.345978, "t": 716.80847, "r": 59.327053, "b": 710.8598, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "b.", "text": "b."}, {"self_ref": "#/texts/420", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 289.5791, "t": 714.54169, "r": 319.8266, "b": 708.5930199999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "FinTabNet", "text": "FinTabNet"}, {"self_ref": "#/texts/421", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 448.37271, "t": 714.7460300000001, "r": 481.75916, "b": 708.79736, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "Table Bank", "text": "Table Bank"}, {"self_ref": "#/texts/422", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 82.553436, "t": 650.72382, "r": 94.976013, "b": 645.7666, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 5]}], "orig": "Train", "text": "Train"}, {"self_ref": "#/texts/423", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 63.03878399999999, "t": 690.89587, "r": 85.290085, "b": 685.9386600000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/424", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 67.76786, "t": 667.60468, "r": 85.231277, "b": 662.64746, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/425", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 227.55121, "t": 689.46008, "r": 249.80251, "b": 684.50287, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/426", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 232.19898999999998, "t": 665.0142200000001, "r": 249.66241, "b": 660.05701, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/427", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 396.2337, "t": 677.95477, "r": 413.69711, "b": 672.99756, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/428", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 97.382202, "t": 650.72382, "r": 105.08014, "b": 645.7666, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Val", "text": "Val"}, {"self_ref": "#/texts/429", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 60.93763400000001, "t": 706.26678, "r": 76.151443, "b": 701.30957, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "100%", "text": "100%"}, {"self_ref": "#/texts/430", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 82.304901, "t": 705.77649, "r": 106.99162, "b": 700.8192699999998, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 8]}], "orig": "500K 10K", "text": "500K 10K"}, {"self_ref": "#/texts/431", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 246.20530999999997, "t": 650.39392, "r": 281.88013, "b": 645.43671, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Train Test Val", "text": "Train Test Val"}, {"self_ref": "#/texts/432", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 226.69780000000003, "t": 706.26678, "r": 241.91161, "b": 701.30957, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "100%", "text": "100%"}, {"self_ref": "#/texts/433", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 249.93848999999997, "t": 705.91199, "r": 282.49384, "b": 700.95477, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 11]}], "orig": "91K 10K 10K", "text": "91K 10K 10K"}, {"self_ref": "#/texts/434", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 410.19409, "t": 650.72382, "r": 444.68915, "b": 645.7666, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Train Test Val", "text": "Train Test Val"}, {"self_ref": "#/texts/435", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 391.37341, "t": 706.26678, "r": 432.6716599999999, "b": 701.30957, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "100% 130K 5K", "text": "100% 130K 5K"}, {"self_ref": "#/texts/436", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 435.60571000000004, "t": 705.73859, "r": 445.62414999999993, "b": 700.78137, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "10K", "text": "10K"}, {"self_ref": "#/texts/437", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 113.94921, "t": 650.71155, "r": 136.20052, "b": 645.75433, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/438", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 116.91554000000001, "t": 697.18146, "r": 127.05433999999998, "b": 692.22424, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Non", "text": "Non"}, {"self_ref": "#/texts/439", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 113.3146, "t": 691.06146, "r": 127.05298, "b": 686.10425, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/440", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 112.94112, "t": 684.9414699999999, "r": 127.05537, "b": 679.98425, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/441", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 113.22738999999999, "t": 669.38477, "r": 126.96577, "b": 664.42755, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/442", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 112.85390000000001, "t": 663.26477, "r": 126.96814999999998, "b": 658.30756, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/443", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 138.57864, "t": 650.5636, "r": 156.04207, "b": 645.60638, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/444", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 122.03101, "t": 705.7287, "r": 151.04185, "b": 700.77148, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "230K 280K", "text": "230K 280K"}, {"self_ref": "#/texts/445", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 311.65359, "t": 705.44501, "r": 321.67203, "b": 700.4877899999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "65K", "text": "65K"}, {"self_ref": "#/texts/446", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 287.89441, "t": 650.28937, "r": 310.14572, "b": 645.33215, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/447", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 289.23572, "t": 698.92023, "r": 299.37451, "b": 693.96301, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Non", "text": "Non"}, {"self_ref": "#/texts/448", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 285.63513, "t": 692.80023, "r": 299.3735, "b": 687.8430199999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/449", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 285.26111, "t": 686.68024, "r": 299.37537, "b": 681.72302, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/450", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 285.43109, "t": 671.61005, "r": 299.16946, "b": 666.65283, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/451", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 285.05713, "t": 665.49005, "r": 299.17139, "b": 660.53284, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/452", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 311.34592, "t": 650.28937, "r": 328.80933, "b": 645.33215, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/453", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 299.58362, "t": 705.30646, "r": 309.60205, "b": 700.34924, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "47K", "text": "47K"}, {"self_ref": "#/texts/454", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 466.04077000000007, "t": 650.32831, "r": 483.50418, "b": 645.37109, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/455", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 459.02151, "t": 698.23883, "r": 469.16031000000004, "b": 693.28162, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "Non", "text": "Non"}, {"self_ref": "#/texts/456", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 455.4209, "t": 692.11884, "r": 469.15927000000005, "b": 687.16162, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Strict", "text": "Strict"}, {"self_ref": "#/texts/457", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 455.04691, "t": 685.9988399999999, "r": 469.16115999999994, "b": 681.04163, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "HTML", "text": "HTML"}, {"self_ref": "#/texts/458", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 467.39401, "t": 706.42761, "r": 480.6545100000001, "b": 701.4704, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "145K", "text": "145K"}, {"self_ref": "#/texts/459", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 160.37672, "t": 650.41614, "r": 182.62802, "b": 645.45892, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Complex", "text": "Complex"}, {"self_ref": "#/texts/460", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 153.74265, "t": 697.13519, "r": 173.32664, "b": 692.17798, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Contain", "text": "Contain"}, {"self_ref": "#/texts/461", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 154.50967, "t": 691.0152, "r": 173.3246, "b": 686.0579799999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Missing", "text": "Missing"}, {"self_ref": "#/texts/462", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 155.27162, "t": 684.8952, "r": 173.32664, "b": 679.9379900000001, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "bboxes", "text": "bboxes"}, {"self_ref": "#/texts/463", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 326.41302, "t": 684.76752, "r": 345.99701, "b": 679.8103, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Contain", "text": "Contain"}, {"self_ref": "#/texts/464", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 327.17972, "t": 678.64752, "r": 345.99463, "b": 673.69031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Missing", "text": "Missing"}, {"self_ref": "#/texts/465", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 327.94131, "t": 672.52753, "r": 345.99634, "b": 667.57031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "bboxes", "text": "bboxes"}, {"self_ref": "#/texts/466", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 488.9942, "t": 687.8462500000002, "r": 508.76384999999993, "b": 682.88904, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Dataset", "text": "Dataset"}, {"self_ref": "#/texts/467", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 490.1893, "t": 681.72626, "r": 508.76349000000005, "b": 676.7690399999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "doesn't", "text": "doesn't"}, {"self_ref": "#/texts/468", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 489.72009, "t": 675.60626, "r": 508.76758, "b": 670.6490499999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "provide", "text": "provide"}, {"self_ref": "#/texts/469", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 490.71121, "t": 669.48627, "r": 508.76624, "b": 664.52905, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "bboxes", "text": "bboxes"}, {"self_ref": "#/texts/470", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 185.37759, "t": 650.28882, "r": 202.84102, "b": 645.3316, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/471", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 168.50357, "t": 705.86389, "r": 197.52699, "b": 700.90668, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 9]}], "orig": "230K 280K", "text": "230K 280K"}, {"self_ref": "#/texts/472", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 357.3768, "t": 706.00293, "r": 367.39523, "b": 701.04572, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "65K", "text": "65K"}, {"self_ref": "#/texts/473", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 333.73151, "t": 650.37677, "r": 374.92862, "b": 645.41956, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 14]}], "orig": "Complex Simple", "text": "Complex Simple"}, {"self_ref": "#/texts/474", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 345.69101, "t": 705.94409, "r": 355.70944, "b": 700.9868799999999, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 3]}], "orig": "47K", "text": "47K"}, {"self_ref": "#/texts/475", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 508.54248, "t": 650.62317, "r": 526.00592, "b": 645.66595, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 6]}], "orig": "Simple", "text": "Simple"}, {"self_ref": "#/texts/476", "parent": {"cref": "#/pictures/11"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 510.44653000000005, "t": 705.9074100000001, "r": 523.70703, "b": 700.9502, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 4]}], "orig": "145K", "text": "145K"}, {"self_ref": "#/texts/477", "parent": {"cref": "#/groups/11"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 61.569000244140625, "t": 581.068603515625, "r": 286.3651123046875, "b": 560.20703125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 61]}], "orig": "\u00b7 TableFormer output does not include the table cell content.", "text": "\u00b7 TableFormer output does not include the table cell content.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/478", "parent": {"cref": "#/groups/11"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 61.569000244140625, "t": 547.9285888671875, "r": 286.3651428222656, "b": 527.0670166015625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 77]}], "orig": "\u00b7 There are occasional inaccuracies in the predictions of the bounding boxes.", "text": "\u00b7 There are occasional inaccuracies in the predictions of the bounding boxes.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/479", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 512.7965698242188, "r": 286.3651123046875, "b": 396.2931213378906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 545]}], "orig": "However, it is possible to mitigate those limitations by combining the TableFormer predictions with the information already present inside a programmatic PDF document. More specifically, PDF documents can be seen as a sequence of PDF cells where each cell is described by its content and bounding box. If we are able to associate the PDF cells with the predicted table cells, we can directly link the PDF cell content to the table cell structure and use the PDF bounding boxes to correct misalignments in the predicted table cell bounding boxes.", "text": "However, it is possible to mitigate those limitations by combining the TableFormer predictions with the information already present inside a programmatic PDF document. More specifically, PDF documents can be seen as a sequence of PDF cells where each cell is described by its content and bounding box. If we are able to associate the PDF cells with the predicted table cells, we can directly link the PDF cell content to the table cell structure and use the PDF bounding boxes to correct misalignments in the predicted table cell bounding boxes."}, {"self_ref": "#/texts/480", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 392.9306640625, "r": 286.3649597167969, "b": 372.068115234375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 68]}], "orig": "Here is a step-by-step description of the prediction postprocessing:", "text": "Here is a step-by-step description of the prediction postprocessing:"}, {"self_ref": "#/texts/481", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 368.7046813964844, "r": 286.3650817871094, "b": 335.8881530761719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 173]}], "orig": "1. Get the minimal grid dimensions - number of rows and columns for the predicted table structure. This represents the most granular grid for the underlying table structure.", "text": "1. Get the minimal grid dimensions - number of rows and columns for the predicted table structure. This represents the most granular grid for the underlying table structure.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/482", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 332.52471923828125, "r": 286.36505126953125, "b": 287.7532043457031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 187]}], "orig": "2. Generate pair-wise matches between the bounding boxes of the PDF cells and the predicted cells. The Intersection Over Union (IOU) metric is used to evaluate the quality of the matches.", "text": "2. Generate pair-wise matches between the bounding boxes of the PDF cells and the predicted cells. The Intersection Over Union (IOU) metric is used to evaluate the quality of the matches.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/483", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 284.3897705078125, "r": 286.36492919921875, "b": 263.5272216796875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 97]}], "orig": "3. Use a carefully selected IOU threshold to designate the matches as \"good\" ones and \"bad\" ones.", "text": "3. Use a carefully selected IOU threshold to designate the matches as \"good\" ones and \"bad\" ones.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/484", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 260.164794921875, "r": 286.3651123046875, "b": 227.34722900390625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 131]}], "orig": "3.a. If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column.", "text": "3.a. If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/485", "parent": {"cref": "#/groups/12"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 223.98377990722656, "r": 286.3650817871094, "b": 191.16722106933594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 169]}], "orig": "4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. The alignment of the column can be identified by the following formula:", "text": "4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. The alignment of the column can be identified by the following formula:", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/486", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 12, "bbox": {"l": 110.70498657226562, "t": 168.5640869140625, "r": 286.3623962402344, "b": 137.89439392089844, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 81]}], "orig": "alignment = arg min c { D$_{c}$ } D$_{c}$ = max { x$_{c}$ } - min { x$_{c}$ } (4)", "text": ""}, {"self_ref": "#/texts/487", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 124.6520767211914, "r": 286.36199951171875, "b": 103.07321166992188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 103]}], "orig": "where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point.", "text": "where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point."}, {"self_ref": "#/texts/488", "parent": {"cref": "#/groups/13"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 50.11199951171875, "t": 99.70977783203125, "r": 286.3649597167969, "b": 78.84821319580078, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 110]}], "orig": "5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-", "text": "5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/489", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 308.86199951171875, "t": 581.0687866210938, "r": 545.1151733398438, "b": 536.2962036132812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 183]}], "orig": "dian cell size for all table cells. The usage of median during the computations, helps to eliminate outliers caused by occasional column spans which are usually wider than the normal.", "text": "dian cell size for all table cells. The usage of median during the computations, helps to eliminate outliers caused by occasional column spans which are usually wider than the normal."}, {"self_ref": "#/texts/490", "parent": {"cref": "#/groups/14"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.86199951171875, "t": 532.8977661132812, "r": 545.114990234375, "b": 512.0361938476562, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 91]}], "orig": "6. Snap all cells with bad IOU to their corresponding median x -coordinates and cell sizes.", "text": "6. Snap all cells with bad IOU to their corresponding median x -coordinates and cell sizes.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/491", "parent": {"cref": "#/groups/14"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.8620300292969, "t": 508.6367492675781, "r": 545.1151123046875, "b": 404.08929443359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 471]}], "orig": "7. Generate a new set of pair-wise matches between the corrected bounding boxes and PDF cells. This time use a modified version of the IOU metric, where the area of the intersection between the predicted and PDF cells is divided by the PDF cell area. In case there are multiple matches for the same PDF cell, the prediction with the higher score is preferred. This covers the cases where the PDF cells are smaller than the area of predicted or corrected prediction cells.", "text": "7. Generate a new set of pair-wise matches between the corrected bounding boxes and PDF cells. This time use a modified version of the IOU metric, where the area of the intersection between the predicted and PDF cells is divided by the PDF cell area. In case there are multiple matches for the same PDF cell, the prediction with the higher score is preferred. This covers the cases where the PDF cells are smaller than the area of predicted or corrected prediction cells.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/492", "parent": {"cref": "#/groups/14"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.8620300292969, "t": 400.6898498535156, "r": 545.1151733398438, "b": 332.00836181640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 311]}], "orig": "8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score.", "text": "8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/493", "parent": {"cref": "#/groups/14"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.8620300292969, "t": 328.6089172363281, "r": 545.1151733398438, "b": 224.06141662597656, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 503]}], "orig": "9. Pick up the remaining orphan cells. There could be cases, when after applying all the previous post-processing steps, some PDF cells could still remain without any match to predicted cells. However, it is still possible to deduce the correct matching for an orphan PDF cell by mapping its bounding box on the geometry of the grid. This mapping decides if the content of the orphan cell will be appended to an already matched table cell, or a new table cell should be created to match with the orphan.", "text": "9. Pick up the remaining orphan cells. There could be cases, when after applying all the previous post-processing steps, some PDF cells could still remain without any match to predicted cells. However, it is still possible to deduce the correct matching for an orphan PDF cell by mapping its bounding box on the geometry of the grid. This mapping decides if the content of the orphan cell will be appended to an already matched table cell, or a new table cell should be created to match with the orphan.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/494", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 12, "bbox": {"l": 308.8620300292969, "t": 220.66197204589844, "r": 545.1168823242188, "b": 187.8454132080078, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 113]}], "orig": "9a. Compute the top and bottom boundary of the horizontal band for each grid row (min/max y coordinates per row).", "text": "9a. Compute the top and bottom boundary of the horizontal band for each grid row (min/max y coordinates per row)."}, {"self_ref": "#/texts/495", "parent": {"cref": "#/groups/15"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.862060546875, "t": 184.44696044921875, "r": 545.1150512695312, "b": 163.58441162109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 101]}], "orig": "9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.", "text": "9b. Intersect the orphan's bounding box with the row bands, and map the cell to the closest grid row.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/496", "parent": {"cref": "#/groups/15"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.862060546875, "t": 160.18597412109375, "r": 545.1150512695312, "b": 127.3694076538086, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 117]}], "orig": "9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column).", "text": "9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column).", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/497", "parent": {"cref": "#/groups/15"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.862060546875, "t": 123.969970703125, "r": 545.114990234375, "b": 103.10841369628906, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 107]}], "orig": "9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column.", "text": "9d. Intersect the orphan's bounding box with the column bands, and map the cell to the closest grid column.", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/498", "parent": {"cref": "#/groups/15"}, "children": [], "label": "list_item", "prov": [{"page_no": 12, "bbox": {"l": 308.862060546875, "t": 99.70997619628906, "r": 545.1151733398438, "b": 78.84840393066406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 118]}], "orig": "9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-", "text": "9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-", "enumerated": false, "marker": "-"}, {"self_ref": "#/texts/499", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 12, "bbox": {"l": 292.6310729980469, "t": 57.86697006225586, "r": 302.5936584472656, "b": 48.96040725708008, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "12", "text": "12"}, {"self_ref": "#/texts/500", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 13, "bbox": {"l": 50.11199951171875, "t": 716.7916259765625, "r": 88.84658813476562, "b": 707.8850708007812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 10]}], "orig": "phan cell.", "text": "phan cell."}, {"self_ref": "#/texts/501", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 13, "bbox": {"l": 50.11199951171875, "t": 704.8366088867188, "r": 286.3649597167969, "b": 683.9750366210938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 76]}], "orig": "9f. Otherwise create a new structural cell and match it wit the orphan cell.", "text": "9f. Otherwise create a new structural cell and match it wit the orphan cell."}, {"self_ref": "#/texts/502", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 13, "bbox": {"l": 50.11199951171875, "t": 680.8369140625, "r": 286.364990234375, "b": 660.2941284179688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 97]}], "orig": "Aditional images with examples of TableFormer predictions and post-processing can be found below.", "text": "Aditional images with examples of TableFormer predictions and post-processing can be found below."}, {"self_ref": "#/texts/503", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 13, "bbox": {"l": 63.340999603271484, "t": 289.9436340332031, "r": 273.1334228515625, "b": 281.0370788574219, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 52]}], "orig": "Figure 8: Example of a table with multi-line header.", "text": "Figure 8: Example of a table with multi-line header."}, {"self_ref": "#/texts/504", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 13, "bbox": {"l": 292.6309814453125, "t": 57.866641998291016, "r": 302.59356689453125, "b": 48.960079193115234, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "13", "text": "13"}, {"self_ref": "#/texts/505", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 13, "bbox": {"l": 308.86199951171875, "t": 485.4016418457031, "r": 545.1151123046875, "b": 464.54010009765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 67]}], "orig": "Figure 9: Example of a table with big empty distance between cells.", "text": "Figure 9: Example of a table with big empty distance between cells."}, {"self_ref": "#/texts/506", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 13, "bbox": {"l": 312.3429870605469, "t": 111.50663757324219, "r": 541.63232421875, "b": 102.60006713867188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 55]}], "orig": "Figure 10: Example of a complex table with empty cells.", "text": "Figure 10: Example of a complex table with empty cells."}, {"self_ref": "#/texts/507", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 14, "bbox": {"l": 50.11199951171875, "t": 435.2296447753906, "r": 286.3650817871094, "b": 414.36810302734375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 61]}], "orig": "Figure 11: Simple table with different style and empty cells.", "text": "Figure 11: Simple table with different style and empty cells."}, {"self_ref": "#/texts/508", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 14, "bbox": {"l": 54.61899948120117, "t": 120.181640625, "r": 281.85589599609375, "b": 111.27507781982422, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 56]}], "orig": "Figure 12: Simple table predictions and post processing.", "text": "Figure 12: Simple table predictions and post processing."}, {"self_ref": "#/texts/509", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 14, "bbox": {"l": 292.6309814453125, "t": 57.86663818359375, "r": 302.59356689453125, "b": 48.96007537841797, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "14", "text": "14"}, {"self_ref": "#/texts/510", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 14, "bbox": {"l": 315.7900085449219, "t": 420.3156433105469, "r": 538.1852416992188, "b": 411.4090881347656, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 55]}], "orig": "Figure 13: Table predictions example on colorful table.", "text": "Figure 13: Table predictions example on colorful table."}, {"self_ref": "#/texts/511", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 14, "bbox": {"l": 344.9849853515625, "t": 108.45364379882812, "r": 508.9893493652344, "b": 99.54707336425781, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 40]}], "orig": "Figure 14: Example with multi-line text.", "text": "Figure 14: Example with multi-line text."}, {"self_ref": "#/texts/512", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 15, "bbox": {"l": 84.23300170898438, "t": 147.64862060546875, "r": 252.24224853515625, "b": 138.7420654296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 41]}], "orig": "Figure 15: Example with triangular table.", "text": "Figure 15: Example with triangular table."}, {"self_ref": "#/texts/513", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 15, "bbox": {"l": 292.6309814453125, "t": 57.86665725708008, "r": 302.59356689453125, "b": 48.9600944519043, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "15", "text": "15"}, {"self_ref": "#/texts/514", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 15, "bbox": {"l": 308.8619689941406, "t": 139.0646514892578, "r": 545.1151123046875, "b": 118.20308685302734, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 106]}], "orig": "Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.", "text": "Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact."}, {"self_ref": "#/texts/515", "parent": {"cref": "#/body"}, "children": [], "label": "caption", "prov": [{"page_no": 16, "bbox": {"l": 50.11199951171875, "t": 283.6626281738281, "r": 545.1138305664062, "b": 262.80108642578125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 153]}], "orig": "Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure.", "text": "Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure."}, {"self_ref": "#/texts/516", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 16, "bbox": {"l": 292.6309814453125, "t": 57.866641998291016, "r": 302.59356689453125, "b": 48.960079193115234, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 2]}], "orig": "16", "text": "16"}], "pictures": [{"self_ref": "#/pictures/0", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/8"}, {"cref": "#/texts/9"}, {"cref": "#/texts/10"}], "label": "picture", "prov": [{"page_no": 1, "bbox": {"l": 315.65362548828125, "t": 563.276611328125, "r": 537.1475219726562, "b": 489.1985778808594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/1", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/13"}, {"cref": "#/texts/14"}, {"cref": "#/texts/15"}, {"cref": "#/texts/16"}, {"cref": "#/texts/17"}, {"cref": "#/texts/18"}, {"cref": "#/texts/19"}, {"cref": "#/texts/20"}, {"cref": "#/texts/21"}, {"cref": "#/texts/22"}, {"cref": "#/texts/23"}, {"cref": "#/texts/24"}, {"cref": "#/texts/25"}, {"cref": "#/texts/26"}, {"cref": "#/texts/27"}, {"cref": "#/texts/28"}, {"cref": "#/texts/29"}, {"cref": "#/texts/30"}, {"cref": "#/texts/31"}, {"cref": "#/texts/32"}, {"cref": "#/texts/33"}, {"cref": "#/texts/34"}, {"cref": "#/texts/35"}, {"cref": "#/texts/36"}, {"cref": "#/texts/37"}], "label": "picture", "prov": [{"page_no": 1, "bbox": {"l": 314.78173828125, "t": 453.9347229003906, "r": 539.1802978515625, "b": 381.9505615234375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/2", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/39"}, {"cref": "#/texts/40"}, {"cref": "#/texts/41"}, {"cref": "#/texts/42"}, {"cref": "#/texts/43"}, {"cref": "#/texts/44"}, {"cref": "#/texts/45"}, {"cref": "#/texts/46"}, {"cref": "#/texts/47"}, {"cref": "#/texts/48"}, {"cref": "#/texts/49"}, {"cref": "#/texts/50"}, {"cref": "#/texts/51"}, {"cref": "#/texts/52"}, {"cref": "#/texts/53"}, {"cref": "#/texts/54"}, {"cref": "#/texts/55"}, {"cref": "#/texts/56"}, {"cref": "#/texts/57"}, {"cref": "#/texts/58"}, {"cref": "#/texts/59"}, {"cref": "#/texts/60"}, {"cref": "#/texts/61"}, {"cref": "#/texts/62"}], "label": "picture", "prov": [{"page_no": 1, "bbox": {"l": 315.7172546386719, "t": 358.176513671875, "r": 536.835693359375, "b": 295.9709777832031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/3", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/91"}, {"cref": "#/texts/92"}, {"cref": "#/texts/93"}, {"cref": "#/texts/94"}, {"cref": "#/texts/95"}, {"cref": "#/texts/96"}, {"cref": "#/texts/97"}, {"cref": "#/texts/98"}, {"cref": "#/texts/99"}, {"cref": "#/texts/100"}, {"cref": "#/texts/101"}, {"cref": "#/texts/102"}, {"cref": "#/texts/103"}, {"cref": "#/texts/104"}, {"cref": "#/texts/105"}, {"cref": "#/texts/106"}, {"cref": "#/texts/107"}, {"cref": "#/texts/108"}, {"cref": "#/texts/109"}, {"cref": "#/texts/110"}, {"cref": "#/texts/111"}, {"cref": "#/texts/112"}, {"cref": "#/texts/113"}, {"cref": "#/texts/114"}, {"cref": "#/texts/115"}, {"cref": "#/texts/116"}, {"cref": "#/texts/117"}, {"cref": "#/texts/118"}, {"cref": "#/texts/119"}, {"cref": "#/texts/120"}, {"cref": "#/texts/121"}, {"cref": "#/texts/122"}, {"cref": "#/texts/123"}], "label": "picture", "prov": [{"page_no": 3, "bbox": {"l": 312.10369873046875, "t": 713.5591430664062, "r": 550.38916015625, "b": 541.39013671875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 104]}], "captions": [{"cref": "#/texts/90"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/4", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/142"}, {"cref": "#/texts/143"}, {"cref": "#/texts/144"}, {"cref": "#/texts/145"}, {"cref": "#/texts/146"}, {"cref": "#/texts/147"}, {"cref": "#/texts/148"}, {"cref": "#/texts/149"}, {"cref": "#/texts/150"}, {"cref": "#/texts/151"}, {"cref": "#/texts/152"}, {"cref": "#/texts/153"}, {"cref": "#/texts/154"}, {"cref": "#/texts/155"}, {"cref": "#/texts/156"}, {"cref": "#/texts/157"}, {"cref": "#/texts/158"}, {"cref": "#/texts/159"}, {"cref": "#/texts/160"}, {"cref": "#/texts/161"}, {"cref": "#/texts/162"}, {"cref": "#/texts/163"}, {"cref": "#/texts/164"}, {"cref": "#/texts/165"}, {"cref": "#/texts/166"}, {"cref": "#/texts/167"}, {"cref": "#/texts/168"}, {"cref": "#/texts/169"}, {"cref": "#/texts/170"}, {"cref": "#/texts/171"}, {"cref": "#/texts/172"}, {"cref": "#/texts/173"}, {"cref": "#/texts/174"}, {"cref": "#/texts/175"}, {"cref": "#/texts/176"}, {"cref": "#/texts/177"}, {"cref": "#/texts/178"}, {"cref": "#/texts/179"}, {"cref": "#/texts/180"}, {"cref": "#/texts/181"}, {"cref": "#/texts/182"}, {"cref": "#/texts/183"}, {"cref": "#/texts/184"}, {"cref": "#/texts/185"}, {"cref": "#/texts/186"}, {"cref": "#/texts/187"}, {"cref": "#/texts/188"}, {"cref": "#/texts/189"}, {"cref": "#/texts/190"}, {"cref": "#/texts/191"}, {"cref": "#/texts/192"}, {"cref": "#/texts/193"}, {"cref": "#/texts/194"}, {"cref": "#/texts/195"}, {"cref": "#/texts/196"}, {"cref": "#/texts/197"}, {"cref": "#/texts/198"}, {"cref": "#/texts/199"}, {"cref": "#/texts/200"}], "label": "picture", "prov": [{"page_no": 5, "bbox": {"l": 74.30525970458984, "t": 714.0888061523438, "r": 519.9801025390625, "b": 608.2984619140625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 212]}], "captions": [{"cref": "#/texts/141"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/5", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/202"}, {"cref": "#/texts/203"}, {"cref": "#/texts/204"}, {"cref": "#/texts/205"}, {"cref": "#/texts/206"}, {"cref": "#/texts/207"}, {"cref": "#/texts/208"}, {"cref": "#/texts/209"}, {"cref": "#/texts/210"}, {"cref": "#/texts/211"}, {"cref": "#/texts/212"}, {"cref": "#/texts/213"}, {"cref": "#/texts/214"}, {"cref": "#/texts/215"}, {"cref": "#/texts/216"}, {"cref": "#/texts/217"}, {"cref": "#/texts/218"}, {"cref": "#/texts/219"}, {"cref": "#/texts/220"}, {"cref": "#/texts/221"}, {"cref": "#/texts/222"}, {"cref": "#/texts/223"}, {"cref": "#/texts/224"}, {"cref": "#/texts/225"}, {"cref": "#/texts/226"}, {"cref": "#/texts/227"}, {"cref": "#/texts/228"}, {"cref": "#/texts/229"}, {"cref": "#/texts/230"}, {"cref": "#/texts/231"}, {"cref": "#/texts/232"}, {"cref": "#/texts/233"}, {"cref": "#/texts/234"}, {"cref": "#/texts/235"}, {"cref": "#/texts/236"}, {"cref": "#/texts/237"}, {"cref": "#/texts/238"}, {"cref": "#/texts/239"}, {"cref": "#/texts/240"}, {"cref": "#/texts/241"}, {"cref": "#/texts/242"}, {"cref": "#/texts/243"}, {"cref": "#/texts/244"}, {"cref": "#/texts/245"}], "label": "picture", "prov": [{"page_no": 5, "bbox": {"l": 53.03328323364258, "t": 534.3346557617188, "r": 285.3731689453125, "b": 284.3311462402344, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 745]}], "captions": [{"cref": "#/texts/201"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/6", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 49.97503662109375, "t": 688.287353515625, "r": 301.6335754394531, "b": 604.4210815429688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/7", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 305.5836486816406, "t": 693.3458251953125, "r": 554.8258666992188, "b": 611.3732299804688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 79]}], "captions": [{"cref": "#/texts/289"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/8", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/292"}], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 51.736167907714844, "t": 411.51934814453125, "r": 211.83778381347656, "b": 348.3419189453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 397]}], "captions": [{"cref": "#/texts/291"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/9", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/293"}, {"cref": "#/texts/294"}, {"cref": "#/texts/295"}, {"cref": "#/texts/296"}, {"cref": "#/texts/297"}, {"cref": "#/texts/298"}, {"cref": "#/texts/299"}, {"cref": "#/texts/300"}, {"cref": "#/texts/301"}, {"cref": "#/texts/302"}, {"cref": "#/texts/303"}, {"cref": "#/texts/304"}, {"cref": "#/texts/305"}, {"cref": "#/texts/306"}, {"cref": "#/texts/307"}, {"cref": "#/texts/308"}, {"cref": "#/texts/309"}, {"cref": "#/texts/310"}, {"cref": "#/texts/311"}, {"cref": "#/texts/312"}, {"cref": "#/texts/313"}, {"cref": "#/texts/314"}, {"cref": "#/texts/315"}, {"cref": "#/texts/316"}, {"cref": "#/texts/317"}, {"cref": "#/texts/318"}, {"cref": "#/texts/319"}, {"cref": "#/texts/320"}, {"cref": "#/texts/321"}, {"cref": "#/texts/322"}, {"cref": "#/texts/323"}, {"cref": "#/texts/324"}, {"cref": "#/texts/325"}, {"cref": "#/texts/326"}, {"cref": "#/texts/327"}, {"cref": "#/texts/328"}, {"cref": "#/texts/329"}, {"cref": "#/texts/330"}, {"cref": "#/texts/331"}, {"cref": "#/texts/332"}, {"cref": "#/texts/333"}, {"cref": "#/texts/334"}, {"cref": "#/texts/335"}, {"cref": "#/texts/336"}, {"cref": "#/texts/337"}, {"cref": "#/texts/338"}, {"cref": "#/texts/339"}, {"cref": "#/texts/340"}, {"cref": "#/texts/341"}, {"cref": "#/texts/342"}, {"cref": "#/texts/343"}, {"cref": "#/texts/344"}, {"cref": "#/texts/345"}, {"cref": "#/texts/346"}, {"cref": "#/texts/347"}], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 383.1364440917969, "t": 410.7686767578125, "r": 542.1132202148438, "b": 349.2250671386719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/10", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/349"}], "label": "picture", "prov": [{"page_no": 8, "bbox": {"l": 216.76925659179688, "t": 411.5093688964844, "r": 375.7829284667969, "b": 348.65301513671875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 112]}], "captions": [{"cref": "#/texts/348"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/11", "parent": {"cref": "#/body"}, "children": [{"cref": "#/texts/418"}, {"cref": "#/texts/419"}, {"cref": "#/texts/420"}, {"cref": "#/texts/421"}, {"cref": "#/texts/422"}, {"cref": "#/texts/423"}, {"cref": "#/texts/424"}, {"cref": "#/texts/425"}, {"cref": "#/texts/426"}, {"cref": "#/texts/427"}, {"cref": "#/texts/428"}, {"cref": "#/texts/429"}, {"cref": "#/texts/430"}, {"cref": "#/texts/431"}, {"cref": "#/texts/432"}, {"cref": "#/texts/433"}, {"cref": "#/texts/434"}, {"cref": "#/texts/435"}, {"cref": "#/texts/436"}, {"cref": "#/texts/437"}, {"cref": "#/texts/438"}, {"cref": "#/texts/439"}, {"cref": "#/texts/440"}, {"cref": "#/texts/441"}, {"cref": "#/texts/442"}, {"cref": "#/texts/443"}, {"cref": "#/texts/444"}, {"cref": "#/texts/445"}, {"cref": "#/texts/446"}, {"cref": "#/texts/447"}, {"cref": "#/texts/448"}, {"cref": "#/texts/449"}, {"cref": "#/texts/450"}, {"cref": "#/texts/451"}, {"cref": "#/texts/452"}, {"cref": "#/texts/453"}, {"cref": "#/texts/454"}, {"cref": "#/texts/455"}, {"cref": "#/texts/456"}, {"cref": "#/texts/457"}, {"cref": "#/texts/458"}, {"cref": "#/texts/459"}, {"cref": "#/texts/460"}, {"cref": "#/texts/461"}, {"cref": "#/texts/462"}, {"cref": "#/texts/463"}, {"cref": "#/texts/464"}, {"cref": "#/texts/465"}, {"cref": "#/texts/466"}, {"cref": "#/texts/467"}, {"cref": "#/texts/468"}, {"cref": "#/texts/469"}, {"cref": "#/texts/470"}, {"cref": "#/texts/471"}, {"cref": "#/texts/472"}, {"cref": "#/texts/473"}, {"cref": "#/texts/474"}, {"cref": "#/texts/475"}, {"cref": "#/texts/476"}], "label": "picture", "prov": [{"page_no": 12, "bbox": {"l": 53.54227066040039, "t": 717.25146484375, "r": 544.938232421875, "b": 644.4090576171875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 245]}], "captions": [{"cref": "#/texts/417"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/12", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 13, "bbox": {"l": 309.79150390625, "t": 538.0946044921875, "r": 425.9603271484375, "b": 499.60601806640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/13", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 13, "bbox": {"l": 333.9573669433594, "t": 198.8865966796875, "r": 518.4768676757812, "b": 126.5096435546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/14", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 14, "bbox": {"l": 51.15378952026367, "t": 687.6914672851562, "r": 282.8598937988281, "b": 447.09332275390625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 61]}], "captions": [{"cref": "#/texts/507"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/15", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 14, "bbox": {"l": 50.40477752685547, "t": 180.99615478515625, "r": 177.0564422607422, "b": 135.83905029296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 56]}], "captions": [{"cref": "#/texts/508"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/16", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 14, "bbox": {"l": 318.6332092285156, "t": 701.1157836914062, "r": 534.73583984375, "b": 432.9424133300781, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 55]}], "captions": [{"cref": "#/texts/510"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/17", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 55.116363525390625, "t": 655.7449951171875, "r": 279.370849609375, "b": 542.6654663085938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/18", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 54.28135299682617, "t": 531.7384033203125, "r": 279.2568359375, "b": 418.4729309082031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/19", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 55.423954010009766, "t": 407.4449462890625, "r": 280.2310791015625, "b": 294.436279296875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/20", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 50.64818572998047, "t": 286.01953125, "r": 319.9103088378906, "b": 160.736328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/21", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 323.46868896484375, "t": 429.5491638183594, "r": 525.9569091796875, "b": 327.739501953125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/22", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 15, "bbox": {"l": 353.6920471191406, "t": 304.594970703125, "r": 495.4288024902344, "b": 156.22674560546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/23", "parent": {"cref": "#/body"}, "children": [], "label": "picture", "prov": [{"page_no": 16, "bbox": {"l": 66.79948425292969, "t": 538.3836669921875, "r": 528.5565795898438, "b": 293.8616027832031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 153]}], "captions": [{"cref": "#/texts/515"}], "references": [], "footnotes": [], "image": null, "annotations": []}], "tables": [{"self_ref": "#/tables/0", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 1, "bbox": {"l": 315.65362548828125, "t": 563.276611328125, "r": 537.1475219726562, "b": 489.1985778808594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/11"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 384.03289794921875, "t": 539.321044921875, "r": 390.0376892089844, "b": 529.1906127929688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 451.9457092285156, "t": 556.6529541015625, "r": 457.95050048828125, "b": 546.5225219726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1", "column_header": true, "row_header": false, "row_section": false}], "num_rows": 1, "num_cols": 2, "grid": [[{"bbox": {"l": 384.03289794921875, "t": 539.321044921875, "r": 390.0376892089844, "b": 529.1906127929688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 451.9457092285156, "t": 556.6529541015625, "r": 457.95050048828125, "b": 546.5225219726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1", "column_header": true, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/1", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 1, "bbox": {"l": 315.7172546386719, "t": 358.176513671875, "r": 536.835693359375, "b": 295.9709777832031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/63"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 318.8807067871094, "t": 354.3141174316406, "r": 323.273193359375, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "0", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 354.3141174316406, "r": 351.6412048339844, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 354.4064025878906, "r": 465.8810119628906, "b": 344.2760009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 318.7731628417969, "t": 342.4544982910156, "r": 323.1656494140625, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 342.4544982910156, "r": 351.6412048339844, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 366.7010192871094, "t": 342.8791809082031, "r": 398.4967041015625, "b": 332.748779296875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5 3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 342.4544982910156, "r": 445.3518981933594, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 342.4544982910156, "r": 492.2073974609375, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 318.7731628417969, "t": 318.2957458496094, "r": 323.1656494140625, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 330.1553955078125, "r": 351.6412048339844, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 330.1553955078125, "r": 402.8883056640625, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "10", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 330.1553955078125, "r": 449.4228515625, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "11", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 330.1553955078125, "r": 496.5989990234375, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "12", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90423583984375, "t": 318.6770935058594, "r": 337.9090270996094, "b": 308.54669189453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 318.2957458496094, "r": 356.0328063964844, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "13", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 318.2957458496094, "r": 402.8883056640625, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 318.2957458496094, "r": 449.7434997558594, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "15", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 318.2957458496094, "r": 496.5989990234375, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "16", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 306.87530517578125, "r": 356.0328063964844, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "17", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 306.87530517578125, "r": 402.8883056640625, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 306.87530517578125, "r": 449.7434997558594, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 306.87530517578125, "r": 496.5989990234375, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "20", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 5, "num_cols": 6, "grid": [[{"bbox": {"l": 318.8807067871094, "t": 354.3141174316406, "r": 323.273193359375, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "0", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 354.3141174316406, "r": 351.6412048339844, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 354.3141174316406, "r": 351.6412048339844, "b": 345.5291748046875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 354.4064025878906, "r": 465.8810119628906, "b": 344.2760009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 354.4064025878906, "r": 465.8810119628906, "b": 344.2760009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 318.7731628417969, "t": 342.4544982910156, "r": 323.1656494140625, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 342.4544982910156, "r": 351.6412048339844, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 366.7010192871094, "t": 342.8791809082031, "r": 398.4967041015625, "b": 332.748779296875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5 3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 342.4544982910156, "r": 445.3518981933594, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 342.4544982910156, "r": 492.2073974609375, "b": 333.6695556640625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 318.7731628417969, "t": 318.2957458496094, "r": 323.1656494140625, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 330.1553955078125, "r": 351.6412048339844, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 330.1553955078125, "r": 402.8883056640625, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "10", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 330.1553955078125, "r": 449.4228515625, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "11", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 330.1553955078125, "r": 496.5989990234375, "b": 321.3704528808594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "12", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90423583984375, "t": 318.6770935058594, "r": 337.9090270996094, "b": 308.54669189453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 318.2957458496094, "r": 356.0328063964844, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "13", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 318.2957458496094, "r": 402.8883056640625, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 318.2957458496094, "r": 449.7434997558594, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "15", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 318.2957458496094, "r": 496.5989990234375, "b": 309.51080322265625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "16", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90423583984375, "t": 318.6770935058594, "r": 337.9090270996094, "b": 308.54669189453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24871826171875, "t": 306.87530517578125, "r": 356.0328063964844, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "17", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.1042175292969, "t": 306.87530517578125, "r": 402.8883056640625, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941162109375, "t": 306.87530517578125, "r": 449.7434997558594, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.8149108886719, "t": 306.87530517578125, "r": 496.5989990234375, "b": 298.0903625488281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "20", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90423583984375, "t": 318.6770935058594, "r": 337.9090270996094, "b": 308.54669189453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/2", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 4, "bbox": {"l": 310.67584228515625, "t": 718.8060913085938, "r": 542.9547119140625, "b": 636.7794799804688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/133"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 412.3320007324219, "t": 718.3856201171875, "r": 430.9023132324219, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Tags", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 442.857421875, "t": 718.3856201171875, "r": 464.4463806152344, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Bbox", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 477.78631591796875, "t": 718.3856201171875, "r": 494.9419250488281, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "Size", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 508.2818603515625, "t": 718.3856201171875, "r": 536.9143676757812, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "Format", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 706.0326538085938, "r": 361.64263916015625, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 706.33154296875, "r": 425.37774658203125, "b": 697.1161499023438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 706.33154296875, "r": 457.4174499511719, "b": 697.1161499023438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 706.0326538085938, "r": 496.3262023925781, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "509k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 706.0326538085938, "r": 532.5601196289062, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 694.07763671875, "r": 359.4309387207031, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 694.3765258789062, "r": 425.37774658203125, "b": 685.1611328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 694.3765258789062, "r": 457.4174499511719, "b": 685.1611328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 694.07763671875, "r": 496.3262023925781, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "112k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.4618530273438, "t": 694.07763671875, "r": 531.7332763671875, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PDF", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 682.1216430664062, "r": 359.9788818359375, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableBank", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 682.4205322265625, "r": 425.37774658203125, "b": 673.2051391601562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 450.812255859375, "t": 682.4205322265625, "r": 456.50091552734375, "b": 673.2051391601562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 682.1216430664062, "r": 496.3262023925781, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "145k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 511.25018310546875, "t": 682.1216430664062, "r": 533.9450073242188, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "JPEG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 670.1666259765625, "r": 400.3772277832031, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined-Tabnet(*)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 670.4655151367188, "r": 425.37774658203125, "b": 661.2501220703125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 670.4655151367188, "r": 457.4174499511719, "b": 661.2501220703125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 670.1666259765625, "r": 496.3262023925781, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "400k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 670.1666259765625, "r": 532.5601196289062, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 658.2116088867188, "r": 375.1718444824219, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined(**)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 658.510498046875, "r": 425.37774658203125, "b": 649.2951049804688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 658.510498046875, "r": 457.4174499511719, "b": 649.2951049804688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 658.2116088867188, "r": 496.3262023925781, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "500k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 658.2116088867188, "r": 532.5601196289062, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.05999755859375, "t": 646.256591796875, "r": 369.3935241699219, "b": 637.3500366210938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "SynthTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 646.5555419921875, "r": 425.37774658203125, "b": 637.3401489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 646.5555419921875, "r": 457.4174499511719, "b": 637.3401489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 646.2566528320312, "r": 496.3262023925781, "b": 637.35009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "600k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 646.2566528320312, "r": 532.5601196289062, "b": 637.35009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 7, "num_cols": 5, "grid": [[{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 412.3320007324219, "t": 718.3856201171875, "r": 430.9023132324219, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Tags", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 442.857421875, "t": 718.3856201171875, "r": 464.4463806152344, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Bbox", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 477.78631591796875, "t": 718.3856201171875, "r": 494.9419250488281, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "Size", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 508.2818603515625, "t": 718.3856201171875, "r": 536.9143676757812, "b": 709.4790649414062, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "Format", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 706.0326538085938, "r": 361.64263916015625, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 706.33154296875, "r": 425.37774658203125, "b": 697.1161499023438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 706.33154296875, "r": 457.4174499511719, "b": 697.1161499023438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 706.0326538085938, "r": 496.3262023925781, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "509k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 706.0326538085938, "r": 532.5601196289062, "b": 697.1260986328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 694.07763671875, "r": 359.4309387207031, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 694.3765258789062, "r": 425.37774658203125, "b": 685.1611328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 694.3765258789062, "r": 457.4174499511719, "b": 685.1611328125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 694.07763671875, "r": 496.3262023925781, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "112k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.4618530273438, "t": 694.07763671875, "r": 531.7332763671875, "b": 685.1710815429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PDF", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 682.1216430664062, "r": 359.9788818359375, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableBank", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 682.4205322265625, "r": 425.37774658203125, "b": 673.2051391601562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 450.812255859375, "t": 682.4205322265625, "r": 456.50091552734375, "b": 673.2051391601562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 682.1216430664062, "r": 496.3262023925781, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "145k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 511.25018310546875, "t": 682.1216430664062, "r": 533.9450073242188, "b": 673.215087890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "JPEG", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 670.1666259765625, "r": 400.3772277832031, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined-Tabnet(*)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 670.4655151367188, "r": 425.37774658203125, "b": 661.2501220703125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 670.4655151367188, "r": 457.4174499511719, "b": 661.2501220703125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 670.1666259765625, "r": 496.3262023925781, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "400k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 670.1666259765625, "r": 532.5601196289062, "b": 661.2600708007812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 658.2116088867188, "r": 375.1718444824219, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined(**)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 658.510498046875, "r": 425.37774658203125, "b": 649.2951049804688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 658.510498046875, "r": 457.4174499511719, "b": 649.2951049804688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 658.2116088867188, "r": 496.3262023925781, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "500k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 658.2116088867188, "r": 532.5601196289062, "b": 649.3050537109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 317.05999755859375, "t": 646.256591796875, "r": 369.3935241699219, "b": 637.3500366210938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "SynthTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.8559875488281, "t": 646.5555419921875, "r": 425.37774658203125, "b": 637.3401489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569091796875, "t": 646.5555419921875, "r": 457.4174499511719, "b": 637.3401489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.4010009765625, "t": 646.2566528320312, "r": 496.3262023925781, "b": 637.35009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "600k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.6349487304688, "t": 646.2566528320312, "r": 532.5601196289062, "b": 637.35009765625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/3", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 7, "bbox": {"l": 53.368526458740234, "t": 382.8642272949219, "r": 283.0443420410156, "b": 209.60223388671875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/277"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 78.84300231933594, "t": 371.30963134765625, "r": 104.8553466796875, "b": 362.403076171875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 129.33799743652344, "t": 365.3326416015625, "r": 159.21583557128906, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 171.17095947265625, "t": 365.3326416015625, "r": 199.40496826171875, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 211.1999969482422, "t": 377.2876281738281, "r": 247.74349975585938, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 264.5404357910156, "t": 365.3326416015625, "r": 277.27264404296875, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 81.61199951171875, "t": 348.3756408691406, "r": 102.08513641357422, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87205505371094, "t": 348.3756408691406, "r": 153.69140625, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 348.3756408691406, "r": 194.00009155273438, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "91.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.82937622070312, "t": 348.3756408691406, "r": 238.26393127441406, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18414306640625, "t": 348.3756408691406, "r": 279.6186828613281, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.16500091552734, "t": 336.4196472167969, "r": 101.53230285644531, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86715698242188, "t": 336.4196472167969, "r": 153.68650817871094, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411499023438, "t": 336.4196472167969, "r": 186.94166564941406, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795471191406, "t": 336.4196472167969, "r": 231.20550537109375, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69854736328125, "t": 336.4196472167969, "r": 282.1144104003906, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "93.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.31500244140625, "t": 323.86663818359375, "r": 117.38329315185547, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 323.86663818359375, "r": 153.68701171875, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 323.86663818359375, "r": 194.0056610107422, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "98.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 323.86663818359375, "r": 238.26950073242188, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.697998046875, "t": 323.9862060546875, "r": 282.1138610839844, "b": 315.0298156738281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.75", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.61199951171875, "t": 308.67364501953125, "r": 102.08513641357422, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87205505371094, "t": 308.67364501953125, "r": 153.69140625, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 308.67364501953125, "r": 194.00009155273438, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "88.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 218.33871459960938, "t": 308.67364501953125, "r": 240.7545623779297, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "92.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1841125488281, "t": 308.67364501953125, "r": 279.61865234375, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "90.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.16500091552734, "t": 296.7186584472656, "r": 101.53230285644531, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86715698242188, "t": 296.7186584472656, "r": 153.68650817871094, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411499023438, "t": 296.7186584472656, "r": 186.94166564941406, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795471191406, "t": 296.7186584472656, "r": 231.20550537109375, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69854736328125, "t": 296.7186584472656, "r": 282.1144104003906, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "87.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 71.78900146484375, "t": 284.763671875, "r": 111.90838623046875, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE (FT)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86221313476562, "t": 284.763671875, "r": 153.6815643310547, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62913513183594, "t": 284.763671875, "r": 186.94668579101562, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89297485351562, "t": 284.763671875, "r": 231.2105255126953, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.693603515625, "t": 284.763671875, "r": 282.1094665527344, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "91.02", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.31500244140625, "t": 272.8086853027344, "r": 117.38329315185547, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 272.8086853027344, "r": 153.68701171875, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 272.8086853027344, "r": 194.0056610107422, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "97.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 272.8086853027344, "r": 238.26950073242188, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1889953613281, "t": 272.9282531738281, "r": 279.62353515625, "b": 263.97186279296875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.61199951171875, "t": 255.5016326904297, "r": 102.08513641357422, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.91064453125, "t": 255.5016326904297, "r": 150.64285278320312, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 255.5016326904297, "r": 194.00009155273438, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89285278320312, "t": 255.5016326904297, "r": 231.2104034423828, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1841125488281, "t": 255.5016326904297, "r": 279.61865234375, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.31500244140625, "t": 243.54563903808594, "r": 117.38329315185547, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.90625, "t": 243.54563903808594, "r": 150.63845825195312, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 243.54563903808594, "r": 194.0056610107422, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88845825195312, "t": 243.54563903808594, "r": 231.2060089111328, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1889953613281, "t": 243.66519165039062, "r": 279.62353515625, "b": 234.7088165283203, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.31500244140625, "t": 223.9976348876953, "r": 117.38329315185547, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 223.9976348876953, "r": 153.68701171875, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "STN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 223.9976348876953, "r": 194.0056610107422, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "96.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 223.9976348876953, "r": 238.26950073242188, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189697265625, "t": 223.9976348876953, "r": 279.6242370605469, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.7", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 11, "num_cols": 5, "grid": [[{"bbox": {"l": 78.84300231933594, "t": 371.30963134765625, "r": 104.8553466796875, "b": 362.403076171875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 129.33799743652344, "t": 365.3326416015625, "r": 159.21583557128906, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 171.17095947265625, "t": 365.3326416015625, "r": 199.40496826171875, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 211.1999969482422, "t": 377.2876281738281, "r": 247.74349975585938, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 264.5404357910156, "t": 365.3326416015625, "r": 277.27264404296875, "b": 356.42608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "All", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 81.61199951171875, "t": 348.3756408691406, "r": 102.08513641357422, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87205505371094, "t": 348.3756408691406, "r": 153.69140625, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 348.3756408691406, "r": 194.00009155273438, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "91.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.82937622070312, "t": 348.3756408691406, "r": 238.26393127441406, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18414306640625, "t": 348.3756408691406, "r": 279.6186828613281, "b": 339.4690856933594, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.9", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 82.16500091552734, "t": 336.4196472167969, "r": 101.53230285644531, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86715698242188, "t": 336.4196472167969, "r": 153.68650817871094, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411499023438, "t": 336.4196472167969, "r": 186.94166564941406, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795471191406, "t": 336.4196472167969, "r": 231.20550537109375, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69854736328125, "t": 336.4196472167969, "r": 282.1144104003906, "b": 327.5130920410156, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "93.01", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 66.31500244140625, "t": 323.86663818359375, "r": 117.38329315185547, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 323.86663818359375, "r": 153.68701171875, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 323.86663818359375, "r": 194.0056610107422, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "98.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 323.86663818359375, "r": 238.26950073242188, "b": 314.9600830078125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.697998046875, "t": 323.9862060546875, "r": 282.1138610839844, "b": 315.0298156738281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.75", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 81.61199951171875, "t": 308.67364501953125, "r": 102.08513641357422, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87205505371094, "t": 308.67364501953125, "r": 153.69140625, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 308.67364501953125, "r": 194.00009155273438, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "88.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 218.33871459960938, "t": 308.67364501953125, "r": 240.7545623779297, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "92.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1841125488281, "t": 308.67364501953125, "r": 279.61865234375, "b": 299.76708984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "90.6", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 82.16500091552734, "t": 296.7186584472656, "r": 101.53230285644531, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86715698242188, "t": 296.7186584472656, "r": 153.68650817871094, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411499023438, "t": 296.7186584472656, "r": 186.94166564941406, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795471191406, "t": 296.7186584472656, "r": 231.20550537109375, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69854736328125, "t": 296.7186584472656, "r": 282.1144104003906, "b": 287.8121032714844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "87.14", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 71.78900146484375, "t": 284.763671875, "r": 111.90838623046875, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE (FT)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86221313476562, "t": 284.763671875, "r": 153.6815643310547, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62913513183594, "t": 284.763671875, "r": 186.94668579101562, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89297485351562, "t": 284.763671875, "r": 231.2105255126953, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.693603515625, "t": 284.763671875, "r": 282.1094665527344, "b": 275.85711669921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "91.02", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 66.31500244140625, "t": 272.8086853027344, "r": 117.38329315185547, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 272.8086853027344, "r": 153.68701171875, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 272.8086853027344, "r": 194.0056610107422, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "97.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 272.8086853027344, "r": 238.26950073242188, "b": 263.9021301269531, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1889953613281, "t": 272.9282531738281, "r": 279.62353515625, "b": 263.97186279296875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.8", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 81.61199951171875, "t": 255.5016326904297, "r": 102.08513641357422, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.91064453125, "t": 255.5016326904297, "r": 150.64285278320312, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56553649902344, "t": 255.5016326904297, "r": 194.00009155273438, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89285278320312, "t": 255.5016326904297, "r": 231.2104034423828, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1841125488281, "t": 255.5016326904297, "r": 279.61865234375, "b": 246.59507751464844, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 66.31500244140625, "t": 243.54563903808594, "r": 117.38329315185547, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.90625, "t": 243.54563903808594, "r": 150.63845825195312, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 243.54563903808594, "r": 194.0056610107422, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88845825195312, "t": 243.54563903808594, "r": 231.2060089111328, "b": 234.6390838623047, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1889953613281, "t": 243.66519165039062, "r": 279.62353515625, "b": 234.7088165283203, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 66.31500244140625, "t": 223.9976348876953, "r": 117.38329315185547, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766052246094, "t": 223.9976348876953, "r": 153.68701171875, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "STN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57110595703125, "t": 223.9976348876953, "r": 194.0056610107422, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "96.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83494567871094, "t": 223.9976348876953, "r": 238.26950073242188, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189697265625, "t": 223.9976348876953, "r": 279.6242370605469, "b": 215.09107971191406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.7", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/4", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 7, "bbox": {"l": 308.4068603515625, "t": 544.1236572265625, "r": 533.6419677734375, "b": 488.1943359375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/282"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 339.322998046875, "t": 538.3356323242188, "r": 365.3353576660156, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 401.04132080078125, "t": 538.3356323242188, "r": 430.9191589355469, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 454.1021423339844, "t": 538.3356323242188, "r": 474.5852355957031, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 486.54034423828125, "t": 538.3356323242188, "r": 527.2276000976562, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "mAP (PP)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 327.656005859375, "t": 521.378662109375, "r": 377.0007629394531, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD+BBox", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.6980895996094, "t": 521.378662109375, "r": 438.2807312011719, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.6355895996094, "t": 521.378662109375, "r": 473.07012939453125, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "79.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1659240722656, "t": 521.378662109375, "r": 515.6004638671875, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "82.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.7950134277344, "t": 509.4236755371094, "r": 377.8633117675781, "b": 500.5171203613281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.6938781738281, "t": 509.4236755371094, "r": 438.2765197753906, "b": 500.5171203613281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.6310119628906, "t": 509.5432434082031, "r": 473.0655517578125, "b": 500.58685302734375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "82.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1712951660156, "t": 509.5432434082031, "r": 515.6058349609375, "b": 500.58685302734375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "86.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.7950134277344, "t": 497.46868896484375, "r": 377.8633117675781, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 389.81842041015625, "t": 497.46868896484375, "r": 442.1519470214844, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "SynthTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63134765625, "t": 497.46868896484375, "r": 473.0658874511719, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "87.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 505.22515869140625, "t": 497.46868896484375, "r": 508.5426940917969, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 4, "num_cols": 4, "grid": [[{"bbox": {"l": 339.322998046875, "t": 538.3356323242188, "r": 365.3353576660156, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 401.04132080078125, "t": 538.3356323242188, "r": 430.9191589355469, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 454.1021423339844, "t": 538.3356323242188, "r": 474.5852355957031, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 486.54034423828125, "t": 538.3356323242188, "r": 527.2276000976562, "b": 529.4290771484375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "mAP (PP)", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 327.656005859375, "t": 521.378662109375, "r": 377.0007629394531, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD+BBox", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.6980895996094, "t": 521.378662109375, "r": 438.2807312011719, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.6355895996094, "t": 521.378662109375, "r": 473.07012939453125, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "79.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1659240722656, "t": 521.378662109375, "r": 515.6004638671875, "b": 512.4721069335938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "82.7", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 326.7950134277344, "t": 509.4236755371094, "r": 377.8633117675781, "b": 500.5171203613281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.6938781738281, "t": 509.4236755371094, "r": 438.2765197753906, "b": 500.5171203613281, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.6310119628906, "t": 509.5432434082031, "r": 473.0655517578125, "b": 500.58685302734375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "82.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1712951660156, "t": 509.5432434082031, "r": 515.6058349609375, "b": 500.58685302734375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "86.8", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 326.7950134277344, "t": 497.46868896484375, "r": 377.8633117675781, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 389.81842041015625, "t": 497.46868896484375, "r": 442.1519470214844, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "SynthTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63134765625, "t": 497.46868896484375, "r": 473.0658874511719, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "87.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 505.22515869140625, "t": 497.46868896484375, "r": 508.5426940917969, "b": 488.5621337890625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/5", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 7, "bbox": {"l": 332.9688720703125, "t": 251.7164306640625, "r": 520.942138671875, "b": 148.73028564453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/284"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": {"l": 358.010986328125, "t": 239.76663208007812, "r": 384.0233459472656, "b": 230.86007690429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 408.5059814453125, "t": 233.7896270751953, "r": 436.739990234375, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 448.6950988769531, "t": 245.74462890625, "r": 485.0784912109375, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 499.3847961425781, "t": 233.7896270751953, "r": 512.1170043945312, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 357.6820068359375, "t": 216.8326416015625, "r": 384.3518981933594, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Tabula", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9009704589844, "t": 216.8326416015625, "r": 431.33551025390625, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "78.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.164794921875, "t": 216.8326416015625, "r": 475.5993347167969, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "57.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0289001464844, "t": 216.8326416015625, "r": 514.4634399414062, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "67.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 350.7229919433594, "t": 204.8776397705078, "r": 391.3106384277344, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Traprange", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90582275390625, "t": 204.8776397705078, "r": 431.3403625488281, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "60.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1696472167969, "t": 204.8776397705078, "r": 475.60418701171875, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "49.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03375244140625, "t": 204.8776397705078, "r": 514.4683227539062, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "55.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 354.135986328125, "t": 192.92164611816406, "r": 387.89923095703125, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Camelot", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.901611328125, "t": 192.92164611816406, "r": 431.3361511230469, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "80.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1654357910156, "t": 192.92164611816406, "r": 475.5999755859375, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "66.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.029541015625, "t": 192.92164611816406, "r": 514.464111328125, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "73.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 346.5589904785156, "t": 180.96664428710938, "r": 395.475341796875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Acrobat Pro", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9061584472656, "t": 180.96664428710938, "r": 431.3406982421875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "68.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998291015625, "t": 180.96664428710938, "r": 475.6045227050781, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "61.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0340881347656, "t": 180.96664428710938, "r": 514.4686279296875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "65.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 360.781005859375, "t": 169.0116424560547, "r": 381.254150390625, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9015808105469, "t": 169.0116424560547, "r": 431.33612060546875, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "91.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1654052734375, "t": 169.0116424560547, "r": 475.5999450683594, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "85.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0295104980469, "t": 169.0116424560547, "r": 514.4640502929688, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 345.4830017089844, "t": 157.056640625, "r": 396.5513000488281, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9061584472656, "t": 157.056640625, "r": 431.3406982421875, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "95.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998291015625, "t": 157.056640625, "r": 475.6045227050781, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "90.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03399658203125, "t": 157.1761932373047, "r": 514.4685668945312, "b": 148.21981811523438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "93.6", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 7, "num_cols": 4, "grid": [[{"bbox": {"l": 358.010986328125, "t": 239.76663208007812, "r": 384.0233459472656, "b": 230.86007690429688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 408.5059814453125, "t": 233.7896270751953, "r": 436.739990234375, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 448.6950988769531, "t": 245.74462890625, "r": 485.0784912109375, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 499.3847961425781, "t": 233.7896270751953, "r": 512.1170043945312, "b": 224.88307189941406, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "All", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 357.6820068359375, "t": 216.8326416015625, "r": 384.3518981933594, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Tabula", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9009704589844, "t": 216.8326416015625, "r": 431.33551025390625, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "78.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.164794921875, "t": 216.8326416015625, "r": 475.5993347167969, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "57.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0289001464844, "t": 216.8326416015625, "r": 514.4634399414062, "b": 207.92608642578125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "67.9", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 350.7229919433594, "t": 204.8776397705078, "r": 391.3106384277344, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Traprange", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90582275390625, "t": 204.8776397705078, "r": 431.3403625488281, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "60.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1696472167969, "t": 204.8776397705078, "r": 475.60418701171875, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "49.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03375244140625, "t": 204.8776397705078, "r": 514.4683227539062, "b": 195.97108459472656, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "55.4", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 354.135986328125, "t": 192.92164611816406, "r": 387.89923095703125, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Camelot", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.901611328125, "t": 192.92164611816406, "r": 431.3361511230469, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "80.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1654357910156, "t": 192.92164611816406, "r": 475.5999755859375, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "66.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.029541015625, "t": 192.92164611816406, "r": 514.464111328125, "b": 184.0150909423828, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "73.0", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 346.5589904785156, "t": 180.96664428710938, "r": 395.475341796875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Acrobat Pro", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9061584472656, "t": 180.96664428710938, "r": 431.3406982421875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "68.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998291015625, "t": 180.96664428710938, "r": 475.6045227050781, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "61.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0340881347656, "t": 180.96664428710938, "r": 514.4686279296875, "b": 172.06008911132812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "65.3", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 360.781005859375, "t": 169.0116424560547, "r": 381.254150390625, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9015808105469, "t": 169.0116424560547, "r": 431.33612060546875, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "91.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.1654052734375, "t": 169.0116424560547, "r": 475.5999450683594, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "85.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0295104980469, "t": 169.0116424560547, "r": 514.4640502929688, "b": 160.10508728027344, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.3", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 345.4830017089844, "t": 157.056640625, "r": 396.5513000488281, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.9061584472656, "t": 157.056640625, "r": 431.3406982421875, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "95.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998291015625, "t": 157.056640625, "r": 475.6045227050781, "b": 148.15008544921875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "90.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03399658203125, "t": 157.1761932373047, "r": 514.4685668945312, "b": 148.21981811523438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "93.6", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/6", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 8, "bbox": {"l": 53.62853240966797, "t": 573.0513916015625, "r": 298.5574951171875, "b": 499.60003662109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.93284606933594, "t": 569.8192749023438, "r": 241.04458618164062, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.764892578125, "t": 569.8192749023438, "r": 284.5058898925781, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 110.24990844726562, "t": 562.3340454101562, "r": 120.62017822265625, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u51fa\u5178", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 175.3660888671875, "t": 562.3340454101562, "r": 201.29246520996094, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "\u30d5\u30a1\u30a4\u30eb \u6570", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.62408447265625, "t": 562.3340454101562, "r": 219.99435424804688, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 229.19813537597656, "t": 562.3340454101562, "r": 244.75376892089844, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 256.11419677734375, "t": 562.3340454101562, "r": 266.4844665527344, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 278.38433837890625, "t": 562.3340454101562, "r": 293.9399719238281, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 555.5741577148438, "r": 162.71310424804688, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Association for Computational Linguistics(ACL2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39730834960938, "t": 555.5741577148438, "r": 189.56455993652344, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026489257812, "t": 555.5741577148438, "r": 214.1575164794922, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.8751678466797, "t": 555.5741577148438, "r": 237.4583282470703, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 555.5741577148438, "r": 264.63580322265625, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134033203125, "t": 555.5741577148438, "r": 286.6445007324219, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 549.3795166015625, "r": 139.7225341796875, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Computational Linguistics(COLING2002)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 549.3795166015625, "r": 190.85670471191406, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.6983184814453, "t": 549.3795166015625, "r": 215.4496612548828, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.8751678466797, "t": 549.3795166015625, "r": 237.4583282470703, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 549.3795166015625, "r": 264.63580322265625, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134033203125, "t": 549.3795166015625, "r": 286.6445007324219, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 542.4105834960938, "r": 128.96026611328125, "b": 538.0201416015625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a 2003 \u5e74\u7dcf\u5408\u5927\u4f1a", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 543.1849365234375, "r": 190.85670471191406, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 543.1849365234375, "r": 212.86538696289062, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 543.1849365234375, "r": 240.04287719726562, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "142", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 543.1849365234375, "r": 264.63580322265625, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "223", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 543.1849365234375, "r": 289.228759765625, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 534.9253540039062, "r": 129.88177490234375, "b": 530.534912109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c 65 \u56de\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 535.69970703125, "r": 190.85670471191406, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "177", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 535.69970703125, "r": 212.86538696289062, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 535.69970703125, "r": 240.04287719726562, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "176", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 535.69970703125, "r": 264.63580322265625, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 535.69970703125, "r": 289.228759765625, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "236", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 527.6982421875, "r": 129.88177490234375, "b": 523.3078002929688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u7b2c 17 \u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 528.4725952148438, "r": 190.85670471191406, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "208", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 528.4725952148438, "r": 212.86538696289062, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 528.4725952148438, "r": 240.04287719726562, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "203", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 528.4725952148438, "r": 264.63580322265625, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "152", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 528.4725952148438, "r": 289.228759765625, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "244", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 520.47119140625, "r": 127.32453918457031, "b": 516.0807495117188, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c 146 \u301c 155 \u56de", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39730834960938, "t": 521.2455444335938, "r": 189.56455993652344, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "98", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 521.2455444335938, "r": 212.86538696289062, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348083496094, "t": 521.2455444335938, "r": 238.750732421875, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 521.2455444335938, "r": 264.63580322265625, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 521.2455444335938, "r": 289.228759765625, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "232", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.530521392822266, "t": 512.986083984375, "r": 110.16829681396484, "b": 508.59564208984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "WWW \u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 514.0184326171875, "r": 190.85670471191406, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "107", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026489257812, "t": 514.0184326171875, "r": 214.1575164794922, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348083496094, "t": 514.0184326171875, "r": 238.750732421875, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "34", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 514.0184326171875, "r": 264.63580322265625, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 282.7693786621094, "t": 514.0184326171875, "r": 287.9366149902344, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 506.5333251953125, "r": 190.85670471191406, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.6983184814453, "t": 506.5333251953125, "r": 215.4496612548828, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "294", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 506.5333251953125, "r": 240.04287719726562, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "651", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 255.7650604248047, "t": 506.5333251953125, "r": 265.7520446777344, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "1122", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 506.5333251953125, "r": 289.228759765625, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "955", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 10, "num_cols": 6, "grid": [[{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.93284606933594, "t": 569.8192749023438, "r": 241.04458618164062, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.93284606933594, "t": 569.8192749023438, "r": 241.04458618164062, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.764892578125, "t": 569.8192749023438, "r": 284.5058898925781, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.764892578125, "t": 569.8192749023438, "r": 284.5058898925781, "b": 565.6378784179688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 110.24990844726562, "t": 562.3340454101562, "r": 120.62017822265625, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u51fa\u5178", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 175.3660888671875, "t": 562.3340454101562, "r": 201.29246520996094, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "\u30d5\u30a1\u30a4\u30eb \u6570", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.62408447265625, "t": 562.3340454101562, "r": 219.99435424804688, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 229.19813537597656, "t": 562.3340454101562, "r": 244.75376892089844, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 256.11419677734375, "t": 562.3340454101562, "r": 266.4844665527344, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 278.38433837890625, "t": 562.3340454101562, "r": 293.9399719238281, "b": 558.1526489257812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 555.5741577148438, "r": 162.71310424804688, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Association for Computational Linguistics(ACL2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39730834960938, "t": 555.5741577148438, "r": 189.56455993652344, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026489257812, "t": 555.5741577148438, "r": 214.1575164794922, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.8751678466797, "t": 555.5741577148438, "r": 237.4583282470703, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 555.5741577148438, "r": 264.63580322265625, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134033203125, "t": 555.5741577148438, "r": 286.6445007324219, "b": 551.2162475585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 549.3795166015625, "r": 139.7225341796875, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Computational Linguistics(COLING2002)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 549.3795166015625, "r": 190.85670471191406, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.6983184814453, "t": 549.3795166015625, "r": 215.4496612548828, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.8751678466797, "t": 549.3795166015625, "r": 237.4583282470703, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 549.3795166015625, "r": 264.63580322265625, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134033203125, "t": 549.3795166015625, "r": 286.6445007324219, "b": 545.0216064453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 542.4105834960938, "r": 128.96026611328125, "b": 538.0201416015625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a 2003 \u5e74\u7dcf\u5408\u5927\u4f1a", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 543.1849365234375, "r": 190.85670471191406, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 543.1849365234375, "r": 212.86538696289062, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 543.1849365234375, "r": 240.04287719726562, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "142", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 543.1849365234375, "r": 264.63580322265625, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "223", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 543.1849365234375, "r": 289.228759765625, "b": 538.8270263671875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "147", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 534.9253540039062, "r": 129.88177490234375, "b": 530.534912109375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c 65 \u56de\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 535.69970703125, "r": 190.85670471191406, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "177", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 535.69970703125, "r": 212.86538696289062, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 535.69970703125, "r": 240.04287719726562, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "176", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 535.69970703125, "r": 264.63580322265625, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 535.69970703125, "r": 289.228759765625, "b": 531.341796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "236", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 527.6982421875, "r": 129.88177490234375, "b": 523.3078002929688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u7b2c 17 \u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 528.4725952148438, "r": 190.85670471191406, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "208", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 528.4725952148438, "r": 212.86538696289062, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 528.4725952148438, "r": 240.04287719726562, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "203", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 528.4725952148438, "r": 264.63580322265625, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "152", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 528.4725952148438, "r": 289.228759765625, "b": 524.1146850585938, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "244", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 520.47119140625, "r": 127.32453918457031, "b": 516.0807495117188, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c 146 \u301c 155 \u56de", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39730834960938, "t": 521.2455444335938, "r": 189.56455993652344, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "98", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.2822265625, "t": 521.2455444335938, "r": 212.86538696289062, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348083496094, "t": 521.2455444335938, "r": 238.750732421875, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 521.2455444335938, "r": 264.63580322265625, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 521.2455444335938, "r": 289.228759765625, "b": 516.8876342773438, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "232", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 55.530521392822266, "t": 512.986083984375, "r": 110.16829681396484, "b": 508.59564208984375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "WWW \u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 514.0184326171875, "r": 190.85670471191406, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "107", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026489257812, "t": 514.0184326171875, "r": 214.1575164794922, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348083496094, "t": 514.0184326171875, "r": 238.750732421875, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "34", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446044921875, "t": 514.0184326171875, "r": 264.63580322265625, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 282.7693786621094, "t": 514.0184326171875, "r": 287.9366149902344, "b": 509.6605224609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "96", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.10536193847656, "t": 506.5333251953125, "r": 190.85670471191406, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.6983184814453, "t": 506.5333251953125, "r": 215.4496612548828, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "294", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153442382812, "t": 506.5333251953125, "r": 240.04287719726562, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "651", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 255.7650604248047, "t": 506.5333251953125, "r": 265.7520446777344, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "1122", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.4774169921875, "t": 506.5333251953125, "r": 289.228759765625, "b": 502.1754150390625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "955", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/7", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 8, "bbox": {"l": 304.9219970703125, "t": 573.485107421875, "r": 550.2321166992188, "b": 504.09930419921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/290"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 392.0967102050781, "t": 570.425537109375, "r": 438.0144958496094, "b": 565.3603515625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 459.0486145019531, "t": 570.3758544921875, "r": 542.0001831054688, "b": 559.1006469726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": false}, {"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.24420166015625, "t": 555.2528686523438, "r": 407.3463134765625, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "RS U s", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.1832275390625, "t": 555.2528686523438, "r": 440.98779296875, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 468.3825378417969, "t": 555.2528686523438, "r": 482.4846496582031, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "RSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 516.92578125, "t": 555.2528686523438, "r": 530.7303466796875, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 547.38916015625, "r": 364.65606689453125, "b": 542.323974609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on Janua ry 1", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 547.0867309570312, "r": 403.75531005859375, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1. 1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.8183898925781, "t": 547.0867309570312, "r": 437.32708740234375, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.5285949707031, "t": 547.0867309570312, "r": 483.5500183105469, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "90.10 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.4482421875, "t": 547.0867309570312, "r": 531.4696655273438, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 91.19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 538.3154907226562, "r": 325.6267395019531, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Granted", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 538.3154907226562, "r": 403.75531005859375, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "0. 5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.8183898925781, "t": 538.3154907226562, "r": 437.32708740234375, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 466.435791015625, "t": 538.3154907226562, "r": 482.5483093261719, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "117.44", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 514.2906494140625, "t": 538.3154907226562, "r": 530.809814453125, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "122.41", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 530.4517822265625, "r": 322.628662109375, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Vested", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.4322204589844, "t": 530.4517822265625, "r": 405.5362548828125, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 5 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 427.70159912109375, "t": 530.4517822265625, "r": 438.8056335449219, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "(0.1)", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 468.5553283691406, "t": 530.4517822265625, "r": 482.0704345703125, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "87.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.0186157226562, "t": 530.4517822265625, "r": 529.5337524414062, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "81.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 522.3585205078125, "r": 356.2477111816406, "b": 517.2933349609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Canceled or forfeited", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.4322204589844, "t": 521.6805419921875, "r": 405.5362548828125, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 1 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 431.02801513671875, "t": 521.6805419921875, "r": 436.4280090332031, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.83099365234375, "t": 521.6805419921875, "r": 482.3501281738281, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "102.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.0186157226562, "t": 521.6805419921875, "r": 529.5337524414062, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "92.18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11492919921875, "t": 513.5142822265625, "r": 373.3576354980469, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on December 31", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 513.5142822265625, "r": 403.75531005859375, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.5159912109375, "t": 513.5142822265625, "r": 437.0246887207031, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 463.7142028808594, "t": 513.5142822265625, "r": 484.7396545410156, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "104.85 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.99462890625, "t": 513.5142822265625, "r": 534.0200805664062, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 104.51", "column_header": false, "row_header": false, "row_section": false}], "num_rows": 7, "num_cols": 5, "grid": [[{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 392.0967102050781, "t": 570.425537109375, "r": 438.0144958496094, "b": 565.3603515625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 392.0967102050781, "t": 570.425537109375, "r": 438.0144958496094, "b": 565.3603515625, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 459.0486145019531, "t": 570.3758544921875, "r": 542.0001831054688, "b": 559.1006469726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 459.0486145019531, "t": 570.3758544921875, "r": 542.0001831054688, "b": 559.1006469726562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": null, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.24420166015625, "t": 555.2528686523438, "r": 407.3463134765625, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "RS U s", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.1832275390625, "t": 555.2528686523438, "r": 440.98779296875, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 468.3825378417969, "t": 555.2528686523438, "r": 482.4846496582031, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "RSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 516.92578125, "t": 555.2528686523438, "r": 530.7303466796875, "b": 550.1876831054688, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 547.38916015625, "r": 364.65606689453125, "b": 542.323974609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on Janua ry 1", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 547.0867309570312, "r": 403.75531005859375, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1. 1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.8183898925781, "t": 547.0867309570312, "r": 437.32708740234375, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.5285949707031, "t": 547.0867309570312, "r": 483.5500183105469, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "90.10 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.4482421875, "t": 547.0867309570312, "r": 531.4696655273438, "b": 542.0215454101562, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 91.19", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 538.3154907226562, "r": 325.6267395019531, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Granted", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 538.3154907226562, "r": 403.75531005859375, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "0. 5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.8183898925781, "t": 538.3154907226562, "r": 437.32708740234375, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 466.435791015625, "t": 538.3154907226562, "r": 482.5483093261719, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "117.44", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 514.2906494140625, "t": 538.3154907226562, "r": 530.809814453125, "b": 533.2503051757812, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "122.41", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 530.4517822265625, "r": 322.628662109375, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Vested", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.4322204589844, "t": 530.4517822265625, "r": 405.5362548828125, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 5 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 427.70159912109375, "t": 530.4517822265625, "r": 438.8056335449219, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "(0.1)", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 468.5553283691406, "t": 530.4517822265625, "r": 482.0704345703125, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "87.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.0186157226562, "t": 530.4517822265625, "r": 529.5337524414062, "b": 525.3865966796875, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "81.14", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 522.3585205078125, "r": 356.2477111816406, "b": 517.2933349609375, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Canceled or forfeited", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.4322204589844, "t": 521.6805419921875, "r": 405.5362548828125, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 1 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 431.02801513671875, "t": 521.6805419921875, "r": 436.4280090332031, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.83099365234375, "t": 521.6805419921875, "r": 482.3501281738281, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "102.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.0186157226562, "t": 521.6805419921875, "r": 529.5337524414062, "b": 516.6153564453125, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "92.18", "column_header": false, "row_header": false, "row_section": false}], [{"bbox": {"l": 306.11492919921875, "t": 513.5142822265625, "r": 373.3576354980469, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on December 31", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.2466125488281, "t": 513.5142822265625, "r": 403.75531005859375, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.5159912109375, "t": 513.5142822265625, "r": 437.0246887207031, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 463.7142028808594, "t": 513.5142822265625, "r": 484.7396545410156, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "104.85 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.99462890625, "t": 513.5142822265625, "r": 534.0200805664062, "b": 508.4490661621094, "coord_origin": "BOTTOMLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 104.51", "column_header": false, "row_header": false, "row_section": false}]]}}, {"self_ref": "#/tables/8", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 84.0283203125, "t": 635.6664428710938, "r": 239.1690673828125, "b": 577.606689453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/9", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 82.92001342773438, "t": 558.2236938476562, "r": 239.1903533935547, "b": 500.716064453125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/10", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 83.94786071777344, "t": 482.9522705078125, "r": 239.17135620117188, "b": 424.0904235839844, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/11", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 83.31756591796875, "t": 395.9864501953125, "r": 248.873046875, "b": 304.7430114746094, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/503"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/12", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 310.3294372558594, "t": 690.8223266601562, "r": 555.8338623046875, "b": 655.8524780273438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/13", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 309.9566345214844, "t": 637.385498046875, "r": 555.7466430664062, "b": 607.2774658203125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/14", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 309.9635314941406, "t": 596.2945556640625, "r": 555.7054443359375, "b": 558.4485473632812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/15", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 309.79150390625, "t": 538.0946044921875, "r": 425.9603271484375, "b": 499.60601806640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/505"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/16", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 335.2694091796875, "t": 403.53253173828125, "r": 490.081787109375, "b": 354.97760009765625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/17", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 334.9334716796875, "t": 338.0523681640625, "r": 490.0914306640625, "b": 289.2789001464844, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/18", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 335.2545471191406, "t": 272.92431640625, "r": 490.22369384765625, "b": 224.31207275390625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/19", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 13, "bbox": {"l": 333.9573669433594, "t": 198.8865966796875, "r": 518.4768676757812, "b": 126.5096435546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/506"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/20", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 51.72642135620117, "t": 518.3907470703125, "r": 283.114013671875, "b": 447.7554931640625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/21", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 51.434879302978516, "t": 338.51251220703125, "r": 310.7267150878906, "b": 300.17974853515625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/22", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 50.86823654174805, "t": 287.90374755859375, "r": 310.6080017089844, "b": 249.55401611328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/23", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 51.27280807495117, "t": 238.271484375, "r": 311.0897216796875, "b": 200.086669921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/24", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 318.9809265136719, "t": 630.765380859375, "r": 534.6229248046875, "b": 577.3739624023438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/25", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.0057678222656, "t": 565.8936767578125, "r": 534.408935546875, "b": 512.142333984375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/26", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 328.1381530761719, "t": 503.3182067871094, "r": 523.8916015625, "b": 433.7275695800781, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/27", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.4707946777344, "t": 361.09698486328125, "r": 518.5693359375, "b": 314.05645751953125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/28", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.982666015625, "t": 302.7562561035156, "r": 519.0963745117188, "b": 256.30419921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/29", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.8287658691406, "t": 245.5906982421875, "r": 519.6065673828125, "b": 198.8935546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/30", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 14, "bbox": {"l": 319.06494140625, "t": 182.1591796875, "r": 533.77392578125, "b": 122.80792236328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/511"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/31", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 55.116363525390625, "t": 655.7449951171875, "r": 279.370849609375, "b": 542.6654663085938, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/32", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 54.28135299682617, "t": 531.7384033203125, "r": 279.2568359375, "b": 418.4729309082031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/33", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 50.64818572998047, "t": 286.01953125, "r": 319.9103088378906, "b": 160.736328125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/512"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/34", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 323.0059509277344, "t": 670.452880859375, "r": 525.95166015625, "b": 569.088623046875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/35", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 323.384765625, "t": 550.0270385742188, "r": 526.1268920898438, "b": 447.90789794921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/36", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 323.46868896484375, "t": 429.5491638183594, "r": 525.9569091796875, "b": 327.739501953125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}, {"self_ref": "#/tables/37", "parent": {"cref": "#/body"}, "children": [], "label": "table", "prov": [{"page_no": 15, "bbox": {"l": 353.6920471191406, "t": 304.594970703125, "r": 495.4288024902344, "b": 156.22674560546875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 0]}], "captions": [{"cref": "#/texts/514"}], "references": [], "footnotes": [], "image": null, "data": {"table_cells": [], "num_rows": 0, "num_cols": 0, "grid": []}}], "key_value_items": [], "pages": {"1": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 1}, "2": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 2}, "3": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 3}, "4": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 4}, "5": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 5}, "6": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 6}, "7": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 7}, "8": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 8}, "9": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 9}, "10": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 10}, "11": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 11}, "12": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 12}, "13": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 13}, "14": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 14}, "15": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 15}, "16": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 16}}} \ No newline at end of file diff --git a/tests/data/groundtruth/docling_v2/2203.01017v2.md b/tests/data/groundtruth/docling_v2/2203.01017v2.md index 93559a9ee..4a15137fc 100644 --- a/tests/data/groundtruth/docling_v2/2203.01017v2.md +++ b/tests/data/groundtruth/docling_v2/2203.01017v2.md @@ -52,11 +52,11 @@ To meet the design criteria listed above, we developed a new model called TableF The paper is structured as follows. In Sec. 2, we give a brief overview of the current state-of-the-art. In Sec. 3, we describe the datasets on which we train. In Sec. 4, we introduce the TableFormer model-architecture and describe -its results & performance in Sec. 5. As a conclusion, we describe how this new model-architecture can be re-purposed for other tasks in the computer-vision community. +its results & performance in Sec. 5. As a conclusion, we describe how this new model-architecture can be re-purposed for other tasks in the computer-vision community. ## 2. Previous work and State of the Art -Identifying the structure of a table has been an outstanding problem in the document-parsing community, that motivates many organised public challenges [6, 4, 14]. The difficulty of the problem can be attributed to a number of factors. First, there is a large variety in the shapes and sizes of tables. Such large variety requires a flexible method. This is especially true for complex column- and row headers, which can be extremely intricate and demanding. A second factor of complexity is the lack of data with regard to table-structure. Until the publication of PubTabNet [37], there were no large datasets (i.e. > 100 K tables) that provided structure information. This happens primarily due to the fact that tables are notoriously time-consuming to annotate by hand. However, this has definitely changed in recent years with the deliverance of PubTabNet [37], FinTabNet [36], TableBank [17] etc. +Identifying the structure of a table has been an outstanding problem in the document-parsing community, that motivates many organised public challenges [6, 4, 14]. The difficulty of the problem can be attributed to a number of factors. First, there is a large variety in the shapes and sizes of tables. Such large variety requires a flexible method. This is especially true for complex column- and row headers, which can be extremely intricate and demanding. A second factor of complexity is the lack of data with regard to table-structure. Until the publication of PubTabNet [37], there were no large datasets (i.e. > 100 K tables) that provided structure information. This happens primarily due to the fact that tables are notoriously time-consuming to annotate by hand. However, this has definitely changed in recent years with the deliverance of PubTabNet [37], FinTabNet [36], TableBank [17] etc. Before the rising popularity of deep neural networks, the community relied heavily on heuristic and/or statistical methods to do table structure identification [3, 7, 11, 5, 13, 28]. Although such methods work well on constrained tables [12], a more data-driven approach can be applied due to the advent of convolutional neural networks (CNNs) and the availability of large datasets. To the best-of-our knowledge, there are currently two different types of network architecture that are being pursued for state-of-the-art tablestructure identification. @@ -115,7 +115,7 @@ Given the image of a table, TableFormer is able to predict: 1) a sequence of tok ## 4.1. Model architecture. -We now describe in detail the proposed method, which is composed of three main components, see Fig. 4. Our CNN Backbone Network encodes the input as a feature vector of predefined length. The input feature vector of the encoded image is passed to the Structure Decoder to produce a sequence of HTML tags that represent the structure of the table. With each prediction of an HTML standard data cell (' < td > ') the hidden state of that cell is passed to the Cell BBox Decoder. As for spanning cells, such as row or column span, the tag is broken down to ' < ', 'rowspan=' or 'colspan=', with the number of spanning cells (attribute), and ' > '. The hidden state attached to ' < ' is passed to the Cell BBox Decoder. A shared feed forward network (FFN) receives the hidden states from the Structure Decoder, to provide the final detection predictions of the bounding box coordinates and their classification. +We now describe in detail the proposed method, which is composed of three main components, see Fig. 4. Our CNN Backbone Network encodes the input as a feature vector of predefined length. The input feature vector of the encoded image is passed to the Structure Decoder to produce a sequence of HTML tags that represent the structure of the table. With each prediction of an HTML standard data cell (' < td > ') the hidden state of that cell is passed to the Cell BBox Decoder. As for spanning cells, such as row or column span, the tag is broken down to ' < ', 'rowspan=' or 'colspan=', with the number of spanning cells (attribute), and ' > '. The hidden state attached to ' < ' is passed to the Cell BBox Decoder. A shared feed forward network (FFN) receives the hidden states from the Structure Decoder, to provide the final detection predictions of the bounding box coordinates and their classification. CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table image and encodes it as a vector of predefined length. The network has been modified by removing the linear and pooling layer, as we are not per- @@ -123,7 +123,7 @@ Figure 3: TableFormer takes in an image of the PDF and creates bounding box and -Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes. +Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes. @@ -133,7 +133,7 @@ Structure Decoder. The transformer architecture of this component is based on th The transformer encoder receives an encoded image from the CNN Backbone Network and refines it through a multi-head dot-product attention layer, followed by a Feed Forward Network. During training, the transformer decoder receives as input the output feature produced by the transformer encoder, and the tokenized input of the HTML ground-truth tags. Using a stack of multi-head attention layers, different aspects of the tag sequence could be inferred. This is achieved by each attention head on a layer operating in a different subspace, and then combining altogether their attention score. -Cell BBox Decoder. Our architecture allows to simultaneously predict HTML tags and bounding boxes for each table cell without the need of a separate object detector end to end. This approach is inspired by DETR [1] which employs a Transformer Encoder, and Decoder that looks for a specific number of object queries (potential object detections). As our model utilizes a transformer architecture, the hidden state of the < td > ' and ' < ' HTML structure tags become the object query. +Cell BBox Decoder. Our architecture allows to simultaneously predict HTML tags and bounding boxes for each table cell without the need of a separate object detector end to end. This approach is inspired by DETR [1] which employs a Transformer Encoder, and Decoder that looks for a specific number of object queries (potential object detections). As our model utilizes a transformer architecture, the hidden state of the < td > ' and ' < ' HTML structure tags become the object query. The encoding generated by the CNN Backbone Network along with the features acquired for every data cell from the Transformer Decoder are then passed to the attention network. The attention network takes both inputs and learns to provide an attention weighted encoding. This weighted at- @@ -145,9 +145,9 @@ Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The C The loss used to train the TableFormer can be defined as following: -$$l$\_{box}$ = λ$\_{iou}$l$\_{iou}$ + λ$\_{l}$$_{1}$ l = λl$_{s}$ + (1 - λ ) l$_{box}$ (1)$$ + -where λ ∈ [0, 1], and λ$\_{iou}$, λ$\_{l}$$_{1}$ ∈$_{R}$ are hyper-parameters. +where λ ∈ [0, 1], and λ$_{iou}$, λ$_{l}$$\_{1}$ ∈$\_{R}$ are hyper-parameters. ## 5. Experimental Results @@ -155,7 +155,7 @@ where λ ∈ [0, 1], and λ$\_{iou}$, λ$\_{l}$$_{1}$ ∈$_{R}$ are hyper-parame TableFormer uses ResNet-18 as the CNN Backbone Network . The input images are resized to 448*448 pixels and the feature map has a dimension of 28*28. Additionally, we enforce the following input constraints: -$$Image width and height ≤ 1024 pixels Structural tags length ≤ 512 tokens. (2)$$ + Although input constraints are used also by other methods, such as EDD, ours are less restrictive due to the improved @@ -177,7 +177,7 @@ We also share our baseline results on the challenging SynthTabNet dataset. Throu The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It represents the prediction, and ground-truth as a tree structure of HTML tags. This similarity is calculated as: -$$TEDS ( T$\_{a}$, T$\_{b}$ ) = 1 - EditDist ( T$\_{a}$, T$\_{b}$ ) max ( | T$\_{a}$ | , | T$\_{b}$ | ) (3)$$ + where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T . @@ -277,7 +277,7 @@ Figure 6: An example of TableFormer predictions (bounding boxes and structure) f We showcase several visualizations for the different components of our network on various "complex" tables within datasets presented in this work in Fig. 5 and Fig. 6 As it is shown, our model is able to predict bounding boxes for all table cells, even for the empty ones. Additionally, our post-processing techniques can extract the cell content by matching the predicted bounding boxes to the PDF cells based on their overlap and spatial proximity. The left part of Fig. 5 demonstrates also the adaptability of our method to any language, as it can successfully extract Japanese text, although the training set contains only English content. We provide more visualizations including the intermediate steps in the supplementary material. Overall these illustrations justify the versatility of our method across a diverse range of table appearances and content type. -## 6. Future Work & Conclusion +## 6. Future Work & Conclusion In this paper, we presented TableFormer an end-to-end transformer based approach to predict table structures and bounding boxes of cells from an image. This approach enables us to recreate the table structure, and extract the cell content from PDF or OCR by using bounding boxes. Additionally, it provides the versatility required in real-world scenarios when dealing with various types of PDF documents, and languages. Furthermore, our method outperforms all state-of-the-arts with a wide margin. Finally, we introduce "SynthTabNet" a challenging synthetically generated dataset that reinforces missing characteristics from other datasets. @@ -377,7 +377,7 @@ Here is a step-by-step description of the prediction postprocessing: - 3.a. If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column. - 4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. The alignment of the column can be identified by the following formula: -$$alignment = arg min c { D$\_{c}$ } D$\_{c}$ = max { x$\_{c}$ } - min { x$\_{c}$ } (4)$$ + where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point. diff --git a/tests/data/groundtruth/docling_v2/2206.01062.md b/tests/data/groundtruth/docling_v2/2206.01062.md index c5452c579..4f3872cc2 100644 --- a/tests/data/groundtruth/docling_v2/2206.01062.md +++ b/tests/data/groundtruth/docling_v2/2206.01062.md @@ -55,7 +55,7 @@ In this paper, we present the DocLayNet dataset. It provides pageby-page layout This enables experimentation with annotation uncertainty and quality control analysis. -- (5) Pre-defined Train-, Test- & Validation-set : Like DocBank, we provide fixed train-, test- & validation-sets to ensure proportional representation of the class-labels. Further, we prevent leakage of unique layouts across sets, which has a large effect on model accuracy scores. +- (5) Pre-defined Train-, Test- & Validation-set : Like DocBank, we provide fixed train-, test- & validation-sets to ensure proportional representation of the class-labels. Further, we prevent leakage of unique layouts across sets, which has a large effect on model accuracy scores. All aspects outlined above are detailed in Section 3. In Section 4, we will elaborate on how we designed and executed this large-scale human annotation campaign. We will also share key insights and lessons learned that might prove helpful for other parties planning to set up annotation campaigns. @@ -77,9 +77,9 @@ Figure 2: Distribution of DocLayNet pages across document categories. -to a minimum, since they introduce difficulties in annotation (see Section 4). As a second condition, we focussed on medium to large documents ( > 10 pages) with technical content, dense in complex tables, figures, plots and captions. Such documents carry a lot of information value, but are often hard to analyse with high accuracy due to their challenging layouts. Counterexamples of documents not included in the dataset are receipts, invoices, hand-written documents or photographs showing "text in the wild". +to a minimum, since they introduce difficulties in annotation (see Section 4). As a second condition, we focussed on medium to large documents ( > 10 pages) with technical content, dense in complex tables, figures, plots and captions. Such documents carry a lot of information value, but are often hard to analyse with high accuracy due to their challenging layouts. Counterexamples of documents not included in the dataset are receipts, invoices, hand-written documents or photographs showing "text in the wild". -The pages in DocLayNet can be grouped into six distinct categories, namely Financial Reports , Manuals , Scientific Articles , Laws & Regulations , Patents and Government Tenders . Each document category was sourced from various repositories. For example, Financial Reports contain both free-style format annual reports 2 which expose company-specific, artistic layouts as well as the more formal SEC filings. The two largest categories ( Financial Reports and Manuals ) contain a large amount of free-style layouts in order to obtain maximum variability. In the other four categories, we boosted the variability by mixing documents from independent providers, such as different government websites or publishers. In Figure 2, we show the document categories contained in DocLayNet with their respective sizes. +The pages in DocLayNet can be grouped into six distinct categories, namely Financial Reports , Manuals , Scientific Articles , Laws & Regulations , Patents and Government Tenders . Each document category was sourced from various repositories. For example, Financial Reports contain both free-style format annual reports 2 which expose company-specific, artistic layouts as well as the more formal SEC filings. The two largest categories ( Financial Reports and Manuals ) contain a large amount of free-style layouts in order to obtain maximum variability. In the other four categories, we boosted the variability by mixing documents from independent providers, such as different government websites or publishers. In Figure 2, we show the document categories contained in DocLayNet with their respective sizes. We did not control the document selection with regard to language. The vast majority of documents contained in DocLayNet (close to 95%) are published in English language. However, DocLayNet also contains a number of documents in other languages such as German (2.5%), French (1.0%) and Japanese (1.0%). While the document language has negligible impact on the performance of computer vision methods such as object detection and segmentation models, it might prove challenging for layout analysis methods which exploit textual features. @@ -192,7 +192,7 @@ In Table 2, we present baseline experiments (given in mAP) on Mask R-CNN [12], F Table 3: Performance of a Mask R-CNN R50 network in mAP@0.5-0.95 scores trained on DocLayNet with different class label sets. The reduced label sets were obtained by either down-mapping or dropping labels. -Table 4: Performance of a Mask R-CNN R50 network with document-wise and page-wise split for different label sets. Naive page-wise split will result in GLYPH 10% point improvement. +Table 4: Performance of a Mask R-CNN R50 network with document-wise and page-wise split for different label sets. Naive page-wise split will result in GLYPH<tildelow> 10% point improvement. | Class-count | 11 | 6 | 5 | 4 | |----------------|------|---------|---------|---------| @@ -243,7 +243,7 @@ Many documents in DocLayNet have a unique styling. In order to avoid overfitting Throughout this paper, we claim that DocLayNet's wider variety of document layouts leads to more robust layout detection models. In Table 5, we provide evidence for that. We trained models on each of the available datasets (PubLayNet, DocBank and DocLayNet) and evaluated them on the test sets of the other datasets. Due to the different label sets and annotation styles, a direct comparison is not possible. Hence, we focussed on the common labels among the datasets. Between PubLayNet and DocLayNet, these are Picture , -Table 5: Prediction Performance (mAP@0.5-0.95) of a Mask R-CNN R50 network across the PubLayNet, DocBank & DocLayNet data-sets. By evaluating on common label classes of each dataset, we observe that the DocLayNet-trained model has much less pronounced variations in performance across all datasets. +Table 5: Prediction Performance (mAP@0.5-0.95) of a Mask R-CNN R50 network across the PubLayNet, DocBank & DocLayNet data-sets. By evaluating on common label classes of each dataset, we observe that the DocLayNet-trained model has much less pronounced variations in performance across all datasets. | | | Testing on | Testing on | Testing on | |-----------------|------------|--------------|--------------|--------------| diff --git a/tests/data/groundtruth/docling_v2/2305.03393v1.md b/tests/data/groundtruth/docling_v2/2305.03393v1.md index 362c00779..b5838fa9a 100644 --- a/tests/data/groundtruth/docling_v2/2305.03393v1.md +++ b/tests/data/groundtruth/docling_v2/2305.03393v1.md @@ -38,7 +38,7 @@ Approaches to formalize the logical structure and layout of tables in electronic Other work [20] aims at predicting a grid for each table and deciding which cells must be merged using an attention network. Im2Seq methods cast the problem as a sequence generation task [4,5,9,22], and therefore need an internal tablestructure representation language, which is often implemented with standard markup languages (e.g. HTML, LaTeX, Markdown). In theory, Im2Seq methods have a natural advantage over the OD and GNN methods by virtue of directly predicting the table-structure. As such, no post-processing or rules are needed in order to obtain the table-structure, which is necessary with OD and GNN approaches. In practice, this is not entirely true, because a predicted sequence of table-structure markup does not necessarily have to be syntactically correct. Hence, depending on the quality of the predicted sequence, some post-processing needs to be performed to ensure a syntactically valid (let alone correct) sequence. -Within the Im2Seq method, we find several popular models, namely the encoder-dual-decoder model (EDD) [22], TableFormer [9], Tabsplitter[2] and Ye et. al. [19]. EDD uses two consecutive long short-term memory (LSTM) decoders to predict a table in HTML representation. The tag decoder predicts a sequence of HTML tags. For each decoded table cell ( ), the attention is passed to the cell decoder to predict the content with an embedded OCR approach. The latter makes it susceptible to transcription errors in the cell content of the table. TableFormer address this reliance on OCR and uses two transformer decoders for HTML structure and cell bounding box prediction in an end-to-end architecture. The predicted cell bounding box is then used to extract text tokens from an originating (digital) PDF page, circumventing any need for OCR. TabSplitter [2] proposes a compact double-matrix representation of table rows and columns to do error detection and error correction of HTML structure sequences based on predictions from [19]. This compact double-matrix representation can not be used directly by the Img2seq model training, so the model uses HTML as an intermediate form. Chi et. al. [4] introduce a data set and a baseline method using bidirectional LSTMs to predict LaTeX code. Kayal [5] introduces Gated ResNet transformers to predict LaTeX code, and a separate OCR module to extract content. +Within the Im2Seq method, we find several popular models, namely the encoder-dual-decoder model (EDD) [22], TableFormer [9], Tabsplitter[2] and Ye et. al. [19]. EDD uses two consecutive long short-term memory (LSTM) decoders to predict a table in HTML representation. The tag decoder predicts a sequence of HTML tags. For each decoded table cell ( <td> ), the attention is passed to the cell decoder to predict the content with an embedded OCR approach. The latter makes it susceptible to transcription errors in the cell content of the table. TableFormer address this reliance on OCR and uses two transformer decoders for HTML structure and cell bounding box prediction in an end-to-end architecture. The predicted cell bounding box is then used to extract text tokens from an originating (digital) PDF page, circumventing any need for OCR. TabSplitter [2] proposes a compact double-matrix representation of table rows and columns to do error detection and error correction of HTML structure sequences based on predictions from [19]. This compact double-matrix representation can not be used directly by the Img2seq model training, so the model uses HTML as an intermediate form. Chi et. al. [4] introduce a data set and a baseline method using bidirectional LSTMs to predict LaTeX code. Kayal [5] introduces Gated ResNet transformers to predict LaTeX code, and a separate OCR module to extract content. Im2Seq approaches have shown to be well-suited for the TSR task and allow a full end-to-end network design that can output the final table structure without pre- or post-processing logic. Furthermore, Im2Seq models have demonstrated to deliver state-of-the-art prediction accuracy [9]. This motivated the authors to investigate if the performance (both in accuracy and inference time) can be further improved by optimising the table structure representation language. We believe this is a necessary step before further improving neural network architectures for this task. @@ -46,13 +46,13 @@ Im2Seq approaches have shown to be well-suited for the TSR task and allow a full All known Im2Seq based models for TSR fundamentally work in similar ways. Given an image of a table, the Im2Seq model predicts the structure of the table by generating a sequence of tokens. These tokens originate from a finite vocab- -ulary and can be interpreted as a table structure. For example, with the HTML tokens ,
, , , and , one can construct simple table structures without any spanning cells. In reality though, one needs at least 28 HTML tokens to describe the most common complex tables observed in real-world documents [21,22], due to a variety of spanning cells definitions in the HTML token vocabulary. +ulary and can be interpreted as a table structure. For example, with the HTML tokens <table> , </table> , <tr> , </tr> , <td> and </td> , one can construct simple table structures without any spanning cells. In reality though, one needs at least 28 HTML tokens to describe the most common complex tables observed in real-world documents [21,22], due to a variety of spanning cells definitions in the HTML token vocabulary. Fig. 2. Frequency of tokens in HTML and OTSL as they appear in PubTabNet. -Obviously, HTML and other general-purpose markup languages were not designed for Im2Seq models. As such, they have some serious drawbacks. First, the token vocabulary needs to be artificially large in order to describe all plausible tabular structures. Since most Im2Seq models use an autoregressive approach, they generate the sequence token by token. Therefore, to reduce inference time, a shorter sequence length is critical. Every table-cell is represented by at least two tokens ( and ). Furthermore, when tokenizing the HTML structure, one needs to explicitly enumerate possible column-spans and row-spans as words. In practice, this ends up requiring 28 different HTML tokens (when including column- and row-spans up to 10 cells) just to describe every table in the PubTabNet dataset. Clearly, not every token is equally represented, as is depicted in Figure 2. This skewed distribution of tokens in combination with variable token row-length makes it challenging for models to learn the HTML structure. +Obviously, HTML and other general-purpose markup languages were not designed for Im2Seq models. As such, they have some serious drawbacks. First, the token vocabulary needs to be artificially large in order to describe all plausible tabular structures. Since most Im2Seq models use an autoregressive approach, they generate the sequence token by token. Therefore, to reduce inference time, a shorter sequence length is critical. Every table-cell is represented by at least two tokens ( <td> and </td> ). Furthermore, when tokenizing the HTML structure, one needs to explicitly enumerate possible column-spans and row-spans as words. In practice, this ends up requiring 28 different HTML tokens (when including column- and row-spans up to 10 cells) just to describe every table in the PubTabNet dataset. Clearly, not every token is equally represented, as is depicted in Figure 2. This skewed distribution of tokens in combination with variable token row-length makes it challenging for models to learn the HTML structure. Additionally, it would be desirable if the representation would easily allow an early detection of invalid sequences on-the-go, before the prediction of the entire table structure is completed. HTML is not well-suited for this purpose as the verification of incomplete sequences is non-trivial or even impossible. @@ -194,7 +194,7 @@ Secondly, OTSL has more inherent structure and a significantly restricted vocabu - 12. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR). vol. 1, pp. 1162-1167. IEEE (2017) - 13. Siddiqui, S.A., Fateh, I.A., Rizvi, S.T.R., Dengel, A., Ahmed, S.: Deeptabstr: Deep learning based table structure recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1403-1409 (2019). https:// doi.org/10.1109/ICDAR.2019.00226 - 14. Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive table extraction from unstructured documents. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4634-4642 (June 2022) -- 15. Staar, P.W.J., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A machine learning platform to ingest documents at scale. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 774-782. KDD '18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3219819.3219834 , https://doi.org/10. 1145/3219819.3219834 +- 15. Staar, P.W.J., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A machine learning platform to ingest documents at scale. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 774-782. KDD '18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3219819.3219834 , https://doi.org/10. 1145/3219819.3219834 - 16. Wang, X.: Tabular Abstraction, Editing, and Formatting. Ph.D. thesis, CAN (1996), aAINN09397 - 17. Xue, W., Li, Q., Tao, D.: Res2tim: Reconstruct syntactic structures from table images. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 749-755. IEEE (2019) diff --git a/tests/data/groundtruth/docling_v2/code_and_formula.doctags.txt b/tests/data/groundtruth/docling_v2/code_and_formula.doctags.txt index ad4175400..386cf997d 100644 --- a/tests/data/groundtruth/docling_v2/code_and_formula.doctags.txt +++ b/tests/data/groundtruth/docling_v2/code_and_formula.doctags.txt @@ -7,7 +7,7 @@ Formula Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt. -a 2 + 8 = 12 + Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. diff --git a/tests/data/groundtruth/docling_v2/code_and_formula.json b/tests/data/groundtruth/docling_v2/code_and_formula.json index adcc14d1b..64e69c565 100644 --- a/tests/data/groundtruth/docling_v2/code_and_formula.json +++ b/tests/data/groundtruth/docling_v2/code_and_formula.json @@ -1 +1 @@ -{"schema_name": "DoclingDocument", "version": "1.0.0", "name": "code_and_formula", "origin": {"mimetype": "application/pdf", "binary_hash": 2394749058180317456, "filename": "code_and_formula.pdf", "uri": null}, "furniture": {"self_ref": "#/furniture", "parent": null, "children": [], "name": "_root_", "label": "unspecified"}, "body": {"self_ref": "#/body", "parent": null, "children": [{"cref": "#/texts/0"}, {"cref": "#/texts/1"}, {"cref": "#/texts/2"}, {"cref": "#/texts/3"}, {"cref": "#/texts/4"}, {"cref": "#/texts/5"}, {"cref": "#/texts/6"}, {"cref": "#/texts/7"}, {"cref": "#/texts/8"}, {"cref": "#/texts/9"}, {"cref": "#/texts/10"}, {"cref": "#/texts/11"}, {"cref": "#/texts/12"}, {"cref": "#/texts/13"}], "name": "_root_", "label": "unspecified"}, "groups": [], "texts": [{"self_ref": "#/texts/0", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 667.99462890625, "r": 273.4540100097656, "b": 653.6340942382812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "Java Code Example", "text": "Java Code Example", "level": 1}, {"self_ref": "#/texts/1", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 642.8859252929688, "r": 477.48065185546875, "b": 501.4163513183594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/2", "parent": {"cref": "#/body"}, "children": [], "label": "paragraph", "prov": [{"page_no": 1, "bbox": {"l": 236.17599487304688, "t": 490.45794677734375, "r": 375.069580078125, "b": 480.4953308105469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 30]}], "orig": "Listing 1: Simple Java Program", "text": "Listing 1: Simple Java Program"}, {"self_ref": "#/texts/3", "parent": {"cref": "#/body"}, "children": [], "label": "code", "prov": [{"page_no": 1, "bbox": {"l": 134.23899841308594, "t": 474.2005310058594, "r": 337.5928649902344, "b": 443.9358215332031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 65]}], "orig": "public static void print() { System.out.println( \"Java Code\" ); }", "text": "public static void print() { System.out.println( \"Java Code\" ); }", "code_language": "unknown"}, {"self_ref": "#/texts/4", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 432.27593994140625, "r": 477.47589111328125, "b": 290.80633544921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/5", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 1, "bbox": {"l": 303.13299560546875, "t": 96.83694458007812, "r": 308.1142883300781, "b": 86.87435150146484, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/6", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 717.8846435546875, "r": 191.51429748535156, "b": 703.5241088867188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Formula", "text": "Formula", "level": 1}, {"self_ref": "#/texts/7", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 692.7759399414062, "r": 477.48065185546875, "b": 551.3063354492188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/8", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 549.3139038085938, "r": 477.4748229980469, "b": 491.53033447265625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 369]}], "orig": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt.", "text": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt."}, {"self_ref": "#/texts/9", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 2, "bbox": {"l": 280.5539855957031, "t": 479.4553527832031, "r": 330.69659423828125, "b": 467.6203308105469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "a 2 + 8 = 12", "text": "a 2 + 8 = 12"}, {"self_ref": "#/texts/10", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 459.64996337890625, "r": 477.47589111328125, "b": 318.1803283691406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/11", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 316.1879577636719, "r": 477.4748229980469, "b": 246.44935607910156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 415]}], "orig": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.", "text": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat."}, {"self_ref": "#/texts/12", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 244.4569549560547, "r": 477.4748229980469, "b": 174.71835327148438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 415]}], "orig": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.", "text": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat."}, {"self_ref": "#/texts/13", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 2, "bbox": {"l": 303.13299560546875, "t": 146.7259521484375, "r": 308.1142883300781, "b": 136.7633514404297, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}], "pictures": [], "tables": [], "key_value_items": [], "pages": {"1": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 1}, "2": {"size": {"width": 595.2760009765625, "height": 841.8900146484375}, "image": null, "page_no": 2}}} \ No newline at end of file +{"schema_name": "DoclingDocument", "version": "1.0.0", "name": "code_and_formula", "origin": {"mimetype": "application/pdf", "binary_hash": 2394749058180317456, "filename": "code_and_formula.pdf", "uri": null}, "furniture": {"self_ref": "#/furniture", "parent": null, "children": [], "name": "_root_", "label": "unspecified"}, "body": {"self_ref": "#/body", "parent": null, "children": [{"cref": "#/texts/0"}, {"cref": "#/texts/1"}, {"cref": "#/texts/2"}, {"cref": "#/texts/3"}, {"cref": "#/texts/4"}, {"cref": "#/texts/5"}, {"cref": "#/texts/6"}, {"cref": "#/texts/7"}, {"cref": "#/texts/8"}, {"cref": "#/texts/9"}, {"cref": "#/texts/10"}, {"cref": "#/texts/11"}, {"cref": "#/texts/12"}, {"cref": "#/texts/13"}], "name": "_root_", "label": "unspecified"}, "groups": [], "texts": [{"self_ref": "#/texts/0", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 667.99462890625, "r": 273.4540100097656, "b": 653.6340942382812, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 17]}], "orig": "Java Code Example", "text": "Java Code Example", "level": 1}, {"self_ref": "#/texts/1", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 642.8859252929688, "r": 477.48065185546875, "b": 501.4163513183594, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/2", "parent": {"cref": "#/body"}, "children": [], "label": "paragraph", "prov": [{"page_no": 1, "bbox": {"l": 236.17599487304688, "t": 490.45794677734375, "r": 375.069580078125, "b": 480.4953308105469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 30]}], "orig": "Listing 1: Simple Java Program", "text": "Listing 1: Simple Java Program"}, {"self_ref": "#/texts/3", "parent": {"cref": "#/body"}, "children": [], "label": "code", "prov": [{"page_no": 1, "bbox": {"l": 134.23899841308594, "t": 474.2005310058594, "r": 337.5928649902344, "b": 443.9358215332031, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 65]}], "orig": "public static void print() { System.out.println( \"Java Code\" ); }", "text": "public static void print() { System.out.println( \"Java Code\" ); }", "code_language": "unknown"}, {"self_ref": "#/texts/4", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 432.27593994140625, "r": 477.47589111328125, "b": 290.80633544921875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/5", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 1, "bbox": {"l": 303.13299560546875, "t": 96.83694458007812, "r": 308.1142883300781, "b": 86.87435150146484, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/6", "parent": {"cref": "#/body"}, "children": [], "label": "section_header", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 717.8846435546875, "r": 191.51429748535156, "b": 703.5241088867188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 7]}], "orig": "Formula", "text": "Formula", "level": 1}, {"self_ref": "#/texts/7", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 692.7759399414062, "r": 477.48065185546875, "b": 551.3063354492188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/8", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 549.3139038085938, "r": 477.4748229980469, "b": 491.53033447265625, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 369]}], "orig": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt.", "text": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt."}, {"self_ref": "#/texts/9", "parent": {"cref": "#/body"}, "children": [], "label": "formula", "prov": [{"page_no": 2, "bbox": {"l": 280.5539855957031, "t": 479.4553527832031, "r": 330.69659423828125, "b": 467.6203308105469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 12]}], "orig": "a 2 + 8 = 12", "text": ""}, {"self_ref": "#/texts/10", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 459.64996337890625, "r": 477.47589111328125, "b": 318.1803283691406, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/11", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 316.1879577636719, "r": 477.4748229980469, "b": 246.44935607910156, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 415]}], "orig": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.", "text": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat."}, {"self_ref": "#/texts/12", "parent": {"cref": "#/body"}, "children": [], "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 244.4569549560547, "r": 477.4748229980469, "b": 174.71835327148438, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 415]}], "orig": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.", "text": "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat."}, {"self_ref": "#/texts/13", "parent": {"cref": "#/body"}, "children": [], "label": "page_footer", "prov": [{"page_no": 2, "bbox": {"l": 303.13299560546875, "t": 146.7259521484375, "r": 308.1142883300781, "b": 136.7633514404297, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}], "pictures": [], "tables": [], "key_value_items": [], "pages": {"1": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 1}, "2": {"size": {"width": 595.2760009765625, "height": 841.8900146484375}, "image": null, "page_no": 2}}} \ No newline at end of file diff --git a/tests/data/groundtruth/docling_v2/code_and_formula.md b/tests/data/groundtruth/docling_v2/code_and_formula.md index 5a2ad9c83..d3106f9ed 100644 --- a/tests/data/groundtruth/docling_v2/code_and_formula.md +++ b/tests/data/groundtruth/docling_v2/code_and_formula.md @@ -16,7 +16,7 @@ Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt. -$$a 2 + 8 = 12$$ + Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. diff --git a/tests/data/groundtruth/docling_v2/elife-56337.xml.md b/tests/data/groundtruth/docling_v2/elife-56337.xml.md index 7ff34bbf3..9aeffc69e 100644 --- a/tests/data/groundtruth/docling_v2/elife-56337.xml.md +++ b/tests/data/groundtruth/docling_v2/elife-56337.xml.md @@ -18,7 +18,7 @@ TEs, especially long terminal repeat (LTR) retrotransposons, also known as endog We analyzed the RNA expression profiles of mouse KRAB-ZFPs across a wide range of tissues to identify candidates active in early embryos/ES cells. While the majority of KRAB-ZFPs are expressed at low levels and uniformly across tissues, a group of KRAB-ZFPs are highly and almost exclusively expressed in ES cells (Figure 1—figure supplement 1A). About two thirds of these KRAB-ZFPs are physically linked in two clusters on chromosome 2 (Chr2-cl) and 4 (Chr4-cl) (Figure 1—figure supplement 1B). These two clusters encode 40 and 21 KRAB-ZFP annotated genes, respectively, which, with one exception on Chr4-cl, do not have orthologues in rat or any other sequenced mammals (Supplementary file 1). The KRAB-ZFPs within these two genomic clusters also group together phylogenetically (Figure 1—figure supplement 1C), indicating these gene clusters arose by a series of recent segmental gene duplications (Kauzlaric et al., 2017). -To determine the binding sites of the KRAB-ZFPs within these and other gene clusters, we expressed epitope-tagged KRAB-ZFPs using stably integrating vectors in mouse embryonic carcinoma (EC) or ES cells (Table 1, Supplementary file 1) and performed chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We then determined whether the identified binding sites are significantly enriched over annotated TEs and used the non-repetitive peak fraction to identify binding motifs. We discarded 7 of 68 ChIP-seq datasets because we could not obtain a binding motif or a target TE and manual inspection confirmed low signal to noise ratio. Of the remaining 61 KRAB-ZFPs, 51 significantly overlapped at least one TE subfamily (adjusted p-value<1e-5). Altogether, 81 LTR retrotransposon, 18 LINE, 10 SINE and one DNA transposon subfamilies were targeted by at least one of the 51 KRAB-ZFPs (Figure 1A and Supplementary file 1). Chr2-cl KRAB-ZFPs preferably bound IAPEz retrotransposons and L1-type LINEs, while Chr4-cl KRAB-ZFPs targeted various retrotransposons, including the closely related MMETn (hereafter referred to as ETn) and ETnERV (also known as MusD) elements (Figure 1A). ETn elements are non-autonomous LTR retrotransposons that require trans-complementation by the fully coding ETnERV elements that contain Gag, Pro and Pol genes (Ribet et al., 2004). These elements have accumulated to ~240 and~100 copies in the reference C57BL/6 genome, respectively, with ~550 solitary LTRs (Baust et al., 2003). Both ETn and ETnERVs are still active, generating polymorphisms and mutations in several mouse strains (Gagnier et al., 2019). The validity of our ChIP-seq screen was confirmed by the identification of binding motifs - which often resembled the computationally predicted motifs (Figure 1—figure supplement 2A) - for the majority of screened KRAB-ZFPs (Supplementary file 1). Moreover, predicted and experimentally determined motifs were found in targeted TEs in most cases (Supplementary file 1), and reporter repression assays confirmed KRAB-ZFP induced silencing for all the tested sequences (Figure 1—figure supplement 2B). Finally, we observed KAP1 and H3K9me3 enrichment at most of the targeted TEs in wild type ES cells, indicating that most of these KRAB-ZFPs are functionally active in the early embryo (Figure 1A). +To determine the binding sites of the KRAB-ZFPs within these and other gene clusters, we expressed epitope-tagged KRAB-ZFPs using stably integrating vectors in mouse embryonic carcinoma (EC) or ES cells (Table 1, Supplementary file 1) and performed chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We then determined whether the identified binding sites are significantly enriched over annotated TEs and used the non-repetitive peak fraction to identify binding motifs. We discarded 7 of 68 ChIP-seq datasets because we could not obtain a binding motif or a target TE and manual inspection confirmed low signal to noise ratio. Of the remaining 61 KRAB-ZFPs, 51 significantly overlapped at least one TE subfamily (adjusted p-value<1e-5). Altogether, 81 LTR retrotransposon, 18 LINE, 10 SINE and one DNA transposon subfamilies were targeted by at least one of the 51 KRAB-ZFPs (Figure 1A and Supplementary file 1). Chr2-cl KRAB-ZFPs preferably bound IAPEz retrotransposons and L1-type LINEs, while Chr4-cl KRAB-ZFPs targeted various retrotransposons, including the closely related MMETn (hereafter referred to as ETn) and ETnERV (also known as MusD) elements (Figure 1A). ETn elements are non-autonomous LTR retrotransposons that require trans-complementation by the fully coding ETnERV elements that contain Gag, Pro and Pol genes (Ribet et al., 2004). These elements have accumulated to ~240 and~100 copies in the reference C57BL/6 genome, respectively, with ~550 solitary LTRs (Baust et al., 2003). Both ETn and ETnERVs are still active, generating polymorphisms and mutations in several mouse strains (Gagnier et al., 2019). The validity of our ChIP-seq screen was confirmed by the identification of binding motifs - which often resembled the computationally predicted motifs (Figure 1—figure supplement 2A) - for the majority of screened KRAB-ZFPs (Supplementary file 1). Moreover, predicted and experimentally determined motifs were found in targeted TEs in most cases (Supplementary file 1), and reporter repression assays confirmed KRAB-ZFP induced silencing for all the tested sequences (Figure 1—figure supplement 2B). Finally, we observed KAP1 and H3K9me3 enrichment at most of the targeted TEs in wild type ES cells, indicating that most of these KRAB-ZFPs are functionally active in the early embryo (Figure 1A). We generally observed that KRAB-ZFPs present exclusively in mouse target TEs that are restricted to the mouse genome, indicating KRAB-ZFPs and their targets emerged together. For example, several mouse-specific KRAB-ZFPs in Chr2-cl and Chr4-cl target IAP and ETn elements which are only found in the mouse genome and are highly active. This is the strongest data to date supporting that recent KRAB-ZFP expansions in these young clusters is a response to recent TE activity. Likewise, ZFP599 and ZFP617, both conserved in Muroidea, bind to various ORR1-type LTRs which are present in the rat genome (Supplementary file 1). However, ZFP961, a KRAB-ZFP encoded on a small gene cluster on chromosome 8 that is conserved in Muroidea targets TEs that are only found in the mouse genome (e.g. ETn), a paradox we have previously observed with ZFP809, which also targets TEs that are evolutionarily younger than itself (Wolf et al., 2015b). The ZFP961 binding site is located at the 5’ end of the internal region of ETn and ETnERV elements, a sequence that usually contains the primer binding site (PBS), which is required to prime retroviral reverse transcription. Indeed, the ZFP961 motif closely resembles the PBSLys1,2 (Figure 1—figure supplement 3A), which had been previously identified as a KAP1-dependent target of retroviral repression (Yamauchi et al., 1995; Wolf et al., 2008). Repression of the PBSLys1,2 by ZFP961 was also confirmed in reporter assays (Figure 1—figure supplement 2B), indicating that ZFP961 is likely responsible for this silencing effect. @@ -38,7 +38,7 @@ While we generally observed that TE-associated gene reactivation is not caused b ### ETn retrotransposition in Chr4-cl KO and WT mice -IAP, ETn/ETnERV and MuLV/RLTR4 retrotransposons are highly polymorphic in inbred mouse strains (Nellåker et al., 2012), indicating that these elements are able to mobilize in the germ line. Since these retrotransposons are upregulated in Chr2-cl and Chr4-cl KO ES cells, we speculated that these KRAB-ZFP clusters evolved to minimize the risks of insertional mutagenesis by retrotransposition. To test this, we generated Chr2-cl and Chr4-cl KO mice via ES cell injection into blastocysts, and after germ line transmission we genotyped the offspring of heterozygous breeding pairs. While the offspring of Chr4-cl KO/WT parents were born close to Mendelian ratios in pure C57BL/6 and mixed C57BL/6 129Sv matings, one Chr4-cl KO/WT breeding pair gave birth to significantly fewer KO mice than expected (p-value=0.022) (Figure 4—figure supplement 1A). Likewise, two out of four Chr2-cl KO breeding pairs on mixed C57BL/6 129Sv matings failed to give birth to a single KO offspring (p-value<0.01) while the two other mating pairs produced KO offspring at near Mendelian ratios (Figure 4—figure supplement 1A). Altogether, these data indicate that KRAB-ZFP clusters are not absolutely essential in mice, but that genetic and/or epigenetic factors may contribute to reduced viability. +IAP, ETn/ETnERV and MuLV/RLTR4 retrotransposons are highly polymorphic in inbred mouse strains (Nellåker et al., 2012), indicating that these elements are able to mobilize in the germ line. Since these retrotransposons are upregulated in Chr2-cl and Chr4-cl KO ES cells, we speculated that these KRAB-ZFP clusters evolved to minimize the risks of insertional mutagenesis by retrotransposition. To test this, we generated Chr2-cl and Chr4-cl KO mice via ES cell injection into blastocysts, and after germ line transmission we genotyped the offspring of heterozygous breeding pairs. While the offspring of Chr4-cl KO/WT parents were born close to Mendelian ratios in pure C57BL/6 and mixed C57BL/6 129Sv matings, one Chr4-cl KO/WT breeding pair gave birth to significantly fewer KO mice than expected (p-value=0.022) (Figure 4—figure supplement 1A). Likewise, two out of four Chr2-cl KO breeding pairs on mixed C57BL/6 129Sv matings failed to give birth to a single KO offspring (p-value<0.01) while the two other mating pairs produced KO offspring at near Mendelian ratios (Figure 4—figure supplement 1A). Altogether, these data indicate that KRAB-ZFP clusters are not absolutely essential in mice, but that genetic and/or epigenetic factors may contribute to reduced viability. We reasoned that retrotransposon activation could account for the reduced viability of Chr2-cl and Chr4-cl KO mice in some matings. However, since only rare matings produced non-viable KO embryos, we instead turned to the viable KO mice to assay for increased transposon activity. RNA-seq in blood, brain and testis revealed that, with a few exceptions, retrotransposons upregulated in Chr2 and Chr4 KRAB-ZFP cluster KO ES cells are not expressed at higher levels in adult tissues (Figure 4—figure supplement 1B). Likewise, no strong transcriptional TE reactivation phenotype was observed in liver and kidney of Chr4-cl KO mice (data not shown) and ChIP-seq with antibodies against H3K4me1, H3K4me3 and H3K27ac in testis of Chr4-cl WT and KO mice revealed no increase of active histone marks at ETn elements or other TEs (data not shown). This indicates that Chr2-cl and Chr4-cl KRAB-ZFPs are primarily required for TE repression during early development. This is consistent with the high expression of these KRAB-ZFPs uniquely in ES cells (Figure 1—figure supplement 1A). To determine whether retrotransposition occurs at a higher frequency in Chr4-cl KO mice during development, we screened for novel ETn (ETn/ETnERV) and MuLV (MuLV/RLTR4\_MM) insertions in viable Chr4-cl KO mice. For this purpose, we developed a capture-sequencing approach to enrich for ETn/MuLV DNA and flanking sequences from genomic DNA using probes that hybridize with the 5’ and 3’ ends of ETn and MuLV LTRs prior to deep sequencing. We screened genomic DNA samples from a total of 76 mice, including 54 mice from ancestry-controlled Chr4-cl KO matings in various strain backgrounds, the two ES cell lines the Chr4-cl KO mice were generated from, and eight mice from a Chr2-cl KO mating which served as a control (since ETn and MuLVs are not activated in Chr2-cl KO ES cells) (Supplementary file 4). Using this approach, we were able to enrich reads mapping to ETn/MuLV LTRs about 2,000-fold compared to genome sequencing without capture. ETn/MuLV insertions were determined by counting uniquely mapped reads that were paired with reads mapping to ETn/MuLV elements (see materials and methods for details). To assess the efficiency of the capture approach, we determined what proportion of a set of 309 largely intact (two LTRs flanking an internal sequence) reference ETn elements could be identified using our sequencing data. 95% of these insertions were called with high confidence in the majority of our samples (data not shown), indicating that we are able to identify ETn insertions at a high recovery rate. @@ -74,7 +74,7 @@ All gRNAs were expressed from the pX330-U6-Chimeric\_BB-CBh-hSpCas9 vector (RRID For ChIP-seq analysis of KRAB-ZFP expressing cells, 5–10 × 107 cells were crosslinked and immunoprecipitated with anti-FLAG (Sigma-Aldrich Cat# F1804, RRID:AB\_262044) or anti-HA (Abcam Cat# ab9110, RRID:AB\_307019 or Covance Cat# MMS-101P-200, RRID:AB\_10064068) antibody using one of two previously described protocols (O'Geen et al., 2010; Imbeault et al., 2017) as indicated in Supplementary file 1. H3K9me3 distribution in Chr4-cl, Chr10-cl, Chr13.1-cl and Chr13.2-cl KO ES cells was determined by native ChIP-seq with anti-H3K9me3 serum (Active Motif Cat# 39161, RRID:AB\_2532132) as described previously (Karimi et al., 2011). In Chr2-cl KO ES cells, H3K9me3 and KAP1 ChIP-seq was performed as previously described (Ecco et al., 2016). In Chr4-cl KO and WT ES cells KAP1 binding was determined by endogenous tagging of KAP1 with C-terminal GFP (Supplementary file 3), followed by FACS to enrich for GFP-positive cells and ChIP with anti-GFP (Thermo Fisher Scientific Cat# A-11122, RRID:AB\_221569) using a previously described protocol (O'Geen et al., 2010). For ChIP-seq analysis of active histone marks, cross-linked chromatin from ES cells or testis (from two-week old mice) was immunoprecipitated with antibodies against H3K4me3 (Abcam Cat# ab8580, RRID:AB\_306649), H3K4me1 (Abcam Cat# ab8895, RRID:AB\_306847) and H3K27ac (Abcam Cat# ab4729, RRID:AB\_2118291) following the protocol developed by O'Geen et al., 2010 or Khil et al., 2012 respectively. -ChIP-seq libraries were constructed and sequenced as indicated in Supplementary file 4. Reads were mapped to the mm9 genome using Bowtie (RRID:SCR\_005476; settings: --best) or Bowtie2 (Langmead and Salzberg, 2012) as indicated in Supplementary file 4. Under these settings, reads that map to multiple genomic regions are assigned to the top-scored match and, if a set of equally good choices is encountered, a pseudo-random number is used to choose one location. Peaks were called using MACS14 (RRID:SCR\_013291) under high stringency settings (p<1e-10, peak enrichment >20) (Zhang et al., 2008). Peaks were called both over the Input control and a FLAG or HA control ChIP (unless otherwise stated in Supplementary file 4) and only peaks that were called in both settings were kept for further analysis. In cases when the stringency settings did not result in at least 50 peaks, the settings were changed to medium (p<1e-10, peak enrichment >10) or low (p<1e-5, peak enrichment >10) stringency (Supplementary file 4). For further analysis, all peaks were scaled to 200 bp regions centered around the peak summits. The overlap of the scaled peaks to each repeat element in UCSC Genome Browser (RRID:SCR\_005780) were calculated by using the bedfisher function (settings: -f 0.25) from BEDTools (RRID:SCR\_006646). The right-tailed p-values between pair-wise comparison of each ChIP-seq peak and repeat element were extracted, and then adjusted using the Benjamini-Hochberg approach implemented in the R function p.adjust(). Binding motifs were determined using only nonrepetitive (<10% repeat content) peaks with MEME (Bailey et al., 2009). MEME motifs were compared with in silico predicted motifs (Najafabadi et al., 2015) using Tomtom (Bailey et al., 2009) and considered as significantly overlapping with a False Discovery Rate (FDR) below 0.1. To find MEME and predicted motifs in repetitive peaks, we used FIMO (Bailey et al., 2009). Differential H3K9me3 and KAP1 distribution in WT and Chr2-cl or Chr4-cl KO ES cells at TEs was determined by counting ChIP-seq reads overlapping annotated insertions of each TE group using BEDTools (MultiCovBed). Additionally, ChIP-seq reads were counted at the TE fraction that was bound by Chr2-cl or Chr4-cl KRAB-ZFPs (overlapping with 200 bp peaks). Count tables were concatenated and analyzed using DESeq2 (Love et al., 2014). The previously published ChIP-seq datasets for KAP1 (Castro-Diaz et al., 2014) and H3K9me3 (Dan et al., 2014) were re-mapped using Bowtie (--best). +ChIP-seq libraries were constructed and sequenced as indicated in Supplementary file 4. Reads were mapped to the mm9 genome using Bowtie (RRID:SCR\_005476; settings: --best) or Bowtie2 (Langmead and Salzberg, 2012) as indicated in Supplementary file 4. Under these settings, reads that map to multiple genomic regions are assigned to the top-scored match and, if a set of equally good choices is encountered, a pseudo-random number is used to choose one location. Peaks were called using MACS14 (RRID:SCR\_013291) under high stringency settings (p<1e-10, peak enrichment >20) (Zhang et al., 2008). Peaks were called both over the Input control and a FLAG or HA control ChIP (unless otherwise stated in Supplementary file 4) and only peaks that were called in both settings were kept for further analysis. In cases when the stringency settings did not result in at least 50 peaks, the settings were changed to medium (p<1e-10, peak enrichment >10) or low (p<1e-5, peak enrichment >10) stringency (Supplementary file 4). For further analysis, all peaks were scaled to 200 bp regions centered around the peak summits. The overlap of the scaled peaks to each repeat element in UCSC Genome Browser (RRID:SCR\_005780) were calculated by using the bedfisher function (settings: -f 0.25) from BEDTools (RRID:SCR\_006646). The right-tailed p-values between pair-wise comparison of each ChIP-seq peak and repeat element were extracted, and then adjusted using the Benjamini-Hochberg approach implemented in the R function p.adjust(). Binding motifs were determined using only nonrepetitive (<10% repeat content) peaks with MEME (Bailey et al., 2009). MEME motifs were compared with in silico predicted motifs (Najafabadi et al., 2015) using Tomtom (Bailey et al., 2009) and considered as significantly overlapping with a False Discovery Rate (FDR) below 0.1. To find MEME and predicted motifs in repetitive peaks, we used FIMO (Bailey et al., 2009). Differential H3K9me3 and KAP1 distribution in WT and Chr2-cl or Chr4-cl KO ES cells at TEs was determined by counting ChIP-seq reads overlapping annotated insertions of each TE group using BEDTools (MultiCovBed). Additionally, ChIP-seq reads were counted at the TE fraction that was bound by Chr2-cl or Chr4-cl KRAB-ZFPs (overlapping with 200 bp peaks). Count tables were concatenated and analyzed using DESeq2 (Love et al., 2014). The previously published ChIP-seq datasets for KAP1 (Castro-Diaz et al., 2014) and H3K9me3 (Dan et al., 2014) were re-mapped using Bowtie (--best). ### Luciferase reporter assays @@ -149,7 +149,7 @@ Key resources table: ## Figures Figure 1.: Genome-wide binding patterns of mouse KRAB-ZFPs. -(A) Probability heatmap of KRAB-ZFP binding to TEs. Blue color intensity (main field) corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fisher’s exact test). The green/red color intensity (top panel) represents mean KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) enrichment (respectively) at peaks overlapping significantly targeted TEs (adjusted p-value<1e-5) in WT ES cells. (B) Summarized ChIP-seq signal for indicated KRAB-ZFPs and previously published KAP1 and H3K9me3 in WT ES cells across 127 intact ETn elements. (C) Heatmaps of KRAB-ZFP ChIP-seq signal at ChIP-seq peaks. For better comparison, peaks for all three KRAB-ZFPs were called with the same parameters (p<1e-10, peak enrichment >20). The top panel shows a schematic of the arrangement of the contact amino acid composition of each zinc finger. Zinc fingers are grouped and colored according to similarity, with amino acid differences relative to the five consensus fingers highlighted in white. +(A) Probability heatmap of KRAB-ZFP binding to TEs. Blue color intensity (main field) corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fisher’s exact test). The green/red color intensity (top panel) represents mean KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) enrichment (respectively) at peaks overlapping significantly targeted TEs (adjusted p-value<1e-5) in WT ES cells. (B) Summarized ChIP-seq signal for indicated KRAB-ZFPs and previously published KAP1 and H3K9me3 in WT ES cells across 127 intact ETn elements. (C) Heatmaps of KRAB-ZFP ChIP-seq signal at ChIP-seq peaks. For better comparison, peaks for all three KRAB-ZFPs were called with the same parameters (p<1e-10, peak enrichment >20). The top panel shows a schematic of the arrangement of the contact amino acid composition of each zinc finger. Zinc fingers are grouped and colored according to similarity, with amino acid differences relative to the five consensus fingers highlighted in white. Figure 1—source data 1.KRAB-ZFP expression in 40 mouse tissues and cell lines (ENCODE).Mean values of replicates are shown as log2 transcripts per million. Figure 1—source data 2.Probability heatmap of KRAB-ZFP binding to TEs.Values corresponds to -log10 (adjusted p-value) enrichment of ChIP-seq peak overlap with TE groups (Fisher’s exact test). @@ -161,7 +161,7 @@ Figure 1—figure supplement 1.: ES cell-specific expression of KRAB-ZFP gene cl Figure 1—figure supplement 2.: KRAB-ZFP binding motifs and their repression activity. -(A) Comparison of computationally predicted (bottom) and experimentally determined (top) KRAB-ZFP binding motifs. Only significant pairs are shown (FDR < 0.1). (B) Luciferase reporter assays to confirm KRAB-ZFP repression of the identified target sites. Bars show the luciferase activity (normalized to Renilla luciferase) of reporter plasmids containing the indicated target sites cloned upstream of the SV40 promoter. Reporter plasmids were co-transfected into 293 T cells with a Renilla luciferase plasmid for normalization and plasmids expressing the targeting KRAB-ZFP. Normalized mean luciferase activity (from three replicates) is shown relative to luciferase activity of the reporter plasmid co-transfected with an empty pcDNA3.1 vector. +(A) Comparison of computationally predicted (bottom) and experimentally determined (top) KRAB-ZFP binding motifs. Only significant pairs are shown (FDR < 0.1). (B) Luciferase reporter assays to confirm KRAB-ZFP repression of the identified target sites. Bars show the luciferase activity (normalized to Renilla luciferase) of reporter plasmids containing the indicated target sites cloned upstream of the SV40 promoter. Reporter plasmids were co-transfected into 293 T cells with a Renilla luciferase plasmid for normalization and plasmids expressing the targeting KRAB-ZFP. Normalized mean luciferase activity (from three replicates) is shown relative to luciferase activity of the reporter plasmid co-transfected with an empty pcDNA3.1 vector. @@ -171,7 +171,7 @@ Figure 1—figure supplement 3.: KRAB-ZFP binding to ETn retrotransposons. Figure 2.: Retrotransposon reactivation in KRAB-ZFP cluster KO ES cells. -(A) RNA-seq analysis of TE expression in five KRAB-ZFP cluster KO ES cells. Green and grey squares on top of the panel represent KRAB-ZFPs with or without ChIP-seq data, respectively, within each deleted gene cluster. Reactivated TEs that are bound by one or several KRAB-ZFPs are indicated by green squares in the panel. Significantly up- and downregulated elements (adjusted p-value<0.05) are highlighted in red and green, respectively. (B) Differential KAP1 binding and H3K9me3 enrichment at TE groups (summarized across all insertions) in Chr2-cl and Chr4-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in blue (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (C) DNA methylation status of CpG sites at indicated TE groups in WT and Chr4-cl KO ES cells grown in serum containing media or in hypomethylation-inducing media (2i + Vitamin C). P-values were calculated using paired t-test. +(A) RNA-seq analysis of TE expression in five KRAB-ZFP cluster KO ES cells. Green and grey squares on top of the panel represent KRAB-ZFPs with or without ChIP-seq data, respectively, within each deleted gene cluster. Reactivated TEs that are bound by one or several KRAB-ZFPs are indicated by green squares in the panel. Significantly up- and downregulated elements (adjusted p-value<0.05) are highlighted in red and green, respectively. (B) Differential KAP1 binding and H3K9me3 enrichment at TE groups (summarized across all insertions) in Chr2-cl and Chr4-cl KO ES cells. TE groups targeted by one or several KRAB-ZFPs encoded within the deleted clusters are highlighted in blue (differential enrichment over the entire TE sequences) and red (differential enrichment at TE regions that overlap with KRAB-ZFP ChIP-seq peaks). (C) DNA methylation status of CpG sites at indicated TE groups in WT and Chr4-cl KO ES cells grown in serum containing media or in hypomethylation-inducing media (2i + Vitamin C). P-values were calculated using paired t-test. Figure 2—source data 1.Differential H3K9me3 and KAP1 distribution in WT and KRAB-ZFP cluster KO ES cells at TE families and KRAB-ZFP bound TE insertions.Differential read counts and statistical testing were determined by DESeq2. @@ -182,7 +182,7 @@ Figure 2—figure supplement 1.: Epigenetic changes at TEs and TE-borne enhancer Figure 3.: TE-dependent gene activation in KRAB-ZFP cluster KO ES cells. -(A) Differential gene expression in Chr2-cl and Chr4-cl KO ES cells. Significantly up- and downregulated genes (adjusted p-value<0.05) are highlighted in red and green, respectively, KRAB-ZFP genes within the deleted clusters are shown in blue. (B) Correlation of TEs and gene deregulation. Plots show enrichment of TE groups within 100 kb of up- and downregulated genes relative to all genes. Significantly overrepresented LTR and LINE groups (adjusted p-value<0.1) are highlighted in blue and red, respectively. (C) Schematic view of the downstream region of Chst1 where a 5’ truncated ETn insertion is located. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). (D) RT-qPCR analysis of Chst1 mRNA expression in Chr4-cl WT and KO ES cells with or without the CRISPR/Cas9 deleted ETn insertion near Chst1. Values represent mean expression (normalized to Gapdh) from three biological replicates per sample (each performed in three technical replicates) in arbitrary units. Error bars represent standard deviation and asterisks indicate significance (p<0.01, Student’s t-test). n.s.: not significant. (E) Mean coverage of ChIP-seq data (Input subtracted from ChIP) in Chr4-cl WT and KO ES cells over 127 full-length ETn insertions. The binding sites of the Chr4-cl KRAB-ZFPs Rex2 and Gm13051 are indicated by dashed lines. +(A) Differential gene expression in Chr2-cl and Chr4-cl KO ES cells. Significantly up- and downregulated genes (adjusted p-value<0.05) are highlighted in red and green, respectively, KRAB-ZFP genes within the deleted clusters are shown in blue. (B) Correlation of TEs and gene deregulation. Plots show enrichment of TE groups within 100 kb of up- and downregulated genes relative to all genes. Significantly overrepresented LTR and LINE groups (adjusted p-value<0.1) are highlighted in blue and red, respectively. (C) Schematic view of the downstream region of Chst1 where a 5’ truncated ETn insertion is located. ChIP-seq (Input subtracted from ChIP) data for overexpressed epitope-tagged Gm13051 (a Chr4-cl KRAB-ZFP) in F9 EC cells, and re-mapped KAP1 (GEO accession: GSM1406445) and H3K9me3 (GEO accession: GSM1327148) in WT ES cells are shown together with RNA-seq data from Chr4-cl WT and KO ES cells (mapped using Bowtie (-a -m 1 --strata -v 2) to exclude reads that cannot be uniquely mapped). (D) RT-qPCR analysis of Chst1 mRNA expression in Chr4-cl WT and KO ES cells with or without the CRISPR/Cas9 deleted ETn insertion near Chst1. Values represent mean expression (normalized to Gapdh) from three biological replicates per sample (each performed in three technical replicates) in arbitrary units. Error bars represent standard deviation and asterisks indicate significance (p<0.01, Student’s t-test). n.s.: not significant. (E) Mean coverage of ChIP-seq data (Input subtracted from ChIP) in Chr4-cl WT and KO ES cells over 127 full-length ETn insertions. The binding sites of the Chr4-cl KRAB-ZFPs Rex2 and Gm13051 are indicated by dashed lines. @@ -194,7 +194,7 @@ Figure 4—source data 2.Sequences of capture-seq probes used to enrich genomic Figure 4—figure supplement 1.: Birth statistics of KRAB-ZFP cluster KO mice and TE reactivation in adult tissues. -(A) Birth statistics of Chr4- and Chr2-cl mice derived from KO/WT x KO/WT matings in different strain backgrounds. (B) RNA-seq analysis of TE expression in Chr2- (left) and Chr4-cl (right) KO tissues. TE groups with the highest reactivation phenotype in ES cells are shown separately. Significantly up- and downregulated elements (adjusted p-value<0.05) are highlighted in red and green, respectively. Experiments were performed in at least two biological replicates. +(A) Birth statistics of Chr4- and Chr2-cl mice derived from KO/WT x KO/WT matings in different strain backgrounds. (B) RNA-seq analysis of TE expression in Chr2- (left) and Chr4-cl (right) KO tissues. TE groups with the highest reactivation phenotype in ES cells are shown separately. Significantly up- and downregulated elements (adjusted p-value<0.05) are highlighted in red and green, respectively. Experiments were performed in at least two biological replicates. @@ -214,7 +214,7 @@ Figure 4—figure supplement 3.: Confirmation of novel ETn insertions identified - C Baust; L Gagnier; GJ Baillie; MJ Harris; DM Juriloff; DL Mager. Structure and expression of mobile ETnII retroelements and their coding-competent MusD relatives in the mouse. Journal of Virology (2003) - K Blaschke; KT Ebata; MM Karimi; JA Zepeda-Martínez; P Goyal; S Mahapatra; A Tam; DJ Laird; M Hirst; A Rao; MC Lorincz; M Ramalho-Santos. Vitamin C induces Tet-dependent DNA demethylation and a blastocyst-like state in ES cells. Nature (2013) - A Brodziak; E Ziółko; M Muc-Wierzgoń; E Nowakowska-Zajdel; T Kokot; K Klakla. The role of human endogenous retroviruses in the pathogenesis of autoimmune diseases. Medical Science Monitor : International Medical Journal of Experimental and Clinical Research (2012) -- N Castro-Diaz; G Ecco; A Coluccio; A Kapopoulou; B Yazdanpanah; M Friedli; J Duc; SM Jang; P Turelli; D Trono. Evolutionally dynamic L1 regulation in embryonic stem cells. Genes & Development (2014) +- N Castro-Diaz; G Ecco; A Coluccio; A Kapopoulou; B Yazdanpanah; M Friedli; J Duc; SM Jang; P Turelli; D Trono. Evolutionally dynamic L1 regulation in embryonic stem cells. Genes & Development (2014) - EB Chuong; NC Elde; C Feschotte. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science (2016) - J Dan; Y Liu; N Liu; M Chiourea; M Okuka; T Wu; X Ye; C Mou; L Wang; L Wang; Y Yin; J Yuan; B Zuo; F Wang; Z Li; X Pan; Z Yin; L Chen; DL Keefe; S Gagos; A Xiao; L Liu. Rif1 maintains telomere length homeostasis of ESCs by mediating heterochromatin silencing. Developmental Cell (2014) - A De Iaco; E Planet; A Coluccio; S Verp; J Duc; D Trono. DUX-family transcription factors regulate zygotic genome activation in placental mammals. Nature Genetics (2017) @@ -238,7 +238,7 @@ Figure 4—figure supplement 3.: Confirmation of novel ETn insertions identified - JA Lehoczky; PE Thomas; KM Patrie; KM Owens; LM Villarreal; K Galbraith; J Washburn; CN Johnson; B Gavino; AD Borowsky; KJ Millen; P Wakenight; W Law; ML Van Keuren; G Gavrilina; ED Hughes; TL Saunders; L Brihn; JH Nadeau; JW Innis. A novel intergenic ETnII-β insertion mutation causes multiple malformations in Polypodia mice. PLOS Genetics (2013) - D Leung; T Du; U Wagner; W Xie; AY Lee; P Goyal; Y Li; KE Szulwach; P Jin; MC Lorincz; B Ren. Regulation of DNA methylation turnover at LTR retrotransposons and imprinted loci by the histone methyltransferase Setdb1. PNAS (2014) - J Lilue; AG Doran; IT Fiddes; M Abrudan; J Armstrong; R Bennett; W Chow; J Collins; S Collins; A Czechanski; P Danecek; M Diekhans; DD Dolle; M Dunn; R Durbin; D Earl; A Ferguson-Smith; P Flicek; J Flint; A Frankish; B Fu; M Gerstein; J Gilbert; L Goodstadt; J Harrow; K Howe; X Ibarra-Soria; M Kolmogorov; CJ Lelliott; DW Logan; J Loveland; CE Mathews; R Mott; P Muir; S Nachtweide; FCP Navarro; DT Odom; N Park; S Pelan; SK Pham; M Quail; L Reinholdt; L Romoth; L Shirley; C Sisu; M Sjoberg-Herrera; M Stanke; C Steward; M Thomas; G Threadgold; D Thybert; J Torrance; K Wong; J Wood; B Yalcin; F Yang; DJ Adams; B Paten; TM Keane. Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nature Genetics (2018) -- S Liu; J Brind'Amour; MM Karimi; K Shirane; A Bogutz; L Lefebvre; H Sasaki; Y Shinkai; MC Lorincz. Setdb1 is required for germline development and silencing of H3K9me3-marked endogenous retroviruses in primordial germ cells. Genes & Development (2014) +- S Liu; J Brind'Amour; MM Karimi; K Shirane; A Bogutz; L Lefebvre; H Sasaki; Y Shinkai; MC Lorincz. Setdb1 is required for germline development and silencing of H3K9me3-marked endogenous retroviruses in primordial germ cells. Genes & Development (2014) - MI Love; W Huber; S Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology (2014) - F Lugani; R Arora; N Papeta; A Patel; Z Zheng; R Sterken; RA Singer; G Caridi; C Mendelsohn; L Sussel; VE Papaioannou; AG Gharavi. A retrotransposon insertion in the 5' regulatory domain of Ptf1a results in ectopic gene expression and multiple congenital defects in Danforth's short tail mouse. PLOS Genetics (2013) - TS Macfarlan; WD Gifford; S Driscoll; K Lettieri; HM Rowe; D Bonanomi; A Firth; O Singer; D Trono; SL Pfaff. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature (2012) @@ -253,7 +253,7 @@ Figure 4—figure supplement 3.: Confirmation of novel ETn insertions identified - HM Rowe; J Jakobsson; D Mesnard; J Rougemont; S Reynard; T Aktas; PV Maillard; H Layard-Liesching; S Verp; J Marquis; F Spitz; DB Constam; D Trono. KAP1 controls endogenous retroviruses in embryonic stem cells. Nature (2010) - HM Rowe; A Kapopoulou; A Corsinotti; L Fasching; TS Macfarlan; Y Tarabay; S Viville; J Jakobsson; SL Pfaff; D Trono. TRIM28 repression of retrotransposon-based enhancers is necessary to preserve transcriptional dynamics in embryonic stem cells. Genome Research (2013) - SN Schauer; PE Carreira; R Shukla; DJ Gerhardt; P Gerdes; FJ Sanchez-Luque; P Nicoli; M Kindlova; S Ghisletti; AD Santos; D Rapoud; D Samuel; J Faivre; AD Ewing; SR Richardson; GJ Faulkner. L1 retrotransposition is a common feature of mammalian hepatocarcinogenesis. Genome Research (2018) -- DC Schultz; K Ayyanathan; D Negorev; GG Maul; FJ Rauscher. SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes & Development (2002) +- DC Schultz; K Ayyanathan; D Negorev; GG Maul; FJ Rauscher. SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes & Development (2002) - K Semba; K Araki; K Matsumoto; H Suda; T Ando; A Sei; H Mizuta; K Takagi; M Nakahara; M Muta; G Yamada; N Nakagata; A Iida; S Ikegawa; Y Nakamura; M Araki; K Abe; K Yamamura. Ectopic expression of Ptf1a induces spinal defects, urogenital defects, and anorectal malformations in Danforth's short tail mice. PLOS Genetics (2013) - SP Sripathy; J Stevens; DC Schultz. The KAP1 corepressor functions to coordinate the assembly of de novo HP1-demarcated microenvironments of heterochromatin required for KRAB zinc finger protein-mediated transcriptional repression. Molecular and Cellular Biology (2006) - JH Thomas; S Schneider. Coevolution of retroelements and tandem zinc finger genes. Genome Research (2011) @@ -263,6 +263,6 @@ Figure 4—figure supplement 3.: Confirmation of novel ETn insertions identified - J Wang; G Xie; M Singh; AT Ghanbarian; T Raskó; A Szvetnik; H Cai; D Besser; A Prigione; NV Fuchs; GG Schumann; W Chen; MC Lorincz; Z Ivics; LD Hurst; Z Izsvák. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature (2014) - D Wolf; K Hug; SP Goff. TRIM28 mediates primer binding site-targeted silencing of Lys1,2 tRNA-utilizing retroviruses in embryonic cells. PNAS (2008) - G Wolf; D Greenberg; TS Macfarlan. Spotting the enemy within: targeted silencing of foreign DNA in mammalian genomes by the Krüppel-associated box zinc finger protein family. Mobile DNA (2015a) -- G Wolf; P Yang; AC Füchtbauer; EM Füchtbauer; AM Silva; C Park; W Wu; AL Nielsen; FS Pedersen; TS Macfarlan. The KRAB zinc finger protein ZFP809 is required to initiate epigenetic silencing of endogenous retroviruses. Genes & Development (2015b) +- G Wolf; P Yang; AC Füchtbauer; EM Füchtbauer; AM Silva; C Park; W Wu; AL Nielsen; FS Pedersen; TS Macfarlan. The KRAB zinc finger protein ZFP809 is required to initiate epigenetic silencing of endogenous retroviruses. Genes & Development (2015b) - M Yamauchi; B Freitag; C Khan; B Berwin; E Barklis. Stem cell factor binding to retrovirus primer binding site silencers. Journal of Virology (1995) - Y Zhang; T Liu; CA Meyer; J Eeckhoute; DS Johnson; BE Bernstein; C Nusbaum; RM Myers; M Brown; W Li; XS Liu. Model-based analysis of ChIP-Seq (MACS). Genome Biology (2008) \ No newline at end of file diff --git a/tests/data/groundtruth/docling_v2/example_04.html.md b/tests/data/groundtruth/docling_v2/example_04.html.md index e620a999c..f204a12a8 100644 --- a/tests/data/groundtruth/docling_v2/example_04.html.md +++ b/tests/data/groundtruth/docling_v2/example_04.html.md @@ -1,7 +1,7 @@ # Data Table with Rowspan and Colspan -| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) | +| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) | |----------------------------|----------------------------|----------------------------| -| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 | -| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) | +| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 | +| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) | | Row 3, Col 1 | Row 3, Col 2 | Row 3, Col 3 | \ No newline at end of file diff --git a/tests/data/groundtruth/docling_v2/example_05.html.md b/tests/data/groundtruth/docling_v2/example_05.html.md index 787f6d232..fbb24b629 100644 --- a/tests/data/groundtruth/docling_v2/example_05.html.md +++ b/tests/data/groundtruth/docling_v2/example_05.html.md @@ -1,7 +1,7 @@ # Omitted html and body tags -| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) | +| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) | |----------------------------|----------------------------|----------------------------| -| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 | -| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) | +| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 | +| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) | | Row 3, Col 1 | Row 3, Col 2 | Row 3, Col 3 | \ No newline at end of file diff --git a/tests/data/groundtruth/docling_v2/ipa20180000016.md b/tests/data/groundtruth/docling_v2/ipa20180000016.md index d02144c60..d1cd8c0bd 100644 --- a/tests/data/groundtruth/docling_v2/ipa20180000016.md +++ b/tests/data/groundtruth/docling_v2/ipa20180000016.md @@ -112,25 +112,25 @@ Examples of the first fluorescent material 71 specifically include fluorescent m (i−j)MgO.(j/2)Sc₂O₃.kMgF₂.mCaF₂.(1−n)GeO₂.(n/2)Mt₂O₃:zMn⁴⁺ (I) -wherein Mt is at least one selected from the group consisting of Al, Ga, and In, and j, k, m, n, and z are numbers satisfying 2≦i≦4, 0≦j<0.5, 00.3) and higher rainfall (>700 mm per year) contribute to expansion of vector habitats and population. Additionally, having more than five rounds of MDA before pre-TAS was also statistically significantly associated with higher failure in the bivariate analysis. It is unclear why higher number of rounds is associated with first pre-TAS failure given that other research has shown the opposite [15,16]. +The small number of failures overall prevented the inclusion of a large number of variables in the final log-binomial model. However, other variables that are associated with failure as identified in the bivariate analyses, such as Culex vector, higher population density, higher EVI, higher rainfall and more rounds of MDA, should not be discounted when making programmatic decisions. Other models have shown that Culex as the predominant vector in a district, compared to Anopheles, results in more intense interventions needed to reach elimination [24,41]. Higher population density, which was also found to predict TAS failure [7], could be related to different vector species’ transmission dynamics in urban areas, as well as the fact that MDAs are harder to conduct and to accurately measure in urban areas [46,47]. Both higher enhanced vegetation index (>0.3) and higher rainfall (>700 mm per year) contribute to expansion of vector habitats and population. Additionally, having more than five rounds of MDA before pre-TAS was also statistically significantly associated with higher failure in the bivariate analysis. It is unclear why higher number of rounds is associated with first pre-TAS failure given that other research has shown the opposite [15,16]. All other variables included in this analysis were not significantly associated with pre-TAS failure in our analysis. Goldberg et al. found Brugia spp. to be significantly associated with failure, but our results did not. This is likely due in part to the small number of districts with Brugia spp. in our dataset (6%) compared to 46% in the Goldberg et al. article [7]. MDA coverage levels were not significantly associated with pre-TAS failure, likely due to the lack of variance in the coverage data since WHO guidance dictates a minimum of five rounds of MDA with ≥65% epidemiological coverage to be eligible to implement pre-TAS. It should not be interpreted as evidence that high MDA coverage levels are not necessary to lower prevalence. @@ -110,16 +110,16 @@ Table 1: Categorization of potential factors influencing pre-TAS results. | Domain | Factor | Covariate | Description | Reference Group | Summary statistic | Temporal Resolution | Source | |------------------------|-----------------------|-------------------------------|-----------------------------------------------------------------|----------------------|---------------------|-----------------------|--------------------| -| Prevalence | Baseline prevalence | 5% cut off | Maximum reported mapping or baseline sentinel site prevalence | <5% | Maximum | Varies | Programmatic data | -| Prevalence | Baseline prevalence | 10% cut off | Maximum reported mapping or baseline sentinel site prevalence | <10% | Maximum | Varies | Programmatic data | -| Agent | Parasite | Parasite | Predominate parasite in district | W. bancrofti & mixed | Binary value | 2018 | Programmatic data | -| Environment | Vector | Vector | Predominate vector in district | Anopheles & Mansonia | Binary value | 2018 | Country expert | -| Environment | Geography | Elevation | Elevation measured in meters | >350 | Mean | 2000 | CGIAR-CSI SRTM [9] | -| Environment | Geography | District area | Area measured in km2 | >2,500 | Maximum sum | Static | Programmatic data | -| Environment | Climate | EVI | Enhanced vegetation index | > 0.3 | Mean | 2015 | MODIS [10] | +| Prevalence | Baseline prevalence | 5% cut off | Maximum reported mapping or baseline sentinel site prevalence | <5% | Maximum | Varies | Programmatic data | +| Prevalence | Baseline prevalence | 10% cut off | Maximum reported mapping or baseline sentinel site prevalence | <10% | Maximum | Varies | Programmatic data | +| Agent | Parasite | Parasite | Predominate parasite in district | W. bancrofti & mixed | Binary value | 2018 | Programmatic data | +| Environment | Vector | Vector | Predominate vector in district | Anopheles & Mansonia | Binary value | 2018 | Country expert | +| Environment | Geography | Elevation | Elevation measured in meters | >350 | Mean | 2000 | CGIAR-CSI SRTM [9] | +| Environment | Geography | District area | Area measured in km2 | >2,500 | Maximum sum | Static | Programmatic data | +| Environment | Climate | EVI | Enhanced vegetation index | > 0.3 | Mean | 2015 | MODIS [10] | | Environment | Climate | Rainfall | Annual rainfall measured in mm | ≤ 700 | Mean | 2015 | CHIRPS [11] | | Environment | Socio-economic | Population density | Number of people per km2 | ≤ 100 | Mean | 2015 | WorldPop [12] | -| Environment | Socio-economic | Nighttime lights | Nighttime light index from 0 to 63 | >1.5 | Mean | 2015 | VIIRS [13] | +| Environment | Socio-economic | Nighttime lights | Nighttime light index from 0 to 63 | >1.5 | Mean | 2015 | VIIRS [13] | | Environment | Co-endemicity | Co-endemic for onchocerciasis | Part or all of district is also endemic for onchocerciases | Non-endemic | Binary value | 2018 | Programmatic data | | MDA | Drug efficacy | Drug package | DEC-ALB or IVM-ALB | DEC-ALB | Binary value | 2018 | Programmatic data | | MDA | Implementation of MDA | Coverage | Median MDA coverage for last 5 rounds | ≥ 65% | Median | Varies | Programmatic data | @@ -136,12 +136,12 @@ Table 2: Adjusted risk ratios for pre-TAS failure from log-binomial model sensit | Number of Failures | 74 | 74 | 44 | 72 | 46 | | Number of total districts | (N = 554) | (N = 420) | (N = 407) | (N = 518) | (N = 414) | | Covariate | RR (95% CI) | RR (95% CI) | RR (95% CI) | RR (95% CI) | RR (95% CI) | -| Baseline prevalence > = 10% & used FTS test | 2.38 (0.96–5.90) | 1.23 (0.52–2.92) | 14.52 (1.79–117.82) | 2.61 (1.03–6.61) | 15.80 (1.95–127.67) | -| Baseline prevalence > = 10% & used ICT test | 0.80 (0.20–3.24) | 0.42 (0.11–1.68) | 1.00 (0.00–0.00) | 0.88 (0.21–3.60) | 1.00 (0.00–0.00) | +| Baseline prevalence > = 10% & used FTS test | 2.38 (0.96–5.90) | 1.23 (0.52–2.92) | 14.52 (1.79–117.82) | 2.61 (1.03–6.61) | 15.80 (1.95–127.67) | +| Baseline prevalence > = 10% & used ICT test | 0.80 (0.20–3.24) | 0.42 (0.11–1.68) | 1.00 (0.00–0.00) | 0.88 (0.21–3.60) | 1.00 (0.00–0.00) | | +Used FTS test | 1.16 (0.52–2.59) | 2.40 (1.12–5.11) | 0.15 (0.02–1.11) | 1.03 (0.45–2.36) | 0.13 (0.02–0.96) | | +Used ICT test | 0.92 (0.32–2.67) | 1.47 (0.51–4.21) | 0.33 (0.04–2.54) | 0.82 (0.28–2.43) | 0.27 (0.03–2.04) | -| +Baseline prevalence > = 10% | 2.52 (1.37–4.64) | 2.42 (1.31–4.47) | 2.03 (1.06–3.90) | 2.30 (1.21–4.36) | 2.01 (1.07–3.77) | -| Elevation < 350m | 3.07 (1.95–4.83) | 2.21 (1.42–3.43) | 4.68 (2.22–9.87) | 3.04 (1.93–4.79) | 3.76 (1.92–7.37) | +| +Baseline prevalence > = 10% | 2.52 (1.37–4.64) | 2.42 (1.31–4.47) | 2.03 (1.06–3.90) | 2.30 (1.21–4.36) | 2.01 (1.07–3.77) | +| Elevation < 350m | 3.07 (1.95–4.83) | 2.21 (1.42–3.43) | 4.68 (2.22–9.87) | 3.04 (1.93–4.79) | 3.76 (1.92–7.37) | ## Figures diff --git a/tests/data/groundtruth/docling_v2/pone.0234687.xml.md b/tests/data/groundtruth/docling_v2/pone.0234687.xml.md index 0e9c0f027..36758fd58 100644 --- a/tests/data/groundtruth/docling_v2/pone.0234687.xml.md +++ b/tests/data/groundtruth/docling_v2/pone.0234687.xml.md @@ -62,7 +62,7 @@ The CH4 emissions from enteric fermentation intensity (g (kg ECM)-1) was a funct The CH4 emission from manure (kg (kg ECM)-1) was a function of daily CH4 emission from manure (kg cow-1) and daily ECM (kg cow-1). The daily CH4 emission from manure was estimated according to IPCC [38], which considered daily volatile solid (VS) excreted (kg DM cow-1) in manure. The daily VS was estimated as proposed by Eugène et al. [44] as: VS = NDOMI + (UE × GE) × (OM/18.45), where: VS = volatile solid excretion on an organic matter (OM) basis (kg day-1), NDOMI = non-digestible OM intake (kg day-1): (1- OM digestibility) × OM intake, UE = urinary energy excretion as a fraction of GE (0.04), GE = gross energy intake (MJ day-1), OM = organic matter (g), 18.45 = conversion factor for dietary GE per kg of DM (MJ kg-1). -The OM digestibility was estimated as a function of chemical composition, using equations published by INRA [21], which takes into account the effects of digestive interactions due to feeding level, the proportion of concentrate and rumen protein balance on OM digestibility. For scenarios where cows had access to grazing, the amount of calculated VS were corrected as a function of the time at pasture. The biodegradability of manure factor (0.13 for dairy cows in Latin America) and methane conversion factor (MCF) values were taken from IPCC [38]. The MCF values for pit storage below animal confinements (> 1 month) were used for the calculation, taking into account the annual average temperature (16.6ºC) or the average temperatures during the growth period of temperate (14.4ºC) or tropical (21ºC) annual pastures, which were 31%, 26% and 46%, respectively. +The OM digestibility was estimated as a function of chemical composition, using equations published by INRA [21], which takes into account the effects of digestive interactions due to feeding level, the proportion of concentrate and rumen protein balance on OM digestibility. For scenarios where cows had access to grazing, the amount of calculated VS were corrected as a function of the time at pasture. The biodegradability of manure factor (0.13 for dairy cows in Latin America) and methane conversion factor (MCF) values were taken from IPCC [38]. The MCF values for pit storage below animal confinements (> 1 month) were used for the calculation, taking into account the annual average temperature (16.6ºC) or the average temperatures during the growth period of temperate (14.4ºC) or tropical (21ºC) annual pastures, which were 31%, 26% and 46%, respectively. The N2O-N emissions from urine and feces were estimated considering the proportion of N excreted as manure and storage or as urine and dung deposited by grazing animals. These proportions were calculated based on the proportion of daily time that animals stayed on pasture (7 h/24 h = 0.29) or confinement (1−0.29 = 0.71). For lactating heifers and cows, the total amount of N excreted was calculated by the difference between N intake and milk N excretion. For heifers and non-lactating cows, urinary and fecal N excretion were estimated as proposed by Reed et al. [45] (Table 3: equations 10 and 12, respectively). The N2O emissions from stored manure as well as urine and dung during grazing were calculated based on the conversion of N2O-N emissions to N2O emissions, where N2O emissions = N2O-N emissions × 44/28. The emission factors were 0.002 kg N2O-N (kg N)-1 stored in a pit below animal confinements, and 0.02 kg N2O-N (kg of urine and dung)-1 deposited on pasture [38]. The indirect N2O emissions from storage manure and urine and dung deposits on pasture were also estimated using the IPCC [38] emission factors. @@ -106,7 +106,7 @@ The lower C footprint in scenarios with access to pasture, when local emission f The enteric CH4 intensity was similar between different scenarios (Fig 2), showing the greatest sensitivity index, with values ranging from 0.53 to 0.62, which indicate that for a 10% change in this source, the C footprint may change between 5.3 and 6.2% (Fig 3). The large effect of enteric CH4 emissions on the whole C footprint was expected, because the impact of enteric CH4 on GHG emissions of milk production in different dairy systems has been estimated to range from 44 to 60% of the total CO2e [50,52,57,58]. However, emissions in feed production may be the most important source of GHG when emission factors for producing concentrate feeds are greater than 0.7 kg CO2e kg-1 [59], which did not happen in this study. -The lack of difference in enteric CH4 emissions in different systems can be explained by the narrow range of NDF content in diets (<4% difference). This non-difference is due to the lower NDF content of annual temperate pastures (495 g (kg DM)-1) compared to corn silage (550 g (kg DM)-1). Hence, an expected, increase NDF content with decreased concentrate was partially offset by an increase in the pasture proportion relatively low in NDF. This is in agreement with studies conducted in southern Brazil, which have shown that the actual enteric CH4 emissions may decrease with inclusion of temperate pastures in cows receiving corn silage and soybean meal [60] or increase enteric CH4 emissions when dairy cows grazing a temperate pasture was supplemented with corn silage [61]. Additionally, enteric CH4 emissions did not differ between dairy cows receiving TMR exclusively or grazing a tropical pasture in the same scenarios as in this study [26]. +The lack of difference in enteric CH4 emissions in different systems can be explained by the narrow range of NDF content in diets (<4% difference). This non-difference is due to the lower NDF content of annual temperate pastures (495 g (kg DM)-1) compared to corn silage (550 g (kg DM)-1). Hence, an expected, increase NDF content with decreased concentrate was partially offset by an increase in the pasture proportion relatively low in NDF. This is in agreement with studies conducted in southern Brazil, which have shown that the actual enteric CH4 emissions may decrease with inclusion of temperate pastures in cows receiving corn silage and soybean meal [60] or increase enteric CH4 emissions when dairy cows grazing a temperate pasture was supplemented with corn silage [61]. Additionally, enteric CH4 emissions did not differ between dairy cows receiving TMR exclusively or grazing a tropical pasture in the same scenarios as in this study [26]. ### Emissions from excreta and feed production diff --git a/tests/data/groundtruth/docling_v2/redp5110_sampled.md b/tests/data/groundtruth/docling_v2/redp5110_sampled.md index a0e71aad5..460b7a35e 100644 --- a/tests/data/groundtruth/docling_v2/redp5110_sampled.md +++ b/tests/data/groundtruth/docling_v2/redp5110_sampled.md @@ -63,10 +63,10 @@ Solution Brief IBM Systems Lab Services and Training ## Highlights -- GLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPH GLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH -- GLYPHGLYPH GLYPHGLYPHGLYPH GLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPH GLYPHGLYPHGLYPH GLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPH GLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH -- GLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH -- GLYPHGLYPH GLYPH GLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPH GLYPH GLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH GLYPHGLYPHGLYPH GLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPHGLYPH +- GLYPH<g115>GLYPH<g3> GLYPH<g40>GLYPH<g81>GLYPH<g75>GLYPH<g68>GLYPH<g81>GLYPH<g70>GLYPH<g72>GLYPH<g3> GLYPH<g87>GLYPH<g75>GLYPH<g72>GLYPH<g3> GLYPH<g83>GLYPH<g72>GLYPH<g85>GLYPH<g73>GLYPH<g82>GLYPH<g85>GLYPH<g80>GLYPH<g68>GLYPH<g81>GLYPH<g70>GLYPH<g72>GLYPH<g3> GLYPH<g82>GLYPH<g73>GLYPH<g3> GLYPH<g92>GLYPH<g82>GLYPH<g88>GLYPH<g85> GLYPH<g3> GLYPH<g71>GLYPH<g68>GLYPH<g87>GLYPH<g68>GLYPH<g69>GLYPH<g68>GLYPH<g86>GLYPH<g72>GLYPH<g3> GLYPH<g82>GLYPH<g83>GLYPH<g72>GLYPH<g85>GLYPH<g68>GLYPH<g87>GLYPH<g76>GLYPH<g82>GLYPH<g81>GLYPH<g86> +- GLYPH<g115>GLYPH<g3> GLYPH<g40>GLYPH<g68>GLYPH<g85> GLYPH<g81>GLYPH<g3> GLYPH<g74>GLYPH<g85>GLYPH<g72>GLYPH<g68>GLYPH<g87>GLYPH<g72>GLYPH<g85>GLYPH<g3> GLYPH<g85>GLYPH<g72>GLYPH<g87>GLYPH<g88>GLYPH<g85> GLYPH<g81>GLYPH<g3> GLYPH<g82>GLYPH<g81>GLYPH<g3> GLYPH<g44>GLYPH<g55>GLYPH<g3> GLYPH<g83>GLYPH<g85>GLYPH<g82>GLYPH<g77>GLYPH<g72>GLYPH<g70>GLYPH<g87>GLYPH<g86> GLYPH<g3> GLYPH<g87>GLYPH<g75>GLYPH<g85>GLYPH<g82>GLYPH<g88>GLYPH<g74>GLYPH<g75>GLYPH<g3> GLYPH<g80>GLYPH<g82>GLYPH<g71>GLYPH<g72>GLYPH<g85> GLYPH<g81>GLYPH<g76>GLYPH<g93>GLYPH<g68>GLYPH<g87>GLYPH<g76>GLYPH<g82>GLYPH<g81>GLYPH<g3> GLYPH<g82>GLYPH<g73>GLYPH<g3> GLYPH<g71>GLYPH<g68>GLYPH<g87>GLYPH<g68>GLYPH<g69>GLYPH<g68>GLYPH<g86>GLYPH<g72>GLYPH<g3> GLYPH<g68>GLYPH<g81>GLYPH<g71> GLYPH<g3> GLYPH<g68>GLYPH<g83>GLYPH<g83>GLYPH<g79>GLYPH<g76>GLYPH<g70>GLYPH<g68>GLYPH<g87>GLYPH<g76>GLYPH<g82>GLYPH<g81>GLYPH<g86> +- GLYPH<g115>GLYPH<g3> GLYPH<g53>GLYPH<g72>GLYPH<g79>GLYPH<g92>GLYPH<g3> GLYPH<g82>GLYPH<g81>GLYPH<g3> GLYPH<g44>GLYPH<g37>GLYPH<g48>GLYPH<g3> GLYPH<g72>GLYPH<g91>GLYPH<g83>GLYPH<g72>GLYPH<g85>GLYPH<g87>GLYPH<g3> GLYPH<g70>GLYPH<g82>GLYPH<g81>GLYPH<g86>GLYPH<g88>GLYPH<g79>GLYPH<g87>GLYPH<g76>GLYPH<g81>GLYPH<g74>GLYPH<g15>GLYPH<g3> GLYPH<g86>GLYPH<g78>GLYPH<g76>GLYPH<g79>GLYPH<g79>GLYPH<g86> GLYPH<g3> GLYPH<g86>GLYPH<g75>GLYPH<g68>GLYPH<g85>GLYPH<g76>GLYPH<g81>GLYPH<g74>GLYPH<g3> GLYPH<g68>GLYPH<g81>GLYPH<g71>GLYPH<g3> GLYPH<g85>GLYPH<g72>GLYPH<g81>GLYPH<g82>GLYPH<g90>GLYPH<g81>GLYPH<g3> GLYPH<g86>GLYPH<g72>GLYPH<g85>GLYPH<g89>GLYPH<g76>GLYPH<g70>GLYPH<g72>GLYPH<g86> +- GLYPH<g115>GLYPH<g3> GLYPH<g55> GLYPH<g68>GLYPH<g78>GLYPH<g72>GLYPH<g3> GLYPH<g68>GLYPH<g71>GLYPH<g89>GLYPH<g68>GLYPH<g81>GLYPH<g87>GLYPH<g68>GLYPH<g74>GLYPH<g72>GLYPH<g3> GLYPH<g82>GLYPH<g73>GLYPH<g3> GLYPH<g68>GLYPH<g70>GLYPH<g70>GLYPH<g72>GLYPH<g86>GLYPH<g86>GLYPH<g3> GLYPH<g87>GLYPH<g82>GLYPH<g3> GLYPH<g68> GLYPH<g3> GLYPH<g90>GLYPH<g82>GLYPH<g85>GLYPH<g79>GLYPH<g71>GLYPH<g90>GLYPH<g76>GLYPH<g71>GLYPH<g72>GLYPH<g3> GLYPH<g86>GLYPH<g82>GLYPH<g88>GLYPH<g85>GLYPH<g70>GLYPH<g72>GLYPH<g3> GLYPH<g82>GLYPH<g73>GLYPH<g3> GLYPH<g72>GLYPH<g91>GLYPH<g83>GLYPH<g72>GLYPH<g85>GLYPH<g87>GLYPH<g76>GLYPH<g86>GLYPH<g72> @@ -130,20 +130,20 @@ Businesses must make a serious effort to secure their data and recognize that se This chapter describes how you can secure and protect data in DB2 for i. The following topics are covered in this chapter: -- GLYPH Security fundamentals -- GLYPH Current state of IBM i security -- GLYPH DB2 for i security controls +- GLYPH<SM590000> Security fundamentals +- GLYPH<SM590000> Current state of IBM i security +- GLYPH<SM590000> DB2 for i security controls ## 1.1 Security fundamentals Before reviewing database security techniques, there are two fundamental steps in securing information assets that must be described: -- GLYPH First, and most important, is the definition of a company's security policy . Without a security policy, there is no definition of what are acceptable practices for using, accessing, and storing information by who, what, when, where, and how. A security policy should minimally address three things: confidentiality, integrity, and availability. +- GLYPH<SM590000> First, and most important, is the definition of a company's security policy . Without a security policy, there is no definition of what are acceptable practices for using, accessing, and storing information by who, what, when, where, and how. A security policy should minimally address three things: confidentiality, integrity, and availability. - The monitoring and assessment of adherence to the security policy determines whether your security strategy is working. Often, IBM security consultants are asked to perform security assessments for companies without regard to the security policy. Although these assessments can be useful for observing how the system is defined and how data is being accessed, they cannot determine the level of security without a security policy. Without a security policy, it really is not an assessment as much as it is a baseline for monitoring the changes in the security settings that are captured. A security policy is what defines whether the system and its settings are secure (or not). -- GLYPH The second fundamental in securing data assets is the use of resource security . If implemented properly, resource security prevents data breaches from both internal and external intrusions. Resource security controls are closely tied to the part of the security policy that defines who should have access to what information resources. A hacker might be good enough to get through your company firewalls and sift his way through to your system, but if they do not have explicit access to your database, the hacker cannot compromise your information assets. +- GLYPH<SM590000> The second fundamental in securing data assets is the use of resource security . If implemented properly, resource security prevents data breaches from both internal and external intrusions. Resource security controls are closely tied to the part of the security policy that defines who should have access to what information resources. A hacker might be good enough to get through your company firewalls and sift his way through to your system, but if they do not have explicit access to your database, the hacker cannot compromise your information assets. With your eyes now open to the importance of securing information assets, the rest of this chapter reviews the methods that are available for securing database resources on IBM i. @@ -173,9 +173,9 @@ Figure 1-2 Existing row and column controls The following CL commands can be used to work with, display, or change function usage IDs: -- GLYPH Work Function Usage ( WRKFCNUSG ) -- GLYPH Change Function Usage ( CHGFCNUSG ) -- GLYPH Display Function Usage ( DSPFCNUSG ) +- GLYPH<SM590000> Work Function Usage ( WRKFCNUSG ) +- GLYPH<SM590000> Change Function Usage ( CHGFCNUSG ) +- GLYPH<SM590000> Display Function Usage ( DSPFCNUSG ) For example, the following CHGFCNUSG command shows granting authorization to user HBEDOYA to administer and manage RCAC rules: @@ -191,8 +191,8 @@ Table 2-1 FUNCTION\_USAGE view |---------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------| | FUNCTION\_ID | VARCHAR(30) | ID of the function. | | USER\_NAME | VARCHAR(10) | Name of the user profile that has a usage setting for this function. | -| USAGE | VARCHAR(7) | Usage setting: GLYPH ALLOWED: The user profile is allowed to use the function. GLYPH DENIED: The user profile is not allowed to use the function. | -| USER\_TYPE | VARCHAR(5) | Type of user profile: GLYPH USER: The user profile is a user. GLYPH GROUP: The user profile is a group. | +| USAGE | VARCHAR(7) | Usage setting: GLYPH<SM590000> ALLOWED: The user profile is allowed to use the function. GLYPH<SM590000> DENIED: The user profile is not allowed to use the function. | +| USER\_TYPE | VARCHAR(5) | Type of user profile: GLYPH<SM590000> USER: The user profile is a user. GLYPH<SM590000> GROUP: The user profile is a group. | To discover who has authorization to define and manage RCAC, you can use the query that is shown in Example 2-1. @@ -273,11 +273,11 @@ Table 3-1 Special registers and their corresponding values Figure 3-5 shows the difference in the special register values when an adopted authority is used: -- GLYPH A user connects to the server using the user profile ALICE. -- GLYPH USER and CURRENT USER initially have the same value of ALICE. -- GLYPH ALICE calls an SQL procedure that is named proc1, which is owned by user profile JOE and was created to adopt JOE's authority when it is called. -- GLYPH While the procedure is running, the special register USER still contains the value of ALICE because it excludes any adopted authority. The special register CURRENT USER contains the value of JOE because it includes any adopted authority. -- GLYPH When proc1 ends, the session reverts to its original state with both USER and CURRENT USER having the value of ALICE. +- GLYPH<SM590000> A user connects to the server using the user profile ALICE. +- GLYPH<SM590000> USER and CURRENT USER initially have the same value of ALICE. +- GLYPH<SM590000> ALICE calls an SQL procedure that is named proc1, which is owned by user profile JOE and was created to adopt JOE's authority when it is called. +- GLYPH<SM590000> While the procedure is running, the special register USER still contains the value of ALICE because it excludes any adopted authority. The special register CURRENT USER contains the value of JOE because it includes any adopted authority. +- GLYPH<SM590000> When proc1 ends, the session reverts to its original state with both USER and CURRENT USER having the value of ALICE. Figure 3-5 Special registers and adopted authority @@ -318,7 +318,7 @@ Here is an example of using the VERIFY\_GROUP\_FOR\_USER function: - 3. If a user is connected to the server using user profile JANE, all of the following function invocations return a value of 1: ``` -VERIFY\_GROUP\_FOR\_USER (CURRENT\_USER, 'MGR') VERIFY\_GROUP\_FOR\_USER (CURRENT\_USER, 'JANE', 'MGR') VERIFY\_GROUP\_FOR\_USER (CURRENT\_USER, 'JANE', 'MGR', 'STEVE') The following function invocation returns a value of 0: VERIFY\_GROUP\_FOR\_USER (CURRENT\_USER, 'JUDY', 'TONY') +VERIFY_GROUP_FOR_USER (CURRENT_USER, 'MGR') VERIFY_GROUP_FOR_USER (CURRENT_USER, 'JANE', 'MGR') VERIFY_GROUP_FOR_USER (CURRENT_USER, 'JANE', 'MGR', 'STEVE') The following function invocation returns a value of 0: VERIFY_GROUP_FOR_USER (CURRENT_USER, 'JUDY', 'TONY') ``` RETURN @@ -326,7 +326,7 @@ RETURN CASE ``` -WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'HR', 'EMP' ) = 1 THEN EMPLOYEES . DATE\_OF\_BIRTH WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'MGR' ) = 1 AND SESSION\_USER = EMPLOYEES . USER\_ID THEN EMPLOYEES . DATE\_OF\_BIRTH WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'MGR' ) = 1 AND SESSION\_USER <> EMPLOYEES . USER\_ID THEN ( 9999 || '-' || MONTH ( EMPLOYEES . DATE\_OF\_BIRTH ) || '-' || DAY (EMPLOYEES.DATE\_OF\_BIRTH )) ELSE NULL END ENABLE ; +WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'HR', 'EMP' ) = 1 THEN EMPLOYEES . DATE_OF_BIRTH WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER = EMPLOYEES . USER_ID THEN EMPLOYEES . DATE_OF_BIRTH WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER <> EMPLOYEES . USER_ID THEN ( 9999 || '-' || MONTH ( EMPLOYEES . DATE_OF_BIRTH ) || '-' || DAY (EMPLOYEES.DATE_OF_BIRTH )) ELSE NULL END ENABLE ; ``` - 2. The other column to mask in this example is the TAX\_ID information. In this example, the rules to enforce include the following ones: @@ -339,7 +339,7 @@ WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'HR', 'EMP' ) = 1 THEN EMPLOYEES Example 3-9 Creating a mask on the TAX\_ID column ``` -CREATE MASK HR\_SCHEMA.MASK\_TAX\_ID\_ON\_EMPLOYEES ON HR\_SCHEMA.EMPLOYEES AS EMPLOYEES FOR COLUMN TAX\_ID RETURN CASE WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'HR' ) = 1 THEN EMPLOYEES . TAX\_ID WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'MGR' ) = 1 AND SESSION\_USER = EMPLOYEES . USER\_ID THEN EMPLOYEES . TAX\_ID WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'MGR' ) = 1 AND SESSION\_USER <> EMPLOYEES . USER\_ID THEN ( 'XXX-XX-' CONCAT QSYS2 . SUBSTR ( EMPLOYEES . TAX\_ID , 8 , 4 ) ) WHEN VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'EMP' ) = 1 THEN EMPLOYEES . TAX\_ID ELSE 'XXX-XX-XXXX' END ENABLE ; +CREATE MASK HR_SCHEMA.MASK_TAX_ID_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS EMPLOYEES FOR COLUMN TAX_ID RETURN CASE WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'HR' ) = 1 THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER = EMPLOYEES . USER_ID THEN EMPLOYEES . TAX_ID WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'MGR' ) = 1 AND SESSION_USER <> EMPLOYEES . USER_ID THEN ( 'XXX-XX-' CONCAT QSYS2 . SUBSTR ( EMPLOYEES . TAX_ID , 8 , 4 ) ) WHEN VERIFY_GROUP_FOR_USER ( SESSION_USER , 'EMP' ) = 1 THEN EMPLOYEES . TAX_ID ELSE 'XXX-XX-XXXX' END ENABLE ; ``` - 3. Figure 3-10 shows the masks that are created in the HR\_SCHEMA. @@ -386,7 +386,7 @@ Figure 4-69 Index advice with no RCAC ``` -THEN C . CUSTOMER\_TAX\_ID WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'TELLER' ) = 1 THEN ( 'XXX-XX-' CONCAT QSYS2 . SUBSTR ( C . CUSTOMER\_TAX\_ID , 8 , 4 ) ) WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER\_TAX\_ID ELSE 'XXX-XX-XXXX' END ENABLE ; CREATE MASK BANK\_SCHEMA.MASK\_DRIVERS\_LICENSE\_ON\_CUSTOMERS ON BANK\_SCHEMA.CUSTOMERS AS C FOR COLUMN CUSTOMER\_DRIVERS\_LICENSE\_NUMBER RETURN CASE WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'ADMIN' ) = 1 THEN C . CUSTOMER\_DRIVERS\_LICENSE\_NUMBER WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'TELLER' ) = 1 THEN C . CUSTOMER\_DRIVERS\_LICENSE\_NUMBER WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER\_DRIVERS\_LICENSE\_NUMBER ELSE '*************' END ENABLE ; CREATE MASK BANK\_SCHEMA.MASK\_LOGIN\_ID\_ON\_CUSTOMERS ON BANK\_SCHEMA.CUSTOMERS AS C FOR COLUMN CUSTOMER\_LOGIN\_ID RETURN CASE WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'ADMIN' ) = 1 THEN C . CUSTOMER\_LOGIN\_ID WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER\_LOGIN\_ID ELSE '*****' END ENABLE ; CREATE MASK BANK\_SCHEMA.MASK\_SECURITY\_QUESTION\_ON\_CUSTOMERS ON BANK\_SCHEMA.CUSTOMERS AS C FOR COLUMN CUSTOMER\_SECURITY\_QUESTION RETURN CASE WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'ADMIN' ) = 1 THEN C . CUSTOMER\_SECURITY\_QUESTION WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER\_SECURITY\_QUESTION ELSE '*****' END ENABLE ; CREATE MASK BANK\_SCHEMA.MASK\_SECURITY\_QUESTION\_ANSWER\_ON\_CUSTOMERS ON BANK\_SCHEMA.CUSTOMERS AS C FOR COLUMN CUSTOMER\_SECURITY\_QUESTION\_ANSWER RETURN CASE WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'ADMIN' ) = 1 THEN C . CUSTOMER\_SECURITY\_QUESTION\_ANSWER WHEN QSYS2 . VERIFY\_GROUP\_FOR\_USER ( SESSION\_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER\_SECURITY\_QUESTION\_ANSWER ELSE '*****' END ENABLE ; ALTER TABLE BANK\_SCHEMA.CUSTOMERS ACTIVATE ROW ACCESS CONTROL ACTIVATE COLUMN ACCESS CONTROL ; +THEN C . CUSTOMER_TAX_ID WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'TELLER' ) = 1 THEN ( 'XXX-XX-' CONCAT QSYS2 . SUBSTR ( C . CUSTOMER_TAX_ID , 8 , 4 ) ) WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER_TAX_ID ELSE 'XXX-XX-XXXX' END ENABLE ; CREATE MASK BANK_SCHEMA.MASK_DRIVERS_LICENSE_ON_CUSTOMERS ON BANK_SCHEMA.CUSTOMERS AS C FOR COLUMN CUSTOMER_DRIVERS_LICENSE_NUMBER RETURN CASE WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'ADMIN' ) = 1 THEN C . CUSTOMER_DRIVERS_LICENSE_NUMBER WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'TELLER' ) = 1 THEN C . CUSTOMER_DRIVERS_LICENSE_NUMBER WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER_DRIVERS_LICENSE_NUMBER ELSE '*************' END ENABLE ; CREATE MASK BANK_SCHEMA.MASK_LOGIN_ID_ON_CUSTOMERS ON BANK_SCHEMA.CUSTOMERS AS C FOR COLUMN CUSTOMER_LOGIN_ID RETURN CASE WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'ADMIN' ) = 1 THEN C . CUSTOMER_LOGIN_ID WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER_LOGIN_ID ELSE '*****' END ENABLE ; CREATE MASK BANK_SCHEMA.MASK_SECURITY_QUESTION_ON_CUSTOMERS ON BANK_SCHEMA.CUSTOMERS AS C FOR COLUMN CUSTOMER_SECURITY_QUESTION RETURN CASE WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'ADMIN' ) = 1 THEN C . CUSTOMER_SECURITY_QUESTION WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER_SECURITY_QUESTION ELSE '*****' END ENABLE ; CREATE MASK BANK_SCHEMA.MASK_SECURITY_QUESTION_ANSWER_ON_CUSTOMERS ON BANK_SCHEMA.CUSTOMERS AS C FOR COLUMN CUSTOMER_SECURITY_QUESTION_ANSWER RETURN CASE WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'ADMIN' ) = 1 THEN C . CUSTOMER_SECURITY_QUESTION_ANSWER WHEN QSYS2 . VERIFY_GROUP_FOR_USER ( SESSION_USER , 'CUSTOMER' ) = 1 THEN C . CUSTOMER_SECURITY_QUESTION_ANSWER ELSE '*****' END ENABLE ; ALTER TABLE BANK_SCHEMA.CUSTOMERS ACTIVATE ROW ACCESS CONTROL ACTIVATE COLUMN ACCESS CONTROL ; ``` Back cover diff --git a/tests/data/groundtruth/docling_v2/wiki_duck.html.md b/tests/data/groundtruth/docling_v2/wiki_duck.html.md index 856e97a7a..df4554fcd 100644 --- a/tests/data/groundtruth/docling_v2/wiki_duck.html.md +++ b/tests/data/groundtruth/docling_v2/wiki_duck.html.md @@ -389,22 +389,22 @@ The 1992 Disney film The Mighty Ducks, starring Emilio Estevez, chose the duck a 4. ^ Visca, Curt; Visca, Kelley (2003). How to Draw Cartoon Birds. The Rosen Publishing Group. ISBN 9780823961566. 5. ^ a b c d Carboneras 1992, p. 536. 6. ^ Livezey 1986, pp. 737–738. -7. ^ Madsen, McHugh & de Kloet 1988, p. 452. -8. ^ Donne-Goussé, Laudet & Hänni 2002, pp. 353–354. +7. ^ Madsen, McHugh & de Kloet 1988, p. 452. +8. ^ Donne-Goussé, Laudet & Hänni 2002, pp. 353–354. 9. ^ a b c d e f Carboneras 1992, p. 540. -10. ^ Elphick, Dunning & Sibley 2001, p. 191. +10. ^ Elphick, Dunning & Sibley 2001, p. 191. 11. ^ Kear 2005, p. 448. 12. ^ Kear 2005, p. 622–623. 13. ^ Kear 2005, p. 686. -14. ^ Elphick, Dunning & Sibley 2001, p. 193. +14. ^ Elphick, Dunning & Sibley 2001, p. 193. 15. ^ a b c d e f g Carboneras 1992, p. 537. 16. ^ American Ornithologists' Union 1998, p. xix. 17. ^ American Ornithologists' Union 1998. 18. ^ Carboneras 1992, p. 538. -19. ^ Christidis & Boles 2008, p. 62. +19. ^ Christidis & Boles 2008, p. 62. 20. ^ Shirihai 2008, pp. 239, 245. -21. ^ a b Pratt, Bruner & Berrett 1987, pp. 98–107. -22. ^ Fitter, Fitter & Hosking 2000, pp. 52–3. +21. ^ a b Pratt, Bruner & Berrett 1987, pp. 98–107. +22. ^ Fitter, Fitter & Hosking 2000, pp. 52–3. 23. ^ "Pacific Black Duck". www.wiresnr.org. Retrieved 2018-04-27. 24. ^ Ogden, Evans. "Dabbling Ducks". CWE. Retrieved 2006-11-02. 25. ^ Karl Mathiesen (16 March 2015). "Don't feed the ducks bread, say conservationists". The Guardian. Retrieved 13 November 2016. @@ -412,7 +412,7 @@ The 1992 Disney film The Mighty Ducks, starring Emilio Estevez, chose the duck a 27. ^ Smith, Cyndi M.; Cooke, Fred; Robertson, Gregory J.; Goudie, R. Ian; Boyd, W. Sean (2000). "Long-Term Pair Bonds in Harlequin Ducks". The Condor. 102 (1): 201–205. doi:10.1093/condor/102.1.201. hdl:10315/13797. 28. ^ "If You Find An Orphaned Duckling - Wildlife Rehabber". wildliferehabber.com. Archived from the original on 2018-09-23. Retrieved 2018-12-22. 29. ^ Carver, Heather (2011). The Duck Bible. Lulu.com. ISBN 9780557901562.[self-published source] -30. ^ Titlow, Budd (2013-09-03). Bird Brains: Inside the Strange Minds of Our Fine Feathered Friends. Rowman & Littlefield. ISBN 9780762797707. +30. ^ Titlow, Budd (2013-09-03). Bird Brains: Inside the Strange Minds of Our Fine Feathered Friends. Rowman & Littlefield. ISBN 9780762797707. 31. ^ Amos, Jonathan (2003-09-08). "Sound science is quackers". BBC News. Retrieved 2006-11-02. 32. ^ "Mythbusters Episode 8". 12 December 2003. 33. ^ Erlandson 1994, p. 171. @@ -446,10 +446,10 @@ The 1992 Disney film The Mighty Ducks, starring Emilio Estevez, chose the duck a - Christidis, Les; Boles, Walter E., eds. (2008). Systematics and Taxonomy of Australian Birds. Collingwood, VIC: Csiro Publishing. ISBN 978-0-643-06511-6. - Donne-Goussé, Carole; Laudet, Vincent; Hänni, Catherine (July 2002). "A molecular phylogeny of Anseriformes based on mitochondrial DNA analysis". Molecular Phylogenetics and Evolution. 23 (3): 339–356. Bibcode:2002MolPE..23..339D. doi:10.1016/S1055-7903(02)00019-2. PMID 12099792. - Elphick, Chris; Dunning, John B. Jr.; Sibley, David, eds. (2001). The Sibley Guide to Bird Life and Behaviour. London: Christopher Helm. ISBN 978-0-7136-6250-4. -- Erlandson, Jon M. (1994). Early Hunter-Gatherers of the California Coast. New York, NY: Springer Science & Business Media. ISBN 978-1-4419-3231-0. +- Erlandson, Jon M. (1994). Early Hunter-Gatherers of the California Coast. New York, NY: Springer Science & Business Media. ISBN 978-1-4419-3231-0. - Fieldhouse, Paul (2002). Food, Feasts, and Faith: An Encyclopedia of Food Culture in World Religions. Vol. I: A–K. Santa Barbara: ABC-CLIO. ISBN 978-1-61069-412-4. - Fitter, Julian; Fitter, Daniel; Hosking, David (2000). Wildlife of the Galápagos. Princeton, NJ: Princeton University Press. ISBN 978-0-691-10295-5. -- Higman, B. W. (2012). How Food Made History. Chichester, UK: John Wiley & Sons. ISBN 978-1-4051-8947-7. +- Higman, B. W. (2012). How Food Made History. Chichester, UK: John Wiley & Sons. ISBN 978-1-4051-8947-7. - Hume, Julian H. (2012). Extinct Birds. London: Christopher Helm. ISBN 978-1-4729-3744-5. - Jeffries, Richard (2008). Holocene Hunter-Gatherers of the Lower Ohio River Valley. Tuscaloosa: University of Alabama Press. ISBN 978-0-8173-1658-7. - Kear, Janet, ed. (2005). Ducks, Geese and Swans: Species Accounts (Cairina to Mergus). Bird Families of the World. Oxford: Oxford University Press. ISBN 978-0-19-861009-0. @@ -457,7 +457,7 @@ The 1992 Disney film The Mighty Ducks, starring Emilio Estevez, chose the duck a - Madsen, Cort S.; McHugh, Kevin P.; de Kloet, Siwo R. (July 1988). "A partial classification of waterfowl (Anatidae) based on single-copy DNA" (PDF). The Auk. 105 (3): 452–459. doi:10.1093/auk/105.3.452. Archived (PDF) from the original on 2022-10-09. - Maisels, Charles Keith (1999). Early Civilizations of the Old World. London: Routledge. ISBN 978-0-415-10975-8. - Pratt, H. Douglas; Bruner, Phillip L.; Berrett, Delwyn G. (1987). A Field Guide to the Birds of Hawaii and the Tropical Pacific. Princeton, NJ: Princeton University Press. ISBN 0-691-02399-9. -- Rau, Charles (1876). Early Man in Europe. New York: Harper & Brothers. LCCN 05040168. +- Rau, Charles (1876). Early Man in Europe. New York: Harper & Brothers. LCCN 05040168. - Shirihai, Hadoram (2008). A Complete Guide to Antarctic Wildlife. Princeton, NJ, US: Princeton University Press. ISBN 978-0-691-13666-0. - Sued-Badillo, Jalil (2003). Autochthonous Societies. General History of the Caribbean. Paris: UNESCO. ISBN 978-92-3-103832-7. - Thorpe, I. J. (1996). The Origins of Agriculture in Europe. New York: Routledge. ISBN 978-0-415-08009-5.