
TF response to HTML #29

Open
mllife opened this issue Sep 19, 2024 · 4 comments

Comments

@mllife

mllife commented Sep 19, 2024

Is there any helper code available in the repo to convert a TableFormer (TF) response to HTML?

I see some code (related to dataset conversion?)

Not sure about this -
--

Any insight will be helpful.

@maxmnemonic
Contributor

TableFormer generates structure predictions in OTSL+ format (OTSL with header support).
To convert an OTSL structure, represented as a list of OTSL tags, into an HTML structure (a list of HTML tags), you can use this function:
otsl_to_html
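As a rough illustration of the input side, a flat list of OTSL tags can be split into grid rows on the "nl" (new line) tag from the vocabulary listed below. This is a minimal sketch, not the repo's otsl_to_html; the helper name split_otsl_rows is hypothetical, and tolerating a missing trailing "nl" is an assumption.

```python
from typing import List


def split_otsl_rows(tags: List[str]) -> List[List[str]]:
    """Split a flat OTSL tag sequence into rows of cell tags (hypothetical helper).

    Each "nl" tag ends the current row. Raises ValueError if the rows have
    different lengths, since an OTSL grid is expected to be rectangular.
    """
    rows: List[List[str]] = []
    current: List[str] = []
    for tag in tags:
        if tag == "nl":
            rows.append(current)
            current = []
        else:
            current.append(tag)
    if current:  # assumption: tolerate a sequence without a trailing "nl"
        rows.append(current)
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("OTSL rows have unequal lengths")
    return rows


print(split_otsl_rows(["fcel", "lcel", "nl", "fcel", "fcel", "nl"]))
# [['fcel', 'lcel'], ['fcel', 'fcel']]
```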

The OTSL format is described in our paper "Optimized Table Tokenization for Table Structure Recognition"; using it brings significant benefits in both quality and performance.
It has a limited vocabulary:
"ecel" - empty cell
"fcel" - full cell
"lcel" - left-looking span cell
"ucel" - up-looking span cell
"xcel" - cross cell (or 2D span cell)
"nl" - new line
The semantics and logic behind it are described in more detail in the paper.
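Because span cells only "look" left, up, or both, the vocabulary above implies well-formedness constraints on the grid. The sketch below checks rules along the lines of those stated in the OTSL paper, assumed here as: lcel needs fcel/ecel or lcel on its left; ucel needs fcel/ecel or ucel above; xcel needs xcel or ucel on its left and xcel or lcel above; the first row holds only fcel/ecel/lcel; the first column holds only fcel/ecel/ucel; and all rows have equal length. Here fcel/ecel play the role of the paper's "C" cells. The function name otsl_violations is hypothetical, and for OTSL+ one would presumably also treat the header tags as "C" cells.

```python
from typing import List, Optional

# "C" cells in the paper's notation: tags that start a (possibly spanned) cell.
ORIGIN = {"fcel", "ecel"}


def otsl_violations(rows: List[List[str]]) -> List[str]:
    """Return rule violations for an OTSL grid; an empty list means it looks valid."""
    if rows and any(len(r) != len(rows[0]) for r in rows):
        return ["rectangular rule: rows have unequal lengths"]
    errors: List[str] = []
    for i, row in enumerate(rows):
        for j, tag in enumerate(row):
            left: Optional[str] = row[j - 1] if j > 0 else None
            up: Optional[str] = rows[i - 1][j] if i > 0 else None
            where = f"cell ({i}, {j}) '{tag}'"
            if i == 0 and tag not in ORIGIN | {"lcel"}:
                errors.append(f"{where}: first row allows only C cells and lcel")
            if j == 0 and tag not in ORIGIN | {"ucel"}:
                errors.append(f"{where}: first column allows only C cells and ucel")
            if tag == "lcel" and left not in ORIGIN | {"lcel"}:
                errors.append(f"{where}: left neighbour must be a C cell or lcel")
            if tag == "ucel" and up not in ORIGIN | {"ucel"}:
                errors.append(f"{where}: upper neighbour must be a C cell or ucel")
            if tag == "xcel" and (left not in {"xcel", "ucel"} or up not in {"xcel", "lcel"}):
                errors.append(f"{where}: needs xcel/ucel on the left and xcel/lcel above")
    return errors


# A 2x2 span in the top-left corner followed by a normal column.
print(otsl_violations([["fcel", "lcel", "fcel"], ["ucel", "xcel", "ecel"]]))
# []
```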

OTSL+ is an extension of OTSL with extra tags (instructions) that mark the following kinds of cells:
"ched" - column headers
"rhed" - row headers
"srow" - section rows
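A minimal, hypothetical sketch of an OTSL+ to HTML conversion (not the repo's otsl_to_html, whose signature is not shown here): each fcel/ecel/ched/rhed/srow tag starts a cell, its colspan is 1 plus the run of lcel tags to its right, and its rowspan is 1 plus the run of ucel tags below it, so lcel/ucel/xcel emit nothing. Assumptions: ched and rhed become th elements with scope="col" and scope="row", srow is rendered as an ordinary td, and because the tags describe structure rather than text, cell text is passed in separately as a list consumed in row-major order (ecel cells stay empty).

```python
import html
from typing import List, Optional

# Tags that start a cell; lcel/ucel/xcel are covered by a span from one of these.
ORIGIN_TAGS = {"fcel", "ecel", "ched", "rhed", "srow"}


def otsl_plus_to_html(tags: List[str], texts: Optional[List[str]] = None) -> str:
    # Split the flat tag sequence into rows on "nl".
    rows: List[List[str]] = [[]]
    for tag in tags:
        if tag == "nl":
            rows.append([])
        else:
            rows[-1].append(tag)
    rows = [r for r in rows if r]  # drop the empty row left after the final "nl"

    texts_iter = iter(texts or [])
    out = ["<table>"]
    for i, row in enumerate(rows):
        out.append("<tr>")
        for j, tag in enumerate(row):
            if tag not in ORIGIN_TAGS:
                continue  # span continuation: already covered by colspan/rowspan
            colspan = 1
            while j + colspan < len(row) and row[j + colspan] == "lcel":
                colspan += 1
            rowspan = 1
            while (i + rowspan < len(rows) and j < len(rows[i + rowspan])
                   and rows[i + rowspan][j] == "ucel"):
                rowspan += 1
            attrs = ""
            if tag == "ched":
                attrs += ' scope="col"'
            elif tag == "rhed":
                attrs += ' scope="row"'
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            element = "th" if tag in ("ched", "rhed") else "td"
            content = "" if tag == "ecel" else html.escape(next(texts_iter, ""))
            out.append(f"<{element}{attrs}>{content}</{element}>")
        out.append("</tr>")
    out.append("</table>")
    return "".join(out)


print(otsl_plus_to_html(["ched", "lcel", "nl", "rhed", "fcel", "nl"], ["Header", "Row", "42"]))
# <table><tr><th scope="col" colspan="2">Header</th></tr><tr><th scope="row">Row</th><td>42</td></tr></table>
```

Computing spans from the origin cell keeps the output linear in the number of tags and mirrors how the span tags point back toward their origin.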

The model predicts these tags sequentially in the tag decoder, simultaneously with the bounding boxes from the bbox decoder.
The prediction can then be converted to any other format, e.g. Markdown, HTML, etc.
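Markdown conversion follows the same grid walk, except that Markdown (GFM-style) tables cannot express row or column spans. A minimal sketch, assuming span continuation cells (lcel/ucel/xcel) and ecel are simply left blank and the first row serves as the Markdown header row; the function name otsl_to_markdown and the texts parameter (cell text in row-major order) are hypothetical.

```python
from typing import List, Optional

ORIGIN_TAGS = {"fcel", "ecel", "ched", "rhed", "srow"}


def otsl_to_markdown(tags: List[str], texts: Optional[List[str]] = None) -> str:
    # Split the flat tag sequence into rows on "nl".
    rows: List[List[str]] = [[]]
    for tag in tags:
        if tag == "nl":
            rows.append([])
        else:
            rows[-1].append(tag)
    rows = [r for r in rows if r]
    if not rows:
        return ""

    texts_iter = iter(texts or [])
    grid: List[List[str]] = []
    for row in rows:
        cells: List[str] = []
        for tag in row:
            if tag in ORIGIN_TAGS and tag != "ecel":
                # Escape pipes so cell text cannot break the table syntax.
                cells.append(next(texts_iter, "").replace("|", "\\|"))
            else:
                cells.append("")  # empty cells and span continuations stay blank
        grid.append(cells)

    width = max(len(r) for r in grid)
    lines: List[str] = []
    for k, cells in enumerate(grid):
        cells = cells + [""] * (width - len(cells))  # pad ragged rows
        lines.append("| " + " | ".join(cells) + " |")
        if k == 0:
            lines.append("|" + "---|" * width)  # header separator after the first row
    return "\n".join(lines)


print(otsl_to_markdown(["ched", "ched", "nl", "fcel", "lcel", "nl"], ["Name", "Value", "merged"]))
```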

By the way, more high-level usage of docling-ibm-models can be seen in docling itself: https://github.com/DS4SD/docling

@mllife
Author

mllife commented Sep 25, 2024

@maxmnemonic, can you link to the code that does this, or add it as a test or sample notebook in the current repo? It would be really helpful for everyone. Thanks!

@maxmnemonic
Contributor

Thanks for the suggestion @mllife, indeed we can add some good examples purely related to tables in this repo.

@mllife
Author

mllife commented Nov 18, 2024

@maxmnemonic, any update on this? Can you add some sample code for this, or a test like this one: https://github.com/DS4SD/docling-ibm-models/blob/main/tests/test_tf_predictor.py
