Document page numbers #669
Unanswered
JonathanVelkeneers asked this question in Q&A
-
This is currently not possible. We did some initial work to extract the information, but more work remains: we need a new text partitioning class that can split text while preserving its metadata (page number, titles, etc.) so that the metadata can be stored in the memory DB.
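To make the idea concrete, here is a rough, language-agnostic sketch (in Python for brevity, although the project itself is C#) of a partitioner that carries page numbers along with each chunk. `Section`, `Chunk`, and `partition_with_pages` are hypothetical names used for illustration only, not Kernel Memory types, and the word-based splitting is a stand-in for whatever tokenization the real partitioner would use.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical decoder output: the text of one page plus its page number."""
    page_number: int
    text: str


@dataclass
class Chunk:
    """Hypothetical partition: text plus the pages it was drawn from."""
    text: str
    page_numbers: list = field(default_factory=list)


def partition_with_pages(sections, max_chars=1000):
    """Split sections into chunks of at most max_chars characters,
    recording every page that contributed text to each chunk.
    A single word longer than max_chars is kept whole in its own chunk."""
    chunks = []
    words, pages, length = [], [], 0

    def flush():
        nonlocal words, pages, length
        if words:
            chunks.append(Chunk(" ".join(words), pages))
        words, pages, length = [], [], 0

    for section in sections:
        for word in section.text.split():
            # Account for the joining space when the chunk is non-empty.
            added = len(word) + (1 if words else 0)
            if words and length + added > max_chars:
                flush()
                added = len(word)
            words.append(word)
            length += added
            if section.page_number not in pages:
                pages.append(section.page_number)

    flush()
    return chunks
```

Each chunk's `page_numbers` could then be written alongside the chunk when it is stored (for example as a tag or payload field), which is what would let a citation report the page. How that maps onto the actual storage schema is an assumption here.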
-
Is there a way to get PDF page numbers from the Citations list?
The PdfDecoder seems to add them to its results, but as far as I can tell from reading the code and inspecting the results (tags and payload), they go unused.
https://github.com/microsoft/kernel-memory/blob/main/service/Core/DataFormats/Pdf/PdfDecoder.cs
https://github.com/microsoft/kernel-memory/blob/main/service/Core/Handlers/TextExtractionHandler.cs