From ff2c5f320b57e20939fa9e4fb0022e68d6b25908 Mon Sep 17 00:00:00 2001
From: davidmezzetti <561939+davidmezzetti@users.noreply.github.com>
Date: Fri, 13 Dec 2024 05:55:14 -0500
Subject: [PATCH] Update README

---
 README.md | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 41a6c91..90e9e74 100644
--- a/README.md
+++ b/README.md
@@ -72,6 +72,8 @@ The following section gives an overview of highlighters and available methods/co
 
 ### Create a new highlighter
 
+Creates a new highlighter instance.
+
 ```python
 from txtmarker.factory import Factory
 highlighter = Factory.create("pdf")
@@ -100,8 +102,25 @@ chunks: int
 
 Splits queries into multiple chunks. This is designed for very long text matches.
 
+### Page text
+
+Extracts page text from `infile` and returns as a generator. This enables analysis on the text exactly as it will appear to the highlighter.
+
+```python
+highlighter.pages("input.pdf")
+```
+
+#### infile
+```yaml
+infile: string
+```
+
+Full path to input file
+
 ### Highlight text
 
+Highlights using provided annotations. Annotated file is stored as `outfile`.
+
 ```python
 highlighter.highlight("input.pdf", "output.pdf", [("name", "text to highlight")])
 ```
@@ -125,4 +144,4 @@ Full path to output file, i.e. the highlighted file
 highlights: list of (string, string|regex)
 ```
 
-List of highlight elements. Each pair has a name (can be None) and text value. The text can either be a string or a regular expression.
+List of highlight elements. Each pair has a name (can be None) and text value. The text can either be a string or a regular expression. When using string matching, make sure to escape regular expressions (i.e. call `re.escape`).
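The last hunk advises escaping plain strings before passing them as highlights. The sketch below illustrates why using only the Python standard library. It assumes, as that README wording implies, that string values end up being matched as regular expressions. The phrase and page text are made-up examples. The only txtmarker call is the documented `highlight` signature, which appears in a comment and is not executed.

```python
import re

# Hypothetical literal phrase; "(" and ")" are regex metacharacters (a group).
phrase = "cost (USD)"

# Hypothetical sample text standing in for one page of extracted text.
page = "Table 2 lists the cost (USD) per unit."

# Unescaped, the parentheses form a group, so the pattern actually looks for "cost USD".
unescaped = re.search(phrase, page)

# Escaped, every character in the phrase is matched literally.
escaped = re.search(re.escape(phrase), page)

print(unescaped)          # None
print(escaped.group(0))   # cost (USD)

# Highlights list in the documented (name, text) shape, with the phrase escaped.
highlights = [("cost", re.escape(phrase))]

# Documented call (requires txtmarker and an input PDF, so it is not run here):
# highlighter.highlight("input.pdf", "output.pdf", highlights)
```

Without `re.escape`, the unescaped phrase silently finds nothing. A phrase with an unbalanced parenthesis, such as `"f(x"`, would instead raise `re.error` when compiled.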