Commit 8537b18

committed
upload v1.18.11
1 parent a1d8963 commit 8537b18

19 files changed

+418
-104
lines changed

PKG-INFO

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ Home-page: https://github.com/pymupdf/PyMuPDF
1010
Download-url: https://github.com/pymupdf/PyMuPDF
1111
Summary: PyMuPDF is a Python binding for the document renderer and toolkit MuPDF
1212
Description:
13-
Release date: March 26, 2021
13+
Release date: April 10, 2021
1414

1515
Authors
1616
=======
@@ -25,7 +25,7 @@ Description:
2525

2626
MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.
2727

28-
With PyMuPDF you can access files with extensions like .pdf”, “.xps”, “.oxps”, “.cbz”, “.fb2 or .epub. In addition, about 10 popular image formats can also be opened and handled like documents.
28+
With PyMuPDF you can access files with extensions like .pdf, .xps, .oxps, .cbz, .fb2 or .epub. In addition, about 10 popular image formats can also be handled like documents: .png, .bmp, .gif, .tiff, etc.
2929

3030
PyMuPDF should run on all platforms that are supported by both, MuPDF and Python 3.6+. These include, but are not limited to, Windows, Mac OSX and Linux, 32-bit or 64-bit. If you can generate MuPDF on a Python supported platform, then also PyMuPDF can be used there.
3131

@@ -59,7 +59,7 @@ Description:
5959
License and Copyright Information
6060
==================================
6161

62-
In order to comply with MuPDFs dual licensing model, PyMuPDF has entered into an agreement with Artifex who has the right to sublicense PyMuPDF to third parties.
62+
In order to comply with MuPDF's dual licensing model, PyMuPDF has entered into an agreement with Artifex who has the right to sublicense PyMuPDF to third parties.
6363

6464
PyMuPDF and MuPDF are now available under both open-source AGPL and commercial license agreements. Please read the full text of the AGPL license agreement, available in the distribution material (file COPYING) and `here <https://www.gnu.org/licenses/agpl-3.0.html>`_, to ensure that your use case complies with the guidelines of the license. If you determine you cannot meet the requirements of the AGPL, please contact `Artifex <https://artifex.com/contact/>`_ for more information regarding a commercial license.
6565

README.md

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
22

33
![logo](https://github.com/pymupdf/PyMuPDF/blob/master/demo/pymupdf.jpg)
44

5-
Release date: March 22, 2021
5+
Release date: April 10, 2021
66

77
**Travis-CI:** [![Build Status](https://travis-ci.org/JorjMcKie/py-mupdf.svg?branch=master)](https://travis-ci.org/JorjMcKie/py-mupdf)
88

@@ -19,9 +19,9 @@ PyMuPDF (current version 1.18.11) is a Python binding with support for [MuPDF](h
1919

2020
MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.
2121

22-
With PyMuPDF you can access files with extensions like .pdf”, “.xps”, “.oxps”, “.cbz”, “.fb2 or .epub. In addition, about 10 popular image formats can also be opened and handled like documents: ".png", ".jpg", ".bmp", ".tiff", etc..
22+
With PyMuPDF you can access files with extensions like ".pdf", ".xps", ".oxps", ".cbz", ".fb2" or ".epub". In addition, about 10 popular image formats can also be handled like documents: ".png", ".jpg", ".bmp", ".tiff", etc..
2323

24-
> In partnership with [Artifex](https://artifex.com/), PyMuPDF is now also available for commercial licensing. This agreement has no impact on use cases, that are compliant with the open-source license AGPL. Please see the License and Copyright section below for additional information.
24+
> In partnership with [Artifex](https://artifex.com/), PyMuPDF is now also available for commercial licensing. This agreement has no impact on use cases, that are compliant with the open-source license AGPL. Please see the "License and Copyright" section below for additional information.
2525
2626
# Usage and Documentation
2727
For all supported document types (i.e. **_including images_**) you can
@@ -79,7 +79,7 @@ Before you can do that, you must first build MuPDF. For most platforms, the MuPD
7979
- Now MuPDF can be generated.
8080

8181
* Please note that you will need the interface generator [SWIG](http://www.swig.org/) when building PyMuPDF from the sources of this repository (please refer to issue #312 for some background on this).
82-
- PyMuPDF wheels are being generated using **SWIG v4.0.1**.
82+
- PyMuPDF wheels are being generated using **SWIG v4.0.2**.
8383

8484
* If you do **not use SWIG**, please download the **sources from PyPI** - they contain sources pre-processed by SWIG, so installation should work like any other Python extension generation on your system.
8585

docs/app2.rst

Lines changed: 16 additions & 14 deletions
@@ -135,12 +135,12 @@ To address the font issue, you can use a simple utility script to scan through t
135135
testn = font_sans # use Helvetica
136136
elif test.endswith(",monospace"): # monospaced font?
137137
testn = font_mono # becomes Courier
138-
138+
139139
if testn != "": # any of the above found?
140140
otext = otext.replace(test, testn) # change the source
141141
found_one = True
142142
pos1 = 0 # start over
143-
143+
144144
if found_one:
145145
ofile = open(filename + ".html", "w")
146146
ofile.write(otext)
@@ -217,7 +217,7 @@ XML
217217
~~~
218218

219219
The :meth:`TextPage.extractXML` (or *Page.get_text("xml")*) version extracts text (no images) with the detail level of RAWDICT::
220-
220+
221221
>>> for line in page.get_text("xml").splitlines():
222222
print(line)
223223

@@ -261,17 +261,19 @@ Text Extraction Flags Defaults
261261
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
262262
*(New in version 1.16.2)* Method :meth:`Page.get_text` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the default settings (flags parameter omitted or None) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`.
263263

264-
=================== ==== ==== ===== === ==== ======= ===== ======
265-
Indicator text html xhtml xml dict rawdict words blocks
266-
=================== ==== ==== ===== === ==== ======= ===== ======
267-
preserve ligatures 1 1 1 1 1 1 1 1
268-
preserve whitespace 1 1 1 1 1 1 1 1
269-
preserve images n/a 1 1 n/a 1 1 n/a 0
270-
inhibit spaces 0 0 0 0 0 0 0 0
271-
dehyphenate 0 0 0 0 0 0 0 0
272-
=================== ==== ==== ===== === ==== ======= ===== ======
273-
264+
=================== ==== ==== ===== === ==== ======= ===== ====== ======
265+
Indicator text html xhtml xml dict rawdict words blocks search
266+
=================== ==== ==== ===== === ==== ======= ===== ====== ======
267+
preserve ligatures 1 1 1 1 1 1 1 1 0
268+
preserve whitespace 1 1 1 1 1 1 1 1 1
269+
preserve images n/a 1 1 n/a 1 1 n/a 0 0
270+
inhibit spaces 0 0 0 0 0 0 0 0 0
271+
dehyphenate 0 0 0 0 0 0 0 0 1
272+
=================== ==== ==== ===== === ==== ======= ===== ====== ======
273+
274+
* **search** refers to the text search function.
274275
* **"json"** is handled exactly like **"dict"** and is hence left out.
276+
* **"rawjson"** is handled exactly like **"rawdict"** and is hence left out.
275277
* An "n/a" specification means a value of 0 and setting this bit never has any effect on the output (but an adverse effect on performance).
276278
* If you are not interested in images when using an output variant which includes them by default, then by all means set the respective bit off: You will experience a better performance and much lower space requirements.
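
The default table above can be turned into integer *flags* values with a few lines of plain Python. The following is a minimal sketch, not part of the commit: the bit values are assumed to mirror PyMuPDF's ``TEXT_*`` constants described in :ref:`TextPreserve`, and the names ``BITS``, ``DEFAULTS``, ``default_flags`` and ``without_images`` are hypothetical helpers introduced for illustration only.

```python
# Bit values assumed to mirror PyMuPDF's TEXT_* constants (see TextPreserve).
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4
TEXT_INHIBIT_SPACES = 8
TEXT_DEHYPHENATE = 16

# Rows of the table, top to bottom.
BITS = (
    TEXT_PRESERVE_LIGATURES,
    TEXT_PRESERVE_WHITESPACE,
    TEXT_PRESERVE_IMAGES,
    TEXT_INHIBIT_SPACES,
    TEXT_DEHYPHENATE,
)

# Columns of the table: one 0/1 entry per row ("n/a" is treated as 0).
DEFAULTS = {
    "text":    (1, 1, 0, 0, 0),
    "html":    (1, 1, 1, 0, 0),
    "xhtml":   (1, 1, 1, 0, 0),
    "xml":     (1, 1, 0, 0, 0),
    "dict":    (1, 1, 1, 0, 0),
    "rawdict": (1, 1, 1, 0, 0),
    "words":   (1, 1, 0, 0, 0),
    "blocks":  (1, 1, 0, 0, 0),
    "search":  (0, 1, 0, 0, 1),
}


def default_flags(variant):
    """Combine the bits that are switched on by default for a variant."""
    return sum(bit for bit, on in zip(BITS, DEFAULTS[variant]) if on)


def without_images(variant):
    """Default flags of a variant with the 'preserve images' bit cleared."""
    return default_flags(variant) & ~TEXT_PRESERVE_IMAGES


for name in DEFAULTS:
    print(name, default_flags(name))
```

A value computed this way could then be passed as the *flags* argument, for example ``page.get_text("dict", flags=without_images("dict"))``, which sets all desired options explicitly while leaving images out.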
277279

@@ -291,7 +293,7 @@ To show the effect of *TEXT_INHIBIT_SPACES* have a look at this example::
291293
in English
292294
... let's see
293295
what happens.
294-
>>>
296+
>>>
295297

296298

297299
Performance

docs/app4.rst

Lines changed: 54 additions & 0 deletions
@@ -3,6 +3,60 @@
33
================================================
44
Appendix 4: Assorted Technical Information
55
================================================
6+
This section deals with various technical topics that are not necessarily related to each other.
7+
8+
------------
9+
10+
.. _ImageTransformation:
11+
12+
Image Transformation Matrix
13+
----------------------------
14+
Starting with version 1.18.11, the image transformation matrix is returned by some methods for text and image extraction: :meth:`Page.get_text` and :meth:`Page.get_image_bbox`.
15+
16+
The transformation matrix contains information about how an image was transformed to fit into the rectangle (its "boundary box" = "bbox") on some document page. By inspecting the image's bbox on the page and this matrix, one can determine, for example, whether and how the image is displayed scaled or rotated on a page.
17+
18+
The relationship between image width and height and the bbox on a page is the following:
19+
20+
1. Using the original image's width and height, we can define the image rectangle ``imgrect = fitz.Rect(0, 0, width, height)`` and a "shrink matrix" ``shrink = fitz.Matrix(1/width, 0, 0, 1/height, 0, 0)``.
21+
2. Transforming the image rectangle with its shrink matrix will result in the unit rectangle: ``imgrect * shrink = fitz.Rect(0, 0, 1, 1)``.
22+
3. Using the image **transformation matrix** "transform", the following steps will compute the bbox::
23+
24+
imgrect = fitz.Rect(0, 0, width, height)
25+
shrink = fitz.Matrix(1/width, 0, 0, 1/height, 0, 0)
26+
bbox = imgrect * shrink * transform
27+
28+
4. Inspecting the matrix product ``shrink * transform`` will reveal all information about what happened to the image rectangle to make it fit into the bbox on the page: rotation, scaling of its sides and translation of its origin. Let us look at an example:
29+
30+
>>> imginfo = page.get_images()[0] # get an image item on a page
31+
>>> imginfo
32+
(5, 0, 439, 501, 8, 'DeviceRGB', '', 'fzImg0', 'DCTDecode')
33+
>>> #------------------------------------------------
34+
>>> # define image shrink matrix and rectangle
35+
>>> #------------------------------------------------
36+
>>> shrink = fitz.Matrix(1 / 439, 0, 0, 1 / 501, 0, 0)
37+
>>> imgrect = fitz.Rect(0, 0, 439, 501)
38+
>>> #------------------------------------------------
39+
>>> # determine image bbox and transformation matrix:
40+
>>> #------------------------------------------------
41+
>>> bbox, transform = page.get_image_bbox("fzImg0", transform=True)
42+
>>> #------------------------------------------------
43+
>>> # confirm equality - permitting rounding errors
44+
>>> #------------------------------------------------
45+
>>> bbox
46+
Rect(100.0, 112.37525939941406, 300.0, 287.624755859375)
47+
>>> imgrect * shrink * transform
48+
Rect(100.0, 112.375244140625, 300.0, 287.6247253417969)
49+
>>> #------------------------------------------------
50+
>>> shrink * transform
51+
Matrix(0.0, -0.39920157194137573, 0.3992016017436981, 0.0, 100.0, 287.6247253417969)
52+
>>> #------------------------------------------------
53+
>>> # the above shows:
54+
>>> # image sides scaled by same factor 0.4
55+
>>> # image rotated by 90 degrees anti-clockwise
56+
>>> #------------------------------------------------
57+
58+
59+
------------
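
The arithmetic behind the example above can be reproduced without PyMuPDF. The sketch below is an illustration only, not code from the commit: it uses plain tuples ``(a, b, c, d, e, f)`` and ``(x0, y0, x1, y1)`` in place of ``fitz.Matrix`` and ``fitz.Rect``, and the helpers ``transform_rect`` and ``decompose`` are hypothetical names. It assumes the row-vector convention in which a point ``(x, y)`` maps to ``(a*x + c*y + e, b*x + d*y + f)``, and a page coordinate system whose y-axis points downward, so a negative angle appears anti-clockwise on the page.

```python
import math


def transform_rect(rect, m):
    """Bounding box of rect (x0, y0, x1, y1) after applying matrix m."""
    x0, y0, x1, y1 = rect
    a, b, c, d, e, f = m
    # Map all four corners, then take the enclosing rectangle.
    pts = [(a * x + c * y + e, b * x + d * y + f)
           for x in (x0, x1) for y in (y0, y1)]
    xs, ys = zip(*pts)
    return (min(xs), min(ys), max(xs), max(ys))


def decompose(m):
    """Split m into side scale factors, rotation angle (degrees), translation."""
    a, b, c, d, e, f = m
    scale_x = math.hypot(a, b)   # length of the image of the unit x vector
    scale_y = math.hypot(c, d)   # length of the image of the unit y vector
    angle = math.degrees(math.atan2(b, a))
    return scale_x, scale_y, angle, (e, f)


# Value of "shrink * transform" as printed in the example above.
m = (0.0, -0.39920157194137573, 0.3992016017436981, 0.0,
     100.0, 287.6247253417969)

bbox = transform_rect((0, 0, 439, 501), m)
scale_x, scale_y, angle, origin = decompose(m)

print(bbox)
print(scale_x, scale_y, angle, origin)
```

Under these assumptions the computed rectangle agrees with the bbox shown above up to rounding, both sides are scaled by roughly 0.4, and the angle comes out as -90 degrees, which corresponds to the anti-clockwise rotation noted in the example.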
660

761
.. _Base-14-Fonts:
862

docs/changes.rst

Lines changed: 11 additions & 0 deletions
@@ -1,6 +1,17 @@
11
Change Logs
22
===============
33

4+
Changes in Version 1.18.11
5+
---------------------------
6+
* **Fixed** issue `#972 <https://github.com/pymupdf/PyMuPDF/issues/972>`_. Improved layout of source distribution material.
7+
* **Fixed** issue `#962 <https://github.com/pymupdf/PyMuPDF/issues/962>`_. Stabilized Linux distribution detection for generating PyMuPDF from sources.
8+
* **Added:** :meth:`Page.get_xobjects` delivers the result of :meth:`Document.get_page_xobjects`.
9+
* **Added:** :meth:`Page.get_image_info` delivers meta information for all images shown on the page.
10+
* **Added:** :meth:`Tools.mupdf_display_warnings` allows switching the display of MuPDF-generated warnings on or off. The default is off.
11+
* **Added:** :meth:`Document.ez_save` convenience alias of :meth:`Document.save` with some different defaults.
12+
* **Changed:** Image extractions of document pages now also contain the image's **transformation matrix**. This concerns :meth:`Page.get_image_bbox` and the DICT, JSON, RAWDICT, and RAWJSON variants of :meth:`Page.get_text`.
13+
14+
415
Changes in Version 1.18.10
516
---------------------------
617
* **Fixed** issue `#941 <https://github.com/pymupdf/PyMuPDF/issues/941>`_. Added old aliases for :meth:`DisplayList.get_pixmap` and :meth:`DisplayList.get_textpage`.

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@
4242
# built documents.
4343
#
4444
# The full version, including alpha/beta/rc tags.
45-
release = "1.18.10"
45+
release = "1.18.11"
4646

4747
# The short X.Y version
4848
version = release

docs/document.rst

Lines changed: 28 additions & 10 deletions
@@ -43,6 +43,7 @@ For details on **embedded files** refer to Appendix 3.
4343
:meth:`Document.embfile_info` PDF only: metadata of an embedded file
4444
:meth:`Document.embfile_names` PDF only: list of embedded files
4545
:meth:`Document.embfile_upd` PDF only: change an embedded file
46+
:meth:`Document.ez_save` PDF only: :meth:`Document.save` with different defaults
4647
:meth:`Document.find_bookmark` retrieve page location after layouting
4748
:meth:`Document.fullcopy_page` PDF only: duplicate a page
4849
:meth:`Document.get_oc_states` PDF only: lists of OCGs in ON, OFF, RBGroups
@@ -706,7 +707,7 @@ For details on **embedded files** refer to Appendix 3.
706707

707708
PDF only: Return the PDF dictionary keys of the object provided by its xref number.
708709

709-
:arg int xref: the :data:`xref`. *(Changed in v1.18.10)* Use ``-1`` if you want to access the special dictionary "PDF trailer" (it has no identifying xref).
710+
:arg int xref: the :data:`xref`. *(Changed in v1.18.10)* Use ``-1`` to access the special dictionary "PDF trailer" (it has no identifying xref).
710711

711712
:returns: a tuple of dictionary keys present in object :data:`xref`. Examples:
712713

@@ -727,7 +728,7 @@ For details on **embedded files** refer to Appendix 3.
727728

728729
PDF only: Return type and value of a PDF dictionary key of an xref.
729730

730-
:arg int xref: the :data:`xref`. *(Changed in v1.18.10)* Use ``-1`` if you want to access the special dictionary "PDF trailer" (it has no identifying xref).
731+
:arg int xref: the :data:`xref`. *Changed in v1.18.10:* Use ``-1`` to access the special dictionary "PDF trailer" (it has no identifying xref).
731732
:arg str key: the desired PDF key. Must **exactly** match (case-sensitive) one of the keys contained in :meth:`Document.xref_get_keys`.
732733

733734
:returns: a tuple (type, value), where type is one of "xref", "array", "dict", "int", "float", "null", "bool", "name", "string" or "unknown" (should not occur). Independent of "type", the value of the key is **always** formatted as a string -- see the following example -- and a faithful reflection of what is stored in the PDF. An argument like the return value can be used to modify the value of a key of :data:`xref`.
@@ -739,7 +740,7 @@ For details on **embedded files** refer to Appendix 3.
739740
Resources = ('xref', '1296 0 R')
740741
MediaBox = ('array', '[0 0 612 792]')
741742
Parent = ('xref', '1301 0 R')
742-
>>> # no the same thing for the PDF trailer:
743+
>>> # same thing for the PDF trailer:
743744
>>> for key in doc.xref_get_keys(-1):
744745
print(key, "=", doc.xref_get_key(-1, key))
745746
Type = ('name', '/XRef')
@@ -790,17 +791,19 @@ For details on **embedded files** refer to Appendix 3.
790791

791792
.. method:: get_page_xobjects(pno)
792793

794+
*(Changed in v1.18.11)*
795+
793796
PDF only: *(New in v1.16.13)* Return a list of all XObjects referenced by a page.
794797

795798
:arg int pno: page number, 0-based, *-inf < pno < page_count*.
796799

797800
:rtype: list
798-
:returns: a list of (non-image) XObjects. These objects typically represent pages *embedded* (not copied) from other PDFs. For example, :meth:`Page.show_pdf_page` will create this type of object. An item of this list has the following layout: **(xref, name, invoker, bbox)**, where
801+
:returns: a list of (non-image) XObjects. These objects typically represent pages *embedded* (not copied) from other PDFs. For example, :meth:`Page.show_pdf_page` will create this type of object. An item of this list has the following layout: ``(xref, name, invoker, bbox)``, where
799802

800-
* **xref** (*int*) is the XObject's :data:`xref`
801-
* **name** (*str*) is the symbolic name to reference the XObject
802-
* **invoker** (*int*) the :data:`xref` of the invoking XObject or zero if the page directly invokes it
803-
* **bbox** (*tuple*) the boundary box of the XObject's location on the page **in untransformed coordinates**. To get actual, non-rotated page coordinates, multiply with the page's transformation matrix :attr:`Page.transformation_matrix`.
803+
* **xref** (*int*) is the XObject's :data:`xref`.
804+
* **name** (*str*) is the symbolic name to reference the XObject.
805+
* **invoker** (*int*) the :data:`xref` of the invoking XObject or zero if the page directly invokes it.
806+
* **bbox** (:ref:`Rect`) the boundary box of the XObject's location on the page **in untransformed coordinates**. To get actual, non-rotated page coordinates, multiply with the page's transformation matrix :attr:`Page.transformation_matrix`. *Changed in v1.18.11:* the bbox is now formatted as :ref:`Rect`.
804807

805808

806809
.. method:: get_page_images(pno, full=False)
@@ -1095,11 +1098,19 @@ For details on **embedded files** refer to Appendix 3.
10951098

10961099
:arg str user_pw: *(new in version 1.16.0)* set the document's user password.
10971100

1101+
.. method:: ez_save(*args, **kwargs)
1102+
1103+
*(New in v1.18.11)*
1104+
1105+
PDF only: The same as :meth:`Document.save` but with the changed defaults ``deflate=True, garbage=3``.
1106+
10981107
.. method:: saveIncr()
10991108

11001109
PDF only: saves the document incrementally. This is a convenience abbreviation for *doc.save(doc.name, incremental=True, encryption=PDF_ENCRYPT_KEEP)*.
11011110

11021111

1112+
.. method:: ez_save()
1113+
11031114
.. method:: tobytes(garbage=0, clean=False, deflate=False, deflate_images=False, deflate_fonts=False, ascii=False, expand=0, linear=False, pretty=False, encryption=PDF_ENCRYPT_NONE, permissions=-1, owner_pw=None, user_pw=None)
11041115

11051116
*(Changed in v1.18.7)*
@@ -1397,10 +1408,17 @@ For details on **embedded files** refer to Appendix 3.
13971408

13981409
.. method:: xref_object(xref, compressed=False, ascii=False)
13991410

1400-
*(New in version 1.16.8)*
1411+
*(New in version 1.16.8, changed in v1.18.10)*
14011412

14021413
PDF only: Return the definition source of a PDF object.
14031414

1415+
:arg int xref: the object's :data:`xref`. *Changed in v1.18.10:* A value of -1 returns the PDF trailer source.
1416+
:arg bool compressed: whether to generate a compact output with no line breaks or spaces.
1417+
:arg bool ascii: whether to ASCII-encode binary data.
1418+
1419+
:rtype: str
1420+
:returns: The object definition source.
1421+
14041422
.. method:: pdf_catalog()
14051423

14061424
*(New in version 1.16.8)*
@@ -1412,7 +1430,7 @@ For details on **embedded files** refer to Appendix 3.
14121430

14131431
*(New in version 1.16.8)*
14141432

1415-
PDF only: Return the trailer source of the PDF (UTF-8), which is usually located at the PDF file's end. This is similar to :meth:`Document.xref_object` except that this object has no identifier to access it.
1433+
PDF only: Return the trailer source of the PDF, which is usually located at the PDF file's end. This is :meth:`Document.xref_object` with an *xref* argument of -1.
14161434

14171435

14181436
.. method:: xref_xml_metadata()

docs/images/img-line-dir.png

28 KB

docs/images/img-textpage.png

-36 KB
