PyMuPDF is a Python binding with support for MuPDF

Last update: Jan 03, 2023

Overview

PyMuPDF 1.18.14

Release date: June 1, 2021

On PyPI since August 2016:

Introduction

PyMuPDF (current version 1.18.14) is a Python binding with support for MuPDF (current version 1.18.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.

With PyMuPDF you can access files with extensions like ".pdf", ".xps", ".oxps", ".cbz", ".fb2" or ".epub". In addition, about 10 popular image formats can also be handled like documents: ".png", ".jpg", ".bmp", ".tiff", etc.

In partnership with Artifex, PyMuPDF is now also available for commercial licensing. This agreement has no impact on use cases that comply with the open-source AGPL license. Please see the "License and Copyright" section below for additional information.

Usage and Documentation

For all supported document types (i.e. including images) you can

decrypt the document
access meta information, links and bookmarks
render pages in raster formats (PNG and some others), or the vector format SVG
search for text
extract text and images
convert to other formats: PDF, (X)HTML, XML, JSON, text

To some degree, PyMuPDF can therefore be used as an image converter: it can read a range of input formats and can produce Portable Network Graphics (PNG), Portable Anymaps (PNM, etc.), Portable Arbitrary Maps (PAM), Adobe PostScript and Adobe Photoshop documents, making the use of other graphics packages obsolete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well.

For PDF documents, there is a plethora of additional features: they can be created, joined or split up. Pages can be inserted, deleted, re-arranged or modified in many ways (including annotations and form fields).

Images and fonts can be extracted or inserted.

You may want to have a look at this cool GUI example script, which lets you insert, delete, replace or re-position images under your visual control.

Since v1.18.8 there is a new experimental Document method subset_fonts(), which automatically builds subsets based on the usage of all eligible fonts in the document. Especially for new documents, this can lead to significant file size reductions. The method was developed in cooperation with our user @cuteufo - again thanks a lot for the contribution.
Embedded files are fully supported.
PDFs can be reformatted to support double-sided printing, posterizing, applying logos or watermarks
Password protection is fully supported: decryption, encryption, encryption method selection, permission level and user / owner password setting.
Support of the PDF Optional Content concept for images, text and drawings.
Low-level PDF structures can be accessed and modified.
PyMuPDF can also be used as a module in the command line using "python -m fitz ...". This is a versatile utility, which we will further develop going forward. It currently supports PDF document
- encryption / decryption / optimization
- creating sub-documents
- document joining
- image / font extraction
- full support of embedded files.
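The permission levels mentioned in the password-protection item above correspond to bit flags defined by the PDF specification. As a rough, library-independent sketch (the enum and helper names are made up for illustration and are not PyMuPDF identifiers), a permission value could be decoded like this:

```python
from enum import IntFlag


class PdfPermission(IntFlag):
    """Hypothetical enum; values follow the PDF specification's bit positions."""
    PRINT = 1 << 2          # bit 3: print the document
    MODIFY = 1 << 3         # bit 4: modify contents
    COPY = 1 << 4           # bit 5: copy or extract text and graphics
    ANNOTATE = 1 << 5       # bit 6: add or modify annotations
    FORM = 1 << 8           # bit 9: fill in form fields
    ACCESSIBILITY = 1 << 9  # bit 10: extract for accessibility
    ASSEMBLE = 1 << 10      # bit 11: insert, rotate or delete pages
    PRINT_HQ = 1 << 11      # bit 12: high-quality printing


def granted_permissions(value: int) -> list[str]:
    """Return the names of all permission flags set in `value`."""
    return [flag.name for flag in PdfPermission if value & flag]


print(granted_permissions(PdfPermission.PRINT | PdfPermission.COPY))  # ['PRINT', 'COPY']
```

Combining flags with `|` yields the integer that a permission setting would carry; how PyMuPDF exposes these values is described in its own documentation.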

Have a look at the basic demos, the examples (which contain complete, working programs), and the recipes section of our Wiki sidebar, which contains more than a dozen How-To-style guides.

Our documentation, written using Sphinx, is available in various formats from the following sources. It currently is a combination of a reference guide and a user manual. For a quick start look at the tutorial and the recipes chapters.

You can view it online at Read the Docs. This site also provides download options for PDF.
The search function on Read the Docs does not work for me currently. If you want a working searchable local version, please download a zipped HTML from here.
Find a Windows help file here.

Installation

For Windows, Linux and Mac OSX platforms, there are wheels in the download section of PyPI. This includes Python 64bit versions 3.6 through 3.9. For Windows only, 32bit versions are available too. Since version 1.18.14 there also exist wheels for the Linux ARM architecture - look for platform tag manylinux2014_aarch64.
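To check which wheel matches a platform, it can help to split a wheel file name into its components. The following is a minimal sketch using only the standard library; the helper name `parse_wheel_name` is hypothetical, and it assumes file names without the optional build tag, like the ones listed on the release pages.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name without a build tag into its five components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(*parts)


wheel = parse_wheel_name("PyMuPDF-1.18.13-cp39-cp39-manylinux2014_aarch64.whl")
print(wheel.platform_tag)  # manylinux2014_aarch64
```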

If your platform is not supported with one of our wheels, you need to generate PyMuPDF yourself as follows. This requires the development version of Python.

Before you can do that, you must first build MuPDF. For most platforms, the MuPDF sources contain prepared procedures for achieving this. Please observe the following general steps:

Be sure to download the official MuPDF source release from here. Do not use MuPDF's GitHub repo. It contains their development source for future versions.
This repo's fitz folder contains one or more files whose names start with a single underscore "_". These files contain configuration data and potentially other fixes. Copy-rename each of them to their correct target location within the downloaded MuPDF source. Currently, these files are:
- Optional: copy the fitz configuration file _config.h to mupdf/include/mupdf/fitz/config.h, replacing the existing file. It contains configuration data such as which fonts to support. If you omit this change, the binary extension module will be over 30 MB (compared to around 11 MB). Functionality is not affected.
- Now MuPDF can be generated.
Please note that you will need the interface generator SWIG when building PyMuPDF from the sources of this repository (please refer to issue #312 for some background on this).
- PyMuPDF wheels are being generated using SWIG v4.0.2.
If you do not use SWIG, please download the sources from PyPI - they contain sources pre-processed by SWIG, so installation should work like any other Python extension generation on your system.

Once this is done, adjust directories in setup.py and run python setup.py install.

The following sections contain further comments for some platforms.

Ubuntu

Our users (thanks to @gileadslostson and @jbarlow83!) have documented their MuPDF installation experiences from sources in this Wiki page.

OSX

First, install the MuPDF headers and libraries, which are provided by mupdf-tools: brew install mupdf-tools.

Then you might need to export ARCHFLAGS='-arch x86_64', since libmupdf.a is for x86_64 only.

Finally, please double check setup.py before building. Update include_dirs and library_dirs if necessary.

MS Windows

If you are looking to make your own binary, consult this Wiki page. It explains how to use Visual Studio for generating MuPDF in quite some detail.

Earlier Versions

Earlier versions are available in the releases directory.

License and Copyright

In order to comply with MuPDF’s dual licensing model, PyMuPDF has entered into an agreement with Artifex, which has the right to sublicense PyMuPDF to third parties.

PyMuPDF and MuPDF are now available under both the open-source AGPL and commercial license agreements.

Please read the full text of the AGPL license agreement (which is also included here in file COPYING) to ensure that your use case complies with the guidelines of this license. If you determine you cannot meet the requirements of the AGPL, please contact Artifex for more information regarding a commercial license.

Artifex is the exclusive commercial licensing agent for MuPDF.

Contact

Please use the Discussions menu for questions, comments, or asking others for help, and submit issues here. If you wish, you can also contact me directly via [email protected].

Comments

Wrong Handling of Reference Count of "None" Object

I'm iterating all xrefs found in the pdf to determine their "content":

document = fitz.Document(fileName)
nonImageXrefs = []
imageXrefs = []

allXrefsLength = document.xref_length()
for xref in range(1, allXrefsLength):
    if document.xref_get_key(xref, "Subtype")[1] == "/Image":
        if document.extract_image(xref):
            imageXrefs.append(xref)
    else:
        rawData = document.xref_stream_raw(xref)
        if rawData is None or len(rawData) == 0:
            print("xref {0} is neither image nor deflatable stream".format(xref))
        else:
            nonImageXrefs.append(xref)

And when there are lots of such calls, I get the following error:

Fatal Python error: none_dealloc: deallocating None
Python runtime state: initialized

Current thread 0x00002b44 (most recent call first):
  File "C:\Program Files\Python\lib\pdfUtils.py", line 592 in optimizeWithPyMuPdf
  File "C:\Users\Alex\PycharmProjects\pdfOptimizer\pdf_opt.py", line 8 in <module>

Extension modules: fitz._fitz, zopfli.zopfli, PIL._imaging (total: 3)

Process finished with exit code -1073740791 (0xC0000409)

Line 592 is rawData = document.xref_stream_raw(xref)

This happens at a random position in the xref list, but the counter is usually between 11000 and 13000.

I'm using Windows 10, Python 3.10 x64, PyMuPDF 1.21.1 installed by pip.

Attached sample file, but as far as I can see it is not caused by some specific file. eos6d-mk2-im2-en1.pdf
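The two exit code values in the log above are the same number in two representations: a signed 32-bit integer and its unsigned hexadecimal form. A small sketch of the conversion (the helper name `to_ntstatus` is made up here) makes it easier to look up such codes. 0xC0000409 is the value Windows documents as STATUS_STACK_BUFFER_OVERRUN.

```python
def to_ntstatus(exit_code: int) -> str:
    """Render a (possibly negative) 32-bit process exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073740791))  # 0xC0000409
```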

bug Fixed in next release

opened by AlexMatiash 2

Replace image throws an error
Please provide all mandatory information!

Describe the bug (mandatory)

Using the replace_image method on the Page object fails with an error for a missing method on the Document object.

To Reproduce (mandatory)

>>> fitz_doc = fitz.open("/Users/ashah/GoogleDrive/YearbookCreatorInput/Test_School.pdf")
>>> page6 = fitz_doc.load_page(7)
>>> page6.get_images()
[(112, 0, 1985, 1600, 8, 'ICCBased', '', 'Im55', 'DCTDecode'),
 (113, 0, 1800, 1200, 8, 'ICCBased', '', 'Im56', 'DCTDecode'),
 (114, 0, 2100, 1402, 8, 'ICCBased', '', 'Im57', 'DCTDecode'),
 (115, 0, 808, 1436, 8, 'ICCBased', '', 'Im58', 'DCTDecode'),
 (90, 0, 1800, 1200, 8, 'ICCBased', '', 'Im48', 'DCTDecode'),
 (95, 0, 1200, 1800, 8, 'ICCBased', '', 'Im53', 'DCTDecode'),
 (117, 121, 1767, 1144, 8, 'ICCBased', '', 'Im59', 'FlateDecode'),
 (92, 0, 1200, 1800, 8, 'ICCBased', '', 'Im50', 'DCTDecode'),
 (118, 122, 1365, 1365, 8, 'ICCBased', '', 'Im60', 'FlateDecode'),
 (119, 123, 924, 1159, 8, 'ICCBased', '', 'Im61', 'FlateDecode')]
>>> page6.replace_image(95, filename='/Users/ashah/GoogleDrive/Test_School/blank.png')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/fitz/utils.py", line 255, in replace_image
    if not doc.is_image(xref):
AttributeError: 'Document' object has no attribute 'is_image'
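For readers unfamiliar with the tuples in the session above, here is a minimal sketch that gives the nine positions names, so the entry for xref 95 can be inspected before replacing it. The `ImageEntry` type is hypothetical. The field order is assumed to be xref, smask, width, height, bits per component, colorspace, alternate colorspace, name and filter, which matches the values shown.

```python
from typing import NamedTuple


class ImageEntry(NamedTuple):
    xref: int
    smask: int
    width: int
    height: int
    bpc: int
    colorspace: str
    alt_colorspace: str
    name: str
    filter: str


# Three of the entries reported in the session above.
entries = [
    (90, 0, 1800, 1200, 8, "ICCBased", "", "Im48", "DCTDecode"),
    (95, 0, 1200, 1800, 8, "ICCBased", "", "Im53", "DCTDecode"),
    (117, 121, 1767, 1144, 8, "ICCBased", "", "Im59", "FlateDecode"),
]
# Index the entries by xref so a single image can be looked up directly.
images = {entry[0]: ImageEntry(*entry) for entry in entries}

target = images[95]
print(target.name, target.width, target.height, target.filter)  # Im53 1200 1800 DCTDecode
```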

For problems when building or installing PyMuPDF, give the full output of the build/install command so that, for example, all pip/compiler/linker errors/warnings can be seen.

Expected behavior (optional)

Describe what you expected to happen (if not obvious).

Screenshots (optional)

If applicable, add screenshots to help explain your problem.

Your configuration (mandatory)

Operating system, potentially version and bitness

Python version, bitness

PyMuPDF version, installation method (wheel or generated from source).

print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.10.6 (main, Aug 11 2022, 13:49:25) [Clang 13.1.6 (clang-1316.0.21.2.5)]
darwin

PyMuPDF 1.21.1: Python bindings for the MuPDF 1.21.1 library. Version date: 2022-12-13 00:00:01. Built for Python 3.10 on darwin (64-bit).

For example, the output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) would be sufficient (for the first two bullets).

Additional context (optional)

Add any other context about the problem here.
bug Fixed in next release
opened by foranuj 1

Failed to read JPX header when trying to get blocks

Describe the bug (mandatory)

When I'm trying to get blocks from some pdfs, the following error occurs: RuntimeError: Failed to read JPX header. The same error occurs when I'm trying to get the pixmap with the function get_pixmap.

It works if I use page.get_text() without the "blocks" or "dict" parameter.

PDFs with this error have the following attributes:

Producer: GPL Ghostscript 9.23
PDF Version: 1.5

If I edit the PDF file with any online tool, for example https://www.sejda.com/pdf-editor, the attributes change and the error disappears.

To Reproduce (mandatory)

PDF file - test_get_blocks.pdf

import fitz

with fitz.open("test_get_blocks.pdf") as doc:
    for page in doc:
        print(page.get_text("blocks"))

Traceback

Traceback (most recent call last):
  File "/home/johni/Projects/pdf-to-txt/main.py", line 5, in <module>
    print(page.get_text("dict"))
  File "/home/johni/.pyenv/versions/3.9.15/lib/python3.9/site-packages/fitz/utils.py", line 808, in get_text
    tp = page.get_textpage(clip=clip, flags=flags)
  File "/home/johni/.pyenv/versions/3.9.15/lib/python3.9/site-packages/fitz/fitz.py", line 5675, in get_textpage
    textpage = self._get_textpage(clip, flags=flags, matrix=matrix)
  File "/home/johni/.pyenv/versions/3.9.15/lib/python3.9/site-packages/fitz/fitz.py", line 5661, in _get_textpage
    val = _fitz.Page__get_textpage(self, clip, flags, matrix)
RuntimeError: Failed to read JPX header

Notebook to reproduce the error

Your configuration (mandatory)

Operating system Ubuntu 22.04.1 LTS
Python version 3.9.15
PyMuPDF version 1.21.1

upstream bug

opened by johnidm 4

1.21.1: test_color_count fails

Please provide all mandatory information!

Describe the bug (mandatory)

test_color_count fails

To Reproduce (mandatory)

  export PYMUPDF_SETUP_MUPDF_BUILD=""
  python -m build --wheel --no-isolation

  local _site_packages=$(python -c "import site; print(site.getsitepackages()[0])")
  local _test_dir="test_dir"

  cd $_name-$pkgver
  mkdir -vp $_test_dir
  # install to test dir for testing
  python -m installer --destdir="$_test_dir" dist/*.whl

  export PYTHONPATH="$_test_dir/$_site_packages:$PYTHONPATH"
  # disable broken test: https://github.com/pymupdf/PyMuPDF/issues/2040
  pytest -vv -c /dev/null tests/ -k 'not test_textbox3'

=================================== FAILURES ===================================
_______________________________ test_color_count _______________________________

    def test_color_count():
        pm = fitz.Pixmap(imgfile)
>       assert pm.color_count() == 40624
E       assert 39912 == 40624
E        +  where 39912 = <bound method Pixmap.color_count of Pixmap(DeviceRGB, IRect(0, 0, 439, 501), 0)>()
E        +    where <bound method Pixmap.color_count of Pixmap(DeviceRGB, IRect(0, 0, 439, 501), 0)> = Pixmap(DeviceRGB, IRect(0, 0, 439, 501), 0).color_count

tests/test_pixmap.py:94: AssertionError
=============================== warnings summary ===============================
../../../../usr/lib/python3.10/site-packages/_pytest/cacheprovider.py:433
  /usr/lib/python3.10/site-packages/_pytest/cacheprovider.py:433: PytestCacheWarning: could not create cache path /dev/.pytest_cache/v/cache/nodeids
    config.cache.set("cache/nodeids", sorted(self.cached_nodeids))

../../../../usr/lib/python3.10/site-packages/_pytest/cacheprovider.py:387
  /usr/lib/python3.10/site-packages/_pytest/cacheprovider.py:387: PytestCacheWarning: could not create cache path /dev/.pytest_cache/v/cache/lastfailed
    config.cache.set("cache/lastfailed", self.lastfailed)

../../../../usr/lib/python3.10/site-packages/_pytest/stepwise.py:52
  /usr/lib/python3.10/site-packages/_pytest/stepwise.py:52: PytestCacheWarning: could not create cache path /dev/.pytest_cache/v/cache/stepwise
    session.config.cache.set(STEPWISE_CACHE_DIR, [])

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED ../../../../dev/test_pixmap.py::test_color_count - assert 39912 == 40624
====== 1 failed, 95 passed, 1 skipped, 1 deselected, 3 warnings in 1.65s =======

python-pymupdf-1.21.1-1-x86_64-build.log python-pymupdf-1.21.1-1-x86_64-check.log
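The failing assertion compares the number of distinct colors in a decoded image. To illustrate what such a count means, here is a pure-Python sketch that counts unique pixel values in a raw sample buffer. The function name `count_colors` is hypothetical and this is not PyMuPDF's implementation. A result that differs between builds would indicate that the decoded pixel values themselves differ.

```python
def count_colors(samples: bytes, bytes_per_pixel: int) -> int:
    """Count distinct pixel values in a tightly packed sample buffer."""
    if bytes_per_pixel <= 0 or len(samples) % bytes_per_pixel:
        raise ValueError("buffer length must be a multiple of bytes_per_pixel")
    # Each slice of bytes_per_pixel bytes is one pixel; a set keeps unique ones.
    pixels = {
        samples[i : i + bytes_per_pixel]
        for i in range(0, len(samples), bytes_per_pixel)
    }
    return len(pixels)


# Four RGB pixels: red, green, red, blue -> three distinct colors.
rgb = bytes([255, 0, 0, 0, 255, 0, 255, 0, 0, 0, 0, 255])
print(count_colors(rgb, 3))  # 3
```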

Expected behavior (optional)

All tests pass.

Screenshots (optional)

n/a

Your configuration (mandatory)

Arch Linux
Python 3.10.8
PyMuPDF 1.21.1 from tarball

Additional context (optional)

n/a

opened by dvzrv 2

Redaction removing more text than expected
Describe the bug (mandatory)

When applying a redaction on a document, the following word is removed as well.

To Reproduce (mandatory)

Example PDF file: test_doc.pdf

Run this script:

import fitz

doc = fitz.open("test_doc.pdf")
page = doc[0]
areas = page.search_for("{sig}")
rect = areas[0]
page.add_redact_annot(rect)
page.apply_redactions()
doc.saveIncr()
doc.close()

The searched word "{sig}" is removed (as expected). The word "Vertrag" on the top right is removed as well (unexpected).

Expected behavior (optional)

Searched string should be removed. No other change should be made.

Screenshots (optional)

Before script: After script:

Your configuration (mandatory)

OS independent: happens on Windows 11 as well as Debian 11

Python 3.10.8

PyMuPDF 1.21.0, installed via pip

Thank you!
upstream bug
opened by seb-bau 3
Image in pdf changes color after applying redactions
Description

Image in a PDF file changes color after applying redactions.

To Reproduce

Execute the following Python script to reproduce the issue. The script uses the PDF file image_issue.pdf.

import os
import fitz

script_path = os.path.abspath(__file__)
script_folder = os.path.dirname(script_path)
doc = fitz.open(os.path.join(script_folder, 'image_issue.pdf'))
page = doc.load_page(0)

rx = 135.123
ry = 123.56878
rw = 69.8409
rh = 9.46397
x0 = rx
y0 = ry
x1 = rx + rw
y1 = ry + rh
rect = fitz.Rect(x0, y0, x1, y1)

font = fitz.Font("Helvetica")
fill_color = (0, 0, 0)
page.add_redact_annot(
    quad=rect,
    #text="null",
    fontname=font.name,
    fontsize=12,
    align=fitz.TEXT_ALIGN_CENTER,
    fill=fill_color,
    text_color=(1, 1, 1),
)
page.apply_redactions()
doc.save(os.path.join(script_folder, 'image_issue_redacted.pdf'))

Note that I am using the default images=2 (blank out overlapping image parts) when calling apply_redactions(). Using images=0 (ignore) or images=1 (remove complete overlapping image) is not desirable for my use case.

Expected behavior

The color of the image in the pdf file should not change after applying redactions.

Screenshots

Here's a screenshot of the problem.

Your configuration

Operating system Ubuntu 22.04.1 LTS

Python version 3.8.14

PyMuPDF version 1.20.2

upstream bug
opened by ot-ksrinivasan 7

Releases(1.21.1)

1.21.1(Dec 13, 2022)
PyMuPDF-1.21.1 has been released.

Wheels for Windows, Linux and MacOS, and the sdist, are available on pypi.org and can be installed in the usual way, for example:

python -m pip install --upgrade pymupdf

Changes in Version 1.21.1 (2022-12-13)

This release uses MuPDF-1.21.1.

Bug fixes:

Fixed #2110: Fully embedded font is extracted only partially if it occupies more than one object

Fixed #2094: Rectangle Detection Logic

Fixed #2088: Destination point not set for named links in toc

Fixed #2087: Image with Filter "[/FlateDecode/JPXDecode]" not extracted

Fixed #2086: Document.save() owner_pw & user_pw has buffer overflow bug

Fixed #2076: Segfault in fitz.py

Fixed #2051: Missing DPI Parameter

Fixed #2048: Invalid size of TextPage and bbox with newest version 1.21.0

Fixed #2045: SystemError: returned a result with an error set

Fixed #2039: 1.21.0 fails to build against system libmupdf

Fixed #2036: Archive::Archive defined twice

Other

Swallow "&zoom=nan" in link uri strings.

Add new Page utility methods Page.replace_image() and Page.delete_image().

Documentation:

#2040: Added note about test failure with non-default build of MuPDF, to tests/README.md.

#2037: In docs/installation.rst, mention incompatibility with chocolatey.org on Windows.

#2061: Fixed description of Annot.file_info.

#2065: Show how to insert internal PDF link.

Improved description of building from source without an sdist.

Added information about running tests.

#2084: Fixed broken link to PyMuPDF-Utilities.

1.21.0rc2(Nov 7, 2022)
This is largely unchanged from 1.21.0rc1, except that it builds with the official MuPDF-1.21.0 release.

Install with: python -m pip install pymupdf==1.21.0rc2

Uses mupdf-1.21.0.

New Story support.

Added wheels for Python-3.11.

Docs: https://pymupdf.readthedocs.io/en/1.21.0rc2/

Changelog: https://pymupdf.readthedocs.io/en/1.21.0rc2/changes.html

1.21.0rc1(Nov 1, 2022)
Install with: python -m pip install pymupdf==1.21.0rc1

Uses mupdf-1.21.0-rc1.

New Story support.

Added wheels for Python-3.11.

Docs: https://pymupdf.readthedocs.io/en/1.21.0rc1/

Changelog: https://pymupdf.readthedocs.io/en/1.21.0rc1/changes.html

1.20.2(Aug 13, 2022)
Built with MuPDF-1.20.3.

Fix #1787.

Fix #1824.

Improvements to documentation:

Moved old docs/faq.rst into separate docs/recipes-* files.

Improved information about building from source in docs/installation.rst.

Clarified memory allocation setting JM_MEMORY in docs/tools.rst.

Fixed link to PDF Reference manual in docs/app3.rst.

Fixed building of html documentation on OpenBSD.

Wheels for Windows, Linux and MacOS, and the sdist, are available on pypi.org and can be installed in the usual way, for example:

pip install --upgrade pymupdf
1.20.1(Jun 27, 2022)
Fix https://github.com/pymupdf/PyMuPDF/pull/1724.

Fix https://github.com/pymupdf/PyMuPDF/issues/1771.

Fix https://github.com/pymupdf/PyMuPDF/issues/1751.

Fix https://github.com/pymupdf/PyMuPDF/issues/1645.

Improvements to sphinx-generated documentation.

1.20.0(Jun 27, 2022)
This release integrates the recently-released MuPDF-1.20.0, and has fixes for #1733 and #1738. The latter also contains an additional fix for occasional SEGVs when freeing documents.

Building from source works slightly differently from before:

We now automatically download the required MuPDF source and build it into PyMuPDF.

Python sdists (source distributions) already contain the required MuPDF source and build without downloading.

One can override the default build behaviour by setting environment variables, for example to build with a system-installed mupdf. See the doc-comment at the start of setup.py for details.

1.19.6(Mar 5, 2022)
Fixes: #1620, #1601

Enhancements:

new method Page.load_widget() to load a widget from its xref

new dictionary pdfcolor which contains 500 predefined PDF colors

Quad class supports operator algebra

text search and extraction default flags now accessible as predefined constants

iterators Page.annots() and Page.widgets() now prohibit reloading the page within their scope

removed multiple utility functions from the Tools class and redefined them as standalone

Parameter new in Document.update_stream() is now obsolete.

1.19.5(Feb 3, 2022)

Fixes: #1583, #1552, #1550, #1521, #1518, #1513, #1510, #1417, #1550. Also fixed some undocumented errors that caused the span["origin"] to be incorrectly set in corner cases.

Added new items "orientation" and associated transformation matrix to the output of fitz.image_properties(), which contains EXIF data of supporting image files.

A new method Document.xref_copy() allows making xref objects duplicates of each other.
1.19.4(Jan 1, 2022)
Fixes: #1505, #1484, #1479, #1474.

Changes:

Full support of PDF page rectangles like /ArtBox etc.

New global variable TESSDATA_PREFIX for comfortably checking presence of OCR support

Changed Document.xref_set_key() such that dictionary keys will physically be removed if set to value "null".

Changed Document.extract_font() to optionally return a dictionary (instead of a tuple).

1.19.3(Dec 12, 2021)
Fixes: #1351, #1417, #1418, #1430, #1433

New or changed Pixmap methods color_topusage(), color_count(), warp(). Some of them solve #1397.

New Annot method and property irt_xref, set_irt_xref(). Implements #1450.

New Rect / IRect method torect() which creates a matrix to transform between given rectangles.
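The arithmetic behind mapping one rectangle onto another can be sketched with plain tuples. This is an illustration only, not PyMuPDF's implementation. The helper names are hypothetical, and the matrix is written in the six-number form (a, b, c, d, e, f) that maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f).

```python
def rect_to_rect_matrix(src, dst):
    """Return (a, b, c, d, e, f) mapping rectangle src onto rectangle dst.

    Rectangles are (x0, y0, x1, y1) tuples; src must have non-zero size.
    """
    sx0, sy0, sx1, sy1 = src
    dx0, dy0, dx1, dy1 = dst
    scale_x = (dx1 - dx0) / (sx1 - sx0)
    scale_y = (dy1 - dy0) / (sy1 - sy0)
    # Scale first, then shift so that src's top-left lands on dst's top-left.
    return (scale_x, 0.0, 0.0, scale_y, dx0 - sx0 * scale_x, dy0 - sy0 * scale_y)


def apply(matrix, point):
    """Transform a point: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


m = rect_to_rect_matrix((0, 0, 100, 50), (10, 20, 210, 120))
print(apply(m, (0, 0)), apply(m, (100, 50)))  # (10.0, 20.0) (210.0, 120.0)
```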

Page.get_texttrace() now also supports non-horizontal text.

1.19.2(Nov 20, 2021)
Improvements:

Page.get_drawings() now includes area orientation for rectangles

Page pixmap creation has a new parameter "dpi"

New check for monochrome / unicolor pixmaps and number of colors

Fixes: #1388, #1375, #1364, #1342, #1355, #1397, #1408.
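A resolution given in dots per inch relates to a plain scale factor through the PDF default of 72 units per inch. The sketch below shows only that relationship. The helper name `zoom_for_dpi` is made up, and how PyMuPDF applies the "dpi" parameter internally is not shown here.

```python
PDF_POINTS_PER_INCH = 72  # PDF user space uses 72 units per inch by default


def zoom_for_dpi(dpi: float) -> float:
    """Return the scale factor that renders a page at the requested resolution."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / PDF_POINTS_PER_INCH


print(zoom_for_dpi(144))  # 2.0
print(zoom_for_dpi(300))
```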
1.19.1(Oct 24, 2021)

OCR of a document page has been improved a lot compared to v1.19.0. Text extractions now also come with an integrated sort. Fixes: #1328
1.19.0(Oct 17, 2021)

Introduces major new features like PDF journalling and OCR support by directly invoking Tesseract-OCR. In addition, it is possible to detect whether objects are covered (hidden) by other objects.

As part of the new version, the following issues have been resolved: #1313, #1311, #1290, #1286, #1287, #1284.
1.18.19(Sep 16, 2021)

Fixes #1266
1.18.18(Sep 16, 2021)

This version fixes #1257, #1252, #1244, #1241, #1234, #1236, #1227.
1.18.17(Aug 24, 2021)

1.18.16(Aug 8, 2021)

The fitz module now supports text extraction via a new subcommand "gettext". Among a couple of modes, preserving the original layout can be chosen.

Also fixed #1187, #1184, #1154, #1152 and #1146.
1.18.15(Jul 10, 2021)

Apart from some minor fixes, this release introduces support for small caps in TextWriter based text output.

In addition, method Document.subset_fonts() now prefixes subsetted font names with the 6 upper case letter prefix as prescribed by the PDF standard.
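The prefix convention mentioned here (six uppercase letters followed by a plus sign, then the original font name) can be recognized with a short regular expression. This is a sketch with a hypothetical helper name, not code taken from PyMuPDF.

```python
import re

# Six uppercase ASCII letters, a plus sign, then the original font name.
SUBSET_NAME = re.compile(r"([A-Z]{6})\+(.+)")


def split_subset_name(font_name: str):
    """Return (tag, base_name) for a subset font name, or (None, font_name)."""
    match = SUBSET_NAME.fullmatch(font_name)
    if match is None:
        return None, font_name
    return match.group(1), match.group(2)


print(split_subset_name("ABCDEF+Helvetica"))  # ('ABCDEF', 'Helvetica')
print(split_subset_name("Helvetica"))         # (None, 'Helvetica')
```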

List of fixed issues: #1088, #1081, #1078, #1085.
1.18.14(Jun 2, 2021)
The following have been fixed:

#1043

#1053

undocumented occasional error calculating enveloping rectangle for paths in Page.get_drawings()

undocumented occasional loop in TextWriter.fill_textbox()

added method Font.char_lengths() which returns a tuple of all character widths for a given string. An improved version of Font.text_length()

greatly improved performance of Font.text_length()

added various ways to delete multiple PDF pages, among them are slices and the Python del statement

changed method Document.del_toc_item(): the item's title text will no longer be removed - instead the item is shown grayed-out to indicate its deletion.

PyMuPDF-1.18.14-cp36-cp36m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.14-cp36-cp36m-manylinux2010_x86_64.whl(6.08 MB)
PyMuPDF-1.18.14-cp36-cp36m-win32.whl(4.70 MB)
PyMuPDF-1.18.14-cp36-cp36m-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.14-cp37-cp37m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.14-cp37-cp37m-manylinux2010_x86_64.whl(6.08 MB)
PyMuPDF-1.18.14-cp37-cp37m-win32.whl(4.70 MB)
PyMuPDF-1.18.14-cp37-cp37m-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.14-cp38-cp38-macosx_10_9_x86_64.whl(5.31 MB)
PyMuPDF-1.18.14-cp38-cp38-manylinux2010_x86_64.whl(6.10 MB)
PyMuPDF-1.18.14-cp38-cp38-win32.whl(4.71 MB)
PyMuPDF-1.18.14-cp38-cp38-win_amd64.whl(5.11 MB)
PyMuPDF-1.18.14-cp39-cp39-macosx_10_9_x86_64.whl(5.31 MB)
PyMuPDF-1.18.14-cp39-cp39-manylinux2010_x86_64.whl(6.11 MB)
PyMuPDF-1.18.14-cp39-cp39-win32.whl(4.71 MB)
PyMuPDF-1.18.14-cp39-cp39-win_amd64.whl(5.11 MB)
1.18.13(May 5, 2021)
Method Page.insert_image has been rewritten for improved performance in standard cases. Also introduced option to re-use pre-existing images in the file directly to provide another performance boost. Other changes:

implemented or fixed #1042, #1041, #1037

minor improvements in PDF EmbeddedFiles handling for better support of building PDF collections apps.

PyMuPDF-1.18.13-cp36-cp36m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.13-cp36-cp36m-manylinux2010_x86_64.whl(6.07 MB)
PyMuPDF-1.18.13-cp36-cp36m-manylinux2014_aarch64.whl(6.14 MB)
PyMuPDF-1.18.13-cp36-cp36m-win32.whl(4.70 MB)
PyMuPDF-1.18.13-cp36-cp36m-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.13-cp37-cp37m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.13-cp37-cp37m-manylinux2010_x86_64.whl(6.07 MB)
PyMuPDF-1.18.13-cp37-cp37m-manylinux2014_aarch64.whl(6.14 MB)
PyMuPDF-1.18.13-cp37-cp37m-win32.whl(4.70 MB)
PyMuPDF-1.18.13-cp37-cp37m-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.13-cp38-cp38-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.13-cp38-cp38-manylinux2010_x86_64.whl(6.08 MB)
PyMuPDF-1.18.13-cp38-cp38-manylinux2014_aarch64.whl(6.17 MB)
PyMuPDF-1.18.13-cp38-cp38-win32.whl(4.70 MB)
PyMuPDF-1.18.13-cp38-cp38-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.13-cp39-cp39-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.13-cp39-cp39-manylinux2010_x86_64.whl(6.10 MB)
PyMuPDF-1.18.13-cp39-cp39-manylinux2014_aarch64.whl(6.19 MB)
PyMuPDF-1.18.13-cp39-cp39-win32.whl(4.70 MB)
PyMuPDF-1.18.13-cp39-cp39-win_amd64.whl(5.10 MB)
1.18.11(Apr 10, 2021)

Meta information for images embedded in document pages has been enriched by the so-called transformation matrix. It can be used to find out what "happened" to the image rectangle to make it fit in its bbox on the page, such as scaling and rotation.
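As an illustration of what can be read off such a matrix, the sketch below recovers scale factors and a rotation angle from the six values (a, b, c, d, e, f). It assumes the matrix combines only scaling, rotation and translation. The helper name and the sample values are hypothetical, not output from PyMuPDF.

```python
import math


def describe_transform(matrix):
    """Return (scale_x, scale_y, rotation_degrees) for a matrix (a, b, c, d, e, f).

    Assumes the matrix combines scaling, rotation and translation only (no shear).
    """
    a, b, c, d, _e, _f = matrix
    scale_x = math.hypot(a, b)   # length of the transformed unit x-vector
    scale_y = math.hypot(c, d)   # length of the transformed unit y-vector
    rotation = math.degrees(math.atan2(b, a))
    return scale_x, scale_y, rotation


# Hypothetical example: a unit square scaled to 200 x 100 and moved to (50, 80).
print(describe_transform((200, 0, 0, 100, 50, 80)))  # (200.0, 100.0, 0.0)
```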

Other changes are mostly minor bug fixes: #990 #972

A new Page method get_image_info() is also available, which extracts image meta information from the page's TextPage - much like the corresponding Page.get_text("dict"), but without extracting any text or the image binary data themselves.
PyMuPDF-1.18.11-cp36-cp36m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.11-cp36-cp36m-manylinux2010_x86_64.whl(6.07 MB)
PyMuPDF-1.18.11-cp36-cp36m-win32.whl(4.70 MB)
PyMuPDF-1.18.11-cp36-cp36m-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.11-cp37-cp37m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.11-cp37-cp37m-manylinux2010_x86_64.whl(6.07 MB)
PyMuPDF-1.18.11-cp37-cp37m-win32.whl(4.70 MB)
PyMuPDF-1.18.11-cp37-cp37m-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.11-cp38-cp38-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.11-cp38-cp38-manylinux2010_x86_64.whl(6.08 MB)
PyMuPDF-1.18.11-cp38-cp38-win32.whl(4.70 MB)
PyMuPDF-1.18.11-cp38-cp38-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.11-cp39-cp39-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.11-cp39-cp39-manylinux2010_x86_64.whl(6.10 MB)
PyMuPDF-1.18.11-cp39-cp39-win32.whl(4.70 MB)
PyMuPDF-1.18.11-cp39-cp39-win_amd64.whl(5.10 MB)
1.18.10(Mar 22, 2021)
Fixed: #941 #929 #927

included PDF trailer access in Document.xref_get_key()

added a number of functions for recovering text quads in "dict" / "rawdict" text extractions

PyMuPDF-1.18.10-cp36-cp36m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.10-cp36-cp36m-manylinux2010_x86_64.whl(6.06 MB)
PyMuPDF-1.18.10-cp36-cp36m-win32.whl(4.69 MB)
PyMuPDF-1.18.10-cp36-cp36m-win_amd64.whl(5.09 MB)
PyMuPDF-1.18.10-cp37-cp37m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.10-cp37-cp37m-manylinux2010_x86_64.whl(6.06 MB)
PyMuPDF-1.18.10-cp37-cp37m-win32.whl(4.69 MB)
PyMuPDF-1.18.10-cp37-cp37m-win_amd64.whl(5.09 MB)
PyMuPDF-1.18.10-cp38-cp38-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.10-cp38-cp38-manylinux2010_x86_64.whl(6.08 MB)
PyMuPDF-1.18.10-cp38-cp38-win32.whl(4.70 MB)
PyMuPDF-1.18.10-cp38-cp38-win_amd64.whl(5.10 MB)
PyMuPDF-1.18.10-cp39-cp39-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.10-cp39-cp39-manylinux2010_x86_64.whl(6.09 MB)
PyMuPDF-1.18.10-cp39-cp39-win32.whl(4.70 MB)
PyMuPDF-1.18.10-cp39-cp39-win_amd64.whl(5.10 MB)
1.18.9(Feb 26, 2021)
Fixed #888, #895, #896, #885, #922. Implemented #897 (text output right-to-left).

Font subsetting now works without rewriting the respective text.

Added a utility function to compute the quad of a text span for "dict" and "rawdict" text extraction.

PyMuPDF-1.18.9-cp36-cp36m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.9-cp36-cp36m-manylinux2010_x86_64.whl(6.06 MB)
PyMuPDF-1.18.9-cp36-cp36m-win32.whl(4.69 MB)
PyMuPDF-1.18.9-cp36-cp36m-win_amd64.whl(5.09 MB)
PyMuPDF-1.18.9-cp37-cp37m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.9-cp37-cp37m-manylinux2010_x86_64.whl(6.06 MB)
PyMuPDF-1.18.9-cp37-cp37m-win32.whl(4.69 MB)
PyMuPDF-1.18.9-cp37-cp37m-win_amd64.whl(5.09 MB)
PyMuPDF-1.18.9-cp38-cp38-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.9-cp38-cp38-manylinux2010_x86_64.whl(6.07 MB)
PyMuPDF-1.18.9-cp38-cp38-win32.whl(4.69 MB)
PyMuPDF-1.18.9-cp38-cp38-win_amd64.whl(5.09 MB)
PyMuPDF-1.18.9-cp39-cp39-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.9-cp39-cp39-manylinux2010_x86_64.whl(6.09 MB)
PyMuPDF-1.18.9-cp39-cp39-win32.whl(4.69 MB)
PyMuPDF-1.18.9-cp39-cp39-win_amd64.whl(5.09 MB)
1.18.8(Feb 4, 2021)
Fixes:

#881

#878

Source code(tar.gz)
Source code(zip)
PyMuPDF-1.18.8-cp36-cp36m-macosx_10_9_x86_64.whl(5.28 MB)
PyMuPDF-1.18.8-cp36-cp36m-manylinux2010_x86_64.whl(6.05 MB)
PyMuPDF-1.18.8-cp36-cp36m-win32.whl(4.68 MB)
PyMuPDF-1.18.8-cp36-cp36m-win_amd64.whl(5.08 MB)
PyMuPDF-1.18.8-cp37-cp37m-macosx_10_9_x86_64.whl(5.28 MB)
PyMuPDF-1.18.8-cp37-cp37m-manylinux2010_x86_64.whl(6.05 MB)
PyMuPDF-1.18.8-cp37-cp37m-win32.whl(4.68 MB)
PyMuPDF-1.18.8-cp37-cp37m-win_amd64.whl(5.08 MB)
PyMuPDF-1.18.8-cp38-cp38-macosx_10_9_x86_64.whl(5.28 MB)
PyMuPDF-1.18.8-cp38-cp38-manylinux2010_x86_64.whl(6.06 MB)
PyMuPDF-1.18.8-cp38-cp38-win32.whl(4.68 MB)
PyMuPDF-1.18.8-cp38-cp38-win_amd64.whl(5.08 MB)
PyMuPDF-1.18.8-cp39-cp39-macosx_10_9_x86_64.whl(5.28 MB)
PyMuPDF-1.18.8-cp39-cp39-manylinux2010_x86_64.whl(6.07 MB)
PyMuPDF-1.18.8-cp39-cp39-win32.whl(4.68 MB)
PyMuPDF-1.18.8-cp39-cp39-win_amd64.whl(5.08 MB)
1.18.7(Feb 2, 2021)
Fixes:

#844, #838, #823, #818, #814

Implemented enhancement requests:

#855, which allows font subsetting using package fontTools

#870, which allows convert_to_pdf method also for PDF documents.

#843, Document.tobytes() (formerly Document.write()) now also supports linearized output. Plus several extensions / improvements around supporting Python file objects.

Added new methods to quickly determine whether a PDF has annotations or links.

Extended the Document.scrub() method with a new parameter, which also allows removing page thumbnails.

Added methods to directly inquire and set values in PDF objects, without the need to manipulate PDF object sources in an unwieldy way: see methods Document.xref_set_key() / Document.xref_get_key().

Continued the process of changing the naming convention for class methods and attributes to "snake_case". As announced before, this is a tedious, error-prone process that requires special care to maintain a high level of backward compatibility for existing scripts. In future versions, probably in sync with MuPDF v1.19.0, we will remove the definitions of old names, but a method for re-activating old aliases will remain available.
Source code(tar.gz)
Source code(zip)
1.18.6(Jan 7, 2021)
The recent introduction of "Discussions" by GitHub has been very motivating for our users. Based on their feedback, several enhancements have been implemented. Here is a selection:

Most Python functions now have typing / annotation support.

For PDF table-of-contents items, colors are now supported (reading and writing)

PDF page label support for reading and writing

Support personalized tagging of new annotations, fields and links for easier selection of relevant objects.

There are also a number of fixes - please consult the documentation.
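
For readers unfamiliar with PDF page labels, here is a minimal plain-Python sketch of how a label is derived from numbering rules, following the styles defined in the PDF specification (D, R, r, A, a). It does not use PyMuPDF; the function names and the rule keys `start`, `style`, `prefix` and `first` are hypothetical and chosen only for this illustration.

```python
def to_roman(n):
    """Convert a positive integer to an uppercase Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letters(n):
    """A..Z for 1..26, then AA..ZZ for 27..52, and so on."""
    return chr(ord("A") + (n - 1) % 26) * ((n - 1) // 26 + 1)


def page_label(page_index, rules):
    """Return the label of a 0-based page index under the given rules."""
    active = None
    for rule in sorted(rules, key=lambda r: r["start"]):
        if rule["start"] <= page_index:
            active = rule  # the last rule starting at or before this page wins
    if active is None:
        return ""
    number = active.get("first", 1) + page_index - active["start"]
    style = active.get("style", "")
    if style == "D":
        text = str(number)
    elif style == "R":
        text = to_roman(number)
    elif style == "r":
        text = to_roman(number).lower()
    elif style == "A":
        text = to_letters(number)
    elif style == "a":
        text = to_letters(number).lower()
    else:
        text = ""  # no style: the label consists of the prefix only
    return active.get("prefix", "") + text


# Front matter in lowercase Roman numerals, main part as "A-1", "A-2", ...
rules = [
    {"start": 0, "style": "r"},
    {"start": 4, "style": "D", "prefix": "A-", "first": 1},
]
labels = [page_label(i, rules) for i in range(6)]
```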
Source code(tar.gz)
Source code(zip)
PyMuPDF-1.18.6-cp36-cp36m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.6-cp36-cp36m-manylinux2010_x86_64.whl(6.03 MB)
PyMuPDF-1.18.6-cp36-cp36m-win32.whl(4.67 MB)
PyMuPDF-1.18.6-cp36-cp36m-win_amd64.whl(5.07 MB)
PyMuPDF-1.18.6-cp37-cp37m-macosx_10_9_x86_64.whl(5.31 MB)
PyMuPDF-1.18.6-cp37-cp37m-manylinux2010_x86_64.whl(6.03 MB)
PyMuPDF-1.18.6-cp37-cp37m-win32.whl(4.67 MB)
PyMuPDF-1.18.6-cp37-cp37m-win_amd64.whl(5.07 MB)
PyMuPDF-1.18.6-cp38-cp38-macosx_10_9_x86_64.whl(5.31 MB)
PyMuPDF-1.18.6-cp38-cp38-manylinux2010_x86_64.whl(6.04 MB)
PyMuPDF-1.18.6-cp38-cp38-win32.whl(4.67 MB)
PyMuPDF-1.18.6-cp38-cp38-win_amd64.whl(5.08 MB)
PyMuPDF-1.18.6-cp39-cp39-macosx_10_9_x86_64.whl(5.31 MB)
PyMuPDF-1.18.6-cp39-cp39-manylinux2010_x86_64.whl(6.06 MB)
PyMuPDF-1.18.6-cp39-cp39-win32.whl(4.67 MB)
PyMuPDF-1.18.6-cp39-cp39-win_amd64.whl(5.08 MB)
1.18.5(Dec 17, 2020)
Font metrics handling has been improved: text box writing now observes the relevant font properties when determining line heights. As part of this work, a new option has been introduced that returns text bboxes (glyphs, spans, text search quads, etc.) which wrap the text more tightly, as opposed to always returning line-height bboxes.

Fixes:

#771

#768

#750

#739

#728

#727

Source code(tar.gz)
Source code(zip)
PyMuPDF-1.18.5-cp36-cp36m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.5-cp36-cp36m-manylinux2010_x86_64.whl(6.02 MB)
PyMuPDF-1.18.5-cp36-cp36m-win32.whl(4.67 MB)
PyMuPDF-1.18.5-cp36-cp36m-win_amd64.whl(5.06 MB)
PyMuPDF-1.18.5-cp37-cp37m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.5-cp37-cp37m-manylinux2010_x86_64.whl(6.02 MB)
PyMuPDF-1.18.5-cp37-cp37m-win32.whl(4.67 MB)
PyMuPDF-1.18.5-cp37-cp37m-win_amd64.whl(5.06 MB)
PyMuPDF-1.18.5-cp38-cp38-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.5-cp38-cp38-manylinux2010_x86_64.whl(6.03 MB)
PyMuPDF-1.18.5-cp38-cp38-win32.whl(4.67 MB)
PyMuPDF-1.18.5-cp38-cp38-win_amd64.whl(5.07 MB)
PyMuPDF-1.18.5-cp39-cp39-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.5-cp39-cp39-manylinux2010_x86_64.whl(6.04 MB)
PyMuPDF-1.18.5-cp39-cp39-win32.whl(4.67 MB)
PyMuPDF-1.18.5-cp39-cp39-win_amd64.whl(5.07 MB)
1.18.4(Nov 20, 2020)
Improved PDF Optional Content support

Started overhaul of method and attribute naming

Introduced support of Popup annotations

Implemented the following fixes:

#727

#726

#724

Source code(tar.gz)
Source code(zip)
PyMuPDF-1.18.4-cp27-cp27m-macosx_10_9_x86_64.whl(5.30 MB)
PyMuPDF-1.18.4-cp27-cp27m-manylinux2010_x86_64.whl(6.00 MB)
PyMuPDF-1.18.4-cp27-cp27m-win32.whl(4.66 MB)
PyMuPDF-1.18.4-cp27-cp27m-win_amd64.whl(5.07 MB)
PyMuPDF-1.18.4-cp27-cp27mu-manylinux2010_x86_64.whl(6.00 MB)
PyMuPDF-1.18.4-cp35-cp35m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.4-cp35-cp35m-manylinux2010_x86_64.whl(6.01 MB)
PyMuPDF-1.18.4-cp35-cp35m-win32.whl(4.67 MB)
PyMuPDF-1.18.4-cp35-cp35m-win_amd64.whl(5.06 MB)
1.18.3(Nov 9, 2020)
As a major new feature, the PDF Optional Content concept is now widely supported.

The following fixes have been implemented:

#714

#711

#707

#713

#709

Source code(tar.gz)
Source code(zip)
PyMuPDF-1.18.3-cp27-cp27m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.3-cp27-cp27m-manylinux2010_x86_64.whl(5.99 MB)
PyMuPDF-1.18.3-cp27-cp27m-win32.whl(4.66 MB)
PyMuPDF-1.18.3-cp27-cp27m-win_amd64.whl(5.06 MB)
PyMuPDF-1.18.3-cp27-cp27mu-manylinux2010_x86_64.whl(5.99 MB)
PyMuPDF-1.18.3-cp35-cp35m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.3-cp35-cp35m-manylinux2010_x86_64.whl(5.99 MB)
PyMuPDF-1.18.3-cp35-cp35m-win32.whl(4.66 MB)
PyMuPDF-1.18.3-cp35-cp35m-win_amd64.whl(5.06 MB)
PyMuPDF-1.18.3-cp36-cp36m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.3-cp36-cp36m-manylinux2010_x86_64.whl(5.99 MB)
PyMuPDF-1.18.3-cp36-cp36m-win32.whl(4.66 MB)
PyMuPDF-1.18.3-cp36-cp36m-win_amd64.whl(5.06 MB)
PyMuPDF-1.18.3-cp37-cp37m-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.3-cp37-cp37m-manylinux2010_x86_64.whl(5.99 MB)
PyMuPDF-1.18.3-cp37-cp37m-win32.whl(4.66 MB)
PyMuPDF-1.18.3-cp37-cp37m-win_amd64.whl(5.06 MB)
PyMuPDF-1.18.3-cp38-cp38-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.3-cp38-cp38-manylinux2010_x86_64.whl(6.01 MB)
PyMuPDF-1.18.3-cp38-cp38-win32.whl(4.66 MB)
PyMuPDF-1.18.3-cp38-cp38-win_amd64.whl(5.06 MB)
PyMuPDF-1.18.3-cp39-cp39-macosx_10_9_x86_64.whl(5.29 MB)
PyMuPDF-1.18.3-cp39-cp39-manylinux2010_x86_64.whl(6.02 MB)
PyMuPDF-1.18.3-cp39-cp39-win32.whl(4.66 MB)
PyMuPDF-1.18.3-cp39-cp39-win_amd64.whl(5.06 MB)
1.18.2(Oct 27, 2020)
This resolves #575, #697 and #691, and removes the hit_max parameter from text searching. In addition, hyphenated words around line breaks are still found.
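
The following plain-Python sketch illustrates what finding a hyphenated word across a line break can mean. It is only an assumption about the general technique, not a description of how MuPDF or PyMuPDF implement text search; the function name `find_dehyphenated` is hypothetical.

```python
def find_dehyphenated(lines, needle):
    """Return True if needle occurs in the text, even when it is split
    by a hyphen at the end of a line."""
    joined = ""
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith("-"):
            # Drop the hyphen and glue the next line on without a space.
            joined += stripped[:-1]
        else:
            joined += stripped + " "
    return needle.lower() in joined.lower()


text = ["This is a docu-", "ment about PDF files."]
found = find_dehyphenated(text, "document")
```

A simple rule like this cannot distinguish a line-break hyphen from a genuine one, so a word such as "well-known" split after its hyphen would be joined into "wellknown".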

The use of the clip parameter in text searches and text extractions now only includes characters whose bboxes are fully contained in the clip rectangle.
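
The containment rule described above can be sketched in plain Python as follows. This is an illustration of the rule only, not PyMuPDF code; the helper names `inside` and `clip_chars` and the `(x0, y0, x1, y1)` tuple layout are assumptions made for this example.

```python
def inside(bbox, clip):
    """True if bbox lies completely within clip (both are x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    cx0, cy0, cx1, cy1 = clip
    return cx0 <= x0 and cy0 <= y0 and x1 <= cx1 and y1 <= cy1


def clip_chars(chars, clip):
    """Keep only the characters whose bbox is fully contained in clip."""
    return "".join(c for c, bbox in chars if inside(bbox, clip))


chars = [
    ("A", (0, 0, 10, 10)),
    ("B", (10, 0, 20, 10)),
    ("C", (20, 0, 30, 10)),
]
# The clip cuts through "A", so only "B" and "C" are kept.
result = clip_chars(chars, (5, 0, 30, 10))
```

A character that merely overlaps the clip is dropped, so a clip drawn slightly too tight can lose the first or last character of a word.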
Source code(tar.gz)
Source code(zip)
PyMuPDF-1.18.2-cp36-cp36m-macosx_10_9_x86_64.whl(5.28 MB)
PyMuPDF-1.18.2-cp36-cp36m-manylinux2010_x86_64.whl(5.96 MB)
PyMuPDF-1.18.2-cp36-cp36m-win32.whl(4.65 MB)
PyMuPDF-1.18.2-cp36-cp36m-win_amd64.whl(5.04 MB)
PyMuPDF-1.18.2-cp37-cp37m-macosx_10_9_x86_64.whl(5.28 MB)
PyMuPDF-1.18.2-cp37-cp37m-manylinux2010_x86_64.whl(5.96 MB)
PyMuPDF-1.18.2-cp37-cp37m-win32.whl(4.65 MB)
PyMuPDF-1.18.2-cp37-cp37m-win_amd64.whl(5.04 MB)
PyMuPDF-1.18.2-cp38-cp38-macosx_10_9_x86_64.whl(5.28 MB)
PyMuPDF-1.18.2-cp38-cp38-manylinux2010_x86_64.whl(5.97 MB)
PyMuPDF-1.18.2-cp38-cp38-win32.whl(4.65 MB)
PyMuPDF-1.18.2-cp38-cp38-win_amd64.whl(5.04 MB)
PyMuPDF-1.18.2-cp39-cp39-macosx_10_9_x86_64.whl(5.28 MB)
PyMuPDF-1.18.2-cp39-cp39-manylinux2010_x86_64.whl(5.98 MB)
PyMuPDF-1.18.2-cp39-cp39-win32.whl(4.65 MB)
PyMuPDF-1.18.2-cp39-cp39-win_amd64.whl(5.04 MB)



Owner

PyMuPDF

Extract tables from a PDF and output the data in a JSON-like format