Python lib for Simple PDF text extraction

Overview

pdftotext

Simple PDF text extraction

import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
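Because a loaded PDF behaves like a sequence of page strings (it supports len(), iteration, and indexing, as shown above), ordinary string operations apply to each page. As a minimal sketch, the hypothetical helper below (find_pages is not part of pdftotext) returns the zero-based indices of the pages that contain a keyword. It is demonstrated on a plain list of strings standing in for a PDF object.

```python
def find_pages(pages, keyword):
    """Return zero-based indices of pages whose text contains keyword.

    `pages` can be any iterable of strings; a pdftotext.PDF object is
    assumed to qualify because iterating over it yields page text.
    """
    needle = keyword.lower()
    return [i for i, text in enumerate(pages) if needle in text.lower()]


# Stand-in for a loaded PDF: one string per page.
sample_pages = ["Lorem ipsum dolor", "sit amet", "Ipsum again"]
print(find_pages(sample_pages, "ipsum"))  # [0, 2]
```

With a real document, passing the pdf object from the example above in place of sample_pages should work the same way.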

OS Dependencies

These instructions assume you're using Python 3 on a recent OS. Package names may differ for Python 2 or for an older OS.

Debian, Ubuntu, and friends

sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

Fedora, Red Hat, and friends

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel

macOS

brew install pkg-config poppler python
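On Linux and macOS, it can help to confirm that the poppler C++ development files are visible before running pip, since the build needs them. The sketch below asks pkg-config whether it can find them. It assumes the pkg-config module is named poppler-cpp, and poppler_cpp_visible is a hypothetical helper, not part of pdftotext.

```python
import shutil
import subprocess


def poppler_cpp_visible():
    """Return True/False if pkg-config can/cannot find poppler-cpp.

    Returns None when pkg-config itself is not on PATH.
    """
    if shutil.which("pkg-config") is None:
        return None
    # `pkg-config --exists` exits with status 0 only if the module is found.
    result = subprocess.run(["pkg-config", "--exists", "poppler-cpp"])
    return result.returncode == 0


status = poppler_cpp_visible()
if status is None:
    print("pkg-config not found; install it first")
elif status:
    print("poppler-cpp development files found")
else:
    print("poppler-cpp not found; install the poppler development package")
```

A False result usually means the development package for poppler is missing or installed somewhere pkg-config does not search.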

Windows

Currently tested only when using conda:

  • Install the Microsoft Visual C++ Build Tools
  • Install poppler through conda:
    conda install -c conda-forge poppler
    

Install

pip install pdftotext
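
After installing, a quick check that the module is discoverable in the current environment can save some confusion, especially when several Python installations or virtualenvs are in play. The sketch below uses only the standard library; is_installed is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pdftotext"):
    print("pdftotext is importable")
else:
    print("pdftotext is not installed in this environment")
```

This only checks that the module can be located. The import itself can still fail if the compiled extension does not match the poppler library present on the system.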
Comments
  • error: command 'gcc' failed with exit status 1

    Hi,

    I'm having trouble installing pdftotext. I'm using Python 3.6 on Anaconda 5.2.0 and pip version 18.0. There seems to be a problem with gcc so I did conda install libgcc but that didn't make any difference. I also made sure python3-dev was installed.

    $ pip install pdftotext
    Collecting pdftotext
      Using cached https://files.pythonhosted.org/packages/96/41/aa31f4a6809eb0574674d6c0cf6bc0e00aaf0ea53c62db8a2d9af50b7cc6/pdftotext-2.1.0.tar.gz
    Building wheels for collected packages: pdftotext
      Running setup.py bdist_wheel for pdftotext ... error
      Complete output from command /home/john/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-9uyu6ggf/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-epbnqs4m --python-tag cp36:
      running bdist_wheel
      running build
      running build_ext
      building 'pdftotext' extension
      creating build
      creating build/temp.linux-x86_64-3.6
      gcc -pthread -B /home/john/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPOPPLER_CPP_AT_LEAST_0_30_0=0 -I/home/john/anaconda3/include/python3.6m -c pdftotext.cpp -o build/temp.linux-x86_64-3.6/pdftotext.o -Wall
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      pdftotext.cpp:3:10: fatal error: poppler/cpp/poppler-document.h: No such file or directory
       #include <poppler/cpp/poppler-document.h>
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      compilation terminated.
      error: command 'gcc' failed with exit status 1
      
      ----------------------------------------
      Failed building wheel for pdftotext
      Running setup.py clean for pdftotext
    Failed to build pdftotext
    Installing collected packages: pdftotext
      Running setup.py install for pdftotext ... error
        Complete output from command /home/john/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-9uyu6ggf/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-sx0bea7r/install-record.txt --single-version-externally-managed --compile:
        running install
        running build
        running build_ext
        building 'pdftotext' extension
        creating build
        creating build/temp.linux-x86_64-3.6
        gcc -pthread -B /home/john/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPOPPLER_CPP_AT_LEAST_0_30_0=0 -I/home/john/anaconda3/include/python3.6m -c pdftotext.cpp -o build/temp.linux-x86_64-3.6/pdftotext.o -Wall
        cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
        pdftotext.cpp:3:10: fatal error: poppler/cpp/poppler-document.h: No such file or directory
         #include <poppler/cpp/poppler-document.h>
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        compilation terminated.
        error: command 'gcc' failed with exit status 1
        
        ----------------------------------------
    Command "/home/john/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-9uyu6ggf/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-sx0bea7r/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-9uyu6ggf/pdftotext/
    
    

    Any help would be greatly appreciated.

    Thanks!

    opened by johndurning 24
  • Add OS X Mojave-specific build / link config

    • add OS X Mojave to the list of platforms which require /usr/local/include to be in include_dirs
    • add OS X Mojave to the list of platforms which require /usr/local/lib to be in library_dirs
    opened by wileykestner 22
  • pip install fails on macOS

    Hi, I'm running macOS and trying to install pdftotext. I tried pip install pdftotext and got this error:

    Collecting pdftotext
      Using cached https://files.pythonhosted.org/packages/a6/a7/c202adb0bcd3adc3030b0c5f7f0e21f62a721913e93296e6c4ddc305cbd3/pdftotext-2.1.2.tar.gz
    Building wheels for collected packages: pdftotext
      Building wheel for pdftotext (setup.py) ... error
      ERROR: Command errored out with exit status 1:
       command: /Users/romainvandelouw/venv/oreilly/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/zg/8mfp262s1093qtv0klghbfnr0000gn/T/pip-install-oailros8/pdftotext/setup.py'"'"'; file='"'"'/private/var/folders/zg/8mfp262s1093qtv0klghbfnr0000gn/T/pip-install-oailros8/pdftotext/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/zg/8mfp262s1093qtv0klghbfnr0000gn/T/pip-wheel-yvlotdyb --python-tag cp36
           cwd: /private/var/folders/zg/8mfp262s1093qtv0klghbfnr0000gn/T/pip-install-oailros8/pdftotext/
      Complete output (27 lines):
      running bdist_wheel
      running build
      running build_ext
      building 'pdftotext' extension
      creating build
      creating build/temp.macosx-10.7-x86_64-3.6
      gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/romainvandelouw/anaconda/include -arch x86_64 -I/Users/romainvandelouw/anaconda/include -arch x86_64 -DPOPPLER_CPP_AT_LEAST_0_30_0=1 -I/usr/local/include -I/Users/romainvandelouw/anaconda/include/python3.6m -c pdftotext.cpp -o build/temp.macosx-10.7-x86_64-3.6/pdftotext.o -Wall -mmacosx-version-min=10.9
      In file included from pdftotext.cpp:5:
      /Users/romainvandelouw/anaconda/include/poppler/cpp/poppler-page.h:37:22: warning: rvalue references are a C++11 extension [-Wc++11-extensions]
          text_box(text_box&&) = default;
          ^
      /Users/romainvandelouw/anaconda/include/poppler/cpp/poppler-page.h:37:28: warning: defaulted function definitions are a C++11 extension [-Wc++11-extensions]
          text_box(text_box&&) = default;
          ^
      /Users/romainvandelouw/anaconda/include/poppler/cpp/poppler-page.h:38:33: warning: rvalue references are a C++11 extension [-Wc++11-extensions]
          text_box& operator=(text_box&&) = default;
          ^
      /Users/romainvandelouw/anaconda/include/poppler/cpp/poppler-page.h:38:39: warning: defaulted function definitions are a C++11 extension [-Wc++11-extensions]
          text_box& operator=(text_box&&) = default;
          ^
      4 warnings generated.
      creating build/lib.macosx-10.7-x86_64-3.6
      g++ -bundle -undefined dynamic_lookup -L/Users/romainvandelouw/anaconda/lib -arch x86_64 -L/Users/romainvandelouw/anaconda/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/pdftotext.o -L/usr/local/lib -lpoppler-cpp -o build/lib.macosx-10.7-x86_64-3.6/pdftotext.cpython-36m-darwin.so
      clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
      ld: library not found for -lstdc++
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
      error: command 'g++' failed with exit status 1

      ERROR: Failed building wheel for pdftotext

    I read in previous issues that it could be related to dependencies, but poppler is installed:

    Warning: pkg-config 0.29.2 is already installed and up-to-date
    To reinstall 0.29.2, run brew reinstall pkg-config
    Warning: poppler 0.81.0 is already installed and up-to-date
    To reinstall 0.81.0, run brew reinstall poppler

    I read #26 but in my case it doesn't work outside the virtualenv either...

    verbose_pdftotext.txt is the output of pip --verbose install pdftotext.

    What am I missing? Thanks for your help!

    opened by vandelouw 14
  • Can't install on Windows using pip

    pip install pdftotext
    Collecting pdftotext
      Using cached pdftotext-2.0.1.tar.gz
    Installing collected packages: pdftotext
      Running setup.py install for pdftotext ... error
        Complete output from command "c:\users\vinayak sharma\appdata\local\programs\python\python35\python.exe" -u -c "import setuptools, tokenize;__file__='C:\\Users\\Local\\Temp\\pip-build-6eh2vxu8\\pdftotext\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\VINAYA~1\AppData\Local\Temp\pip-kyy39x3a-record\install-record.txt --single-version-externally-managed --compile:
        WARNING: pkg-config not found--guessing at poppler version.
        If the build fails, install pkg-config and try again.
        running install
        running build
        running build_ext
        building 'pdftotext' extension
        error: Unable to find vcvarsall.bat

    ----------------------------------------
    

    Command ""c:\users\Local\\Temp\\pip-build-6eh2vxu8\\pdftotext\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\VINAYA~1\AppData\Local\Temp\pip-kyy39x3a-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\VINAYA~1\AppData\Local\Temp\pip-build-6eh2vxu8\pdftotext\

    opened by vnyk 12
  • hyphen ignored at end of line

    I have a PDF file and used the code below to print it in a terminal, but the hyphens at the ends of lines were not included. I created a one-page PDF test file (using qpdf).

    My test file is: https://github.com/ripspin5/scripts/blob/master/misc/test1.pdf

    Code: (python3.7)

    import pdftotext
    
    # Load your PDF
    with open("test1.pdf", "rb") as f:
        pdf = pdftotext.PDF(f)
    
    print(pdf[0])    
    
    opened by ripspin5 10
  • Can't install on MacOS via pip

    Running pip, with or without su, on MacOS produces the following error:

    Command "/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python -u -c "import setuptools, tokenize;file='/private/var/folders/cm/60_4h2mj23d_70fhqwvtjf7m0000gn/T/pip-build-bd88s9/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /var/folders/cm/60_4h2mj23d_70fhqwvtjf7m0000gn/T/pip-eNla3s-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/cm/60_4h2mj23d_70fhqwvtjf7m0000gn/T/pip-build-bd88s9/pdftotext/

    This library is DOA until the dependency issue is resolved.

    opened by sfsdfd 10
  • [CI] Build wheels for macOS, Linux and Windows

    Hey,

    this creates binary wheels including dependencies for the major operating systems.

    For Windows: Currently, the CI will only build wheels for 64-bit Python (amd64). This is due to the libraries bundled with conda being 64-bit as well. This can be fixed by installing a 32-bit distribution of poppler. delvewheel is used to ensure that all non-system DLLs are bundled. This will not work on systems older than Windows 7 but I guess we can ignore that. I have tested this on my Windows 10 machine.

    For Linux: The wheel has manylinux1 compatibility so it supports even the most ancient operating systems like CentOS 6. The latest poppler is compiled from source. I have tested this on an Ubuntu 20.04 system.

    For macOS: I've used the cibuildwheel example and added the dependencies like for the test jobs. It manages to create a wheel but it's only a few kilobytes so I'm kind of sceptical about this. However, I don't have a macOS system to test this on. It would be nice if someone with a Mac could test this.

    The generated wheels can be downloaded here: https://dev.azure.com/jhnnbr/pdftotext/_build/results?buildId=36&view=artifacts&pathAsName=false&type=publishedArtifacts

    Closes: #29

    opened by bauerj 9
  • Can't install using conda. error: no template named 'unique_ptr' in namespace 'std'

    I get an error on pip install pdftotext

    In file included from pdftotext.cpp:5:
    /usr/local/include/poppler/cpp/poppler-page.h:63:10: error: no template named 'unique_ptr' in namespace 'std'
        std::unique_ptr<text_box_data> m_data;
        ~~~~~^
    1 error generated.
    error: command 'gcc' failed with exit status 1
    
    opened by kelvinu 9
  • Can't pip install on Mac

    Hey, when I run the pip command it gives me the following error:

    ERROR: Could not find a version that satisfies the requirement pdftotext (from versions: none)
    ERROR: No matching distribution found for pdftotext

    Is there another way to install it, or a way to fix this error?

    opened by R470R 8
  • pdftotext.Error: Poppler error creating document

    I get this error while using pdftotext with the multiprocessing module on EC2:

    ('read pdf file', '1004.5293.pdf')
    Traceback (most recent call last):
      File "main.py", line 44, in <module>
        result = pool.map(pdf_extract, filenames)
      File "/usr/lib64/python2.7/multiprocessing/pool.py", line 251, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/usr/lib64/python2.7/multiprocessing/pool.py", line 567, in get
        raise self._value
    pdftotext.Error: Poppler error creating document
    

    My code:

    import os
    import time

    import pdftotext

    # `have` (names of already-converted files) and `txt_path` (output
    # directory prefix) are globals defined elsewhere in the original script.
    def pdf_extract(dirs):
        paths, filename = dirs
        file = filename.replace(".pdf", ".txt")
        if file in have:
            print("file already extracted!!")
        else:
            print("read pdf file", filename)
            with open(os.path.join(paths, filename), "rb") as f:
                pdf = pdftotext.PDF(f)
                print(len(pdf))
            text = "\n\n".join(pdf)
            print("converted file")
            # The with block closes the file, so no explicit close() is needed.
            with open(txt_path + file, "w") as f:
                f.write(text)
                print("saved file")
            time.sleep(0.01)
    

    Link : arxiv paper

    opened by prakritidev 8
  • Symbol not found in flat namespace

    Hi all. I'm getting this error when trying to import pdftotext in a flask project and cannot figure out how to resolve it.

    ImportError: dlopen(/Users/casey/PycharmProjects/virtual environments/oadoi/lib/python3.9/site-packages/pdftotext.cpython-39-darwin.so, 0x0002): symbol not found in flat namespace '__ZN7poppler24set_debug_error_functionEPFvRKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEEPvES9_'

    My installation details are:

    • Mac M1
    • pdftotext version 2.1.5
    • Installed poppler with homebrew, set to version 22.04.0
    • Ensured poppler visible to pdftotext by setting these environment variables in my shell:
    export LDFLAGS="-L/opt/homebrew/opt/openssl/lib -L/opt/homebrew/lib $LDFLAGS"
    export CPPFLAGS="-I/opt/homebrew/opt/openssl/include -I/opt/homebrew/include $CPPFLAGS"
    
    opened by caseydm 7
  • Enable tests requiring at least version 0.88 if requirement is met

    At the moment, two tests are always skipped because they require at least poppler 0.88, which might not be available in all environments. When building your own wheels for the package, it would be nice to be able to run them anyway if version 0.88 or later is available, to verify correct behavior before uploading.

    It might also make sense to expose the poppler version embedded in the Python package, since it varies widely and is (nearly) never tied to a specific version of this package.

    opened by stefan6419846 3
  • Import error when running on MacOs (M1)

    Hi All, I'm running into this error when importing pdftotext. I've followed the instructions correctly and have tried to reinstall all dependencies including all the brew packages.

    ImportError: dlopen(/Users/ethannguyen/Documents/GitHub/livebuildings/env/lib/python2.7/site-packages/pdftotext.so, 0x0002): symbol not found in flat namespace '__ZN7poppler24set_debug_error_functionEPFvRKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEEPvES9_'

    opened by Ethansev 1
  • Question about how to approach a bounding box problem

    My PDFs have a lot of math, symbols, figures, etc.

    Is there any way you know of to extract text from a page, but only within one of several bounding boxes?

    I basically want to set up a feedback loop where I:

    1. iterate through the pages of the pdf
    2. set ordered bounding boxes visually on each page
    3. automatically extract and concatenate text from these bounding boxes, in their indicated order (from step 2)

    Is this doable? Is there a simple way to do this? What do you think?

    opened by klebs6 0
  • Docker examples

    Provide examples of how to install this module starting from some common docker images.

    See https://github.com/jalan/pdftotext/issues/61#issuecomment-625118990

    opened by jalan 0
  • Pass more arguments to pdftotext

    First of all, thanks for the handy module!

    I'd be interested in having access to more of the features offered by pdftotext/xpdf to tune the quality of the extracted text.

    As far as I know it is not possible to pass arguments freely to pdftotext but there are a few hardcoded parameters (password, raw).

    Would that be something you would be open to adding?

    I'm not fluent in C++ but it seems that I could get inspiration from the existing code to try to have my arguments in.

    The parameters/options I'm most interested in are nodiag, lineprinter, linespacing and fixed. The full list can be found here: http://www.xpdfreader.com/pdftotext-man.html

    opened by zufj 9
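
The last issue above mentions a raw option alongside the password. As a hedged sketch, assuming the installed version accepts it as a keyword argument named raw on pdftotext.PDF (check the project's documentation for your version), a small wrapper could expose it. extract_text is a hypothetical helper, and pdftotext is imported inside it so the definition loads even where the library is missing.

```python
def extract_text(path, raw=False):
    """Return all text from the PDF at `path` as one string.

    Assumption: pdftotext.PDF accepts a `raw` keyword argument that
    switches to raw (content stream) ordering of the text.
    """
    import pdftotext  # imported lazily; requires the compiled extension

    with open(path, "rb") as f:
        pdf = pdftotext.PDF(f, raw=raw)
    # Pages are joined with a blank line, as in the README example.
    return "\n\n".join(pdf)
```

Calling extract_text("lorem_ipsum.pdf") and extract_text("lorem_ipsum.pdf", raw=True) on the same file and comparing the results is one way to judge which mode suits a given document.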
Releases(v2.2.2)
Owner
Jason Alan Palmer
Born under a bad sign
pikepdf is a Python library for reading and writing PDF files.

A Python library for reading and writing PDF, powered by qpdf

1.6k Jan 03, 2023
pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative markdown file as input

pystitcher pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative input in the form of a mark

Nemo 387 Dec 10, 2022
Merge multiple PDF files into one.

PDF Merger Merge multiple PDF files into one. Usage % python pdf_merger.py -h usage: pdf_merger.py [-h] [-o OUTPUT] [-f [FILES ...]] optional argumen

Duo Apps 6 Oct 03, 2022
Generate a bunch of malicious pdf files with phone-home functionality. Can be used with Burp Collaborator

Malicious PDF Generator ☠️ Generate ten different malicious pdf files with phone-home functionality. Can be used with Burp Collaborator. Used for pene

Jonas Lejon 1.9k Jan 01, 2023
A bulk pdf generator. This application can generate PDFs in bulk by using just one click.

A bulk html pdf generator. This application can generate PDFs in bulk by using just one click. Screenshots Requirements 🧱 Your system must have the f

Aman Nirala 3 Apr 23, 2022
rst2pdf: Use a text editor. Make a PDF.

rst2pdf 487 Jan 06, 2023
Small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface

1.8k Dec 29, 2022
Searches the names and contents of PDF files in a directory and its subdirectories.

PDF Finder This script helps with searching folders that contain numerous PDF files. The search covers all files in the directory and its subdirectories.

William Pilger 1 Nov 27, 2021
Mipdfcompressor - 💕A simple pdf size compressing telegram robot

Pdf Compressor Telegram Bot A simple pdf size compressing telegram robot. Useful for digital documentation. Mandatory Variables API_HASH - Your A

Madhavan Mi 1 Feb 14, 2022
PyMuPDF is a Python binding with support for MuPDF

PyMuPDF is a Python binding with support for MuPDF (current version 1.18.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, I

PyMuPDF 1.9k Jan 03, 2023
Generate a preview image for a PDF.

PDF ➡️ Preview A simple tool to save me time on Illustrator. Generates a preview image for a PDF file. Useful for sneak peeks to academic publications

David Chuan-En Lin 51 Sep 22, 2022
WeasyPrint is a smart solution helping web developers to create PDF documents.

WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous statistical reports, invoices, tickets…

Kozea 5.4k Jan 08, 2023
Processes a PDF to make it compatible with PDF/X and with grayscale printers.

tratapdf Processes a PDF to make it compatible with PDF/X and with grayscale printers. Dependencies: icc-profiles, ghostscript, PDF viewer

1 Nov 30, 2021
Simple pdf editor while preserving structure and format.

SIMPdf Simple pdf editor while preserving structure and format.

Shashwat Singh 242 Jan 04, 2023
Convert MD files to PDF automatically (with CSS) 📄🚀

MD2PDF Action Convert MD files to PDF automatically (with CSS)! Converts a pattern described set of markdown files and converts them to pdf whilst app

Will Fantom 1 Feb 09, 2022
A simple Python script to convert multiple images (well technically also a single image) into a pdf.

PythonImage2PDF A simple Python script to convert multiple images into a single PDF-document. Created basically for only my own needs for converting m

Joona Gynther 1 Jun 28, 2022
Table automatically extraction from PDF Document

PDF Table Extractor Table automatically extraction from PDF Document Our Icon 📌 Name : PDF Table Extractor 📌 Authors : Minku Koo Jiyong Park 📌 Deve

1 Jan 10, 2022
Converting Html files to pdf using python script, pdfkit module and wkhtmltopdf.

Html-to-pdf-pdfkit-wkhtml- This repository has code for converting local html files and online html resources into pdf. It is an python script which u

Hemachandran P 1 Nov 09, 2021
Excalibur: A web interface to extract tabular data from PDFs

Excalibur: A web interface to extract tabular data from PDFs Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It i

1.2k Jan 04, 2023
Compare-pdf - A Flask driven restful API for comparing two PDF files

COMPARE-PDF A Flask driven restful API for comparing two PDF files. Description

Karthikeyan JC 3 Mar 13, 2022