pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative markdown file as input

Overview

pystitcher

pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative input in the form of a markdown file. It is written in pure Python and uses PyPDF3 for reading and writing PDF files.

Description

pystitcher is a command-line tool with very few CLI options:

usage: pystitcher [-h] [--version] [-v] [--cleanup | --no-cleanup] spine.md output.pdf

Stitch PDF files together

positional arguments:
  spine.md              Input markdown file
  output.pdf            Output PDF file

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         log more things
  --cleanup, --no-cleanup
                        Delete temporary files (default: True)
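The options above map naturally onto Python's argparse module. Below is a minimal sketch that reproduces this help text. It is not pystitcher's actual source: the function name `build_parser` and the version string are placeholders. Note that `argparse.BooleanOptionalAction`, which generates the paired `--cleanup`/`--no-cleanup` flags, only exists in Python 3.9 and later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the pystitcher help text (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="pystitcher", description="Stitch PDF files together"
    )
    # Placeholder version string; the real tool reports its own version.
    parser.add_argument("--version", action="version", version="%(prog)s 1.0.0")
    parser.add_argument("-v", "--verbose", action="store_true", help="log more things")
    # BooleanOptionalAction (Python 3.9+) creates both --cleanup and --no-cleanup.
    parser.add_argument(
        "--cleanup",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Delete temporary files",
    )
    parser.add_argument("spine", metavar="spine.md", help="Input markdown file")
    parser.add_argument("output", metavar="output.pdf", help="Output PDF file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-cleanup", "spine.md", "book.pdf"])
    print(args.spine, args.output, args.cleanup)  # spine.md book.pdf False
```

On Python 3.7 or 3.8, `argparse.BooleanOptionalAction` raises an `AttributeError`, which is why a parser written this way needs Python 3.9 or newer.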

Given this input:

existing_bookmarks: remove
title: Complete Guide to the Personal Data Protection Bill
author: Medianama
keywords: privacy, surveillance, personal data protection
subject: Personal Data Protection Bill
# A Complete Guide to the Personal Data Protection Bill

- [Cover](cover.pdf)

# The Bills

- [Personal Data Protection Bill, 2019](https://example.com/2019-bill.pdf)
- [Personal Data Protection Bill, 2018](https://example.com/2018-bill.pdf)

# Other key reading material

- [Srikrishna Committee Report](2.a.pdf)
- [Dvara Research's Personal Data Protection Bill](2.b.pdf)
- [MP Shashi Tharoor's Data Protection Bill](2.c.pdf)
- [MP Jay Panda's Data Protection Bill](2.d.pdf)
- [SaveOurPrivacy.in bill](2.e.pdf)
- [TRAI recommendations on privacy](2.f1.pdf)
- [Comments on TRAI recommendations on privacy](2.f2.pdf)

Will generate a PDF with proper bookmarks:

https://i.imgur.com/qPVpZGt.png

And the correct metadata:

Title:          Complete Guide to the Personal Data Protection Bill
Subject:        Personal Data Protection Bill
Keywords:       privacy, surveillance, personal data protection
Author:         Medianama
Creator:        pystitcher/1.0.0
Producer:       pystitcher/1.0.0
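Here is one rough way to picture how the example input becomes an outline. It assumes that each heading becomes a bookmark and that each linked PDF becomes a child bookmark one level below the heading above it. The regex-based sketch is a simplification, since pystitcher's own markdown handling is more complete. The name `outline` and the `(level, title, target)` tuple shape are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# "# Title" style ATX headings, levels 1-6.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
# "[Label](target)" links, optionally written as a "- " or "* " list item.
LINK_RE = re.compile(r"^(?:[-*]\s+)?\[([^\]]+)\]\(([^)\s]+)\)")

Entry = Tuple[int, str, Optional[str]]  # (level, title, PDF target or None)


def outline(markdown_body: str) -> List[Entry]:
    """Sketch: turn headings and PDF links into a flat list of bookmark entries."""
    entries: List[Entry] = []
    level = 0  # level of the most recent heading seen
    for line in markdown_body.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            level = len(heading.group(1))
            entries.append((level, heading.group(2), None))
            continue
        link = LINK_RE.match(line)
        if link:
            # A linked file nests one level below the heading above it.
            entries.append((level + 1, link.group(1), link.group(2)))
    return entries


if __name__ == "__main__":
    spine = """# The Bills

- [Personal Data Protection Bill, 2019](https://example.com/2019-bill.pdf)
- [Personal Data Protection Bill, 2018](https://example.com/2018-bill.pdf)
"""
    for level, title, target in outline(spine):
        print("  " * (level - 1) + title, target or "")
```

Any trailing `{: ...}` attribute list after a link is ignored by this sketch, because the link pattern stops at the closing parenthesis.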

Configuration options can be specified as metadata at the top of the file:

Option              Notes
fit                 Default fit of the bookmark. Can be overridden per bookmark. See the wiki for more details.
author              PDF Author
keywords            PDF Keywords
subject             PDF Subject
title               PDF Title. If left unspecified, the first heading (h1) in the document is used.
existing_bookmarks  What to do with existing bookmarks in individual files. Options are keep, flatten, and remove. See the docs for more details.
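The metadata header is a block of `key: value` lines at the very top of the spine. The sketch below shows one way to split that header off the markdown body and apply the title fallback from the table, which uses the first h1 when `title` is absent. It assumes single-line values and stops at the first line that is not a `key: value` pair. That is a simplification, and the helper names `split_metadata` and `resolve_title` are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

META_RE = re.compile(r"^([A-Za-z0-9_-]+):\s*(.*)$")


def split_metadata(text: str) -> Tuple[Dict[str, str], str]:
    """Split leading 'key: value' lines from the markdown body (sketch)."""
    lines = text.splitlines()
    meta: Dict[str, str] = {}
    index = 0
    while index < len(lines):
        match = META_RE.match(lines[index])
        if not match:
            break  # the first blank or non-metadata line ends the header
        meta[match.group(1).lower()] = match.group(2).strip()
        index += 1
    return meta, "\n".join(lines[index:])


def resolve_title(meta: Dict[str, str], body: str) -> Optional[str]:
    """Use the 'title' key if present, otherwise the first h1 heading."""
    if meta.get("title"):
        return meta["title"]
    for line in body.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return None
```

For the example spine, `split_metadata` would return the five keys (`existing_bookmarks`, `title`, `author`, `keywords`, `subject`) and a body starting at `# A Complete Guide to the Personal Data Protection Bill`.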

Additionally, PDF links in the markdown can carry attributes that alter the PDFs before merging. The attribute below rotates the second PDF file by 90 degrees clockwise before merging:

[Part 1](1.pdf)
[Part 2](2.pdf){: rotate="90"}

And the attribute below merges only pages 2 to 5 (both inclusive) of the second PDF file:

[Part 1](1.pdf)
[Part 2](2.pdf){: start=2 end=5}
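As the example shows, `start` and `end` are 1-based and inclusive, while most PDF libraries index pages from 0. Here is a minimal sketch of the conversion. It assumes that an omitted `start` or `end` defaults to the first or last page, and the helper name `selected_pages` is hypothetical.

```python
from typing import List, Optional


def selected_pages(page_count: int, start: Optional[int] = None, end: Optional[int] = None) -> List[int]:
    """Return 0-based page indices for a 1-based, inclusive start/end (sketch)."""
    first = 1 if start is None else start
    last = page_count if end is None else end
    if not 1 <= first <= last <= page_count:
        raise ValueError(f"invalid page range {first}-{last} for {page_count} pages")
    # Page N (1-based) is index N - 1; range() excludes its stop value,
    # so stopping at `last` keeps the end page.
    return list(range(first - 1, last))


print(selected_pages(10, start=2, end=5))  # [1, 2, 3, 4]
```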

The available attributes are:

Attribute  Notes
rotate     Rotate the PDF clockwise. Valid values are 90, 180, and 270.
start      Start page number for PDF page selection (inclusive)
end        End page number for PDF page selection (inclusive)
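The `{: ...}` suffix resembles the attribute-list syntax of Python-Markdown's attr_list extension. The sketch below parses such a suffix and checks it against the table above. It is a simplified stand-in rather than pystitcher's code, and the name `parse_attributes` is hypothetical.

```python
import re
from typing import Dict

# key=value or key="value" pairs inside "{: ... }"
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')
VALID_ROTATIONS = {90, 180, 270}
KNOWN_KEYS = ("rotate", "start", "end")


def parse_attributes(suffix: str) -> Dict[str, int]:
    """Parse '{: rotate="90"}' or '{: start=2 end=5}' into checked integers (sketch)."""
    text = suffix.strip()
    if not (text.startswith("{:") and text.endswith("}")):
        raise ValueError(f"not an attribute list: {suffix!r}")
    attrs: Dict[str, int] = {}
    for match in PAIR_RE.finditer(text[2:-1]):
        key = match.group(1)
        # Quoted values land in group 2, bare values in group 3.
        raw = match.group(2) if match.group(2) is not None else match.group(3)
        if key not in KNOWN_KEYS:
            raise ValueError(f"unknown attribute: {key}")
        attrs[key] = int(raw)  # int() itself raises ValueError on non-numbers
    if "rotate" in attrs and attrs["rotate"] not in VALID_ROTATIONS:
        raise ValueError("rotate must be 90, 180, or 270")
    if attrs.get("start", 1) < 1:
        raise ValueError("start must be 1 or greater")
    if "start" in attrs and "end" in attrs and attrs["start"] > attrs["end"]:
        raise ValueError("start must not be after end")
    return attrs
```

Rejecting unknown keys is a deliberately strict design choice in this sketch, so a typo such as `strat=2` fails loudly instead of being silently ignored.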

Documentation

Additional documentation is maintained on the project wiki on GitHub.

Comments
  • Installation instructions: please update the readme front page

    Thanks for your work providing this tool. I ended up here looking for an alternative to python stapler.

    There's a lot of good and important stuff on the README.

    Please add one or two lines at the top of the README with the most important stuff: how to install it.

    It might be obvious to you, a Python developer, but not to a potential end user. Is it installed using "pip install xyz"? Will it work with pipx? Are there any "official" packages for Linux distro xyz?

    Thanks in advance.

    opened by m040601 5
  • Added PDF rotation filter

    Closes #1

    Added a test input in book-rotate.md as well.

    I've been using pdftk a lot recently, but pystitcher definitely works better for me. Thanks for working on it!

    opened by Vonter 3
  • Python 3.9 required?

    Thanks for the great code; it worked well for me putting Elsevier books back together from their chapters. The only issue I had was that it required Python 3.9 at a minimum. Under Python 3.7 I initially got an error at line 56 of skeleton.py, referencing argparse.BooleanOptionalAction. I lost the actual error message; see here for a related report.

    bug 
    opened by jd-foster 2
  • Add Tests

    Starting Integration tests. Currently tracks:

    • [x] Stitcher functionality by generating all test files
    • [x] Number of pages in these test files
    • [x] Bookmarks (title/destination page number)
    • [x] Bookmark level
    • [x] PDF metadata
    • [x] Attributes: Rotation
    • [x] Attributes: Page Selection
    • [x] Run CI tests on GitHub Actions
    • [x] Generate coverage reports

    Missing test cases:

    • [ ] Remote fetching (Will take this up later)
    • [x] Custom Title
    • [x] H2/H3 as bookmarks
    • [x] Disable cleanup and validate
    enhancement 
    opened by captn3m0 1
  • Specify Zoom level for links in markdown

    [Personal Data Protection Bill, 2019](1.a.pdf){: zoom=FitWidth}
    

    Other options:

        Inherit - Inherit zoom
        FitPage - Fit page width+height
        FitWidth - Fit page width
        FitHeight - Fit page height
        ##% - Zoom to ##%, e.g. 50% = 50% zoom
    
    opened by captn3m0 0
  • Support external URLs to fetch PDF

    # Title
    
    - [chapter 1](https://example.com/chapter1.pdf)
    - [chapter 2](https://example.com/chapter2.pdf)
    

    Download the PDFs, cache them and merge accordingly.

    opened by captn3m0 0
  • Auto Page numbering support

    I want to be able to add page numbers to the generated PDF, with configurable fonts. Use case: printouts. Once this is done, it is easy to add a Table of Contents too. #6

    I was exploring and found this as one option: https://github.com/vlad-anisov/numbering2pdf/blob/main/numbering2pdf/numbering2pdf.py It uses reportlab to generate empty numbered PDFs and merges those pages with the existing pages one by one.

    Happy to work on this issue, if you suggest a preferred method (given your research) to implement this.

    enhancement 
    opened by lprsd 1
  • Fix current working directory hack

    Currently, we switch our CWD to the markdown file's directory and never switch it back. Playing around with chdir is bad and causes issues.

    Fix this by resolving paths relative to the markdown file's directory instead.

    bug 
    opened by captn3m0 0
  • Render Markdown inline

    Within the markdown, provide a way to declare pages that get rendered as stand-alone pages as well.

    <!-- This only goes in bookmark-->
    # Cover
    
    ![Cover](cover.pdf)
    
    # Colophon
    
    ```
    # Hobbit
    ## There and back again
    ## By JRR Tolkien
    ```{: inline=1}
    
    ![Foreword](foreword.pdf)
    

    This renders a cover, a single page with the three lines above, and then the foreword, so the Colophon bookmark ends up linking to the middle (text) page.

    opened by captn3m0 3
  • Fetch HTML online and render

    # Title
    
    - [chapter 1](https://example.com/chapter1.html)
    - [chapter 2](https://example.com/chapter2.html)
    

    Download the source HTML, run it through readability, then render as PDF and merge accordingly.

    opened by captn3m0 0
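The fix proposed in "Fix current working directory hack" amounts to resolving each link target against the spine file's directory instead of calling chdir. Here is a minimal sketch of that idea. It assumes that http(s) targets are left untouched for the remote-fetching step (see "Support external URLs to fetch PDF") and that everything else is a local path. `resolve_target` is a hypothetical helper.

```python
import os
from urllib.parse import urlparse


def resolve_target(spine_path: str, target: str) -> str:
    """Resolve a link target relative to the spine file, without os.chdir (sketch)."""
    if urlparse(target).scheme in ("http", "https"):
        return target  # remote PDF: downloaded separately, not a local path
    if os.path.isabs(target):
        return target
    spine_dir = os.path.dirname(os.path.abspath(spine_path))
    # normpath collapses "../" segments such as "../shared/cover.pdf".
    return os.path.normpath(os.path.join(spine_dir, target))
```

Because nothing changes the process-wide working directory, relative output paths given on the command line keep meaning what the user expects.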
Releases (v1.0.4)
  • v1.0.4 (Dec 31, 2021)

    Changed

    • Switched from html5 to html5lib as a dependency, since the former is unmaintained.

    Added

    • Python 3.10 support

    Removed

    • Python 3.6 support
  • v1.0.3 (Jul 16, 2021)

    • Added tests and code coverage
    • PDFs can be directly fetched from Remote URLs
    • PDFs can be filtered to have start and end pages
    • Support for Python 3.6-3.8
    • Removed --cleanup argument, since that is default

    Published on PyPI: https://pypi.org/project/pystitcher/1.0.3/

  • v1.0.2 (Jul 16, 2021)
