pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative markdown file as input

Last update: Dec 10, 2022

Overview

pystitcher

pystitcher stitches your PDF files together and generates customizable bookmarks from a declarative input in the form of a Markdown file. It is written in pure Python and uses PyPDF3 for reading and writing PDF files.

Description

pystitcher is a command-line tool with very few CLI options:

usage: pystitcher [-h] [--version] [-v] [--cleanup | --no-cleanup] spine.md output.pdf

Stitch PDF files together

positional arguments:
  spine.md              Input markdown file
  output.pdf            Output PDF file

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         log more things
  --cleanup, --no-cleanup
                        Delete temporary files (default: True)
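For scripted use, the usage text above can be turned into an argument list for `subprocess.run`. The following is only a minimal sketch: `build_command` and `stitch` are hypothetical helpers, not part of pystitcher, and the sketch assumes the `pystitcher` executable is on `PATH`.

```python
import subprocess


def build_command(spine, output, verbose=False, cleanup=True):
    """Build a pystitcher argument list from the options in the usage text."""
    cmd = ["pystitcher"]
    if verbose:
        cmd.append("-v")
    if not cleanup:
        # Cleanup defaults to True, so only the negative flag is ever passed.
        cmd.append("--no-cleanup")
    cmd += [str(spine), str(output)]
    return cmd


def stitch(spine, output, **options):
    """Run pystitcher; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(spine, output, **options), check=True)
```

Calling `stitch("spine.md", "output.pdf", verbose=True)` would run the same command as `pystitcher -v spine.md output.pdf`.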

Given this input:

existing_bookmarks: remove
title: Complete Guide to the Personal Data Protection Bill
author: Medianama
keywords: privacy, surveillance, personal data protection
subject: Personal Data Protection Bill
# A Complete Guide to the Personal Data Protection Bill

- [Cover](cover.pdf)

# The Bills

- [Personal Data Protection Bill, 2019](https://example.com/2019-bill.pdf)
- [Personal Data Protection Bill, 2018](https://example.com/2018-bill.pdf)

# Other key reading material

- [Srikrishna Committee Report](2.a.pdf)
- [Dvara Research's Personal Data Protection Bill](2.b.pdf)
- [MP Shashi Tharoor's Data Protection Bill](2.c.pdf)
- [MP Jay Panda's Data Protection Bill](2.d.pdf)
- [SaveOurPrivacy.in bill](2.e.pdf)
- [TRAI recommendations on privacy](2.f1.pdf)
- [Comments on TRAI recommendations on privacy](2.f2.pdf)

pystitcher will generate a PDF with proper bookmarks and the correct metadata:

Title:          Complete Guide to the Personal Data Protection Bill
Subject:        Personal Data Protection Bill
Keywords:       privacy, surveillance, personal data protection
Author:         Medianama
Creator:        pystitcher/1.0.0
Producer:       pystitcher/1.0.0

Configuration options can be specified as metadata at the top of the file.

Option	Notes
fit	Default fit of the bookmark. Can be overridden per bookmark. See the wiki for more details.
author	PDF Author
keywords	PDF Keywords
subject	PDF Subject
title	PDF Title. If left unspecified, first Heading (h1) in the document is used.
existing_bookmarks	What to do with existing bookmarks in individual files. Options are `keep`, `flatten`, and `remove`. See docs for more details.
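To make the option rules above concrete, here is a small standard-library sketch that reads the `key: value` lines at the top of a spine file and applies the documented title fallback. It is an illustration under assumptions, not pystitcher's own parser: `read_options` and `KNOWN_OPTIONS` are hypothetical names, and the rule that the metadata block ends at the first line that is not `key: value` is an assumption of this sketch.

```python
import re

# Option names taken from the table above.
KNOWN_OPTIONS = {"fit", "author", "keywords", "subject", "title", "existing_bookmarks"}


def read_options(markdown_text):
    """Collect leading `key: value` lines and fall back to the first h1 for the title."""
    options = {}
    lines = markdown_text.splitlines()
    for line in lines:
        match = re.match(r"^([A-Za-z_]+):\s*(.*)$", line)
        if not match:
            # Assumption: the metadata block ends at the first non-matching line.
            break
        key, value = match.group(1).lower(), match.group(2).strip()
        if key in KNOWN_OPTIONS:
            options[key] = value
    if "title" not in options:
        # Documented fallback: use the first level-1 heading as the PDF title.
        for line in lines:
            heading = re.match(r"^#\s+(.+)$", line)
            if heading:
                options["title"] = heading.group(1).strip()
                break
    return options
```

With the sample input shown earlier, the explicit `title` line wins; if it were removed, the first `#` heading would be used instead.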

Additionally, PDF links specified in markdown can have attributes to alter the PDFs before merging. The attribute below rotates the second PDF file by 90 degrees clockwise before merging:

[Part 1](1.pdf)
[Part 2](2.pdf){: rotate="90"}

And the attribute below merges only pages 2 to 5, both inclusive, from the second PDF file:

[Part 1](1.pdf)
[Part 2](2.pdf){: start=2 end=5}

The available attributes are:

Attribute	Notes
rotate	Rotate the PDF. Valid values are 90, 180, and 270.
start	Start page number for PDF page selection
end	End page number for PDF page selection
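The attribute semantics above can be sketched in a few lines of standard-library Python. This is an illustration only: the function names are hypothetical, and the defaults used when `start` or `end` is omitted (first and last page) are assumptions rather than documented behaviour.

```python
import re

VALID_ROTATIONS = {90, 180, 270}


def parse_attributes(attr_text):
    """Parse text such as '{: start=2 end=5}' or '{: rotate="90"}' into a dict of strings."""
    pairs = re.findall(r'(\w+)=(?:"([^"]*)"|([^\s}]+))', attr_text)
    return {key: quoted or bare for key, quoted, bare in pairs}


def rotation(attrs):
    """Return the clockwise rotation in degrees, or 0 when no rotation is requested."""
    angle = int(attrs.get("rotate", 0))
    if angle and angle not in VALID_ROTATIONS:
        raise ValueError(f"rotate must be one of {sorted(VALID_ROTATIONS)}, got {angle}")
    return angle


def select_pages(total_pages, attrs):
    """Return the 1-based page numbers to keep; start and end are both inclusive."""
    start = int(attrs.get("start", 1))          # assumed default: first page
    end = int(attrs.get("end", total_pages))    # assumed default: last page
    if not 1 <= start <= end <= total_pages:
        raise ValueError(f"invalid page range {start}-{end} for {total_pages} pages")
    return list(range(start, end + 1))
```

For a 10-page file, `{: start=2 end=5}` selects pages 2, 3, 4, and 5.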

Documentation

Additional documentation is maintained on the project wiki on GitHub.

Comments

Installation instructions: please update the readme front page

Thanks for your work providing this tool. I ended up here looking for an alternative to python stapler.

There's a lot of good and important stuff on the README.

Please add one or two lines, at the top of the README with the most important stuff. How to install it.

It might be obvious to you, a Python developer, but not to a potential end user. Is it installed with "pip install xyz"? Will it work with pipx? Are there any "official" packages for Linux distro xyz?

Thanks in advance.

opened by m040601 5
Added PDF rotation filter

Closes #1

Added a test input in book-rotate.md as well.

I've been using pdftk a lot recently, but pystitcher definitely works better for me. Thanks for working on it!

opened by Vonter 3
Python 3.9 required?

Thanks for the great code, it worked well for me putting books back together from chapters in Elsevier. The only issue I had was that it required Python 3.9 at a minimum. I initially got an error under Python 3.7 complaining about line 56 of skeleton.py, with reference to argparse.BooleanOptionalAction; I lost the actual error message (e.g. see here for a related one).
bug

opened by jd-foster 2
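The error reported above comes from `argparse.BooleanOptionalAction`, which was added to the standard library in Python 3.9. One possible way to keep the same `--cleanup` / `--no-cleanup` flags on older interpreters is sketched below. This is not the project's actual code: `build_parser` is a hypothetical function that only mirrors the usage text shown earlier.

```python
import argparse


def build_parser():
    """Build a parser matching the documented usage, with a pre-3.9 fallback."""
    parser = argparse.ArgumentParser(
        prog="pystitcher", description="Stitch PDF files together"
    )
    parser.add_argument("-v", "--verbose", action="store_true", help="log more things")
    if hasattr(argparse, "BooleanOptionalAction"):
        # Python 3.9+: a single action provides both --cleanup and --no-cleanup.
        parser.add_argument(
            "--cleanup",
            action=argparse.BooleanOptionalAction,
            default=True,
            help="Delete temporary files",
        )
    else:
        # Older Pythons: two mutually exclusive flags writing to the same destination.
        group = parser.add_mutually_exclusive_group()
        group.add_argument("--cleanup", dest="cleanup", action="store_true",
                           help="Delete temporary files")
        group.add_argument("--no-cleanup", dest="cleanup", action="store_false",
                           help="Keep temporary files")
        parser.set_defaults(cleanup=True)
    parser.add_argument("spine", metavar="spine.md", help="Input markdown file")
    parser.add_argument("output", metavar="output.pdf", help="Output PDF file")
    return parser
```

Both branches leave `args.cleanup` as `True` by default and `False` when `--no-cleanup` is given.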
Add Tests
Starting Integration tests. Currently tracks:

[x] Stitcher functionality by generating all test files

[x] Number of pages in these test files

[x] Bookmarks (title/destination page number)

[x] Bookmark level

[x] PDF metadata

[x] Attributes: Rotation

[x] Attributes: Page Selection

[x] Run CI tests on GitHub Actions

[x] Generate coverage reports

Missing testcases:

[ ] Remote fetching (Will take this up later)

[x] Custom Title

[x] H2/H3 as bookmarks

[x] Disable cleanup and validate

enhancement
opened by captn3m0 1

Specify Zoom level for links in markdown

[Personal Data Protection Bill, 2019](1.a.pdf){: zoom=FitWidth}

Other options:

    Inherit - Inherit zoom
    FitPage - Fit page width+height
    FitWidth - Fit page width
    FitHeight - Fit page height
    ##% - Zoom to ##% eg 50% = 50% zoom

opened by captn3m0 0
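The zoom values proposed above could be validated along the following lines. This is a sketch of the proposal only: `parse_zoom` and its return shape are hypothetical, and nothing here reflects an implemented pystitcher feature.

```python
import re

# Named zoom modes listed in the proposal.
NAMED_ZOOMS = {"Inherit", "FitPage", "FitWidth", "FitHeight"}


def parse_zoom(value):
    """Return ("named", name) or ("percent", number) for a zoom attribute value."""
    value = value.strip()
    if value in NAMED_ZOOMS:
        return ("named", value)
    match = re.fullmatch(r"(\d+(?:\.\d+)?)%", value)
    if match:
        percent = float(match.group(1))
        if percent > 0:
            return ("percent", percent)
    raise ValueError(f"Unsupported zoom value: {value!r}")
```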

Support external URLs to fetch PDF

# Title

- [chapter 1](https://example.com/chapter1.pdf)
- [chapter 2](https://example.com/chapter2.pdf)

Download the PDFs, cache them and merge accordingly.

opened by captn3m0 0
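The download-and-cache step described above might look like the sketch below, which uses only the standard library. It is not pystitcher's implementation: the function names and the choice of a SHA-256 digest of the URL as the cache file name are assumptions made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path


def cache_path(url, cache_dir):
    """Map a URL to a stable file name inside the cache directory."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.pdf"


def download(url):
    """Fetch the raw bytes behind a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_pdf(url, cache_dir, downloader=download):
    """Return a local path for the URL, downloading only on a cache miss."""
    target = cache_path(url, cache_dir)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(downloader(url))
    return target
```

The `downloader` parameter exists so the caching logic can be exercised without network access.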

Auto Page numbering support

Want to be able to add page numbers to the generated PDF with font configuration. Use case - Printouts. Once this is done, easy to add a Table of Contents too. #6

I was exploring and found this as one option: https://github.com/vlad-anisov/numbering2pdf/blob/main/numbering2pdf/numbering2pdf.py It uses reportlab to generate empty numbered PDFs and merges those pages with the existing pages one by one.

Happy to work on this issue, if you suggest a preferred method (given your research) to implement this.
enhancement

opened by lprsd 1
Fix current working directory hack

Currently, we switch our CWD to the markdown file directory, and don't reset it back. Playing around with chdir is bad and causes issues.

Fix this to instead use paths relative from the markdown file directory.
bug

opened by captn3m0 0
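The fix suggested above, resolving paths against the markdown file's directory instead of calling `chdir`, can be sketched with `pathlib`. The helper names are hypothetical and this is not the project's actual change.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_remote(target):
    """True when a link target is an http(s) URL rather than a local file."""
    return urlparse(target).scheme in ("http", "https")


def resolve_local(spine_path, target):
    """Resolve a local link target relative to the directory of the markdown file."""
    # The process working directory is never changed; absolute targets pass through.
    return Path(spine_path).parent / target
```

For a spine at `books/spine.md`, the link target `2.a.pdf` resolves to `books/2.a.pdf` regardless of where the command is run from.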
Render Markdown inline
Within the markdown, provide a way to declare pages that get rendered as stand-alone pages as well.

# Cover
![Cover](cover.pdf)
# Colophon
```
# Hobbit
## There and back again
## By JRR Tolkein
```{: inline=1}
![Foreword](foreword.pdf)

This renders a cover, a single page with the three lines above, and then the foreword, so the Colophon bookmark ends up linking to the middle text page.
opened by captn3m0 3
Fetch HTML online and render
# Title

- [chapter 1](https://example.com/chapter1.html)
- [chapter 2](https://example.com/chapter2.html)

Download the source HTML, run it through readability, then render as PDF and merge accordingly.
opened by captn3m0 0

Releases(v1.0.4)

v1.0.4(Dec 31, 2021)
Changed

Switched from html5 to html5lib as a dependency, since the former is unmaintained.

Added

Python 3.10 support

Removed

Python 3.6 support

v1.0.3(Jul 16, 2021)
Added tests and code coverage

PDFs can be directly fetched from Remote URLs

PDFs can be filtered to have start and end pages

Support for Python 3.6-3.8

Removed --cleanup argument, since that is default

Published on PyPI: https://pypi.org/project/pystitcher/1.0.3/
v1.0.2(Jul 16, 2021)
Adds support for rotating PDFs


Owner

Nemo

Working at @razorpay

GitHub Repository https://pypi.org/project/pystitcher/

