A document format conversion service based on Pandoc.


reformed



Usage

The API specification for the Reformed server is as follows:

GET /api/v1/formats: Lists available input and output formats for documents

Response

{
  "input": {
    "commonmark": {
      "mime": "text/markdown",
      "ext": "md",
      "detail": "CommonMark Markdown"
    },
    "docx": {
      "mime": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "ext": "docx",
      "detail": "Word docx"
    },
    // ...
  },
  "output": {
    "commonmark": {
      "mime": "text/markdown",
      "ext": "md",
      "detail": "CommonMark Markdown"
    },
    "docx": {
      "mime": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "ext": "docx",
      "detail": "Word docx"
    },
    // ...
    "latex": {
      "mime": "text/x-tex",
      "ext": "tex",
      "detail": "LaTeX"
    },
    // ...
  }
}
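A client can use this listing to check that a conversion is possible before uploading anything. The following is a minimal Python sketch using only the standard library. `fetch_formats`, `can_convert`, and `output_extension` are hypothetical helper names, not part of Reformed, and the base URL assumes a local server on the default port.

```python
import json
import urllib.request


def fetch_formats(base_url="http://localhost:8000"):
    """Fetch the format listing from a running Reformed server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/formats") as resp:
        return json.load(resp)


def can_convert(formats, src, dst):
    """True if `src` is a listed input format and `dst` a listed output format."""
    return src in formats.get("input", {}) and dst in formats.get("output", {})


def output_extension(formats, dst):
    """File extension to use when saving a document converted to `dst`."""
    return formats["output"][dst]["ext"]


# Offline usage against a trimmed copy of the response shown above.
sample = {
    "input": {
        "docx": {
            "mime": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "ext": "docx",
            "detail": "Word docx",
        },
    },
    "output": {
        "latex": {"mime": "text/x-tex", "ext": "tex", "detail": "LaTeX"},
    },
}
print(can_convert(sample, "docx", "latex"))  # True
print(output_extension(sample, "latex"))  # tex
```

Against a live server, `can_convert(fetch_formats(), "docx", "latex")` performs the same check on the real listing.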

POST /api/v1/from/[input format]/to/[output format]: Converts a document from one format to another

Request

The request should be made with the multipart/form-data encoding.

Parameters

The request parameters are as follows:

File document

Document to convert. For example, to convert a docx file to a pdf file, the following cURL command will work:

curl -X POST -F 'document=@test.docx' http://localhost:8000/api/v1/from/docx/to/pdf > test.pdf

Boolean bundle

Whether to bundle the created document and any media (extracted pictures from e.g. a .docx file) together in a .zip archive.

If the form value for this option is anything except a blank string, it will be treated as True.

If no media is generated and this option is set, this will return the reformatted document in a .zip archive by itself.

If media is generated and this option is not set, any extracted media will be discarded and just the document will be returned.
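The same request can be made from Python without third-party packages. This sketch builds the multipart/form-data body by hand and sends it with `urllib.request`. `encode_multipart` and `convert` are hypothetical helpers, the boundary is a random hex string, and `bundle` is sent as `"1"` only when requested, since any non-blank value enables it.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode form fields and file uploads as a multipart/form-data body.

    fields maps name -> str; files maps name -> (filename, bytes).
    Returns (body, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    for name, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def convert(path, src, dst, bundle=False, base_url="http://localhost:8000"):
    """Upload `path` as the `document` field and return the raw response bytes."""
    with open(path, "rb") as f:
        data = f.read()
    fields = {"bundle": "1"} if bundle else {}  # omit the field to leave bundling off
    body, content_type = encode_multipart(
        fields, {"document": (os.path.basename(path), data)}
    )
    request = urllib.request.Request(
        f"{base_url}/api/v1/from/{src}/to/{dst}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return resp.read()
```

For example, `convert("test.docx", "docx", "pdf")` mirrors the cURL command above, and adding `bundle=True` should return a .zip archive holding the PDF plus any extracted media.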

Boolean Pandoc flags

This endpoint supports the following on/off Pandoc flags, which take no value: ascii, gladtex, html-q-tags, incremental, listings, mathml, no-highlight, number-sections, preserve-tabs, reference-links, section-divs, standalone, strip-comments, toc.

If the form value for a given flag is anything except a blank string, it will be added to the Pandoc call.

See the Pandoc manual for more information on these flags' effects.
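The rule above amounts to a small mapping from form values to Pandoc switches. This sketch is illustrative, not Reformed's actual code: `boolean_flag_args` is a hypothetical name, and it assumes that only the empty string counts as blank, so a whitespace-only value would still enable a flag.

```python
BOOLEAN_FLAGS = (
    "ascii", "gladtex", "html-q-tags", "incremental", "listings", "mathml",
    "no-highlight", "number-sections", "preserve-tabs", "reference-links",
    "section-divs", "standalone", "strip-comments", "toc",
)


def boolean_flag_args(form):
    """Return a --flag switch for every supported flag with a non-blank form value."""
    # Assumption: only "" is blank; a missing field is treated the same way.
    return [f"--{flag}" for flag in BOOLEAN_FLAGS if form.get(flag, "") != ""]


print(boolean_flag_args({"toc": "1", "standalone": "yes", "mathml": ""}))
# ['--standalone', '--toc']
```

Switches come out in the order of the supported-flags list rather than the order of the form fields, which keeps the resulting Pandoc command deterministic; unknown field names are ignored.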

Pandoc flags with choices

This endpoint supports the following Pandoc flags which have specific choices: eol, markdown-headings, reference-location, top-level-division, track-changes, wrap.

If the form value for a given flag is valid, it will be added to the Pandoc call.

See the Pandoc manual for more information on these flags' effects.
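Validating these flags comes down to checking each form value against a fixed set of choices. In this sketch the choice sets are copied from the Pandoc manual and may not match the lists Reformed actually accepts. `choice_flag_args` is a hypothetical name, and invalid values are assumed to be dropped silently rather than rejected with an error.

```python
# Allowed values per flag, as documented in the Pandoc manual (assumed, not
# taken from Reformed's source).
CHOICE_FLAGS = {
    "eol": {"crlf", "lf", "native"},
    "markdown-headings": {"setext", "atx"},
    "reference-location": {"block", "section", "document"},
    "top-level-division": {"default", "section", "chapter", "part"},
    "track-changes": {"accept", "reject", "all"},
    "wrap": {"auto", "none", "preserve"},
}


def choice_flag_args(form):
    """Return --flag=value for each form value that is one of the flag's choices."""
    args = []
    for flag, choices in CHOICE_FLAGS.items():
        value = form.get(flag)
        if value in choices:  # missing or unrecognized values are skipped
            args.append(f"--{flag}={value}")
    return args


print(choice_flag_args({"wrap": "none", "eol": "dos"}))
# ['--wrap=none']
```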

Integer columns (Pandoc option)

If specified and a valid integer, this will add the --columns=XX option to the Pandoc call. The value is bounded to 1 <= columns <= 300 by Reformed.

See the Pandoc manual's description for more.

Integer dpi (Pandoc option)

If specified and a valid integer, this will add the --dpi=XX option to the Pandoc call. The value is bounded to 36 <= dpi <= 600 by Reformed.

See the Pandoc manual's description for more.

Integer toc-depth (Pandoc option)

If specified and a valid integer, this will add the --toc-depth=XX option to the Pandoc call. The value is bounded to 1 <= toc-depth <= 6 by Reformed.

See the Pandoc manual's description for more.
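The three integer options share one pattern: parse the value, bound it, and emit `--name=value`. This README does not say whether an out-of-range value is clamped to the nearest bound or discarded. The hypothetical `int_option_args` sketch below assumes clamping, and leaves out any value that does not parse as an integer.

```python
# (minimum, maximum) for each integer option, per the bounds listed above.
INT_BOUNDS = {"columns": (1, 300), "dpi": (36, 600), "toc-depth": (1, 6)}


def int_option_args(form):
    """Return --name=value for each integer option present in the form."""
    args = []
    for name, (low, high) in INT_BOUNDS.items():
        try:
            value = int(form.get(name, ""))
        except (TypeError, ValueError):
            continue  # missing or not a valid integer: leave the option out
        value = max(low, min(high, value))  # assumption: clamp into range
        args.append(f"--{name}={value}")
    return args


print(int_option_args({"columns": "1000", "dpi": "abc", "toc-depth": "3"}))
# ['--columns=300', '--toc-depth=3']
```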

Response

A binary stream with the MIME type specified in the list of formats. Content-Disposition is forced to be an attachment to prevent files from rendering in the browser.

If an error is encountered, this will instead be a JSON response with an error key specifying what went wrong.
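A client therefore has to tell a converted file apart from an error body. This sketch assumes that successful responses always carry an `attachment` Content-Disposition, as described above, and that anything else is a JSON object with an `error` key. Checking the Content-Type alone could be ambiguous for any output format whose MIME type is itself JSON. `read_conversion_result` is a hypothetical helper.

```python
import json


def read_conversion_result(headers, body):
    """Return the converted bytes, or raise RuntimeError with the server's message.

    `headers` is any mapping with .get(), e.g. the headers of a urllib response.
    """
    disposition = headers.get("Content-Disposition", "") or ""
    if disposition.lower().startswith("attachment"):
        return body  # the converted document (or .zip bundle)
    # Assumption: every non-attachment response is a JSON object like {"error": "..."}.
    payload = json.loads(body)
    raise RuntimeError(f"Reformed conversion failed: {payload.get('error', payload)}")
```

With `urllib`, an HTTP error status raises `urllib.error.HTTPError`, whose `.headers` and `.read()` can be passed to the same function.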

Configuration

A few configuration environment variables are available for the Reformed server, listed here with their default values:

# Maximum buffer size for requests, in bytes - mostly useful for controlling file uploads
# Defaults to 25 MiB
REFORMED_MAX_BUFFER_SIZE=26214400

# Port to accept requests on
REFORMED_PORT=8000

# Number of worker processes to start
REFORMED_WORKERS=2
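Tooling around the server, such as a launcher script or a health check, may need the same settings. This sketch reads the three variables with the defaults listed above. `ReformedConfig` and `load_config` are hypothetical names, and the code does not reflect how Reformed itself parses its environment.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ReformedConfig:
    max_buffer_size: int  # bytes
    port: int
    workers: int


def load_config(env: Mapping = os.environ) -> ReformedConfig:
    """Read Reformed's environment variables, falling back to the documented defaults."""
    return ReformedConfig(
        max_buffer_size=int(env.get("REFORMED_MAX_BUFFER_SIZE", 25 * 1024 * 1024)),
        port=int(env.get("REFORMED_PORT", 8000)),
        workers=int(env.get("REFORMED_WORKERS", 2)),
    )


print(load_config({}))
# ReformedConfig(max_buffer_size=26214400, port=8000, workers=2)
```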

Deploying

Main-branch and tagged releases are both automatically published as Docker images to the GitHub Container Registry. These images can be run in the standard fashion as a daemon, and expose a Tornado HTTP server on port 8000.

See the package listing for more information on pulling the image.

Developing and Testing

The development requirements are specified in requirements.dev.txt.

To test with coverage, use the following command:

coverage run -m unittest -v

To run the linter, use the following command:

flake8 reformed

Release: v0.1.0
Owner: David Lougheed (M.Sc. student in Human Genetics)