A collection of pre-commit hooks for handling text files.

Overview

texthooks

A collection of pre-commit hooks for handling text files.

In particular, hooks for handling unicode characters which may be undesirable in a repository.

Usage with pre-commit

To use with pre-commit, include this repo and the desired hooks in .pre-commit-config.yaml:

- repo: https://github.com/sirosen/texthooks
  rev: 0.1.0
  hooks:
    - id: fix-smartquotes
    - id: fix-ligatures

Standalone Usage

Each hook is usable as a CLI script. Simply

pip install texthooks

and then invoke, e.g.

fix-smartquotes FILENAME

Supported Hooks

fix-smartquotes

This fixes copy-paste from some applications which replace double-quotes with curly quotes. It does not convert corner brackets, braile quotation marks, or angle quotation marks. Those characters are not typically the result of copy-paste errors, so they are allowed.

Low quotation marks vary in usage and meaning by language, and some languages use quotation marks which are facing "outwards" (opposite facing from english). For the most part, these and exotic characters (double-prime quotes) are ignored.

In files with the offending marks, they are replaced and the run is marked as failed.

Overriding Quotation Characters

Two options are available for specifying exactly which characters will be replaced. For ease of use, they are specified as hex-encoded unicode codepoints.

Suppose you wanted to avoid replacing the "Heavy single comma quotation mark ornament" (275C) and the "Heavy single turned comma quotation mark ornament" (275B) characters. You could override the single quote codepoints as follows:

- repo: https://github.com/sirosen/texthooks
  rev: 0.1.0
  hooks:
    - id: fix-smartquotes
      # replace default single quote chars with this set:
      # apostrophe, fullwidth apostrophe, left single quote, single high
      # reversed-9 quote, right single quote
      args: ["--single-quote-codepoints", "0027,FF07,2018,201B,2019"]

fix-ligatures

Automatically find and replace ligature characters with their ascii equivalents.

This replaces liguatures which may be created by programs like LaTeX for presentation with their strictly-equivalent ASCII counterparts. For example, fi and ff may be ligature-ized.

This hook converts these back into ASCII so that tools like grep will behave as expected.

forbid-bidi-controls

This is checker which forbids the use of unicode bidirectional text control characters.

These are directional formatting characters which can be used to construct text with unexpected or unclear semantics. For example, in programming languages which allow bidirectional text in statements, "X" = ייִדיש can be written with right-to-left reversal to mean that the variable ייִדיש is assigned a value of "X".

CHANGELOG

0.2.2

  • Fix a bug in CLI argument handling for all hooks

0.2.1

  • Fix a typo in forbid-bidi-controls entrypoint

0.2.0

  • Add the forbid-bidi-controls hook
  • Adjust the handling of file encodings. Files will be read with UTF-8 encoding by default in most cases.

0.1.0

  • Initial release with fix-ligatures and fix-smartquotes hooks
Owner
Stephen Rosen
Stephen Rosen
Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Kevin Lai 1 Nov 08, 2021
Python tool to make adding to your armory spreadsheet armory less of a pain.

Python tool to make adding to your armory spreadsheet armory slightly less of a pain by creating a CSV to simply copy and paste.

1 Oct 20, 2021
Goblin-sim - Procedural fantasy world generator

goblin-sim This project is an attempt to create a procedural goblin fantasy worl

3 May 18, 2022
A simple text editor for linux

wolf-editor A simple text editor for linux Installing using Deb Package Download newest package from releases CD into folder where the downloaded acka

Focal Fossa 5 Nov 30, 2021
text-to-speach bot - You really do NOT have time for read a newsletter? Now you can listen to it

NewsletterReader You really do NOT have time for read a newsletter? Now you can listen to it The Newsletter of Filipe Deschamps is a great place to re

ItanuRomero 8 Sep 18, 2021
Export solved codewars kata challenges to a text file.

Codewars Kata Exporter Note:this is not totally my work.i've edited the project to make more easier and faster for me.you can find the original work h

Oussama Ben Sassi 4 Aug 13, 2021
StealBit1.1 and earlier strings and config extraction scripts

StealBit1.1 and earlier scripts Use strings_decryptor.py to extract RC4 encrypted strings from a StealBit1.1 sample(s). Use config_extractor.py to ext

Soolidsnake 5 Dec 29, 2022
TextStatistics - Get a text file wich contains English text

TextStatistics This program get a text file wich contains English text. The program analyses the text, and print some information. For this program I

2 Nov 15, 2021
Parse Any Text With Python

ParseAnyText A small package to parse strings. What is the work of it? Well It's a module to creates parser that helps to parse a text easily with les

Sayam Goswami 1 Jan 11, 2022
box is a text-based visual programming language inspired by Unreal Engine Blueprint function graphs.

Box is a text-based visual programming language inspired by Unreal Engine blueprint function graphs. $ cat factorial.box ┌─ƒ(Factorial)───┐

Pranav 104 Dec 24, 2022
A minimal code sceleton for a textadveture parser written in python.

Textadventure sceleton written in python Use with a map file generated on https://www.trizbort.io Use the following Sockets for walking directions: n

1 Jan 06, 2022
An experimental Fang Song style Chinese font generated with skeleton-tracing and pix2pix

An experimental Fang Song style Chinese font generated with skeleton-tracing and pix2pix, with glyphs based on cwTeXFangSong. The font is optimised fo

Lingdong Huang 98 Jan 07, 2023
This repos is auto action which generating a wordcloud made by Twitter.

auto_tweet_wordcloud This repos is auto action which generating a wordcloud made by Twitter. Preconditions Install Python dependencies pip install -r

tubone(Yu Otsubo) 0 Apr 29, 2022
This repository contains scripts to control a RGB text fan attached to a Raspberry Pi.

RGB Text Fan Controller This repository contains scripts to control a RGB text fan attached to a Raspberry Pi. Setup The Raspberry Pi and RGB text fan

Luke Prior 1 Oct 01, 2021
Adventura is an open source Python Text Adventure Engine

Adventura Adventura is an open source Python Text Adventure Engine, Not yet uplo

5 Oct 02, 2022
A python Tk GUI that creates, writes text and attaches images into a custom spreadsheet file

A python Tk GUI that creates, writes text and attaches images into a custom spreadsheet file

Mirko Simunovic 13 Dec 09, 2022
PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages

PyMultiDictionary PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different

Pablo Pizarro R. 19 Dec 26, 2022
Widevine KEY Extractor in Python

Widevine Client 3 This was originally written by T3rry7f. This repo is slightly modified version of his repo. This only works on standard Windows! Usa

Vank0n (SJJeon) 68 Dec 29, 2022
Python Lex-Yacc

PLY (Python Lex-Yacc) Copyright (C) 2001-2020 David M. Beazley (Dabeaz LLC) All rights reserved. Redistribution and use in source and binary forms, wi

David Beazley 2.4k Dec 31, 2022
Um simulador de caixa registradora com database usando arquivos .txt

🛒 Caixa Registradora V2 ❓ - Como usar? Execute o caixa-registradora.py, nele vai ter um menu interativo, você pode cadastrar diversos produtos em um

Gabriel 0 Sep 25, 2022