🔤 Measure edit distance based on keyboard layout

Related tags

Miscellaneousclavier
Overview

clavier

Measure edit distance based on keyboard layout.



Table of contents

Introduction

Default edit distances, such as the Levenshtein distance, don't differentiate between characters. The distance between two characters is either 0 or 1. This package allows you to measure edit distances by taking into account keyboard layouts.

The scope is purposefully limited to alphabetical, numeric, and punctuation keys. That's because this package is meant to assist in analyzing user inputs -- e.g. for spelling correction in a search engine.

The goal of this package is to be flexible. You can define any logical layout, such as QWERTY or AZERTY. You can also control the physical layout by defining where the keys are on the board.

Installation

pip install git+https://github.com/MaxHalford/clavier

User guide

Keyboard layouts

☝️ Things are a bit more complicated than QWERTY vs. AZERTY vs. XXXXXX. Each layout has many variants. I haven't yet figured out a comprehensive way to map all these out.

This package provides a list of keyboard layouts. For instance, we'll load the QWERTY keyboard layout.

>>> import clavier
>>> keyboard = clavier.load_qwerty()
>>> keyboard
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /

>>> keyboard.shape
(4, 13)

>>> len(keyboard)
46

Here is the list of currently available layouts:

>>> for layout in (member for member in dir(clavier) if member.startswith('load_')):
...     print(layout.replace('load_', ''))
...     exec(f'print(clavier.{layout}())')
...     print('---')
dvorak
` 1 2 3 4 5 6 7 8 9 0 [ ]
' , . p y f g c r l / = \
a o e u i d h t n s -
; q j k x b m w v z
---
qwerty
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /
---

Distance between characters

Measure the Euclidean distance between two characters on the keyboard.

>>> keyboard.char_distance('1', '2')
1.0

>>> keyboard.char_distance('q', '2')
1.4142135623730951

>>> keyboard.char_distance('1', 'm')
6.708203932499369

Distance between words

Measure a modified version of the Levenshtein distance, where the substitution cost is the output of the char_distance method.

>>> keyboard.word_distance('apple', 'wople')
2.414213562373095

>>> keyboard.word_distance('apple', 'woplee')
3.414213562373095

You can also override the deletion cost by specifying the deletion_cost parameter, and the insertion cost via the insertion_cost parameter. Both default to 1.

Typing distance

Measure the sum of distances between each pair of consecutive characters. This can be useful for studying keystroke dynamics.

>>> keyboard.typing_distance('hello')
10.245040190466598

For sentences, you can split them up into words and sum the typing distances.

>>> sentence = 'the quick brown fox jumps over the lazy dog'
>>> sum(keyboard.typing_distance(word) for word in sentence.split(' '))
105.60457487263012

Interestingly, this can be used to compare keyboard layouts in terms of efficiency. For instance, the Dvorak keyboard layout is supposedly more efficient than the QWERTY layout. Let's compare both on the first stanza of If— by Rudyard Kipling:

>> words = list(map(str.lower, stanza.split())) >>> qwerty = clavier.load_qwerty() >>> sum(qwerty.typing_distance(word) for word in words) 740.3255229138255 >>> dvorak = clavier.load_dvorak() >>> sum(dvorak.typing_distance(word) for word in words) 923.6597116104518 ">
>>> stanza = """
... If you can keep your head when all about you
...    Are losing theirs and blaming it on you;
... If you can trust yourself when all men doubt you,
...    But make allowance for their doubting too;
... If you can wait and not be tired by waiting,
...    Or, being lied about, don't deal in lies,
... Or, being hated, don't give way to hating,
...    And yet don't look too good, nor talk too wise;
... """

>>> words = list(map(str.lower, stanza.split()))

>>> qwerty = clavier.load_qwerty()
>>> sum(qwerty.typing_distance(word) for word in words)
740.3255229138255

>>> dvorak = clavier.load_dvorak()
>>> sum(dvorak.typing_distance(word) for word in words)
923.6597116104518

It seems the Dvorak layout is in fact slower than the QWERTY layout. But of course this might not be the case in general.

Nearest neighbors

You can iterate over the k nearest neighbors of any character.

>>> qwerty = clavier.load_qwerty()
>>> for char, dist in qwerty.nearest_neighbors('s', k=8, cache=True):
...     print(char, f'{dist:.4f}')
w 1.0000
a 1.0000
d 1.0000
x 1.0000
q 1.4142
e 1.4142
z 1.4142
c 1.4142

The cache parameter determines whether or not the result should be cached for the next call.

Physical layout specification

By default, the keyboard layouts are ortholinear, meaning that the characters are physically arranged over a grid. You can customize the physical layout to make it more realistic and thus obtain distance measures which are closer to reality. This can be done by specifying parameters to the keyboards when they're loaded.

Staggering

Staggering is the amount of offset between two consecutive keyboard rows.

You can specify a constant staggering as so:

>>> keyboard = clavier.load_qwerty(staggering=0.5)

By default the keys are spaced by 1 unit. So a staggering value of 0.5 implies a 50% horizontal shift between each pair of consecutive rows. You may also specify a different amount of staggering for each pair of rows:

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])

There's 3 elements in the list because the keyboard has 4 rows.

Key pitch

Key pitch is the amount of distance between the centers of two adjacent keys. Most computer keyboards have identical horizontal and vertical pitches, because the keys are all of the same size width and height. But this isn't the case for mobile phone keyboards. For instance, iPhone keyboards have a higher vertical pitch.

Drawing a keyboard layout

>>> keyboard = clavier.load_qwerty()
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty.png', bbox_inches='tight')

qwerty

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty_staggered.png', bbox_inches='tight')

qwerty_staggered

Custom layouts

You can of course specify your own keyboard layout. There are different ways to do this. We'll use the iPhone keypad as an example.

The from_coordinates method

>>> keypad = clavier.Keyboard.from_coordinates({
...     '1': (0, 0), '2': (0, 1), '3': (0, 2),
...     '4': (1, 0), '5': (1, 1), '6': (1, 2),
...     '7': (2, 0), '8': (2, 1), '9': (2, 2),
...     '*': (3, 0), '0': (3, 1), '#': (3, 2),
...                  '☎': (4, 1)
... })
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #

The from_grid method

>> keypad 1 2 3 4 5 6 7 8 9 * 0 # ☎ ">
>>> keypad = clavier.Keyboard.from_grid("""
...     1 2 3
...     4 5 6
...     7 8 9
...     * 0 #
...       ☎
... """)
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #

Development

git clone https://github.com/MaxHalford/clavier
cd clavier
pip install poetry
poetry install
poetry shell
pytest

License

The MIT License (MIT). Please see the license file for more information.

Owner
Max Halford
Going where the wind blows 🍃 🦔
Max Halford
LinkML based SPARQL template library and execution engine

sparqlfun LinkML based SPARQL template library and execution engine modularized core library of SPARQL templates generic templates using common vocabs

Linked data Modeling Language 6 Oct 10, 2022
Яндекс тренировки по алгоритмам. Июнь 2021

Young&&Yandex Тренировки по алгоритмам Если вы хотите попасть на летнюю стажировку в Яндекс, но пока не уверены в своих силах, приходите на наши трени

Podlevskiy Viktor 6 Sep 03, 2021
Replit theme sync; Github theme sync but in Replit.

This is a Replit theme sync, basically meaning that it keeps track of the current time (which may need to be edited later on), and if the time passes morning, afternoon, etc, the theme switches. The

Glitch 8 Jun 25, 2022
A python script for combining multiple native SU2 format meshes into one mesh file for multi-zone simulations.

A python script for combining multiple native SU2 format meshes into one mesh file for multi-zone simulations.

MKursatUzuner 1 Jan 20, 2022
The first Python 1v1.lol triggerbot working with colors !

1v1.lol TriggerBot Afin d'utiliser mon triggerbot, vous devez activer le plein écran sur 1v1.lol sur votre naviguateur (quelque-soit ce dernier). Vous

Venax 5 Jul 25, 2022
Check broken access control exists in the Java web application

javaEeAccessControlCheck Check broken access control exists in the Java web application. 检查 Java Web 应用程序中是否存在访问控制绕过问题。 使用 python3 javaEeAccessControl

kw0ng 3 May 04, 2022
Covid-ChatBot - A Rapid Response Virtual Agent for Covid-19 Queries

COVID-19 CHatBot A Rapid Response Virtual Agent for Covid-19 Queries Contents What is ChatBot Types of ChatBots About the Project Dataset Prerequisite

NelakurthiSudheer 2 Jan 04, 2022
Painel simples com consulta de cep,CNPJ,placa e ip

Painel mpm Um painel simples com consultas de IP, CNPJ, CEP e PLACA Início 🌐 apt update && apt upgrade -y pkg i python git pip install requests Insta

8 Feb 27, 2022
Interactivity Lab: Household Pulse Explorable

Interactivity Lab: Household Pulse Explorable Goal: Build an interactive application that incorporates fundamental Streamlit components to offer a cur

1 Feb 10, 2022
NFT generator for Solana!

Solseum NFT Generator for Solana! Check this guide here! Creating your randomized uniques NFTs, getting rarity information and displaying it on a webp

Solseum™ VR NFTs 145 Dec 30, 2022
A replacement of qsreplace, accepts URLs as standard input, replaces all query string values with user-supplied values and stdout.

Bhedak A replacement of qsreplace, accepts URLs as standard input, replaces all query string values with user-supplied values and stdout. Works on eve

Eshan Singh 84 Dec 31, 2022
Framework for creating efficient data processing pipelines

Aqueduct Framework for creating efficient data processing pipelines. Contact Feel free to ask questions in telegram t.me/avito-ml Key Features Increas

avito.tech 137 Dec 29, 2022
Resources for the 2021 offering of COMP 598

comp598-2021 Resources for the 2021 offering of COMP 598 General submission instructions Important Please read these instructions located in the corre

Derek Ruths 23 May 18, 2022
C++ Environment InitiatorVisual Studio Code C / C++ Environment Initiator

Visual Studio Code C / C++ Environment Initiator Latest Version : v 1.0.1(2021/11/08) .exe link here About : Visual Studio Code에서 C/C++환경을 MinGW GCC/G

Junho Yoon 2 Dec 19, 2021
A package selector for building your confy nest

Hornero A package selector for building your comfy nest About Hornero helps you to install your favourite packages on your fresh installed Linux distr

Santiago Soler 1 Nov 22, 2021
Fast Base64 encoding/decoding in Python

Fast Base64 implementation This project is a wrapper on libbase64. It aims to provide a fast base64 implementation for base64 encoding/decoding. Insta

Matthieu Darbois 96 Dec 26, 2022
Add-In for Blender to automatically save files when rendering

Autosave - Render: Automatically save .blend, .png and readme.txt files when rendering with Blender Purpose This Blender Add-On provides an easy way t

Volker 9 Aug 10, 2022
A feed generator. Currently supports generating RSS feeds from Google, Bing, and Yahoo news.

A feed generator. Currently supports generating RSS feeds from Google, Bing, and Yahoo news.

Josh Cardenzana 0 Dec 13, 2021
Larvamatch - Find your larva or punk match.

LarvaMatch Find your larva or punk match. UI TBD API (not started) The API will allow you to specify a punk by token id to find a larva match, and vic

1 Jan 02, 2022
Simple, configuration-driven backup software for servers and workstations

title permalink borgmatic index.html It's your data. Keep it that way. borgmatic is simple, configuration-driven backup software for servers and works

borgmatic collective 1.3k Dec 30, 2022