🎐 a python library for doing approximate and phonetic matching of strings.

Last update: Dec 21, 2022

Overview

jellyfish

Jellyfish is a Python library for approximate and phonetic matching of strings.

Written by James Turk and Michael Stephens.

See https://github.com/jamesturk/jellyfish/graphs/contributors for contributors.

See http://jellyfish.readthedocs.io for documentation.

Source is available at http://github.com/jamesturk/jellyfish.

Jellyfish >= 0.7 supports only Python 3. If you need Python 2, use 0.6.x.

Included Algorithms

String comparison:

Levenshtein Distance
Damerau-Levenshtein Distance
Jaro Distance
Jaro-Winkler Distance
Match Rating Approach Comparison
Hamming Distance
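
To make the first entry in this list concrete, here is a minimal pure-Python sketch of Levenshtein distance using the standard two-row dynamic programming approach. It is an illustration only, not Jellyfish's own implementation, and the name levenshtein_sketch is hypothetical.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn a into b (illustrative sketch)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("jellyfish", "smellyfish"))  # 2
```

Only two rows of the table are kept at a time, so memory use grows with the shorter string rather than with the product of both lengths.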

Phonetic encoding:

American Soundex
Metaphone
NYSIIS (New York State Identification and Intelligence System)
Match Rating Codex
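
As a rough picture of what a phonetic encoding does, the following sketch applies the commonly described American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent equal codes, and pad to four characters. It is not Jellyfish's implementation, the name soundex_sketch is hypothetical, and edge-case behavior may differ from the library.

```python
# Consonant groups and their Soundex digits.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex_sketch(name: str) -> str:
    """Return a four-character Soundex-style code (illustrative sketch)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = [letters[0]]
    last_code = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with equal codes
        code = _CODES.get(c, "")  # vowels and Y map to the empty code
        if code and code != last_code:
            result.append(code)
            if len(result) == 4:
                break
        last_code = code  # a vowel resets this, so a repeated code counts again
    return "".join(result).ljust(4, "0")


print(soundex_sketch("Jellyfish"))  # J412
```

Names that sound alike tend to share a code under these rules, for example "Robert" and "Rupert" both map to R163.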

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.8962962962962964
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1

>>> jellyfish.metaphone(u'Jellyfish')
'JLFX'
>>> jellyfish.soundex(u'Jellyfish')
'J412'
>>> jellyfish.nysiis(u'Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex(u'Jellyfish')
'JLLFSH'
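
The Jaro value in the example above can be reproduced by hand. The sketch below follows the usual textbook definition of Jaro similarity (matching characters within a sliding window, then counting transpositions). It is a pure-Python illustration with a hypothetical name, jaro_sketch, and is not taken from Jellyfish's source.

```python
def jaro_sketch(s1: str, s2: str) -> float:
    """Jaro similarity between 0.0 and 1.0 (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk the matched characters in order; out-of-order pairs are
    # half-transpositions.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


print(round(jaro_sketch("jellyfish", "smellyfish"), 4))  # 0.8963
```

For 'jellyfish' and 'smellyfish' there are 8 matching characters and no transpositions, so the score is (8/9 + 8/10 + 8/8) / 3, which is approximately 0.8963.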

Running Tests

If you are interested in contributing to Jellyfish, you may want to run the tests locally. Jellyfish uses tox to run tests, which you can set up and run as follows:

pip install tox
# cd jellyfish/
tox
