Python flexible slugify function

Overview

awesome-slugify

Install

pip install awesome-slugify

Usage

from slugify import slugify

slugify('Any text')  # 'Any-text'

Custom slugify

from slugify import slugify, Slugify, UniqueSlugify

slugify('Any text', to_lower=True)  # 'any-text'

custom_slugify = Slugify(to_lower=True)
custom_slugify('Any text')          # 'any-text'

custom_slugify.separator = '_'
custom_slugify('Any text')          # 'any_text'

custom_slugify = UniqueSlugify(to_lower=True)
custom_slugify('Any text')          # 'any-text'
custom_slugify('Any text')          # 'any-text-1'

slugify function optional args

to_lower              # if True convert text to lowercase
max_length            # output string max length
separator             # separator string
capitalize            # if True upper first letter

Slugify class args

pretranslate = None               # function or dict for replace before translation
translate = unidecode.unidecode   # function for slugifying or None
safe_chars = ''                   # additional safe chars
stop_words = ()                   # remove these words from slug

to_lower = False                  # default to_lower value
max_length = None                 # default max_length value
separator = '-'                   # default separator value
capitalize = False                # default capitalize value

UniqueSlugify class args

# all Slugify class args +
uids = []                         # initial unique ids

Predefined slugify functions

Some slugify functions are predefined this way:

from slugify import Slugify, CYRILLIC, GERMAN, GREEK

slugify = Slugify()
slugify_unicode = Slugify(translate=None)

slugify_url = Slugify()
slugify_url.to_lower = True
slugify_url.stop_words = ('a', 'an', 'the')
slugify_url.max_length = 200

slugify_filename = Slugify()
slugify_filename.separator = '_'
slugify_filename.safe_chars = '-.'
slugify_filename.max_length = 255

slugify_ru = Slugify(pretranslate=CYRILLIC)
slugify_de = Slugify(pretranslate=GERMAN)
slugify_el = Slugify(pretranslate=GREEK)

Examples

from slugify import Slugify, UniqueSlugify, slugify, slugify_unicode
from slugify import slugify_url, slugify_filename
from slugify import slugify_ru, slugify_de

slugify('one kožušček')                       # one-kozuscek
slugify('one two three', separator='.')       # one.two.three
slugify('one two three four', max_length=12)  # one-two-four   (12 chars)
slugify('one TWO', to_lower=True)             # one-two
slugify('one TWO', capitalize=True)           # One-TWO

slugify_filename(u'Дrаft №2.txt')             # Draft_2.txt
slugify_url(u'Дrаft №2.txt')                  # draft-2-txt

my_slugify = Slugify()
my_slugify.separator = '.'
my_slugify.pretranslate = {'я': 'i', '♥': 'love'}
my_slugify('Я ♥ борщ')                        # I.love.borshch  (custom translate)

slugify('Я ♥ борщ')                           # Ia-borshch  (standard translation)
slugify_ru('Я ♥ борщ')                        # Ya-borsch   (alternative russian translation)
slugify_unicode('Я ♥ борщ')                   # Я-борщ      (sanitize only)

slugify_de('ÜBER Über slugify')               # UEBER-Ueber-slugify

slugify_unique = UniqueSlugify(separator='_', capitalize=True)
slugify_unique('one TWO')                     # One_TWO
slugify_unique('one TWO')                     # One_TWO_1

slugify_unique = UniqueSlugify(uids=['cellar-door'])
slugify_unique('cellar door')                 # cellar-door-1

Custom Unique Slugify Checker

from slugify import UniqueSlugify

def my_unique_check(text, uids):
    if text in uids:
        return False
    return not SomeDBClass.objects.filter(slug_field=text).exists()

custom_slugify_unique = UniqueSlugify(unique_check=my_unique_check)

# Checks the database for a matching document
custom_slugify_unique('te occidere possunt')
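
The checker contract above can be tried out without a database or the library itself. What follows is a minimal sketch, not awesome-slugify's implementation: `make_unique` and `existing_in_db` are hypothetical stand-ins, and the counter loop only assumes the `unique_check(text, uids)` signature and the `-1` suffix behaviour shown in the examples.

```python
# Hypothetical stand-in for slugs already stored in a database table.
existing_in_db = {'te-occidere-possunt'}


def my_unique_check(text, uids):
    """Return True when `text` is free both in memory and in the 'database'."""
    if text in uids:
        return False
    return text not in existing_in_db


def make_unique(slug, uids, unique_check, separator='-'):
    """Append a growing counter until `unique_check` accepts the candidate.

    Assumption: this imitates the 'any-text' -> 'any-text-1' behaviour from
    the examples; the real library may differ in detail.
    """
    candidate = slug
    count = 0
    while not unique_check(candidate, uids):
        count += 1
        candidate = '{}{}{}'.format(slug, separator, count)
    uids.append(candidate)
    return candidate


seen = []
first = make_unique('te-occidere-possunt', seen, my_unique_check)
second = make_unique('te-occidere-possunt', seen, my_unique_check)
print(first, second)
```

Because the checker is an ordinary callable, the in-memory set can be swapped for any existence query without touching the loop.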

Running UnitTests

$ virtualenv venv
$ venv/bin/pip install -r requirements.txt
$ venv/bin/nosetests slugify

Comments

  • Update minimum unidecode version

    Unidecode has moved to semantic versioning. I don't believe there are any major changes that awesome-slugify requires.

    Unidecode release change:

    2018-01-05	unidecode 1.0.22
    	* Move to semantic version numbering, no longer following version
    	  numbers from the original Perl module. This fixes an issue with
    	  setuptools (>= 8) and others expecting major.minor.patch format.
    	  (https://github.com/avian2/unidecode/issues/13)
    	* Add transliterations for currency signs U+20B0 through U+20BF
    	  (thanks to Mike Swanson)
    	* Surround transliterations of vulgar fractions with spaces to avoid
    	  incorrect combinations with adjacent numerals
    	  (thanks to Jeffrey Gerard)
    
    opened by jwbixby 5
  • Fix for import on python 2.7.7 (windows)

    Error with awesome-slugify 1.6.4:

    python -c "from slugify import UniqueSlugify"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "myvirtualenv\lib\site-packages\slugify\__init__.py", line 2, in <module>
        from slugify.main import Slugify, UniqueSlugify
    ImportError: No module named main

    opened by srault95 2
  • UniqueSlugify Improvements

    Hello, I'm using "awesome-slugify" for my work, and there are a few minor improvements I felt I could contribute which we were looking for on my team.

    Thanks! Greg


    1. UniqueSlugify should use sets for self.uids instead of a list, for performance reasons.

    Running the following code on the base branch (with self.uids as a list) yields:

    >>> import time
    >>> import uuid
    >>> from slugify import UniqueSlugify
    >>>
    >>> def test_time():
    ...     start = time.time()
    ...     slugify = UniqueSlugify()
    ...     for i in xrange(100000):
    ...         _ = slugify(str(uuid.uuid4()))
    ...     return time.time() - start
    ...
    >>> test_time()
    212.86210703849792
    

    Running the same code with sets:

    >>> test_time()
    10.824954986572266
    
    2. Often, uids are stored in a database or external key/value system. It's helpful to have an option to override the uniqueness check without having to load all the uids into memory. For example, supposing I have a Django project with a slug field, instead of having to do:

    from django_blog.models import BlogPost
    from slugify import UniqueSlugify
    
    slugify = UniqueSlugify(uids=BlogPost.objects.values_list('url_slug', flat=True))
    

    I can check with a lightweight existence call to the DB by overriding the check:

    from slugify import UniqueSlugify
    
    def my_unique_check(text, uids):
        if text in uids:
            return False
        return not BlogPost.objects.filter(url_slug=text).exists()
    
    custom_slugify_unique = UniqueSlugify(unique_check=my_unique_check)
    custom_slugify_unique('te occidere possunt')
    
    3. Also added some documentation on running the unit tests.
    opened by gthole 2
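
The list-versus-set point in the report above can be reproduced on a small scale with the standard library alone. This is a sketch under stated assumptions: the workload sizes and the `count_hits` helper are made up for illustration, and the absolute timings will differ from the figures quoted in the report.

```python
import timeit
import uuid

# Hypothetical workload: 20,000 known slugs and 200 lookups that all miss.
known = [str(uuid.uuid4()) for _ in range(20000)]
probes = [str(uuid.uuid4()) for _ in range(200)]

as_list = list(known)
as_set = set(known)


def count_hits(container, candidates):
    """Count how many candidates are already present in the container."""
    return sum(1 for c in candidates if c in container)


# A miss in a list scans every element; a miss in a set is one hash lookup.
list_seconds = timeit.timeit(lambda: count_hits(as_list, probes), number=1)
set_seconds = timeit.timeit(lambda: count_hits(as_set, probes), number=1)

print('list: {:.4f}s  set: {:.6f}s'.format(list_seconds, set_seconds))
```

Both containers give the same answers; only the cost per lookup changes, which is why the gap widens as the number of stored uids grows.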
  • not working in python 3

    Please make this library compatible with Python 3. Right now it doesn't work because of at least several syntax errors in some string literals, and an iteration over a changing dict issue.

    opened by irmen 2
  • Please remove capitalizing of first character in get_pretranslate function

    Great project! Thank you so much!

    However, when using the pretranslations, it forces capitalization. Is there any reason for this (something unicode related) or can I submit a pull request to maintain the case of the entire word?

    opened by pydanny 2
  • Fix the import path for get_slugify

    At the moment, I can't do:

    from slugify import get_slugify
    

    I get an import error because of how slugify.__init__.py is defined. Instead I have to do the following:

    from slugify.main import get_slugify
    

    I would like to either submit a patch to slugify.__init__.py or correct the documentation. @dimka665, what approach would you prefer?

    opened by pydanny 2
  • Add test cases with numbers in them.

    We use awesome-slugify for Wok and it's mostly great, but I've found that it errors on numeric slugs (in our case, 404).

    These test cases demonstrate what I'd naively assume to be the correct slugification behavior for alphanumerics, but I'm PR-ing to verify that before spending a lot of time modifying Slugify's behavior to meet these expectations in case I'm wrong.

    opened by edunham 1
  • Add simple implementation with Travis CI

    Check out passing tests of my fork on Travis https://travis-ci.org/jpadilla/awesome-slugify

    If you accept this, you just have to signup to Travis and setup your accounts https://travis-ci.org/profile. Then add a badge to the README.md.

    opened by jpadilla 1
  • Fix to_lower with unicode text

    I noticed that when trying to do something like:

    slugify('自転車', to_lower=True)
    

    I got Zi-Zhuan-Che instead of the expected zi-zhuan-che. Effectively I introduced a test which failed, implemented a fix, and all tests kept passing.

    opened by jpadilla 1
  • allow Slugify to take callable pretranslate

    Slugify::set_pretranslate should accept a dictionary, a callable, or None, but as currently implemented it only accepts a dict or None. This PR allows set_pretranslate to take a callable as well. All tests pass.

    opened by jmcarp 1
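
Accepting a dict, a callable, or None usually comes down to normalising all three into one callable. The sketch below illustrates that idea only; `as_pretranslate` is a hypothetical helper, not the library's `set_pretranslate`, and its dict handling is a plain sequence of `str.replace` calls.

```python
def as_pretranslate(pretranslate):
    """Turn None, a dict, or a callable into a text -> text function."""
    if pretranslate is None:
        # Nothing to do before translation: return the text unchanged.
        return lambda text: text
    if callable(pretranslate):
        return pretranslate
    if isinstance(pretranslate, dict):
        def replace_all(text):
            # Apply each replacement in the dict's iteration order.
            for source, target in pretranslate.items():
                text = text.replace(source, target)
            return text
        return replace_all
    raise ValueError('pretranslate must be None, a dict, or a callable')


from_dict = as_pretranslate({'♥': 'love'})
from_callable = as_pretranslate(str.strip)
from_none = as_pretranslate(None)
```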
  • Exception StopIteration on empty strings '' with max_length or separator

    >>> from slugify import slugify
    >>> slugify('', max_length=40, separator='_')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/dist-packages/slugify/main.py", line 108, in slugify
        text = join_words(words, separator, max_length)
      File "/usr/lib/python2.7/dist-packages/slugify/main.py", line 85, in join_words
        text = next(words)    # text = words.pop(0)
    StopIteration
    
    opened by sebest 1
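
The traceback above ends in a bare `next(words)`, which raises `StopIteration` when there are no words at all. One common guard is to pass a default to `next()`. The function below is a hypothetical re-implementation written for illustration, not the library's `join_words`; its word-skipping rule is an assumption chosen to match the `max_length=12` example in the README.

```python
def join_words(words, separator, max_length=None):
    """Join words with `separator`, staying within `max_length` if given."""
    words = iter(words)
    # The default '' replaces the StopIteration raised on empty input.
    text = next(words, '')
    for word in words:
        candidate = text + separator + word
        # Skip any word that would push the slug past the limit.
        if max_length is None or len(candidate) <= max_length:
            text = candidate
    return text if max_length is None else text[:max_length]


empty = join_words([], '_', max_length=40)
limited = join_words(['one', 'two', 'three', 'four'], '-', max_length=12)
```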
  • Incorrect Japanese transliteration for っ

    っ is U+3063 HIRAGANA LETTER SMALL TU, which is different than つ U+3064 HIRAGANA LETTER TU in that the phonetic transliteration of it is a glottal stop; the English equivalent is doubling the consonant-sound of the next mora.

    For example, ほっこり should be transliterated as 'hokkori', but awesome-slugify incorrectly renders it 'hotsukori' (as if the っ were a つ):

    >>> slugify('ほっこり')
    'hotsukori'
    

    See also https://translate.google.com/?sl=ja&tl=en&text=%E3%81%BB%E3%81%A3%E3%81%93%E3%82%8A&op=translate for the use of this character (and https://translate.google.com/?sl=ja&tl=en&text=%E3%81%BB%E3%81%A4%E3%81%93%E3%82%8A%0A%0A&op=translate to see what the large tsu does instead).

    opened by fluffy-critter 0
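
The doubling rule described above can be sketched with a toy romanizer. Everything here is illustrative: the four-entry `ROMAJI` table and the `romanize` function are hypothetical, and the rule is simplified to "repeat the first letter of the next syllable", which ignores special cases such as a following vowel.

```python
# Tiny illustrative table; a real transliterator covers the whole syllabary.
ROMAJI = {'ほ': 'ho', 'つ': 'tsu', 'こ': 'ko', 'り': 'ri'}
SOKUON = 'っ'  # U+3063 HIRAGANA LETTER SMALL TU


def romanize(text):
    """Romanize hiragana, treating the small tsu as consonant doubling."""
    out = []
    double_next = False
    for ch in text:
        if ch == SOKUON:
            # Emit nothing now; remember to double the next consonant.
            double_next = True
            continue
        syllable = ROMAJI.get(ch, ch)
        if double_next and syllable:
            syllable = syllable[0] + syllable
        double_next = False
        out.append(syllable)
    return ''.join(out)


with_small_tsu = romanize('ほっこり')
with_large_tsu = romanize('ほつこり')
```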
  • Clash with zacharyvoase/slugify

    The project at https://github.com/zacharyvoase/slugify named "slugify" also has the module name "slugify".

    pip install slugify

    import slugify
    slugify.slugify(u"Héllø Wörld")

    If you add both packages to your requirements.txt, what happens when you import "slugify"?

    opened by Chris2048 4
  • Fix DeprecationWarnings in newer Pythons (3.6+)

    I see these warnings each time I run:

    ...python3.6/site-packages/slugify/main.py:65
      ...python3.6/site-packages/slugify/main.py:65: DeprecationWarning: invalid escape sequence \p
        '''
    
    ...python3.6/site-packages/slugify/main.py:98
      ...python3.6/site-packages/slugify/main.py:98: DeprecationWarning: invalid escape sequence \L
        PRETRANSLATE = re.compile(u'(\L<options>)', options=convert_dict)
    
    ...python3.6/site-packages/slugify/main.py:140
      ...python3.6/site-packages/slugify/main.py:140: DeprecationWarning: invalid escape sequence \p
        unwanted_chars_re = u'[^\p{{AlNum}}{safe_chars}]+'.format(safe_chars=re.escape(self._safe_chars or ''))
    
    ...python3.6/site-packages/slugify/main.py:144
      ...python3.6/site-packages/slugify/main.py:144: DeprecationWarning: invalid escape sequence \p
        unwanted_chars_and_words_re = unwanted_chars_re + u'|(?<!\p{AlNum})(?:\L<stop_words>)(?!\p{AlNum})'
    

    Perhaps this is the problem: https://stackoverflow.com/questions/50504500/deprecationwarning-invalid-escape-sequence-what-to-use-instead-of-d

    opened by mcarans 0
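
Warnings of this kind normally go away when the pattern is written as a raw string, so the backslash reaches the regex engine untouched. The sketch below uses the standard `re` module as a stand-in, because `\p{...}` and `\L<...>` are features of the third-party `regex` module; the pattern itself is an illustrative assumption, not the one from the library.

```python
import re

# A raw string keeps the backslash literally, so Python never tries to
# interpret "\w" (or "\p", "\L") as a string escape sequence.
raw_pattern = r'[^\w-]+'
escaped_pattern = '[^\\w-]+'   # same pattern, but harder to read

# The same holds for the sequences named in the warnings above.
raw_property = r'\p{AlNum}'
escaped_property = '\\p{AlNum}'

# Replace every run of unwanted characters with a hyphen.
cleaned = re.sub(raw_pattern, '-', 'Any text!').strip('-')
print(cleaned)
```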
  • Update requirements.txt

    regex==2018.11.6 breaks this package:

    $ pip install regex==2018.11.6
    [...]
    $ python
    Python 2.7.10 (default, Oct  6 2017, 22:29:07)
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import slugify
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/slugify/__init__.py", line 2, in <module>
        from slugify.main import Slugify, UniqueSlugify
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/slugify/main.py", line 68, in <module>
        class Slugify(object):
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/slugify/main.py", line 70, in Slugify
        upper_to_upper_letters_re = re.compile(UPPER_TO_UPPER_LETTERS_RE, re.VERBOSE | re.VERSION1)
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/regex.py", line 345, in compile
        return _compile(pattern, flags, kwargs)
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/regex.py", line 486, in _compile
        parsed = _parse_pattern(source, info)
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 388, in _parse_pattern
        branches = [parse_sequence(source, info)]
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 413, in parse_sequence
        element = parse_paren(source, info)
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 823, in parse_paren
        subpattern = _parse_pattern(source, info)
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 388, in _parse_pattern
        branches = [parse_sequence(source, info)]
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 410, in parse_sequence
        sequence.append(parse_escape(source, info, False))
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 1186, in parse_escape
        return parse_property(source, info, ch == "p", in_set)
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 1341, in parse_property
        prop = lookup_property(prop_name, name, positive != negate, source)
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 1603, in lookup_property
        value = standardise_name(value)
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 1593, in standardise_name
        return ascii_upper("".join(ch for ch in name if ch not in "_- "))
      File "/Users/artur/venv/qq/lib/python2.7/site-packages/_regex_core.py", line 1586, in ascii_upper
        return s.translate(upper_trans)
    TypeError: character mapping must return integer, None or unicode
    
    opened by arturh 0
  • UniqueSlugify will exceed max_length if it adds digits to make slug unique

    Python 3.6.4 (default, Mar  9 2018, 23:15:03)
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from slugify import UniqueSlugify
    >>> s = UniqueSlugify(to_lower=True, max_length=3)
    >>> s("Hello World")
    'hel'
    >>> s("Hello World")
    'hel-1'
    

    I would expect something like hel, h-1, etc.

    opened by iandees 0
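
The behaviour the reporter expects (`hel`, then `h-1`) can be obtained by shortening the base so that the suffix always fits. This is a minimal sketch of that idea; `unique_within_limit` is a hypothetical helper, not part of awesome-slugify, and it does not tidy up a base that happens to end in the separator after truncation.

```python
def unique_within_limit(base, taken, max_length, separator='-'):
    """Return a slug not in `taken` whose length never exceeds `max_length`."""
    candidate = base[:max_length]
    count = 0
    while candidate in taken:
        count += 1
        suffix = '{}{}'.format(separator, count)
        # Leave room for the suffix by trimming the base, not the counter.
        room = max_length - len(suffix)
        if room < 1:
            raise ValueError('max_length too small for a unique suffix')
        candidate = base[:room] + suffix
    taken.add(candidate)
    return candidate


taken = set()
slugs = [unique_within_limit('hello-world', taken, 3) for _ in range(3)]
print(slugs)
```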