Extract price amount and currency symbol from a raw text string

Overview

price-parser

PyPI Version Supported Python Versions Build Status Coverage report

price-parser is a small library for extracting price and currency from raw text strings.

Features:

  • robust price amount and currency symbol extraction
  • zero-effort handling of thousand and decimal separators

The main use case is parsing prices extracted from web pages. For example, you can write a CSS/XPath selector which targets an element with a price, and then use this library for cleaning it up, instead of writing custom site-specific regex or Python code.

License is BSD 3-clause.

Installation

pip install price-parser

price-parser requires Python 3.6+.

Usage

Basic usage

>> price Price(amount=Decimal('22.90'), currency='€') >>> price.amount # numeric price amount Decimal('22.90') >>> price.currency # currency symbol, as appears in the string '€' >>> price.amount_text # price amount, as appears in the string '22,90' >>> price.amount_float # price amount as float, not Decimal 22.9">
>>> from price_parser import Price
>>> price = Price.fromstring("22,90 €")
>>> price
Price(amount=Decimal('22.90'), currency='€')
>>> price.amount       # numeric price amount
Decimal('22.90')
>>> price.currency     # currency symbol, as appears in the string
'€'
>>> price.amount_text  # price amount, as appears in the string
'22,90'
>>> price.amount_float # price amount as float, not Decimal
22.9

If you prefer, Price.fromstring has an alias price_parser.parse_price, they do the same:

>>> from price_parser import parse_price
>>> parse_price("22,90 €")
Price(amount=Decimal('22.90'), currency='€')

The library has extensive tests (900+ real-world examples of price strings). Some of the supported cases are described below.

Supported cases

Unclean price strings with various currencies are supported; thousand separators and decimal separators are handled:

>>> Price.fromstring("Price: $119.00")
Price(amount=Decimal('119.00'), currency='$')
>>> Price.fromstring("15 130 Р")
Price(amount=Decimal('15130'), currency='Р')
>>> Price.fromstring("151,200 تومان")
Price(amount=Decimal('151200'), currency='تومان')
>>> Price.fromstring("Rp 1.550.000")
Price(amount=Decimal('1550000'), currency='Rp')
>>> Price.fromstring("Běžná cena 75 990,00 Kč")
Price(amount=Decimal('75990.00'), currency='Kč')

Euro sign is used as a decimal separator in a wild:

>>> Price.fromstring("1,235€ 99")
Price(amount=Decimal('1235.99'), currency='€')
>>> Price.fromstring("99 € 95 €")
Price(amount=Decimal('99'), currency='€')
>>> Price.fromstring("35€ 999")
Price(amount=Decimal('35'), currency='€')

Some special cases are handled:

>>> Price.fromstring("Free")
Price(amount=Decimal('0'), currency=None)

When price or currency can't be extracted, corresponding attribute values are set to None:

>>> Price.fromstring("")
Price(amount=None, currency=None)
>>> Price.fromstring("Foo")
Price(amount=None, currency=None)
>>> Price.fromstring("50% OFF")
Price(amount=None, currency=None)
>>> Price.fromstring("50")
Price(amount=Decimal('50'), currency=None)
>>> Price.fromstring("R$")
Price(amount=None, currency='R$')

Currency hints

currency_hint argument allows to pass a text string which may (or may not) contain currency information. This feature is most useful for automated price extraction.

>>> Price.fromstring("34.99", currency_hint="руб. (шт)")
Price(amount=Decimal('34.99'), currency='руб.')

Note that currency mentioned in the main price string may be preferred over currency specified in currency_hint argument; it depends on currency symbols found there. If you know the correct currency, you can set it directly:

>> price.currency = 'EUR' >>> price Price(amount=Decimal('1000'), currency='EUR')">
>>> price = Price.fromstring("1 000")
>>> price.currency = 'EUR'
>>> price
Price(amount=Decimal('1000'), currency='EUR')

Decimal separator

If you know which symbol is used as a decimal separator in the input string, pass that symbol in the decimal_separator argument to prevent price-parser from guessing the wrong decimal separator symbol.

>>> Price.fromstring("Price: $140.600", decimal_separator=".")
Price(amount=Decimal('140.600'), currency='$')
>>> Price.fromstring("Price: $140.600", decimal_separator=",")
Price(amount=Decimal('140600'), currency='$')

Contributing

Use tox to run tests with different Python versions:

tox

The command above also runs type checks; we use mypy.

Owner
Scrapinghub
Turn web content into useful data
Scrapinghub
Shows twitch pay for any streamer from Twitch leaked CSV files.

twitch_leak_csv_reader Shows twitch pay for any streamer from Twitch leaked CSV files. Requirements: You need python3 (you can install python 3 from o

5 Nov 11, 2022
REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

MUSE stands for Multilingual Universal Sentence Encoder - multilingual extension (supports 16 languages) of Universal Sentence Encoder (USE).

Dani El-Ayyass 47 Sep 05, 2022
A Python app which can convert normal text to Handwritten text.

Text to HandWritten Text ✍️ Converter Watch Tutorial for this project Usage:- Clone my repository. Open CMD in working directory. Run following comman

Kushal Bhavsar 5 Dec 11, 2022
Python flexible slugify function

awesome-slugify Python flexible slugify function PyPi: https://pypi.python.org/pypi/awesome-slugify Github: https://github.com/dimka665/awesome-slugif

Dmitry Voronin 471 Dec 20, 2022
Bidirectionally transformed strings

bistring The bistring library provides non-destructive versions of common string processing operations like normalization, case folding, and find/repl

Microsoft 352 Dec 19, 2022
Maiden & Spell community player ranking based on tournament data.

MnSRank Maiden & Spell community player ranking based on tournament data. Why? 2021 just ended and this seemed like a cool idea. Elo doesn't work well

Jonathan Lee 1 Apr 20, 2022
Getting git-style versioning working on RDFlib

Getting git-style versioning working on RDFlib

Gabe Fierro 1 Feb 01, 2022
Vector space based Information Retrieval System for Text Processing - Information retrieval

Information Retrieval: Text Processing Group 13 Sequence of operations Install Requirements Add given wikipedia files to the corpus directory. Downloa

1 Jan 01, 2022
Word and phrase lists in CSV

Word Lists Word and phrase lists in CSV, collected from different sources. Oxford Word Lists: oxford-5k.csv - Oxford 3000 and 5000 oxford-opal.csv - O

Anton Zhiyanov 14 Oct 14, 2022
Text Summarizationcls app with python

Text Summarizationcls app This is the repo for the Text Summarization AI Project. It makes use of pre-trained Hugging Face models Packages Used The pa

Edem Gold 1 Oct 23, 2021
Convert English text to IPA using the toPhonetic

Installation: Windows python -m pip install text2ipa macOS sudo pip3 install text2ipa Linux pip install text2ipa Features Convert English text to I

Joseph Quang 3 Jun 14, 2022
Code Jam for creating a text-based adventure game engine and custom worlds

Text Based Adventure Jam Author: Devin McIntyre Our goal is two-fold: Create a text based adventure game engine that can parse a standard file format

HTTPChat 4 Dec 26, 2021
Make writing easier!

Handwriter Make writing easier! How to Download and install a handwriting font, or create a font from your handwriting. Use a word processor like Micr

64 Dec 25, 2022
Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files

cdvpp Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files Reads a Digital Vaccination Pass PDF file as in

Esteban Borai 1 Nov 17, 2021
Redlines produces a Markdown text showing the differences between two strings/text

Redlines Redlines produces a Markdown text showing the differences between two strings/text. The changes are represented with strike-throughs and unde

Houfu Ang 2 Apr 08, 2022
Export solved codewars kata challenges to a text file.

Codewars Kata Exporter Note:this is not totally my work.i've edited the project to make more easier and faster for me.you can find the original work h

Oussama Ben Sassi 4 Aug 13, 2021
Convert ebooks with few clicks on Telegram!

E-Book Converter Bot A bot that converts e-books to various formats, powered by calibre! It currently supports 34 input formats and 19 output formats.

Youssif Shaaban Alsager 45 Jan 05, 2023
Python Lex-Yacc

PLY (Python Lex-Yacc) Copyright (C) 2001-2020 David M. Beazley (Dabeaz LLC) All rights reserved. Redistribution and use in source and binary forms, wi

David Beazley 2.4k Dec 31, 2022
split Word file by chapter

split Word file by chapter we use the mircosoft word api to code this tool api url:https://docs.microsoft.com/zh-cn/dotnet/api/ if this tool is good f

wisdom under lemon trees 5 Nov 06, 2021
Meeting, rendezvous, confluence (Finnish kohtaaminen) mark up, down, and up again.

kohtaaminen Meeting, rendezvous, confluence (Finnish kohtaaminen) mark up, down, and up again. Given a zip file containing a tree of html and media fi

Stefan Hagen 2 Dec 14, 2022