🔎 Like Chardet. 🚀 Package for encoding & language detection. Charset detection.

Overview

Charset Detection, for Everyone 👋

The Real First Universal Charset Detector

A library that helps you read text from an unknown charset encoding.
Motivated by chardet, I'm trying to resolve the issue by taking a new approach. All IANA character set names for which the Python core library provides codecs are supported.

>>>>> 👉 Try Me Online Now, Then Adopt Me 👈 <<<<<

This project offers you an alternative to Universal Charset Encoding Detector, also known as Chardet.

| Feature | Chardet | Charset Normalizer | cChardet |
| ------- | ------- | ------------------ | -------- |
| Fast | ❌ | ✔️ | ✔️ |
| Universal** | ❌ | ✔️ | ❌ |
| Reliable without distinguishable standards | ❌ | ✔️ | ✔️ |
| Reliable with distinguishable standards | ✔️ | ✔️ | ✔️ |
| Free & Open | ✔️ | ✔️ | ✔️ |
| License | LGPL-2.1 | MIT | MPL-1.1 |
| Native Python | ✔️ | ✔️ | ❌ |
| Detect spoken language | ❌ | ✔️ | N/A |
| Supported Encoding | 30 | 🎉 93 | 40 |


** : Chardet and cChardet clearly use encoding-specific code, even if they cover most of the commonly used encodings.

โญ Your support

Fork, test-it, star-it, submit your ideas! We do listen.

⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | File per sec (est) |
| ------------------ | -------- | ------------------ | ------------------ |
| chardet | 92 % | 220 ms | 5 file/sec |
| charset-normalizer | 98 % | 40 ms | 25 file/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
| ------------------ | --------------- | --------------- | --------------- |
| chardet | 1115 ms | 300 ms | 27 ms |
| charset-normalizer | 460 ms | 240 ms | 18 ms |

Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.

Stats are generated from 400+ files using default parameters. For details on the files used, see the GHA workflows. These results may change at any time, and the dataset can be updated to include more files. The actual delays depend heavily on your CPU capabilities, but the relative factors should remain the same.

cChardet is a non-native (C++ binding), unmaintained, faster alternative, with better accuracy than chardet but lower than this package. If speed is the most important factor, you should try it.

✨ Installation

Using PyPI for the latest stable release:

pip install charset-normalizer -U

If you want a more up-to-date unicodedata than the one available in your Python setup:

pip install charset-normalizer[unicode_backport] -U

🚀 Basic Usage

CLI

This package comes with a CLI.

usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
normalizer ./data/sample.1.fr.srt

🎉 Since version 1.4.0, the CLI produces an easily usable stdout result in JSON format.

{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
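
If you only need the detected charset, for example in a shell script, the -m/--minimal flag described above restricts the output to just that. The exact value shown below is an assumption based on the JSON result above:

normalizer -m ./data/sample.1.fr.srt

cp1252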

Python

Just print out normalized text

from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
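
The best() call returns the single most probable match, or None when nothing fits. A short sketch of inspecting it, assuming the match object exposes the same encoding and language information shown in the CLI JSON above:

from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')
best_guess = results.best()

if best_guess is None:
    print('No suitable encoding found.')
else:
    print(best_guess.encoding)  # e.g. 'cp1252'
    print(best_guess.language)  # e.g. 'French'
    print(str(best_guess))      # the decoded, normalized text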

Normalize any text file

from charset_normalizer import normalize
try:
    normalize('./my_subtitle.srt') # should write to disk my_subtitle-***.srt
except IOError as e:
    print('Sadly, we are unable to perform charset normalization.', str(e))

Upgrade your code without effort

from charset_normalizer import detect

The above code will behave the same way as chardet. We ensure that we offer the best (reasonable) backward-compatible result possible.
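
A minimal before/after sketch of that drop-in usage; the returned dict mirrors chardet's keys, and the values shown are illustrative only:

from charset_normalizer import detect  # instead of: from chardet import detect

with open('./my_subtitle.srt', 'rb') as fp:
    payload = fp.read()

result = detect(payload)
print(result['encoding'])    # e.g. 'cp1252'
print(result['language'])    # e.g. 'French'
print(result['confidence'])  # a float between 0. and 1.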

See the docs for advanced usage: readthedocs.io

😇 Why

When I started using Chardet, I noticed that it did not meet my expectations, and I wanted to propose a reliable alternative using a completely different method. Also, I never back down from a good challenge!

I don't care about the originating charset encoding, because two different tables can produce two identical rendered strings. What I want is to get the most readable text I can.

In a way, I'm brute forcing text decoding. How cool is that? 😎

Don't confuse the package ftfy with charset-normalizer or chardet. ftfy's goal is to repair broken Unicode strings, whereas charset-normalizer converts a raw file in an unknown encoding to Unicode.

๐Ÿฐ How

  • Discard all charset encoding tables that could not fit the binary content.
  • Measure the chaos, or mess, once the content is opened (in chunks) with a corresponding charset encoding.
  • Extract the matches with the lowest mess detected.
  • Additionally, we measure coherence / probe for a language.

Wait a minute, what are chaos/mess and coherence according to YOU?

Chaos: I opened hundreds of text files, written by humans, with the wrong encoding table. I observed, then established some ground rules about what is obviously a mess. I know my interpretation of what is chaotic is very subjective; feel free to contribute in order to improve or rewrite it.

Coherence: For each language on Earth, we have computed ranked letter-frequency occurrences (as best we can). That intel is worth something here, so I use those records against the decoded text to check whether I can detect intelligible language.
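
To make the approach above concrete, here is a deliberately naive sketch of the overall flow. This is NOT the actual implementation (the real mess and coherence measurements live in md.py and cd.py and are far more elaborate); the candidate list, scoring rule and helper names below are invented purely for illustration.

# Naive illustration of the approach described above, not charset-normalizer's real code.
from typing import Optional

CANDIDATES = ["utf_8", "cp1252", "latin_1"]  # a real run considers every codec Python ships

def crude_mess_ratio(text: str) -> float:
    # Stand-in for the real chaos measurement: count characters that "look wrong".
    suspicious = sum(1 for ch in text if not ch.isprintable() and ch not in "\r\n\t")
    return suspicious / max(len(text), 1)

def guess_encoding(payload: bytes) -> Optional[str]:
    best_encoding, best_mess = None, 1.0
    for encoding in CANDIDATES:
        try:
            decoded = payload.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # this table cannot fit the binary content: discard it
        mess = crude_mess_ratio(decoded)
        if mess < best_mess:  # keep the match with the lowest mess
            best_encoding, best_mess = encoding, mess
    # The real library additionally scores coherence (letter-frequency ranking per language).
    return best_encoding

print(guess_encoding("Où êtes-vous ?".encode("cp1252")))  # -> cp1252 (latin_1 is equally clean here)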

⚡ Known limitations

  • Language detection is unreliable when the text contains two or more languages sharing identical letters (e.g. HTML (English tags) + Turkish content, which share Latin characters).
  • Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

👤 Contributing

Contributions, issues and feature requests are very much welcome.
Feel free to check issues page if you want to contribute.

๐Ÿ“ License

Copyright © 2019 Ahmed TAHRI @Ousret.
This project is MIT licensed.

Character frequencies used in this project © 2012 Denny Vrandečić

Comments
  • [Proposal] Add module creation with mypyc to speed up


    Hello. I ran some tests to find bottlenecks and speed up the package. The easiest option, since you are already using mypy, is to compile the module during installation using mypyc. In this case the speed-up is about 2×. Here are the results of the tests using your bin/performance.py file:

    ------------------------------
    --> Charset-Normalizer Conclusions
       --> Avg: 0.03485252343844548s
       --> 99th: 0.2629306570015615s
       --> 95th: 0.14874039799906313s
       --> 50th: 0.02182378301222343s
    ------------------------------
    --> Charset-Normalizer_m Conclusions (Charset-Normalizer, compiled with mypyc )
       --> Avg: 0.01605459922575392s
       --> 99th: 0.12211546800972428s
       --> 95th: 0.06977643301070202s
       --> 50th: 0.009204783011227846s
    ------------------------------
    --> Chardet Conclusions
       --> Avg: 0.12291852888552735s
       --> 99th: 0.6617688919941429s
       --> 95th: 0.17344348499318585s
       --> 50th: 0.023028297000564635s
    ------------------------------
    --> Cchardet Conclusions
       --> Avg: 0.003174804929368931s
       --> 99th: 0.04868195200106129s
       --> 95th: 0.008641656007966958s
       --> 50th: 0.0005420649977168068s
    

    test_log.txt (attached). I think the acceleration would be greater if all functions were annotated.
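
    For reference, a minimal sketch of what the proposal amounts to; this is a hypothetical setup.py fragment, not the project's actual build configuration, and the module list is illustrative:

    from setuptools import setup
    from mypyc.build import mypycify

    setup(
        name="charset_normalizer",
        # compile selected modules to C extensions with mypyc instead of shipping pure Python
        ext_modules=mypycify(["charset_normalizer/md.py", "charset_normalizer/cd.py"]),
    )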

    enhancement 
    opened by deedy5 20
  • Don't inject unicodedata2 into sys.modules


    I noticed charset_normalizer meddles with sys.modules, causing this:

    >>> import charset_normalizer
    >>> import unicodedata
    >>> unicodedata
    <module 'unicodedata2' from '.../site-packages/unicodedata2.cpython-39-darwin.so'>
    

    This PR fixes that by using a fairly standard try: except ImportError: guard instead of the sys.modules hook.

    >>> import charset_normalizer
    >>> import unicodedata
    >>> unicodedata
    <module 'unicodedata' from '.../python3.9/lib-dynload/unicodedata.cpython-39-darwin.so'>
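
    For context, the standard guard pattern referred to above looks roughly like this; it is a sketch assuming unicodedata2 mirrors the stdlib module's interface, and the actual PR may differ in detail:

    try:
        import unicodedata2 as unicodedata  # newer Unicode data, if installed
    except ImportError:
        import unicodedata  # fall back to the standard library

    # callers use the local name; sys.modules is left untouched
    print(unicodedata.unidata_version)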
    
    opened by akx 16
  • [Proposal] Increase language coverage


    Is your feature request related to a problem? Please describe. Not a problem, more of an enhancement.

    Describe the solution you'd like Add other languages from other repos, assuming that they use the Unicode codepoint + n-grams model.

    Describe alternatives you've considered

    • https://github.com/wooorm/franc/tree/master/packages/franc-all (JS, 401 languages)
      • Codepoints https://github.com/wooorm/franc/blob/master/packages/franc-all/expressions.js
      • Ngrams https://github.com/wooorm/franc/blob/master/packages/franc-all/data.json
    • https://github.com/cloudmark/language-detect (Python, 271 languages)
    • https://github.com/cloudmark/language-detect/tree/master/data/udhr
    • https://github.com/kapsteur/franco (Golang, 175 languages)
      • Codepoints https://github.com/kapsteur/franco/blob/master/expression_data.go
      • Ngrams https://github.com/kapsteur/franco/blob/master/script_data.go
    • https://github.com/patrickschur/language-detection (PHP, 110 languages)
      • https://github.com/patrickschur/language-detection/tree/master/resources
    • https://github.com/richtr/guessLanguage.js (JS, 100 languages)
      • Codepoints https://github.com/richtr/guessLanguage.js/blob/master/lib/guessLanguage.js
      • Ngrams https://github.com/richtr/guessLanguage.js/blob/master/lib/_languageData.js
    • https://github.com/saffsd/langid.py (Python, 97 languages)
      • Alternate https://github.com/saffsd/langid.c
      • Alternate https://github.com/saffsd/langid.js
      • Alternate https://github.com/carrotsearch/langid-java
    • https://github.com/feedbackmine/language_detector (Ruby, 96 languages)
      • https://github.com/feedbackmine/language_detector/tree/master/lib/training_data
    • https://github.com/jonathansp/guess-language (Golang, 94 languages)
      • Codepoints
        • https://github.com/jonathansp/guess-language/blob/master/data/blocks.go
        • https://github.com/jonathansp/guess-language/blob/master/data/languages.go
      • Ngrams
        • https://github.com/jonathansp/guess-language/blob/master/data/trigrams.go
    • https://github.com/abadojack/whatlanggo (Golang, 84 languages)
      • Codepoints
        • https://github.com/abadojack/whatlanggo/blob/master/script.go
        • https://github.com/abadojack/whatlanggo/blob/master/detect.go
      • Ngrams https://github.com/abadojack/whatlanggo/blob/master/lang.go
    • https://github.com/chattylabs/language-detector (JS, 73 languages)
      • https://github.com/chattylabs/language-detector/tree/master/data/resources
    • https://github.com/optimaize/language-detector (Java, 71 languages)
    • https://github.com/endeveit/guesslanguage (Golang, 67 languages)
      • https://github.com/endeveit/guesslanguage/tree/master/models
    • https://github.com/dsc/guess-language (Python, 64 languages)
      • https://github.com/dsc/guess-language/tree/master/guess_language/trigrams
      • Co-reference https://github.com/kent37/guess-language
    • https://github.com/decultured/Python-Language-Detector (Python, 58 languages)
      • https://github.com/decultured/Python-Language-Detector/tree/master/trigrams
    • https://github.com/Mimino666/langdetect (Python, 55 languages)
      • Codepoints
        • https://github.com/Mimino666/langdetect/blob/master/langdetect/utils/unicode_block.py
        • https://github.com/Mimino666/langdetect/blob/master/langdetect/utils/messages.properties
        • https://github.com/Mimino666/langdetect/blob/master/langdetect/utils/ngram.py
      • Ngrams https://github.com/Mimino666/langdetect/tree/master/langdetect/profiles
    • https://github.com/pemistahl/lingua (Kotlin, 55 languages)
      • Codepoints https://github.com/pemistahl/lingua/blob/master/src/main/kotlin/com/github/pemistahl/lingua/internal/Alphabet.kt
      • Ngrams https://github.com/pemistahl/lingua/tree/master/src/main/resources/language-models
    • https://github.com/landrok/language-detector (PHP, 54 languages)
      • https://github.com/landrok/language-detector/tree/master/src/LanguageDetector/subsets
    • https://github.com/shuyo/language-detection (Java, 53 languages)
    • https://github.com/newmsz/node-language-detection (JS, 53 languages)
      • Codepoints https://github.com/newmsz/node-language-detection/blob/master/index.js
      • Ngrams https://github.com/newmsz/node-language-detection/tree/master/profiles
    • https://github.com/pdonald/language-detection (C#, 53 languages)
      • https://github.com/pdonald/language-detection/tree/master/LanguageDetection/Profiles
    • https://github.com/malcolmgreaves/language-detection (Java, 53 languages)
    • https://github.com/FGRibreau/node-language-detect (JS, 52 languages)
      • Codepoints https://github.com/FGRibreau/node-language-detect/blob/master/data/unicode_blocks.json
      • Ngram https://github.com/FGRibreau/node-language-detect/blob/master/data/lang.json
    • https://github.com/webmil/text-language-detect (PHP, 52 languages)
      • Codepoints https://github.com/webmil/text-language-detect/blob/master/lib/data/unicode_blocks.dat
      • Ngram https://github.com/webmil/text-language-detect/blob/master/lib/data/lang.dat
    • https://github.com/pear/Text_LanguageDetect (PHP, 52 languages)
      • https://github.com/pear/Text_LanguageDetect/tree/master/data
    • https://github.com/Imaginatio/langdetect (Java, 50 languages)
      • https://github.com/Imaginatio/langdetect/tree/master/src/main/resources/profiles

    • https://github.com/dachev/node-cld (C++, 160 languages)
      • co-reference https://github.com/jtoy/cld
      • co-reference https://github.com/mzsanford/cld
      • co-reference https://github.com/jaukia/cld-js
      • co-reference https://github.com/vhyza/language_detection
      • Co-reference https://github.com/ambs/Lingua-Identify-CLD
      • Co-reference https://github.com/jaukia/cld-js
    • https://github.com/CLD2Owners/cld2 (C++, 83 languages)
      • Co-reference https://github.com/rainycape/cld2
      • Co-reference https://github.com/dachev/node-cld
      • Co-reference https://github.com/ropensci/cld2
      • Co-reference https://github.com/fntlnz/cld2-php-ext
    • https://github.com/commoncrawl/language-detection-cld2 (Java)
    • https://github.com/lstrojny/php-cld (PHP)
    enhancement good first issue 
    opened by DonaldTsang 13
  • charset_normalizer logging behavior


    Hi @Ousret,

    This is a bit of a continuation of #145. I wanted to start a discussion on the current logging levels and why they were chosen to better understand the use case/design decision. Most of that wasn't covered in the previous issue. I'd originally read this as being a DEBUG level log but realized I was mistaken, as it's INFO.

    What do you envision the common case for logging these messages as INFO (there are more but we'll start here) [1][2][3][4]? What would the user be expected to do with the info provided? They seem like more of a stream of consciousness on what the hot path for the charset_normalizer is doing, rather than noting novel events. I'd personally not expect this to be relevant for general library usage. It probably becomes less relevant to libraries integrating with the project.

    Currently, that would result in somewhere around 3 MB of logs per hour at 1 TPS which scales out to a couple gigabytes a month. While that's not huge, it's not trivial either. If you start to scale that up to 100s of TPS, we start recording closer to 250-500GB/mo. That's a lot of IO and potential disk space for long lived logs.

    enhancement 
    opened by nateprewitt 9
  • Refactoring for potential performance improvements in loops


    Experiments with ideas to potentially improve performance or code consistency without impacting readability (#111).

    This PR:

    1. defines caches and sets in cd.py
    2. uses list comprehensions for language associations in cd.py
    3. refactors duplicate code in md.py

    Close #111

    opened by adbar 9
  • Use unicodedata2 if available


    https://pypi.org/project/unicodedata2/ is usually more up to date than even the latest cpython release.

    IIRC, using it is simply a matter of checking whether the unicodedata2 data version is higher than unicodedata's, and if so doing sys.modules['unicodedata'] = unicodedata2. Need to check that though.

    enhancement question 
    opened by jayvdb 8
  • Fixing some performance bottlenecks


    pprofile tests

    test.py

    from glob import glob
    from os.path import isdir
    from charset_normalizer import detect
    
    def performance_compare(size_coeff):
        if not isdir("./char-dataset"):
            print("This script require https://github.com/Ousret/char-dataset to be cloned on package root directory")
            exit(1)
        for tbt_path in sorted(glob("./char-dataset/**/*.*")):
            with open(tbt_path, "rb") as fp:
                content = fp.read() * size_coeff            
            detect(content)
    
    if __name__ == "__main__":
        performance_compare(1)
    

    Before

    pprofile --format callgrind --out cachegrind.out.original.test test.py
    

    Time: 838.97 s. (attachment: cachegrind.out.original.zip)


    Merged

    pprofile --format callgrind --out cachegrind.out.commits.test test.py
    

    Time: 716.45 s. (attachment: cachegrind.out.commits.zip)

    opened by deedy5 7
  • Python 2 not yet supported


    Traceback:
    test/test_on_file.py:5: in <module>
        from charset_normalizer import CharsetNormalizerMatches as CnM
    charset_normalizer/__init__.py:2: in <module>
        from charset_normalizer.normalizer import CharsetNormalizerMatches, CharsetNormalizerMatch
    charset_normalizer/normalizer.py:3: in <module>
        import statistics
    E   ImportError: No module named statistics
    
    help wanted 
    opened by jayvdb 7
  • :wrench: Tweak/adjust the logging verbosity greater-eq to warning level


    I understand that the latest release unexpectedly generated some noise for some people in specific environments.

    The engagement I made with charset-normalizer given its wide deployments* still applies. Therefore, regarding :

    • https://github.com/spaam/svtplay-dl/issues/1445
    • https://github.com/home-assistant/core/issues/60615
    • https://github.com/Ousret/charset_normalizer/issues/145

    With this PR I reduce the impact to a minimum while keeping backward compatibility. Fixes/addresses #145

    *: Listening as broadly as possible regarding any side-effects to the community

    enhancement bugfix release flourish 
    opened by Ousret 6
  • Revise the logger instantiation/initial handlers


    I added the logging functionality described in the proposal. I also took care to make sure the explain argument would operate the same way. I left the behavior in api.py where if explain is not set, the logger will still log messages at the WARNING level. That behavior is really up to you as the package maintainer. It is as easy as removing that branch from the if statement and adding documentation to the repository that describes how a logger must be set via the handler if an application developer so desires.

    I also added two simple tests that check whether the set_stream_handler function does what it should. Apologies if the tests are not in the correct style. Let me know if anything is in need of attention or you have changed your mind about the behavior change for logging. Thanks for the awesome library.

    Close #134

    opened by nmaynes 6
  • [BUG] Support for custom Python environment that ignore PEP 3120


    Describe the bug: With the requests library using charset-normalizer, I am getting an error when calling Python via User-Defined Transform in SAP BODS:

    File "EXPRESSION", line 6, in <module>
    File "c:\program files\python39\lib\site-packages\requests\__init__.py", line 48, in <module>
    from charset_normalizer import __version__ as charset_normalizer_version
    File "c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py", line 11
    SyntaxError: Non-ASCII character '\xd1' in file c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py on
    line 12, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details.
    

    I am not able to define a source code encoding by placing a magic comment into the source files (either as the first or second line of the file), because the app probably modifies the script by itself (placing # -*- coding: utf-8 -*- doesn't help). Setting the environment variable PYTHONUTF8=1 doesn't help either.

    To Reproduce: I am not able to provide code to reproduce the issue; it arises when calling Python via User-Defined Transform in SAP BODS. Please check: https://github.com/apache/superset/issues/15631 This could be the same problem: https://stackoverflow.com/questions/68594538/syntaxerror-non-ascii-character-xd1-in-file-charset-normalizer-init-py-i

    Expected behavior: No error. With the requests version using the chardet library there is no problem. Maybe avoiding non-ASCII characters in __init__.py could help...?

    Logs Please see the bug description.

    Desktop (please complete the following information):

    • OS: Windows 2016 Server
    • Python version 3.9.6
    • Package version 2.0.6
    • Requests version 2.26.0

    Additional context N/A

    bug help wanted 
    opened by kivhub 6
  • โฌ†๏ธ Bump pypa/cibuildwheel from 2.11.2 to 2.11.4

    โฌ†๏ธ Bump pypa/cibuildwheel from 2.11.2 to 2.11.4

    Bumps pypa/cibuildwheel from 2.11.2 to 2.11.4.

    Release notes

    Sourced from pypa/cibuildwheel's releases.

    v2.11.4

    • ๐Ÿ› Fix a bug that caused missing wheels on Windows when a test was skipped using CIBW_TEST_SKIP (#1377)
    • ๐Ÿ›  Updates CPython 3.11 to 3.11.1 (#1371)
    • ๐Ÿ›  Updates PyPy 3.7 to 3.7.10, except on macOS which remains on 7.3.9 due to a bug. (#1371)
    • ๐Ÿ“š Added a reference to abi3audit to the docs (#1347)

    v2.11.3

    • ✨ Improves the 'build options' log output that's printed at the start of each run (#1352)
    • ✨ Added a friendly error message to a common misconfiguration of the CIBW_TEST_COMMAND option - not specifying path using the {project} placeholder (#1336)
    • 🛠️ The GitHub Action now uses Powershell on Windows to avoid occasional incompatibilities with bash (#1346)
    Changelog

    Sourced from pypa/cibuildwheel's changelog.

    v2.11.4

    24 Dec 2022

    • ๐Ÿ› Fix a bug that caused missing wheels on Windows when a test was skipped using CIBW_TEST_SKIP (#1377)
    • ๐Ÿ›  Updates CPython 3.11 to 3.11.1 (#1371)
    • ๐Ÿ›  Updates PyPy to 7.3.10, except on macOS which remains on 7.3.9 due to a bug on that platform. (#1371)
    • ๐Ÿ“š Added a reference to abi3audit to the docs (#1347)

    v2.11.3

    5 Dec 2022

    • ✨ Improves the 'build options' log output that's printed at the start of each run (#1352)
    • ✨ Added a friendly error message to a common misconfiguration of the CIBW_TEST_COMMAND option - not specifying path using the {project} placeholder (#1336)
    • 🛠️ The GitHub Action now uses Powershell on Windows to avoid occasional incompatibilities with bash (#1346)
    Commits
    • 27fc88e Bump version: v2.11.4
    • a7e9ece Merge pull request #1371 from pypa/update-dependencies-pr
    • b9a3ed8 Update cibuildwheel/resources/build-platforms.toml
    • 3dcc2ff fix: not skipping the tests stops the copy (Windows ARM) (#1377)
    • 1c9ec76 Merge pull request #1378 from pypa/henryiii-patch-3
    • 22b433d Merge pull request #1379 from pypa/pre-commit-ci-update-config
    • 98fdf8c [pre-commit.ci] pre-commit autoupdate
    • cefc5a5 Update dependencies
    • e53253d ci: move to ubuntu 20
    • e9ecc65 [pre-commit.ci] pre-commit autoupdate (#1374)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • โฌ†๏ธ Bump isort from 5.10.1 to 5.11.4

    โฌ†๏ธ Bump isort from 5.10.1 to 5.11.4

    Bumps isort from 5.10.1 to 5.11.4.

    Release notes

    Sourced from isort's releases.

    5.11.4

    Changes

    :package: Dependencies

    5.11.3

    Changes

    :beetle: Fixes

    :construction_worker: Continuous Integration

    v5.11.3

    Changes

    :beetle: Fixes

    :construction_worker: Continuous Integration

    5.11.2

    Changes

    5.11.1

    Changes December 12 2022

    ... (truncated)

    Changelog

    Sourced from isort's changelog.

    5.11.4 December 21 2022

    5.11.3 December 16 2022

    5.11.2 December 12 2022

    5.11.1 December 12 2022

    5.11.0 December 12 2022

    Commits
    • 98390f5 Merge pull request #2059 from PyCQA/version/5.11.4
    • df69a05 Bump version 5.11.4
    • f9add58 Merge pull request #2058 from PyCQA/deps/poetry-1.3.1
    • 36caa91 Bump Poetry 1.3.1
    • 3c2e2d0 Merge pull request #1978 from mgorny/toml-test
    • 45d6abd Remove obsolete toml import from the test suite
    • 3020e0b Merge pull request #2057 from mgorny/poetry-install
    • a6fdbfd Stop installing documentation files to top-level site-packages
    • ff306f8 Fix tag template to match old standard
    • 227c4ae Merge pull request #2052 from hugovk/main
    • Additional commits viewable in compare view

    dependencies python 
    opened by dependabot[bot] 1
  • โฌ†๏ธ Bump black from 22.10.0 to 22.12.0

    โฌ†๏ธ Bump black from 22.10.0 to 22.12.0

    Bumps black from 22.10.0 to 22.12.0.

    Release notes

    Sourced from black's releases.

    22.12.0

    Preview style

    • Enforce empty lines before classes and functions with sticky leading comments (#3302)
    • Reformat empty and whitespace-only files as either an empty file (if no newline is present) or as a single newline character (if a newline is present) (#3348)
    • Implicitly concatenated strings used as function args are now wrapped inside parentheses (#3307)
    • Correctly handle trailing commas that are inside a line's leading non-nested parens (#3370)

    Configuration

    • Fix incorrectly applied .gitignore rules by considering the .gitignore location and the relative path to the target file (#3338)
    • Fix incorrectly ignoring .gitignore presence when more than one source directory is specified (#3336)

    Parser

    • Parsing support has been added for walruses inside generator expression that are passed as function args (for example, any(match := my_re.match(text) for text in texts)) (#3327).

    Integrations

    • Vim plugin: Optionally allow using the system installation of Black via let g:black_use_virtualenv = 0(#3309)
    Changelog

    Sourced from black's changelog.

    22.12.0

    Preview style

    • Enforce empty lines before classes and functions with sticky leading comments (#3302)
    • Reformat empty and whitespace-only files as either an empty file (if no newline is present) or as a single newline character (if a newline is present) (#3348)
    • Implicitly concatenated strings used as function args are now wrapped inside parentheses (#3307)
    • Correctly handle trailing commas that are inside a line's leading non-nested parens (#3370)

    Configuration

    • Fix incorrectly applied .gitignore rules by considering the .gitignore location and the relative path to the target file (#3338)
    • Fix incorrectly ignoring .gitignore presence when more than one source directory is specified (#3336)

    Parser

    • Parsing support has been added for walruses inside generator expression that are passed as function args (for example, any(match := my_re.match(text) for text in texts)) (#3327).

    Integrations

    • Vim plugin: Optionally allow using the system installation of Black via let g:black_use_virtualenv = 0(#3309)
    Commits
    • 2ddea29 Prepare release 22.12.0 (#3413)
    • 5b1443a release: skip bad macos wheels for now (#3411)
    • 9ace064 Bump peter-evans/find-comment from 2.0.1 to 2.1.0 (#3404)
    • 19c5fe4 Fix CI with latest flake8-bugbear (#3412)
    • d4a8564 Bump sphinx-copybutton from 0.5.0 to 0.5.1 in /docs (#3390)
    • 2793249 Wordsmith current_style.md (#3383)
    • d97b789 Remove whitespaces of whitespace-only files (#3348)
    • c23a5c1 Clarify that Black runs with --safe by default (#3378)
    • 8091b25 Correctly handle trailing commas that are inside a line's leading non-nested ...
    • ffaaf48 Compare each .gitignore found with an appropiate relative path (#3338)
    • Additional commits viewable in compare view

    dependencies python 
    opened by dependabot[bot] 1