Bidirectionally transformed strings

Overview

bistring

Build status Documentation status

The bistring library provides non-destructive versions of common string processing operations like normalization, case folding, and find/replace. Each bistring remembers the original string, and how its substrings map to substrings of the modified version.

For example:

>>> from bistring import bistr
>>> s = bistr('๐•ฟ๐–๐–Š ๐––๐–š๐–Ž๐–ˆ๐–, ๐–‡๐–—๐–”๐–œ๐–“ ๐ŸฆŠ ๐–๐–š๐–’๐–•๐–˜ ๐–”๐–›๐–Š๐–— ๐–™๐–๐–Š ๐–‘๐–†๐–Ÿ๐–ž ๐Ÿถ')
>>> s = s.normalize('NFKD')     # Unicode normalization
>>> s = s.casefold()            # Case-insensitivity
>>> s = s.replace('๐ŸฆŠ', 'fox')  # Replace emoji with text
>>> s = s.replace('๐Ÿถ', 'dog')
>>> s = s.sub(r'[^\w\s]+', '')  # Strip everything but letters and spaces
>>> s = s[:19]                  # Extract a substring
>>> s.modified                  # The modified substring, after changes
'the quick brown fox'
>>> s.original                  # The original substring, before changes
'๐•ฟ๐–๐–Š ๐––๐–š๐–Ž๐–ˆ๐–, ๐–‡๐–—๐–”๐–œ๐–“ ๐ŸฆŠ'

Languages

PyPI version npm version

bistring is available in multiple languages, currently Python and JavaScript/TypeScript. Ports to other languages are planned for the near future.

The code is structured similarly in each language to make it easy to share algorithms, tests, and fixes between them. The main differences come from trying to mirror the language's built-in string API. If you want to contribute a bug fix or a new feature, feel free to implement it in any one of the supported languages, and we'll try to port it to the rest of them.

Demo

Click here for a live demo of the bistring library in your browser.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments
  • Whitespace at start of string mishandled by SentenceTokenizer?

    Whitespace at start of string mishandled by SentenceTokenizer?

    I'm not sure if this is expected behaviour or a bug but the following code illustrates my uncertainty:

    pre_split = bistring.bistr(" \tFoo. \t\n \tBar. \t") \
        .sub(r"^\s+", "") \
        .sub(r"\s*\n\s*", "\n") \
        .sub(r"\s+$", "\n")
    
    post_split = bistring.bistr.join(
        [s.text for s in bistring.SentenceTokenizer("en_GB").tokenize(pre_split)]
    )
    
    # These should print True but actually print False
    print(pre_split == post_split)
    print(pre_split.original == post_split.original)
    
    # This should print True and does print True
    print(pre_split.modified == post_split.modified)
    
    # This should print False but actually prints True
    print(pre_split.original[2:] == post_split.original)
    

    In summary, I was expecting the result of re-joining the tokens produced by SentenceTokenizer to yield an identical bistr to the one that existed prior to the splitting. This appears to be true with the only (known) exception being whitespace at the start of the first sentence is being lost. Whitespace at the end of the string, and whitespace between sentences within the string, are retained as expected.

    Is this expected behaviour?

    Produces behaviour using Python 3.7, bistring 0.4.0, pyicu 2.6, and icu 68.1 (all installed via conda-forge).

    question 
    opened by qtdaniel 10
  • Composition of no-op replacements produces incorrect (or confusing?) alignment

    Composition of no-op replacements produces incorrect (or confusing?) alignment

    >>> from bistring import bistr
    >>> b = bistr("abc")
    >>> b1 = b.replace("bc", "bc")
    >>> b2 = b1.replace("ab", "ab")
    >>> b2
    bistr('abc', 'abc', Alignment([(0, 0), (1, 2), (3, 3)]))
    >>> b2[:2].original
    'a'
    

    Both individual replacements effectively don't change the contents of the original string. I think b2[:2].original "should" return 'abc' here. This would probably be achieved with a coarser composed alignment Alignment([(0, 0), (3, 3)]).

    opened by maxhgerlach 3
  • PyPI Release?

    PyPI Release?

    Thank you very much for this library! It's really useful in many text processing use cases for NLP.

    The last release 0.4 on PyPI is from September 2019 and on master there's been at least a fix for bistr.join() since then: #20 Would you consider cutting a new release?

    opened by maxhgerlach 3
  • Arithmetic progressions

    Arithmetic progressions

    This optimizes the representation of Alignments by detecting and compressing any arithmetic sub-sequences that are found. This implicitly compresses any runs with the identity mapping, as well as other potentially common ones like 2โ†’1 char mappings from UTF-16 to UTF-8 or non-BMP to BMP.

    • [x] Python
    • [ ] JS
    opened by tavianator 3
  • Fix readthedocs build

    Fix readthedocs build

    • js: Update dependencies
    • js: Work around https://github.com/rollup/plugins/issues/934
    • docs: Update npm dependencies
    • docs: Work around https://github.com/readthedocs/readthedocs-docker-images/issues/107
    opened by tavianator 2
  • build(deps): bump urllib3 from 1.26.4 to 1.26.5 in /docs

    build(deps): bump urllib3 from 1.26.4 to 1.26.5 in /docs

    Bumps urllib3 from 1.26.4 to 1.26.5.

    Release notes

    Sourced from urllib3's releases.

    1.26.5

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.

    If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

    Changelog

    Sourced from urllib3's changelog.

    1.26.5 (2021-05-26)

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.
    Commits
    • d161647 Release 1.26.5
    • 2d4a3fe Improve performance of sub-authority splitting in URL
    • 2698537 Update vendored six to 1.16.0
    • 07bed79 Fix deprecation warnings for Python 3.10 ssl module
    • d725a9b Add Python 3.10 to GitHub Actions
    • 339ad34 Use pytest==6.2.4 on Python 3.10+
    • f271c9c Apply latest Black formatting
    • 1884878 [1.26] Properly proxy EOF on the SSLTransport test suite
    • See full diff in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies python 
    opened by dependabot[bot] 2
  • build(deps): bump lodash from 4.17.15 to 4.17.19 in /docs

    build(deps): bump lodash from 4.17.15 to 4.17.19 in /docs

    โš ๏ธ Dependabot is rebasing this PR โš ๏ธ

    If you make any changes to it yourself then they will take precedence over the rebase.


    Bumps lodash from 4.17.15 to 4.17.19.

    Release notes

    Sourced from lodash's releases.

    4.17.16

    Commits
    Maintainer changes

    This version was pushed to npm by mathias, a new releaser for lodash since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 2
  • build(deps): bump minimist from 1.2.5 to 1.2.6 in /js

    build(deps): bump minimist from 1.2.5 to 1.2.6 in /js

    Bumps minimist from 1.2.5 to 1.2.6.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies javascript 
    opened by dependabot[bot] 1
  • build(deps): bump node-notifier from 8.0.0 to 8.0.1 in /js

    build(deps): bump node-notifier from 8.0.0 to 8.0.1 in /js

    Bumps node-notifier from 8.0.0 to 8.0.1.

    Changelog

    Sourced from node-notifier's changelog.

    v8.0.1

    • fixes possible injection issue for notify-send
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump highlight.js from 10.1.2 to 10.4.1 in /docs

    build(deps): bump highlight.js from 10.1.2 to 10.4.1 in /docs

    Bumps highlight.js from 10.1.2 to 10.4.1.

    Release notes

    Sourced from highlight.js's releases.

    10.4.1

    Security fixes:

    • (fix) Exponential backtracking fixes for: Josh Goebel
      • cpp
      • handlebars
      • gams
      • perl
      • jboss-cli
      • r
      • erlang-repl
      • powershell
      • routeros
    • (fix) Polynomial backtracking fixes for: Josh Goebel
      • asciidoc
      • reasonml
      • latex
      • kotlin
      • gcode
      • d
      • aspectj
      • moonscript
      • coffeescript/livescript
      • csharp
      • scilab
      • crystal
      • elixir
      • basic
      • ebnf
      • ruby
      • fortran/irpf90
      • livecodeserver
      • yaml
      • x86asm
      • dsconfig
      • markdown
      • ruleslanguage
      • xquery
      • sqf

    Very grateful to Michael Schmidt for all the help.

    10.4.0 - November 2020

    A largish release with many improvements and fixes from quite a few different contributors. Enjoy!

    Deprecations:

    ... (truncated)

    Changelog

    Sourced from highlight.js's changelog.

    Version 10.4.1 (tentative)

    Security

    • (fix) Exponential backtracking fixes for: Josh Goebel
      • cpp
      • handlebars
      • gams
      • perl
      • jboss-cli
      • r
      • erlang-repl
      • powershell
      • routeros
    • (fix) Polynomial backtracking fixes for: Josh Goebel
      • asciidoc
      • reasonml
      • latex
      • kotlin
      • gcode
      • d
      • aspectj
      • moonscript
      • coffeescript/livescript
      • csharp
      • scilab
      • crystal
      • elixir
      • basic
      • ebnf
      • ruby
      • fortran/irpf90
      • livecodeserver
      • yaml
      • x86asm
      • dsconfig
      • markdown
      • ruleslanguage
      • xquery
      • sqf

    Very grateful to Michael Schmidt for all the help.

    Version 10.4.0

    A largish release with many improvements and fixes from quite a few different contributors. Enjoy!

    ... (truncated)

    Commits
    • e96b915 bump 10.4.1
    • 065f65f chore(release) allow release script to handle production releases
    • 68509fc chore(docs) bump SECURITY mention to 9.18.5
    • aa0fb85 chore(docs) Version 9 has reached EOL.
    • fb0a626 enh(ci): Add tests for polynomial regex issues
    • fa46dd1 fix(reasonml) fix poly backtracking issue
    • d496052 fix(latex) fix poly backtracking issue
    • d9f1cdb fix(javascript/typescript) fix poly backtracking issue
    • fdec037 fix(asciidoc) fix poly backtracking issue
    • 02ca487 fix(kotlin) fix poly backtracking issue
    • Additional commits viewable in compare view
    Maintainer changes

    This version was pushed to npm by joshgoebel, a new releaser for highlight.js since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump lodash from 4.17.15 to 4.17.19 in /js

    build(deps): bump lodash from 4.17.15 to 4.17.19 in /js

    Bumps lodash from 4.17.15 to 4.17.19.

    Release notes

    Sourced from lodash's releases.

    4.17.16

    Commits
    Maintainer changes

    This version was pushed to npm by mathias, a new releaser for lodash since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump json5 from 2.2.1 to 2.2.3 in /js

    build(deps): bump json5 from 2.2.1 to 2.2.3 in /js

    Bumps json5 from 2.2.1 to 2.2.3.

    Release notes

    Sourced from json5's releases.

    v2.2.3

    v2.2.2

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).
    Changelog

    Sourced from json5's changelog.

    v2.2.3 [code, diff]

    v2.2.2 [code, diff]

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).
    Commits
    • c3a7524 2.2.3
    • 94fd06d docs: update CHANGELOG for v2.2.3
    • 3b8cebf docs(security): use GitHub security advisories
    • f0fd9e1 docs: publish a security policy
    • 6a91a05 docs(template): bug -> bug report
    • 14f8cb1 2.2.2
    • 10cc7ca docs: update CHANGELOG for v2.2.2
    • 7774c10 fix: add proto to objects and arrays
    • edde30a Readme: slight tweak to intro
    • 97286f8 Improve example in readme
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies javascript 
    opened by dependabot[bot] 0
  • build(deps): bump certifi from 2021.10.8 to 2022.12.7 in /docs

    build(deps): bump certifi from 2021.10.8 to 2022.12.7 in /docs

    Bumps certifi from 2021.10.8 to 2022.12.7.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies python 
    opened by dependabot[bot] 0
  • Add CodeQL workflow for GitHub code scanning

    Add CodeQL workflow for GitHub code scanning

    Hi microsoft/bistring!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that weโ€™ve integrated LGTMโ€™s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository โ€” take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, weโ€™ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ

    Click here to expand the FAQ section

    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request โ€” to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches โ€” this keeps the analysis results on your repositoryโ€™s Security tab up to date.
    • Once a week at a fixed time โ€” to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: weโ€™ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesnโ€™t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasnโ€™t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTMโ€™s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You donโ€™t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If youโ€™d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 1
  • Problems with PyICU when installing bistring 0.4.0 with pip

    Problems with PyICU when installing bistring 0.4.0 with pip

    Summary

    I ran into what's apparently a known issue with installing PyICU over pip while trying to pip install bistring==0.4.0. Contrary to the error message and recommendations on that thread, installing pkg-config and libicu-dev didn't fix the issue for me. Only installing python3-icu (as recommended in the official PyICU docs) finally fixed it.

    This is obviously not an issue with bistring itself, but it makes it difficult to install bistring because the ICU dependency can't be automatically installed by pip. If there is nothing else that can be done about it, maybe a note about this could at least be added to the Readme file, so people can avoid the frustration of running into the pip error?

    More Details

    Here is the output I got from pip install bistring==0.4.0:

    Collecting bistring==0.4.0
      Downloading bistring-0.4.0-py3-none-any.whl (22 kB)
    Collecting pyicu
      Downloading PyICU-2.8.tar.gz (299 kB)
         |โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 299 kB 2.1 MB/s
      Installing build dependencies ... done
      Getting requirements to build wheel ... error
      ERROR: Command errored out with exit status 1:
       command: /usr/bin/python3 /tmp/tmprkr9c6uy get_requires_for_build_wheel /tmp/tmp5tnhtmha
           cwd: /tmp/pip-install-x_54yndb/pyicu
      Complete output (64 lines):
      (running 'icu-config --version')
      (running 'pkg-config --modversion icu-i18n')
      Traceback (most recent call last):
        File "setup.py", line 63, in <module>
          ICU_VERSION = os.environ['ICU_VERSION']
        File "/usr/lib/python3.8/os.py", line 675, in __getitem__
          raise KeyError(key) from None
      KeyError: 'ICU_VERSION'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "setup.py", line 66, in <module>
          ICU_VERSION = check_output(('icu-config', '--version')).strip()
        File "setup.py", line 19, in check_output
          return subprocess_check_output(popenargs)
        File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.8/subprocess.py", line 489, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: 'icu-config'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "setup.py", line 69, in <module>
          ICU_VERSION = check_output(('pkg-config', '--modversion', 'icu-i18n')).strip()
        File "setup.py", line 19, in check_output
          return subprocess_check_output(popenargs)
        File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.8/subprocess.py", line 489, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: 'pkg-config'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "/tmp/tmprkr9c6uy", line 280, in <module>
          main()
        File "/tmp/tmprkr9c6uy", line 263, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/tmp/tmprkr9c6uy", line 114, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 143, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 71, in <module>
          raise RuntimeError('''
      RuntimeError:
      Please install pkg-config on your system or set the ICU_VERSION environment
      variable to the version of ICU you have installed.
    
      ----------------------------------------
    ERROR: Command errored out with exit status 1: /usr/bin/python3 /tmp/tmprkr9c6uy get_requires_for_build_wheel /tmp/tmp5tnhtmha Check the logs for full command output
    
    opened by outofthecave 4
  • Rust

    Rust

    • [x] Add a Rust project
    • [x] Implement Alignment
    • [ ] Implement BiString
    • [ ] Implement slices
    • [x] Add a README
    • [ ] Benchmarks
    • [ ] Try arithmetic progression compression
    • [ ] Unify unbounded slice behaviour with Python and JS
    • [x] CI
    opened by tavianator 0
  • Transliterate

    Transliterate

    I was hoping you might advise me on how to incorporate transliteration into a text transformation pipeline.

    Let's say I want to use a 3rd party library like from unidecode import unidecode. I could create a bistring with new_bistr = bistr(text.modified, unidecode(text.modified)) but I would loose all the previous operations.

    Is there a way to fold in a modified string that is calculated outside bistring's capabilities?

    enhancement question 
    opened by christian-storm 4
Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
An online markdown resume template project, based on pywebio

An online markdown resume template project, based on pywebio

ๆž็ฎ€XksA 5 Nov 10, 2022
Um simulador de caixa registradora com database usando arquivos .txt

๐Ÿ›’ Caixa Registradora V2 โ“ - Como usar? Execute o caixa-registradora.py, nele vai ter um menu interativo, vocรช pode cadastrar diversos produtos em um

Gabriel 0 Sep 25, 2022
Microsoft's Cascadia Code font customized to my liking.

Microsoft's Cascadia Code font customized to my liking. Also includes some simple batch patch and bake scripts to batch patch glyphs and bake font features into fonts!

Frederik List 3 Jan 29, 2022
PyNews ๐Ÿ“ฐ Simple newsletter made with python ๐Ÿ๐Ÿ—ž๏ธ

PyNews ๐Ÿ“ฐ Simple newsletter made with python Install dependencies This project has some dependencies (see requirements.txt) that are not included in t

Luciano Felix 4 Aug 21, 2022
This project aims to test check if your RegExp are being matched by grep.

Bash RegExp This project aims to test check if your RegExp are being matched by grep. It's a local server that starts on the port 8080. It runs the se

Quatrecentquatre 1 Feb 28, 2022
RSS Reader application for the Emacs Application Framework.

EAF RSS Reader RSS Reader application for the Emacs Application Framework. Load application (add-to-list 'load-path "~/.emacs.d/site-lisp/eaf-rss-read

EAF 15 Dec 07, 2022
Code Jam for creating a text-based adventure game engine and custom worlds

Text Based Adventure Jam Author: Devin McIntyre Our goal is two-fold: Create a text based adventure game engine that can parse a standard file format

HTTPChat 4 Dec 26, 2021
Extract knowledge from raw text

Extract knowledge from raw text This repository is a nearly copy-paste of "From Text to Knowledge: The Information Extraction Pipeline" with some cosm

Raphael Sourty 10 Dec 03, 2022
Python Lex-Yacc

PLY (Python Lex-Yacc) Copyright (C) 2001-2020 David M. Beazley (Dabeaz LLC) All rights reserved. Redistribution and use in source and binary forms, wi

David Beazley 2.4k Dec 31, 2022
Aml - anti-money laundering

Anti-money laundering Dedect relationship between A and E by tracing through payments with similar amounts and identifying payment chains. For example

3 Nov 21, 2022
Vector space based Information Retrieval System for Text Processing - Information retrieval

Information Retrieval: Text Processing Group 13 Sequence of operations Install Requirements Add given wikipedia files to the corpus directory. Downloa

1 Jan 01, 2022
utoken is a multilingual tokenizer that divides text into words, punctuation and special tokens such as numbers, URLs, XML tags, email-addresses and hashtags.

utoken utoken is a multilingual tokenizer that divides text into words, punctuation and special tokens such as numbers, URLs, XML tags, email-addresse

Ulf Hermjakob 11 Jan 05, 2023
A slugifier that works in unicode

Unicode Slugify Unicode Slugify is a slugifier that generates unicode slugs. It was originally used in the Firefox Add-ons web site to generate slugs

Mozilla 315 Nov 21, 2022
A Python package to facilitate research on building and evaluating automated scoring models.

Rater Scoring Modeling Tool Introduction Automated scoring of written and spoken test responses is a growing field in educational natural language pro

ETS 59 Oct 10, 2022
Find a Doc is a free online resource aimed at helping connect the foreign community in Japan with health services in their native language.

Find a Doc - Localization Find a Doc is a free online resource aimed at helping connect the foreign community in Japan with health services in their n

Our Japan Life 18 Dec 19, 2022
WorldCloud Orรงamento de Estado 2022

World Cloud Orรงamento de Estado 2022 What it does This script creates a worldcloud, masked on a image, from a txt file How to run it? Install all libr

Jorge Gomes 2 Oct 12, 2021
Word-Generator - Generates meaningful words from dictionary with given no. of letters and words.

Meaningful Word Generator Generates meaningful words from dictionary with given no. of letters and words. This might be useful for generating short li

Mohammed Rabil 1 Jan 01, 2022
Search for terms(word / table / field name or any) under Snowflake schema names

snowflake-search-terms-in-ddl-views Search for terms(word / table / field name or any) under Snowflake schema names Version : 1.0v How to use ? Run th

Igal Emona 1 Dec 15, 2021
Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Kevin Lai 1 Nov 08, 2021