Augmenty is an augmentation library based on spaCy for augmenting texts.

Overview

Augmenty: The cherry on top of your NLP pipeline

PyPI version python version Code style: black github actions pytest github actions docs github coverage CodeFactor Streamlit App pip downloads

Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of highly flexible augmenters, Augmenty provides a series of tools for working with augmenters, including combining and moderating augmenters. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the assigned labels under the augmentation, thus making many of the augmenters valid for training more than simply sentence classification.

🔧 Installation

To get started using augmenty simply install it using pip by running the following line in your terminal:

pip install augmenty

Do note that this is a minimal installation. As some augmenters requires additional packages please write the following line to install all dependencies.

pip install augmenty[all]

For more detailed instructions on installing augmenty, including specific language support, see the installation instructions.

🍒 Simple Example

The following shows a simple example of how you can quickly augment text using Augmenty. For more on using augmenty see the usage guides.

import spacy
import augmenty

nlp = spacy.load("en_core_web_sm")

docs = nlp.pipe(["Augmenty is a great tool for text augmentation"])

entity_augmenter = augmenty.load("ents_replace.v1", 
                                 ent_dict = {{"ORG": [["spaCy"], ["spaCy", "Universe"]]})

for doc in augmenty.docs(docs, augmenter=entity_augmenter)
    print(doc)
spaCy Universe is a great tool for text augmentation.

📖 Documentation

Documentation
📚 Usage Guides Guides and instruction on how to use augmenty and its features.
📰 News and changelog New additions, changes and version history.
🎛 API References The detailed reference for augmenty's API. Including function documentation
🍒 Augmenters Contains a full list of current augmenters in augmenty.
😎 Demo A simple streamlit demo to try out the augmenters.

💬 Where to ask questions

Type
🚨 Bug Reports GitHub Issue Tracker
🎁 Feature Requests & Ideas GitHub Issue Tracker
👩‍💻 Usage Questions GitHub Discussions
🗯 General Discussion GitHub Discussions
🍒 Adding an Augmenter Adding an augmenter

🤔 FAQ

How do I test the code and run the test suite?

augmenty comes with an extensive test suite. In order to run the tests, you'll usually want to clone the repository and build augmenty from the source. This will also install the required development dependencies and test utilities defined in the requirements.txt.

pip install -r requirements.txt
pip install pytest

python -m pytest

which will run all the test in the augmenty/tests folder.

Specific tests can be run using:

python -m pytest augmenty/tests/test_docs.py

Code Coverage If you want to check code coverage you can run the following:

pip install pytest-cov

python -m pytest --cov=.

Does augmenty run on X?

augmenty is intended to run on all major OS, this includes Windows (latest version), MacOS (Catalina) and the latest version of Linux (Ubuntu). Below you can see if augmenty passes its test suite for the system of interest. Please note these are only the systems augmenty is being actively tested on, if you run on a similar system (e.g. an earlier version of Linux) augmenty will likely run there as well, if not please create an issue.

Operating System Status
Ubuntu/Linux (Latest) github actions pytest ubuntu
MacOS (Catalina) github actions pytest catalina
Windows (Latest) github actions pytest windows

How is the documentation generated?

augmenty uses sphinx to generate documentation. It uses the Furo theme with a custom styling.

To make the documentation you can run:

# install sphinx, themes and extensions
pip install sphinx furo sphinx-copybutton sphinxext-opengraph

# generate html from documentations

make -C docs html

Many of these augmenters are completely useless for training?

That is true, some of the augmenters are rarely something you would augment with during training. For instance randomly adding or removing spacing. However, augmentation can just as well be used to test whether a model is robust to certain variations.


Can I use augmenty without using spacy?

Indeed augmenty contains convenience functions for applying augmentation directly to raw texts. Check out the getting started guide to learn how.


🎓 Citing this work

If you use this library in your research, please cite:

@inproceedings{augmenty2021,
    title={Augmenty, the cherry on top of your NLP pipeline},
    author={Enevoldsen, Kenneth and Hansen, Lasse},
    year={2021}
}
Comments
  • Use of augmenty with spacy config files for training

    Use of augmenty with spacy config files for training

    I didn't see any documentation on how to import these augmenters when using spacy 3.0's config and command line system when training. Is it possible to use it in this sense? If so, how?

    apon further review, for the command line to register new augmentations, the flag: -- code <code.py> Needs to be set when calling the training. I have tried to point to the specific file that contains the keystroke aug that I wanted but it complains about not knowing a parent for relative imports. I also tried the various init.py files but it complained also. It seems to work when you take the code out and place it in a new file without relative imports and point to that.

    image

    Which page or section is this issue related to?

    https://spacy.io/usage/training#data-augmentation-custom

    https://kennethenevoldsen.github.io/augmenty/tutorials/introduction.html#Applying-the-augmentation

    documentation 
    opened by Giles-Billenness 3
  • Added sententence_subset.v1 augmenter following #48

    Added sententence_subset.v1 augmenter following #48

    Following #48, Added the sententence_subset.v1 augmenter which subsamples sentences from a document:

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    text = """Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool
    for obtaining higher performance on limited data. You can also use it to see how
    robust your model is to changes. It will sample subset of the paragraf."""
    docs = nlp(text)
    
    augmenter = augmenty.load("sententence_subset.v1",  respect_sentences = True)
    
    list(augmenty.texts(texts, augmenter, nlp))
    

    Missing:

    • [ ] Add tests
    • [ ] Add documentation
    opened by KennethEnevoldsen 3
  • Paragraf subset augmenter

    Paragraf subset augmenter

    A paragraf subset augmentation which can work on token and sentence level. It will sample a random percentage of included coherent tokens/sentences and a random token/sentence start position ensuring the former constraint is maintained. The augmenter needs to handle annotated entities and avoid breaking them.

    Input arguments: level: how often to apply augmenter min_paragraf: Minimum percentage of tokens or sentences to include. Ie. 4 sentences with min_paragraf=0.5 means it as a minimum includes 2 sentences. sentence_level: Boolean to define if token or sentence level to define

    Example - sentence level

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    texts = [
        "Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool"
        "for obtaining higher performance on limited data. You can also use it to see how "
        "robust your model is to changes. It will sample subset of the paragraf.",
    ]
    docs = nlp(texts)
    
    augmenter = augmenty.load("paragraf_subset.v1", level=1.0, min_paragraf=0.5, sentence_level=True)
    
    list(augmenty.texts(texts, augmenter, nlp))
    

    Example outputs:

    The first section:

    Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool 
    for obtaining higher performance on limited data.
    

    The middle section:

    Augmentation is a wonderful tool for obtaining higher performance on limited data. 
    You can also use it to see how robust your model is to changes.
    

    The middle section:

    You can also use it to see how robust your model is to changes. It will sample subset 
    of the paragraf.
    

    Additional thoughts:

    Possibly addition of a reverse augmenter, eg. removing a coherent section of tokens/sentences.

    additional augmenter 
    opened by martincjespersen 3
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26

    :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26

    Bumps MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 2
  • :arrow_up: Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0

    :arrow_up: Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0

    Updates the requirements on pydantic to permit the latest version.

    Release notes

    Sourced from pydantic's releases.

    v1.9.0 (2021-12-31)

    Thank you to pydantic's sponsors: @​sthagen, @​timdrijvers, @​toinbis, @​koxudaxi, @​ginomempin, @​primer-io, @​and-semakin, @​westonsteimel, @​reillysiemens, @​es3n1n, @​jokull, @​JonasKs, @​Rehket, @​corleyma, @​daddycocoaman, @​hardbyte, @​datarootsio, @​jodal, @​aminalaee, @​rafsaf, @​jqueguiner, @​chdsbd, @​kevinalh, @​Mazyod, @​grillazz, @​JonasKs, @​simw, @​leynier, @​xfenix for their kind support.

    Highlights

    v1.9.0 (2021-12-31) Changes

    v1.9.0a2 (2021-12-24) Changes

    v1.9.0a1 (2021-12-18) Changes

    • Add support for Decimal-specific validation configurations in Field(), additionally to using condecimal(), to allow better support from editors and tooling, #3507 by @​tiangolo
    • Add arm64 binaries suitable for MacOS with an M1 CPU to PyPI, #3498 by @​samuelcolvin
    • Fix issue where None was considered invalid when using a Union type containing Any or object, #3444 by @​tharradine
    • When generating field schema, pass optional field argument (of type pydantic.fields.ModelField) to __modify_schema__() if present, #3434 by @​jasujm
    • Fix issue when pydantic fail to parse typing.ClassVar string type annotation, #3401 by @​uriyyo
    • Mention Python >= 3.9.2 as an alternative to typing_extensions.TypedDict, #3374 by @​BvB93
    • Changed the validator method name in the Custom Errors example to more accurately describe what the validator is doing; changed from name_must_contain_space to value_must_equal_bar, #3327 by @​michaelrios28
    • Add AmqpDsn class, #3254 by @​kludex
    • Always use Enum value as default in generated JSON schema, #3190 by @​joaommartins
    • Add support for Mypy 0.920, #3175 by @​christianbundy
    • validate_arguments now supports extra customization (used to always be Extra.forbid), #3161 by @​PrettyWood

    ... (truncated)

    Changelog

    Sourced from pydantic's changelog.

    v1.9.0 (2021-12-31)

    Thank you to pydantic's sponsors: @​sthagen, @​timdrijvers, @​toinbis, @​koxudaxi, @​ginomempin, @​primer-io, @​and-semakin, @​westonsteimel, @​reillysiemens, @​es3n1n, @​jokull, @​JonasKs, @​Rehket, @​corleyma, @​daddycocoaman, @​hardbyte, @​datarootsio, @​jodal, @​aminalaee, @​rafsaf, @​jqueguiner, @​chdsbd, @​kevinalh, @​Mazyod, @​grillazz, @​JonasKs, @​simw, @​leynier, @​xfenix for their kind support.

    Highlights

    v1.9.0 (2021-12-31) Changes

    v1.9.0a2 (2021-12-24) Changes

    v1.9.0a1 (2021-12-18) Changes

    • Add support for Decimal-specific validation configurations in Field(), additionally to using condecimal(), to allow better support from editors and tooling, #3507 by @​tiangolo
    • Add arm64 binaries suitable for MacOS with an M1 CPU to PyPI, #3498 by @​samuelcolvin
    • Fix issue where None was considered invalid when using a Union type containing Any or object, #3444 by @​tharradine
    • When generating field schema, pass optional field argument (of type pydantic.fields.ModelField) to __modify_schema__() if present, #3434 by @​jasujm
    • Fix issue when pydantic fail to parse typing.ClassVar string type annotation, #3401 by @​uriyyo
    • Mention Python >= 3.9.2 as an alternative to typing_extensions.TypedDict, #3374 by @​BvB93
    • Changed the validator method name in the Custom Errors example to more accurately describe what the validator is doing; changed from name_must_contain_space to value_must_equal_bar, #3327 by @​michaelrios28
    • Add AmqpDsn class, #3254 by @​kludex
    • Always use Enum value as default in generated JSON schema, #3190 by @​joaommartins
    • Add support for Mypy 0.920, #3175 by @​christianbundy

    ... (truncated)

    Commits
    • fbf8002 prepare for v1.9.0 release, extra change
    • 5406423 prepare for v1.9.0 release
    • 87da9ac apply update_forward_refs to json_encoders (#3595)
    • 6f26a1c Support mypy 0.910 to 0.930 including CI tests (#3594)
    • 8ef492b build(deps): bump mypy from 0.920 to 0.930 (#3573)
    • 2d3d266 remove failing release step
    • ef46789 add step to upload pypi files to release
    • 5d6f48c prepare for v1.9.0a2
    • e882277 fix: support generic models with discriminated union (#3551)
    • edad0db fix: keep old behaviour of json() by default (#3542)
    • Additional commits viewable in compare view

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies python 
    opened by dependabot[bot] 2
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40

    :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40

    Bumps MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40.

    Release notes

    Sourced from MishaKav/pytest-coverage-comment's releases.

    Support GitHub enterprise urls

    What's Changed

    New Contributors

    Full Changelog: https://github.com/MishaKav/pytest-coverage-comment/compare/v1.1.39...v1.1.40

    Changelog

    Sourced from MishaKav/pytest-coverage-comment's changelog.

    Pytest Coverage Comment 1.1.40

    Release Date: 2022-12-03

    Changes

    • Support for url for github enterprise repositories, thanks to @​jbcumming for contribution
    • Minor readme improvements, thanks to @​AlexanderLanin for contribution
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31

    :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31

    Bumps MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31.

    Release notes

    Sourced from MishaKav/pytest-coverage-comment's releases.

    Remove link on badge

    add option to remove link on badge

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0

    :arrow_up: Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0

    Updates the requirements on streamlit to permit the latest version.

    Commits

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies python 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4.1.0

    :arrow_up: Bump actions/setup-python from 3 to 4.1.0

    Bumps actions/setup-python from 3 to 4.1.0.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.1.0

    In scope of this pull request we updated actions/cache package as the new version contains fixes for caching error handling. Moreover, we added a new input update-environment. This option allows to specify if the action shall update environment variables (default) or not.

    Update-environment input

        - name: setup-python 3.9
          uses: actions/[email protected]
          with:
            python-version: 3.9
            update-environment: false
    

    Besides, we added such changes as:

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/[email protected]
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/[email protected]
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405

    ... (truncated)

    Commits
    • c4e89fa Improve readme for 3.x and 3.11-dev style python-version (#441)
    • 0ad0f6a Merge pull request #452 from mayeut/fix-env
    • f0bcf8b Merge pull request #456 from akx/patch-1
    • af97157 doc: Add multiple wildcards example to readme
    • 364e819 Merge pull request #394 from akv-platform/v-sedoli/set-env-by-default
    • 782f81b Merge pull request #450 from IvanZosimov/ResolveVersionFix
    • 2c9de4e Remove duplicate code introduced in #440
    • 412091c Fix tests for update-environment==false
    • 78a2330 Merge pull request #451 from dmitry-shibanov/fx-pipenv-python-version
    • 96f494e trigger checks
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4

    :arrow_up: Bump actions/setup-python from 3 to 4

    Bumps actions/setup-python from 3 to 4.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/[email protected]
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/[email protected]
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405 python-path output contains Python executable path.

    • Updated zeit/ncc to vercel/ncc package: #393

    • Bugfix: fixed output for prerelease version of poetry: #409

    • Made pythonLocation environment variable consistent for Python and PyPy: #418

    • Bugfix for 3.x-dev syntax: #417

    • Other improvements: #318 #396 #384 #387 #388

    Update actions/cache version to 2.0.2

    In scope of this release we updated actions/cache package as the new version contains fixes related to GHES 3.5 (actions/setup-python#382)

    Add "cache-hit" output and fix "python-version" output for PyPy

    This release introduces new output cache-hit (actions/setup-python#373) and fix python-version output for PyPy (actions/setup-python#365)

    The cache-hit output contains boolean value indicating that an exact match was found for the key. It shows that the action uses already existing cache or not. The output is available only if cache is enabled.

    ... (truncated)

    Commits
    • d09bd5e fix: 3.x-dev can install a 3.y version (#417)
    • f72db17 Made env.var pythonLocation consistent for Python and PyPy (#418)
    • 53e1529 add support for python-version-file (#336)
    • 3f82819 Fix output for prerelease version of poetry (#409)
    • 397252c Update zeit/ncc to vercel/ncc (#393)
    • de977ad Merge pull request #412 from vsafonkin/v-vsafonkin/fix-poetry-cache-test
    • 22c6af9 Change PyPy version to rebuild cache
    • 081a3cf Merge pull request #405 from mayeut/interpreter-path
    • ff70656 feature: add a python-path output
    • fff15a2 Use pypyX.Y for PyPy python-version input (#349)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0

    :arrow_up: Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0

    Updates the requirements on streamlit to permit the latest version.

    Commits

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies python 
    opened by dependabot[bot] 1
  • Sample fake entities for entity augmenter using Faker package

    Sample fake entities for entity augmenter using Faker package

    Add sampling of entities (such as names or adresses) from https://faker.readthedocs.io/en/master/locales/da_DK.html. This tool supports random sampling of entities for numerous of languages.

    enhancement help wanted 
    opened by martincjespersen 1
  • implement an oversampling function

    implement an oversampling function

    Augmentation can be used to oversample a category.

    Imagined usage would look something like this:

    aug = augmenty.load(...)
    
    def is_positive(example):
        """return true if the example contains an entity"""
        if example.y.cats["positive"] == 1:
            return True
        return False
    
    upsampled_corpus = augumenty.oversample(corpus, augmenter=aug, conditional=is_positive, n=1000)
    
    enhancement 
    opened by KennethEnevoldsen 0
  • Back translation augmentation

    Back translation augmentation

    Augmenting of a document using back translation of various languages e.g., using huggingface models: https://huggingface.co/models?pipeline_tag=translation.

    Example blog: https://dzlab.github.io/dltips/en/pytorch/text-augmentation/

    Example sentence: Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels under the augmentation.

    English -> Danish (Google): Augmenty er et udvidelsesbibliotek baseret på spaCy til forstørrelse af tekster. Augmenty adskiller sig fra andre augmentationsbiblioteker ved, at den korrigerer (så vidt muligt) token-, sætnings- og dokumentetiketterne under augmentationen.

    Danish -> English (Google): Augmenty is an extension library based on spaCy for enlarging texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence, and document labels during augmentation.

    additional augmenter 
    opened by martincjespersen 1
  • List of potentially new augmenters

    List of potentially new augmenters

    The following is a list of potentially new augmenters. If you wish a specific augmenter to be added before others please update the issue corresponding to the augmenter (if it doesn't have one feel free to create one).

    A variation of existing augmenters:

    New augmenters

    Batch augmenters

    A combination of existing augmenters

    • [ ] EDA augmenter following the EDA paper
    additional augmenter 
    opened by KennethEnevoldsen 0
Releases(v1.0.1)
  • v1.0.1(Jun 21, 2022)

    Version

    What's Changed

    • Version 1.0.0 by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/50
    • Update replace.py by @koaning in https://github.com/KennethEnevoldsen/augmenty/pull/51

    Documentation updates

    • added faker based on PR by @martincjespersen by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/85
    • Added pre-config workflows by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/86

    New Contributors

    • @dependabot made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/46
    • @koaning made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/51
    • @martincjespersen

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/compare/v.0.0.12...v1.0.1

    Source code(tar.gz)
    Source code(zip)
  • v.0.0.12(Feb 7, 2022)

    0.0.12 (03/08/21)

    • Many bugfixes
    • Added a few more augmenters
    • Notable updates to the documentation of the package

    0.0.1 (03/08/21)

    • First version of augmenty launches 🎉
      • with more than 15 highly customizable augmenters,
      • A high-quality code-base (coverage of 96% and a codefactor A),
      • and utilities for easy application of augmenters to strings and spaCy Docs.
      • Furthermore, it also includes a series of convenience functions for combining and moderating augmentations.

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/commits/v.0.0.12

    Source code(tar.gz)
    Source code(zip)
Owner
Kenneth Enevoldsen
Interdisciplinary PhD Student on representation learning in Clinical NLP and Genetics at Aarhus University and Interacting Minds Centre
Kenneth Enevoldsen
Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

derwen.ai 1.9k Jan 06, 2023
Machine learning models from Singapore's NLP research community

SG-NLP Machine learning models from Singapore's natural language processing (NLP) research community. sgnlp is a Python package that allows you to eas

AI Singapore | AI Makerspace 21 Dec 17, 2022
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop

molten A minimal, extensible, fast and productive API framework for Python 3. Changelog: https://moltenframework.com/changelog.html Community: https:/

3.2k Dec 28, 2022
Python Implementation of ``Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT'' (Findings of ACL: ACL 2021)

BERT-for-Surprisal Python Implementation of ``Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT'' (Findings

7 Dec 05, 2022
A retro text-to-speech bot for Discord

hawking A retro text-to-speech bot for Discord, designed to work with all of the stuff you might've seen in Moonbase Alpha, using the existing command

Nick Schorr 23 Dec 25, 2022
Spooky Skelly For Python

_____ _ _____ _ _ _ | __| ___ ___ ___ | |_ _ _ | __|| |_ ___ | || | _ _ |__ || . || . || . || '

Kur0R1uka 1 Dec 23, 2021
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Intel Labs 2.9k Jan 02, 2023
VD-BERT: A Unified Vision and Dialog Transformer with BERT

VD-BERT: A Unified Vision and Dialog Transformer with BERT PyTorch Code for the following paper at EMNLP2020: Title: VD-BERT: A Unified Vision and Dia

Salesforce 44 Nov 01, 2022
Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

2 Dec 29, 2022
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

UIS-RNN Overview This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm. UIS-RNN solves the problem of s

Google 1.4k Dec 28, 2022
Two-stage text summarization with BERT and BART

Two-Stage Text Summarization Description We experiment with a 2-stage summarization model on CNN/DailyMail dataset that combines the ability to filter

Yukai Yang (Alexis) 6 Oct 22, 2022
Task-based datasets, preprocessing, and evaluation for sequence models.

SeqIO: Task-based datasets, preprocessing, and evaluation for sequence models. SeqIO is a library for processing sequential data to be fed into downst

Google 290 Dec 26, 2022
Automatically search Stack Overflow for the command you want to run

stackshell Automatically search Stack Overflow (and other Stack Exchange sites) for the command you want to ru Use the up and down arrows to change be

circuit10 22 Oct 27, 2021
The Classical Language Toolkit

Notice: This Git branch (dev) contains the CLTK's upcoming major release (v. 1.0.0). See https://github.com/cltk/cltk/tree/master and https://docs.clt

Classical Language Toolkit 754 Jan 09, 2023
Code for the paper: Sequence-to-Sequence Learning with Latent Neural Grammars

Code for the paper: Sequence-to-Sequence Learning with Latent Neural Grammars

Yoon Kim 43 Dec 23, 2022
Yuqing Xie 2 Feb 17, 2022
Python library for processing Chinese text

SnowNLP: Simplified Chinese Text Processing SnowNLP是一个python写的类库,可以方便的处理中文文本内容,是受到了TextBlob的启发而写的,由于现在大部分的自然语言处理库基本都是针对英文的,于是写了一个方便处理中文的类库,并且和TextBlob

Rui Wang 6k Jan 02, 2023
Built for cleaning purposes in military institutions

Ferramenta do AL Construído para fins de limpeza em instituições militares. Instalação Requer python = 3.2 pip install -r requirements.txt Usagem Exe

0 Aug 13, 2022
Exploring dimension-reduced embeddings

sleepwalk Exploring dimension-reduced embeddings This is the code repository. See here for the Sleepwalk web page. License and disclaimer This program

S. Anders's research group at ZMBH 91 Nov 29, 2022
Natural Language Processing at EDHEC, 2022

Natural Language Processing Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022 What

1 Feb 04, 2022