Augmenty is an augmentation library based on spaCy for augmenting texts.

Overview

Augmenty: The cherry on top of your NLP pipeline

PyPI version python version Code style: black github actions pytest github actions docs github coverage CodeFactor Streamlit App pip downloads

Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of highly flexible augmenters, Augmenty provides a series of tools for working with augmenters, including combining and moderating augmenters. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the assigned labels under the augmentation, thus making many of the augmenters valid for training more than simply sentence classification.

🔧 Installation

To get started using augmenty simply install it using pip by running the following line in your terminal:

pip install augmenty

Do note that this is a minimal installation. As some augmenters requires additional packages please write the following line to install all dependencies.

pip install augmenty[all]

For more detailed instructions on installing augmenty, including specific language support, see the installation instructions.

🍒 Simple Example

The following shows a simple example of how you can quickly augment text using Augmenty. For more on using augmenty see the usage guides.

import spacy
import augmenty

nlp = spacy.load("en_core_web_sm")

docs = nlp.pipe(["Augmenty is a great tool for text augmentation"])

entity_augmenter = augmenty.load("ents_replace.v1", 
                                 ent_dict = {{"ORG": [["spaCy"], ["spaCy", "Universe"]]})

for doc in augmenty.docs(docs, augmenter=entity_augmenter)
    print(doc)
spaCy Universe is a great tool for text augmentation.

📖 Documentation

Documentation
📚 Usage Guides Guides and instruction on how to use augmenty and its features.
📰 News and changelog New additions, changes and version history.
🎛 API References The detailed reference for augmenty's API. Including function documentation
🍒 Augmenters Contains a full list of current augmenters in augmenty.
😎 Demo A simple streamlit demo to try out the augmenters.

💬 Where to ask questions

Type
🚨 Bug Reports GitHub Issue Tracker
🎁 Feature Requests & Ideas GitHub Issue Tracker
👩‍💻 Usage Questions GitHub Discussions
🗯 General Discussion GitHub Discussions
🍒 Adding an Augmenter Adding an augmenter

🤔 FAQ

How do I test the code and run the test suite?

augmenty comes with an extensive test suite. In order to run the tests, you'll usually want to clone the repository and build augmenty from the source. This will also install the required development dependencies and test utilities defined in the requirements.txt.

pip install -r requirements.txt
pip install pytest

python -m pytest

which will run all the test in the augmenty/tests folder.

Specific tests can be run using:

python -m pytest augmenty/tests/test_docs.py

Code Coverage If you want to check code coverage you can run the following:

pip install pytest-cov

python -m pytest --cov=.

Does augmenty run on X?

augmenty is intended to run on all major OS, this includes Windows (latest version), MacOS (Catalina) and the latest version of Linux (Ubuntu). Below you can see if augmenty passes its test suite for the system of interest. Please note these are only the systems augmenty is being actively tested on, if you run on a similar system (e.g. an earlier version of Linux) augmenty will likely run there as well, if not please create an issue.

Operating System Status
Ubuntu/Linux (Latest) github actions pytest ubuntu
MacOS (Catalina) github actions pytest catalina
Windows (Latest) github actions pytest windows

How is the documentation generated?

augmenty uses sphinx to generate documentation. It uses the Furo theme with a custom styling.

To make the documentation you can run:

# install sphinx, themes and extensions
pip install sphinx furo sphinx-copybutton sphinxext-opengraph

# generate html from documentations

make -C docs html

Many of these augmenters are completely useless for training?

That is true, some of the augmenters are rarely something you would augment with during training. For instance randomly adding or removing spacing. However, augmentation can just as well be used to test whether a model is robust to certain variations.


Can I use augmenty without using spacy?

Indeed augmenty contains convenience functions for applying augmentation directly to raw texts. Check out the getting started guide to learn how.


🎓 Citing this work

If you use this library in your research, please cite:

@inproceedings{augmenty2021,
    title={Augmenty, the cherry on top of your NLP pipeline},
    author={Enevoldsen, Kenneth and Hansen, Lasse},
    year={2021}
}
Comments
  • Use of augmenty with spacy config files for training

    Use of augmenty with spacy config files for training

    I didn't see any documentation on how to import these augmenters when using spacy 3.0's config and command line system when training. Is it possible to use it in this sense? If so, how?

    apon further review, for the command line to register new augmentations, the flag: -- code <code.py> Needs to be set when calling the training. I have tried to point to the specific file that contains the keystroke aug that I wanted but it complains about not knowing a parent for relative imports. I also tried the various init.py files but it complained also. It seems to work when you take the code out and place it in a new file without relative imports and point to that.

    image

    Which page or section is this issue related to?

    https://spacy.io/usage/training#data-augmentation-custom

    https://kennethenevoldsen.github.io/augmenty/tutorials/introduction.html#Applying-the-augmentation

    documentation 
    opened by Giles-Billenness 3
  • Added sententence_subset.v1 augmenter following #48

    Added sententence_subset.v1 augmenter following #48

    Following #48, Added the sententence_subset.v1 augmenter which subsamples sentences from a document:

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    text = """Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool
    for obtaining higher performance on limited data. You can also use it to see how
    robust your model is to changes. It will sample subset of the paragraf."""
    docs = nlp(text)
    
    augmenter = augmenty.load("sententence_subset.v1",  respect_sentences = True)
    
    list(augmenty.texts(texts, augmenter, nlp))
    

    Missing:

    • [ ] Add tests
    • [ ] Add documentation
    opened by KennethEnevoldsen 3
  • Paragraf subset augmenter

    Paragraf subset augmenter

    A paragraf subset augmentation which can work on token and sentence level. It will sample a random percentage of included coherent tokens/sentences and a random token/sentence start position ensuring the former constraint is maintained. The augmenter needs to handle annotated entities and avoid breaking them.

    Input arguments: level: how often to apply augmenter min_paragraf: Minimum percentage of tokens or sentences to include. Ie. 4 sentences with min_paragraf=0.5 means it as a minimum includes 2 sentences. sentence_level: Boolean to define if token or sentence level to define

    Example - sentence level

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    texts = [
        "Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool"
        "for obtaining higher performance on limited data. You can also use it to see how "
        "robust your model is to changes. It will sample subset of the paragraf.",
    ]
    docs = nlp(texts)
    
    augmenter = augmenty.load("paragraf_subset.v1", level=1.0, min_paragraf=0.5, sentence_level=True)
    
    list(augmenty.texts(texts, augmenter, nlp))
    

    Example outputs:

    The first section:

    Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool 
    for obtaining higher performance on limited data.
    

    The middle section:

    Augmentation is a wonderful tool for obtaining higher performance on limited data. 
    You can also use it to see how robust your model is to changes.
    

    The middle section:

    You can also use it to see how robust your model is to changes. It will sample subset 
    of the paragraf.
    

    Additional thoughts:

    Possibly addition of a reverse augmenter, eg. removing a coherent section of tokens/sentences.

    additional augmenter 
    opened by martincjespersen 3
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26

    :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26

    Bumps MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 2
  • :arrow_up: Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0

    :arrow_up: Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0

    Updates the requirements on pydantic to permit the latest version.

    Release notes

    Sourced from pydantic's releases.

    v1.9.0 (2021-12-31)

    Thank you to pydantic's sponsors: @​sthagen, @​timdrijvers, @​toinbis, @​koxudaxi, @​ginomempin, @​primer-io, @​and-semakin, @​westonsteimel, @​reillysiemens, @​es3n1n, @​jokull, @​JonasKs, @​Rehket, @​corleyma, @​daddycocoaman, @​hardbyte, @​datarootsio, @​jodal, @​aminalaee, @​rafsaf, @​jqueguiner, @​chdsbd, @​kevinalh, @​Mazyod, @​grillazz, @​JonasKs, @​simw, @​leynier, @​xfenix for their kind support.

    Highlights

    v1.9.0 (2021-12-31) Changes

    v1.9.0a2 (2021-12-24) Changes

    v1.9.0a1 (2021-12-18) Changes

    • Add support for Decimal-specific validation configurations in Field(), additionally to using condecimal(), to allow better support from editors and tooling, #3507 by @​tiangolo
    • Add arm64 binaries suitable for MacOS with an M1 CPU to PyPI, #3498 by @​samuelcolvin
    • Fix issue where None was considered invalid when using a Union type containing Any or object, #3444 by @​tharradine
    • When generating field schema, pass optional field argument (of type pydantic.fields.ModelField) to __modify_schema__() if present, #3434 by @​jasujm
    • Fix issue when pydantic fail to parse typing.ClassVar string type annotation, #3401 by @​uriyyo
    • Mention Python >= 3.9.2 as an alternative to typing_extensions.TypedDict, #3374 by @​BvB93
    • Changed the validator method name in the Custom Errors example to more accurately describe what the validator is doing; changed from name_must_contain_space to value_must_equal_bar, #3327 by @​michaelrios28
    • Add AmqpDsn class, #3254 by @​kludex
    • Always use Enum value as default in generated JSON schema, #3190 by @​joaommartins
    • Add support for Mypy 0.920, #3175 by @​christianbundy
    • validate_arguments now supports extra customization (used to always be Extra.forbid), #3161 by @​PrettyWood

    ... (truncated)

    Changelog

    Sourced from pydantic's changelog.

    v1.9.0 (2021-12-31)

    Thank you to pydantic's sponsors: @​sthagen, @​timdrijvers, @​toinbis, @​koxudaxi, @​ginomempin, @​primer-io, @​and-semakin, @​westonsteimel, @​reillysiemens, @​es3n1n, @​jokull, @​JonasKs, @​Rehket, @​corleyma, @​daddycocoaman, @​hardbyte, @​datarootsio, @​jodal, @​aminalaee, @​rafsaf, @​jqueguiner, @​chdsbd, @​kevinalh, @​Mazyod, @​grillazz, @​JonasKs, @​simw, @​leynier, @​xfenix for their kind support.

    Highlights

    v1.9.0 (2021-12-31) Changes

    v1.9.0a2 (2021-12-24) Changes

    v1.9.0a1 (2021-12-18) Changes

    • Add support for Decimal-specific validation configurations in Field(), additionally to using condecimal(), to allow better support from editors and tooling, #3507 by @​tiangolo
    • Add arm64 binaries suitable for MacOS with an M1 CPU to PyPI, #3498 by @​samuelcolvin
    • Fix issue where None was considered invalid when using a Union type containing Any or object, #3444 by @​tharradine
    • When generating field schema, pass optional field argument (of type pydantic.fields.ModelField) to __modify_schema__() if present, #3434 by @​jasujm
    • Fix issue when pydantic fail to parse typing.ClassVar string type annotation, #3401 by @​uriyyo
    • Mention Python >= 3.9.2 as an alternative to typing_extensions.TypedDict, #3374 by @​BvB93
    • Changed the validator method name in the Custom Errors example to more accurately describe what the validator is doing; changed from name_must_contain_space to value_must_equal_bar, #3327 by @​michaelrios28
    • Add AmqpDsn class, #3254 by @​kludex
    • Always use Enum value as default in generated JSON schema, #3190 by @​joaommartins
    • Add support for Mypy 0.920, #3175 by @​christianbundy

    ... (truncated)

    Commits
    • fbf8002 prepare for v1.9.0 release, extra change
    • 5406423 prepare for v1.9.0 release
    • 87da9ac apply update_forward_refs to json_encoders (#3595)
    • 6f26a1c Support mypy 0.910 to 0.930 including CI tests (#3594)
    • 8ef492b build(deps): bump mypy from 0.920 to 0.930 (#3573)
    • 2d3d266 remove failing release step
    • ef46789 add step to upload pypi files to release
    • 5d6f48c prepare for v1.9.0a2
    • e882277 fix: support generic models with discriminated union (#3551)
    • edad0db fix: keep old behaviour of json() by default (#3542)
    • Additional commits viewable in compare view

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies python 
    opened by dependabot[bot] 2
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40

    :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40

    Bumps MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40.

    Release notes

    Sourced from MishaKav/pytest-coverage-comment's releases.

    Support GitHub enterprise urls

    What's Changed

    New Contributors

    Full Changelog: https://github.com/MishaKav/pytest-coverage-comment/compare/v1.1.39...v1.1.40

    Changelog

    Sourced from MishaKav/pytest-coverage-comment's changelog.

    Pytest Coverage Comment 1.1.40

    Release Date: 2022-12-03

    Changes

    • Support for url for github enterprise repositories, thanks to @​jbcumming for contribution
    • Minor readme improvements, thanks to @​AlexanderLanin for contribution
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31

    :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31

    Bumps MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31.

    Release notes

    Sourced from MishaKav/pytest-coverage-comment's releases.

    Remove link on badge

    add option to remove link on badge

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0

    :arrow_up: Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0

    Updates the requirements on streamlit to permit the latest version.

    Commits

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies python 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4.1.0

    :arrow_up: Bump actions/setup-python from 3 to 4.1.0

    Bumps actions/setup-python from 3 to 4.1.0.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.1.0

    In scope of this pull request we updated actions/cache package as the new version contains fixes for caching error handling. Moreover, we added a new input update-environment. This option allows to specify if the action shall update environment variables (default) or not.

    Update-environment input

        - name: setup-python 3.9
          uses: actions/[email protected]
          with:
            python-version: 3.9
            update-environment: false
    

    Besides, we added such changes as:

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/[email protected]
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/[email protected]
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405

    ... (truncated)

    Commits
    • c4e89fa Improve readme for 3.x and 3.11-dev style python-version (#441)
    • 0ad0f6a Merge pull request #452 from mayeut/fix-env
    • f0bcf8b Merge pull request #456 from akx/patch-1
    • af97157 doc: Add multiple wildcards example to readme
    • 364e819 Merge pull request #394 from akv-platform/v-sedoli/set-env-by-default
    • 782f81b Merge pull request #450 from IvanZosimov/ResolveVersionFix
    • 2c9de4e Remove duplicate code introduced in #440
    • 412091c Fix tests for update-environment==false
    • 78a2330 Merge pull request #451 from dmitry-shibanov/fx-pipenv-python-version
    • 96f494e trigger checks
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4

    :arrow_up: Bump actions/setup-python from 3 to 4

    Bumps actions/setup-python from 3 to 4.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/[email protected]
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/[email protected]
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405 python-path output contains Python executable path.

    • Updated zeit/ncc to vercel/ncc package: #393

    • Bugfix: fixed output for prerelease version of poetry: #409

    • Made pythonLocation environment variable consistent for Python and PyPy: #418

    • Bugfix for 3.x-dev syntax: #417

    • Other improvements: #318 #396 #384 #387 #388

    Update actions/cache version to 2.0.2

    In scope of this release we updated actions/cache package as the new version contains fixes related to GHES 3.5 (actions/setup-python#382)

    Add "cache-hit" output and fix "python-version" output for PyPy

    This release introduces new output cache-hit (actions/setup-python#373) and fix python-version output for PyPy (actions/setup-python#365)

    The cache-hit output contains boolean value indicating that an exact match was found for the key. It shows that the action uses already existing cache or not. The output is available only if cache is enabled.

    ... (truncated)

    Commits
    • d09bd5e fix: 3.x-dev can install a 3.y version (#417)
    • f72db17 Made env.var pythonLocation consistent for Python and PyPy (#418)
    • 53e1529 add support for python-version-file (#336)
    • 3f82819 Fix output for prerelease version of poetry (#409)
    • 397252c Update zeit/ncc to vercel/ncc (#393)
    • de977ad Merge pull request #412 from vsafonkin/v-vsafonkin/fix-poetry-cache-test
    • 22c6af9 Change PyPy version to rebuild cache
    • 081a3cf Merge pull request #405 from mayeut/interpreter-path
    • ff70656 feature: add a python-path output
    • fff15a2 Use pypyX.Y for PyPy python-version input (#349)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0

    :arrow_up: Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0

    Updates the requirements on streamlit to permit the latest version.

    Commits

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies python 
    opened by dependabot[bot] 1
  • Sample fake entities for entity augmenter using Faker package

    Sample fake entities for entity augmenter using Faker package

    Add sampling of entities (such as names or adresses) from https://faker.readthedocs.io/en/master/locales/da_DK.html. This tool supports random sampling of entities for numerous of languages.

    enhancement help wanted 
    opened by martincjespersen 1
  • implement an oversampling function

    implement an oversampling function

    Augmentation can be used to oversample a category.

    Imagined usage would look something like this:

    aug = augmenty.load(...)
    
    def is_positive(example):
        """return true if the example contains an entity"""
        if example.y.cats["positive"] == 1:
            return True
        return False
    
    upsampled_corpus = augumenty.oversample(corpus, augmenter=aug, conditional=is_positive, n=1000)
    
    enhancement 
    opened by KennethEnevoldsen 0
  • Back translation augmentation

    Back translation augmentation

    Augmenting of a document using back translation of various languages e.g., using huggingface models: https://huggingface.co/models?pipeline_tag=translation.

    Example blog: https://dzlab.github.io/dltips/en/pytorch/text-augmentation/

    Example sentence: Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels under the augmentation.

    English -> Danish (Google): Augmenty er et udvidelsesbibliotek baseret på spaCy til forstørrelse af tekster. Augmenty adskiller sig fra andre augmentationsbiblioteker ved, at den korrigerer (så vidt muligt) token-, sætnings- og dokumentetiketterne under augmentationen.

    Danish -> English (Google): Augmenty is an extension library based on spaCy for enlarging texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence, and document labels during augmentation.

    additional augmenter 
    opened by martincjespersen 1
  • List of potentially new augmenters

    List of potentially new augmenters

    The following is a list of potentially new augmenters. If you wish a specific augmenter to be added before others please update the issue corresponding to the augmenter (if it doesn't have one feel free to create one).

    A variation of existing augmenters:

    New augmenters

    Batch augmenters

    A combination of existing augmenters

    • [ ] EDA augmenter following the EDA paper
    additional augmenter 
    opened by KennethEnevoldsen 0
Releases(v1.0.1)
  • v1.0.1(Jun 21, 2022)

    Version

    What's Changed

    • Version 1.0.0 by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/50
    • Update replace.py by @koaning in https://github.com/KennethEnevoldsen/augmenty/pull/51

    Documentation updates

    • added faker based on PR by @martincjespersen by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/85
    • Added pre-config workflows by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/86

    New Contributors

    • @dependabot made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/46
    • @koaning made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/51
    • @martincjespersen

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/compare/v.0.0.12...v1.0.1

    Source code(tar.gz)
    Source code(zip)
  • v.0.0.12(Feb 7, 2022)

    0.0.12 (03/08/21)

    • Many bugfixes
    • Added a few more augmenters
    • Notable updates to the documentation of the package

    0.0.1 (03/08/21)

    • First version of augmenty launches 🎉
      • with more than 15 highly customizable augmenters,
      • A high-quality code-base (coverage of 96% and a codefactor A),
      • and utilities for easy application of augmenters to strings and spaCy Docs.
      • Furthermore, it also includes a series of convenience functions for combining and moderating augmentations.

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/commits/v.0.0.12

    Source code(tar.gz)
    Source code(zip)
Owner
Kenneth Enevoldsen
Interdisciplinary PhD Student on representation learning in Clinical NLP and Genetics at Aarhus University and Interacting Minds Centre
Kenneth Enevoldsen
Data preprocessing rosetta parser for python

datapreprocessing_rosetta_parser I've never done any NLP or text data processing before, so I wanted to use this hackathon as a learning opportunity,

ASReview hackathon for Follow the Money 2 Nov 28, 2021
Open-World Entity Segmentation

Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec

DV Lab 408 Dec 29, 2022
Code for hyperboloid embeddings for knowledge graph entities

Implementation for the papers: Self-Supervised Hyperboloid Representations from Logical Queries over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao,

30 Dec 10, 2022
Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL-2021).

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources Description This is the repository for the paper Unifying Cross-

Sapienza NLP group 16 Sep 09, 2022
Ecommerce product title recognition package

revizor This package solves task of splitting product title string into components, like type, brand, model and article (or SKU or product code or you

Bureaucratic Labs 16 Mar 03, 2022
:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

(Framework for Adapting Representation Models) What is it? FARM makes Transfer Learning with BERT & Co simple, fast and enterprise-ready. It's built u

deepset 1.6k Dec 27, 2022
Code for the paper "Are Sixteen Heads Really Better than One?"

Are Sixteen Heads Really Better than One? This repository contains code to reproduce the experiments in our paper Are Sixteen Heads Really Better than

Paul Michel 143 Dec 14, 2022
Material for GW4SHM workshop, 16/03/2022.

GW4SHM Workshop Wednesday, 16th March 2022 (13:00 – 15:15 GMT): Presented by: Dr. Rhodri Nelson, Imperial College London Project website: https://www.

Devito Codes 1 Mar 16, 2022
Fidibo.com comments Sentiment Analyser

Fidibo.com comments Sentiment Analyser Introduction This project first asynchronously grab Fidibo.com books comment data using grabber.py and then sav

Iman Kermani 3 Apr 15, 2022
AutoGluon: AutoML for Text, Image, and Tabular Data

AutoML for Text, Image, and Tabular Data AutoGluon automates machine learning tasks enabling you to easily achieve strong predictive performance in yo

Amazon Web Services - Labs 5.2k Dec 29, 2022
Malware-Related Sentence Classification

Malware-Related Sentence Classification This repo contains the code for the ICTAI 2021 paper "Enrichment of Features for Malware-Related Sentence Clas

Chau Nguyen 1 Mar 26, 2022
Code repository of the paper Neural circuit policies enabling auditable autonomy published in Nature Machine Intelligence

Code repository of the paper Neural circuit policies enabling auditable autonomy published in Nature Machine Intelligence

9 Jan 08, 2023
Synthetic data for the people.

zpy: Synthetic data in Blender. Website • Install • Docs • Examples • CLI • Contribute • Licence Abstract Collecting, labeling, and cleaning data for

Zumo Labs 253 Dec 21, 2022
Awesome Treasure of Transformers Models Collection

💁 Awesome Treasure of Transformers Models for Natural Language processing contains papers, videos, blogs, official repo along with colab Notebooks. 🛫☑️

Ashish Patel 577 Jan 07, 2023
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
CMeEE 数据集医学实体抽取

医学实体抽取_GlobalPointer_torch 介绍 思想来自于苏神 GlobalPointer,原始版本是基于keras实现的,模型结构实现参考现有 pytorch 复现代码【感谢!】,基于torch百分百复现苏神原始效果。 数据集 中文医学命名实体数据集 点这里申请,很简单,共包含九类医学

85 Dec 28, 2022
NLP applications using deep learning.

NLP-Natural-Language-Processing NLP applications using deep learning like text generation etc. 1- Poetry Generation: Using a collection of Irish Poem

KASHISH 1 Jan 27, 2022
A framework for cleaning Chinese dialog data

A framework for cleaning Chinese dialog data

Yida 136 Dec 20, 2022
189 Jan 02, 2023
BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model

BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model

303 Dec 17, 2022