An easy-to-use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.

Overview

Welcome to AdaptNLP

A high-level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models for end-to-end tasks.

What is AdaptNLP?

AdaptNLP is a Python package that allows users ranging from beginner Python coders to experienced Machine Learning Engineers to leverage state-of-the-art Natural Language Processing (NLP) models and training techniques through one easy-to-use interface.

Utilizing fastai alongside HuggingFace's Transformers library and Humboldt University of Berlin's Flair library, AdaptNLP provides Machine Learning Researchers and Scientists with a modular and adaptive approach to a variety of NLP tasks, simplifying what it takes to train, perform inference with, and deploy NLP-based models and microservices.

What is the Benefit of AdaptNLP Rather Than Just Using Transformers?

Despite quick inference functionality such as the pipeline API in transformers, it is still not quite as flexible or as fast as it could be. AdaptNLP's Easy* inference modules tend to be slightly faster than the pipeline interface (at bare minimum the same speed), while also providing simple, intuitive returns that leave out any unneeded junk.
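
For a concrete picture of what the Easy* modules look like, here is a short question-answering sketch (adapted from a user snippet that appears in the comments further down this page; the model name is just one example):

    from adaptnlp import EasyQuestionAnswering
    from pprint import pprint

    ## Example query and context
    query = "What is the meaning of life?"
    context = "Machine Learning is the meaning of life."

    ## Load the QA module and run inference
    qa = EasyQuestionAnswering()
    best_answer, best_n_answers = qa.predict_qa(
        query=query,
        context=context,
        n_best_size=5,
        mini_batch_size=1,
        model_name_or_path="distilbert-base-uncased-distilled-squad",
    )

    print(best_answer)      ## single best answer span
    pprint(best_n_answers)  ## top 5 candidate answers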

Along with this, the integration of the fastai library makes the code needed to train or run inference on your models completely modular through the fastai Callback system. Rather than writing your entire torch loop, if a model needs anything special, a Callback can be written in fewer than 10 lines of code to achieve your specific functionality, as sketched below.
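
An illustrative sketch of such a Callback (the gradient-clipping behavior here is a hypothetical example, not part of AdaptNLP's API):

    import torch
    from fastai.callback.core import Callback

    class ClipGrads(Callback):
        "Clip the gradient norm right before each optimizer step."
        def before_step(self):
            torch.nn.utils.clip_grad_norm_(self.learn.model.parameters(), max_norm=1.0)

    ## Pass it to any fastai fit call, e.g. learn.fit_one_cycle(1, cbs=ClipGrads())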

Finally, when training your model, fastai is at the forefront of bringing best practices for achieving state-of-the-art training into one library, with new research methodologies heavily tested before integration. As such, AdaptNLP fully supports training with the One-Cycle policy, and using new optimizer combinations such as the Ranger optimizer with cosine annealing, through simple one-line fitting functions (fit_one_cycle and fit_flat_cos).
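
As a minimal sketch of those fitting calls, assuming learn is any fastai Learner (such as one produced by AdaptNLP's tuning API):

    from fastai.learner import Learner
    from fastai.optimizer import ranger

    def fit_one_cycle_example(learn: Learner, epochs: int = 3, lr: float = 2e-3):
        ## One-Cycle policy: the learning rate warms up, then anneals back down
        learn.fit_one_cycle(epochs, lr_max=lr)

    def fit_flat_cos_example(learn: Learner, epochs: int = 3, lr: float = 2e-3):
        ## Flat learning rate for most of training, then cosine annealing;
        ## pairs well with a Learner built with opt_func=ranger
        learn.fit_flat_cos(epochs, lr=lr)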

Installation Directions

PyPI

To install from PyPI, please use:

pip install adaptnlp

Or if you have pip3:

pip3 install adaptnlp

Conda (Coming Soon)

Developmental Builds

To install any development builds, please follow the directions below to install directly from git:

Stable Master Branch: The master branch is generally not updated much except for hotfixes and new releases. To install, please use:

pip install git+https://github.com/Novetta/adaptnlp

Developmental Branch: Note that this branch can become unstable, and it is only recommended for contributors or those who really want to test out new technology. Please make sure the latest tests are passing (a green checkmark on the commit message) before trying this branch out. You can install the developmental builds with:

pip install git+https://github.com/Novetta/adaptnlp@dev

Docker Images

There are actively updated Docker images hosted on Novetta's DockerHub.

The guide to each tag is as follows:

  • latest: This is the latest pypi release and installs a complete package that is CUDA capable
  • dev: These are occasionally built developmental builds at certain stages. They are built from the dev branch and are generally stable
  • *api: The API builds are for the REST-API

To pull and run any AdaptNLP image immediately, you can run:

docker run -itp 8888:8888 novetta/adaptnlp:TAG

Replace TAG with any of the aforementioned tags.

Afterwards, check localhost:8888 or localhost:8888/lab to access the notebook containers.

Navigating the Documentation

The AdaptNLP library is built with nbdev, so any documentation page you find (including this one!) can be run directly as a Jupyter Notebook. Each page also includes an "Open in Colab" button at the top that will open the notebook in Google Colaboratory, allowing immediate access to the code.

The documentation is split into six sections, each with a specific purpose:

Getting Started

This group contains quick access to the homepage, an overview of the AdaptNLP Cookbooks, and how to contribute.

Models and Model Hubs

These contain any relevant documentation for the AdaptiveModel class, the HuggingFace Hub model search integration, and the Result class that the various inference APIs return.
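
As a sketch of how the hub search feeds into inference (the calls mirror a user snippet in the comments below; the imports and the exact model name should be treated as an example):

    from adaptnlp import EasySequenceClassifier, HFModelHub

    hub = HFModelHub()
    ## Browse models by task, or look one up by (partial) name
    models = hub.search_model_by_task('text-classification')
    model = hub.search_model_by_name('nlptown/bert-base', user_uploaded=True)[0]

    classifier = EasySequenceClassifier()
    result = classifier.tag_text(
        text="This is a good text example",
        model_name_or_path=model,
        mini_batch_size=1,
    )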

Class API

This section contains the module documentation for the inference framework, the tuning framework, as well as the utilities and foundations for the AdaptNLP library.

Inference and Training Cookbooks

These two sections provide quick access to single-use recipes for starting any AdaptNLP project for a particular task, with easy-to-use code designed for that specific use case. There are currently over 13 different tutorials available, with more coming soon.

NLP Services with FastAPI

This section provides directions on how to use the AdaptNLP REST API for deploying your models quickly with FastAPI. A configuration sketch follows.
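
As a configuration sketch (mirroring a command that appears in the comments below; the image tag and model names are examples), the REST containers are driven by environment variables:

    docker run -itp 5000:5000 \
      -e TOKEN_TAGGING_MODE='ner' \
      -e TOKEN_TAGGING_MODEL='ner-ontonotes-fast' \
      -e SEQUENCE_CLASSIFICATION_MODEL='nlptown/bert-base-multilingual-uncased-sentiment' \
      novetta/adaptnlp:api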

Contributing

There is a contribution guide available here.

Testing

AdaptNLP is built on the nbdev framework. To run all tests, please do the following:

  1. pip install nbverbose
  2. git clone https://github.com/Novetta/adaptnlp
  3. cd adaptnlp
  4. pip install -e .
  5. nbdev_test_nbs

This will run every notebook and ensure that all tests have passed. Please see the nbdev documentation for more information about it.

Contact

Please contact Zachary Mueller at [email protected] with questions or comments regarding AdaptNLP.

Follow us on Twitter at @TheZachMueller and @AdaptNLP for updates and NLP dialogue.

License

This project is licensed under the terms of the Apache 2.0 license.

Comments
  • multi-label classification / paperswithcode dataset

    Hi guys,

    Hope you are all well !

    I was wondering if adaptnlp can handle multi-label classification with 1560 labels.

    More precisely, I would like to apply it to paperswithcode dataset where labels are called tasks.

    Refs:

    Thanks for any insights or inputs on that.

    Cheers, X

    opened by ghost 7
  • cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'

    Describe the bug: Your demo Colab Notebook "Custom Fine-Tuning and Training with Transformer Models" doesn't work and generates the error in the title (screenshot attached in the original issue).

    bug 
    opened by lematmat 5
  • Significant slowdown in EasyTokenTagger release 0.2.0

    I'm experiencing a slowdown in NER performance using EasyTokenTagger and 'ner-ontonotes' after updating to release 0.2.0. Have there been any underlying changes to how the tagger object works?

    Specifically, I am dealing with a very large chunk of text. Prior to this release, the NER tagging took around 15 seconds for this particular text. Now, it's taking 15+ minutes the first time but subsequent calls on that text are very quick. Is there some sort of caching or indexing that's being done now? I'd imagine this could create a lot of overhead for large chunks of text.

    opened by mkongsiri-Novetta 5
  • Can't load big dataset

    Describe the bug: It happens when I run learning_rate = finetuner.find_learning_rate(**learning_rate_finder_configs) in the tutorial. I have a big dataset with 200k rows, and each row has a text of around 200 words.

    In your code, when you instantiate the TextDataset, the line tokenized_text = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text)) takes an eternity for a text of 20 million words. Do you think it could be done in a better/faster way, e.g. by keeping the rows as they are?

    For the record:

    • Time for 100 characters: 0.0003399848937988281s
    • Time for 1000 characters: 0.00124359130859375s
    • Time for 10 000 characters: 0.012135982513427734s
    • Time for 100 000 characters: 0.2131056785583496s
    • Time for 1 000 000 characters: 8.782422542572021s
    • Time for 10 000 000 characters: 734.5397665500641s

    Can't reach the end of the full TextDataset (109 610 928 characters).

    To Reproduce: run the tutorial with a big dataset

    opened by NicoLivesey 5
  • AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

    Describe the bug: Trying to freeze an LMFineTuner based on Camembert weights, I get:


    AttributeError                            Traceback (most recent call last)
    <ipython-input> in <module>
          6 }
          7 finetuner = LMFineTuner(**ft_configs)
    ----> 8 finetuner.freeze()

    ~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/adaptnlp/transformers/finetuning.py in freeze(self)
       1630         """Freeze last classification layer group only
       1631         """
    -> 1632         layers_len = len(list(self.model.cls.parameters()))
       1633         self.freeze_to(-layers_len)
       1634

    ~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
        573         if name in modules:
        574             return modules[name]
    --> 575         raise AttributeError("'{}' object has no attribute '{}'".format(
        576             type(self).__name__, name))
        577

    AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

    To Reproduce

    from adaptnlp import LMFineTuner
    train_file = "path/to/train" 
    valid_file = "path/to/valid"
    ft_configs = {
                  "train_data_file": train_file,
                  "eval_data_file": valid_file,
                  "model_type": "camembert",
                  "model_name_or_path": "camembert-base",
                 }
    finetuner = LMFineTuner(**ft_configs)
    finetuner.freeze()
    

    Expected behavior: No error

    Desktop (please complete the following information):

    • OS: Amazon Linux
    • Browser Chrome
    opened by NicoLivesey 4
  • AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'

    Cannot use the pool option to generate embeddings (instead of the default rnn).

    A snippet for the problem:

    embedding_type='albert-xxlarge-v2'
    embedding_methods=["pool"]
    doc_embeddings = EasyDocumentEmbeddings(embedding_type, methods = embedding_methods)
    
    

    This is the error I get:

      File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 91, in __init__
       self._initial_setup(self.label_dict, **kwargs)
     File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 97, in _initial_setup
       document_embeddings: DocumentRNNEmbeddings = self.encoder.rnn_embeddings
    AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'
    

    Expected behavior would be to successfully obtain an easy document embeddings object with no errors

    Running on debian buster, python3.7

    If someone could give me a fix or a workaround or if I'm using this incorrectly, then please let me know

    opened by blerstpub 3
  • EasySequenceClassifier tag_text function returns None for FlairSequenceClassifier model

    Hi! I tried to follow the tutorial for training a custom sequence classifier: https://novetta.github.io/adaptnlp/tutorial/training-sequence-classification.html The last step returns empty sentences where labels were expected: sentences = classifier.tag_text(example_text, model_name_or_path=OUTPUT_DIR)

    To Reproduce the behavior:

    from adaptnlp import EasySequenceClassifier
    from flair.data import Sentence
    
    OUTPUT_DIR = "…/best-model.pt"    # my custom model
    classifier = EasySequenceClassifier()
    
    ex_text = "This is a good text example"
    example_text=[Sentence(ex_text)]
    
    sentences = classifier.tag_text(text=example_text, model_name_or_path=OUTPUT_DIR, mini_batch_size=1)
    print("Label output:\n")
    print(sentences)
    

    Returns

    2020-12-28 17:44:31,111 loading file .../best-model.pt
    Label output:
    
    None
    

    Surprisingly, labels got added to example_text; print(example_text) returns:

    [Sentence: "This is a good text example" [− Tokens: 17 − Sentence-Labels: {'label': [0 (0.8812)]}]]

    Proposed explanation/contribution: I think I know the reason for the unexpected behavior and would be happy to help. classifier.tag_text creates a FlairSequenceClassifier. FlairSequenceClassifier initiates a flair.models.TextClassifier and uses the TextClassifier predict method within its own predict method. But the flair.models.TextClassifier predict method returns None, because the labels are added directly to the sentences. I can re-write the FlairSequenceClassifier predict method to return the Sentences with labels instead of None.

    opened by DinaraN 3
  • Sequence classification using REST API fails with models except en-sentiment

    Sequence classification over REST API using any model except for en-sentiment fails with:

    File "/usr/local/lib/python3.6/dist-packages/starlette/routing.py", line 41, in app response = await func(request) File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 197, in app dependant=dependant, values=values, is_coroutine=is_coroutine File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 147, in run_endpoint_function return await dependant.call(**values) File "./app/main.py", line 87, in sequence_classifier text=text, mini_batch_size=1, model_name_or_path=_SEQUENCE_CLASSIFICATION_MODEL File "/adaptnlp/adaptnlp/sequence_classification.py", line 285, in tag_text return classifier.predict(text=text, mini_batch_size=mini_batch_size, **kwargs,) File "/adaptnlp/adaptnlp/sequence_classification.py", line 140, in predict text_sent.add_label(label) TypeError: add_label() missing 1 required positional argument: 'value'

    Reproducible with:

    docker run -itp 5000:5000 \
      -e TOKEN_TAGGING_MODE='ner' \
      -e TOKEN_TAGGING_MODEL='ner-ontonotes-fast' \
      -e SEQUENCE_CLASSIFICATION_MODEL='nlptown/bert-base-multilingual-uncased-sentiment' \
      achangnovetta/adaptnlp-rest:latest \
      bash

    opened by VogtAI 3
  • AdaptNLP v0.2.x Additional Features Discussion

    There are a lot of ideas floating around for feature implementations, so this thread provides a mini roadmap and a place to think about adaptnlp's progression.

    Ideas can be stated freely in this thread and do not replace feature-request issue posts.

    • [x] Tokenizer: Start integrating tokenizers all across adaptnlp for speed and performance enhancements for training and inference.
    • [x] Summarization: Add the NLP task of summarization using a document-level encoder based on transformer language models
    • [x] GPU: Multi-GPU and mixed-precision are prevalent in AdaptNLP, but the implementation can be improved and debugged
    • ~~FastAPI Batch-Serving: Improve on the concurrent calls with batch processing from the NLP models (maybe try to make it CPU and GPU agnostic for ease-of-use)~~
    • ~~Model Downloading: Start structuring a way to download and potentially upload pre-trained NLP-task models~~
    enhancement 
    opened by aychang95 3
  • Data API

    We probably should have a data API of some form, that ties into https://github.com/Novetta/adaptnlp/issues/128

    Ideally it should simply prep a dataset for tokenization of a model, or tokenize the data itself.

    For now we cover two inputs:

    1. Individual texts
    2. CSV

    We should support something akin to fastai's get_y, but with decent defaults so that customization is available, but not needed.

    Ideally something like:

    dset = TaskDataset.from_df(
      df,  # Can be fname or dataframe
      get_x = ColReader('text'),
      get_y = ColReader('label'),
      splitter = RandomSplitter(),
      model = 'bert-base-uncased', # The name/type of downstream model
      task = "ner" # Or use a `Task.NER` namespace class
    )
    

    And further:

    dset.dataloaders(bs=8, collate_fn=data_collator)
    

    It reads extremely similarly to the fastai API, but we do not use the fastai API, as for text doing it this way is a bit easier.

    The highest level API would look like so:

    dls = TaskDataLoaders.from_df(df, 'text', 'label', model='bert-base-uncased')
    

    We should note the model used, and when integrating it with the tuning API, make note if something is off with the model entered.

    enhancement 
    opened by muellerzr 2
  • ImportError: cannot import name 'EasyTokenTagger'

    Describe the bug: I tried to run the code in the tutorial

    from adaptnlp import EasyTokenTagger
    
    
    ## Example Text
    example_text = "Novetta's headquarters is located in Mclean, Virginia."
    
    ## Load the token tagger module and tag text with the NER model 
    tagger = EasyTokenTagger()
    sentences = tagger.tag_text(text=example_text, model_name_or_path="ner")
    
    ## Output tagged token span results in Flair's Sentence object model
    for sentence in sentences:
        for entity in sentence.get_spans("ner"):
            print(entity)
    

    and it gave me the error:

    ...
      File "/home/rajiv/Documents/dev/python/nltk-trial/adaptnlp.py", line 2, in <module>
        from adaptnlp import EasyTokenTagger
    ImportError: cannot import name 'EasyTokenTagger'
    

    Desktop (please complete the following information):

    • OS: Ubuntu
    • Version: 20.04
    • Python: 3.6.9
    opened by RAbraham 2
  • classifier.tag_text on GPU!

    hi, I want to classify texts:

    classifier = EasySequenceClassifier()
    hub = HFModelHub()
    hub.search_model_by_task('text-classification')
    model = hub.search_model_by_name('nlptown/bert-base', user_uploaded=True)[0]
    sentence = classifier.tag_text(text=inputs, model_name_or_path=model, mini_batch_size=1)

    Q1: How can I force it to run on the CPU?

    Q2: Now I have a GPU but I can't run it successfully. My errors:

    ...
    FileNotFoundError: [Errno 2] No such file or directory: 'nlptown/bert-base-multilingual-uncased-sentiment'

    During handling of the above exception, another exception occurred:

    ...
    RuntimeError: CUDA error: out of memory
    CUDA kernel errors might be asynchronously reported at some other API call,
    so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by topliftarm 0
  • Unified Training API

    The Training API will use fastai under the hood, and we'll make a few functions to build general datasets.

    Tasks and sample datasets to use:

    Other Information

    Task APIs should have a simple user interface, i.e. the high-level API can only input specific options, while the mid-level has access to the full fastai Learner params.

    Example mid-level API I'm thinking about:

    dls = some_build_data_thing()
    tuner = QAFineTuner(dls, 'bert-base-cased')
    tuner.tune(
      scheduler = 'fit_flat_cos',
      n_epochs = 3,
      lr = None,
      suggest_method = 'valley', # Triggers if lr is None
      additional_callbacks = []
    )
    

    And its high-level:

    tuner = QAFineTuner.from_csv(
      question_column_name = "question",
      answer_column_name = "answer",
      model = "bert-base-cased"
    )
    tuner.tune(...)
    

    We should automatically pull in proper metrics for each task, but users have the option to bring in their own as well and pass it to QAFineTuner (good defaults)

    Tuners should also have a func like QAFineTuner.from_csv() to build the dataset in-house

    enhancement 
    opened by muellerzr 2
  • Save context in QuestionAnswering and re-use it

    I noticed that when we run any code snippet, it converts the text to vectors or some similar thing. For example, in this code snippet:

    from adaptnlp import EasyQuestionAnswering 
    from pprint import pprint
    
    ## Example Query and Context 
    query = "What is the meaning of life?"
    context = "Machine Learning is the meaning of life."
    top_n = 5
    
    ## Load the QA module and run inference on results 
    qa = EasyQuestionAnswering()
    best_answer, best_n_answers = qa.predict_qa(query=query, context=context, n_best_size=top_n, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
    
    ## Output top answer as well as top 5 answers
    print(best_answer)
    pprint(best_n_answers)
    

    It converts both the query and the context to vectors first. What if we have a very long context and a lot of queries? Each time, it will convert the context to a vector. I think there should be a way to save the context vector and re-use it instead of creating it again and again.

    enhancement 
    opened by talhaanwarch 1
  • Stretch Goals

    • [x] HuggingFace raw embeddings over Flair
    • [x] Try and integrate Callbacks for text generation and other classes that aren't using it

      Note: Didn't do this for text generation, more complex than it's worth

    • [x] Use fastrelease (with conda)
    • [x] Improve test coverage
    • [x] GH CI for testing Mac, Windows, and Linux, similar to how fastai has it setup
    • [x] nbdev?
    • [x] Windows support
    • [x] Use Pipeline for inference

      Note: Pipeline is slower on many tasks that AdaptNLP covers, tests are in place to ensure that this is always true

    • [ ] 1.0.0: Unified training framework for at least 4 NLP tasks
    enhancement 
    opened by muellerzr 0
Releases (v0.3.7)
  • v0.3.7(Nov 10, 2021)

  • v0.3.6(Nov 9, 2021)

  • v0.3.3(Sep 3, 2021)

    Bug Squashed

    • Embeddings were conjoined rather than separated out by word
    • Question Answering results would only return the first instance, rather than the top n instances
    • AdaptiveTuner can accept a label_names parameter for where the labels in a batch are present
    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Aug 11, 2021)

    New Features

    • A new Data API that integrates with HuggingFace's Dataset class

    • A new Tuner API for training and fine-tuning Transformer models

    • Full integration of the latest fastai library for full access to state-of-the-art practices when training and fine-tuning a model. As improvements are made to the library, AdaptNLP will update to accommodate them

    • A new Result API that most inference modules return. This is a filterable result, ensuring that you only get the most relevant information when returning a prediction from the Easy* modules

    Breaking Changes

    • The train and eval capabilities in the Easy* modules no longer exist, and all training-related functionalities have migrated to the Tuner API
    • LanguageModelFineTuner no longer exists, and the same tuning functionality is in LanguageModelTuner

    Bugs Squashed

    • max_len Attribute Error (127)
    • Integrate a complete Data API (milestone) (129)
    • Use the latest fastcore (132)
    • Fix unused kwarg arguments in text generation (134)
    • Fix name 'df' is not defined (135)
    Source code(tar.gz)
    Source code(zip)
  • 0.2.3(May 5, 2021)

    Breaking Changes:

    • New versions of AdaptNLP will require a minimum torch version of 1.7, and flair of 0.9 (currently we install via git until 0.9/0.81 is released)

    New Features

    Bugs Squashed

    • Fix accessing bart-large-cnn (110)
    • Fix SAVE_STATE_WARNING (114)
    Source code(tar.gz)
    Source code(zip)
  • v0.2.2(Jan 11, 2021)

    Official AdaptNLP Docker Images updated

    • Using NVIDIA NGC Container Registry CUDA base images #101
    • All images should be deployable via Kubeflow Jupyter Servers
    • Cleaner Python virtualenv setup #101
    • Official readme can be found at https://github.com/Novetta/adaptnlp/blob/master/docker/README.md

    Minor Bug Fixes

    • Fix token tagging REST application type check #92
    • Semantic fixes in readme #94
    • Standalone microservice REST application images #93
    • Python 3.7+ is now an official requirement #97
    Source code(tar.gz)
    Source code(zip)
  • v0.2.1(Sep 17, 2020)

    Updated from nlp 0.4 to datasets 1.0+, plus multi-label training fixes for sequence classification.

    EasySequenceClassifier.train() Updates

    • Integrates datasets.Dataset now
    • Swapped order of formatting and label column renaming due to labels not showing up from torch data batches #87

    Tutorials and Documentation

    • Documentation and sequence classification tutorials have been updated to address nlp->datasets name change
    • Broken links also updated

    ODSC Europe Workshop 2020: Notebooks and Colab

    • ODSC Europe 2020 workshop materials now available in repository "/tutorials/Workshop"
    • Easy to run notebooks and colab links aligned with the tutorials are available
    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Sep 1, 2020)

    Updated to transformers 3+, nlp 0.4+, flair 0.6+, pandas 1+

    New Features!

    New and "easier" training framework with easy modules: EasySequenceClassifier.train() and EasySequenceClassifier.evaluate()

    • Integrates nlp.Dataset and transformers.Trainer for a streamlined training workflow
    • Tutorials, notebooks, and colab links available
    • Sequence Classification task has been implemented, other NLP tasks are in the works
    • SequenceClassifierTrainer is still available, but will be transitioned into the EasySequenceClassifier and deprecated

    New and "easier" LMFineTuner

    • Integrates transformers.Trainer for a streamlined training workflow
    • Older LMFineTuner is still available as LMFineTunerManual, but will be deprecated in later releases
    • Tutorials, notebooks, and colab links available

    EasyTextGenerator

    • New module for text generation. GPT models are currently supported; other models may work but are still experimental
    • Tutorials, notebooks, and colab links available

    Tutorials and Documentation

    • Documentation has been edited and updated to include additional features like the change in training frameworks and fine-tuning
    • The sequence classification tutorial is a good indicator of the direction we are going with the training and fine-tuning framework

    Notebooks and Colab

    • Easy to run notebooks and colab links aligned with the tutorials are available

    Bug fixes

    • Minor bug and implementation error fixes from flair upgrades
    Source code(tar.gz)
    Source code(zip)
  • v0.1.6(May 1, 2020)

  • v0.1.5(Apr 17, 2020)

    Updated to Transformers 2.8.0 which now includes the ELECTRA language model

    EasySummarizer and EasyTranslator Bug Fix #63

    • Address mini batch output format issue for language model heads for the summarization and translation task

    Tutorials and Workshop #64

    • Add the ODSC Timeline Generator notebooks along with colab links
    • Small touch ups in tutorial notebooks

    Documentation

    • Address missing model_name_or_path param in some easy modules
    Source code(tar.gz)
    Source code(zip)
  • v0.1.4(Apr 2, 2020)

    Updated to Transformers 2.7.0 which includes the Bart and T5 Language Models!

    EasySummarizer #47

    • New module for summarizing documents. These support both the T5 and Bart pre-trained models provided by Hugging Face.
    • Helper objects for the easy module that can be run as standalone instances TransformersSummarizer

    EasyTranslator #49

    • New module for translating documents with T5 pre-trained models provided by Hugging Face.
    • Helper objects for the easy module that can be run as standalone instances TransformersTranslator

    Documentation and Tutorials #52

    • New Class API documentation for EasySummarizer and EasyTranslator
    • New tutorial guides, initial notebooks, and links to colab for the above as well
    • Readme provides quickstart samples that show examples from the notebooks #53

    Other

    • Dockerhub repo for adaptnlp-rest added here https://hub.docker.com/r/achangnovetta/adaptnlp-rest
    • Upgraded CircleCI allowing us to run #40
    • Added Nightly build #39
    Source code(tar.gz)
    Source code(zip)
  • v0.1.3(Mar 6, 2020)

    Sequence Classification and Question Answering updates to integrate Hugging Face's public models.

    EasySequenceClassifier

    • Can now take Flair and Transformers pre-trained sequence classification models as input in the model_name_or_path param
    • Helper objects for the easy module that can be run as standalone instances TransformersSequenceClassifier FlairSequenceClassifier

    EasyQuestionAnswering

    • Can now take Transformers pre-trained question answering models as input in the model_name_or_path param
    • Helper objects for the easy module that can be run as standalone instances TransformersQuestionAnswering

    Documentation and Tutorials

    Documentation has been updated with the above implementations

    • Tutorials updated with better examples to convey changes
    • Class API docs updated
    • Tutorial notebooks updated
    • Colab notebooks better displayed on readme

    FastAPI Rest

    FastAPI updated to latest (0.52.0). FastAPI endpoints can now be stood up and deployed with any huggingface sequence classification or question answering model specified as an env var arg.

    Dependencies

    Transformers pinned for stable updates

    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Feb 19, 2020)

    AdaptNLP's first published release on GitHub.

    Easy API:

    • EasyTokenTagger
    • EasySequenceClassifier
    • EasyWordEmbeddings
    • EasyStackedEmbeddings
    • EasyDocumentEmbeddings

    Training and Fine-tuning Interface

    • SequenceClassifierTrainer
    • LMFineTuner

    FastAPI AdaptNLP App for Streamlined Rapid NLP-Model Deployment

    • adaptnlp/rest
    • configured to run any pretrained and custom trained flair/adaptnlp models
    • compatible with nvidia-docker for GPU use
    • AdaptNLP integration but loosely coupled

    Documentation

    • Documentation release with walk-through guides, tutorials, and Class API docs of the above
    • Built with mkdocs, material for mkdocs, and mkautodoc

    Tutorials

    • IPython/Colab Notebooks provided and updated to showcase AdaptNLP Modules

    Continuous Integration

    • CircleCI build and tests running successfully and minimally
    • Github workflow for pypi publishing added

    Formatting

    • Flake8 and Black adherence
    Source code(tar.gz)
    Source code(zip)