This is a general repo that helps you develop fast, effective NLP classifiers using Hugging Face.

Last update: Mar 11, 2022

NLP Classifier

Introduction

This project trains a BERT model on any NLP classification task and uses the trained model to make predictions on new data with batch_inference.py. The architecture can easily be extended to cover many more models.
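
The batch prediction step can be pictured with a minimal sketch. It is not the code in batch_inference.py: the names `batched`, `predict_in_batches`, and the stand-in `toy_predict` function are assumptions for illustration, and a real run would call the trained BERT model in place of `toy_predict`.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(
    texts: Sequence[str],
    predict_batch: Callable[[Sequence[str]], List[int]],
    batch_size: int = 32,
) -> List[int]:
    """Run `predict_batch` over `texts` one batch at a time and collect the labels."""
    labels: List[int] = []
    for batch in batched(texts, batch_size):
        labels.extend(predict_batch(batch))
    return labels


def toy_predict(batch: Sequence[str]) -> List[int]:
    """Hypothetical stand-in for a model: label 1 if the text contains 'good'."""
    return [1 if "good" in text.lower() else 0 for text in batch]


if __name__ == "__main__":
    texts = ["Good movie", "Bad plot", "so good"]
    print(predict_in_batches(texts, toy_predict, batch_size=2))
```

Processing the data in fixed-size batches keeps memory use bounded no matter how many rows the input file contains.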

Installation

Set up

$ git clone https://github.com/abdullahtarek/nlp_classifier.git
$ cd nlp_classifier
Move train.csv and test.csv into the data folder.
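
Before training or running inference, it can help to confirm that both files are in place. The following is a small sketch, not part of the repo: the helper name `missing_data_files` is hypothetical, and it assumes the data folder is named `data` and sits in the current working directory.

```python
from pathlib import Path
from typing import List

# File names taken from the setup step above.
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_data_files(data_dir: str = "data") -> List[str]:
    """Return the expected CSV files that are not present in `data_dir`."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data folder:", ", ".join(missing))
    else:
        print("data folder is ready")
```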

Python

$ pip install -r requirements.txt
Copy the training or testing dataset into the "data" folder, then run one of:
$ python training.py
$ python batch_inference.py

Docker

$ docker build . -t nlp_classifier
$ docker run -it -v $DATA_FOLDER:/app/data -v $LOCAL_SAVED_MODEL_FOLDER:/app/saved_models nlp_classifier python batch_inference.py
To train instead, end the command with python training.py.

Extra options

Managing Configurations

All configurations are in the conf folder, where you can change the data path, model path, etc.
You can also override a configuration with a command-line flag when running a script. Append --help to the python command to see which configurations you can change. Example: python3 batch_inference.py --help.
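
To illustrate how command-line flags can override stored defaults, here is a minimal sketch using Python's standard argparse module. It shows the general pattern only and is not the repo's actual code: the flag names `--data_path` and `--model_path` and their default values are hypothetical, so run the scripts with --help to see the real options.

```python
import argparse
from typing import List, Optional

# Hypothetical defaults standing in for values kept in the conf folder.
DEFAULTS = {"data_path": "data/test.csv", "model_path": "saved_models"}


def parse_config(argv: Optional[List[str]] = None) -> dict:
    """Merge command-line overrides on top of the default configuration."""
    parser = argparse.ArgumentParser(description="Example configuration overrides")
    for name, value in DEFAULTS.items():
        parser.add_argument(f"--{name}", default=value, help=f"default: {value}")
    return vars(parser.parse_args(argv))


if __name__ == "__main__":
    # Override one setting and keep the default for the other.
    print(parse_config(["--model_path", "saved_models/run1"]))
```

Any flag that is not passed keeps its default, so a single setting can be changed for one run without editing the configuration files.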

Owner

Abdullah Tarek
