A general-purpose repo that helps you develop fast, effective NLP classifiers using Hugging Face

Last update: Mar 11, 2022

Overview

NLP Classifier

Introduction

This project trains a BERT model on any NLP classification task, then uses the trained model to make predictions on new data with batch_inference.py. The architecture can easily be extended to cover many more models.
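The batch prediction step described above can be sketched roughly as follows. This is an illustration only, not the contents of batch_inference.py: the helper names (`batched`, `batch_predict`, `load_hf_predict_fn`) and the default batch size are assumptions, and the Hugging Face loader assumes the trained model was saved in a directory that the `transformers` text-classification pipeline can load.

```python
from typing import Callable, Iterator, List, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts` with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def batch_predict(
    texts: Sequence[str],
    predict_fn: Callable[[List[str]], List[str]],
    batch_size: int = 32,  # assumed default, not taken from the repo
) -> List[str]:
    """Run `predict_fn` on each batch and return the labels in input order."""
    labels: List[str] = []
    for batch in batched(texts, batch_size):
        batch_labels = predict_fn(batch)
        if len(batch_labels) != len(batch):
            raise ValueError("predict_fn must return one label per input text")
        labels.extend(batch_labels)
    return labels


def load_hf_predict_fn(model_dir: str) -> Callable[[List[str]], List[str]]:
    """Hypothetical loader: wrap a saved Hugging Face classifier as predict_fn."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_dir)

    def predict(batch: List[str]) -> List[str]:
        # The pipeline returns one {"label": ..., "score": ...} dict per text.
        return [result["label"] for result in classifier(batch)]

    return predict
```

Keeping the model behind a plain `predict_fn` callable means the batching logic can be checked with a stub function before a trained model exists, and a different model can be swapped in without touching the loop.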

Installation

Set up

$ git clone https://github.com/abdullahtarek/nlp_classifier.git
$ cd nlp_classifier
Move train.csv and test.csv into the data folder.

Python

$ pip install -r requirements.txt
Copy the training or testing dataset into the "data" folder, then run one of:
$ python training.py
$ python batch_inference.py

Docker

$ docker build . -t nlp_classifier
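$ docker run -it -v $DATA_FOLDER:/app/data -v $LOCAL_SAVED_MODEL_FOLDER:/app/saved_models nlp_classifier python training.py
$ docker run -it -v $DATA_FOLDER:/app/data -v $LOCAL_SAVED_MODEL_FOLDER:/app/saved_models nlp_classifier python batch_inference.py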

Extra options

Managing Configurations

All configurations live in the conf folder, where you can change the data path, model path, etc.
You can also override a configuration value with a flag when running a script. Add --help after the python command to see which settings can be changed. Example: python3 batch_inference.py --help.
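As a rough illustration of how defaults and command-line overrides can fit together, here is a minimal sketch using Python's standard `argparse`. The option names (`--data_path`, `--model_path`, `--batch_size`) and their default values are hypothetical and are not taken from this repo; the real options are whatever `--help` prints for each script.

```python
import argparse
from typing import Dict, List, Optional, Union

# Hypothetical defaults; in the repo these values would come from the conf folder.
DEFAULTS: Dict[str, Union[str, int]] = {
    "data_path": "data/test.csv",
    "model_path": "saved_models",
    "batch_size": 32,
}


def build_parser() -> argparse.ArgumentParser:
    """Create a parser whose flags fall back to the values in DEFAULTS."""
    parser = argparse.ArgumentParser(description="Batch inference settings (sketch)")
    parser.add_argument("--data_path", default=DEFAULTS["data_path"],
                        help="CSV file to read input texts from")
    parser.add_argument("--model_path", default=DEFAULTS["model_path"],
                        help="folder containing the saved model")
    parser.add_argument("--batch_size", type=int, default=DEFAULTS["batch_size"],
                        help="number of texts per prediction batch")
    return parser


def load_config(argv: Optional[List[str]] = None) -> Dict[str, Union[str, int]]:
    """Return the defaults with any command-line overrides applied."""
    return vars(build_parser().parse_args(argv))
```

With this layout, `load_config(["--batch_size", "8"])` changes only the batch size and leaves the other settings at their defaults, and argparse generates the `--help` text from the `help` strings.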

Owner

Abdullah Tarek
