Repository for Project Insight: NLP as a Service

Overview

Project Insight

NLP as a Service

Project Insight

GitHub issues GitHub forks Github Stars GitHub license Code style: black

Contents

  1. Introduction
  2. Installation
  3. Project Details
  4. License

Introduction

Project Insight is designed to create NLP as a service with code base for both front end GUI (streamlit) and backend server (FastApi) the usage of transformers models on various downstream NLP task.

The downstream NLP tasks covered:

  • News Classification

  • Entity Recognition

  • Sentiment Analysis

  • Summarization

  • Information Extraction To Do

The user can select different models from the drop down to run the inference.

The users can also directly use the backend fastapi server to have a command line inference.

Features of the solution

  • Python Code Base: Built using Fastapi and Streamlit making the complete code base in Python.
  • Expandable: The backend is desinged in a way that it can be expanded with more Transformer based models and it will be available in the front end app automatically.
  • Micro-Services: The backend is designed with a microservices architecture, with dockerfile for each service and leveraging on Nginx as a reverse proxy to each independently running service.
    • This makes it easy to update, manitain, start, stop individual NLP services.

Installation

  • Clone the Repo.
  • Run the Docker Compose to spin up the Fastapi based backend service.
  • Run the Streamlit app with the streamlit run command.

Setup and Documentation

  1. Download the models

    • Download the models from here
    • Save them in the specific model folders inside the src_fastapi folder.
  2. Running the backend service.

    • Go to the src_fastapi folder
    • Run the Docker Compose comnand
    $ cd src_fastapi
    src_fastapi:~$ sudo docker-compose up -d
  3. Running the frontend app.

    • Go to the src_streamlit folder
    • Run the app with the streamlit run command
    $ cd src_streamlit
    src_streamlit:~$ streamlit run NLPfily.py
  4. Access to Fastapi Documentation: Since this is a microservice based design, every NLP task has its own seperate documentation

Project Details

Demonstration

Project Insight Demo

Directory Details

  • Front End: Front end code is in the src_streamlit folder. Along with the Dockerfile and requirements.txt

  • Back End: Back End code is in the src_fastapi folder.

    • This folder contains directory for each task: Classification, ner, summary...etc
    • Each NLP task has been implemented as a microservice, with its own fastapi server and requirements and Dockerfile so that they can be independently mantained and managed.
    • Each NLP task has its own folder and within each folder each trained model has 1 folder each. For example:
    - sentiment
        > app
            > api
                > distilbert
                    - model.bin
                    - network.py
                    - tokeniser files
                >roberta
                    - model.bin
                    - network.py
                    - tokeniser files
    
    • For each new model under each service a new folder will have to be added.

    • Each folder model will need the following files:

      • Model bin file.
      • Tokenizer files
      • network.py Defining the class of the model if customised model used.
    • config.json: This file contains the details of the models in the backend and the dataset they are trained on.

How to Add a new Model

  1. Fine Tune a transformer model for specific task. You can leverage the transformers-tutorials

  2. Save the model files, tokenizer files and also create a network.py script if using a customized training network.

  3. Create a directory within the NLP task with directory_name as the model name and save all the files in this directory.

  4. Update the config.json with the model details and dataset details.

  5. Update the <service>pro.py with the correct imports and conditions where the model is imported. For example for a new Bert model in Classification Task, do the following:

    • Create a new directory in classification/app/api/. Directory name bert.

    • Update config.json with following:

      "classification": {
      "model-1": {
          "name": "DistilBERT",
          "info": "This model is trained on News Aggregator Dataset from UC Irvin Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)"
      },
      "model-2": {
          "name": "BERT",
          "info": "Model Info"
      }
      }
    • Update classificationpro.py with the following snippets:

      Only if customized class used

      from classification.bert import BertClass

      Section where the model is selected

      if model == "bert":
          self.model = BertClass()
          self.tokenizer = BertTokenizerFast.from_pretrained(self.path)

License

This project is licensed under the GPL-3.0 License - see the LICENSE.md file for details

Owner
Abhishek Kumar Mishra
Eat, Sleep, Pray, and Code * An Operations Innovation Lead at IHS Markit during working hours. * Love to read manga and cook new cuisines.
Abhishek Kumar Mishra
Summarization module based on KoBART

KoBART-summarization Install KoBART pip install git+https://github.com/SKT-AI/KoBART#egg=kobart Requirements pytorch==1.7.0 transformers==4.0.0 pytor

seujung hwan, Jung 148 Dec 28, 2022
Sentello is python script that simulates the anti-evasion and anti-analysis techniques used by malware.

sentello Sentello is a python script that simulates the anti-evasion and anti-analysis techniques used by malware. For techniques that are difficult t

Malwation 62 Oct 02, 2022
The Easy-to-use Dialogue Response Selection Toolkit for Researchers

The Easy-to-use Dialogue Response Selection Toolkit for Researchers

GMFTBY 32 Nov 13, 2022
Pretty-doc - Composable text objects with python

pretty-doc from __future__ import annotations from dataclasses import dataclass

Taine Zhao 2 Jan 17, 2022
构建一个多源(公众号、RSS)、干净、个性化的阅读环境

2C 构建一个多源(公众号、RSS)、干净、个性化的阅读环境 作为一名微信公众号的重度用户,公众号一直被我设为汲取知识的地方。随着使用程度的增加,相信大家或多或少会有一个比较头疼的问题——广告问题。 假设你关注的公众号有十来个,若一个公众号两周接一次广告,理论上你会面临二十多次广告,实际上会更多,运

howie.hu 678 Dec 28, 2022
Spooky Skelly For Python

_____ _ _____ _ _ _ | __| ___ ___ ___ | |_ _ _ | __|| |_ ___ | || | _ _ |__ || . || . || . || '

Kur0R1uka 1 Dec 23, 2021
Beyond the Imitation Game collaborative benchmark for enormous language models

BIG-bench 🪑 The Beyond the Imitation Game Benchmark (BIG-bench) will be a collaborative benchmark intended to probe large language models, and extrap

Google 1.3k Jan 01, 2023
auto_code_complete is a auto word-completetion program which allows you to customize it on your need

auto_code_complete v1.3 purpose and usage auto_code_complete is a auto word-completetion program which allows you to customize it on your needs. the m

RUO 2 Feb 22, 2022
Codename generator using WordNet parts of speech database

codenames Codename generator using WordNet parts of speech database References: https://possiblywrong.wordpress.com/2021/09/13/code-name-generator/ ht

possiblywrong 27 Oct 30, 2022
A method for cleaning and classifying text using transformers.

NLP Translation and Classification The repository contains a method for classifying and cleaning text using NLP transformers. Overview The input data

Ray Chamidullin 0 Nov 15, 2022
vits chinese, tts chinese, tts mandarin

vits chinese, tts chinese, tts mandarin 史上训练最简单,音质最好的语音合成系统

AmorTX 12 Dec 14, 2022
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022
Conditional Transformer Language Model for Controllable Generation

CTRL - A Conditional Transformer Language Model for Controllable Generation Authors: Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong,

Salesforce 1.7k Dec 28, 2022
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN

artificial intelligence cosmic love and attention fire in the sky a pyramid made of ice a lonely house in the woods marriage in the mountains lantern

Phil Wang 2.3k Jan 01, 2023
NLPShala , the best IDE for all Natural language processing tasks.

The revolutionary IDE for all NLP (Natural language processing) stuffs on the internet.

Abhi 3 Aug 08, 2021
Predict an emoji that is associated with a text

Sentiment Analysis Sentiment analysis in computational linguistics is a general term for techniques that quantify sentiment or mood in a text. Can you

Tetsumichi(Telly) Umada 30 Sep 07, 2022
Machine Learning Course Project, IMDB movie review sentiment analysis by lstm, cnn, and transformer

IMDB Sentiment Analysis This is the final project of Machine Learning Courses in Huazhong University of Science and Technology, School of Artificial I

Daniel 0 Dec 27, 2021
State of the Art Natural Language Processing

Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide

John Snow Labs 3k Jan 05, 2023
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing 🎉 🎉 🎉 We released the 2.0.0 version with TF2 Support. 🎉 🎉 🎉 If you

Eliyar Eziz 2.3k Dec 29, 2022
Ecommerce product title recognition package

revizor This package solves task of splitting product title string into components, like type, brand, model and article (or SKU or product code or you

Bureaucratic Labs 16 Mar 03, 2022