Project Insight
NLP as a Service
Introduction
Project Insight provides NLP as a service, with a code base for both the front-end GUI (Streamlit) and the backend server (FastAPI), and demonstrates the use of transformer models on various downstream NLP tasks.
The downstream NLP tasks covered:
-  News Classification 
-  Entity Recognition 
-  Sentiment Analysis 
-  Summarization 
-  Information Extraction (to do)
Users can select a model from the drop-down to run inference.
Users can also call the backend FastAPI server directly for command-line inference.
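As a rough illustration of command-line use, the sketch below builds a JSON POST request for one of the services using only Python's standard library. The base URL matches the documentation URLs listed under Setup and Documentation; the `/predict` route and the `model` and `text` payload fields are assumptions made for illustration, so check each service's docs page for the actual route and schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # prefix used by the docs URLs in this README


def build_request(task: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one NLP service."""
    # Assumption: each service exposes a "/predict" route that accepts
    # {"model": ..., "text": ...}. Confirm this against the service's /docs page.
    url = f"{BASE_URL}/{task}/predict"
    body = json.dumps({"model": model, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(task: str, model: str, text: str) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_request(task, model, text), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, a call such as `run_inference("sentiment", "distilbert", "Great movie")` would send the request; the shape of the response depends on the service.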
Features of the solution
- Python Code Base: Built using FastAPI and Streamlit, so the complete code base is in Python.
- Expandable: The backend is designed so that it can be extended with more Transformer-based models, which then become available in the front-end app automatically.
- Micro-Services: The backend follows a microservices architecture, with a Dockerfile for each service and Nginx as a reverse proxy to each independently running service. This makes it easy to update, maintain, start, and stop individual NLP services.
 
Installation
- Clone the Repo.
- Run Docker Compose to spin up the FastAPI-based backend service.
- Run the Streamlit app with the streamlit run command.
Setup and Documentation
-  Download the models.
 - Download the models from here
 - Save them in the specific model folders inside the src_fastapi folder.
 
-  Running the backend service.
 - Go to the src_fastapi folder
 - Run the Docker Compose command
 $ cd src_fastapi
 src_fastapi:~$ sudo docker-compose up -d
- Go to the 
-  Running the frontend app.
 - Go to the src_streamlit folder
 - Run the app with the streamlit run command
 $ cd src_streamlit
 src_streamlit:~$ streamlit run NLPfily.py
- Go to the 
-  Access to FastAPI documentation: Since this is a microservice-based design, every NLP task has its own separate documentation.
 - News Classification: http://localhost:8080/api/v1/classification/docs
 - Sentiment Analysis: http://localhost:8080/api/v1/sentiment/docs
 - NER: http://localhost:8080/api/v1/ner/docs
 - Summarization: http://localhost:8080/api/v1/summary/docs
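To confirm that each service came up behind the proxy, a small script can request every documentation URL listed above. This is a minimal sketch using only the standard library; the `is_up` and `check_services` helper names are hypothetical, and the three-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request

# Documentation URLs exactly as listed in this README.
DOCS_URLS = {
    "classification": "http://localhost:8080/api/v1/classification/docs",
    "sentiment": "http://localhost:8080/api/v1/sentiment/docs",
    "ner": "http://localhost:8080/api/v1/ner/docs",
    "summary": "http://localhost:8080/api/v1/summary/docs",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def check_services() -> dict:
    """Map each service name to whether its docs page is reachable."""
    return {name: is_up(url) for name, url in DOCS_URLS.items()}
```

Calling `check_services()` after `docker-compose up -d` returns one boolean per service, which makes it easy to spot a container that failed to start.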
 
Project Details
Demonstration
Directory Details
-  Front End: The front-end code is in the src_streamlit folder, along with the Dockerfile and requirements.txt.
-  Back End: The back-end code is in the src_fastapi folder.
 - This folder contains a directory for each task: Classification, ner, summary, etc.
 - Each NLP task is implemented as a microservice with its own FastAPI server, requirements, and Dockerfile, so that each can be maintained and managed independently.
 - Each NLP task has its own folder, and within it each trained model has its own folder. For example:
   sentiment
     app
       api
         distilbert
           model.bin
           network.py
           tokenizer files
         roberta
           model.bin
           network.py
           tokenizer files
 - For each new model under a service, a new folder has to be added.
 - Each model folder needs the following files:
   - Model bin file.
   - Tokenizer files.
   - network.py, defining the model class if a customized model is used.
 
-  config.json: This file contains the details of the models in the backend and the dataset they are trained on.
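A front end can populate its model drop-down by reading this file. The sketch below assumes the layout shown in the "How to Add a new Model" example (a task name, then `model-N` entries with `name` and `info`); the `model_names` helper is hypothetical, and the JSON is inlined with a shortened `info` text so the snippet runs on its own.

```python
import json

# Inlined sample following the config.json layout used in this README;
# in the project this would be read from the config.json file instead.
SAMPLE_CONFIG = """
{
  "classification": {
    "model-1": {"name": "DistilBERT", "info": "Trained on the News Aggregator Dataset."},
    "model-2": {"name": "BERT", "info": "Model Info"}
  }
}
"""


def model_names(config: dict, task: str) -> list:
    """Return the display names of all models registered for a task."""
    return [entry["name"] for entry in config.get(task, {}).values()]


config = json.loads(SAMPLE_CONFIG)
print(model_names(config, "classification"))  # ['DistilBERT', 'BERT']
```

A task that has no entry in the file simply yields an empty list, so the drop-down stays empty rather than raising an error.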
 
How to Add a new Model
-  Fine-tune a transformer model for the specific task. You can leverage the transformers-tutorials.
-  Save the model files and tokenizer files, and also create a network.py script if you use a customized training network.
-  Create a directory within the NLP task folder, using the model name as the directory name, and save all the files in this directory.
-  Update config.json with the model details and dataset details.
-  Update <service>pro.py with the correct imports and the conditions where the model is imported. For example, for a new BERT model in the classification task, do the following:
 - Create a new directory named bert in classification/app/api/.
 - Update config.json with the following:
   "classification": {
       "model-1": {
           "name": "DistilBERT",
           "info": "This model is trained on News Aggregator Dataset from UC Irvine Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)"
       },
       "model-2": {
           "name": "BERT",
           "info": "Model Info"
       }
   }
 - Update classificationpro.py with the following snippets:
   # Only if a customized class is used
   from classification.bert import BertClass

   # Section where the model is selected
   if model == "bert":
       self.model = BertClass()
       self.tokenizer = BertTokenizerFast.from_pretrained(self.path)
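Before restarting a service, the steps above can be sanity-checked with a short script. This sketch assumes the folder layout described under Directory Details (a `model.bin` file in each model folder), a folder name that is the lower-cased model name (as with `bert` for "BERT"), and the `config.json` structure from the example; the `check_new_model` helper and its arguments are hypothetical, and tokenizer file names are not checked because they vary by model.

```python
import json
from pathlib import Path


def check_new_model(api_dir: str, config_path: str, task: str, model_name: str) -> list:
    """Return a list of problems found for a newly added model (empty if none)."""
    problems = []

    # Assumption: the model folder is the lower-cased model name,
    # e.g. classification/app/api/bert for the model named "BERT".
    model_dir = Path(api_dir) / model_name.lower()
    if not model_dir.is_dir():
        problems.append(f"missing directory: {model_dir}")
    elif not (model_dir / "model.bin").is_file():
        problems.append(f"missing model file: {model_dir / 'model.bin'}")

    # config.json should list the model under its task.
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    names = [entry.get("name") for entry in config.get(task, {}).values()]
    if model_name not in names:
        problems.append(f"{model_name!r} not listed under {task!r} in config.json")

    return problems
```

For the BERT example, this could be called as `check_new_model("classification/app/api", "config.json", "classification", "BERT")`, with both paths adjusted to wherever the files live in your checkout.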
 
License
This project is licensed under the GPL-3.0 License - see the LICENSE.md file for details

