Basic yet complete Machine Learning pipeline for NLP tasks

Last update: Aug 22, 2022

Related tags

Text Data & NLP ml-pipeline

Overview

This repository accompanies the article on building basic yet complete ML pipelines for solving NLP tasks.

Requirements

Docker

telnet

Please refer to the installation instructions for your system if needed.

Running the pipeline

The whole pipeline of four services (mail server, database, prediction service and orchestrator) can be started with one command:

docker-compose -f docker-compose.yaml up --build

The services should then start printing log messages.
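Before sending any mail, it can help to confirm that the exposed ports accept connections. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the ports (3025 for the mail server, 5432 for the database) are the ones used in the later sections of this README.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(services, host="localhost"):
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}


# Assumed port mapping, taken from the telnet and DB client sections of this README.
SERVICES = {"mail server": 3025, "database": 5432}
```

Calling check_services(SERVICES) once the containers are up should report True for both entries; a False value usually means the service is still starting or the port is not published.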

Sending an email

The pipeline is triggered by an unread email appearing in the mailbox. To send one, the telnet utility can be used.

Connecting to the mail server over SMTP: telnet localhost 3025

Sending the email with telnet:

EHLO user
MAIL FROM:<[email protected]>
RCPT TO:<user>
DATA
Subject: Hello World
 
Hello!

She works at Apple now but before that she worked at Microsoft.
.
QUIT
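The same message could also be sent from a script instead of an interactive telnet session. This is a rough sketch using Python's standard smtplib, assuming the mail server accepts unauthenticated SMTP on localhost:3025 as in the telnet example. The function names and the sender address sender@example.com are placeholders, not values from the repository.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, host="localhost", port=3025):
    """Send the message over plain SMTP (host and port assumed from the telnet example)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Placeholder sender; the recipient "user" matches the RCPT TO line above.
message = build_message(
    sender="sender@example.com",
    recipient="user",
    subject="Hello World",
    body="Hello!\n\nShe works at Apple now but before that she worked at Microsoft.\n",
)
```

With the pipeline running, send_message(message) should have the same effect as the telnet session above.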

If everything went well, something like this should appear in the logs:

orchestrator_1                   | Polling mailbox...
prediction-worker_1              | INFO:     172.19.0.5:55294 - "POST /predict HTTP/1.1" 200 OK
orchestrator_1                   | Recorded to DB with id=34: [{'entity_text': 'Apple', 'start': 24, 'end': 29}, {'entity_text': 'Microsoft', 'start': 58, 'end': 67}]
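Each recorded entity carries its text together with start and end offsets. The sketch below illustrates how such spans relate to the input, assuming they are character offsets with an exclusive end. The helper is hypothetical and uses plain substring search rather than the repository's model, and the offsets it produces are relative to the sample sentence alone, so they differ from those in the log above.

```python
def find_span(text, entity_text):
    """Return an entity record in the same shape as the log output above."""
    start = text.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in text")
    return {"entity_text": entity_text, "start": start, "end": start + len(entity_text)}


sentence = "She works at Apple now but before that she worked at Microsoft."
entities = [find_span(sentence, name) for name in ("Apple", "Microsoft")]

# Slicing the text with start:end recovers the entity text.
for ent in entities:
    assert sentence[ent["start"]:ent["end"]] == ent["entity_text"]
```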

Checking the result

The result should also be recorded in the database. To check that, any DB client can be used with the following connection parameters:

host: localhost
port: 5432
database: maildb
username: pguser
password: password

Then run the query SELECT * FROM mail LIMIT 10.
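The check can also be scripted. The sketch below assumes the database is PostgreSQL (suggested by port 5432, but not stated in this README) and that the third-party psycopg2 package is installed. The function name is hypothetical; the connection values and the query are the ones listed above.

```python
# Connection parameters from this README; "dbname" and "user" are the
# psycopg2 keyword names for the database and username.
CONN_PARAMS = {
    "host": "localhost",
    "port": 5432,
    "dbname": "maildb",
    "user": "pguser",
    "password": "password",
}
QUERY = "SELECT * FROM mail LIMIT 10"


def fetch_recent_mail(params=CONN_PARAMS, query=QUERY):
    """Run the query and return the resulting rows as a list of tuples."""
    import psycopg2  # third-party; imported here so the module loads without it

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()
    finally:
        conn.close()
```

Calling fetch_recent_mail() after sending an email should return at least one row if the orchestrator recorded the result.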


Owner

Ivan
