REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

Overview

What is MUSE?

MUSE stands for Multilingual Universal Sentence Encoder - multilingual extension (16 languages) of Universal Sentence Encoder (USE).
MUSE/USE models encode sentences into embedding vectors of fixed size.

MUSE paper: link.
USE paper: link.
USE Visually Explainer article: link.

What is MUSE as Service?

MUSE as Service is REST API for sentence tokenization and embedding using MUSE.
It is written on flask + gunicorn.
You can configure gunicorn with gunicorn.conf.py file.

Installation

# clone repo
git clone https://github.com/dayyass/muse_as_service.git

# install dependencies
cd muse_as_service
pip install -r requirements.txt

Run Service

To launch a service use a docker container (either locally or on a server):

docker build -t muse_as_service .
docker run -d -p 5000:5000 --name muse_as_service muse_as_service

NOTE: you can launch a service without docker using gunicorn: sh ./gunicorn.sh, or flask: python app.py, but it is preferable to launch the service inside the docker container.
NOTE: instead of building a docker image, you can pull it from Docker Hub:
docker pull dayyass/muse_as_service

Usage

After you launch the service, you can tokenize and embed any {sentence} using GET requests ({ip} is the address where the service was launched):

http://{ip}:5000/tokenize?sentence={sentence}
http://{ip}:5000/embed?sentence={sentence}

You can use python requests library to work with GET requests (example notebook):

import numpy as np
import requests

ip = "localhost"
port = 5000

sentence = "This is sentence example."

# tokenizer
response = requests.get(
    url=f"http://{ip}:{port}/tokenize",
    params={"sentence": f"{sentence}"},
)
tokenized_sentence = response.json()["content"]

# embedder
response = requests.get(
    url=f"http://{ip}:{port}/embed",
    params={"sentence": f"{sentence}"},
)
embedding = np.array(response.json()["content"][0])

# results
print(tokenized_sentence)  # ['▁This', '▁is', '▁sentence', '▁example', '.']
print(embedding.shape)  # (512,)

But it is better to use the built-in client MUSEClient for sentence tokenization and embedding, that wraps the functionality of the requests library and provides the user with a simpler interface (example notebook):

from muse_as_service import MUSEClient

ip = "localhost"
port = 5000

sentence = "This is sentence example."

# init client
client = MUSEClient(
    ip=ip,
    port=port,
)

# tokenizer
tokenized_sentence = client.tokenize(sentence)

# embedder
embedding = client.embed(sentence)

# results
print(tokenized_sentence)  # ['▁This', '▁is', '▁sentence', '▁example', '.']
print(embedding.shape)  # (512,)

Citation

If you use muse_as_service in a scientific publication, we would appreciate references to the following BibTex entry:

@misc{dayyass_muse_as_service,
    author = {El-Ayyass, Dani},
    title = {Multilingual Universal Sentence Encoder REST API},
    howpublished = {\url{https://github.com/dayyass/muse_as_service}},
    year = {2021},
}
You might also like...
Multilingual Emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL2022).

XLM-EMO: Multilingual Emotion Prediction in Social Media Text Abstract Detecting emotion in text allows social and computational scientists to study h

Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

Some embedding layer implementation using ivy library
Some embedding layer implementation using ivy library

ivy-manual-embeddings Some embedding layer implementation using ivy library. Just for fun. It is based on NYCTaxiFare dataset from kaggle (cut down to

A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.

Multilingual Latent Dirichlet Allocation (LDA) Pipeline This project is for text clustering using the Latent Dirichlet Allocation (LDA) algorithm. It

Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing Trankit is a light-weight Transformer-based Pyth

Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

Comments
  • How to change batch size

    How to change batch size

    I got the following OOM message: Error on request: Traceback (most recent call last): File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\werkzeug\serving.py", line 324, in run_wsgi execute(self.server.app) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\werkzeug\serving.py", line 313, in execute application_iter = app(environ, start_response) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask\app.py", line 2091, in call return self.wsgi_app(environ, start_response) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask\app.py", line 2076, in wsgi_app response = self.handle_exception(e) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask_restful_init_.py", line 271, in error_router return original_handler(e) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask\app.py", line 2073, in wsgi_app response = self.full_dispatch_request() File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask\app.py", line 1518, in full_dispatch_request rv = self.handle_user_exception(e) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask_restful_init_.py", line 271, in error_router return original_handler(e) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask\app.py", line 1516, in full_dispatch_request rv = self.dispatch_request() File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask\app.py", line 1502, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask_restful_init_.py", line 467, in wrapper resp = resource(*args, **kwargs) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask\views.py", line 84, in view return current_app.ensure_sync(self.dispatch_request)(*args, **kwargs) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask_restful_init_.py", line 582, in dispatch_request resp = meth(*args, **kwargs) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\flask_jwt_extended\view_decorators.py", line 127, in decorator return current_app.ensure_sync(fn)(*args, **kwargs) File "F:\repos3\muse-as-service\muse-as-service\src\muse_as_service\endpoints.py", line 56, in get embedding = self.embedder(args["sentence"]).numpy().tolist() File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\keras\engine\base_layer.py", line 1037, in call outputs = call_fn(inputs, *args, **kwargs) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\tensorflow_hub\keras_layer.py", line 229, in call result = f() File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\tensorflow\python\saved_model\load.py", line 664, in _call_attribute return instance.call(*args, **kwargs) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in call result = self._call(*args, **kwds) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\tensorflow\python\eager\def_function.py", line 957, in _call filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\tensorflow\python\eager\function.py", line 1964, in _call_flat ctx, args, cancellation_manager=cancellation_manager)) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\tensorflow\python\eager\function.py", line 596, in call ctx=ctx) File "D:\ProgramData\Anaconda3\envs\muse-as-a-service\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute inputs, attrs, num_outputs) tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found. (0) Resource exhausted: OOM when allocating tensor with shape[32851,782,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node StatefulPartitionedCall/StatefulPartitionedCall/EncoderTransformer/Transformer/SparseTransformerEncode/Layer_0/SelfAttention/SparseMultiheadAttention/ComputeQKV/ScatterNd}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

         [[StatefulPartitionedCall/StatefulPartitionedCall/EncoderTransformer/Transformer/layer_prepostprocess/layer_norm/add_1/_128]]
    

    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

    (1) Resource exhausted: OOM when allocating tensor with shape[32851,782,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node StatefulPartitionedCall/StatefulPartitionedCall/EncoderTransformer/Transformer/SparseTransformerEncode/Layer_0/SelfAttention/SparseMultiheadAttention/ComputeQKV/ScatterNd}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

    question 
    opened by jiangweiatgithub 3
  • slow response from service

    slow response from service

    I have been comparing the efficency between the muse as service and the original "hub.load" method, and see a noticeable slow reponse in the former, both running separately on my Quadro RTX 5000. Can I safely assume this slowness is due to the very nature of the web service? If so, is there any way to improve it?

    invalid 
    opened by jiangweiatgithub 1
Releases(v1.1.2)
Owner
Dani El-Ayyass
Senior NLP Engineer @ Sber AI, Master Student in Applied Mathematics and Computer Science @ CMC MSU
Dani El-Ayyass
Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

Allegro Tech 17 Oct 18, 2022
VD-BERT: A Unified Vision and Dialog Transformer with BERT

VD-BERT: A Unified Vision and Dialog Transformer with BERT PyTorch Code for the following paper at EMNLP2020: Title: VD-BERT: A Unified Vision and Dia

Salesforce 44 Nov 01, 2022
Converts text into a PDF of handwritten notes

Text To Handwritten Notes Converts text into a PDF of handwritten notes Explore the docs » · Report Bug · Request Feature · Steps: $ git clone https:/

UVSinghK 63 Oct 09, 2022
Edge-Augmented Graph Transformer

Edge-augmented Graph Transformer Introduction This is the official implementation of the Edge-augmented Graph Transformer (EGT) as described in https:

Md Shamim Hussain 21 Dec 14, 2022
Analyse japanese ebooks using MeCab to determine the difficulty level for japanese learners

japanese-ebook-analysis This aim of this project is to make analysing the contents of a japanese ebook easy and streamline the process for non-technic

Christoffer Aakre 14 Jul 23, 2022
Proquabet - Convert your prose into proquints and then you essentially have Vogon poetry

Proquabet Turn your prose into a constant stream of encrypted and meaningless-so

Milo Fultz 2 Oct 10, 2022
Natural Language Processing with transformers

we want to create a repo to illustrate usage of transformers in chinese

Datawhale 763 Dec 27, 2022
A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format

RITA DSL This is a language, loosely based on language Apache UIMA RUTA, focused on writing manual language rules, which compiles into either spaCy co

Šarūnas Navickas 60 Sep 26, 2022
CCF BDCI BERT系统调优赛题baseline(Pytorch版本)

CCF BDCI BERT系统调优赛题baseline(Pytorch版本) 此版本基于Pytorch后端的huggingface进行实现。由于此实现使用了Oneflow的dataloader作为数据读入的方式,因此也需要安装Oneflow。其它框架的数据读取可以参考OneflowDataloade

Ziqi Zhou 9 Oct 13, 2022
Implementing SimCSE(paper, official repository) using TensorFlow 2 and KR-BERT.

KR-BERT-SimCSE Implementing SimCSE(paper, official repository) using TensorFlow 2 and KR-BERT. Training Unsupervised python train_unsupervised.py --mi

Jeong Ukjae 27 Dec 12, 2022
My Implementation for the paper EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks using Tensorflow

Easy Data Augmentation Implementation This repository contains my Implementation for the paper EDA: Easy Data Augmentation Techniques for Boosting Per

Aflah 9 Oct 31, 2022
A Word Level Transformer layer based on PyTorch and 🤗 Transformers.

Transformer Embedder A Word Level Transformer layer based on PyTorch and 🤗 Transformers. How to use Install the library from PyPI: pip install transf

Riccardo Orlando 27 Nov 20, 2022
Code for ACL 2021 main conference paper "Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances".

Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances This repository contains the code and pre-trained mode

ICTNLP 90 Dec 27, 2022
Abhijith Neil Abraham 2 Nov 05, 2021
Precision Medicine Knowledge Graph (PrimeKG)

PrimeKG Website | bioRxiv Paper | Harvard Dataverse Precision Medicine Knowledge Graph (PrimeKG) presents a holistic view of diseases. PrimeKG integra

Machine Learning for Medicine and Science @ Harvard 103 Dec 10, 2022
Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.

README Code for Two-stage Identifier: "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022. For details of the model a

Yongliang Shen 45 Nov 29, 2022
Text-to-Speech for Belarusian language

title emoji colorFrom colorTo sdk app_file pinned Belarusian TTS 🐸 green green gradio app.py false Belarusian TTS 📢 🤖 Belarusian TTS (text-to-speec

Yurii Paniv 1 Nov 27, 2021
LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

LightSeq is a high performance inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP models such as BERT, GPT2, Transform

Bytedance Inc. 2.5k Jan 03, 2023
Understand Text Summarization and create your own summarizer in python

Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent

Sreekanth M 1 Oct 18, 2022
A Python 3.6+ package to run .many files, where many programs written in many languages may exist in one file.

RunMany Intro | Installation | VSCode Extension | Usage | Syntax | Settings | About A tool to run many programs written in many languages from one fil

6 May 22, 2022