Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

Overview

Optimum Transformers

Tests License PyPI

Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX runtime.

Installation:

With PyPI:

pip install optimum-transformers

Or directly from GitHub:

pip install git+https://github.com/AlekseyKorshuk/optimum-transformers

Usage:

The pipeline API is similar to transformers pipeline with just a few differences which are explained below.

Just provide the path/url to the model, and it'll download the model if needed from the hub and automatically create onnx graph and run inference.

from optimum_transformers import pipeline

# Initialize a pipeline by passing the task name and 
# set onnx to True (default value is also True)
nlp = pipeline("sentiment-analysis", use_onnx=True)
nlp("Transformers and onnx runtime is an awesome combo!")
# [{'label': 'POSITIVE', 'score': 0.999721109867096}]  

Or provide a different model using the model argument.

from optimum_transformers import pipeline

nlp = pipeline("question-answering", model="deepset/roberta-base-squad2", use_onnx=True)
nlp(question="What is ONNX Runtime ?",
         context="ONNX Runtime is a highly performant single inference engine for multiple platforms and hardware")
# {'answer': 'highly performant single inference engine for multiple platforms and hardware', 'end': 94,
# 'score': 0.751201868057251, 'start': 18}
from optimum_transformers import pipeline

nlp = pipeline("ner", model="mys/electra-base-turkish-cased-ner", use_onnx=True, optimize=True,
                    grouped_entities=True)
nlp("adana kebap ülkemizin önemli lezzetlerinden biridir.")
# [{'entity_group': 'B-food', 'score': 0.869149774312973, 'word': 'adana kebap'}]

Set use_onnx to False for standard torch inference. Set optimize to True for quantize with ONNX. ( set use_onnx to True)

Supported pipelines

You can create Pipeline objects for the following down-stream tasks:

  • feature-extraction: Generates a tensor representation for the input sequence
  • ner and token-classification: Generates named entity mapping for each word in the input sequence.
  • sentiment-analysis: Gives the polarity (positive / negative) of the whole input sequence. Can be used for any text classification model.
  • question-answering: Provided some context and a question referring to the context, it will extract the answer to the question in the context.
  • text-classification: Classifies sequences according to a given number of classes from training.
  • zero-shot-classification: Classifies sequences according to a given number of classes directly in runtime.
  • fill-mask: The task of masking tokens in a sequence with a masking token, and prompting the model to fill that mask with an appropriate token.
  • text-generation: The task of generating text according to the previous text provided.

Calling the pipeline for the first time loads the model, creates the onnx graph, and caches it for future use. Due to this, the first load will take some time. Subsequent calls to the same model will load the onnx graph automatically from the cache.

Benchmarks

Note: For some reason, onnx is slow on colab notebook, so you won't notice any speed-up there. Benchmark it on your own hardware.

Check our example of benchmarking: example.

For detailed benchmarks and other information refer to this blog post and notebook.

Note: These results were collected on my local machine. So if you have high performance machine to benchmark, please contact me.

Benchmark sentiment-analysis pipeline

Benchmark zero-shot-classification pipeline

Benchmark token-classification pipeline

Benchmark question-answering pipeline

Benchmark fill-mask pipeline

About

Built by Aleksey Korshuk

Follow

Follow

Follow

🚀 If you want to contribute to this project OR create something cool together — contact me: link

Star this repository:

GitHub stars

Resources

You might also like...
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

ONNX Runtime is a cross-platform inference and training machine-learning accelerator. ONNX Runtime inference can enable faster customer experiences an

Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.
Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.

scc4onnx Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel

A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for ONNX.
A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for ONNX.

sam4onnx A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for

A repository that shares tuning results of trained models generated by TensorFlow / Keras. Post-training quantization (Weight Quantization, Integer Quantization, Full Integer Quantization, Float16 Quantization), Quantization-aware training. TensorFlow Lite. OpenVINO. CoreML. TensorFlow.js. TF-TRT. MediaPipe. ONNX. [.tflite,.h5,.pb,saved_model,tfjs,tftrt,mlmodel,.xml/.bin, .onnx] ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX
ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

ONNX-PackNet-SfM: Python scripts for performing monocular depth estimation using the PackNet-SfM model in ONNX
ONNX-PackNet-SfM: Python scripts for performing monocular depth estimation using the PackNet-SfM model in ONNX

Python scripts for performing monocular depth estimation using the PackNet-SfM model in ONNX

A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or simply to separate onnx files to any size you want.
A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or simply to separate onnx files to any size you want.

sne4onnx A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or

Simple ONNX operation generator. Simple Operation Generator for ONNX.
Simple ONNX operation generator. Simple Operation Generator for ONNX.

sog4onnx Simple ONNX operation generator. Simple Operation Generator for ONNX. https://github.com/PINTO0309/simple-onnx-processing-tools Key concept V

Simple tool to combine(merge) onnx models.  Simple Network Combine Tool for ONNX.
Simple tool to combine(merge) onnx models. Simple Network Combine Tool for ONNX.

snc4onnx Simple tool to combine(merge) onnx models. Simple Network Combine Tool for ONNX. https://github.com/PINTO0309/simple-onnx-processing-tools 1.

Releases(v0.2.1-upd)
Owner
Aleksey Korshuk
Aleksey Korshuk
Direct LiDAR Odometry: Fast Localization with Dense Point Clouds

Direct LiDAR Odometry: Fast Localization with Dense Point Clouds DLO is a lightweight and computationally-efficient frontend LiDAR odometry solution w

VECTR at UCLA 369 Dec 30, 2022
Everything's Talkin': Pareidolia Face Reenactment (CVPR2021)

Everything's Talkin': Pareidolia Face Reenactment (CVPR2021) Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, and Ran He [Paper], [Video

71 Dec 21, 2022
This repository contains the DendroMap implementation for scalable and interactive exploration of image datasets in machine learning.

DendroMap DendroMap is an interactive tool to explore large-scale image datasets used for machine learning. A deep understanding of your data can be v

DIV Lab 33 Dec 30, 2022
A modular active learning framework for Python

Modular Active Learning framework for Python3 Page contents Introduction Active learning from bird's-eye view modAL in action From zero to one in a fe

modAL 1.9k Dec 31, 2022
Official implementation of Sparse Transformer-based Action Recognition

STAR Official implementation of S parse T ransformer-based A ction R ecognition Dataset download NTU RGB+D 60 action recognition of 2D/3D skeleton fro

Chonghan_Lee 15 Nov 02, 2022
Automatic meme generation model using Tensorflow Keras.

Memefly You can find the project at MemeflyAI. Contributors Nick Buukhalter Harsh Desai Han Lee Project Overview Trello Board Product Canvas Automatic

BloomTech Labs 2 Jan 13, 2022
PyTorch implementation of Neural Combinatorial Optimization with Reinforcement Learning.

neural-combinatorial-rl-pytorch PyTorch implementation of Neural Combinatorial Optimization with Reinforcement Learning. I have implemented the basic

Patrick E. 454 Jan 06, 2023
Unofficial implementation of Point-Unet: A Context-Aware Point-Based Neural Network for Volumetric Segmentation

Point-Unet This is an unofficial implementation of the MICCAI 2021 paper Point-Unet: A Context-Aware Point-Based Neural Network for Volumetric Segment

Namt0d 9 Dec 07, 2022
IA for recognising Traffic Signs using Keras [Tensorflow]

Traffic Signs Recognition ⚠️ 🚦 Fundamentals of Intelligent Systems Introduction 📄 Development of a neural network capable of recognizing nine differ

Sebastián Fernández García 2 Dec 19, 2022
Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)

Vision Transformer Pytorch reimplementation of Google's repository for the ViT model that was released with the paper An Image is Worth 16x16 Words: T

Eunkwang Jeon 1.4k Dec 28, 2022
Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information by Masato Tamura, Hiroki Ohashi, and Tomoaki Yosh

105 Dec 23, 2022
Cave Generation using metaballs in Blender. Originally created by sdfgeoff, Edited by Myself (Archie Jaskowicz).

Blender-Cave-Generation Cave Generation using metaballs in Blender. Originally created by sdfgeoff, Edited by Myself (Archie Jaskowicz). Installation

2 Dec 28, 2022
Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

HiFi-GAN+ This project is an unoffical implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All

Brent M. Spell 134 Dec 30, 2022
Caffe-like explicit model constructor. C(onfig)Model

cmodel Caffe-like explicit model constructor. C(onfig)Model Installation pip install git+https://github.com/bonlime/cmodel Usage In order to allow usi

1 Feb 18, 2022
Negative Interactions for Improved Collaborative Filtering:

Negative Interactions for Improved Collaborative Filtering: Don’t go Deeper, go Higher This notebook provides an implementation in Python 3 of the alg

Harald Steck 21 Mar 05, 2022
A machine learning library for spiking neural networks. Supports training with both torch and jax pipelines, and deployment to neuromorphic hardware.

Rockpool Rockpool is a Python package for developing signal processing applications with spiking neural networks. Rockpool allows you to build network

SynSense 21 Dec 14, 2022
A PoC Corporation Relationship Knowledge Graph System on top of Nebula Graph.

Corp-Rel is a PoC of Corpartion Relationship Knowledge Graph System. It's built on top of the Open Source Graph Database: Nebula Graph with a dataset

Wey Gu 20 Dec 11, 2022
Implementation of Diverse Semantic Image Synthesis via Probability Distribution Modeling

Diverse Semantic Image Synthesis via Probability Distribution Modeling (CVPR 2021) Paper Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu,

tzt 45 Nov 17, 2022
Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019)

Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019) Introduction Official implementation of Adaptive Pyramid Context Network

21 Nov 09, 2022
Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

Mingrui Yu 3 Jan 07, 2022