It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Overview

CLIP-ONNX

It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Usage

Install clip-onnx module and requirements first. Use this trick

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git

Example in 3 steps

  1. Download CLIP image from repo
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
  1. Load standard CLIP model, image, text on cpu
import clip
from PIL import Image

# onnx cannot work with cuda
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]) # [3, 77]
  1. Create CLIP-ONNX object to convert model to onnx
from clip_onnx import clip_onnx, attention
clip.model.ResidualAttentionBlock.attention = attention

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model = clip_onnx(model, providers=["CPUExecutionProvider"], # cpu mode
                       visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
onnx_model.start_sessions()
  1. Use for standard CLIP API. Batch inference
image_features = onnx_model.encode_image(image)
text_features = onnx_model.encode_text(text)

logits_per_image, logits_per_text = onnx_model(image, text)
probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.41456965 0.29270944 0.29272085]]

Enjoy the speed

Examples

See examples folder for more details
Some parts of the code were taken from the post. Thank you neverix for this notebook.

Comments
  • Can't use CUDAExecutionProvider

    Can't use CUDAExecutionProvider

    Hey, I'm trying to use the code on GPU and I encountered 2 problems:

    1. when running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines): ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

    I fixed it by installing that version of torch by myself. with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html, and then running the rest of the installation.

    1. After I installed the package, I tried to run the example in the readme with CPUExecutionProvider and it worked fine, but when I'm trying to run it on GPU with CUDAExecutionProvider I get the following error message (again on different machines):

    2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met. 2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

    I can't figure out what is the problem. Any help?

    opened by YoadTew 13
  • Performance is inconsistent with the original model

    Performance is inconsistent with the original model

    Hi, thanks for providing this useful tool! However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model. Here is the code I used to test the original model:

    model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
    
    image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
    text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
    
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
    
    print("Label probs:", probs) 
    

    The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

    However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

    Could you help me with this? Thanks!

    opened by Cestlaviez 5
  • Error on installing the torch version in requirements.txt

    Error on installing the torch version in requirements.txt

    pip install git+https://github.com/Lednik7/CLIP-ONNX.git

    ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
    ERROR: No matching distribution found for torch==1.11.0+cu113
    

    python version is 3.7.13

    opened by dingusagar 2
  • ERROR: No matching distribution found for onnxruntime==1.11

    ERROR: No matching distribution found for onnxruntime==1.11

    Hi, Thanks for the great work!

    I am having this error when I try to install the package.

    ERROR: No matching distribution found for onnxruntime==1.11

    Maybe we can update the requirements.txt?

    opened by wanliAlex 1
  • Replace the operator of

    Replace the operator of "torch.einsum"

    q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk( 3, dim=-1)

    @Lednik7 Thanks for your great work on Clip-ONNX. for the pytorch operator of "torch.einsum" , if we don't want to use this operator , do you have other codes to replace this operator? this operator is not friendly to some Inference engine, like NV TensorRT, so if you have other codes to replace einsum, that will be better

    opened by zhangnju 2
Owner
Gerasimov Maxim
16 y.o. Data Scientist. Graduated by Yandex Lyceum and Tinkoff Education
Gerasimov Maxim
3DV 2021: Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry

SynergyNet 3DV 2021: Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry Cho-Ying Wu, Qiangeng Xu, Ulrich Neumann, CGIT Lab at Unive

Cho-Ying Wu 239 Jan 06, 2023
Generic U-Net Tensorflow implementation for image segmentation

Tensorflow Unet Warning This project is discontinued in favour of a Tensorflow 2 compatible reimplementation of this project found under https://githu

Joel Akeret 1.8k Dec 10, 2022
5 Jan 05, 2023
catch-22: CAnonical Time-series CHaracteristics

catch22 - CAnonical Time-series CHaracteristics About catch22 is a collection of 22 time-series features coded in C that can be run from Python, R, Ma

Carl H Lubba 229 Oct 21, 2022
alfred-py: A deep learning utility library for **human**

Alfred Alfred is command line tool for deep-learning usage. if you want split an video into image frames or combine frames into a single video, then a

JinTian 800 Jan 03, 2023
PyTorch implementation of HDN(Homography Decomposition Networks) for planar object tracking

Homography Decomposition Networks for Planar Object Tracking This project is the offical PyTorch implementation of HDN(Homography Decomposition Networ

CaptainHook 48 Dec 15, 2022
Style transfer, deep learning, feature transform

FastPhotoStyle License Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons

NVIDIA Corporation 10.9k Jan 02, 2023
[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing Figure: High-quality facial attributes editing results with InterFaceGA

GenForce: May Generative Force Be with You 1.3k Dec 29, 2022
PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020).

Scaffold-Federated-Learning PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020). Environment numpy=

KI 30 Dec 29, 2022
PyTorch implementation of Interpretable Explanations of Black Boxes by Meaningful Perturbation

PyTorch implementation of Interpretable Explanations of Black Boxes by Meaningful Perturbation The paper: https://arxiv.org/abs/1704.03296 What makes

Jacob Gildenblat 322 Dec 17, 2022
Generate Cartoon Images using Generative Adversarial Network

AvatarGAN ✨ Generate Cartoon Images using DC-GAN Deep Convolutional GAN is a generative adversarial network architecture. It uses a couple of guidelin

Aakash Jhawar 50 Dec 29, 2022
an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 985 Jan 08, 2023
GrailQA: Strongly Generalizable Question Answering

GrailQA is a new large-scale, high-quality KBQA dataset with 64,331 questions annotated with both answers and corresponding logical forms in different syntax (i.e., SPARQL, S-expression, etc.). It ca

OSU DKI Lab 76 Dec 21, 2022
TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural Networks for Real-Time Handgun Detection in Video

TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural Networks for Real-Time Handgun Detection in Video Timely handgun detection is a cr

Mario Duran-Vega 18 Dec 26, 2022
IhoneyBakFileScan Modify - 批量网站备份文件扫描器,增加文件规则,优化内存占用

ihoneyBakFileScan_Modify 批量网站备份文件泄露扫描工具 2022.2.8 添加、修改内容 增加备份文件fuzz规则 修改备份文件大小判断

VMsec 220 Jan 05, 2023
PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.

PyAF (Python Automatic Forecasting) PyAF is an Open Source Python library for Automatic Forecasting built on top of popular data science python module

CARME Antoine 405 Jan 02, 2023
Image Processing, Image Smoothing, Edge Detection and Transforms

opevcvdl-hw1 This project uses openCV and Qt to achieve the requirements. Version Python 3.7 opencv-contrib-python 3.4.2.17 Matplotlib 3.1.1 pyqt5 5.1

Kenny Cheng 3 Aug 17, 2022
Code for Subgraph Federated Learning with Missing Neighbor Generation (NeurIPS 2021)

To run the code Unzip the package to your local directory; Run 'pip install -r requirements.txt' to download required packages; Open file ~/nips_code/

32 Dec 26, 2022
DeepSTD: Mining Spatio-temporal Disturbances of Multiple Context Factors for Citywide Traffic Flow Prediction

DeepSTD: Mining Spatio-temporal Disturbances of Multiple Context Factors for Citywide Traffic Flow Prediction This is the implementation of DeepSTD in

5 Sep 26, 2022
code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

SHOT++ Code for our TPAMI submission "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer" that is ext

75 Dec 16, 2022