Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP

Overview

mmc

installation

git clone https://github.com/dmarx/Multi-Modal-Comparators
cd 'Multi-Modal-Comparators'
pip install poetry
poetry build
pip install dist/mmc*.whl

# optional final step:
#poe napm_installs
python src/mmc/napm_installs/__init__.py

To see which models are immediately available, run:

python -m mmc.loaders

That optional poe napm_installs step

For the most convenient experience, it is recommended that you perform the final poe napm_installs step. Omitting this step will make your one-time setup faster, but will make certain use cases more complex.

If you did not perform the optional poe napm_installs step, you likely received several warnings about models whose loaders could not be registered. These are models whose codebases depend on python code which is not trivially installable. You will still have access to all of the models supported by the library as if you had run the last step, but their loaders will not be queryable from the registry (see below) and will need to be loaded via the appropriate mmc.loader directly, which may be non-trivial to identify without the ability to query it from mmc's registry.

As a concrete example, if the napm step is skipped, the model [cloob - corwsonkb - cloob_laion_400m_vit_b_16_32_epochs] will not appear in the list of registered loaders, but can still be loaded like this:

from mmc.loaders import KatCloobLoader

model = KatCloobLoader(id='cloob_laion_400m_vit_b_16_32_epochs').load()

Invoking the load() method on an unregistered loader will invoke napm to prepare any uninstallable dependencies required to load the model. Next time you run python -m mmc.loaders, the CLOOB loader will show as registered and spinning up the registry will longer emit a warning for that model.

Usage

TLDR

# spin up the registry
from mmc import loaders

from mmc.mock.openai import MockOpenaiClip
from mmc.registry import REGISTRY

cloob_query = {architecture='cloob'}
cloob_loaders = REGISTRY.find(**cloob_query)

# loader repl prints attributes for uniquely querying
print(cloob_loaders)

# loader returns a perceptor whose API is standardized across mmc
cloob_model = cloob_loaders[0].load()

# wrapper classes are provided for mocking popular implementations
# to facilitate drop-in compatibility with existing code
drop_in_replacement__cloob_model = MockOpenaiClip(cloob_model)

Querying the Model Registry

Spin up the model registry by importing the loaders module:

from mmc import loaders

To see which models are available:

from mmc.registry import REGISTRY

for loader in REGISTRY.find():
    print(loader)

You can constrain the result set by querying the registry for specific metadata attributes

# all CLIP models
clip_loaders = REGISTRY.find(architecture='clip')

# CLIP models published by openai
openai_clip_loaders = REGISTRY.find(architecture='clip', publisher='openai')

# All models published by MLFoundations (openCLIP)
mlf_loaders = REGISTRY.find(publisher='mlfoundations)'

# A specific model
rn50_loader = REGISTRY.find(architecture='clip', publisher='openai', id='RN50')
# NB: there may be multiple models matching a particular "id". the 'id' field
# only needs to be unique for a given architecture-publisher pair.

All pretrained checkpoints are uniquely identifiable by a combination of architecture, publisher, and id.

The above queries return lists of loader objects. If model artifacts (checkpoints, config) need to be downloaded, they will only be downloaded after the load() method on the loader is invoked.

loaders = REGISTRY.find(...)
loader = loaders[0] # just picking an arbitrary return value here, remember: loaders is a *list* of loaders
model = loader.load()

The load() method returns an instance of an mmc.MultiModalComparator. The MultiModalComparator class is a modality-agnostic abstraction. I'll get to the ins and outs of that another time.

API Mocking

You want something you can just drop into your code and it'll work. We got you. This library provides wrapper classes to mock the APIs of commonly used CLIP implementations. To wrap a MultiModalComparator so it can be used as a drop-in replacement with code compatible with OpenAI's CLIP:

from mmc.mock.openai import MockOpenaiClip

my_model = my_model_loader.load()
model = MockOpenaiClip(my_model)

MultiMMC: Multi-Perceptor Implementation

The MultiMMC class can be used to run inference against multiple mmc models in parallel. This form of ensemble is sometimes referred to as a "multi-perceptor".

To ensure that all models loaded into the MultiMMC are compatible, the MultiMMC instance is initialized by specifying the modalities it supports. We'll discuss modality objects in a bit.

from mmc.multimmc import MultiMMC
from mmc.modalities import TEXT, IMAGE

perceptor = MultiMMC(TEXT, IMAGE)

To load and use a model:

perceptor.load_model(
    architecture='clip', 
    publisher='openai', 
    id='RN50',
)

score = perceptor.compare(
    image=PIL.Image.open(...), 
    text=text_pos),
)

Additional models can be added to the ensemble via the load_model() method.

The MultiMMC does not support API mocking because of its reliance on the compare method.

Available Pre-trained Models

Some model comparisons here

# [<architecture> - <publisher> - <id>]
[clip - openai - RN50]
[clip - openai - RN101]
[clip - openai - RN50x4]
[clip - openai - RN50x16]
[clip - openai - RN50x64]
[clip - openai - ViT-B/32]
[clip - openai - ViT-B/16]
[clip - openai - ViT-L/14]
[clip - openai - ViT-L/[email protected]]
[clip - mlfoundations - RN50--openai]
[clip - mlfoundations - RN50--yfcc15m]
[clip - mlfoundations - RN50--cc12m]
[clip - mlfoundations - RN50-quickgelu--openai]
[clip - mlfoundations - RN50-quickgelu--yfcc15m]
[clip - mlfoundations - RN50-quickgelu--cc12m]
[clip - mlfoundations - RN101--openai]
[clip - mlfoundations - RN101--yfcc15m]
[clip - mlfoundations - RN101-quickgelu--openai]
[clip - mlfoundations - RN101-quickgelu--yfcc15m]
[clip - mlfoundations - RN50x4--openai]
[clip - mlfoundations - RN50x16--openai]
[clip - mlfoundations - ViT-B-32--openai]
[clip - mlfoundations - ViT-B-32--laion400m_e31]
[clip - mlfoundations - ViT-B-32--laion400m_e32]
[clip - mlfoundations - ViT-B-32--laion400m_avg]
[clip - mlfoundations - ViT-B-32-quickgelu--openai]
[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]
[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]
[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]
[clip - mlfoundations - ViT-B-16--openai]
[clip - mlfoundations - ViT-L-14--openai]
[clip - sbert - ViT-B-32-multilingual-v1]
[clip - sajjjadayobi - clipfa]

# The following models depend on napm for setup
[clip - navervision - kelip_ViT-B/32]
[cloob - crowsonkb - cloob_laion_400m_vit_b_16_16_epochs]
[cloob - crowsonkb - cloob_laion_400m_vit_b_16_32_epochs]
[clip - facebookresearch - clip_small_25ep]
[clip - facebookresearch - clip_base_25ep]
[clip - facebookresearch - clip_large_25ep]
[slip - facebookresearch - slip_small_25ep]
[slip - facebookresearch - slip_small_50ep]
[slip - facebookresearch - slip_small_100ep]
[slip - facebookresearch - slip_base_25ep]
[slip - facebookresearch - slip_base_50ep]
[slip - facebookresearch - slip_base_100ep]
[slip - facebookresearch - slip_large_25ep]
[slip - facebookresearch - slip_large_50ep]
[slip - facebookresearch - slip_large_100ep]
[simclr - facebookresearch - simclr_small_25ep]
[simclr - facebookresearch - simclr_base_25ep]
[simclr - facebookresearch - simclr_large_25ep]
[clip - facebookresearch - clip_base_cc3m_40ep]
[clip - facebookresearch - clip_base_cc12m_35ep]
[slip - facebookresearch - slip_base_cc3m_40ep]
[slip - facebookresearch - slip_base_cc12m_35ep]

VRAM Cost

The following is an estimate of the amount of space the loaded model occupies in memory:

publisher architecture model_name vram_mb
0 openai clip RN50 358
1 openai clip RN101 294
2 openai clip RN50x4 424
3 openai clip RN50x16 660
4 openai clip RN50x64 1350
5 openai clip ViT-B/32 368
6 openai clip ViT-B/16 348
7 openai clip ViT-L/14 908
8 openai clip ViT-L/[email protected] 908
9 mlfoundations clip RN50--openai 402
10 mlfoundations clip RN50--yfcc15m 402
11 mlfoundations clip RN50--cc12m 402
12 mlfoundations clip RN50-quickgelu--openai 402
13 mlfoundations clip RN50-quickgelu--yfcc15m 402
14 mlfoundations clip RN50-quickgelu--cc12m 402
15 mlfoundations clip RN101--openai 476
16 mlfoundations clip RN101--yfcc15m 476
17 mlfoundations clip RN101-quickgelu--openai 476
18 mlfoundations clip RN101-quickgelu--yfcc15m 476
19 mlfoundations clip RN50x4--openai 732
20 mlfoundations clip RN50x16--openai 1200
21 mlfoundations clip ViT-B-32--openai 634
22 mlfoundations clip ViT-B-32--laion400m_e31 634
23 mlfoundations clip ViT-B-32--laion400m_e32 634
24 mlfoundations clip ViT-B-32--laion400m_avg 634
25 mlfoundations clip ViT-B-32-quickgelu--openai 634
26 mlfoundations clip ViT-B-32-quickgelu--laion400m_e31 634
27 mlfoundations clip ViT-B-32-quickgelu--laion400m_e32 634
28 mlfoundations clip ViT-B-32-quickgelu--laion400m_avg 634
29 mlfoundations clip ViT-B-16--openai 634
30 mlfoundations clip ViT-L-14--openai 1688
32 sajjjadayobi clip clipfa 866
33 crowsonkb cloob cloob_laion_400m_vit_b_16_16_epochs 610
34 crowsonkb cloob cloob_laion_400m_vit_b_16_32_epochs 610
36 facebookresearch slip slip_small_25ep 728
37 facebookresearch slip slip_small_50ep 650
38 facebookresearch slip slip_small_100ep 650
39 facebookresearch slip slip_base_25ep 714
40 facebookresearch slip slip_base_50ep 714
41 facebookresearch slip slip_base_100ep 714
42 facebookresearch slip slip_large_25ep 1534
43 facebookresearch slip slip_large_50ep 1522
44 facebookresearch slip slip_large_100ep 1522
45 facebookresearch slip slip_base_cc3m_40ep 714
46 facebookresearch slip slip_base_cc12m_35ep 714

Contributing

Suggest a pre-trained model

If you would like to suggest a pre-trained model for future addition, you can add a comment to this issue

Add a pre-trained model

  1. Create a loader class that encapsulates the logic for importing the model, loading weights, preprocessing inputs, and performing projections.
  2. At the bottom of the file defining the loader class should be a code snippet that adds each respective checkpoint's loader to the registry.
  3. Add an import for the new file to mmc/loaders/__init__.py. The imports in this file are the reason import mmc.loaders "spins up" the registry.
  4. If the codebase on which the model depends can be installed, update pytproject.toml to install it.
  5. Otherwise, add napm preparation at the top of the loaders load method (see cloob or kelip for examples), and also add napm setup to mmc/napm_installs/__init__.py
  6. Add a test case to tests/test_mmc_loaders.py
  7. Add a test script for the loader (see test_mmc_katcloob as an example)
Owner
David Marx
Engineer / Machine Learning Researcher interested in deep learning, probabilistic ML, generative models, multi-modal SSL, visual understanding, geometric
David Marx
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information This repository contains code, model, dataset for ChineseBERT at ACL2021. Ch

413 Dec 01, 2022
Official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition" in AAAI2022.

AimCLR This is an official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Reco

Gty 44 Dec 17, 2022
Implementation of the federated dual coordinate descent (FedDCD) method.

FedDCD.jl Implementation of the federated dual coordinate descent (FedDCD) method. Installation To install, just call Pkg.add("https://github.com/Zhen

Zhenan Fan 6 Sep 21, 2022
Example how to deploy deep learning model with aiohttp.

aiohttp-demos Demos for aiohttp project. Contents Imagetagger Deep Learning Image Classifier URL shortener Toxic Comments Classifier Moderator Slack B

aio-libs 661 Jan 04, 2023
SuRE Evaluation: A Supplementary Material

SuRE Evaluation: A Supplementary Material This repository contains supplementary material regarding the evaluations presented in the paper Visual Expl

NYU Visualization Lab 0 Dec 14, 2021
《Train in Germany, Test in The USA: Making 3D Object Detectors Generalize》(CVPR 2020)

Train in Germany, Test in The USA: Making 3D Object Detectors Generalize This paper has been accpeted by Conference on Computer Vision and Pattern Rec

Xiangyu Chen 101 Jan 02, 2023
Neural Point-Based Graphics

Neural Point-Based Graphics Project   Video   Paper Neural Point-Based Graphics Kara-Ali Aliev1 Artem Sevastopolsky1,2 Maria Kolos1,2 Dmitry Ulyanov3

Ali Aliev 252 Dec 13, 2022
Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

Denis Emelin 42 Nov 24, 2022
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

Wenwen Yu 498 Dec 24, 2022
You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks.

AllSet This is the repo for our paper: You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks. We prepared all codes and a subse

Jianhao 51 Dec 24, 2022
ARAE-Tensorflow for Discrete Sequences (Adversarially Regularized Autoencoder)

ARAE Tensorflow Code Code for the paper Adversarially Regularized Autoencoders for Generating Discrete Structures by Zhao, Kim, Zhang, Rush and LeCun

19 Nov 12, 2021
A repo for Causal Imitation Learning under Temporally Correlated Noise

CausIL A repo for Causal Imitation Learning under Temporally Correlated Noise. Running Experiments To re-train an expert, run: python experts/train_ex

Gokul Swamy 5 Nov 01, 2022
3D ResNet Video Classification accelerated by TensorRT

Activity Recognition TensorRT Perform video classification using 3D ResNets trained on Kinetics-400 dataset and accelerated with TensorRT P.S Click on

Akash James 39 Nov 21, 2022
Multi-modal co-attention for drug-target interaction annotation and Its Application to SARS-CoV-2

CoaDTI Multi-modal co-attention for drug-target interaction annotation and Its Application to SARS-CoV-2 Abstract Environment The test was conducted i

Layne_Huang 7 Nov 14, 2022
Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

Jungsoo Lee 16 Jun 30, 2022
Using machine learning to predict and analyze high and low reader engagement for New York Times articles posted to Facebook.

How The New York Times can increase Engagement on Facebook Using machine learning to understand characteristics of news content that garners "high" Fa

Jessica Miles 0 Sep 16, 2021
PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

PFENet This is the implementation of our paper PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation that has been accepted to IEE

DV Lab 230 Dec 31, 2022
Official code for "Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes", CVPR2022

[CVPR 2022] Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes Dongkwon Jin, Wonhui Park, Seong-Gyun Jeong, Heeyeon Kwon, and Cha

Dongkwon Jin 106 Dec 29, 2022
Task Transformer Network for Joint MRI Reconstruction and Super-Resolution (MICCAI 2021)

T2Net Task Transformer Network for Joint MRI Reconstruction and Super-Resolution (MICCAI 2021) [Paper][Code] Dependencies numpy==1.18.5 scikit_image==

64 Nov 23, 2022
Editing a classifier by rewriting its prediction rules

This repository contains the code and data for our paper: Editing a classifier by rewriting its prediction rules Shibani Santurkar*, Dimitris Tsipras*

Madry Lab 86 Dec 27, 2022