improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

Related tags

Deep LearningCLIP-ViL
Overview

CLIP-ViL

In our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?", we show the improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

We release the extracted features and reproducible code here.

Specifically, we develop our methods in two scenarios: (1) direct task-specific fine-tuning; and (2) Vision and Language pre-training.

CLIP-ViL-Direct/VLN

We directly plug CLIP into tasks-pecific models and finetune on three representative tasks including Visual Question Answering, Image Captioning, and Vision-Language Navigation.

Please see the corresponding code directory for full details.

Noted that in direct finetuning, for Visual Question Answering on VQA 2.0 test-dev, we are able to achieve up to 68.37% accuracy with Pythia, 74.01% accuracy with MCAN and generally more than 4.0% improvements in accuracy; For Image Captioning on Karpathy's test split of MS COCO, we got 2.1% improvements in CIDEr metric over resnet alternatives; For Navigation, On RxR, we got 5% improvements with the nDTW metric (the main metric for RxR). On R2R, we got about 6% improvements in accuracy regarding our strong baselines.

CLIP-ViL-Pretrain

In order to test the potential of combining CLIP pre-training and Vision and Language pre-training. We introduce CLIP-ViL-Pretrain, a vision-and-language model pre-trained on image-text data with CLIP visual encoder as its visual backbone. CLIP-ViL-Pretrain is pretrained on aligned image-text data with a reconstructive objective and an image-text matching objective. It is further finetuned on VQA, SNLI-VE and GQA tasks.

Please see the corresponding code directory for full details.

Noted that CLIP-ViL-Pretrain is able to achieve 76.48% accuracy on VQA 2.0 test-dev and 76.70% accuracy on test-std; 80.61% accuracy on SNLI-VE Dev and 80.20% on Test-P; 61.42% accuracy on GQA test-dev and 62.93% accuracy on test-std.

Related Links

Reference

If you use CLIP-ViL in your research or wish to refer to the baseline results published here, please use the following BibTeX entry.

@misc{shen2021clip,
    title={How Much Can CLIP Benefit Vision-and-Language Tasks?}, 
    author={Sheng Shen and Liunian Harold Li and Hao Tan and Mohit Bansal and Anna Rohrbach and Kai-Wei Chang and Zhewei Yao and Kurt Keutzer},
    year={2021},
    eprint={2107.06383},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Paper Code:A Self-adaptive Weighted Differential Evolution Approach for Large-scale Feature Selection

1. SaWDE.m is the main function 2. DataPartition.m is used to randomly partition the original data into training sets and test sets with a ratio of 7

wangxb 14 Dec 08, 2022
Implementation of Convolutional enhanced image Transformer

CeiT : Convolutional enhanced image Transformer This is an unofficial PyTorch implementation of Incorporating Convolution Designs into Visual Transfor

Rishikesh (ऋषिकेश) 82 Dec 13, 2022
Created as part of CS50 AI's coursework. This AI makes use of knowledge entailment to calculate the best probabilities to win Minesweeper.

Minesweeper-AI Created as part of CS50 AI's coursework. This AI makes use of knowledge entailment to calculate the best probabilities to win Minesweep

Beckham 0 Jul 20, 2022
The Official Repository for "Generalized OOD Detection: A Survey"

Generalized Out-of-Distribution Detection: A Survey 1. Overview This repository is with our survey paper: Title: Generalized Out-of-Distribution Detec

Jingkang Yang 338 Jan 03, 2023
Lux AI environment interface for RLlib multi-agents

Lux AI interface to RLlib MultiAgentsEnv For Lux AI Season 1 Kaggle competition. LuxAI repo RLlib-multiagents docs Kaggle environments repo Please let

Jaime 12 Nov 07, 2022
This repository is dedicated to developing and maintaining code for experiments with wide neural networks.

Wide-Networks This repository contains the code of various experiments on wide neural networks. In particular, we implement classes for abc-parameteri

Karl Hajjar 0 Nov 02, 2021
A Python library for Deep Graph Networks

PyDGN Wiki Description This is a Python library to easily experiment with Deep Graph Networks (DGNs). It provides automatic management of data splitti

Federico Errica 194 Dec 22, 2022
A Machine Teaching Framework for Scalable Recognition

MEMORABLE This repository contains the source code accompanying our ICCV 2021 paper. A Machine Teaching Framework for Scalable Recognition Pei Wang, N

2 Dec 08, 2021
LSTM-VAE Implementation and Relevant Evaluations

LSTM-VAE Implementation and Relevant Evaluations Before using any file in this repository, please create two directories under the root directory name

Lan Zhang 5 Oct 08, 2022
Tracking Progress in Question Answering over Knowledge Graphs

Tracking Progress in Question Answering over Knowledge Graphs Table of contents Question Answering Systems with Descriptions The QA Systems Table cont

Knowledge Graph Question Answering 47 Jan 02, 2023
Differentiable Optimizers with Perturbations in Pytorch

Differentiable Optimizers with Perturbations in PyTorch This contains a PyTorch implementation of Differentiable Optimizers with Perturbations in Tens

Jake Tuero 54 Jun 22, 2022
🚩🚩🚩

My CTF Challenges 2021 AIS3 Pre-exam / MyFirstCTF Name Category Keywords Difficulty ⒸⓄⓋⒾⒹ-①⑨ (MyFirstCTF Only) Reverse Baby ★ Piano Reverse C#, .NET ★

6 Oct 28, 2021
Running AlphaFold2 (from ColabFold) in Azure Machine Learning

Running AlphaFold2 (from ColabFold) in Azure Machine Learning Colby T. Ford, Ph.D. Companion repository for Medium Post: How to predict many protein s

Colby T. Ford 3 Feb 18, 2022
A Deep Learning based project for creating line art portraits.

ArtLine The main aim of the project is to create amazing line art portraits. Sounds Intresting,let's get to the pictures!! Model-(Smooth) Model-(Quali

Vijish Madhavan 3.3k Jan 07, 2023
A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

pytorch-lifestream a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-si

Dmitri Babaev 103 Dec 17, 2022
[CVPR'21] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space by Quande Liu, Cheng Chen, Ji

Quande Liu 178 Jan 06, 2023
A whale detector design for the Kaggle whale-detector challenge!

CNN (InceptionV1) + STFT based Whale Detection Algorithm So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The obje

Tarin Ziyaee 92 Sep 28, 2021
Collection of machine learning related notebooks to share.

ML_Notebooks Collection of machine learning related notebooks to share. Notebooks GAN_distributed_training.ipynb In this Notebook, TensorFlow's tutori

Sascha Kirch 14 Dec 22, 2022
MoveNetを用いたPythonでの姿勢推定のデモ

MoveNet-Python-Example MoveNetのPythonでの動作サンプルです。 ONNXに変換したモデルも同梱しています。変換自体を試したい方はMoveNet_tf2onnx.ipynbを使用ください。 2021/08/24時点でTensorFlow Hubで提供されている以下モデ

KazuhitoTakahashi 38 Dec 17, 2022
Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

SMU A Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE arXiv https://arxiv.org/abs/211

Fuhang 5 Jan 18, 2022