Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".

Overview

VL-BERT

By Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai.

This repository is an official implementation of the paper VL-BERT: Pre-training of Generic Visual-Linguistic Representations.

Update on 2020/01/16 Add code of visualization.

Update on 2019/12/20 Our VL-BERT got accepted by ICLR 2020.

Introduction

VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on the massive-scale caption dataset and text-only corpus, and can be fine-tuned for various down-stream visual-linguistic tasks, such as Visual Commonsense Reasoning, Visual Question Answering and Referring Expression Comprehension.

Thanks to PyTorch and its 3rd-party libraries, this codebase also contains following features:

  • Distributed Training
  • FP16 Mixed-Precision Training
  • Various Optimizers and Learning Rate Schedulers
  • Gradient Accumulation
  • Monitoring the Training Using TensorboardX

Citing VL-BERT

@inproceedings{
  Su2020VL-BERT:,
  title={VL-BERT: Pre-training of Generic Visual-Linguistic Representations},
  author={Weijie Su and Xizhou Zhu and Yue Cao and Bin Li and Lewei Lu and Furu Wei and Jifeng Dai},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=SygXPaEYvH}
}

Prepare

Environment

  • Ubuntu 16.04, CUDA 9.0, GCC 4.9.4
  • Python 3.6.x
    # We recommend you to use Anaconda/Miniconda to create a conda environment
    conda create -n vl-bert python=3.6 pip
    conda activate vl-bert
  • PyTorch 1.0.0 or 1.1.0
    conda install pytorch=1.1.0 cudatoolkit=9.0 -c pytorch
  • Apex (optional, for speed-up and fp16 training)
    git clone https://github.com/jackroos/apex
    cd ./apex
    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./  
  • Other requirements:
    pip install Cython
    pip install -r requirements.txt
  • Compile
    ./scripts/init.sh

Data

See PREPARE_DATA.md.

Pre-trained Models

See PREPARE_PRETRAINED_MODELS.md.

Training

Distributed Training on Single-Machine

./scripts/dist_run_single.sh <num_gpus> <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
  • <num_gpus>: number of gpus to use.
  • <task>: pretrain/vcr/vqa/refcoco.
  • <path_to_cfg>: config yaml file under ./cfgs/<task>.
  • <dir_to_store_checkpoint>: root directory to store checkpoints.

Following is a more concrete example:

./scripts/dist_run_single.sh 4 vcr/train_end2end.py ./cfgs/vcr/base_q2a_4x16G_fp32.yaml ./

Distributed Training on Multi-Machine

For example, on 2 machines (A and B), each with 4 GPUs,

run following command on machine A:

./scripts/dist_run_multi.sh 2 0 <ip_addr_of_A> 4 <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>

run following command on machine B:

./scripts/dist_run_multi.sh 2 1 <ip_addr_of_A> 4 <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>

Non-Distributed Training

./scripts/nondist_run.sh <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>

Note:

  1. In yaml files under ./cfgs, we set batch size for GPUs with at least 16G memory, you may need to adapt the batch size and gradient accumulation steps according to your actual case, e.g., if you decrease the batch size, you should also increase the gradient accumulation steps accordingly to keep 'actual' batch size for SGD unchanged.

  2. For efficiency, we recommend you to use distributed training even on single-machine. But for RefCOCO+, you may meet deadlock using distributed training due to unknown reason (it may be related to PyTorch dataloader deadloack), you can simply use non-distributed training to solve this problem.

Evaluation

VCR

  • Local evaluation on val set:

    python vcr/val.py \
      --a-cfg <cfg_of_q2a> --r-cfg <cfg_of_qa2r> \
      --a-ckpt <checkpoint_of_q2a> --r-ckpt <checkpoint_of_qa2r> \
      --gpus <indexes_of_gpus_to_use> \
      --result-path <dir_to_save_result> --result-name <result_file_name>
    

    Note: <indexes_of_gpus_to_use> is gpu indexes, e.g., 0 1 2 3.

  • Generate prediction results on test set for leaderboard submission:

    python vcr/test.py \
      --a-cfg <cfg_of_q2a> --r-cfg <cfg_of_qa2r> \
      --a-ckpt <checkpoint_of_q2a> --r-ckpt <checkpoint_of_qa2r> \
      --gpus <indexes_of_gpus_to_use> \
      --result-path <dir_to_save_result> --result-name <result_file_name>
    

VQA

  • Generate prediction results on test set for EvalAI submission:
    python vqa/test.py \
      --cfg <cfg_file> \
      --ckpt <checkpoint> \
      --gpus <indexes_of_gpus_to_use> \
      --result-path <dir_to_save_result> --result-name <result_file_name>
    

RefCOCO+

  • Local evaluation on val/testA/testB set:
    python refcoco/test.py \
      --split <val|testA|testB> \
      --cfg <cfg_file> \
      --ckpt <checkpoint> \
      --gpus <indexes_of_gpus_to_use> \
      --result-path <dir_to_save_result> --result-name <result_file_name>
    

Visualization

See VISUALIZATION.md.

Acknowledgements

Many thanks to following codes that help us a lot in building this codebase:

Owner
Weijie Su
Graduate student at USTC.
Weijie Su
The first machine learning framework that encourages learning ML concepts instead of memorizing class functions.

SeaLion is designed to teach today's aspiring ml-engineers the popular machine learning concepts of today in a way that gives both intuition and ways of application. We do this through concise algori

Anish 324 Dec 27, 2022
Real-time ground filtering algorithm of cloud points acquired using Terrestrial Laser Scanner (TLS)

This repository contains tools to simulate the ground filtering process of a registered point cloud. The repository contains two filtering methods. The first method uses a normal vector, and fit to p

5 Aug 25, 2022
IMBENS: class-imbalanced ensemble learning in Python.

IMBENS: class-imbalanced ensemble learning in Python. Links: [Documentation] [Gallery] [PyPI] [Changelog] [Source] [Download] [知乎/Zhihu] [中文README] [a

Zhining Liu 176 Jan 04, 2023
Prometheus exporter for Cisco Unified Computing System (UCS) Manager

prometheus-ucs-exporter Overview Use metrics from the UCS API to export relevant metrics to Prometheus This repository is a fork of Drew Stinnett's or

Marshall Wace 6 Nov 07, 2022
A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis

A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis This is the pytorch implementation for our MICCAI 2021 paper. A Mul

Jiarong Ye 7 Apr 04, 2022
PyTorch Implementation of Vector Quantized Variational AutoEncoders.

Pytorch implementation of VQVAE. This paper combines 2 tricks: Vector Quantization (check out this amazing blog for better understanding.) Straight-Th

Vrushank Changawala 2 Oct 06, 2021
LONG-TERM SERIES FORECASTING WITH QUERYSELECTOR – EFFICIENT MODEL OF SPARSEATTENTION

Query Selector Here you can find code and data loaders for the paper https://arxiv.org/pdf/2107.08687v1.pdf . Query Selector is a novel approach to sp

MORAI 62 Dec 17, 2022
Tesla Light Show xLights Guide With python

Tesla Light Show xLights Guide Welcome to the Tesla Light Show xLights guide! You can create and run your own light shows on Tesla vehicles. Running a

Tesla, Inc. 2.5k Dec 29, 2022
PyTorch code for Composing Partial Differential Equations with Physics-Aware Neural Networks

FInite volume Neural Network (FINN) This repository contains the PyTorch code for models, training, and testing, and Python code for data generation t

Cognitive Modeling 20 Dec 18, 2022
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 141 Dec 30, 2022
Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Period-alternatives-of-Softmax Experimental Demo for our paper 'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechani

slwang9353 0 Sep 06, 2021
Stroke-predictions-ml-model - Machine learning model to predict individuals chances of having a stroke

stroke-predictions-ml-model machine learning model to predict individuals chance

Alex Volchek 1 Jan 03, 2022
PyTorch Implementation of CvT: Introducing Convolutions to Vision Transformers

CvT: Introducing Convolutions to Vision Transformers Pytorch implementation of CvT: Introducing Convolutions to Vision Transformers Usage: img = torch

Rishikesh (ऋषिकेश) 193 Jan 03, 2023
Local Multi-Head Channel Self-Attention for FER2013

LHC-Net Local Multi-Head Channel Self-Attention This repository is intended to provide a quick implementation of the LHC-Net and to replicate the resu

12 Jan 04, 2023
SingleVC performs any-to-one VC, which is an important component of MediumVC project.

SingleVC performs any-to-one VC, which is an important component of MediumVC project. Here is the official implementation of the paper, MediumVC.

谷下雨 26 Dec 28, 2022
Object detection using yolo-tiny model and opencv used as backend

Object detection Algorithm used : Yolo algorithm Backend : opencv Library required: opencv = 4.5.4-dev' Quick Overview about structure 1) main.py Load

2 Jul 06, 2022
Quickly and easily create / train a custom DeepDream model

Dream-Creator This project aims to simplify the process of creating a custom DeepDream model by using pretrained GoogleNet models and custom image dat

55 Dec 27, 2022
Reproducing-BowNet: Learning Representations by Predicting Bags of Visual Words

Reproducing-BowNet Our reproducibility effort based on the 2020 ML Reproducibility Challenge. We are reproducing the results of this CVPR 2020 paper:

6 Mar 16, 2022
Python scripts using the Mediapipe models for Halloween.

Mediapipe-Halloween-Examples Python scripts using the Mediapipe models for Halloween. WHY Mainly for fun. But this repository also includes useful exa

Ibai Gorordo 23 Jan 06, 2023
MoveNet Single Pose on OpenVINO

MoveNet Single Pose tracking on OpenVINO Running Google MoveNet Single Pose models on OpenVINO. A convolutional neural network model that runs on RGB

35 Nov 11, 2022