The official implementation of the CVPR 2021 paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Last update: Nov 14, 2022


Overview

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

This repository is the official implementation of the CVPR 2021 paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Requirements

TensorFlow 1.15
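The code targets TensorFlow 1.15, so a quick pre-flight check of the installed version can save a confusing failure later. The helper below is a hypothetical convenience, not part of this repository. It only compares the major and minor version numbers.

```python
def is_supported_tf_version(version: str) -> bool:
    """Return True if a TensorFlow version string is a 1.15.x release.

    Hypothetical helper (not part of this repository); pass it
    tensorflow.__version__ after importing TensorFlow.
    """
    parts = version.split(".")
    if len(parts) < 2:
        return False
    try:
        major, minor = int(parts[0]), int(parts[1])
    except ValueError:
        # Non-numeric major or minor component: not a release we recognize.
        return False
    return (major, minor) == (1, 15)
```

For example, after import tensorflow as tf, is_supported_tf_version(tf.__version__) should return True in a correctly set up environment.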

Training

To train the NCE model(s) in the paper, run this command:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5
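The README does not spell out the training objective. As a rough illustration of the kind of noise-contrastive loss the NCE model name refers to, here is a generic InfoNCE loss in plain Python. This is a sketch under stated assumptions, not the code in train_nce_distill_model.py. It assumes that scores[i][j] is a hypothetical matching score between phrase i and image j in a batch, that the paired image for phrase i sits at index i, and that all other images in the batch act as negatives. How region-level scores are pooled into a phrase-image score is left open here.

```python
import math
from typing import List


def info_nce_loss(scores: List[List[float]], temperature: float = 1.0) -> float:
    """Generic InfoNCE loss averaged over a batch of phrases.

    Assumption: scores[i][j] is a hypothetical phrase-i / image-j score and
    the positive (paired) image for phrase i is at column i; every other
    column in the row is treated as a negative.
    """
    total = 0.0
    for i, row in enumerate(scores):
        logits = [s / temperature for s in row]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-softmax probability of the positive pair.
        total += log_denominator - logits[i]
    return total / len(scores)
```

Raising the score of the correct pair relative to the others lowers the loss. With all scores equal, the loss is log(batch size).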

To train the NCE+Distill model(s) in the paper, run the same script with the additional --phrase_to_label_json argument:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5 \
  --phrase_to_label_json=phrase_to_label.json
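The only visible difference between the two training commands is the --phrase_to_label_json file. The README does not document its format or how it is used. As a generic illustration of knowledge distillation over candidate regions (an assumption about the setting, not the paper's exact formulation), the sketch below computes the cross-entropy between a teacher's soft distribution over the regions of one image and the student's distribution for one phrase. The teacher_scores input is hypothetical. For instance, it could come from an object detector's class scores for a label matched to the phrase.

```python
import math
from typing import List


def log_softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax with an optional temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distillation_loss(student_scores: List[float],
                      teacher_scores: List[float],
                      temperature: float = 1.0) -> float:
    """Cross-entropy H(teacher, student) over the candidate regions of one phrase.

    Both inputs are hypothetical per-region scores of equal length; the
    teacher's softmax is used as a soft target for the student's softmax.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("student and teacher must score the same regions")
    p_teacher = [math.exp(v) for v in log_softmax(teacher_scores, temperature)]
    log_p_student = log_softmax(student_scores, temperature)
    return -sum(t * s for t, s in zip(p_teacher, log_p_student))
```

Because cross-entropy is smallest when the student's distribution matches the teacher's, matching scores give a lower loss than mismatched ones.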

Evaluation

To evaluate a trained model on the Flickr30K test split, run:

python eval_model.py \
  --region_feat_path=region_features_test.hdf5 \
  --phrase_feat_path=phrase_features_test.hdf5 \
  --glove_path=glove.hdf5 \
  --restore_path=checkpoint.meta
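When scripting several runs, it can help to build these command lines programmatically. The sketch below reproduces the training and evaluation commands shown above as argument lists for subprocess.run. The function names are hypothetical; only the script names and flags come from this README. Passing phrase_to_label_json switches a training run from NCE to NCE+Distill.

```python
import subprocess
from typing import List, Optional


def train_command(region_feat_path: str, phrase_feat_path: str, glove_path: str,
                  phrase_to_label_json: Optional[str] = None) -> List[str]:
    """Argument list for train_nce_distill_model.py (NCE, or NCE+Distill
    when a phrase-to-label JSON file is given)."""
    argv = [
        "python", "train_nce_distill_model.py",
        f"--region_feat_path={region_feat_path}",
        f"--phrase_feat_path={phrase_feat_path}",
        f"--glove_path={glove_path}",
    ]
    if phrase_to_label_json is not None:
        argv.append(f"--phrase_to_label_json={phrase_to_label_json}")
    return argv


def eval_command(region_feat_path: str, phrase_feat_path: str, glove_path: str,
                 restore_path: str) -> List[str]:
    """Argument list for eval_model.py with a checkpoint to restore."""
    return [
        "python", "eval_model.py",
        f"--region_feat_path={region_feat_path}",
        f"--phrase_feat_path={phrase_feat_path}",
        f"--glove_path={glove_path}",
        f"--restore_path={restore_path}",
    ]


def run(argv: List[str]) -> None:
    """Run a command; raises CalledProcessError if it exits non-zero."""
    subprocess.run(argv, check=True)
```

For example, run(train_command("region_features.hdf5", "phrase_features.hdf5", "glove.hdf5", "phrase_to_label.json")) launches the NCE+Distill run. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.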

Pre-trained Models

You can download the pretrained models (trained with Res101 VG features) here:

You can also find the features for the Flickr30K test split here.

The pretrained models achieve the following performance on the Flickr30K test split:

Model	R@1	R@5	R@10
NCE+Distill	0.5310	0.7394	0.7875
NCE	0.5135	0.7338	0.7833
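Assuming the three columns report recall at the top 1, 5 and 10 ranked boxes (R@1, R@5, R@10), as is common for grounding on Flickr30K Entities, a phrase usually counts as correctly grounded at k when at least one of its k highest-scoring region proposals has an IoU of at least 0.5 with the ground-truth box. The sketch below computes that metric under simplifying assumptions: one ground-truth box per phrase, boxes given as (x1, y1, x2, y2), and proposals already sorted by score. It is not taken from eval_model.py.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_boxes: Sequence[Sequence[Box]],
                gt_boxes: Sequence[Box],
                k: int,
                iou_threshold: float = 0.5) -> float:
    """Fraction of phrases whose top-k proposals (sorted by score, best first)
    include at least one box with IoU >= iou_threshold to the ground truth."""
    if not gt_boxes:
        raise ValueError("need at least one phrase")
    hits = 0
    for boxes, gt in zip(ranked_boxes, gt_boxes):
        if any(iou(box, gt) >= iou_threshold for box in boxes[:k]):
            hits += 1
    return hits / len(gt_boxes)
```

Because a larger k only adds candidates, R@k can never decrease as k grows, which matches the increasing values across each row of the table.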

Citation

If you use our implementation in your research or wish to refer to the results published in our paper, please use the following BibTeX entry.

@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Liwei and Huang, Jing and Li, Yin and Xu, Kun and Yang, Zhengyuan and Yu, Dong},
    title     = {Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {14090-14100}
}
