Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)

Last update: Dec 23, 2022

Overview

Vision-Language Transformer and Query Generation for Referring Segmentation

Please consider citing our paper in your publications if the project helps your research.

```
@inproceedings{vision-language-transformer,
  title={Vision-Language Transformer and Query Generation for Referring Segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}
```

Installation

Environment:
- Python 3.6
- TensorFlow 1.15
- Other dependencies in requirements.txt
- spaCy model for embedding:

  ```
  python -m spacy download en_vectors_web_lg
  ```

Dataset preparation
- Put the folder of COCO training set ("train2014") under data/images/.
- Download the RefCOCO dataset from here and extract it to data/. Then run the data preparation script from inside data/:
```
cd data
python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
```
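To prepare several benchmarks in one go, the command above can be scripted. This is a minimal sketch, not part of the repository: `DATASET_SPLITS`, `build_commands` and `prepare_all` are hypothetical names, and the dataset/split pairs are an assumption based on the splits commonly reported for these benchmarks (UNC for RefCOCO and RefCOCO+, UMD and Google for RefCOCOg). The README itself only lists the allowed values for each flag.

```python
import subprocess
import sys

# Assumed dataset/split pairs (not taken from the README): RefCOCO and RefCOCO+
# with the UNC split, RefCOCOg with both the UMD and Google splits.
DATASET_SPLITS = [
    ("refcoco", "unc"),
    ("refcoco+", "unc"),
    ("refcocog", "umd"),
    ("refcocog", "google"),
]


def build_commands(python=sys.executable):
    """Build one data_process_v2.py invocation (as an argv list) per dataset/split pair."""
    commands = []
    for dataset, split in DATASET_SPLITS:
        commands.append([
            python, "data_process_v2.py",
            "--data_root", ".",
            "--output_dir", "data_v2",
            "--dataset", dataset,
            "--split", split,
            "--generate_mask",
        ])
    return commands


def prepare_all(data_dir="data"):
    """Run every command with data_dir as the working directory, stopping at the first failure."""
    for command in build_commands():
        print("Running:", " ".join(command))
        subprocess.check_call(command, cwd=data_dir)
```

Calling `prepare_all()` from the repository root is equivalent to `cd data` followed by the command above, once per pair; drop entries from `DATASET_SPLITS` to prepare only the benchmarks you need.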

Evaluating

Download pretrained models & config files from here.
In the config file, set:
- evaluate_model: path to the pretrained weights
- evaluate_set: path to the dataset for evaluation.

Run:

```
python vlt.py test [PATH_TO_CONFIG_FILE]
```
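Because a wrong path usually only surfaces after the model starts building, it can help to check the config first. The README does not state the config file format, so this minimal sketch works on an already-parsed config dictionary. `check_eval_config` and `eval_command` are hypothetical helpers: the first confirms that `evaluate_model` and `evaluate_set` are set and point to existing paths, and the second builds the same `vlt.py test` command shown above.

```python
import os
import sys


def check_eval_config(config):
    """Return problems with the two evaluation keys of an already-parsed config dict."""
    problems = []
    for key in ("evaluate_model", "evaluate_set"):
        value = config.get(key)
        if not value:
            problems.append("%s is not set" % key)
        elif not os.path.exists(value):
            # Works for both files and directories.
            problems.append("%s points to a missing path: %s" % (key, value))
    return problems


def eval_command(config_path, python=sys.executable):
    """Build the `python vlt.py test <config>` invocation as an argv list."""
    return [python, "vlt.py", "test", config_path]
```

For example, load the config with whatever parser matches its format, pass the resulting dict to `check_eval_config`, and launch `subprocess.check_call(eval_command(path))` from the repository root only when the returned list is empty.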

Training

Pretrained Backbones: We use the backbone weights provided by MCN.

Note: we use the backbone that excludes all images appearing in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.
Specify the hyperparameters, dataset path and pretrained weight path in the configuration file. Please refer to the examples under /config, or to the config files of our pretrained models.

Run:

```
python vlt.py train [PATH_TO_CONFIG_FILE]
```

Acknowledgement

We borrowed a lot of code from MCN, keras-transformer, RefCOCO API and keras-yolo3. Thanks for their excellent work!

Owner

Henghui Ding