Code for our CVPR 2022 Paper "GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection"

Last update: Dec 04, 2022

GEN-VLKT

Code for our CVPR 2022 paper "GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection".

Contributed by Yue Liao*, Aixi Zhang*, Miao Lu, Yongliang Wang, Xiaobo Li and Si Liu.

Installation

Install the dependencies.

pip install -r requirements.txt

Clone and build CLIP.

git clone https://github.com/openai/CLIP.git && cd CLIP && python setup.py develop && cd ..

Data preparation

HICO-DET

The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of using the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. Place the downloaded annotation files as follows.

data
 └─ hico_20160224_det
     |─ annotations
     |   |─ trainval_hico.json
     |   |─ test_hico.json
     |   └─ corre_hico.npy
     :
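Before training, it can help to confirm that the annotation files landed where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_hico_annotations` is hypothetical, and the default root assumes the script is run from the repository root.

```python
from pathlib import Path

# Annotation files named in the layout above.
EXPECTED_HICO_FILES = ("trainval_hico.json", "test_hico.json", "corre_hico.npy")


def missing_hico_annotations(root="data/hico_20160224_det"):
    """Hypothetical helper: list expected files absent from <root>/annotations."""
    annotation_dir = Path(root) / "annotations"
    return [
        name
        for name in EXPECTED_HICO_FILES
        if not (annotation_dir / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_hico_annotations()
    if missing:
        print(f"Missing annotation files: {missing}")
    else:
        print("All annotation files found.")
```

An empty list means all three files are in place; any name returned points to a file that still has to be downloaded or moved.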

V-COCO

First, clone the V-COCO repository from here and follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Create the directories and place the files as follows.

GEN-VLKT
 |─ data
 |   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be run as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion, because vsrl_utils.py in the V-COCO repository raises an error under Python 3.

The V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
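A quick sanity check on the converted files is to load the two JSON outputs and count their top-level entries. This is a rough sketch under assumptions: the helper name `count_vcoco_entries` is hypothetical, and the README does not state the internal structure of the JSON files, so the count is simply the length of whatever top-level container each file holds. Unlike the conversion itself, this check is written for Python 3.

```python
import json
from pathlib import Path

# JSON outputs named in the text above; corre_vcoco.npy is not inspected here.
VCOCO_JSON_FILES = ("trainval_vcoco.json", "test_vcoco.json")


def count_vcoco_entries(annotation_dir="data/v-coco/annotations"):
    """Hypothetical helper: map each JSON file name to its top-level length."""
    counts = {}
    for name in VCOCO_JSON_FILES:
        with (Path(annotation_dir) / name).open() as f:
            # Assumption: the top-level JSON value is a list or a dict.
            counts[name] = len(json.load(f))
    return counts
```

A count of zero, or a `FileNotFoundError`, suggests the conversion did not finish as expected.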

Pre-trained model

Download the pre-trained DETR detector model for ResNet50 and put it in the params directory. Then convert the parameters for each dataset as follows.

python ./tools/convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-hico.pth \
        --num_queries 64

python ./tools/convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-vcoco.pth \
        --dataset vcoco \
        --num_queries 64

Training

After the preparation, you can start training with the following commands. Training is split into two steps: GEN-VLKT base model training and dynamic re-weighting training. The commands for training GEN-VLKT-S on HICO-DET and V-COCO are shown below.

HICO-DET

sh ./configs/hico_s.sh

V-COCO

sh ./configs/vcoco_s.sh

Zero-shot

sh ./configs/hico_s_zs_nf_uc.sh

Evaluation

HICO-DET

You can conduct the evaluation with trained parameters for HICO-DET as follows.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained pretrained/hico_gen_vlkt_s.pth \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers 3 \
        --eval \
        --with_clip_label \
        --with_obj_clip_label \
        --use_nms_filter

For the official evaluation (reported in the paper), you need to convert the prediction file to the official prediction format following this file, and then follow the PPDM evaluation steps.

V-COCO

First, you need to add the following main function to vsrl_eval.py in data/v-coco.

if __name__ == '__main__':
  import sys

  vsrl_annot_file = 'data/vcoco/vcoco_test.json'
  coco_file = 'data/instances_vcoco_all_2014.json'
  split_file = 'data/splits/vcoco_test.ids'

  vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)

  det_file = sys.argv[1]
  vcocoeval._do_eval(det_file, ovr_thresh=0.5)

Next, for the official evaluation of V-COCO, a pickle file of detection results has to be generated. You can generate the file with the following command and then evaluate it as follows.

python generate_vcoco_official.py \
        --param_path pretrained/VCOCO_GEN_VLKT_S.pth \
        --save_path vcoco.pickle \
        --hoi_path data/v-coco \
        --num_queries 64 \
        --dec_layers 3 \
        --use_nms_filter \
        --with_clip_label \
        --with_obj_clip_label

cd data/v-coco
python vsrl_eval.py vcoco.pickle

Zero-shot

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained pretrained/hico_gen_vlkt_s.pth \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers 3 \
        --eval \
        --with_clip_label \
        --with_obj_clip_label \
        --use_nms_filter \
        --zero_shot_type rare_first \
        --del_unseen
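The zero-shot command above matches the HICO-DET evaluation command except for the two trailing options, `--zero_shot_type rare_first` and `--del_unseen`. The sketch below makes that relationship explicit by building both argument lists from a shared base. The helper name `build_eval_command` is hypothetical and not part of the repository; every flag and value is copied from the commands in this section.

```python
import shlex

# Arguments shared by the regular and zero-shot evaluation commands.
BASE_EVAL_ARGS = [
    "python", "-m", "torch.distributed.launch",
    "--nproc_per_node=8", "--use_env", "main.py",
    "--pretrained", "pretrained/hico_gen_vlkt_s.pth",
    "--dataset_file", "hico",
    "--hoi_path", "data/hico_20160224_det",
    "--num_obj_classes", "80",
    "--num_verb_classes", "117",
    "--backbone", "resnet50",
    "--num_queries", "64",
    "--dec_layers", "3",
    "--eval", "--with_clip_label", "--with_obj_clip_label", "--use_nms_filter",
]

# Options that appear only in the zero-shot command.
ZERO_SHOT_ARGS = ["--zero_shot_type", "rare_first", "--del_unseen"]


def build_eval_command(zero_shot=False):
    """Hypothetical helper: return the evaluation command as an argument list."""
    return BASE_EVAL_ARGS + (ZERO_SHOT_ARGS if zero_shot else [])


if __name__ == "__main__":
    # Print the zero-shot command as a single shell-quoted line.
    print(shlex.join(build_eval_command(zero_shot=True)))
```

The returned list can be passed to `subprocess.run` directly, which avoids errors from shell line continuations.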

Regular HOI Detection Results

HICO-DET

	Full (D)	Rare (D)	Non-rare (D)	Full (KO)	Rare (KO)	Non-rare (KO)	Download	Config
GEN-VLKT-S (R50)	33.75	29.25	35.10	36.78	32.75	37.99	model	config
GEN-VLKT-M* (R101)	34.63	30.04	36.01	37.97	33.72	39.24	model	config
GEN-VLKT-L (R101)	34.95	31.18	36.08	38.22	34.36	39.37	model	config

D: Default, KO: Known object, *: The original model is lost, and the performance of the provided checkpoint differs slightly from that reported in the paper.

V-COCO

	Scenario 1	Scenario 2	Download	Config
GEN-VLKT-S (R50)	62.41	64.46	model	config
GEN-VLKT-M (R101)	63.28	65.58	model	config
GEN-VLKT-L (R101)	63.58	65.93	model	config

Zero-shot HOI Detection Results

	Type	Unseen	Seen	Full	Download	Config
GEN-VLKT-S	RF-UC	21.36	32.91	30.56	model	config
GEN-VLKT-S	NF-UC	25.05	23.38	23.71	model	config
GEN-VLKT-S	UO	10.51	28.92	25.63	model	config
GEN-VLKT-S	UV	20.96	30.23	28.74	model	config

Citation

Please consider citing our paper if it helps your research.

@inproceedings{liao2022genvlkt,
  title={GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection},
  author={Yue Liao and Aixi Zhang and Miao Lu and Yongliang Wang and Xiaobo Li and Si Liu},
  booktitle={CVPR},
  year={2022}
}

License

GEN-VLKT is released under the MIT license. See LICENSE for additional details.

Acknowledgement

Parts of the code are built upon PPDM, DETR, QPIC and CDN. We thank the authors for their great work!
