Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

Overview

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

by Masato Tamura, Hiroki Ohashi, and Tomoaki Yoshinaga.

This repository contains the official implementation of the paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information", which has been accepted to CVPR 2021.

QPIC extends the recently proposed object detector DETR. It leverages DETR's query-based detection and the attention mechanism of the transformer, and as a result achieves high HOI detection performance with simple detection heads.
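To make "simple detection heads" more concrete, here is a minimal sketch of how the outputs of a single decoder query could be decoded into scored (human, verb, object) triplets. It assumes, following DETR conventions and the paper as understood here, that each query predicts a human box, an object box, object-class logits with a trailing "no object" entry (softmax), and multi-label verb logits (independent sigmoids), and that a triplet's score is the object score times the verb score. `query_to_triplets` and `HOITriplet` are hypothetical names, not part of the QPIC code base.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) box format


@dataclass
class HOITriplet:
    human_box: Box
    object_box: Box
    object_label: int
    verb_label: int
    score: float


def softmax(logits: Sequence[float]) -> List[float]:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative logits
    return z / (1.0 + z)


def query_to_triplets(human_box: Box, object_box: Box,
                      obj_logits: Sequence[float], verb_logits: Sequence[float],
                      score_thresh: float = 0.0) -> List[HOITriplet]:
    """Decode one query's head outputs into HOI triplets, best first.

    obj_logits: num_obj_classes + 1 values; the last is assumed to be "no object".
    verb_logits: one value per verb class; verbs are multi-label, so each gets
    an independent sigmoid instead of a softmax.
    """
    obj_probs = softmax(obj_logits)[:-1]  # drop the "no object" probability
    obj_score = max(obj_probs)
    obj_label = obj_probs.index(obj_score)
    triplets = []
    for verb_label, logit in enumerate(verb_logits):
        score = obj_score * sigmoid(logit)  # triplet score = object score x verb score
        if score > score_thresh:
            triplets.append(HOITriplet(human_box, object_box, obj_label, verb_label, score))
    return sorted(triplets, key=lambda t: t.score, reverse=True)


if __name__ == "__main__":
    # Toy values: two object classes plus "no object", two verb classes.
    for t in query_to_triplets((0, 0, 1, 1), (1, 1, 2, 2), [0.0, 2.0, 0.0], [0.0, 10.0]):
        print(t.verb_label, t.object_label, round(t.score, 3))
```

Every decoder query yields one human-object pair, so a whole image's predictions would be the union of such triplets over all queries.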

Example attention maps.

Preparation

Dependencies

Our implementation uses external libraries such as NumPy and PyTorch. You can install the dependencies with the following commands.

pip install numpy
pip install -r requirements.txt

Note that the installation may print errors while installing pycocotools, but these errors can be ignored.
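Since some installation errors are expected, a quick check that the key packages are importable can be reassuring. This sketch uses only the standard library (`importlib.util.find_spec`); the package list is an assumption based on the libraries named above, and `requirements.txt` remains the authoritative list. `missing_packages` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # numpy and torch are named above; pycocotools is installed by the commands above.
    missing = missing_packages(["numpy", "torch", "pycocotools"])
    print("missing packages:", ", ".join(missing) if missing else "none")
```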

Dataset

HICO-DET

The HICO-DET dataset can be downloaded here. After the download finishes, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. Place the downloaded annotation files as follows.

qpic
 ├─ data
 │   └─ hico_20160224_det
 │       ├─ annotations
 │       │   ├─ trainval_hico.json
 │       │   ├─ test_hico.json
 │       │   └─ corre_hico.npy
 :       :
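Before training, it can help to verify that the annotation files sit where the loader expects them. A minimal sketch that checks the HICO-DET paths from the tree above, relative to the repository root; `HICO_FILES` and `missing_files` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List

# Paths taken from the directory tree above, relative to the qpic repository root.
HICO_FILES = [
    "data/hico_20160224_det/annotations/trainval_hico.json",
    "data/hico_20160224_det/annotations/test_hico.json",
    "data/hico_20160224_det/annotations/corre_hico.npy",
]


def missing_files(root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the expected files that do not exist under `root`."""
    base = Path(root)
    return [p for p in relative_paths if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(".", HICO_FILES)
    print("all annotation files found" if not missing else f"missing: {missing}")
```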

V-COCO

First, clone the V-COCO repository from here and follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and create the directories as follows.

qpic
 ├─ data
 │   └─ v-coco
 │       ├─ data
 │       │   ├─ instances_vcoco_all_2014.json
 │       │   :
 │       ├─ prior.pickle
 │       ├─ images
 │       │   ├─ train2014
 │       │   │   ├─ COCO_train2014_000000000009.jpg
 │       │   │   :
 │       │   └─ val2014
 │       │       ├─ COCO_val2014_000000000042.jpg
 │       │       :
 │       ├─ annotations
 :       :

Our implementation requires the annotation files to be converted to the HOIA format. The conversion can be done as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion, because vsrl_utils.py in the v-coco repository raises an error under Python 3.

The V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
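To inspect or build HOIA-format files (for example, for a custom dataset), it helps to see how one image entry maps to interactions. The field layout below is an assumption based on the PPDM-style annotations and the key names quoted in the comments below (`annotations`, `hoi_annotation`, `subject_id`, `object_id`, `category_id`): `subject_id` and `object_id` are taken to be indices into the image's `annotations` list, the interaction's `category_id` is taken to be a verb class, and an `object_id` of -1 is treated as "no object". Please check these assumptions against the generated files; `decode_hoia_entry` and the example values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interaction = Tuple[list, Optional[list], int, int]  # (human box, object box, object category, verb)


def decode_hoia_entry(entry: Dict) -> List[Interaction]:
    """Turn one HOIA-style image entry into (human_box, object_box, object_category, verb) tuples."""
    boxes = entry["annotations"]
    result = []
    for hoi in entry["hoi_annotation"]:
        human = boxes[hoi["subject_id"]]
        if hoi["object_id"] == -1:  # assumed "no object"; checked first since boxes[-1] is valid Python
            obj_box, obj_category = None, -1
        else:
            obj = boxes[hoi["object_id"]]
            obj_box, obj_category = obj["bbox"], obj["category_id"]
        result.append((human["bbox"], obj_box, obj_category, hoi["category_id"]))
    return result


# Illustrative entry; all ids and coordinates are made up.
example = {
    "file_name": "COCO_val2014_000000000042.jpg",
    "annotations": [
        {"bbox": [10, 20, 110, 220], "category_id": 1},
        {"bbox": [90, 150, 160, 230], "category_id": 37},
    ],
    "hoi_annotation": [
        {"subject_id": 0, "object_id": 1, "category_id": 5},
        {"subject_id": 0, "object_id": -1, "category_id": 2},
    ],
}

if __name__ == "__main__":
    for interaction in decode_hoia_entry(example):
        print(interaction)
```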

Pre-trained parameters

QPIC has to be pre-trained on the COCO object detection dataset. For HICO-DET training, this pre-training can be skipped by using the parameters of DETR, which can be downloaded from here for ResNet50 and for ResNet101. For V-COCO training, however, the pre-training has to be carried out, because some images of the V-COCO evaluation set are contained in DETR's training set. For the V-COCO evaluation, we excluded these images and pre-trained QPIC ourselves.
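Excluding images amounts to removing a set of images, together with their annotations, from a COCO-format training file before pre-training. A minimal sketch, assuming the ids to exclude are available as a text file with one image id per line (the v-coco repository provides split files such as data/splits/vcoco_val.ids; which split files make up "the V-COCO evaluation set" is left as an assumption here). `read_ids`, `exclude_images`, and `filter_file` are hypothetical helpers, and this is not necessarily how the authors built their pre-training data.

```python
import json
from typing import Iterable, Set


def read_ids(path: str) -> Set[int]:
    """Read one integer image id per line, ignoring blank lines."""
    with open(path) as f:
        return {int(line) for line in f if line.strip()}


def exclude_images(coco: dict, excluded_ids: Iterable[int]) -> dict:
    """Return a copy of a COCO-format dict without the given images and their annotations."""
    excluded = set(excluded_ids)
    filtered = dict(coco)  # shallow copy keeps "categories", "info", etc.
    filtered["images"] = [img for img in coco["images"] if img["id"] not in excluded]
    filtered["annotations"] = [a for a in coco["annotations"] if a["image_id"] not in excluded]
    return filtered


def filter_file(src: str, dst: str, ids_path: str) -> None:
    """Write a filtered copy of the COCO annotation file `src` to `dst`."""
    with open(src) as f:
        coco = json.load(f)
    with open(dst, "w") as f:
        json.dump(exclude_images(coco, read_ids(ids_path)), f)
```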

After downloading or pre-training, move the parameters to the params directory and convert them with the following command (shown here for the downloaded ResNet50 parameters).

python convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre.pth
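One plausible part of such a conversion is remapping DETR's COCO classifier to an 80-class object classifier. DETR predicts 92 classes for COCO (91 category-id slots plus a trailing "no object" class), whereas 80 object classes plus "no object" give 81 outputs, which matches the 81-way classifier mentioned in the comments below. This sketch shows only that row selection on a plain list; it is an assumption, not a description of what `convert_parameters.py` actually does (the comments also suggest the V-COCO model uses an 82-way classifier). `contiguous_rows` and `remap_classifier` are hypothetical names.

```python
from typing import List, Sequence

# COCO's 80 categories use ids 1..90; these ten ids are unused.
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
DETR_NUM_CLASSES = 91             # DETR's num_classes for COCO (id slots 0..90)
NO_OBJECT_ROW = DETR_NUM_CLASSES  # DETR appends "no object" as the last output (row 91)


def contiguous_rows() -> List[int]:
    """Rows to keep: the 80 used COCO ids in ascending order, then "no object"."""
    used = [i for i in range(1, DETR_NUM_CLASSES) if i not in UNUSED_COCO_IDS]
    return used + [NO_OBJECT_ROW]


def remap_classifier(rows: Sequence) -> list:
    """Select rows of a 92-row classifier weight or bias (one entry per class)."""
    if len(rows) != DETR_NUM_CLASSES + 1:
        raise ValueError(f"expected {DETR_NUM_CLASSES + 1} rows, got {len(rows)}")
    return [rows[i] for i in contiguous_rows()]
```

With a PyTorch tensor, `weight[contiguous_rows()]` would perform the same selection through advanced indexing.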

Training

After the preparation, you can start training with the following commands.

For HICO-DET training:

python main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1
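In these commands, `--set_cost_bbox` and `--set_cost_giou` weight the box terms of the bipartite matching cost that pairs predictions with ground-truth human-object pairs, while `--bbox_loss_coef` and `--giou_loss_coef` weight the corresponding losses once the matching is fixed (an interpretation based on DETR's naming, not stated in this README). As the QPIC paper is understood here, each box term takes the maximum of the human-box and object-box costs, so a prediction matches well only when both of its boxes fit. The sketch covers only these box terms with the weights from the command (classification terms are omitted), uses plain (x1, y1, x2, y2) boxes instead of DETR's normalized center format, and swaps the Hungarian algorithm for a brute-force search. All names are hypothetical.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), a simplification
Pair = Dict[str, Box]                    # {"h": human box, "o": object box}


def l1_distance(a: Box, b: Box) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two non-degenerate boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


def pair_cost(pred: Pair, gt: Pair, cost_bbox: float = 2.5, cost_giou: float = 1.0) -> float:
    """Box part of the matching cost: the worse of the human and object terms."""
    c_bbox = max(l1_distance(pred["h"], gt["h"]), l1_distance(pred["o"], gt["o"]))
    c_giou = max(-giou(pred["h"], gt["h"]), -giou(pred["o"], gt["o"]))
    return cost_bbox * c_bbox + cost_giou * c_giou


def match(preds: Sequence[Pair], gts: Sequence[Pair], **weights: float) -> Tuple[List[int], float]:
    """Minimum-cost assignment; result[g] is the prediction matched to ground truth g.

    Exhaustive search stands in for the Hungarian algorithm and is only
    practical for a handful of pairs.
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground-truth pairs")
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[p], gts[g], **weights) for g, p in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


if __name__ == "__main__":
    gt = {"h": (0, 0, 1, 1), "o": (1, 1, 2, 2)}
    far = {"h": (5, 5, 6, 6), "o": (5, 5, 6, 6)}
    print(match([far, gt], [gt]))
```

For real use, `scipy.optimize.linear_sum_assignment` on a full cost matrix is the usual replacement for the brute-force loop.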

For V-COCO training:

python main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file vcoco \
        --hoi_path data/v-coco \
        --num_obj_classes 80 \
        --num_verb_classes 29 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1
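The HICO-DET and V-COCO commands differ only in a few dataset-specific options. A small wrapper that builds either command can reduce copy-paste mistakes; this is a hypothetical convenience helper (not part of the repository) whose values are copied from the two commands above.

```python
import shlex
from typing import List

# Dataset-specific options, copied from the HICO-DET and V-COCO commands above.
DATASETS = {
    "hico": {"hoi_path": "data/hico_20160224_det", "num_verb_classes": 117},
    "vcoco": {"hoi_path": "data/v-coco", "num_verb_classes": 29},
}


def training_command(dataset: str, pretrained: str = "params/detr-r50-pre.pth",
                     output_dir: str = "logs") -> List[str]:
    """Build the argument list for main.py; shared options match both commands."""
    cfg = DATASETS[dataset]
    return [
        "python", "main.py",
        "--pretrained", pretrained,
        "--output_dir", output_dir,
        "--hoi",
        "--dataset_file", dataset,
        "--hoi_path", cfg["hoi_path"],
        "--num_obj_classes", "80",
        "--num_verb_classes", str(cfg["num_verb_classes"]),
        "--backbone", "resnet50",
        "--set_cost_bbox", "2.5",
        "--set_cost_giou", "1",
        "--bbox_loss_coef", "2.5",
        "--giou_loss_coef", "1",
    ]


if __name__ == "__main__":
    print(shlex.join(training_command("vcoco")))
```

The list can be passed directly to `subprocess.run(..., check=True)`. Given the pre-training section above, `pretrained` would normally point to parameters pre-trained without the V-COCO evaluation images when training on V-COCO.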

If your machine has multiple GPUs, you can use them to speed up training. The number of GPUs is specified with the --nproc_per_node option. The following command starts HICO-DET training with 8 GPUs.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1
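With `--use_env`, `torch.distributed.launch` passes each process's local rank through the `LOCAL_RANK` environment variable (instead of a `--local_rank` argument), alongside `RANK` and `WORLD_SIZE`; DETR-style training scripts read these variables to set up each process. The sketch shows that lookup and the resulting effective batch size, which is the per-GPU batch size (assuming `main.py` keeps DETR's `--batch_size` option; its default is not stated in this README) multiplied by the number of processes. In recent PyTorch versions, `torchrun` supersedes `torch.distributed.launch` and sets the same variables. Both helper names are hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def distributed_info(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int, int]:
    """Return (rank, local_rank, world_size), defaulting to a single process."""
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    return rank, local_rank, world_size


def effective_batch_size(per_gpu_batch_size: int, world_size: int) -> int:
    """Each process loads its own mini-batch, so the global batch scales with world size."""
    return per_gpu_batch_size * world_size


if __name__ == "__main__":
    rank, local_rank, world_size = distributed_info()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```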

Evaluation

Evaluation is conducted at the end of each training epoch, and the results are written to logs/log.txt as follows:

"test_mAP": 0.29061250833779456, "test_mAP rare": 0.21910348492395765, "test_mAP non-rare": 0.31197234650036926

test_mAP, test_mAP rare, and test_mAP non-rare are the results for the Full, Rare, and Non-rare categories under the Default setting, respectively.
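To pick the best epoch, logs/log.txt can be read line by line. This sketch assumes, as in DETR's training script, that each line is one JSON object per epoch containing an `epoch` field next to the metrics shown above. Note that the logged values are fractions, so 0.2906 corresponds to 29.06 in the results tables. `best_epoch` is a hypothetical helper.

```python
import json
import os
from typing import Iterable, Optional


def best_epoch(lines: Iterable[str], key: str = "test_mAP") -> Optional[dict]:
    """Return the logged record with the highest `key`, or None if no record has it."""
    best = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if key in record and (best is None or record[key] > best[key]):
            best = record
    return best


if __name__ == "__main__" and os.path.exists("logs/log.txt"):
    with open("logs/log.txt") as f:
        best = best_epoch(f)
    if best is not None:
        print(f"epoch {best.get('epoch')}: {100 * best['test_mAP']:.2f} mAP")
```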

For the official V-COCO evaluation, a pickle file of detection results has to be generated. You can generate it as follows.

python generate_vcoco_official.py \
        --param_path logs/checkpoint.pth \
        --save_path vcoco.pickle \
        --hoi_path data/v-coco

Results

HICO-DET (mAP, %).

Method           | Full (D) | Rare (D) | Non-rare (D) | Full (KO) | Rare (KO) | Non-rare (KO)
QPIC (ResNet50)  | 29.07    | 21.85    | 31.23        | 31.68     | 24.14     | 33.93
QPIC (ResNet101) | 29.90    | 23.92    | 31.69        | 32.38     | 26.06     | 34.27

D: Default, KO: Known object
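In HICO-DET, Full covers all 600 HOI categories, Rare covers the 138 categories with fewer than 10 training instances, and Non-rare covers the remaining 462; each number in the table is a mean of per-category APs over the corresponding subset. The Known-object setting changes how each category's AP is computed (only images containing that category's object are considered), not how the averages are formed. A minimal sketch of the averaging step, assuming per-category APs and training-instance counts are already available; `map_splits` is hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

RARE_THRESHOLD = 10  # categories with fewer training instances count as "rare"


def map_splits(ap: Dict[int, float], train_counts: Dict[int, int]) -> Tuple[float, float, float]:
    """Return (full, rare, non_rare) mAP from per-category APs."""
    rare = [v for c, v in ap.items() if train_counts[c] < RARE_THRESHOLD]
    non_rare = [v for c, v in ap.items() if train_counts[c] >= RARE_THRESHOLD]
    return mean(ap.values()), mean(rare), mean(non_rare)


if __name__ == "__main__":
    # Toy numbers for three categories, not real results.
    print(map_splits({0: 0.2, 1: 0.4, 2: 0.6}, {0: 3, 1: 50, 2: 100}))
```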

V-COCO (role AP, %).

Method           | Scenario 1 | Scenario 2
QPIC (ResNet50)  | 58.8       | 61.0
QPIC (ResNet101) | 58.3       | 60.7

Citation

Please consider citing our paper if it helps your research.

@inproceedings{tamura_cvpr2021,
author = {Tamura, Masato and Ohashi, Hiroki and Yoshinaga, Tomoaki},
title = {{QPIC}: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information},
booktitle={CVPR},
year = {2021},
}
Comments
  • Reproducing the result on the V-COCO dataset

    Hi, I can reproduce the result on the HICO-DET dataset, but I only get 51 on the V-COCO dataset. Could you please provide your log.txt? I am not sure where I went wrong. Thanks.

    opened by SherlockHolmes221 7
  • How to get the file logs/checkpoint.pth?

    Thanks for your interesting work! I have a question about how to get the file logs/checkpoint.pth; I have not seen it on your Google Drive. Looking forward to your reply.

    opened by Lunarli 4
  • Impact of the number of decoder layers

    Hi, thanks for the interesting work. Does the number of decoder layers have a significant impact on performance, as in DETR? Also, I can't find how many V100s you used. Maybe 8?

    opened by jihwanp 3
  • Why does the pre-trained V-COCO model have 81 object classes?

    When I tried to evaluate the V-COCO model you provide, I had to set the parameter --num_obj_classes to 81. What is the reason for this setting? And should I set --num_obj_classes 81 during training?

    Thanks for your reply!

    opened by weiyunfei 3
  • How to pre-train the detector for V-COCO?

    Hi, thanks for your excellent work. I am confused about the pre-training for V-COCO training. In your README.md, you stated that the pre-training has to be carried out for V-COCO training because some images of the V-COCO evaluation set are contained in the training set of DETR, while the given training command just uses the pre-trained DETR model without any further pre-training. Should I apply pre-training on a dataset excluding the V-COCO evaluation images, or just follow your training command?

    opened by weiyunfei 3
  • Different AP results for the pre-trained V-COCO models

    Hi folks,

    First of all, thank you for sharing this repository. I would like to ask a specific question about the evaluation results of the provided pre-trained V-COCO models. I followed your instructions (for constructing the V-COCO annotation files and obtaining the pickle files) to get the results. However, the average role AP results I got differ from the V-COCO results in the table. For instance, the attached screenshot shows the scenario 1 role AP output of the R-50 QPIC model. I am wondering about possible reasons for this; could you please help?

    [Screenshot: qpic_resnet50_sc1_roleap]

    opened by cancam 2
  • How to convert the pre-trained DETR model to the V-COCO style?

    Hi! Sorry to bother you again. When I tried to train a V-COCO model, I found that the object classifier of the pre-trained DETR model is only 81-way (including background), while in #6 you said that the V-COCO model has an 82-way object classifier (including background and a missing category id). So what should I do to convert the classifier parameters from 81 classes to 82 classes?

    opened by weiyunfei 2
  • For V-COCO result reproduction, which evaluation code should I use?

    I tried to reproduce your results on V-COCO and encountered some problems. With your pre-trained model, the official evaluation code only achieves 56.5 for Scenario 1 and 58.6 for Scenario 2, while the PPDM-style vcoco_eval.py in your project gives 58.35. However, the evaluation process in vcoco_eval.py seems to correspond to Scenario 2 of the official evaluation. Which code should I use to reproduce your V-COCO experiments? Moreover, if I need to use vcoco_eval.py, what should I do to get the evaluation results for both scenarios?

    opened by weiyunfei 2
  • V-COCO Evaluation Error

    Hello,

    Many thanks for your great work.

    I am trying to evaluate your pre-trained models on V-COCO.

    1. So, I first generate the official detection pickle via:
    
    python generate_vcoco_official.py \
            --param_path ./params/qpic_resnet50_vcoco.pth \
            --save_path ./logs_vcoco/vcoco.pickle \
            --hoi_path ./data/v-coco
    
    2. Then, I run the official evaluation code as follows:
    from vsrl_eval import VCOCOeval
    
    vsrl_annot_file_s='./data/v-coco/data/vcoco/vcoco_val.json'
    split_file_s='./data/v-coco/data/splits/vcoco_val.ids'
    
    coco_file_s='./data/v-coco/data/instances_vcoco_all_2014.json'
    vcocoeval = VCOCOeval(vsrl_annot_file_s, coco_file_s, split_file_s)
    
    file_name= './logs_vcoco/vcoco.pickle'
    vcocoeval._do_eval(file_name, ovr_thresh=0.5)
    

    Please note that I adapted the latter script from the VSG-Net repo. I get the following error during evaluation:

    zero-size array to reduction operation minimum which has no identity

    on assert(np.amax(rec) <= 1) within _do_agent_eval() function execution.

    I wonder whether this is a common error and how it can be mitigated.

    Many thanks.

    opened by kilickaya 1
  • Distance AP

    Hi, I have a question about distance-wise AP. [image]

    Does an HOI instance used in calculating the AP correspond to distance=0.1 when the distance between the human & center box is between 0.1 and 0.2?

    Alternatively, I would greatly appreciate it if you could provide the code for calculating distance-wise AP. Thanks

    opened by jihwanp 1
  • Some useless steps in focal loss

    [image] Thanks for your interesting work! neg_weights is meaningful only in heatmap-based problems, e.g., Gaussian-based detection. Removing those steps may avoid some misunderstanding.

    opened by YueLiao 1
  • A bug in the HICO dataset file?

    First, thanks for your nice work! I found a small bug here: https://github.com/hitachi-rd-cv/qpic/blob/main/datasets/hico.py#L62

    Shouldn't it be the number of HOI tuples (len(img_anno['hoi_annotation'])), rather than the number of boxes (len(img_anno['annotations'])), that must not exceed the number of queries (self.num_queries)? Is that right? Looking forward to your thoughts, thank you very much!!

    opened by jojolee123 0
  • How to convert qpic.pth to an .onnx model

    I am facing an issue with the ONNX conversion of this model. I tried

    torch.onnx.export(pth_model, dummy_input, "onnx_model.onnx", opset_version=11)  # dummy_input shape is [1, 3, 720, 1280], the same as this model's input

    However, the result is [array([[[nan, nan, nan], [nan, nan, nan]]], dtype=float32), array([[[nan, nan], [nan, nan]]], dtype=float32), array([[[nan, nan, nan, nan], [nan, nan, nan, nan]]], dtype=float32), array([[[nan, nan, nan, nan], [nan, nan, nan, nan]]], dtype=float32)]

    It's nan party!

    Please comment on how to fix it.

    opened by bj-noh 0
  • Is your architecture end-to-end HOI?

    Hello, thanks for your implementation. Is your architecture end-to-end for HOI detection? Does that mean it does not require any separate features or feature extraction? For example, can I feed my own features to your network?

    opened by mansooreh1 0
  • The influence of aux_loss

    Hi authors,

    Thanks for your implementation, which helps my research a lot. In the supplementary material of your paper, you stated that the auxiliary loss is used following DETR. What happens if it isn't used for HOI training? I wonder whether there is any corresponding experiment on QPIC.

    opened by hwfan 0
  • Custom dataset implementation

    Hi, thank you so much for this awesome repo. I am currently trying to implement a custom dataset with qpic. The dataset's annotation file is in COCO format. The convert_vcoco_annotations.py script converts V-COCO to the HOIA format, but I am trying to convert a COCO-format annotation to HOIA by manually adding the interactions. In this process, there are a few things I cannot understand in the HOIA format: what do {subject_id, category_id, object_id} mean in the "hoi_annotations" dict key, and what does {category_id} mean in the "annotations" dict key?

    opened by dharneeshkar004 0