YouRefIt: Embodied Reference Understanding with Language and Gesture

Last update: Jul 11, 2022

by Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Tao Gao, Yixin Zhu, Song-Chun Zhu and Siyuan Huang

The IEEE International Conference on Computer Vision (ICCV), 2021

Introduction

We study the machine's understanding of embodied reference: One agent uses both language and gesture to refer to an object to another agent in a shared physical environment. To tackle this problem, we introduce YouRefIt, a new crowd-sourced, real-world dataset of embodied reference.

For more details, please refer to our paper.

Checklist

Image ERU
Video ERU

Installation

The code was tested with the following environment: Ubuntu 18.04/20.04, Python 3.7/3.8, PyTorch 1.9.1. Clone the repository and install the dependencies:

    git clone https://github.com/yixchen/YouRefIt_ERU
    cd YouRefIt_ERU
    pip install -r requirements.txt

Dataset

Download the YouRefIt dataset from the Dataset Request Page and place it under ./ln_data.

Model weights

YOLOv3: download the pretrained model and place the file in ./saved_models by running
```
sh saved_models/yolov3_weights.sh
```
More pretrained models are available on Google Drive and should also be placed in ./saved_models.

Make sure to put the files in the following structure:

|-- ROOT
|	|-- ln_data
|		|-- yourefit
|			|-- images
|			|-- paf
|			|-- saliency
|	|-- saved_models
|		|-- final_model_full.tar
|		|-- final_resc.tar
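
Before training or evaluating, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing` and the list `EXPECTED_PATHS` are hypothetical, and the paths are simply copied from the tree above, relative to the repository root.

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "ln_data/yourefit/images",
    "ln_data/yourefit/paf",
    "ln_data/yourefit/saliency",
    "saved_models/final_model_full.tar",
    "saved_models/final_resc.tar",
]


def find_missing(root="."):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing:")
        for p in missing:
            print("  " + p)
    else:
        print("Layout looks complete.")
```

Run it from the repository root; an empty result means every directory and checkpoint listed above was found.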

Training

To train the model, run the following from the repository root, replacing gpu_id with the ID of the GPU to use:

    python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id

Evaluation

To evaluate the model, run the following from the repository root. The --test flag switches the script to test mode.

    python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
        --resume saved_models/model.pth.tar \
        --test

Evaluate Image ERU on our released model

To evaluate our full model with PAF and saliency features, run:

    python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
        --resume saved_models/final_model_full.tar --use_paf --use_sal --large --test

To evaluate the baseline model that takes only images as input, run:

    python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
        --resume saved_models/final_resc.tar --large --test
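
The two released-model commands differ only in the checkpoint and in whether --use_paf and --use_sal are passed. As an illustration, the sketch below assembles both argument lists from the flags shown in this README. The function `build_eval_command` is a hypothetical helper, not part of the repository, and it assumes no flags beyond those listed above.

```python
def build_eval_command(checkpoint, gpu_id, use_paf=False, use_sal=False, large=False):
    """Assemble the argument list for a test-mode run of train.py (hypothetical helper)."""
    cmd = [
        "python", "train.py",
        "--data_root", "./ln_data/",
        "--dataset", "yourefit",
        "--gpu", str(gpu_id),
        "--resume", checkpoint,
    ]
    # Optional flags, used exactly as they appear in the commands above.
    if use_paf:
        cmd.append("--use_paf")
    if use_sal:
        cmd.append("--use_sal")
    if large:
        cmd.append("--large")
    cmd.append("--test")
    return cmd


# Full model (PAF + saliency) and image-only baseline, both on GPU 0.
full = build_eval_command("saved_models/final_model_full.tar", 0,
                          use_paf=True, use_sal=True, large=True)
baseline = build_eval_command("saved_models/final_resc.tar", 0, large=True)

print(" ".join(full))
print(" ".join(baseline))
```

The resulting lists can be passed to `subprocess.run` from the repository root if the evaluations are to be scripted.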

To evaluate the inference results on the test set at different IoU levels, change the path accordingly and run:

    python evaluate_results.py
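
For readers unfamiliar with the metric, the sketch below shows one common way to compute accuracy at several IoU levels. It is not the code in evaluate_results.py. It assumes boxes in (x1, y1, x2, y2) format, counts a prediction as correct when its IoU with the ground truth is at least the threshold, and uses thresholds 0.25, 0.5, and 0.75 purely as example values. The function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so that non-overlapping boxes have zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds, gts, thresholds=(0.25, 0.5, 0.75)):
    """Fraction of predictions whose IoU with the ground truth meets each threshold."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and of equal length")
    ious = [box_iou(p, g) for p, g in zip(preds, gts)]
    return {t: sum(iou >= t for iou in ious) / len(ious) for t in thresholds}


# Toy example: one exact match, one half-overlapping box, one miss.
preds = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
print(accuracy_at_iou(preds, gts))
```

In the toy example the three IoU values are 1.0, 0.5, and 0.0, so two of three predictions pass at thresholds 0.25 and 0.5, and one of three passes at 0.75.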

Citation

@inProceedings{chen2021yourefit,
 title={YouRefIt: Embodied Reference Understanding with Language and Gesture},
 author = {Chen, Yixin and Li, Qing and Kong, Deqian and Kei, Yik Lun and Zhu, Song-Chun and Gao, Tao and Zhu, Yixin and Huang, Siyuan},
 booktitle={The IEEE International Conference on Computer Vision (ICCV)},
 year={2021}
 }

Acknowledgement

Our code is built on ReSC and we thank the authors for their hard work.
