Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Last update: Jan 01, 2023

Overview

PyTorch implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Graph Generation, accepted at ICCV 2021. We propose STTran, a Transformer-based model that generates dynamic scene graphs for a given video by detecting the visual relationships in each frame.
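
To make the idea of a dynamic scene graph concrete, here is a minimal sketch that stores one list of subject-predicate-object triplets per frame. The class names (`Relation`, `DynamicSceneGraph`), the field names and the example labels are illustrative assumptions, not the data structures used in this repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Relation:
    """One visual relationship in a single frame (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class DynamicSceneGraph:
    """Maps each frame index to the relationships detected in that frame."""
    frames: Dict[int, List[Relation]] = field(default_factory=dict)

    def add(self, frame_idx: int, relation: Relation) -> None:
        # Create the per-frame list on first use, then append.
        self.frames.setdefault(frame_idx, []).append(relation)

    def relations_at(self, frame_idx: int) -> List[Relation]:
        # Frames without any detected relationship yield an empty list.
        return self.frames.get(frame_idx, [])


# Example labels are made up for illustration.
graph = DynamicSceneGraph()
graph.add(0, Relation("person", "holding", "cup"))
graph.add(1, Relation("person", "drinking_from", "cup"))
```

Keying by frame index keeps the per-frame graphs separate while still allowing a consumer to walk the video in temporal order.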

The introduction video is available now: https://youtu.be/gKpnRU8btLg

About the code

We run the code on a single RTX 2080 Ti for both training and testing. We borrowed some code from Yang's repository and Zellers' repository.

Usage

We use python=3.6, pytorch=1.1 and torchvision=0.3 in our code. First, clone the repository:

git clone https://github.com/yrcong/STTran.git

Some bounding-box operations rely on borrowed code that must be compiled first:

cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace

For the object detector, please follow the compilation instructions at https://github.com/jwyang/faster-rcnn.pytorch. We provide a pretrained FasterRCNN model for Action Genome. Please download it here and put it in

fasterRCNN/models/faster_rcnn_ag.pth

Dataset

We use the Action Genome dataset to train and evaluate our method. Please process the downloaded dataset with the Toolkit. The dataset directories should look like:

|-- action_genome
    |-- annotations   #gt annotations
    |-- frames        #sampled frames
    |-- videos        #original videos
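
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch; the helper name `missing_subdirs` is hypothetical and is not part of this repository.

```python
from pathlib import Path
from typing import List

# Sub-directories expected under the dataset root, as listed above.
EXPECTED_SUBDIRS = ("annotations", "frames", "videos")


def missing_subdirs(data_path: str) -> List[str]:
    """Return the expected sub-directories that do not exist under data_path."""
    root = Path(data_path)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling `missing_subdirs` on the dataset root returns an empty list when all three directories are present, and otherwise the names that still need to be created or extracted.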

In the SGCLS/SGDET experiments, we only keep bounding boxes whose short edge is larger than 16 pixels. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader directory.
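
The size rule can be illustrated with a short sketch. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates in pixels, which may differ from the annotation format actually stored in the file; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]

MIN_SHORT_EDGE = 16  # threshold stated in the text above


def short_edge(box: Box) -> float:
    """Length of the shorter side of the box."""
    x1, y1, x2, y2 = box
    return min(x2 - x1, y2 - y1)


def filter_small_boxes(boxes: Sequence[Box],
                       min_edge: float = MIN_SHORT_EDGE) -> List[Box]:
    """Keep only boxes whose short edge is strictly larger than min_edge."""
    return [box for box in boxes if short_edge(box) > min_edge]


boxes = [(0, 0, 100, 50), (10, 10, 20, 200), (5, 5, 21, 21)]
kept = filter_small_boxes(boxes)  # only the first box survives
```

The comparison is strict, so a box whose short edge is exactly 16 pixels is dropped, matching the "larger than 16 pixels" wording.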

Train

You can train STTran with train.py. We trained the model on an RTX 2080 Ti:

For PredCLS:

python train.py -mode predcls -datasize large -data_path $DATAPATH

For SGCLS:

python train.py -mode sgcls -datasize large -data_path $DATAPATH

For SGDET:

python train.py -mode sgdet -datasize large -data_path $DATAPATH
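
The three commands differ only in the value passed to -mode. As a convenience, the sketch below assembles them programmatically; the helper `build_train_command` is hypothetical, and only the flags shown in the commands above are used.

```python
import shlex
from typing import List

# The three modes listed above.
MODES = ("predcls", "sgcls", "sgdet")


def build_train_command(mode: str, data_path: str) -> List[str]:
    """Assemble the train.py invocation for one mode as an argument list."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {MODES}")
    return ["python", "train.py", "-mode", mode,
            "-datasize", "large", "-data_path", data_path]


# Print one shell-quoted command per mode; the path is a placeholder.
for mode in MODES:
    cmd = build_train_command(mode, "/path/to/action_genome")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to `subprocess.run` to launch the runs one after another.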

Evaluation

You can evaluate STTran with test.py.

For PredCLS (trained Model):

python test.py -mode predcls -datasize large -data_path $DATAPATH -model_path $MODELPATH

For SGCLS (trained Model):

python test.py -mode sgcls -datasize large -data_path $DATAPATH -model_path $MODELPATH

For SGDET (trained Model):

python test.py -mode sgdet -datasize large -data_path $DATAPATH -model_path $MODELPATH

Citation

If our work is helpful for your research, please cite our publication:

@inproceedings{cong2021spatial,
  title={Spatial-Temporal Transformer for Dynamic Scene Graph Generation},
  author={Cong, Yuren and Liao, Wentong and Ackermann, Hanno and Rosenhahn, Bodo and Yang, Michael Ying},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2021},
  url={https://arxiv.org/abs/2107.12309}
}

Help

If you have any questions or ideas about the code or the paper, please comment on GitHub or send us an email. We will reply as soon as possible.

Owner

Yuren Cong
