Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Overview

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Pytorch Implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Graph Generation accepted by ICCV2021. We propose a Transformer-based model STTran to generate dynamic scene graphs of the given video. STTran can detect the visual relationships in each frame.

The introduction video is available now: https://youtu.be/gKpnRU8btLg

GitHub Logo

About the code We run the code on a single RTX2080ti for both training and testing. We borrowed some code from Yang's repository and Zellers' repository.

Usage

We use python=3.6, pytorch=1.1 and torchvision=0.3 in our code. First, clone the repository:

git clone https://github.com/yrcong/STTran.git

We borrow some compiled code for bbox operations.

cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace

For the object detector part, please follow the compilation from https://github.com/jwyang/faster-rcnn.pytorch We provide a pretrained FasterRCNN model for Action Genome. Please download here and put it in

fasterRCNN/models/faster_rcnn_ag.pth

Dataset

We use the dataset Action Genome to train/evaluate our method. Please process the downloaded dataset with the Toolkit. The directories of the dataset should look like:

|-- action_genome
    |-- annotations   #gt annotations
    |-- frames        #sampled frames
    |-- videos        #original videos

In the experiments for SGCLS/SGDET, we only keep bounding boxes with short edges larger than 16 pixels. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader

Train

You can train the STTran with train.py. We trained the model on a RTX 2080ti:

  • For PredCLS:
python train.py -mode predcls -datasize large -data_path $DATAPATH 
  • For SGCLS:
python train.py -mode sgcls -datasize large -data_path $DATAPATH 
  • For SGDET:
python train.py -mode sgdet -datasize large -data_path $DATAPATH 

Evaluation

You can evaluate the STTran with test.py.

python test.py -m predcls -datasize large -data_path $DATAPATH -model_path $MODELPATH
python test.py -m sgcls -datasize large -data_path $DATAPATH -model_path $MODELPATH
python test.py -m sgdet -datasize large -data_path $DATAPATH -model_path $MODELPATH

Citation

If our work is helpful for your research, please cite our publication:

@inproceedings{cong2021spatial,
  title={Spatial-Temporal Transformer for Dynamic Scene Graph Generation},
  author={Cong, Yuren and Liao, Wentong and Ackermann, Hanno and Rosenhahn, Bodo and Yang, Michael Ying},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year={2021}
  url={https://arxiv.org/abs/2107.12309}
}

Help

When you have any question/idea about the code/paper. Please comment in Github or send us Email. We will reply as soon as possible.

Owner
Yuren Cong
Yuren Cong
Semi-Supervised Signed Clustering Graph Neural Network (and Implementation of Some Spectral Methods)

SSSNET SSSNET: Semi-Supervised Signed Network Clustering For details, please read our paper. Environment Setup Overview The project has been tested on

Yixuan He 9 Nov 24, 2022
Self-Adaptable Point Processes with Nonparametric Time Decays

NPPDecay This is our implementation for the paper Self-Adaptable Point Processes with Nonparametric Time Decays, by Zhimeng Pan, Zheng Wang, Jeff M. P

zpan 2 Sep 24, 2022
Simulated garment dataset for virtual try-on

Simulated garment dataset for virtual try-on This repository contains the dataset used in the following papers: Self-Supervised Collision Handling via

33 Dec 20, 2022
S2s2net - Sentinel-2 Super-Resolution Segmentation Network

S2S2Net Sentinel-2 Super-Resolution Segmentation Network Getting started Install

Wei Ji 10 Nov 10, 2022
Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Multi-modal Interaction Graph Convolutioal Network for Temporal Language Localization in Videos Official implementation for Multi-Modal Interaction Gr

Zongmeng Zhang 15 Oct 18, 2022
Implementation of Diverse Semantic Image Synthesis via Probability Distribution Modeling

Diverse Semantic Image Synthesis via Probability Distribution Modeling (CVPR 2021) Paper Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu,

tzt 45 Nov 17, 2022
Pytorch library for fast transformer implementations

Transformers are very successful models that achieve state of the art performance in many natural language tasks

Idiap Research Institute 1.3k Dec 30, 2022
Blind Video Temporal Consistency via Deep Video Prior

deep-video-prior (DVP) Code for NeurIPS 2020 paper: Blind Video Temporal Consistency via Deep Video Prior PyTorch implementation | paper | project web

Chenyang LEI 272 Dec 21, 2022
BERTMap: A BERT-Based Ontology Alignment System

BERTMap: A BERT-based Ontology Alignment System Important Notices The relevant paper was accepted in AAAI-2022. Arxiv version is available at: https:/

KRR 36 Dec 24, 2022
Azion the best solution of Edge Computing in the world.

Azion Edge Function docker action Create or update an Edge Functions on Azion Edge Nodes. The domain name is the key for decision to a create or updat

8 Jul 16, 2022
Gif-caption - A straightforward GIF Captioner written in Python

Broksy's GIF Captioner Have you ever wanted to easily caption a GIF without havi

3 Apr 09, 2022
Code for "LoRA: Low-Rank Adaptation of Large Language Models"

LoRA: Low-Rank Adaptation of Large Language Models This repo contains the implementation of LoRA in GPT-2 and steps to replicate the results in our re

Microsoft 394 Jan 08, 2023
JFB: Jacobian-Free Backpropagation for Implicit Models

JFB: Jacobian-Free Backpropagation for Implicit Models

Typal Research 28 Dec 11, 2022
Implementation of CVAE. Trained CVAE on faces from UTKFace Dataset to produce synthetic faces with a given degree of happiness/smileyness.

Conditional Smiles! (SmileCVAE) About Implementation of AE, VAE and CVAE. Trained CVAE on faces from UTKFace Dataset. Using an encoding of the Smile-s

Raúl Ortega 3 Jan 09, 2022
Demonstrates how to divide a DL model into multiple IR model files (division) and introduce a simplest way to implement a custom layer works with OpenVINO IR models.

Demonstration of OpenVINO techniques - Model-division and a simplest-way to support custom layers Description: Model Optimizer in Intel(r) OpenVINO(tm

Yasunori Shimura 12 Nov 09, 2022
OptaPlanner wrappers for Python. Currently significantly slower than OptaPlanner in Java or Kotlin.

OptaPy is an AI constraint solver for Python to optimize the Vehicle Routing Problem, Employee Rostering, Maintenance Scheduling, Task Assignment, School Timetabling, Cloud Optimization, Conference S

OptaPy 211 Jan 02, 2023
Adversarial Learning for Modeling Human Motion

Adversarial Learning for Modeling Human Motion This repository contains the open source code which reproduces the results for the paper: Adversarial l

wangqi 6 Jun 15, 2021
Machine learning, in numpy

numpy-ml Ever wish you had an inefficient but somewhat legible collection of machine learning algorithms implemented exclusively in NumPy? No? Install

David Bourgin 11.6k Dec 30, 2022
Unofficial keras(tensorflow) implementation of MAE model from Masked Autoencoders Are Scalable Vision Learners

MAE-keras Unofficial keras(tensorflow) implementation of MAE model described in 'Masked Autoencoders Are Scalable Vision Learners'. This work has been

Yewon 11 Jun 12, 2022
Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch.

SE3 Transformer - Pytorch Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch. May be needed for replicating Alphafold2 resu

Phil Wang 207 Dec 23, 2022