Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Last update: Dec 23, 2022

Related tags

Deep Learning PanoAVQA

Overview

Pano-AVQA

Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

[Paper] [Poster] [Video]

Getting Started

This code is based on following libraries:

python=3.8
pytorch=1.7.0 (with cuda 10.2)

To create virtual environment with all necessary libraries:

conda env create -f environment.yml

By default data should be saved under data/feat/{audio,label,visual} directory and logs (w/ cache, checkpoint) are saved under data/{cache,ckpt,log} directory. Using symbolic link is recommended:

ln -s {path_to_your_data_directory} data

We use single TITAN RTX for training, but GPUs with less memory are still doable with smaller batch size (provided precomputed features).

Dataset

We plan to release the Pano-AVQA dataset public within this year, including Q&A annotation, precomputed features, etc. Please stay tuned!

Model

Training

Default configuration is provided in code/config.py. To run with this configuration:

python cli.py

To run with custom configuration, either modify code/config.py or execute:

python cli.py with {{flags_at_your_disposal}}

Inference

Model weight is saved under ./data/log directory. To run inference only:

python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yun2021PanoAVQA,
    author = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
    title = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
    booktitle = {ICCV},
    year = {2021}
}

Contact

If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.

Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Related tags

Overview

Pano-AVQA

[Paper] [Poster] [Video]

Getting Started

Dataset

Model

Training

Inference

Citation

Contact

Owner

Heeseung Yun

Human segmentation models, training/inference code, and trained weights, implemented in PyTorch

🌎 The Modern Declarative Data Flow Framework for the AI Empowered Generation.

MPI-IS Mesh Processing Library

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning

Code repository for the paper: Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild (ICCV 2021)

Simple machine learning library / 簡單易用的機器學習套件

A PyTorch implementation of Learning to learn by gradient descent by gradient descent

Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields.

Facestar dataset. High quality audio-visual recordings of human conversational speech.

Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?".

DIVeR: Deterministic Integration for Volume Rendering

Convert scikit-learn models to PyTorch modules

Distributionally robust neural networks for group shifts

Pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering".

Code for binary and multiclass model change active learning, with spectral truncation implementation.

Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation

PyTorch package for the discrete VAE used for DALL·E.

Applications using the GTN library and code to reproduce experiments in "Differentiable Weighted Finite-State Transducers"

Repository aimed at compiling code, papers, demos etc.. related to my PhD on 3D vision and machine learning for fruit detection and shape estimation at the university of Lincoln