Localizing-Visual-Sounds-the-Hard-Way

Code and Dataset for "Localizing Visual Sounds the Hard Way".

The repo contains code and our pre-trained model.

Environment

Python 3.6.8
PyTorch 1.3.0

Flickr-SoundNet

We provide the pretrained model here.

To test the model, download the test data and ground truth from Learning to Localize Sound Source.

Then run

python test.py --data_path "path to downloaded data with structure below/" --summaries_dir "path to pretrained models" --gt_path "path to ground truth" --testset "flickr"

VGG-Sound Source

We provide the pretrained model here.

To test the model, run

python test.py --data_path "path to downloaded data with structure below/" --summaries_dir "path to pretrained models" --testset "vggss"

(Note: some ground-truth bounding boxes were updated recently, which changes all results on VGG-SS by 2~3% in IoU.)
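To see why an update to the ground-truth boxes moves the reported numbers, here is a minimal sketch of plain box-to-box IoU. It is not the evaluation code used by `test.py`; the function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box format are assumptions made for illustration only.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max).

    Hypothetical helper for illustration; not part of this repository.
    """
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


prediction = (0, 0, 10, 10)
old_gt = (0, 0, 10, 10)
new_gt = (0, 0, 10, 8)  # same prediction, slightly tighter ground-truth box

print(box_iou(prediction, old_gt))  # 1.0
print(box_iou(prediction, new_gt))  # 0.8
```

The prediction is unchanged in both calls, so any shift in the score comes from the ground truth alone.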

Both test sets should be placed in the following structure.

data path
│
└───frames
│   │   image001.jpg
│   │   image002.jpg
│   │
└───audio
    │   audio011.wav
    │   audio012.wav
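Before running `test.py`, it can help to confirm that the data path matches the structure above. The following is a minimal sketch, assuming frames are `.jpg` files and audio clips are `.wav` files as in the listing; `check_layout` is a hypothetical helper, not something shipped with this repository.

```python
from pathlib import Path


def check_layout(data_path):
    """Count the .jpg frames and .wav audio files under data_path.

    Hypothetical helper for illustration; not part of this repository.
    Raises FileNotFoundError if the frames/ or audio/ folder is missing.
    """
    root = Path(data_path)
    frames_dir = root / "frames"
    audio_dir = root / "audio"

    for folder in (frames_dir, audio_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")

    return {
        "frames": len(list(frames_dir.glob("*.jpg"))),
        "audio": len(list(audio_dir.glob("*.wav"))),
    }
```

Calling `check_layout("path to downloaded data")` returns the two counts, or fails early with the name of the missing folder.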

Citation

@InProceedings{Chen21,
              title        = "Localizing Visual Sounds the Hard Way",
              author       = "Honglie Chen and Weidi Xie and Triantafyllos Afouras and Arsha Nagrani and Andrea Vedaldi and Andrew Zisserman",
              booktitle    = "CVPR",
              year         = "2021"}
