Code for Discriminative Sounding Objects Localization (NeurIPS 2020)

Overview

Discriminative Sounding Objects Localization

Code for our NeurIPS 2020 paper Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching (The previous title is Learning to Discriminatively Localize Sounding Objects in a Cocktail-party Scenario). The code is implemented on PyTorch with python3.

Requirements

  • PyTorch 1.1
  • torchvision
  • scikit-learn
  • librosa
  • Pillow
  • opencv

Running Procedure

For experiments on Music or AudioSet-instrument, the training and evaluation procedures are similar, respectively under the folder music-exp and audioset-instrument. Here, we take the experiments on Music dataset as an example.

Data Preparation

The sounding object bounding box annotations on solo and duet are stored in music-exp/solotest.json and music-exp/duettest.json, and the data and annotations of synthetic set are available at https://zenodo.org/record/4079386#.X4PFodozbb2 . And the Audioset-instrument balanced subset bounding box annotations are in audioset-instrument/audioset_box.json

Training

Stage one
training_stage_one.py [-h]
optional arguments:
[--batch_size] training batchsize
[--learning_rate] learning rate
[--epoch] total training epoch
[--evaluate] only do testing or also training
[--use_pretrain] whether to initialize from ckpt
[--ckpt_file] the ckpt file path to be resumed
[--use_class_task] whether to use localization-classification alternative training
[--class_iter] training iterations for classification of each epoch
[--mask] mask threshold to determine whether is object or background
[--cluster] number of clusters for discrimination
python3 training_stage_one.py

After training of stage one, we will get the cluster pseudo labels and object dictionary of different classes in the folder ./obj_features, which is then used in the second stage training as category-aware object representation reference.

Stage two
training_stage_two.py [-h]
optional arguments:
[--batch_size] training batchsize
[--learning_rate] learning rate
[--epoch] total training epoch
[--evaluate] only do testing or also training
[--use_pretrain] whether to initialize from ckpt
[--ckpt_file] the ckpt file path to be resumed
python3 training_stage_two.py

Evaluation

Stage one

We first generate localization results and save then as a pkl file, then calculate metrics, IoU and AUC and also generate visualizations, by running

python3 test.py
python3 tools.py
Stage two

For evaluation of stage two, i.e., class-aware sounding object localization in multi-source scenes, we first match the cluster pseudo labels generated in stage one with gt labels to accordingly assign one object category to each center representation in the object dictionary by running

python3 match_cluster.py

It is necessary to manually ensure there is one-to-one matching between object category and each center representation.

Then we generate the localization results and calculate metrics, CIoU AUC and NSA, by running

python3 test_stage_two.py
python3 eval.py

Results

The two tables respectively show our model's performance on single-source and multi-source scenarios.

The following figures show the category-aware localization results under multi-source scenes. The green boxes mean the sounding objects while the red boxes are silent ones.

Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”

GATER This repository contains the code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”. Our implementation is

Jiacheng Ye 12 Nov 24, 2022
A PyTorch implementation of EfficientNet and EfficientNetV2 (coming soon!)

EfficientNet PyTorch Quickstart Install with pip install efficientnet_pytorch and load a pretrained EfficientNet with: from efficientnet_pytorch impor

Luke Melas-Kyriazi 7.2k Jan 06, 2023
SemEval2022 Patronizing and Condescending Language (PCL) Detection

SemEval2022 Patronizing and Condescending Language (PCL) Detection This task is from SemEval 2022. What is Patronizing and Condescending Language (PCL

Daniel Saeedi 0 Aug 05, 2022
For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training.

LongScientificFormer For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training. Some code

Athar Sefid 6 Nov 02, 2022
(NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductive few-shot classification"

SSR (NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductivefew-shot classification" [Paper] [Project webpage]

xshen 29 Dec 06, 2022
Noise Conditional Score Networks (NeurIPS 2019, Oral)

Generative Modeling by Estimating Gradients of the Data Distribution This repo contains the official implementation for the NeurIPS 2019 paper Generat

451 Dec 26, 2022
CrossNorm and SelfNorm for Generalization under Distribution Shifts (ICCV 2021)

CrossNorm (CN) and SelfNorm (SN) (Accepted at ICCV 2021) This is the official PyTorch implementation of our CNSN paper, in which we propose CrossNorm

100 Dec 28, 2022
A collection of SOTA Image Classification Models in PyTorch

A collection of SOTA Image Classification Models in PyTorch

sithu3 85 Dec 30, 2022
JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces

JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces JAXMAPP is a JAX-based library for multi-agent path planning (MAPP) in c

OMRON SINIC X 24 Dec 28, 2022
Ranger deep learning optimizer rewrite to use newest components

Ranger21 - integrating the latest deep learning components into a single optimizer Ranger deep learning optimizer rewrite to use newest components Ran

Less Wright 266 Dec 28, 2022
Vision transformers (ViTs) have found only limited practical use in processing images

CXV Convolutional Xformers for Vision Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-o

Cloudwalker 23 Sep 10, 2022
Instantaneous Motion Generation for Robots and Machines.

Ruckig Instantaneous Motion Generation for Robots and Machines. Ruckig generates trajectories on-the-fly, allowing robots and machines to react instan

Berscheid 374 Dec 23, 2022
DilatedNet in Keras for image segmentation

Keras implementation of DilatedNet for semantic segmentation A native Keras implementation of semantic segmentation according to Multi-Scale Context A

303 Mar 15, 2022
IDRLnet, a Python toolbox for modeling and solving problems through Physics-Informed Neural Network (PINN) systematically.

IDRLnet IDRLnet is a machine learning library on top of PyTorch. Use IDRLnet if you need a machine learning library that solves both forward and inver

IDRL 105 Dec 17, 2022
Official DGL implementation of "Rethinking High-order Graph Convolutional Networks"

SE Aggregation This is the implementation for Rethinking High-order Graph Convolutional Networks. Here we show the codes for citation networks as an e

Tianqi Zhang (张天启) 32 Jul 19, 2022
The implementation for the SportsCap (IJCV 2021)

SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos ProjectPage | Paper | Video | Dataset (Part01

Chen Xin 79 Dec 16, 2022
Quadruped-command-tracking-controller - Quadruped command tracking controller (flat terrain)

Quadruped command tracking controller (flat terrain) Prepare Install RAISIM link

Yunho Kim 4 Oct 20, 2022
(CVPR2021) DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation

DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation CVPR2021(oral) [arxiv] Requirements python3.7 pytorch==

W-zx-Y 85 Dec 07, 2022
PyTorch implementations of the paper: "Learning Independent Instance Maps for Crowd Localization"

IIM - Crowd Localization This repo is the official implementation of paper: Learning Independent Instance Maps for Crowd Localization. The code is dev

tao han 91 Nov 10, 2022
Diffgram - Supervised Learning Data Platform

Data Annotation, Data Labeling, Annotation Tooling, Training Data for Machine Learning

Diffgram 1.6k Jan 07, 2023