PyTorch code for training MM-DistillNet for multimodal knowledge distillation

Overview

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

MM-DistillNet is a novel framework that is able to perform Multi-Object Detection and tracking using only ambient sound during inference time. The framework leverages on our new new MTA loss function that facilitates the distillation of information from multimodal teachers (RGB, thermal and depth) into an audio-only student network.

Illustration of MM-DistillNet

This repository contains the PyTorch implementation of our CVPR'2021 paper There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge. The repository builds on PyTorch-YOLOv3 Metrics and Yet-Another-EfficientDet-Pytorch codebases.

If you find the code useful for your research, please consider citing our paper:

@article{riverahurtado2021mmdistillnet,
  title={There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge},
  author={Rivera Valverde, Francisco and Valeria Hurtado, Juana and Valada, Abhinav},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2021}
}

Demo

http://rl.uni-freiburg.de/research/multimodal-distill

System Requirements

  • Linux
  • Python 3.7
  • PyTorch 1.3
  • CUDA 10.1

IMPORTANT NOTE: These requirements are not necessarily mandatory. However, we have only tested the code under the above settings and cannot provide support for other setups.

Installation

a. Create a conda virtual environment.

git clone https://github.com/robot-learning-freiburg/MM-DistillNet.git
cd MM-DistillNet
conda create -n mmdistillnet_env
conda activate mmdistillnet_env

b. Install dependencies

pip install -r requirements.txt

Prepare datasets and configure run

We also supply our large-scale multimodal dataset with over 113,000 time-synchronized frames of RGB, depth, thermal, and audio modalities, available at http://multimodal-distill.cs.uni-freiburg.de/#dataset

Please make sure the data is available in the directory under the name data.

The binary download contains the expected folder format for our scripts to work. The path where the binary was extracted must be updated in the configuration files, in this case configs/mm-distillnet.cfg.

You will also need to download our trained teacher-models available here. Kindly download this files and have them available in the current directory, with the name of trained_models. The directory structure should look something like this:

>ls
configs/  evaluate.py  images/  LICENSE  logs/  mp3_to_pkl.py  README.md  requirements.txt  setup.cfg  src/  train.py trained_models/

>ls trained_models
LICENSE.txt              README.txt                             yet-another-efficientdet-d2-embedding.pth  yet-another-efficientdet-d2-rgb.pth
mm-distillnet.0.pth.tar  yet-another-efficientdet-d2-depth.pth  yet-another-efficientdet-d2.pth            yet-another-efficientdet-d2-thermal.pth

Additionally, the file configs/mm-distillnet.cfg contains support for different parallelization strategies and GPU/CPU support (using PyTorch's DataParallel and DistributedDataParallel)

Due to disk space constraints, we provide a mp3 version of the audio files. Librosa is known to be slow with mp3 files, so we also provide a mp3->pickle conversion utility. The idea is, that before training we convert the audio files to a spectogram and store it to a pickle file.

mp3_to_pkl.py --dir <path to the dataset>

Training and Evaluation

Training Procedure

Edit the config file appropriately in configs folder. Our best recipe is found under configs/mm-distillnet.cfg.

python train.py --config 
   

   

To run the full dataset We our method using 4 GPUs with 2.4 Gb memory each (The expected runtime is 7 days). After training, the best model would be stored under /best.pth.tar . This file can be used to evaluate the performance of the model.

Evaluation Procedure

Evaluate the performance of the model (Our best model can be found under trained_models/mm-distillnet.0.pth.tar):

python evaluate.py --config 
   
     --checkpoint 
    

    
   

Results

The evaluation results of our method, after bayesian optimization, are (more details can be found in the paper):

Method KD [email protected] [email protected] [email protected] CDx CDy
StereoSoundNet[4] RGB 44.05 62.38 41.46 3.00 2.24
:--- ------------- ------------- ------------- ------------- ------------- -------------
MM-DistillNet RGB 61.62 84.29 59.66 1.27 0.69

Pre-Trained Models

Our best pre-trained model can be found on the dataset installation path.

Acknowledgements

We have used utility functions from other open-source projects. We especially thank the authors of:

Contacts

License

For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.

Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper)

An Image is Worth 16x16 Words, What is a Video Worth? paper Official PyTorch Implementation Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor DAMO Academy, Al

213 Nov 12, 2022
Code for the paper "Next Generation Reservoir Computing"

Next Generation Reservoir Computing This is the code for the results and figures in our paper "Next Generation Reservoir Computing". They are written

OSU QuantInfo Lab 105 Dec 20, 2022
Machine Learning with JAX Tutorials

The purpose of this repo is to make it easy to get started with JAX. It contains my "Machine Learning with JAX" series of tutorials (YouTube videos and Jupyter Notebooks) as well as the content I fou

Aleksa Gordić 372 Dec 28, 2022
Prototype-based Incremental Few-Shot Semantic Segmentation

Prototype-based Incremental Few-Shot Semantic Segmentation Fabio Cermelli, Massimiliano Mancini, Yongqin Xian, Zeynep Akata, Barbara Caputo -- BMVC 20

Fabio Cermelli 21 Dec 29, 2022
ROS support for Velodyne 3D LIDARs

Overview Velodyne1 is a collection of ROS2 packages supporting Velodyne high definition 3D LIDARs3. Warning: The master branch normally contains code

ROS device drivers 543 Dec 30, 2022
Transfer Learning for Pose Estimation of Illustrated Characters

bizarre-pose-estimator Transfer Learning for Pose Estimation of Illustrated Characters Shuhong Chen *, Matthias Zwicker * WACV2022 [arxiv] [video] [po

Shuhong Chen 142 Dec 28, 2022
Example for AUAV 2022 with obstacle avoidance.

AUAV 2022 Sample This is a sample PX4 based quadrotor path planning framework based on Ubuntu 20.04 and ROS noetic for the IEEE Autonomous UAS 2022 co

James Goppert 11 Sep 16, 2022
Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

SimplePose Code and pre-trained models for our paper, “Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation”, a

Jia Li 256 Dec 24, 2022
Anonymize BLM Protest Images

Anonymize BLM Protest Images This repository automates @BLMPrivacyBot, a Twitter bot that shows the anonymized images to help keep protesters safe. Us

Stanford Machine Learning Group 40 Oct 13, 2022
PyG (PyTorch Geometric) - A library built upon PyTorch to easily write and train Graph Neural Networks (GNNs)

PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data.

PyG 16.5k Jan 08, 2023
CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning

CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning This repository contains the code and relevant instructions

XiaoMing 5 Aug 19, 2022
A tensorflow implementation of an HMM layer

tensorflow_hmm Tensorflow and numpy implementations of the HMM viterbi and forward/backward algorithms. See Keras example for an example of how to use

Zach Dwiel 283 Oct 19, 2022
Differentiable Abundance Matching With Python

shamnet Differentiable Stellar Population Synthesis Installation You can install shamnet with pip. Installation dependencies are numpy, jax, corrfunc,

5 Dec 17, 2021
Bayesian Meta-Learning Through Variational Gaussian Processes

vmgp This is the repository of Vivek Myers and Nikhil Sardana for our CS 330 final project, Bayesian Meta-Learning Through Variational Gaussian Proces

Vivek Myers 2 Nov 17, 2022
[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.

LBYL-Net This repo implements paper Look Before You Leap: Learning Landmark Features For One-Stage Visual Grounding CVPR 2021. Getting Started Prerequ

SVIP Lab 45 Dec 12, 2022
Code for "PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds", CVPR 2021

PV-RAFT This repository contains the PyTorch implementation for paper "PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clou

Yi Wei 43 Dec 05, 2022
N-gram models- Unsmoothed, Laplace, Deleted Interpolation

N-gram models- Unsmoothed, Laplace, Deleted Interpolation

Ravika Nagpal 1 Jan 04, 2022
FairFuzz: AFL extension targeting rare branches

FairFuzz An AFL extension to increase code coverage by targeting rare branches. FairFuzz has a particular advantage on programs with highly nested str

Caroline Lemieux 222 Nov 16, 2022
[ICCV2021] Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving

Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving

Xuanchi Ren 44 Dec 03, 2022
Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a keras CNN model and openCV.

Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a keras CNN model and openCV.

Octavio Arriaga 5.3k Dec 30, 2022