PyTorch code for training MM-DistillNet for multimodal knowledge distillation

Overview

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

MM-DistillNet is a novel framework that is able to perform Multi-Object Detection and tracking using only ambient sound during inference time. The framework leverages on our new new MTA loss function that facilitates the distillation of information from multimodal teachers (RGB, thermal and depth) into an audio-only student network.

Illustration of MM-DistillNet

This repository contains the PyTorch implementation of our CVPR'2021 paper There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge. The repository builds on PyTorch-YOLOv3 Metrics and Yet-Another-EfficientDet-Pytorch codebases.

If you find the code useful for your research, please consider citing our paper:

@article{riverahurtado2021mmdistillnet,
  title={There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge},
  author={Rivera Valverde, Francisco and Valeria Hurtado, Juana and Valada, Abhinav},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2021}
}

Demo

http://rl.uni-freiburg.de/research/multimodal-distill

System Requirements

  • Linux
  • Python 3.7
  • PyTorch 1.3
  • CUDA 10.1

IMPORTANT NOTE: These requirements are not necessarily mandatory. However, we have only tested the code under the above settings and cannot provide support for other setups.

Installation

a. Create a conda virtual environment.

git clone https://github.com/robot-learning-freiburg/MM-DistillNet.git
cd MM-DistillNet
conda create -n mmdistillnet_env
conda activate mmdistillnet_env

b. Install dependencies

pip install -r requirements.txt

Prepare datasets and configure run

We also supply our large-scale multimodal dataset with over 113,000 time-synchronized frames of RGB, depth, thermal, and audio modalities, available at http://multimodal-distill.cs.uni-freiburg.de/#dataset

Please make sure the data is available in the directory under the name data.

The binary download contains the expected folder format for our scripts to work. The path where the binary was extracted must be updated in the configuration files, in this case configs/mm-distillnet.cfg.

You will also need to download our trained teacher-models available here. Kindly download this files and have them available in the current directory, with the name of trained_models. The directory structure should look something like this:

>ls
configs/  evaluate.py  images/  LICENSE  logs/  mp3_to_pkl.py  README.md  requirements.txt  setup.cfg  src/  train.py trained_models/

>ls trained_models
LICENSE.txt              README.txt                             yet-another-efficientdet-d2-embedding.pth  yet-another-efficientdet-d2-rgb.pth
mm-distillnet.0.pth.tar  yet-another-efficientdet-d2-depth.pth  yet-another-efficientdet-d2.pth            yet-another-efficientdet-d2-thermal.pth

Additionally, the file configs/mm-distillnet.cfg contains support for different parallelization strategies and GPU/CPU support (using PyTorch's DataParallel and DistributedDataParallel)

Due to disk space constraints, we provide a mp3 version of the audio files. Librosa is known to be slow with mp3 files, so we also provide a mp3->pickle conversion utility. The idea is, that before training we convert the audio files to a spectogram and store it to a pickle file.

mp3_to_pkl.py --dir <path to the dataset>

Training and Evaluation

Training Procedure

Edit the config file appropriately in configs folder. Our best recipe is found under configs/mm-distillnet.cfg.

python train.py --config 
   

   

To run the full dataset We our method using 4 GPUs with 2.4 Gb memory each (The expected runtime is 7 days). After training, the best model would be stored under /best.pth.tar . This file can be used to evaluate the performance of the model.

Evaluation Procedure

Evaluate the performance of the model (Our best model can be found under trained_models/mm-distillnet.0.pth.tar):

python evaluate.py --config 
   
     --checkpoint 
    

    
   

Results

The evaluation results of our method, after bayesian optimization, are (more details can be found in the paper):

Method KD [email protected] [email protected] [email protected] CDx CDy
StereoSoundNet[4] RGB 44.05 62.38 41.46 3.00 2.24
:--- ------------- ------------- ------------- ------------- ------------- -------------
MM-DistillNet RGB 61.62 84.29 59.66 1.27 0.69

Pre-Trained Models

Our best pre-trained model can be found on the dataset installation path.

Acknowledgements

We have used utility functions from other open-source projects. We especially thank the authors of:

Contacts

License

For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.

Pure python implementation reverse-mode automatic differentiation

MiniGrad A minimal implementation of reverse-mode automatic differentiation (a.k.a. autograd / backpropagation) in pure Python. Inspired by Andrej Kar

Kenny Song 76 Sep 12, 2022
Code for Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data? (SDM 2022)

Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data? (SDM 2022) We consider how a user of a web servi

joisino 20 Aug 21, 2022
The code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention.

CrossFormer This repository is the code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention. Introduction Existin

cheerss 238 Jan 06, 2023
Demos of essentia classifiers hosted on replicate.ai

essentia-replicate-demos Demos of Essentia models hosted on replicate.ai's MTG site. The models Check our site for a complete list of the models avail

Music Technology Group - Universitat Pompeu Fabra 12 Nov 14, 2022
Pytorch Implementation of Auto-Compressing Subset Pruning for Semantic Image Segmentation

Pytorch Implementation of Auto-Compressing Subset Pruning for Semantic Image Segmentation Introduction ACoSP is an online pruning algorithm that compr

Merantix 8 Dec 07, 2022
Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline Ankit Goyal, Hei Law, Bowei Liu, Alejandro Newell, Jia Deng Internati

Princeton Vision & Learning Lab 115 Jan 04, 2023
(NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductive few-shot classification"

SSR (NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductivefew-shot classification" [Paper] [Project webpage]

xshen 29 Dec 06, 2022
Stream images from a connected camera over MQTT, view using Streamlit, record to file and sqlite

mqtt-camera-streamer Summary: Publish frames from a connected camera or MJPEG/RTSP stream to an MQTT topic, and view the feed in a browser on another

Robin Cole 183 Dec 16, 2022
Official PyTorch implementation of Learning Intra-Batch Connections for Deep Metric Learning (ICML 2021) published at International Conference on Machine Learning

About This repository the official PyTorch implementation of Learning Intra-Batch Connections for Deep Metric Learning. The config files contain the s

Dynamic Vision and Learning Group 41 Dec 10, 2022
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

Graph ConvNets in PyTorch October 15, 2017 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbresson

Xavier Bresson 287 Jan 04, 2023
Code for CVPR 2021 paper: Anchor-Free Person Search

Introduction This is the implementationn for Anchor-Free Person Search in CVPR2021 License This project is released under the Apache 2.0 license. Inst

158 Jan 04, 2023
Official PyTorch code for Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution (MANet, ICCV2021)

Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution (MANet, ICCV2021) This repository is the official PyTorc

Jingyun Liang 139 Dec 29, 2022
Half Instance Normalization Network for Image Restoration

HINet Half Instance Normalization Network for Image Restoration, based on https://github.com/megvii-model/HINet. Dependencies NumPy PyTorch, preferabl

Holy Wu 4 Jun 06, 2022
Learning What and Where to Draw

###Learning What and Where to Draw Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee This is the code for our NIPS 201

Scott Ellison Reed 337 Nov 18, 2022
Populating 3D Scenes by Learning Human-Scene Interaction https://posa.is.tue.mpg.de/

Populating 3D Scenes by Learning Human-Scene Interaction [Project Page] [Paper] License Software Copyright License for non-commercial scientific resea

Mohamed Hassan 81 Nov 08, 2022
A bare-bones Python library for quality diversity optimization.

pyribs Website Source PyPI Conda CI/CD Docs Docs Status Twitter pyribs.org GitHub docs.pyribs.org A bare-bones Python library for quality diversity op

ICAROS 127 Jan 06, 2023
Python calculations for the position of the sun and moon.

Astral This is 'astral' a Python module which calculates Times for various positions of the sun: dawn, sunrise, solar noon, sunset, dusk, solar elevat

Simon Kennedy 169 Dec 20, 2022
Plugin for Gaffer providing direct acess to asset from PolyHaven.com. Only HDRIs at the moment, Cycles and Arnold supported

GafferHaven Plugin for Gaffer providing direct acess to asset from PolyHaven.com. Only HDRIs are supported at the moment, in Cycles and Arnold lights.

Jakub Vondra 6 Jan 26, 2022
MicroNet: Improving Image Recognition with Extremely Low FLOPs (ICCV 2021)

MicroNet: Improving Image Recognition with Extremely Low FLOPs (ICCV 2021) A pytorch implementation of MicroNet. If you use this code in your research

Yunsheng Li 293 Dec 28, 2022
An All-MLP solution for Vision, from Google AI

MLP Mixer - Pytorch An All-MLP solution for Vision, from Google AI, in Pytorch. No convolutions nor attention needed! Yannic Kilcher video Install $ p

Phil Wang 784 Jan 06, 2023