PyTorch code for training MM-DistillNet for multimodal knowledge distillation

Overview

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

MM-DistillNet is a novel framework that performs multi-object detection and tracking using only ambient sound at inference time. The framework leverages our new MTA loss function, which facilitates the distillation of information from multimodal teachers (RGB, thermal, and depth) into an audio-only student network.
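To make the idea of distilling several teachers into one student more concrete, here is a minimal sketch of a generic multi-teacher alignment term in the spirit of attention transfer. It rests on our own assumptions and is not the MTA loss as defined in the paper or implemented in this repository. The function names, the squared-activation attention map, and the weighted average over teachers are hypothetical choices, and NumPy stands in for PyTorch so the snippet runs without a GPU stack.

```python
import numpy as np


def attention_map(features: np.ndarray) -> np.ndarray:
    """Collapse a (C, H, W) feature map into an L2-normalised vector of length H*W.

    The spatial map is the channel-wise sum of squared activations.
    """
    att = np.sum(features ** 2, axis=0).reshape(-1)
    norm = np.linalg.norm(att)
    return att / norm if norm > 0 else att


def multi_teacher_alignment_loss(student, teachers, weights=None):
    """Weighted sum of squared L2 distances between student and teacher attention maps.

    student:  (C_s, H, W) features from the audio student (hypothetical layer choice).
    teachers: list of (C_t, H, W) features, e.g. one per RGB, depth and thermal teacher.
    weights:  optional per-teacher weights; defaults to a uniform average.
    """
    if weights is None:
        weights = [1.0 / len(teachers)] * len(teachers)
    a_s = attention_map(student)
    loss = 0.0
    for w, t in zip(weights, teachers):
        a_t = attention_map(t)
        if a_t.shape != a_s.shape:
            raise ValueError("student and teacher maps must have the same H x W")
        loss += w * float(np.sum((a_s - a_t) ** 2))
    return loss
```

Because each feature map is reduced to a normalised spatial attention map, the student and the teachers may have different channel counts; only their spatial resolution has to match.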

Illustration of MM-DistillNet

This repository contains the PyTorch implementation of our CVPR 2021 paper "There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge". The repository builds on the PyTorch-YOLOv3 Metrics and Yet-Another-EfficientDet-Pytorch codebases.

If you find the code useful for your research, please consider citing our paper:

@inproceedings{riverahurtado2021mmdistillnet,
  title={There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge},
  author={Rivera Valverde, Francisco and Valeria Hurtado, Juana and Valada, Abhinav},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2021}
}

Demo

http://rl.uni-freiburg.de/research/multimodal-distill

System Requirements

  • Linux
  • Python 3.7
  • PyTorch 1.3
  • CUDA 10.1

IMPORTANT NOTE: These requirements are not necessarily mandatory. However, we have only tested the code under the above settings and cannot provide support for other setups.

Installation

a. Clone the repository and create a conda virtual environment.

git clone https://github.com/robot-learning-freiburg/MM-DistillNet.git
cd MM-DistillNet
conda create -n mmdistillnet_env python=3.7
conda activate mmdistillnet_env

b. Install dependencies

pip install -r requirements.txt

Prepare datasets and configure run

We also supply our large-scale multimodal dataset with over 113,000 time-synchronized frames of RGB, depth, thermal, and audio modalities, available at http://multimodal-distill.cs.uni-freiburg.de/#dataset

Make sure the dataset is available in the current directory under the name data.

The downloaded archive already has the folder layout our scripts expect. Update the path where you extracted it in the configuration file, in this case configs/mm-distillnet.cfg.
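If you prefer to update the dataset path from a script, the sketch below shows one way to do it. It assumes that configs/mm-distillnet.cfg is an INI-style file readable by Python's configparser, which we have not verified. The helper name and the section and key names in the usage comment (dataset, data_path) are hypothetical placeholders; look up the real ones in the file first.

```python
import configparser
from pathlib import Path


def set_config_value(cfg_path, section, key, value):
    """Set `key` in `[section]` of an INI-style file and write the file back.

    Assumes the file can be parsed by configparser. configparser drops comments
    when it rewrites a file, so keep a copy of the original.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    cfg_path = Path(cfg_path)
    with cfg_path.open() as f:
        parser.read_file(f)
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] not found in {cfg_path}")
    parser.set(section, key, str(value))
    with cfg_path.open("w") as f:
        parser.write(f)


# Hypothetical section and key names; check configs/mm-distillnet.cfg for the real ones.
# set_config_value("configs/mm-distillnet.cfg", "dataset", "data_path", "/path/to/data")
```

Editing the file by hand works just as well; the script form is mainly useful when launching many runs with different data locations.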

You will also need to download our trained teacher models, available here. Place these files in the current directory in a folder named trained_models. The directory structure should look like this:

>ls
configs/  evaluate.py  images/  LICENSE  logs/  mp3_to_pkl.py  README.md  requirements.txt  setup.cfg  src/  train.py trained_models/

>ls trained_models
LICENSE.txt              README.txt                             yet-another-efficientdet-d2-embedding.pth  yet-another-efficientdet-d2-rgb.pth
mm-distillnet.0.pth.tar  yet-another-efficientdet-d2-depth.pth  yet-another-efficientdet-d2.pth            yet-another-efficientdet-d2-thermal.pth
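Before training, it can help to confirm that your layout matches the listing above. The following is a hypothetical helper, not part of the repository: it checks for the data folder, the configuration file, and the checkpoint files shown in trained_models, and returns whatever is missing.

```python
from pathlib import Path

# Files listed in trained_models/ above; adjust if your download differs.
EXPECTED_MODELS = [
    "mm-distillnet.0.pth.tar",
    "yet-another-efficientdet-d2.pth",
    "yet-another-efficientdet-d2-rgb.pth",
    "yet-another-efficientdet-d2-depth.pth",
    "yet-another-efficientdet-d2-thermal.pth",
    "yet-another-efficientdet-d2-embedding.pth",
]


def missing_paths(root="."):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    expected = [Path("data"), Path("configs") / "mm-distillnet.cfg"]
    expected += [Path("trained_models") / name for name in EXPECTED_MODELS]
    return [str(p) for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("Layout looks complete.")
```

Run it from the repository root; if the dataset lives elsewhere and the config points to it, a missing data entry can be ignored.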

The file configs/mm-distillnet.cfg also lets you choose between CPU and GPU execution and between parallelization strategies (PyTorch's DataParallel and DistributedDataParallel).

Due to disk space constraints, we provide the audio files in mp3 format. Librosa is known to be slow at decoding mp3 files, so we also provide an mp3-to-pickle conversion utility: before training, each audio file is converted to a spectrogram and stored as a pickle file.

python mp3_to_pkl.py --dir <path to the dataset>
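The conversion idea can be sketched as follows. This is an illustration only: we have not checked which spectrogram type, window, or parameters mp3_to_pkl.py actually uses, and decoding the mp3 itself (for example with librosa.load) is left out so the snippet depends only on NumPy. The function names and the n_fft and hop_length defaults are hypothetical.

```python
import pickle
from pathlib import Path

import numpy as np


def magnitude_spectrogram(waveform, n_fft=512, hop_length=256):
    """Hann-windowed STFT magnitude of a mono waveform, shape (n_fft // 2 + 1, n_frames)."""
    waveform = np.asarray(waveform, dtype=float)
    if waveform.ndim != 1:
        raise ValueError("expected a 1-D (mono) waveform")
    if len(waveform) < n_fft:
        # Zero-pad very short clips so that at least one frame exists.
        waveform = np.pad(waveform, (0, n_fft - len(waveform)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop_length
    frames = np.stack([
        waveform[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def cache_spectrogram(waveform, out_path):
    """Compute the spectrogram once and pickle it, so training only unpickles an array."""
    spec = magnitude_spectrogram(waveform)
    out_path = Path(out_path)
    with out_path.open("wb") as f:
        pickle.dump(spec, f)
    return out_path


def load_cached_spectrogram(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

The design trade-off is disk space for speed: the expensive decode and STFT run once per file, and every later epoch pays only for reading a pickle.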

Training and Evaluation

Training Procedure

Edit the appropriate config file in the configs folder. Our best recipe is configs/mm-distillnet.cfg.

python train.py --config <path to config file>

To train on the full dataset, we ran our method on 4 GPUs with 2.4 GB of memory each; the expected runtime is 7 days. After training, the best model is saved as best.pth.tar. Use this file to evaluate the performance of the model.
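Keeping only the best model during training usually follows the pattern below. This is a generic sketch, not the logic in train.py: the class name, the assumption of a higher-is-better validation metric such as mAP, and the use of pickle instead of torch.save (to avoid a PyTorch dependency in the example) are all our own choices.

```python
import pickle
from pathlib import Path


class BestCheckpointSaver:
    """Overwrite a single checkpoint file whenever a higher-is-better metric improves."""

    def __init__(self, path):
        self.path = Path(path)
        self.best_metric = float("-inf")

    def update(self, metric, state):
        """Save `state` if `metric` beats the best value so far; return True if saved."""
        if metric <= self.best_metric:
            return False
        self.best_metric = metric
        # Write to a temporary file and rename it, so an interrupted run never
        # leaves a truncated best checkpoint behind.
        tmp = self.path.with_name(self.path.name + ".tmp")
        with tmp.open("wb") as f:
            pickle.dump({"metric": metric, "state": state}, f)
        tmp.replace(self.path)
        return True
```

In a training loop you would call saver.update(val_map, model_state) once per validation pass, where val_map and model_state are hypothetical names for your metric and model state.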

Evaluation Procedure

To evaluate a trained model, run the following command (our best model is trained_models/mm-distillnet.0.pth.tar):

python evaluate.py --config <path to config file> --checkpoint <path to checkpoint>

Results

The evaluation results of our method after Bayesian optimization are listed below; see the paper for more details.

Method             KD   mAP@Avg  mAP@0.5  mAP@0.75  CDx   CDy
StereoSoundNet[4]  RGB  44.05    62.38    41.46     3.00  2.24
MM-DistillNet      RGB  61.62    84.29    59.66     1.27  0.69
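For readers unfamiliar with these metrics, the sketch below shows one common way to compute average precision (AP) at a single IoU threshold for one class, together with per-axis center distances for matched boxes. It is a generic illustration, not the repository's evaluation code. The greedy matching, the all-point interpolation, and our reading of CDx and CDy as mean absolute center offsets along x and y are assumptions; mAP is obtained by averaging such AP values, and the paper defines the exact averaging used for each column.

```python
import numpy as np


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate_single_class(preds, gts, iou_thr=0.5):
    """Return (AP, CDx, CDy) for one class.

    preds: list of (image_id, score, box); gts: dict image_id -> list of boxes.
    Predictions are matched greedily, highest score first, to the unmatched
    ground-truth box with the largest IoU >= iou_thr.
    """
    n_gt = sum(len(b) for b in gts.values())
    used = {k: [False] * len(v) for k, v in gts.items()}
    tp, dx, dy = [], [], []
    for img, _, box in sorted(preds, key=lambda p: -p[1]):
        best, best_j = iou_thr, -1
        for j, g in enumerate(gts.get(img, [])):
            o = iou(box, g)
            if not used[img][j] and o >= best:
                best, best_j = o, j
        if best_j < 0:
            tp.append(0)  # false positive
            continue
        used[img][best_j] = True
        tp.append(1)
        g = gts[img][best_j]
        dx.append(abs((box[0] + box[2]) / 2 - (g[0] + g[2]) / 2))
        dy.append(abs((box[1] + box[3]) / 2 - (g[1] + g[3]) / 2))
    cum_tp = np.cumsum(np.array(tp, dtype=float))
    recall = cum_tp / max(n_gt, 1)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    # All-point interpolation: make precision non-increasing in recall,
    # then sum precision over each recall step.
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    ap = float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))
    cdx = float(np.mean(dx)) if dx else float("nan")
    cdy = float(np.mean(dy)) if dy else float("nan")
    return ap, cdx, cdy
```

For example, a single prediction at (2, 0, 12, 10) against a ground-truth box at (0, 0, 10, 10) has IoU 80/120, so it counts as a match at a 0.5 threshold, gives AP 1.0, and contributes a center offset of 2 along x and 0 along y.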

Pre-Trained Models

Our best pre-trained model, mm-distillnet.0.pth.tar, is included in the trained_models download described above.

Acknowledgements

We have used utility functions from other open-source projects. We especially thank the authors of:

Contacts

License

For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.
