Code release for paper: The Boombox: Visual Reconstruction from Acoustic Vibrations

Related tags

Deep Learningboombox
Overview

The Boombox: Visual Reconstruction from Acoustic Vibrations

Boyuan Chen, Mia Chiquier, Hod Lipson, Carl Vondrick
Columbia University

Project Website | Video | Paper

Overview

This repo contains the PyTorch implementation for paper "The Boombox: Visual Reconstruction from Acoustic Vibrations".

teaser

Content

Installation

Our code has been tested on Ubuntu 18.04 with CUDA 11.0. Create a python virtual environment and install the dependencies.

virtualenv -p /usr/bin/python3.6 env-boombox
source env-boombox/bin/activate
cd boombox
pip install -r requirements.txt

Data Preparation

Run the following commands to download the dataset (2.0G).

cd boombox
wget https://boombox.cs.columbia.edu/dataset/data.zip
unzip data.zip
rm -rf data.zip

After this step, you should see a folder named as data, and video and audio data are in cube, small_cuboid and large_cuboid subfolders.

About Configs and Logs

Before training and evaluation, we first introduce the configuration and logging structure.

  1. Configs: all the specific parameters used for training and evaluation are indicated as individual config file. Overall, we have two training paradigms: single-shape and multiple-shape.

    For single-shape, we train and evaluate on each shape separately. Their config files are named with their own shape: cube, large_cuboid and small_cuboid. For multiple-shape, we mix all the shapes together and perform training and evaluation while the shape is not known a priori. The config file folder is all.

    Within each config folder, we have config file for depth prediction and image prediction. The last digit in each folder refers to the random seed. For example, if you want to train our model with all the shapes mixed to output a RGB image with random seed 3, you should refer the parameters in:

    configs/all/2d_out_img_3
    
  2. Logs: both the training and evaluation results will be saved in the log folder for each experiment. The last digit in the logs folder indicates the random seed. Inside the logs folder, the structure and contents are:

    \logs_True_False_False_image_conv2d-encoder-decoder_True_{output_representation}_{seed}
        \lightning_logs
            \checkpoints               [saved checkpoint]
            \version_0                 [training stats]
            \version_1                 [testing stats]
        \pred_visualizations           [predicted and ground-truth images]
    

Training

Both training and evaluation are fast. We provide an example bash script for running our experiments in run_audio.sh. Specifically, to train our model on all shapes that outputs RGB image representations with random seed 1 and GPU 0, run the following command:

CUDA_VISIBLE_DEVICES=0 python main.py ./configs/all/2d_out_img_1/config.yaml;

Evaluation

Again, we provide an example bash script for running our experiments in run_audio.sh. Following the above example, to evaluate the trained model, run the following command:

CUDA_VISIBLE_DEVICES=0 python eval.py ./configs/all/2d_out_img_1/config.yaml ./logs_True_False_False_image_conv2d-encoder-decoder_True_pixel_1/lightning_logs/checkpoints;

License

This repository is released under the MIT license. See LICENSE for additional details.

Owner
Boyuan Chen
Ph.D. student in Computer Science at Columbia University Creative Machines Lab.
Boyuan Chen
Self-labelling via simultaneous clustering and representation learning. (ICLR 2020)

Self-labelling via simultaneous clustering and representation learning 🆗 🆗 🎉 NEW models (20th August 2020): Added standard SeLa pretrained torchvis

Yuki M. Asano 469 Jan 02, 2023
A hand tracking demo made with mediapipe where you can control lights with pinching your fingers and moving your hand up/down.

HandTrackingBrightnessControl A hand tracking demo made with mediapipe where you can control lights with pinching your fingers and moving your hand up

Teemu Laurila 19 Feb 12, 2022
Project for music generation system based on object tracking and CGAN

Project for music generation system based on object tracking and CGAN The project was inspired by MIDINet: A Convolutional Generative Adversarial Netw

1 Nov 21, 2021
ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation

ClevrTex This repository contains dataset generation code for ClevrTex benchmark from paper: ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi

Laurynas Karazija 26 Dec 21, 2022
DI-HPC is an acceleration operator component for general algorithm modules in reinforcement learning algorithms

DI-HPC: Decision Intelligence - High Performance Computation DI-HPC is an acceleration operator component for general algorithm modules in reinforceme

OpenDILab 185 Dec 29, 2022
A stable algorithm for GAN training

DRAGAN (Deep Regret Analytic Generative Adversarial Networks) Link to our paper - https://arxiv.org/abs/1705.07215 Pytorch implementation (thanks!) -

195 Oct 10, 2022
Highly comparative time-series analysis

〰️ hctsa 〰️ : highly comparative time-series analysis hctsa is a software package for running highly comparative time-series analysis using Matlab (fu

Ben Fulcher 569 Dec 21, 2022
Code for EMNLP2021 paper "Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training"

VoCapXLM Code for EMNLP2021 paper Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training Environment DockerFile: dancingso

Bo Zheng 15 Jul 28, 2022
Zero-shot Synthesis with Group-Supervised Learning (ICLR 2021 paper)

GSL - Zero-shot Synthesis with Group-Supervised Learning Figure: Zero-shot synthesis performance of our method with different dataset (iLab-20M, RaFD,

Andy_Ge 62 Dec 21, 2022
PED: DETR for Crowd Pedestrian Detection

PED: DETR for Crowd Pedestrian Detection Code for PED: DETR For (Crowd) Pedestrian Detection Paper PED: DETR for Crowd Pedestrian Detection Installati

36 Sep 13, 2022
Code to compute permutation and drop-column importances in Python scikit-learn models

Feature importances for scikit-learn machine learning models By Terence Parr and Kerem Turgutlu. See Explained.ai for more stuff. The scikit-learn Ran

Terence Parr 537 Dec 31, 2022
The comma.ai Calibration Challenge!

Welcome to the comma.ai Calibration Challenge! Your goal is to predict the direction of travel (in camera frame) from provided dashcam video. This rep

comma.ai 697 Jan 05, 2023
A Factor Model for Persistence in Investment Manager Performance

Factor-Model-Manager-Performance A Factor Model for Persistence in Investment Manager Performance I apply methods and processes similar to those used

Omid Arhami 1 Dec 01, 2021
Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation.

Unified-EPT Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation. Installation Linux, CUDA=10.0,

29 Aug 23, 2022
Open & Efficient for Framework for Aspect-based Sentiment Analysis

PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis Fast & Low Memory requirement & Enhanced implementation of Local Context F

YangHeng 567 Jan 07, 2023
MAterial del programa Misión TIC 2022

Mision TIC 2022 Esta iniciativa, aparece como respuesta frente a los retos de la Cuarta Revolución Industrial, y tiene como objetivo la formación de 1

6 May 25, 2022
The Official PyTorch Implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 spotlight paper)

Official PyTorch implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 Spotlight Paper) Zhisheng

NVIDIA Research Projects 45 Dec 26, 2022
Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

Albumentations Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to inc

11.4k Jan 09, 2023
Code of the lileonardo team for the 2021 Emotion and Theme Recognition in Music task of MediaEval 2021

Emotion and Theme Recognition in Music The repository contains code for the submission of the lileonardo team to the 2021 Emotion and Theme Recognitio

Vincent Bour 8 Aug 02, 2022
Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images"

GANInversion_with_ConsecutiveImgs Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images" https://a

QingyangXu 38 Dec 07, 2022