Codebase for ECCV18 "The Sound of Pixels"

Overview

Sound-of-Pixels

Codebase for ECCV18 "The Sound of Pixels".

*This repository is under construction, but the core parts are already there.

Environment

The code is developed under the following configurations.

  • Hardware: 1-4 GPUs (change [--num_gpus NUM_GPUS] accordingly)
  • Software: Ubuntu 16.04.3 LTS, CUDA>=8.0, Python>=3.5, PyTorch>=0.4.0

Training

  1. Prepare video dataset.

    a. Download MUSIC dataset from: https://github.com/roudimit/MUSIC_dataset

    b. Download videos.

  2. Preprocess videos. You can do it in your own way as long as the index files are similar.

    a. Extract frames at 8fps and waveforms at 11025Hz from videos. We have following directory structure:

    data
    ├── audio
    |   ├── acoustic_guitar
    │   |   ├── M3dekVSwNjY.mp3
    │   |   ├── ...
    │   ├── trumpet
    │   |   ├── STKXyBGSGyE.mp3
    │   |   ├── ...
    │   ├── ...
    |
    └── frames
    |   ├── acoustic_guitar
    │   |   ├── M3dekVSwNjY.mp4
    │   |   |   ├── 000001.jpg
    │   |   |   ├── ...
    │   |   ├── ...
    │   ├── trumpet
    │   |   ├── STKXyBGSGyE.mp4
    │   |   |   ├── 000001.jpg
    │   |   |   ├── ...
    │   |   ├── ...
    │   ├── ...
    

    b. Make training/validation index files by running:

    python scripts/create_index_files.py
    

    It will create index files train.csv/val.csv with the following format:

    ./data/audio/acoustic_guitar/M3dekVSwNjY.mp3,./data/frames/acoustic_guitar/M3dekVSwNjY.mp4,1580
    ./data/audio/trumpet/STKXyBGSGyE.mp3,./data/frames/trumpet/STKXyBGSGyE.mp4,493
    

    For each row, it stores the information: AUDIO_PATH,FRAMES_PATH,NUMBER_FRAMES

  3. Train the default model.

./scripts/train_MUSIC.sh
  1. During training, visualizations are saved in HTML format under ckpt/MODEL_ID/visualization/.

Evaluation

  1. (Optional) Download our trained model weights for evaluation.
./scripts/download_trained_model.sh
  1. Evaluate the trained model performance.
./scripts/eval_MUSIC.sh

Reference

If you use the code or dataset from the project, please cite:

    @InProceedings{Zhao_2018_ECCV,
        author = {Zhao, Hang and Gan, Chuang and Rouditchenko, Andrew and Vondrick, Carl and McDermott, Josh and Torralba, Antonio},
        title = {The Sound of Pixels},
        booktitle = {The European Conference on Computer Vision (ECCV)},
        month = {September},
        year = {2018}
    }
Owner
Hang Zhao
Assistant Professor at Tsinghua University, MIT PhD in Computer Vision
Hang Zhao
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Super Resolution Examples We run this script under TensorFlow 2.0 and the TensorLayer2.0+. For TensorLayer 1.4 version, please check release. 🚀 🚀 🚀

TensorLayer Community 2.9k Jan 08, 2023
MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

MemStream Implementation of MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift . Siddharth Bhatia, Arjit Jain, Shivi

Stream-AD 61 Dec 02, 2022
Official implementation of the MM'21 paper Constrained Graphic Layout Generation via Latent Optimization

[MM'21] Constrained Graphic Layout Generation via Latent Optimization This repository provides the official code for the paper "Constrained Graphic La

Kotaro Kikuchi 73 Dec 27, 2022
Deep Learning as a Cloud API Service.

Deep API Deep Learning as Cloud APIs. This project provides pre-trained deep learning models as a cloud API service. A web interface is available as w

Wu Han 4 Jan 06, 2023
The official repository for BaMBNet

BaMBNet-Pytorch Paper

Junjun Jiang 18 Dec 04, 2022
Cartoon-StyleGan2 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation

Fine-tuning StyleGAN2 for Cartoon Face Generation

Jihye Back 520 Jan 04, 2023
An unopinionated replacement for PyTorch's Dataset and ImageFolder, that handles Tar archives

Simple Tar Dataset An unopinionated replacement for PyTorch's Dataset and ImageFolder classes, for datasets stored as uncompressed Tar archives. Just

Joao Henriques 47 Dec 20, 2022
The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.

Interscript The Interscript dataset contains interactive user feedback on a T5-11B model generated scripts. Dataset data.json contains the data in an

AI2 8 Dec 01, 2022
Code for ICE-BeeM paper - NeurIPS 2020

ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA This repository contains code to run and reproduce the experiments

Ilyes Khemakhem 65 Dec 22, 2022
This repository provides code for "On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness".

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness This repository provides the code for the paper On Interaction B

Meta Research 33 Dec 08, 2022
A toolkit for document-level event extraction, containing some SOTA model implementations

❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi

Tong Zhu(朱桐) 159 Dec 22, 2022
Python/Rust implementations and notes from Proofs Arguments and Zero Knowledge

What is this? This is where I'll be collecting resources related to the Study Group on Dr. Justin Thaler's Proofs Arguments And Zero Knowledge Book. T

Thor 66 Jan 04, 2023
This repo is a C++ version of yolov5_deepsort_tensorrt. Packing all C++ programs into .so files, using Python script to call C++ programs further.

yolov5_deepsort_tensorrt_cpp Introduction This repo is a C++ version of yolov5_deepsort_tensorrt. And packing all C++ programs into .so files, using P

41 Dec 27, 2022
A short code in python, Enchpyter, is able to encrypt and decrypt words as you determine, of course

Enchpyter Enchpyter is a program do encrypt and decrypt any word you want (just letters). You enter how many letters jumps and write the word, so, the

João Assalim 2 Oct 10, 2022
Final report with code for KAIST Course KSE 801.

Orthogonal collocation is a method for the numerical solution of partial differential equations

Chuanbo HUA 4 Apr 06, 2022
A forwarding MPI implementation that can use any other MPI implementation via an MPI ABI

MPItrampoline MPI wrapper library: MPI trampoline library: MPI integration tests: MPI is the de-facto standard for inter-node communication on HPC sys

Erik Schnetter 31 Dec 22, 2022
nn_builder lets you build neural networks with less boilerplate code

nn_builder lets you build neural networks with less boilerplate code. You specify the type of network you want and it builds it. Install pip install n

Petros Christodoulou 157 Nov 20, 2022
HyperaPy: An automatic hyperparameter optimization framework ⚡🚀

hyperpy HyperPy: An automatic hyperparameter optimization framework Description HyperPy: Library for automatic hyperparameter optimization. Build on t

Sergio Mora 7 Sep 06, 2022
Code release for Universal Domain Adaptation(CVPR 2019)

Universal Domain Adaptation Code release for Universal Domain Adaptation(CVPR 2019) Requirements python 3.6+ PyTorch 1.0 pip install -r requirements.t

THUML @ Tsinghua University 229 Dec 23, 2022
"Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices This repository contains the official PyTorch implemen

Yandex Research 21 Oct 18, 2022