Visualizer using audio and semantic analysis to explore BigGAN (Brock et al., 2018) latent space.

Last update: Nov 21, 2022

Overview

BigGAN Audio Visualizer

Description

This visualizer explores the BigGAN (Brock et al., 2018) latent space by using the pitch and tempo of an audio file to generate, and interpolate between, the noise and class vectors fed to the model. Classes are chosen manually or, optionally, by semantic similarity computed on BERT encodings of a lyrics corpus.
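As a rough illustration of the idea above, the sketch below turns one frame of pitch energy into a BigGAN class vector and blends two such vectors. It is not taken from visualize.py: the function names, the one-class-per-pitch-bin mapping, and the normalization are assumptions made for the example. The only fixed fact is that BigGAN is conditioned on a 1000-way ImageNet class vector.

```python
from typing import List, Sequence

NUM_IMAGENET_CLASSES = 1000  # BigGAN is conditioned on a 1000-way class vector


def class_vector_from_pitch(energies: Sequence[float],
                            class_ids: Sequence[int]) -> List[float]:
    """Spread one frame of pitch energy over the chosen ImageNet classes.

    Hypothetical mapping: pitch bin i drives class_ids[i], and the weights
    are normalized so that they sum to 1.
    """
    if len(energies) != len(class_ids):
        raise ValueError("need exactly one class id per pitch bin")
    vec = [0.0] * NUM_IMAGENET_CLASSES
    total = sum(energies)
    if total <= 0:
        return vec  # silent frame: no class is activated
    for energy, cid in zip(energies, class_ids):
        vec[cid] += energy / total
    return vec


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linear interpolation between two vectors, with t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
```

Interpolating between the vectors of consecutive frames with `lerp` is one simple way to get gradual transitions between classes; the actual script may smooth differently.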

Usage:

usage: visualize.py [-h] -s SONG [--resolution {128,256,512}] [-d DURATION]
               [-ps [200-295]] [-ts [0.05-0.8]]
               [--classes CLASSES [CLASSES ...]] [-n NUM_CLASSES]
               [--jitter [0-1]] [--frame_length i*2^6] [--truncation [0.1-1]]
               [--smooth_factor [10-30]] [--batch_size BATCH_SIZE]
               [-o OUTPUT_FILE] [--use_last_vectors] [--use_last_classes]
               [-l LYRICS]
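
The bracketed ranges in the usage string can be enforced at parse time. The following is a minimal sketch using the standard `argparse` module, covering only a few of the options; the helper names `ranged` and `multiple_of_64` are hypothetical, and treating `--pitch_sensitivity` as an integer is an assumption based on its default of 220.

```python
import argparse


def ranged(cast, low, high):
    """Return an argparse type that casts a value and checks low <= value <= high."""
    def parse(text):
        value = cast(text)
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(
                f"{value} is outside [{low}, {high}]")
        return value
    return parse


def multiple_of_64(text):
    """Accept only positive multiples of 2^6, as the usage string requires."""
    value = int(text)
    if value <= 0 or value % 64 != 0:
        raise argparse.ArgumentTypeError(
            f"{value} is not a positive multiple of 64")
    return value


def build_parser():
    # Illustrative subset of the options listed in the usage string.
    parser = argparse.ArgumentParser(prog="visualize.py")
    parser.add_argument("-s", "--song", required=True)
    parser.add_argument("-ps", "--pitch_sensitivity",
                        type=ranged(int, 200, 295), default=220)
    parser.add_argument("-ts", "--tempo_sensitivity",
                        type=ranged(float, 0.05, 0.8), default=0.25)
    parser.add_argument("--frame_length", type=multiple_of_64, default=512)
    return parser
```

With this parser, `build_parser().parse_args(["-s", "song.mp3", "--frame_length", "100"])` exits with a usage error, because 100 is not a multiple of 64.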

Arguments

short	long	default	range	help
`-h`	`--help`			show this help message and exit
`-s`	`--song`	`input/romantic.mp3`		path to input audio file
	`--resolution`	`512`	`{128,256,512}`	output video resolution
`-d`	`--duration`	`None`		output video duration
`-ps`	`--pitch_sensitivity`	`220`	`[200-295]`	controls the sensitivity of the class vector to changes in pitch
`-ts`	`--tempo_sensitivity`	`0.25`	`[0.05-0.8]`	controls the sensitivity of the noise vector to changes in volume and tempo
	`--classes`	`None`		manually specify [--num_classes] ImageNet classes
`-n`	`--num_classes`	`12`		number of unique classes to use
	`--jitter`	`0.5`	`[0-1]`	controls jitter of the noise vector to reduce repetition
	`--frame_length`	`512`	`i*2^6`	number of audio frames per video frame in the output
	`--truncation`	`1`	`[0.1-1]`	BigGAN truncation parameter; controls complexity of structure within frames
	`--smooth_factor`	`20`	`[10-30]`	controls interpolation between class vectors to smooth rapid fluctuations
	`--batch_size`	`30`		BigGAN batch_size
`-o`	`--output_file`			name of output file stored in output/, defaults to [--song] path base_name
	`--use_last_vectors`	`False`		set flag to use previous saved class/noise vectors
	`--use_last_classes`	`False`		set flag to use previous classes
`-l`	`--lyrics`	`None`		path to lyrics file; setting [--lyrics LYRICS] computes classes by semantic similarity under BERT encodings
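
The lyrics-based class selection can be pictured as a nearest-neighbor search in embedding space. The sketch below assumes the BERT encodings have already been computed elsewhere and are passed in as plain lists of floats; `cosine`, `top_classes`, and the ranking by cosine similarity are illustrative assumptions, not the script's confirmed method.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_classes(lyrics_vec: Sequence[float],
                class_vecs: Dict[str, Sequence[float]],
                n: int) -> List[str]:
    """Return the n class names whose embeddings are closest to the lyrics."""
    ranked = sorted(class_vecs,
                    key=lambda name: cosine(lyrics_vec, class_vecs[name]),
                    reverse=True)
    return ranked[:n]
```

Here `n` would play the role of `--num_classes`, and `class_vecs` would hold one embedding per candidate ImageNet class name.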


Owner

Rush Kapoor
