Toward a Visual Concept Vocabulary for GAN Latent Space, ICCV 2021

Last update: Dec 23, 2022

Toward a Visual Concept Vocabulary for GAN Latent Space
Code and data from the ICCV 2021 paper

Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba
Paper | Website | arXiv

This repository contains code for finding layer-selective directions, distilling them, and loading the vocabulary of visual concepts in BigGAN used in the original paper.

Notice: This repository is under active development! Expect instability until at least October 25th, 2021.

Installation

The provided code has been tested with Python 3.8 on macOS and Ubuntu 20.04. It may work in other environments, but we make no guarantees.

To run the code yourself, start by cloning the repository:

git clone https://github.com/schwettmann/visual-vocab
cd visual-vocab

(Optional) You will probably want to create a conda environment or virtual environment instead of installing the dependencies globally. E.g., to create a new virtual environment you can run:

python3 -m venv env
source env/bin/activate

Finally, install the Python dependencies using pip:

pip3 install -r requirements.txt

Usage

Notice: This section is under construction and will be updated as functionality gets added.

To download any of the annotated directions from the paper, use datasets.load from the datasets submodule. It downloads and parses the annotated directions. Example usage:

from visualvocab import datasets

# Download layer-selective directions and annotations used for distilling single-word directions:
dataset = datasets.load('lsd_all')

# Download distilled directions for all BigGAN-Places365 categories:
dataset = datasets.load('distilled_all')

# Download distilled directions for a specific BigGAN-Places365 category:
dataset = datasets.load('distilled_cottage')

See the module for a full list of available annotated directions.
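The calls above suggest that per-category datasets follow a 'distilled_<category>' naming pattern. As a minimal sketch under that assumption, the helper below loads several categories in one pass. The names distilled_key and load_categories are hypothetical and not part of this repository, and the loader is passed in as an argument so the sketch does not depend on what datasets.load returns.

```python
from typing import Any, Callable, Dict, Iterable


def distilled_key(category: str) -> str:
    """Build a dataset key such as 'distilled_cottage'.

    Assumption: keys follow the 'distilled_<category>' pattern seen in the
    usage example; check the datasets module for the names that exist.
    """
    return f"distilled_{category}"


def load_categories(
    load: Callable[[str], Any],
    categories: Iterable[str],
) -> Dict[str, Any]:
    """Call the given loader once per category and return results by category."""
    return {category: load(distilled_key(category)) for category in categories}


# Usage with the real loader (needs this repository's package installed):
#   from visualvocab import datasets
#   directions = load_categories(datasets.load, ["cottage"])
```

Only 'cottage' is confirmed as a category name by the example above; any other name passed to the helper should first be checked against the list in the datasets module.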

Citation

Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba. Toward a Visual Concept Vocabulary for GAN Latent Space, Proceedings of the International Conference on Computer Vision (ICCV), 2021.

Bibtex

@InProceedings{Schwettmann_2021_ICCV,
    author    = {Schwettmann, Sarah and Hernandez, Evan and Bau, David and Klein, Samuel and Andreas, Jacob and Torralba, Antonio},
    title     = {Toward a Visual Concept Vocabulary for GAN Latent Space},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6804-6812}
}
