Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)

Overview

Check our full demo video here:

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik*, Zongze Wu*, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski
*Equal contribution, ordered alphabetically
https://arxiv.org/abs/2103.17249

Abstract: Inspired by the ability of StyleGAN to generate highly realistic images in a variety of domains, much recent work has focused on understanding how to use the latent spaces of StyleGAN to manipulate generated and real images. However, discovering semantically meaningful latent manipulations typically involves painstaking human examination of the many degrees of freedom, or an annotated collection of images for each desired manipulation. In this work, we explore leveraging the power of recently introduced Contrastive Language-Image Pre-training (CLIP) models in order to develop a text-based interface for StyleGAN image manipulation that does not require such manual effort. We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt. Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation. Finally, we present a method for mapping text prompts to input-agnostic directions in StyleGAN's style space, enabling interactive text-driven image manipulation. Extensive results and comparisons demonstrate the effectiveness of our approaches.

Description

Official Implementation of StyleCLIP, a method to manipulate images using a driving text. Our method uses the generative power of a pretrained StyleGAN generator, and the visual-language power of CLIP. In the paper we present three methods:

  • Latent vector optimization.
  • Latent mapper, trained to manipulate latent vectors according to a specific text description.
  • Global directions in the StyleSpace.

Updates

15/8/2021 Add support for StyleSpace in optimization and latent mapper methods

6/4/2021 Add mapper training and inference (including a jupyter notebook) code

6/4/2021 Add support for custom StyleGAN2 and StyleGAN2-ada models, and also custom images

2/4/2021 Add the global directions code (a local GUI and a colab notebook)

31/3/2021 Upload paper to arxiv, and video to YouTube

14/2/2021 Initial version

Setup (for all three methods)

For all the methods described in the paper, it is required to have Anaconda and CLIP.

Specific requirements for each method are described in its section. To install CLIP please run the following commands:

conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
pip install ftfy regex tqdm gdown
pip install git+https://github.com/openai/CLIP.git
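After running the commands above, a quick way to confirm the environment is to check that each package can be imported. The sketch below is only a convenience check we are assuming is useful; it is not part of the repository. The mapping from install names to import names (for example, the OpenAI CLIP package imports as `clip`) is our assumption.

```python
import importlib.util

# Import names for the packages installed by the setup commands above.
# Values are the (assumed) install names, used only in the status message.
REQUIRED_MODULES = {
    "torch": "pytorch",
    "torchvision": "torchvision",
    "ftfy": "ftfy",
    "regex": "regex",
    "tqdm": "tqdm",
    "gdown": "gdown",
    "clip": "git+https://github.com/openai/CLIP.git",
}


def check_environment(modules=REQUIRED_MODULES):
    """Return {import_name: bool} telling whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_environment().items():
        status = "ok" if found else f"missing (install {REQUIRED_MODULES[name]})"
        print(f"{name:12s} {status}")
```

Once `clip` is importable, `clip.available_models()` lists the CLIP backbones that can be loaded.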

Editing via Latent Vector Optimization

Setup

Here, the code relies on the rosinality PyTorch implementation of StyleGAN2. Some parts of the StyleGAN implementation were modified so that the whole implementation is native PyTorch.

In addition to the requirements mentioned before, the code will attempt to download a pretrained StyleGAN2 generator automatically (alternatively, download it manually from here).

Usage

Given a textual description, one can either edit a given image or generate a random image that best fits the description. Both operations can be done through the main.py script or the optimization_playground.ipynb notebook (available in Colab).

Editing

To edit an image, set --mode=edit. Editing can be done on either a provided latent vector or a random latent vector from StyleGAN's latent space. It is recommended to adjust --l2_lambda according to the desired edit.
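To build intuition for --l2_lambda, here is a toy sketch of the trade-off it controls: a text-matching term pulls the latent toward the description, while an L2 term keeps it close to the source latent. This is not the code in main.py. The "CLIP loss" is replaced by a hypothetical squared distance to a made-up text target vector, and the L2 term is squared so the toy problem stays smooth; the actual objective described in the paper uses a real CLIP loss and additional terms.

```python
from typing import List, Sequence


def optimize_latent(w_source: Sequence[float], text_target: Sequence[float],
                    l2_lambda: float, lr: float = 0.25, steps: int = 200) -> List[float]:
    """Toy gradient descent on
        loss(w) = ||w - text_target||^2 + l2_lambda * ||w - w_source||^2
    The first term is a hypothetical stand-in for the CLIP loss; the second keeps
    the result close to the source latent, playing the role of --l2_lambda."""
    if l2_lambda < 0:
        raise ValueError("l2_lambda must be non-negative")
    # Step size normalized so this toy problem converges for any lambda >= 0.
    step = lr / (1.0 + l2_lambda)
    w = list(w_source)  # editing starts from the source latent
    for _ in range(steps):
        grad = [2 * (wi - ti) + 2 * l2_lambda * (wi - si)
                for wi, ti, si in zip(w, text_target, w_source)]
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    source = [0.0, 0.0, 0.0, 0.0]
    target = [1.0, -1.0, 0.5, 2.0]  # made-up "text" target
    for lam in (0.0, 1.0, 10.0):
        edited = optimize_latent(source, target, lam)
        print(lam, [round(x, 3) for x in edited])
```

In this toy the minimizer is (text_target + l2_lambda * w_source) / (1 + l2_lambda), so a larger weight yields a smaller, more conservative edit, and a weight of zero ignores the source latent entirely.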

Generating Free-style Images

To generate a free-style image set --mode=free_generation.

Editing via Latent Mapper

Here, we provide the code for the latent mapper. The mapper is trained to learn residuals from a given latent vector, according to the driving text. The code for the mapper is in mapper/.
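Learning residuals means the mapper predicts a change to add to the input latent rather than a new latent from scratch, so an untrained or zero output leaves the image unchanged. The sketch below shows only that update rule on a latent stored as one vector per generator layer. `toy_mapper` is a hypothetical stand-in for the trained network in mapper/.

```python
from typing import Callable, List

Latent = List[List[float]]  # one style vector per generator layer


def apply_mapper(w: Latent, mapper: Callable[[Latent], Latent]) -> Latent:
    """Edit a latent by adding the mapper's predicted residual: w_hat = w + M(w)."""
    residual = mapper(w)
    return [[wi + ri for wi, ri in zip(w_layer, r_layer)]
            for w_layer, r_layer in zip(w, residual)]


def toy_mapper(w: Latent) -> Latent:
    """Hypothetical stand-in for a trained mapper: nudges every entry by 10% of its value."""
    return [[0.1 * x for x in layer] for layer in w]


if __name__ == "__main__":
    w = [[1.0, 2.0], [-1.0, 0.0]]
    print(apply_mapper(w, toy_mapper))
```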

Setup

As in the optimization, the code relies on the rosinality PyTorch implementation of StyleGAN2. In addition to the StyleGAN weights, it is necessary to have weights for the facial recognition network used in the ID loss. The weights can be downloaded from here.
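The ID loss penalizes edits that change who the person appears to be. A common formulation, which we assume here, is one minus the cosine similarity between face-recognition embeddings of the source and edited images. In the sketch the embeddings are plain hypothetical vectors; in practice they would come from the facial recognition network whose weights are downloaded above.

```python
import math
from typing import Sequence


def id_loss(emb_source: Sequence[float], emb_edited: Sequence[float]) -> float:
    """1 - cosine similarity between two face-recognition embeddings.
    0 when the identity embedding keeps its direction, up to 2 when it flips."""
    dot = sum(a * b for a, b in zip(emb_source, emb_edited))
    norm = (math.sqrt(sum(a * a for a in emb_source))
            * math.sqrt(sum(b * b for b in emb_edited)))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


if __name__ == "__main__":
    print(id_loss([0.2, 0.9, 0.1], [0.25, 0.85, 0.05]))  # similar identities, small loss
```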

The mapper is trained on latent vectors. It is recommended to train on inverted real images. To this end, we provide the CelebA-HQ dataset inverted by e4e: train set, test set.

Usage

Training

  • The main training script is placed in mapper/scripts/train.py.
  • Training arguments can be found at mapper/options/train_options.py.
  • Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs. Additionally, if you have tensorboard installed, you can visualize tensorboard logs in opts.exp_dir/logs.
  • To resume training, please provide --checkpoint_path.
  • --description is where you provide the driving text.
  • If you perform an edit that is not supposed to change "colors" in the image, it is recommended to use the flag --no_fine_mapper.
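The reason --no_fine_mapper helps preserve colors is that StyleGAN's later (fine) layers mostly control color and fine detail. The sketch below illustrates the idea by zeroing the residual on the fine layers. The 18-layer latent and the split into coarse (0-3), medium (4-7) and fine (8-17) layers are our assumptions for a 1024x1024 StyleGAN2, and `mask_residual` is a hypothetical helper, not the repository's code.

```python
from typing import List

Latent = List[List[float]]

# Assumed split of an 18-layer latent: 0-3 coarse, 4-7 medium, 8-17 fine (hypothetical constant).
FINE_LAYERS = range(8, 18)


def mask_residual(residual: Latent, no_fine_mapper: bool = False) -> Latent:
    """Zero the residual on the fine layers when no_fine_mapper is set,
    so fine-level attributes such as color are left untouched."""
    masked = []
    for i, layer in enumerate(residual):
        if no_fine_mapper and i in FINE_LAYERS:
            masked.append([0.0] * len(layer))
        else:
            masked.append(list(layer))
    return masked


if __name__ == "__main__":
    residual = [[1.0, 1.0] for _ in range(18)]
    edited_layers = [i for i, l in enumerate(mask_residual(residual, True)) if any(l)]
    print(edited_layers)  # only the coarse and medium layers keep a residual
```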

Example for training a mapper for the mohawk hairstyle:

cd mapper
python train.py --exp_dir ../results/mohawk_hairstyle --no_fine_mapper --description "mohawk hairstyle"

All configurations for the examples shown in the paper are provided there.
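When training mappers for several driving texts, it can help to generate the commands programmatically. The sketch below only builds argument lists using the flags shown in the example above (--exp_dir, --no_fine_mapper, --description); the naming of the experiment directory from the description is our own convention, and `build_train_command` is a hypothetical helper.

```python
import shlex
from typing import List


def build_train_command(description: str, no_fine_mapper: bool = False,
                        results_root: str = "../results",
                        script: str = "train.py") -> List[str]:
    """Build the argv for one mapper training run, mirroring the example above.
    The experiment directory name is derived from the description (our convention)."""
    exp_dir = f"{results_root}/{description.strip().replace(' ', '_')}"
    cmd = ["python", script, "--exp_dir", exp_dir]
    if no_fine_mapper:
        cmd.append("--no_fine_mapper")
    cmd += ["--description", description]
    return cmd


if __name__ == "__main__":
    # Hypothetical batch: one mapper per driving text; the flag is set for edits
    # that are not supposed to change colors.
    for text, keep_colors in [("mohawk hairstyle", True), ("blonde hair", False)]:
        print(shlex.join(build_train_command(text, no_fine_mapper=keep_colors)))
```

Each list can be passed to `subprocess.run(cmd, cwd="mapper", check=True)`; adjust `script` if your checkout places the training script elsewhere.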

Inference

  • The main inference script is placed in mapper/scripts/inference.py.
  • Inference arguments can be found at mapper/options/test_options.py.
  • Adding the flag --couple_outputs will save an image containing the input and output images side-by-side.

Pretrained models for various edits are provided. Please refer to utils.py for the complete list of links.

We also provide a notebook for performing inference with the mapper (Mapper notebook, available in Colab).

Editing via Global Direction

Here we provide a GUI for editing images with the global directions. We provide both a Jupyter notebook (available in Colab) and the GUI used in the video. For both, the linear directions are computed in real time. The code is located at global_directions/.

Setup

Here, we rely on the official TensorFlow implementation of StyleGAN2.

It is required to have TensorFlow, version 1.14 or 1.15 (conda install -c anaconda tensorflow-gpu==1.14).

Usage

Local GUI

To start the local GUI please run the following commands:

cd global_directions

# input dataset name
dataset_name='ffhq'

# a pretrained StyleGAN2 model from the standard [NVlabs implementation](https://github.com/NVlabs/stylegan2) will be downloaded automatically.
# pretrained StyleGAN2-ada models can be downloaded from https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ .
# for a custom StyleGAN2 or StyleGAN2-ada model, please place the model under the ./StyleCLIP/global_directions/model/ folder.


# prepare input data
python GetCode.py --dataset_name $dataset_name --code_type 'w'
python GetCode.py --dataset_name $dataset_name --code_type 's'
python GetCode.py --dataset_name $dataset_name --code_type 's_mean_std'

# preprocess (this may take a few hours).
# we precompute the results for StyleGAN2 on ffhq and StyleGAN2-ada on afhqdog and afhqcat. For these models, you can skip the preprocessing step.
python SingleChannel.py --dataset_name $dataset_name

# generate the images to be manipulated
# this operation will generate and replace the w_plus.npy and .jpg images in the './data/dataset_name/' folder.
# if you want to keep the original data, please rename the original folder.
# to use custom images, please use the e4e encoder to generate latents.pt, place it in the './data/dataset_name/' folder, and add the --real flag when running this script.
# you may skip this step if you want to manipulate the real human faces we prepare in the ./data/ffhq/ folder.
python GetGUIData.py --dataset_name $dataset_name

# interactive manipulation
python PlayInteractively.py --dataset_name $dataset_name

As shown in the video, to edit an image it is required to write a neutral text and a target text. To operate the GUI, please do the following:

  • Maximize the window size
  • Double click on the left square to choose an image. The images are taken from global_directions/data/ffhq, and the corresponding latent vectors are in global_directions/data/ffhq/w_plus.npy.
  • Type a neutral text, then press enter
  • Modify the target text so that it will contain the target edit, then press enter.

You can now play with:

  • Manipulation strength - positive values correspond to moving along the target direction.
  • Disentanglement threshold - a large value means a more disentangled edit: only a few channels are manipulated, so only the target attribute changes (for example, grey hair). A small value means a less disentangled edit: many channels are manipulated, so related attributes also change (such as wrinkles, skin color, glasses).
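The interaction between the two controls can be sketched as follows: the neutral and target texts define a direction in CLIP space, each style channel gets a relevance score against that direction, channels whose relevance falls below the disentanglement threshold (beta) are dropped, and the remaining change is scaled by the manipulation strength (alpha). This is a simplified illustration under our own assumptions, not the code in global_directions/. In particular, `channel_dirs` stands for hypothetical precomputed per-channel directions in CLIP space, and the embeddings are made-up vectors rather than real CLIP outputs.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def global_edit(style: Sequence[float], channel_dirs: Sequence[Sequence[float]],
                neutral_emb: Sequence[float], target_emb: Sequence[float],
                alpha: float, beta: float) -> Tuple[List[float], int]:
    """Return (edited style vector, number of manipulated channels).

    channel_dirs[c] is a hypothetical precomputed CLIP-space direction describing
    how moving style channel c changes the image embedding."""
    # Text direction: from the neutral text toward the target text.
    delta_t = normalize([t - n for t, n in zip(target_emb, neutral_emb)])
    # Relevance of each channel to the text direction.
    relevance = [sum(d * t for d, t in zip(normalize(dir_c), delta_t))
                 for dir_c in channel_dirs]
    # Disentanglement threshold: drop channels whose relevance is below beta.
    delta_s = [r if abs(r) >= beta else 0.0 for r in relevance]
    # Manipulation strength: positive alpha moves toward the target text.
    edited = [s + alpha * ds for s, ds in zip(style, delta_s)]
    return edited, sum(1 for ds in delta_s if ds != 0.0)


if __name__ == "__main__":
    dirs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
    for beta in (0.5, 0.9):
        print(beta, global_edit([0.0, 0.0, 0.0], dirs, [0.0, 0.0], [1.0, 0.0], 2.0, beta))
```

Raising beta from 0.5 to 0.9 in the usage example drops the partially relevant middle channel, which mirrors the GUI behavior described above.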
Examples:

| Edit        | Neutral Text   | Target Text                |
|-------------|----------------|----------------------------|
| Smile       | face           | smiling face               |
| Gender      | female face    | male face                  |
| Blonde hair | face with hair | face with blonde hair      |
| Hi-top fade | face with hair | face with Hi-top fade hair |
| Blue eyes   | face with eyes | face with blue eyes        |

More examples can be found in the video and in the paper.

Practice Tips:

In the terminal, for every manipulation, the number of channels being manipulated is printed (the number is controlled by the attribute (neutral, target) and the disentanglement threshold).

  1. For color transformations, usually 10-20 channels are enough. For large structural changes (for example, Hi-top fade), usually 100-200 channels are required.
  2. For an attribute (neutral, target), if you set a high disentanglement threshold, only a few channels (<20) are manipulated, and this is usually not enough to perform the desired edit.
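Since the printed channel count decreases as the threshold increases, one way to apply the tips above is to pick the threshold that keeps a target number of channels. The helpers below are a hypothetical sketch that assumes you have the per-channel relevance scores for an attribute (the numbers in the usage example are made up).

```python
from typing import Sequence


def count_channels(relevance: Sequence[float], beta: float) -> int:
    """Number of channels kept by the disentanglement threshold beta."""
    return sum(1 for r in relevance if abs(r) >= beta)


def beta_for_channel_count(relevance: Sequence[float], k: int) -> float:
    """Largest beta that still keeps at least k channels (the k-th largest |relevance|)."""
    if not 1 <= k <= len(relevance):
        raise ValueError("k must be between 1 and the number of channels")
    return sorted((abs(r) for r in relevance), reverse=True)[k - 1]


if __name__ == "__main__":
    relevance = [0.9, -0.5, 0.3, 0.1, -0.05]  # made-up scores
    beta = beta_for_channel_count(relevance, 2)
    print(beta, count_channels(relevance, beta))
```

With thousands of real channels, the same idea lets you aim for roughly 10-20 channels for a color edit or 100-200 for a structural one.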

Notebook

Open the notebook in colab and run all the cells. In the last cell you can play with the image.

beta corresponds to the disentanglement threshold, and alpha to the manipulation strength.

After you set the desired parameters, please run the last cell again to generate the image.

Editing Examples

In the following, we show some results obtained with our methods. All images are real and were inverted into StyleGAN's latent space using e4e. The driving text used for each edit appears below or above each image.

Latent Optimization

Latent Mapper

Global Directions

Related Works

The global directions we find for editing are directions in the S space, which was introduced and analyzed in StyleSpace (Wu et al.).

To edit real images, we inverted them into StyleGAN's latent space using e4e (Tov et al.).

The code structure of the mapper is heavily based on pSp.

Citation

If you use this code for your research, please cite our paper:

@misc{patashnik2021styleclip,
      title={StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery}, 
      author={Or Patashnik and Zongze Wu and Eli Shechtman and Daniel Cohen-Or and Dani Lischinski},
      year={2021},
      eprint={2103.17249},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}