Stitch it in Time: GAN-Based Facial Editing of Real Videos

STIT - Stitch it in Time

Rotem Tzaban, Ron Mokady, Rinon Gal, Amit Bermano, Daniel Cohen-Or

Abstract:
The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing. However, replicating their success with videos has proven challenging. Sets of high-quality facial videos are lacking, and working with videos introduces a fundamental barrier to overcome - temporal coherency. We propose that this barrier is largely artificial. The source video is already temporally coherent, and deviations from this state arise in part due to careless treatment of individual components in the editing pipeline. We leverage the natural alignment of StyleGAN and the tendency of neural networks to learn low frequency functions, and demonstrate that they provide a strongly consistent prior. We draw on these insights and propose a framework for semantic editing of faces in videos, demonstrating significant improvements over the current state-of-the-art. Our method produces meaningful face manipulations, maintains a higher degree of temporal consistency, and can be applied to challenging, high quality, talking head videos which current methods struggle with.

Requirements

PyTorch (tested with 1.10; should work with 1.8/1.9 as well) + torchvision

For the rest of the requirements, run:

pip install Pillow imageio imageio-ffmpeg dlib face-alignment opencv-python click wandb tqdm scipy matplotlib clip lpips 

Pretrained models

To use this project, you need to download the pretrained models from the following Link.

Unzip them inside the project's main directory.

You can use the download_models.sh script (requires gdown, which can be installed with pip install gdown).

Alternatively, you can unzip the models to a location of your choice and update configs/path_config.py accordingly.

Splitting videos into frames

Our code expects videos in the form of a directory with individual frame images. To produce such a directory from an existing video, we recommend using ffmpeg:

ffmpeg -i "video.mp4" "video_frames/out%04d.png"
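
Before passing the directory to --input_folder, it can help to confirm that the frames are all present and in order. The following is a minimal sketch and not part of the project: the helper name list_frames is hypothetical, and the out%04d.png naming is assumed from the ffmpeg command above.

```python
import re
from pathlib import Path

# Hypothetical helper (not part of STIT). Assumes frames were written by
# ffmpeg as out0001.png, out0002.png, ... as in the command above.
FRAME_PATTERN = re.compile(r"out(\d+)\.png$")


def list_frames(frames_dir):
    """Return (frame paths sorted by index, sorted list of missing indices)."""
    indexed = {}
    for path in Path(frames_dir).iterdir():
        match = FRAME_PATTERN.match(path.name)
        if match:  # ignore anything that is not an extracted frame
            indexed[int(match.group(1))] = path
    if not indexed:
        return [], []
    first, last = min(indexed), max(indexed)
    missing = [i for i in range(first, last + 1) if i not in indexed]
    frames = [indexed[i] for i in sorted(indexed)]
    return frames, missing
```

Calling list_frames("video_frames") returns the ordered frame paths together with any gaps in the numbering, so a partially extracted video can be caught before inversion starts.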

Example Videos

The videos used to produce our results can be downloaded from the following Link.

Inversion

To invert a video run:

python train.py --input_folder /path/to/images_dir \
 --output_folder /path/to/experiment_dir \
 --run_name RUN_NAME \
 --num_pti_steps NUM_STEPS

This includes aligning, cropping, e4e encoding and PTI.

For example:

python train.py --input_folder /data/obama \
 --output_folder training_results/obama \
 --run_name obama \
 --num_pti_steps 80

Weights and Biases logging is disabled by default. To enable it, add --use_wandb.

Naive Editing

To run edits without stitching tuning:

python edit_video.py --input_folder /path/to/images_dir \
 --output_folder /path/to/experiment_dir \
 --run_name RUN_NAME \
 --edit_name EDIT_NAME \
 --edit_range EDIT_RANGE

edit_range determines the strength of the edits applied. It should be in the format RANGE_START RANGE_END RANGE_STEPS, where RANGE_STEPS is the number of evenly spaced strengths between RANGE_START and RANGE_END (inclusive).
For example, --edit_range 1 5 3 applies edits with strengths 1, 3 and 5.
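
One way to read the three numbers is sketched below. It assumes RANGE_STEPS is the number of evenly spaced strengths, which is the reading under which --edit_range -8 8 2 yields one old and one young edit. The function name edit_strengths is hypothetical, and the project's own parsing may differ.

```python
def edit_strengths(range_start, range_end, range_steps):
    """Evenly spaced strengths from range_start to range_end, inclusive.

    Assumption: range_steps is the number of strengths, not the step size.
    """
    if range_steps < 1:
        raise ValueError("range_steps must be at least 1")
    if range_steps == 1:
        return [float(range_start)]
    step = (range_end - range_start) / (range_steps - 1)
    return [range_start + i * step for i in range(range_steps)]


print(edit_strengths(1, 5, 3))    # [1.0, 3.0, 5.0]
print(edit_strengths(-8, 8, 2))   # [-8.0, 8.0]
print(edit_strengths(-8, -8, 1))  # [-8.0]
```

Under this reading, a range whose start and end are equal with a single step, as in the examples that follow, applies exactly one edit strength.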

For young Obama use:

python edit_video.py --input_folder /data/obama \
 --output_folder edits/obama/ \
 --run_name obama \
 --edit_name age \
 --edit_range -8 -8 1

Editing + Stitching Tuning

To run edits with stitching tuning:

python edit_video_stitching_tuning.py --input_folder /path/to/images_dir \
 --output_folder /path/to/experiment_dir \
 --run_name RUN_NAME \
 --edit_name EDIT_NAME \
 --edit_range EDIT_RANGE \
 --outer_mask_dilation MASK_DILATION

We support breaking out of the stitching tuning process early, once the loss reaches a specified threshold.
This lets us run more iterations on difficult frames while keeping the overall running time reasonable.
To use this feature, add --border_loss_threshold THRESHOLD to the command (shown in the Jim and Kamala Harris examples below).
For videos with a background that is simple to reconstruct (e.g. Obama, Jim, Emma Watson, Kamala Harris), we use THRESHOLD=0.005.
For videos where a more exact reconstruction of the background is required (e.g. Michael Scott), we use THRESHOLD=0.002.
Early breaking is disabled by default.
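
The early-breaking behaviour can be pictured with a small, framework-free sketch. This is an illustration under assumptions, not the project's implementation: tune_until_threshold and step_fn are hypothetical names, and whether the real comparison is strict or inclusive is not stated in this README.

```python
def tune_until_threshold(step_fn, max_steps, border_loss_threshold=None):
    """Call step_fn up to max_steps times and return (steps_run, last_loss).

    step_fn performs one tuning step and returns the current loss.
    Passing None as the threshold disables early breaking, mirroring
    the default described above.
    """
    loss = None
    for step in range(1, max_steps + 1):
        loss = step_fn()
        if border_loss_threshold is not None and loss <= border_loss_threshold:
            return step, loss  # easy frame: stop as soon as it is good enough
    return max_steps, loss     # hard frame: use the full step budget


# Toy loss sequence standing in for one frame's optimisation.
losses = iter([0.03, 0.012, 0.004, 0.001])
steps, final = tune_until_threshold(lambda: next(losses), 4, 0.005)
print(steps, final)  # 3 0.004
```

With a threshold, easy frames stop after a few steps, so a larger step budget only costs time on the frames that actually need it.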

For young Obama use:

python edit_video_stitching_tuning.py --input_folder /data/obama \
 --output_folder edits/obama/ \
 --run_name obama \
 --edit_name age \
 --edit_range -8 -8 1 \
 --outer_mask_dilation 50

For gender editing on Obama use:

python edit_video_stitching_tuning.py --input_folder /data/obama \
 --output_folder edits/obama/ \
 --run_name obama \
 --edit_name gender \
 --edit_range -6 -6 1 \
 --outer_mask_dilation 50

For young Emma Watson use:

python edit_video_stitching_tuning.py --input_folder /data/emma_watson \
 --output_folder edits/emma_watson/ \
 --run_name emma_watson \
 --edit_name age \
 --edit_range -8 -8 1 \
 --outer_mask_dilation 50

For smile removal on Emma Watson use:

python edit_video_stitching_tuning.py --input_folder /data/emma_watson \
 --output_folder edits/emma_watson/ \
 --run_name emma_watson \
 --edit_name smile \
 --edit_range -3 -3 1 \
 --outer_mask_dilation 50

For lipstick editing on Emma Watson (done with a StyleCLIP global direction) use:

python edit_video_stitching_tuning.py --input_folder /data/emma_watson \
 --output_folder edits/emma_watson/ \
 --run_name emma_watson \
 --edit_type styleclip_global \
 --edit_name lipstick \
 --neutral_class "Face" \
 --target_class "Face with lipstick" \
 --beta 0.2 \
 --edit_range 10 10 1 \
 --outer_mask_dilation 50

For Old + Young Jim use (with early breaking):

python edit_video_stitching_tuning.py --input_folder datasets/jim/ \
 --output_folder edits/jim \
 --run_name jim \
 --edit_name age \
 --edit_range -8 8 2 \
 --outer_mask_dilation 50 \
 --border_loss_threshold 0.005

For smiling Kamala Harris:

python edit_video_stitching_tuning.py \
 --input_folder datasets/kamala/ \
 --output_folder edits/kamala \
 --run_name kamala \
 --edit_name smile \
 --edit_range 2 2 1 \
 --outer_mask_dilation 50 \
 --border_loss_threshold 0.005

Example Results

With stitching tuning:

out.mp4

Without stitching tuning:

out.mp4

Gender editing:

out.mp4

Young Emma Watson:

out.mp4

Emma Watson with lipstick:

out.mp4

Emma Watson smile removal:

out.mp4

Old Jim:

out.mp4

Young Jim:

out.mp4

Smiling Kamala Harris:

out.mp4

Out of domain video editing (Animations)

Editing out-of-domain (OOD) videos requires some different parameters during training. First, dlib's face detector doesn't detect all animated faces, so we use a different face detector provided by the face_alignment package (enabled with --use_fa). Second, we reduce the smoothing of the alignment parameters with --center_sigma 0.0. Third, OOD videos require more training steps, as they are more difficult to invert.

To train, we use:

python train.py --input_folder datasets/ood_spiderverse_gwen/ \
 --output_folder training_results/ood \
 --run_name ood \
 --num_pti_steps 240 \
 --use_fa \
 --center_sigma 0.0
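
To give a feel for what reducing the smoothing means, here is an illustration only. It assumes the smoothing is a Gaussian filter applied to per-frame crop centers, which this README does not specify; smooth_centers is a hypothetical name, not a function from the project.

```python
import math


def smooth_centers(values, sigma):
    """Gaussian-smooth a per-frame sequence; sigma <= 0 leaves it unchanged."""
    if sigma <= 0:
        return list(values)
    radius = max(1, int(3 * sigma + 0.5))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(values)
    smoothed = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # clamp at the first/last frame
            total += w * values[j]
            weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed


# Hypothetical x-coordinates of the detected face center, one per frame.
centers_x = [100, 104, 96, 105, 95, 100]
print(smooth_centers(centers_x, 0.0))  # unchanged: [100, 104, 96, 105, 95, 100]
print(smooth_centers(centers_x, 2.0))  # jitter is averaged out
```

With sigma set to 0 the crop follows the detected face exactly in every frame; larger values trade that responsiveness for a steadier crop.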

Afterwards, editing is performed the same way:

python edit_video.py --input_folder datasets/ood_spiderverse_gwen/ \
 --output_folder edits/ood --run_name ood \
 --edit_name smile --edit_range 2 2 1

out.mp4

python edit_video.py --input_folder datasets/ood_spiderverse_gwen/ \
 --output_folder edits/ood \
 --run_name ood \
 --edit_type styleclip_global \
 --edit_range 10 10 1 \
 --edit_name lipstick \
 --target_class 'Face with lipstick'

out.mp4

Credits:

StyleGAN2-ada model and implementation:
https://github.com/NVlabs/stylegan2-ada-pytorch
Copyright © 2021, NVIDIA Corporation.
Nvidia Source Code License https://nvlabs.github.io/stylegan2-ada-pytorch/license.html

PTI implementation:
https://github.com/danielroich/PTI
Copyright (c) 2021 Daniel Roich
License (MIT) https://github.com/danielroich/PTI/blob/main/LICENSE

LPIPS model and implementation:
https://github.com/richzhang/PerceptualSimilarity
Copyright (c) 2020, Sou Uchida
License (BSD 2-Clause) https://github.com/richzhang/PerceptualSimilarity/blob/master/LICENSE

e4e model and implementation:
https://github.com/omertov/encoder4editing
Copyright (c) 2021 omertov
License (MIT) https://github.com/omertov/encoder4editing/blob/main/LICENSE

StyleCLIP model and implementation:
https://github.com/orpatashnik/StyleCLIP
Copyright (c) 2021 orpatashnik
License (MIT) https://github.com/orpatashnik/StyleCLIP/blob/main/LICENSE

StyleGAN2 Distillation for Feed-forward Image Manipulation - for editing directions:
https://github.com/EvgenyKashin/stylegan2-distillation
Copyright (c) 2019, Yandex LLC
License (Creative Commons NonCommercial) https://github.com/EvgenyKashin/stylegan2-distillation/blob/master/LICENSE

face-alignment Library:
https://github.com/1adrianb/face-alignment
Copyright (c) 2017, Adrian Bulat
License (BSD 3-Clause License) https://github.com/1adrianb/face-alignment/blob/master/LICENSE

face-parsing.PyTorch:
https://github.com/zllrunning/face-parsing.PyTorch
Copyright (c) 2019 zll
License (MIT) https://github.com/zllrunning/face-parsing.PyTorch/blob/master/LICENSE

Citation

If you make use of our work, please cite our paper:

@misc{tzaban2022stitch,
      title={Stitch it in Time: GAN-Based Facial Editing of Real Videos},
      author={Rotem Tzaban and Ron Mokady and Rinon Gal and Amit H. Bermano and Daniel Cohen-Or},
      year={2022},
      eprint={2201.08361},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}