Official code for the CVPR 2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Overview

📖 Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022)

🔥 If DaGAN is helpful in your photos or projects, please ⭐ it or recommend it to your friends. Thanks 🔥

[Paper]   [Project Page]   [Demo]   [Poster Video]

Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
The Hong Kong University of Science and Technology

Cartoon Sample

cartoon.mp4

Human Sample

celeb.mp4

Voxceleb1 Dataset

🚩 Updates

  • πŸ”₯ πŸ”₯ βœ… May 19, 2022: The depth face model trained on Voxceleb2 is released! (The corresponding checkpoint of DaGAN will release soon). Click the LINK

  • πŸ”₯ πŸ”₯ βœ… April 25, 2022: Integrated into Huggingface Spaces πŸ€— using Gradio. Try out the web demo: Hugging Face Spaces (GPU version will come soon!)

  • πŸ”₯ πŸ”₯ βœ… Add SPADE model, which produces more natural results.

🔧 Dependencies and Installation

Installation

We now provide a clean version of DaGAN, which does not require customized CUDA extensions.

  1. Clone repo

    git clone https://github.com/harlanhong/CVPR2022-DaGAN.git
    cd CVPR2022-DaGAN
  2. Install dependent packages

    pip install -r requirements.txt
    
    ## Install the Face Alignment lib
    cd face-alignment
    pip install -r requirements.txt
    python setup.py install
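
After installation, a quick sanity check can confirm that the key packages are importable. The sketch below is not part of the repository. It assumes the import names `torch` and `face_alignment` (adjust them if your environment differs) and uses only the standard library to look them up.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the dependencies installed above.
    required = ["torch", "face_alignment"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If a package is reported as missing, re-run the corresponding `pip install` step in the same Python environment.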

⚡ Quick Inference

We use the paper version as an example. More models can be found here.

YAML configs

See config/vox-adv-256.yaml for a description of each parameter.

Pre-trained checkpoint

The pre-trained checkpoint of the face depth network and our DaGAN checkpoints can be found at the following link: OneDrive.

Inference! To run a demo, download a checkpoint and run the following command:

CUDA_VISIBLE_DEVICES=0 python demo.py  --config config/vox-adv-256.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator 

The result will be stored in result.mp4. The driving videos and source images should be cropped before they can be used with our method. To obtain some semi-automatic crop suggestions, you can use python crop-video.py --inp some_youtube_video.mp4. It will generate ffmpeg commands for the crops.
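
To animate several source images in a row, the demo command can be wrapped in a small script. The following is a minimal sketch, not part of the repository: `build_demo_command` and `run_demos` are hypothetical helper names, the flags are copied from the command above, and it assumes that demo.py writes result.mp4 into the current working directory.

```python
import os
import shutil
import subprocess


def build_demo_command(driving_video, source_image, checkpoint,
                       config="config/vox-adv-256.yaml", kp_num=15,
                       generator="DepthAwareGenerator"):
    """Return the demo.py argument list used in the command above."""
    return [
        "python", "demo.py",
        "--config", config,
        "--driving_video", driving_video,
        "--source_image", source_image,
        "--checkpoint", checkpoint,
        "--relative", "--adapt_scale",
        "--kp_num", str(kp_num),
        "--generator", generator,
    ]


def run_demos(jobs, checkpoint, gpu="0"):
    """Run demo.py once per (driving_video, source_image, output_path) job."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for driving_video, source_image, output_path in jobs:
        command = build_demo_command(driving_video, source_image, checkpoint)
        subprocess.run(command, env=env, check=True)
        # Assumption: demo.py writes result.mp4 into the current directory,
        # so move it away before the next run overwrites it.
        shutil.move("result.mp4", output_path)
```

Calling `run_demos([("path/to/driving", "path/to/source", "out_1.mp4")], "path/to/checkpoint")` from the repository root would run a single job on GPU 0.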

💻 Training

Datasets

  1. VoxCeleb. Please follow the instructions at https://github.com/AliaksandrSiarohin/video-preprocessing.

Train on VoxCeleb

To train a model on a specific dataset, run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --master_addr="0.0.0.0" --master_port=12348 run.py --config config/vox-adv-256.yaml --name DaGAN --rgbd --batchsize 12 --kp_num 15 --generator DepthAwareGenerator

The code creates a folder in the log directory (each run creates a new name-specific directory). Checkpoints are saved to this folder. To check the loss values during training, see log.txt. By default, the batch size is tuned to run on 8 GeForce RTX 3090 GPUs (the best performance is reached after about 150 epochs). You can change the batch size in train_params in the .yaml file.

🚩 Please use multiple GPUs to train your own model. If you use only one GPU, you will run into the inplace problem.

Also, you can watch the training loss by running the following command:

tensorboard --logdir log/DaGAN/log

If you kill the process in the middle of training for some reason, a zombie process may remain. You can kill it using our provided tool:

python kill_port.py PORT

Training on your own dataset

  1. Resize all the videos to the same size, e.g. 256x256. The videos can be in '.gif' or '.mp4' format, or a folder of images. We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is lossless and has better I/O performance.

  2. Create a folder data/dataset_name with two subfolders, train and test. Put the training videos in train and the testing videos in test.

  3. Create a config file config/dataset_name.yaml. In dataset_params, specify the root directory as root_dir: data/dataset_name. Also adjust the number of epochs in train_params.
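
The folder layout from step 2 can be prepared with a short script. The following is a minimal sketch, not part of the repository. It assumes the frames have already been extracted into one folder per video under a hypothetical `frames_root` directory. The function name `split_into_train_test`, the 10% test fraction, and the fixed seed are illustrative choices.

```python
import random
import shutil
from pathlib import Path


def split_into_train_test(frames_root, dataset_root, test_fraction=0.1, seed=0):
    """Copy per-video frame folders into dataset_root/train and dataset_root/test."""
    frames_root, dataset_root = Path(frames_root), Path(dataset_root)
    videos = sorted(p for p in frames_root.iterdir() if p.is_dir())
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(videos)
    # Hold out at least one video for testing when more than one is available.
    n_test = max(1, int(len(videos) * test_fraction)) if len(videos) > 1 else 0
    split = {"test": videos[:n_test], "train": videos[n_test:]}
    for subset, folders in split.items():
        (dataset_root / subset).mkdir(parents=True, exist_ok=True)
        for folder in folders:
            # dirs_exist_ok requires Python 3.8 or newer.
            shutil.copytree(folder, dataset_root / subset / folder.name,
                            dirs_exist_ok=True)
    return {subset: [f.name for f in folders] for subset, folders in split.items()}
```

For example, `split_into_train_test("frames", "data/dataset_name")` copies each video folder into either data/dataset_name/train or data/dataset_name/test and returns the folder names assigned to each subset.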

📜 Acknowledgement

Our DaGAN implementation is inspired by FOMM. We thank the authors of FOMM for making their code publicly available.

📜 BibTeX

@inproceedings{hong2022depth,
            title={Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
            author={Hong, Fa-Ting and Zhang, Longhao and Shen, Li and Xu, Dan},
            booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
            year={2022}
          }

📧 Contact

If you have any questions, please email [email protected].
