Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

Overview

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

We propose Disentangled Audio-Visual System (DAVS) to address arbitrary-subject talking face generation in this work, which aims to synthesize a sequence of face images that correspond to given speech semantics, conditioning on either an unconstrained speech audio or video.

[Project] [Paper] [Demo]

Recommondation of our CVPR21 repo

This repo is barely maintaining since the version of this code is out of date. If you are interested in the topic of Talking Face Generation, feel free to try the CODE of our CVPR2021 PAPER!

Requirements

Generating test results

Create the default folder "checkpoints" and put the checkpoint in it or get the CHECKPOINT_PATH
  • Samples for testing can be found in this folder named 0572_0019_0003. This is a pre-processed sample from the Voxceleb Dataset.

  • Run the testing script to generate videos from video:

python test_all.py  --test_root ./0572_0019_0003/video --test_type video --test_audio_video_length 99 --test_resume_path CHECKPOINT_PATH
  • Run the testing script to generate videos from audio:
python test_all.py  --test_root ./0572_0019_0003/audio --test_type audio --test_audio_video_length 99 --test_resume_path CHECKPOINT_PATH

Sample Results

  • Talking Effect on Human Characters

  • Talking Effect on Non-human Characters (Trained on Human Faces Only)

Create more samples

  • The face detection tool used in the demo videos can be found at RSA. It will return a Matfile with 5 key point locations in a row for each image. Other face alignment methods are also appliable such as dlib. The key points for face alignement we used are the two for the center of the eyes and the average point of the corners of the mouth. With each image's PATH and the face POINTS, you can find our way of face alignment at preprocess/face_align.py.

  • Our preprocessing of the audio files is the same and borrowed from the matlab code of SyncNet. Then we save the mfcc features into bin files.

Preparing Training Data

  • We used the LRW dataset for training.
  • The directories are arranged like this:
data
├── train, val, test
|	├── 0, 1, 2 ... 499 (one folder for each class)
|	│   ├── 0, 1, 2 ... #videos per class
|	│   │   ├── align_face256
|	│   │   |   ├── 0, 1, ... 28.jpg
|	│   |   ├── mfcc20
|	│   │   |   ├── 2, 3 ... 26.bin

where each video is extracted to frames and aligned using our protocol, and each audio is processed and saved using Matlab.

Training

python train.py
  • This is still a beta version of the training code which only disentangles wid information from pid space. Running the train.py only might not be able to fully reproduce the paper. However, it can be served as a reference for how we implement the whole training process.
  • During our own implementation, the classification part (without generation and disentanglement) is pretrained first. The pretraining training code is temporarily not provided.

Postprocessing Details (Optional)

  • The directly generated results may suffer from a "zoom-in-and-out" condition which we assume is caused by our alignment of the training set. We solve the unstable problem using Subspace Video Stabilization in the demos.

License and Citation

The use of this software is RESTRICTED to non-commercial research and educational purposes.

@inproceedings{zhou2019talking,
  title     = {Talking Face Generation by Adversarially Disentangled Audio-Visual Representation},
  author    = {Zhou, Hang and Liu, Yu and Liu, Ziwei and Luo, Ping and Wang, Xiaogang},
  booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2019},
}

Acknowledgement

The structure of this codebase is borrowed from pix2pix.

Owner
Hang_Zhou
Ph.D. @ MMLab-CUHK
Hang_Zhou
Implementation of " SESS: Self-Ensembling Semi-Supervised 3D Object Detection" (CVPR2020 Oral)

SESS: Self-Ensembling Semi-Supervised 3D Object Detection Created by Na Zhao from National University of Singapore Introduction This repository contai

125 Dec 23, 2022
SmallInitEmb - LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence

SmallInitEmb LayerNorm(SmallInit(Embedding)) in a Transformer I find that when t

PENG Bo 11 Dec 25, 2022
68 keypoint annotations for COFW test data

68 keypoint annotations for COFW test data This repository contains manually annotated 68 keypoints for COFW test data (original annotation of CFOW da

31 Dec 06, 2022
使用yolov5训练自己数据集(详细过程)并通过flask部署

使用yolov5训练自己的数据集(详细过程)并通过flask部署 依赖库 torch torchvision numpy opencv-python lxml tqdm flask pillow tensorboard matplotlib pycocotools Windows,请使用 pycoc

HB.com 19 Dec 28, 2022
Doing the asl sign language classification on static images using graph neural networks.

SignLangGNN When GNNs 💜 MediaPipe. This is a starter project where I tried to implement some traditional image classification problem i.e. the ASL si

10 Nov 09, 2022
Flexible-CLmser: Regularized Feedback Connections for Biomedical Image Segmentation

Flexible-CLmser: Regularized Feedback Connections for Biomedical Image Segmentation The skip connections in U-Net pass features from the levels of enc

Boheng Cao 1 Dec 29, 2021
The official PyTorch implementation for NCSNv2 (NeurIPS 2020)

Improved Techniques for Training Score-Based Generative Models This repo contains the official implementation for the paper Improved Techniques for Tr

174 Dec 26, 2022
Fast, differentiable sorting and ranking in PyTorch

Torchsort Fast, differentiable sorting and ranking in PyTorch. Pure PyTorch implementation of Fast Differentiable Sorting and Ranking (Blondel et al.)

Teddy Koker 655 Jan 04, 2023
ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

55 Nov 09, 2022
DeepHawkeye is a library to detect unusual patterns in images using features from pretrained neural networks

English | 简体中文 Introduction DeepHawkeye is a library to detect unusual patterns in images using features from pretrained neural networks Reference Pat

CV Newbie 28 Dec 13, 2022
Collection of generative models in Tensorflow

tensorflow-generative-model-collections Tensorflow implementation of various GANs and VAEs. Related Repositories Pytorch version Pytorch version of th

3.8k Dec 30, 2022
An API-first distributed deployment system of deep learning models using timeseries data to analyze and predict systems behaviour

Gordo Building thousands of models with timeseries data to monitor systems. Table of content About Examples Install Uninstall Developer manual How to

Equinor 26 Dec 27, 2022
R3Det based on mmdet 2.19.0

R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object Installation # install mmdetection first if you haven't installed it

SJTU-Thinklab-Det 38 Dec 15, 2022
Node Dependent Local Smoothing for Scalable Graph Learning

Node Dependent Local Smoothing for Scalable Graph Learning Requirements Environments: Xeon Gold 5120 (CPU), 384GB(RAM), TITAN RTX (GPU), Ubuntu 16.04

Wentao Zhang 15 Nov 28, 2022
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Detectron is deprecated. Please see detectron2, a ground-up rewrite of Detectron in PyTorch. Detectron Detectron is Facebook AI Research's software sy

Facebook Research 25.5k Jan 07, 2023
An improvement of FasterGICP: Acceptance-rejection Sampling based 3D Lidar Odometry

fasterGICP This package is an improvement of fast_gicp Please cite our paper if possible. W. Jikai, M. Xu, F. Farzin, D. Dai and Z. Chen, "FasterGICP:

79 Dec 31, 2022
[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

MuVER This repo contains the code and pre-trained model for our EMNLP 2021 paper: MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity

24 May 30, 2022
Repository of 3D Object Detection with Pointformer (CVPR2021)

3D Object Detection with Pointformer This repository contains the code for the paper 3D Object Detection with Pointformer (CVPR 2021) [arXiv]. This wo

Zhuofan Xia 117 Jan 06, 2023
Deformable DETR is an efficient and fast-converging end-to-end object detector.

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

2k Jan 05, 2023
SMD-Nets: Stereo Mixture Density Networks

SMD-Nets: Stereo Mixture Density Networks This repository contains a Pytorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021)

Fabio Tosi 115 Dec 26, 2022