Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Overview

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, and Ziwei Liu.

Project | Paper | Demo

We propose Pose-Controllable Audio-Visual System (PC-AVS), which achieves free pose control when driving arbitrary talking faces with audios. Instead of learning pose motions from audios, we leverage another pose source video to compensate only for head motions. The key is to devise an implicit low-dimension pose code that is free of mouth shape or identity information. In this way, audio-visual representations are modularized into spaces of three key factors: speech content, head pose, and identity information.

Requirements

  • Python 3.6 and Pytorch 1.3.0 are used. Basic requirements are listed in the 'requirements.txt'.
pip install -r requirements.txt

Quick Start: Generate Demo Results

  • Download the pre-trained checkpoints.

  • Create the default folder ./checkpoints and unzip the demo.zip at ./checkpoints/demo. There should be a 5 pths in it.

  • Unzip all *.zip files within the misc folder.

  • Run the demo scripts:

bash experiments/demo_vox.sh
  • The --gen_video argument is by default on, ffmpeg >= 4.2.0 is required to use this flag in linux systems. All frames along with an avconcat.mp4 video file will be saved in the ./id_517600055_pose_517600078_audio_681600002/results folder in the following form:

From left to right are the reference input, the generated results, the pose source video and the synced original video with the driving audio.

Prepare Testing Meta Data

  • Automatic VoxCeleb2 Data Formulation

The inference code experiments/demo.sh refers to ./misc/demo.csv for testing data paths. In linux systems, any applicable csv file can be created automatically by running:

python scripts/prepare_testing_files.py

Then modify the meta_path_vox in experiments/demo_vox.sh to './misc/demo2.csv' and run

bash experiments/demo_vox.sh

An additional result should be seen saved.

  • Metadata Details

Detailedly, in scripts/prepare_testing_files.py there are certain flags which enjoy great flexibility when formulating the metadata:

  1. --src_pose_path denotes the driving pose source path. It can be an mp4 file or a folder containing frames in the form of %06d.jpg starting from 0.

  2. --src_audio_path denotes the audio source's path. It can be an mp3 audio file or an mp4 video file. If a video is given, the frames will be automatically saved in ./misc/Mouth_Source/video_name, and disables the --src_mouth_frame_path flag.

  3. --src_mouth_frame_path. When --src_audio_path is not a video path, this flags could provide the folder containing the video frames synced with the source audio.

  4. --src_input_path is the path to the input reference image. When the path is a video file, we will convert it to frames.

  5. --csv_path the path to the to-be-saved metadata.

You can manually modify the metadata csv file or add lines to it according to the rules defined in the scripts/prepare_testing_files.py file or the dataloader data/voxtest_dataset.py.

We provide a number of demo choices in the misc folder, including several ones used in our video. Feel free to rearrange them even across folders. And you are welcome to record audio files by yourself.

  • Self-Prepared Data Processing

Our model handles only VoxCeleb2-like cropped data, thus pre-processing is needed for self-prepared data.

  • Coming soon

Train Your Own Model

  • Coming soon

License and Citation

The usage of this software is under CC-BY-4.0.

@InProceedings{zhou2021pose,
author = {Zhou, Hang and Sun, Yasheng and Wu, Wayne and Loy, Chen Change and Wang, Xiaogang and Liu, Ziwei},
title = {Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2021}
}

Acknowledgement

Owner
Hang_Zhou
Ph.D. Candidate @ MMLab-CUHK
Hang_Zhou
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

Knut(Ke) Chen 134 Jan 01, 2023
Build upon neural radiance fields to create a scene-specific implicit 3D semantic representation, Semantic-NeRF

Semantic-NeRF: Semantic Neural Radiance Fields Project Page | Video | Paper | Data In-Place Scene Labelling and Understanding with Implicit Scene Repr

Shuaifeng Zhi 243 Jan 07, 2023
The code repository for "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection" (ACM MM'21)

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection (ACM MM'21) By Zhuofan Zong, Qianggang Cao, Biao Leng Introduction F

TempleX 9 Jul 30, 2022
The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Easy-to-use toolkit for retrieval-based Chatbot Recent Activity Our released RRS corpus can be found here. Our released BERT-FP post-training checkpoi

GMFTBY 32 Nov 13, 2022
This is a repository for a No-Code object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operating systems.

OpenVINO Inference API This is a repository for an object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operati

BMW TechOffice MUNICH 68 Nov 24, 2022
Search and filter videos based on objects that appear in them using convolutional neural networks

Thingscoop: Utility for searching and filtering videos based on their content Description Thingscoop is a command-line utility for analyzing videos se

Anastasis Germanidis 354 Dec 04, 2022
ATAC: Adversarially Trained Actor Critic

ATAC: Adversarially Trained Actor Critic Adversarially Trained Actor Critic for Offline Reinforcement Learning by Ching-An Cheng*, Tengyang Xie*, Nan

Microsoft 41 Dec 08, 2022
Code repository for Semantic Terrain Classification for Off-Road Autonomous Driving

BEVNet Datasets Datasets should be put inside data/. For example, data/semantic_kitti_4class_100x100. Training BEVNet-S Example: cd experiments bash t

(Brian) JoonHo Lee 24 Dec 12, 2022
Visual Tracking by TridenAlign and Context Embedding

Visual Tracking by TridentAlign and Context Embedding (TACT) Test code for "Visual Tracking by TridentAlign and Context Embedding" Janghoon Choi, Juns

Janghoon Choi 32 Aug 25, 2021
Differentiable Surface Triangulation

Differentiable Surface Triangulation This is our implementation of the paper Differentiable Surface Triangulation that enables optimization for any pe

61 Dec 07, 2022
Bayesian optimization in PyTorch

BoTorch is a library for Bayesian Optimization built on PyTorch. BoTorch is currently in beta and under active development! Why BoTorch ? BoTorch Prov

2.5k Dec 31, 2022
Behind the Curtain: Learning Occluded Shapes for 3D Object Detection

Behind the Curtain: Learning Occluded Shapes for 3D Object Detection Acknowledgement We implement our model, BtcDet, based on [OpenPcdet 0.3.0]. Insta

Qiangeng Xu 163 Dec 19, 2022
A python module for configuration of block devices

Blivet is a python module for system storage configuration. CI status Licence See COPYING Installation From Fedora repositories Blivet is available in

78 Dec 14, 2022
Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach

Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach This is the implementation of traffic prediction code in DTMP based on PyTo

chenxin 1 Dec 19, 2021
Springer Link Download Module for Python

♞ pupalink A simple Python module to search and download books from SpringerLink. 🧪 This project is still in an early stage of development. Expect br

Pupa Corp. 18 Nov 21, 2022
[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs

GAN Compression project | paper | videos | slides [NEW!] GAN Compression is accepted by T-PAMI! We released our T-PAMI version in the arXiv v4! [NEW!]

MIT HAN Lab 1k Jan 07, 2023
A small library for creating and manipulating custom JAX Pytree classes

Treeo A small library for creating and manipulating custom JAX Pytree classes Light-weight: has no dependencies other than jax. Compatible: Treeo Tree

Cristian Garcia 58 Nov 23, 2022
This project implements "virtual speed" from heart rate monito

ANT+ Virtual Stride Based Speed and Distance Monitor Overview This project imple

2 May 20, 2022
Open-Domain Question-Answering for COVID-19 and Other Emergent Domains

Open-Domain Question-Answering for COVID-19 and Other Emergent Domains This repository contains the source code for an end-to-end open-domain question

7 Sep 27, 2022
Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition"

RandWireNN Unofficial PyTorch Implementation of: Exploring Randomly Wired Neural Networks for Image Recognition. Results Validation result on Imagenet

Seung-won Park 684 Nov 02, 2022