Python library for tracking human heads with FLAME (a 3D morphable head model)

Overview

Video Head Tracker


3D tracking library for human heads based on FLAME (a 3D morphable head model). The tracking algorithm is inspired by face2face. It first determines FLAME's shape and texture parameters, as well as spherical harmonics lighting and camera intrinsics, for a video sequence. Afterwards, expressions and poses (rigid, neck, jaw, eyes) are optimized for each frame of the video. The only inputs are an RGB video together with facial and iris landmarks. The landmarks are estimated automatically by our code.

This repository complements the code release of the CVPR2022 paper Neural Head Avatars from Monocular RGB Videos. The code is maintained independently from the paper's code to ease reusing it in other projects.

Installation

  • Install Python 3.9 (it should work with other versions as well, but the setup.py and dependencies must be adjusted to do so).
  • Clone the repo and run pip install -e . from inside the cloned directory.
  • Download the FLAME head model and texture space from the official website and add them as generic_model.pkl and FLAME_texture.npz under ./assets/flame.
  • Finally, go to https://github.com/HavenFeng/photometric_optimization and copy the UV parametrization of FLAME found there (head_template_mesh.obj) to ./assets/flame as well.
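Before running the tracker, it can help to confirm that the three FLAME files named in the steps above are in place. The following is a minimal sketch, not part of the repository; the helper name missing_flame_assets is hypothetical, and only the file names and the ./assets/flame directory come from the installation steps.

```python
from pathlib import Path

# File names taken from the installation steps above.
REQUIRED_FLAME_ASSETS = (
    "generic_model.pkl",
    "FLAME_texture.npz",
    "head_template_mesh.obj",
)


def missing_flame_assets(asset_dir="./assets/flame"):
    """Return the required FLAME files that are not present in asset_dir.

    Hypothetical helper for illustration; it only checks that the files
    exist, not that their contents are valid.
    """
    root = Path(asset_dir)
    return [name for name in REQUIRED_FLAME_ASSETS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_flame_assets()
    if missing:
        print("Missing FLAME assets:", ", ".join(missing))
    else:
        print("All FLAME assets found.")
```

Run it from the root of the cloned repository so that the relative path resolves correctly.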

Usage

To run the tracker on a video, run:

python vht/optimize_tracking.py --config your_config.ini --video path_to_video --data_path path_to_data

The video path and data path can also be given inside the config file. In general, any parameter in the config file may be overridden by providing it explicitly on the command line. If a video path is given, the frames of the video are extracted and facial and iris landmarks are predicted for each frame. The frames and landmarks are stored at --data_path. Once they are extracted, you can reuse them by omitting the --video flag. We provide config files for two identities tracked in the main paper. The video data for these subjects can be downloaded from the paper repository. These configs provide good defaults for other videos as well.
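The precedence rule described above (values from the config file, replaced by anything passed explicitly on the command line) can be illustrated with the standard library. This is a generic sketch of the pattern, not the code used in vht/optimize_tracking.py; the function resolve_options and the flat option dictionary are assumptions made for illustration, and only three of the tracker's flags are modeled.

```python
import argparse
import configparser


def resolve_options(config_path, cli_args):
    """Merge an .ini file with command-line flags; flags take precedence.

    Illustrative only: the real tracker may organize its options differently.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_path)

    # Flatten all sections into one dictionary of option name -> value.
    options = {}
    for section in config.sections():
        options.update(config[section])

    parser = argparse.ArgumentParser()
    parser.add_argument("--video")
    parser.add_argument("--data_path")
    parser.add_argument("--output_path")
    parsed = parser.parse_args(cli_args)

    # Only flags that were actually passed override the config file.
    for name, value in vars(parsed).items():
        if value is not None:
            options[name] = value
    return options
```

With this pattern, a data_path set in the config file is used unless --data_path is also passed, in which case the command-line value is used.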

If you would like to use your own videos, the following parameters are most important to set:

[dataset]
data_path = PATH_TO_DATASET --> discussed above

[training]
output_path = OUTPUT_PATH --> where the results will be stored
keyframes = [90, 415, 434, 193] --> list of frames used to optimize shape, texture, lights and camera
                                --> ideally, you provide one front, one left and one right view
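If you prefer to generate the config for a new video programmatically, the three parameters above can be written with Python's configparser. This is a minimal sketch under stated assumptions: the helper names write_tracker_config and read_keyframes are hypothetical, the optional base_config argument is assumed to point at one of the provided config files so that its remaining defaults are kept, and the "-->" annotations in the listing above are treated as explanations rather than as part of the file.

```python
import configparser
import json


def write_tracker_config(path, data_path, output_path, keyframes, base_config=None):
    """Write an .ini file containing the three parameters discussed above.

    Hypothetical helper: if base_config is given, its sections are loaded
    first and only the three values are replaced.
    """
    config = configparser.ConfigParser(interpolation=None)
    if base_config is not None:
        config.read(base_config)
    for section in ("dataset", "training"):
        if not config.has_section(section):
            config.add_section(section)
    config["dataset"]["data_path"] = str(data_path)
    config["training"]["output_path"] = str(output_path)
    # Stored as a bracketed list, e.g. [90, 415, 434, 193].
    config["training"]["keyframes"] = json.dumps(list(keyframes))
    with open(path, "w") as f:
        config.write(f)


def read_keyframes(path):
    """Read the keyframe list back from a config written by the helper above."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)
    return json.loads(config["training"]["keyframes"])
```

For example, write_tracker_config("my_video.ini", "data/my_video", "results/my_video", [90, 415, 434, 193]) produces a file that can then be passed to the tracker via --config.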

The optimized parameters are stored in the output directory as tracked_flame_params.npz.
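To see what the result file contains, you can list its arrays with NumPy. The sketch below is an assumption-laden starting point rather than documented API: the helper summarize_npz is hypothetical, the array names inside tracked_flame_params.npz are not specified here and are therefore discovered at runtime, and the file is assumed to load without allow_pickle.

```python
import numpy as np


def summarize_npz(path):
    """Map each array name in an .npz archive to its (shape, dtype).

    Hypothetical helper; it makes no assumption about which arrays exist.
    """
    with np.load(path) as data:
        return {
            name: (data[name].shape, str(data[name].dtype))
            for name in data.files
        }


if __name__ == "__main__":
    import sys

    # Usage: python inspect_params.py OUTPUT_PATH/tracked_flame_params.npz
    for name, (shape, dtype) in summarize_npz(sys.argv[1]).items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```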

License

The code is available for non-commercial scientific research purposes under the CC BY-NC 3.0 license. Please note that the files flame.py and lbs.py are heavily inspired by https://github.com/HavenFeng/photometric_optimization and are property of the Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. The download, use, and distribution of this code is subject to this license. The files in the ./assets directory are adapted from the FLAME head model, for which the license can be found here.

Citation

If you find our work useful, please include the following citation:

@article{grassal2021neural,
  title={Neural Head Avatars from Monocular RGB Videos},
  author={Grassal, Philip-William and Prinzler, Malte and Leistner, Titus and Rother, Carsten
          and Nie{\ss}ner, Matthias and Thies, Justus},
  journal={arXiv preprint arXiv:2112.01554},
  year={2021}
}

Acknowledgements

This project has received funding from the DFG in the joint German-Japan-France grant agreement (RO 4804/3-1) and the ERC Starting Grant Scan2CAD (804724). We also thank the Center for Information Services and High Performance Computing (ZIH) at TU Dresden for generous allocations of computer time.
