😮 The official implementation of "CoNeRF: Controllable Neural Radiance Fields" 😮

Overview

CoNeRF: Controllable Neural Radiance Fields

arXiv MIT license Website Datasets

This is the official implementation of "CoNeRF: Controllable Neural Radiance Fields".

The codebase is based on HyperNeRF's JAX implementation, which builds on JaxNeRF.

Setup

The code can be run in any environment with Python 3.8 or above (it may run with lower versions, but we have not tested it).

We recommend using Miniconda and setting up an environment:

conda create --name conerf python=3.8
conda activate conerf

Next, install the required packages:

pip install -r requirements.txt

Install the appropriate JAX distribution for your environment by following the instructions in the JAX installation guide. For example:

# For CUDA version 11.1
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Dataset

Basic structure

The dataset uses the same format as Nerfies for the image extraction and camera estimation.

For annotations, we create an additional file, annotations.yml, that lists attribute values and their corresponding frames, and a folder of [frame_id].json files (only annotated frames need a corresponding .json file), where each *.json file is a segmentation mask created with LabelMe. In summary, each dataset must have the following structure:

<dataset>
    ├── annotations
    │   └── ${item_id}.json
    ├── annotations.yml
    ├── camera
    │   └── ${item_id}.json
    ├── camera-paths
    ├── colmap
    ├── rgb
    │   └── ${scale}x
    │       └── ${item_id}.png
    ├── metadata.json
    ├── dataset.json
    ├── scene.json
    └── mapping.yml
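Before training, it can help to check that a dataset folder matches the layout above. The following is a minimal sketch using only the Python standard library; check_dataset_layout and EXPECTED_ENTRIES are hypothetical names (not part of the CoNeRF codebase), it assumes every top-level entry in the tree is required, and it checks only that the entries exist, not their contents.

```python
from pathlib import Path
from typing import List

# Top-level entries from the directory tree above; directories end with "/".
# Assumption: all of them are required (some, e.g. camera-paths, may be optional in practice).
EXPECTED_ENTRIES = [
    "annotations/",
    "annotations.yml",
    "camera/",
    "camera-paths/",
    "colmap/",
    "rgb/",
    "metadata.json",
    "dataset.json",
    "scene.json",
    "mapping.yml",
]


def check_dataset_layout(root: str) -> List[str]:
    """Return the expected entries that are missing under `root` (hypothetical helper)."""
    base = Path(root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = base / entry.rstrip("/")
        # Directories must be directories, files must be regular files.
        ok = path.is_dir() if entry.endswith("/") else path.is_file()
        if not ok:
            missing.append(entry)
    return missing
```

For example, check_dataset_layout("/path/to/dataset") returns an empty list when every entry is present, and otherwise lists the missing ones in the order of the tree.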

The mapping.yml file can be created manually and maps class indices to the class names used in LabelMe. It has the following format:

<index-from-0>: <class-name>

for example:

0: left eye
1: right eye
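Because mapping.yml is a flat list of index/name pairs, it can be read without special tooling. Below is a minimal stdlib-only sketch; parse_mapping is a hypothetical name, it handles only the flat format shown above (no quoting, nesting, or multi-line values), and the check that indices run 0, 1, 2, ... without gaps is an assumption drawn from the "index-from-0" wording. A YAML library such as PyYAML (yaml.safe_load) would parse the same file into the same dictionary.

```python
from typing import Dict


def parse_mapping(text: str) -> Dict[int, str]:
    """Parse the flat '<index>: <class-name>' format of mapping.yml.

    Blank lines and '#' comments are ignored. Raises ValueError on malformed
    lines, duplicate indices, or indices that do not run 0, 1, 2, ...
    """
    mapping: Dict[int, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, name = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected '<index>: <class-name>'")
        index = int(key.strip())  # raises ValueError for non-integer keys
        if index in mapping:
            raise ValueError(f"line {lineno}: duplicate index {index}")
        mapping[index] = name.strip()
    # Assumption: indices start from 0 and have no gaps.
    if sorted(mapping) != list(range(len(mapping))):
        raise ValueError("class indices should be 0, 1, 2, ... without gaps")
    return mapping


print(parse_mapping("0: left eye\n1: right eye\n"))
# {0: 'left eye', 1: 'right eye'}
```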

The annotations.yml can be created manually as well (though we encourage using the provided notebook for this task) and has the following format:

- class: <id>
  frame: <number>
  value: <attribute-value> # between -1 and 1

for example:

- class: 0 # corresponding to left eye
  frame: 128
  value: -1
- class: 1 # corresponding to right eye
  frame: 147
  value: 1
- class: 2 # corresponding to mouth
  frame: 147
  value: -1 
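The annotations.yml entries are easy to sanity-check before training. This sketch assumes the file has already been loaded into a list of dicts (for example with PyYAML's yaml.safe_load, omitted to keep the snippet stdlib-only); validate_annotations is a hypothetical helper. It checks the constraints stated above: integer class and frame, a value between -1 and 1, and optionally that each class has a name in mapping.yml.

```python
from typing import Dict, List, Optional


def validate_annotations(entries: List[dict],
                         mapping: Optional[Dict[int, str]] = None) -> List[str]:
    """Return human-readable problems found in parsed annotations.yml entries."""
    problems = []
    for i, entry in enumerate(entries):
        missing = {"class", "frame", "value"} - set(entry)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        cls, frame, value = entry["class"], entry["frame"], entry["value"]
        if not isinstance(cls, int):
            problems.append(f"entry {i}: class must be an integer")
        elif mapping is not None and cls not in mapping:
            problems.append(f"entry {i}: class {cls} has no name in mapping.yml")
        if not isinstance(frame, int):
            problems.append(f"entry {i}: frame must be an integer")
        if not isinstance(value, (int, float)) or not -1 <= value <= 1:
            problems.append(f"entry {i}: value {value!r} is outside [-1, 1]")
    return problems
```

If the two example files above were used together, the mouth entry would be reported, because class 2 does not appear in the example mapping.yml.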

Principles of annotating the data

  • Our framework works well with just a handful of annotations (for example, at the extreme points). For our main face visualizations, we used only 2 annotations per attribute.
  • We highly recommend annotating the frames that show the extremes of the controllable range; for example, a fully closed eye gets the value -1 and a fully open eye gets +1. The extremes need not be exact, but the more accurate the annotations, the more accurate the controllability you can expect.
  • Each attribute can be annotated independently, i.e., there is no need to look for frames in which all attributes are at their extreme values at once. For example, the left eye=-1 and left eye=+1 values can be provided in frames 28 and 47, while right eye=-1 and right eye=+1 can be provided in any other frames.
  • Masks can be quite rough and oversized; it is generally better to make them too big than too small.
  • The general annotation pipeline looks like this:
  1. Find a set of frames that contain the extreme attribute values (e.g., closed eye, open eye, etc.).
  2. Provide the values of the attributes to be controlled in annotations.yml.
  3. Set names for these attributes (necessary for the masking part).
  4. Run LabelMe.
  5. Save annotated frames in annotations/.
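Following the principles above, a quick way to see whether each attribute has been annotated at both extremes is to group the entries by class. This is a minimal sketch over already-parsed annotations.yml entries (a list of dicts, as a YAML loader would return); summarize_extremes and missing_extremes are hypothetical helpers. Only values of exactly -1 and +1 count as extremes here; intermediate values are allowed but simply not counted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_extremes(entries: List[dict]) -> Dict[int, Tuple[List[int], List[int]]]:
    """Map each class to (frames annotated with -1, frames annotated with +1)."""
    lows: Dict[int, List[int]] = defaultdict(list)
    highs: Dict[int, List[int]] = defaultdict(list)
    for entry in entries:
        if entry["value"] == -1:
            lows[entry["class"]].append(entry["frame"])
        elif entry["value"] == 1:
            highs[entry["class"]].append(entry["frame"])
    classes = sorted({entry["class"] for entry in entries})
    return {c: (sorted(lows[c]), sorted(highs[c])) for c in classes}


def missing_extremes(entries: List[dict]) -> List[int]:
    """Classes that lack an annotation at -1 or at +1."""
    return [c for c, (low, high) in summarize_extremes(entries).items()
            if not low or not high]
```

With the left eye annotated at -1 in frame 28 and at +1 in frame 47, and the right eye annotated only at +1, missing_extremes reports the right eye's class.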

Now you can run the training! You can also download our datasets (52 GB of data) to avoid doing any preprocessing yourself.

We tried our best to keep the CoNeRF codebase general enough to support the novel view synthesis validation dataset (conerf/datasets/nerfies.py), but we mainly focused on the interpolation task. If you have access to a novel view synthesis rig like the one used in NeRFies or HyperNeRF and find that something doesn't work, please open an issue.

Providing value annotations

We extended the basic notebook used in NeRFies and HyperNeRF for processing the data so that you can also annotate the necessary images with attributes. Please check out notebooks/Capture_Processing.ipynb for more details. In addition to all the files produced for NeRFies, the notebook will also generate the <dataset>/annotations.yml and <dataset>/mapping.yml files.

Providing masking annotations

We adapted the data loading class to handle annotations from LabelMe (we used its Docker version). An example annotation for one of our datasets looks like this:

[Figure: example LabelMe annotation]

LabelMe saves the *.json files to the directory set in File -> Output Dir, which should be the <dataset>/annotations/ folder.
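For reference, a LabelMe JSON file stores a shapes list (each shape has a label, a shape_type, and polygon points in pixel coordinates) together with imageHeight and imageWidth. The sketch below rasterizes polygon shapes into a per-pixel class-index mask using the names from mapping.yml, in plain Python so it has no dependencies. It is an illustration only: how CoNeRF's data loader actually converts these files is not described here, so labelme_to_mask, load_labelme_mask, the background value -1, testing pixel centers, and letting later shapes overwrite earlier ones are all assumptions.

```python
import json
from typing import Dict, List, Sequence

BACKGROUND = -1  # assumed value for pixels not covered by any mask


def point_in_polygon(x: float, y: float, poly: Sequence[Sequence[float]]) -> bool:
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def labelme_to_mask(annotation: dict, mapping: Dict[int, str]) -> List[List[int]]:
    """Rasterize LabelMe polygons into a row-major class-index mask (hypothetical helper).

    `mapping` is the parsed mapping.yml ({index: class-name}); shapes whose label
    is not in it are skipped. Later shapes overwrite earlier ones.
    """
    name_to_index = {name: index for index, name in mapping.items()}
    height, width = annotation["imageHeight"], annotation["imageWidth"]
    mask = [[BACKGROUND] * width for _ in range(height)]
    for shape in annotation["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # rectangles, circles, etc. are not handled in this sketch
        index = name_to_index.get(shape["label"])
        if index is None:
            continue
        for row in range(height):
            for col in range(width):
                # Test the pixel center against the polygon.
                if point_in_polygon(col + 0.5, row + 0.5, shape["points"]):
                    mask[row][col] = index
    return mask


def load_labelme_mask(path: str, mapping: Dict[int, str]) -> List[List[int]]:
    """Read one <dataset>/annotations/*.json file and rasterize it."""
    with open(path, encoding="utf-8") as f:
        return labelme_to_mask(json.load(f), mapping)
```

The per-pixel loop is slow on full-resolution frames; a polygon-filling routine from an imaging library would be the practical choice, while this version only shows the idea.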

Training

After preparing a dataset, you can train CoNeRF by running:

export DATASET_PATH=/path/to/dataset
export EXPERIMENT_PATH=/path/to/save/experiment/to
python train.py \
    --base_folder $EXPERIMENT_PATH \
    --gin_bindings="data_dir='$DATASET_PATH'" \
    --gin_configs configs/test_local_attributes.gin

To plot telemetry to TensorBoard and render checkpoints on the fly, also launch an evaluation job by running:

python eval.py \
    --base_folder $EXPERIMENT_PATH \
    --gin_bindings="data_dir='$DATASET_PATH'" \
    --gin_configs configs/test_local_attributes.gin

The two jobs should use mutually exclusive sets of GPUs. This division allows the training job to run without having to stop for evaluation.
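One common way to give the two jobs disjoint GPUs is the CUDA_VISIBLE_DEVICES environment variable, which restricts the GPUs that a CUDA process (and therefore JAX) can see. The sketch below only builds the two commands and their environments from the flags shown above; build_jobs and the split of a GPU list at train_share are illustrative assumptions, and nothing is launched unless you pass the results to subprocess.Popen yourself.

```python
import os
from typing import Dict, List, Sequence, Tuple

CONFIG = "configs/test_local_attributes.gin"


def build_jobs(gpus: Sequence[int], train_share: int,
               experiment_path: str, dataset_path: str
               ) -> List[Tuple[List[str], Dict[str, str]]]:
    """Return (argv, env) pairs for train.py and eval.py on disjoint GPU sets."""
    if not 0 < train_share < len(gpus):
        raise ValueError("both jobs need at least one GPU")
    train_gpus, eval_gpus = gpus[:train_share], gpus[train_share:]
    jobs = []
    for script, devices in (("train.py", train_gpus), ("eval.py", eval_gpus)):
        argv = [
            "python", script,
            "--base_folder", experiment_path,
            # Same value the shell produces from --gin_bindings="data_dir='$DATASET_PATH'"
            f"--gin_bindings=data_dir='{dataset_path}'",
            "--gin_configs", CONFIG,
        ]
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(map(str, devices)))
        jobs.append((argv, env))
    return jobs


# Example (not executed here):
#   import subprocess
#   for argv, env in build_jobs([0, 1, 2, 3], 3, "/path/to/exp", "/path/to/data"):
#       subprocess.Popen(argv, env=env)
```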

Configuration

  • We use Gin for configuration.
  • We provide a couple of preset configurations.
  • Please refer to config.py for documentation on what each configuration does.
  • Preset configs:
    • baselines/: All configs that were used to perform quantitative evaluation in the experiments, including baseline methods. The _proj suffix denotes a method that uses a learnable projection.
      • ours.gin: The full CoNeRF architecture with masking.
      • hypernerf_ap[_proj].gin: The axis-aligned plane configuration for HyperNeRF.
      • hypernerf_ds[_proj].gin: The deformable surface configuration for HyperNeRF.
      • nerf_latent[_proj].gin: The configuration for a simple baseline where we concatenate a learnable latent with each coordinate (resembles HyperNeRF AP without the warping field).
      • nerfies[_proj].gin: The configuration for the NeRFies model.
      • nerf.gin: The configuration for the simplest NeRF architecture.
    • full-hd/, hd/ and post/: We repurposed our baselines/ours.gin configuration for training at different resolutions and with different sampling parameters that increase the quality of the generated images. Training with post/ours.gin required 4x A100 GPUs for 2 weeks to converge.
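The [_proj] notation in the preset list stands for two files: one without and one with the learnable-projection suffix. The sketch below expands those names into concrete paths that could be passed to --gin_configs. The configs/baselines/ location is an assumption based on the configs/ prefix used in the training command, and expand_preset and BASELINE_PRESETS are hypothetical names; check the repository for the exact paths.

```python
import re
from typing import List

# Preset names as listed above; "[_proj]" marks an optional suffix.
BASELINE_PRESETS = [
    "ours.gin",
    "hypernerf_ap[_proj].gin",
    "hypernerf_ds[_proj].gin",
    "nerf_latent[_proj].gin",
    "nerfies[_proj].gin",
    "nerf.gin",
]


def expand_preset(name: str, folder: str = "configs/baselines") -> List[str]:
    """Expand 'x[_proj].gin' into ['<folder>/x.gin', '<folder>/x_proj.gin']."""
    match = re.fullmatch(r"(.*)\[(.+)\](.*)", name)
    if match is None:
        return [f"{folder}/{name}"]
    prefix, optional, suffix = match.groups()
    return [f"{folder}/{prefix}{suffix}", f"{folder}/{prefix}{optional}{suffix}"]


all_configs = [path for preset in BASELINE_PRESETS for path in expand_preset(preset)]
```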

Synthetic dataset

We generated the synthetic dataset using Kubric. You can find the generation script here. After generating the dataset, you can run prepare_kubric_dataset.py to convert it into the format that CoNeRF expects. The synthetic dataset is already included in the provided zip file.

Additional scripts

All scripts below are invoked in the same way as the training script: they need $EXPERIMENT_PATH and $DATASET_PATH to be specified, and they save their results into $EXPERIMENT_PATH.

  • render_changing_attributes.py: Renders each attribute changing under a fixed camera.
  • render_video.py: Renders a changing view under a fixed set of attributes.
  • render_all.py: Renders with both the attributes and the camera parameters changing dynamically.
  • train_lr.py: Estimates the parameters of a linear regression model that maps the high-dimensional embedding to the controllable attributes.
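train_lr.py is described only as estimating a linear regression from the high-dimensional embedding to the controllable attributes. To illustrate that idea (this is not the script's actual implementation), the sketch below fits a least-squares linear map with a bias term by solving the normal equations (X^T X + λI) w = X^T y in plain Python. The function names, the small ridge term λ (ridge) that keeps the system solvable, and fitting one attribute at a time are all assumptions.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger ridge term")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(embeddings: Sequence[Sequence[float]], targets: Sequence[float],
               ridge: float = 1e-6) -> List[float]:
    """Least-squares weights w (last entry is the bias) so that [e, 1] @ w ~ target."""
    xs = [list(e) + [1.0] for e in embeddings]  # append a constant bias feature
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    for i in range(d - 1):  # regularize the weights but not the bias
        xtx[i][i] += ridge
    xty = [sum(x[i] * t for x, t in zip(xs, targets)) for i in range(d)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], embedding: Sequence[float]) -> float:
    """Apply the fitted linear map to one embedding."""
    return sum(w * e for w, e in zip(weights, embedding)) + weights[-1]
```

To map an embedding to several attributes, one weight vector can be fitted per attribute; for realistic embedding sizes, a vectorized least-squares solver would replace the pure-Python loops.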

Additional notes

  • We used notebooks/Results.ipynb to generate the tables and visualizations for the article. While it may not be particularly useful for your case, we have kept it so you can copy or reuse some of its snippets; in particular, it shows how to extract data from TensorBoard logs.
  • We removed some of the notebooks available in HyperNeRF's codebase (e.g., for training) that are no longer applicable to CoNeRF. We highly recommend using the provided scripts instead. If you manage to adapt HyperNeRF's notebooks, please open a pull request.

Citing

If you find our work useful, please consider citing:

@inproceedings{kania2022conerf,
  title     = {{CoNeRF: Controllable Neural Radiance Fields}},
  author    = {Kania, Kacper and Yi, Kwang Moo and Kowalski, Marek and Trzci{\'n}ski, Tomasz and Tagliasacchi, Andrea},
  booktitle   = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}