Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

Overview

Consistent Depth of Moving Objects in Video

teaser

This repository contains training code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

This is not an officially supported Google product.

Installing Dependencies

We provide both conda and pip installations for dependencies.

  • To install with conda, run
conda create --name dynamic-video-depth --file ./dependencies/conda_packages.txt
  • To install with pip, run
pip install -r ./dependencies/requirements.txt

Training

We provide two preprocessed video tracks from the DAVIS dataset. To download the pre-trained single-image depth prediction checkpoints, as well as the example data, run:

bash ./scripts/download_data_and_depth_ckpt.sh

This script will automatically download and unzip the checkpoints and data. If you would like to download manually

To train using the example data, run:

bash ./experiments/davis/train_sequence.sh 0 --track_id dog

The first argument indicates the GPU id for training, and --track_id indicates the name of the track. ('dog' and 'train' are provided.)

After training, the results should look like:

Video Our Depth Single Image Depth

Dataset Preparation:

To help with generating custom datasets for training, We provide examples of preparing the dataset from DAVIS, and two sequences from ShutterStock, which are showcased in our paper.

The general work flow for preprocessing the dataset is:

  1. Calibrate the scale of camera translation, transform the camera matrices into camera-to-world convention, and save as individual files.

  2. Calculate flow between pairs of frames, as well as occlusion estimates.

  3. Pack flow and per-frame data into training batches.

To be more specific, example codes are provided in .scripts/preprocess

We provide the triangulation results here and here. You can download them in a single script by running:

bash ./scripts/download_triangulation_files.sh

Davis data preparation

  1. Download the DAVIS dataset here, and unzip it under ./datafiles.

  2. Run python ./scripts/preprocess/davis/generate_frame_midas.py. This requires trimesh to be installed (pip install trimesh should do the trick). This script projects the triangulated 3D points to calibrate camera translation scales.

  3. Run python ./scripts/preprocess/davis/generate_flows.py to generate optical flows between pairs of images. This stage requires RAFT, which is included as a submodule in this repo.

  4. Run python ./scripts/preprocess/davis/generate_sequence_midas.py to pack camera calibrations and images into training batches.

ShutterStock Videos

  1. Download the ShutterStock videos here and here.

  2. Cast the videos as images, put them under ./datafiles/shutterstock/images, and rename them to match the file names in ./datafiles/shutterstock/triangulation. Note that not all frames are triangulated; time stamp of valid frames are recorded in the triangulation file name.

  3. Run python ./scripts/preprocess/shutterstock/generate_frame_midas.py to pack per-frame data.

  4. Run python ./scripts/preprocess/shutterstock/generate_flows.py to generate optical flows between pairs of images.

  5. Run python ./scripts/preprocess/shutterstock/generate_sequence_midas.py to pack flows and per-frame data into training batches.

  6. Example training script is located at ./experiments/shutterstock/train_sequence.sh

Comments
  • question about the Pre-processing

    question about the Pre-processing

    Can you provide the code for preprocessing part? I wonder for dynamic video, how to get accurate camera pose and K? I see you use DAVIS for example, I want to know how to deal with other videos in this dataset.

    opened by Robertwyq 11
  • Parameter finetuning vs Output finetuning

    Parameter finetuning vs Output finetuning

    It seems that running gradient descent for the depth prediction network makes up the majority of the runtime of this method. The current MiDaS implementation (v3?) contains 1.3 GB of parameters, most of which are for the DPT-Large (https://github.com/isl-org/DPT) backbone.

    In your research, did you experiment with performance differences between 'parameter finetuning' and just simple 'output finetuning' for the depth predictions (like as discussed in the GLNet paper (https://arxiv.org/pdf/1907.05820.pdf))?

    I would also be curious about whether as a middle ground, maybe just finetuning the 'head' of the MiDaS network would be sufficient, and leave the much larger set of backbone parameters locked.

    Thanks!

    opened by carsonswope 0
  • How to get the triangulation files for customized videos?

    How to get the triangulation files for customized videos?

    Thanks for sharing this great work!

    I was wondering how to obtain the triangulation files when using my own videos. For example, the dog.intrinsics.txt, dog.matrices.txt, and the dog.obj.

    Are they calculated from colmap? Could you please provide some instructions to get them?

    opened by Cogito2012 0
  • Question about the colmap parameter setting and image resize need to convert the camera pose

    Question about the colmap parameter setting and image resize need to convert the camera pose

    This is very useful work, thanks. I use colmap automatic_reconstructor --camera_model FULL_OPENCV to process the dog training set in DAVIS to get the camera pose, then replacing ./datafiles/DAVIS/triangulation/, other training codes have not changed, but the depth result of each frame has become much worse. How to set the specific parameters of colmap preprocessing? In addition, the image is resized to a small image during training, does the camera pose information obtained by colmap need to be transformed according to resize?

    opened by mayunchao1994 2
  • Question about triangulation results file

    Question about triangulation results file

    This is a great project, Thanks for your work. I have download triangulation results from your link, but i only found dog.intrinsics.txt and train.intrinsics.txt, In DAVIS-2017-trainval-Full-Resolution.zip file, There are 90 files in it, I was wondering if you could share all the triangulation files about Davis and ShutterStock dataset, Thanks very much.

    opened by aiforworlds 0
  • Can not reproduce training result

    Can not reproduce training result

    As it has been mentioned in issue #9 "DAVIS datafiles uncomplete": "datafiles.tar in provided "Google Drive" download link consists only triangulation data. There are no "JPEGImages/1080p" and "Annotation//1080p" folders that "python ./scripts/preprocess/davis/generate_frame_midas.py" refers to." So, I manually downloaded missing data from https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-Unsupervised-trainval-Full-Resolution.zip After that the structure as follow:

    ├── datafiles
        ├── DAVIS
            ├── Annotations  --- missing in supplied download links, downloaded manually from DAVIS datasets 
                ├── 1080p
                    ├── dog
                    ├── train
            ├── JPEGImages  --- missing in supplied download links, downloaded manually from DAVIS datasets 
                ├── 1080p
                    ├── dog
                    ├── train
            ├── triangulation -- data from supplied link
    

    Only after that I could successfully performed all steps of suggested in "Davis data preparation":

    1. Run python ./scripts/preprocess/davis/generate_frame_midas.py.
    2. Run python ./scripts/preprocess/davis/generate_flows.py
    3. Run python ./scripts/preprocess/davis/generate_sequence_midas.py

    However still couldn't reproduce the presented result, running: bash ./experiments/davis/train_sequence.sh 0 --track_id dog

    Output & Stacktrace:

    
    D:\dynamic-video-depth-main>bash ./experiments/davis/train_sequence.sh 0 --track_id dog
    python train.py --net scene_flow_motion_field --dataset davis_sequence --track_id train --log_time --epoch_batches 2000 --epoch 20 --lr 1e-6 --html_logger --vali_batches 150 --batch_size 1 --optim adam --vis_batches_vali 4 --vis_every_vali 1 --vis_every_train 1 --vis_batches_train 5 --vis_at_start --tensorboard --gpu 0 --save_net 1 --workers 4 --one_way --loss_type l1 --l1_mul 0 --acc_mul 1 --disp_mul 1 --warm_sf 5 --scene_lr_mul 1000 --repeat 1 --flow_mul 1 --sf_mag_div 100 --time_dependent --gaps 1,2,4,6,8 --midas --use_disp --logdir './checkpoints/davis/sequence/' --suffix 'track_{track_id}_{loss_type}_wreg_{warm_reg}_acc_{acc_mul}_disp_{disp_mul}_flowmul_{flow_mul}_time_{time_dependent}_CNN_{use_cnn}_gap_{gaps}_Midas_{midas}_ud_{use_disp}' --test_template './experiments/davis/test_cmd.txt' --force_overwrite --track_id dog
      File "train.py", line 106
        str_warning, f'ignoring the gpu set up in opt: {opt.gpu}. Will use all gpus in each node.')
                                                                                                 ^
    SyntaxError: invalid syntax
    

    Noticed that there is no folder named ".checkpoints"

    Similar issue has been mentioned in issue #8 "SyntaxError: invalid syntax"

    Specs: Windows 10 Anaconda: conda 4.11.0 Python 3.7.10 GPU 12Gb Quadro M6000 All specified dependencies including RAFT are installed

    opened by makemota 0
  • DAVIS datafiles uncomplete?

    DAVIS datafiles uncomplete?

    "datafiles.tar" in provided "Google Drive" download link consists only triangulation data. There are no "JPEGImages/1080p" and "Annotation//1080p" folders that "python ./scripts/preprocess/davis/generate_frame_midas.py" refers to:

    ---
    data_list_root = "./datafiles/DAVIS/JPEGImages/1080p"
    camera_path = "./datafiles/DAVIS/triangulation"
    mask_path = './datafiles/DAVIS/Annotations/1080p'
    ---
    
    opened by semel1 1
Releases(sig2021_code_release)
Owner
Google
Google ❤️ Open Source
Google
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis This is a PyTorch implementation of the model described in our pape

qzhb 6 Jul 08, 2021
Second-Order Neural ODE Optimizer, NeurIPS 2021 spotlight

Second-order Neural ODE Optimizer (NeurIPS 2021 Spotlight) [arXiv] ✔️ faster convergence in wall-clock time | ✔️ O(1) memory cost | ✔️ better test-tim

Guan-Horng Liu 39 Oct 22, 2022
Unofficial implementation of Proxy Anchor Loss for Deep Metric Learning

Proxy Anchor Loss for Deep Metric Learning Unofficial pytorch, tensorflow and mxnet implementations of Proxy Anchor Loss for Deep Metric Learning. Not

Geonmo Gu 3 Jun 09, 2021
Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈

Rainbow 🌈 An implementation of Rainbow DQN which reaches a median HNS of 205.7 after only 10M frames (the original Rainbow from Hessel et al. 2017 re

Dominik Schmidt 31 Dec 21, 2022
EfficientNetv2 TensorRT int8

EfficientNetv2_TensorRT_int8 EfficientNetv2模型实现来自https://github.com/d-li14/efficientnetv2.pytorch 环境配置 ubuntu:18.04 cuda:11.0 cudnn:8.0 tensorrt:7

34 Apr 24, 2022
Portfolio analytics for quants, written in Python

QuantStats: Portfolio analytics for quants QuantStats Python library that performs portfolio profiling, allowing quants and portfolio managers to unde

Ran Aroussi 2.7k Jan 08, 2023
The official PyTorch implementation for NCSNv2 (NeurIPS 2020)

Improved Techniques for Training Score-Based Generative Models This repo contains the official implementation for the paper Improved Techniques for Tr

174 Dec 26, 2022
Official implementation for: Blended Diffusion for Text-driven Editing of Natural Images.

Blended Diffusion for Text-driven Editing of Natural Images Blended Diffusion for Text-driven Editing of Natural Images Omri Avrahami, Dani Lischinski

328 Dec 30, 2022
Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.

FENSE The metric, Fluency ENhanced Sentence-bert Evaluation (FENSE), for audio caption evaluation, proposed in the paper "Can Audio Captions Be Evalua

Zhiling Zhang 13 Dec 23, 2022
A simple tutoral for error correction task, based on Pytorch

gramcorrector A simple tutoral for error correction task, based on Pytorch Grammatical Error Detection (sentence-level) a binary sequence-based classi

peiyuan_gong 8 Dec 03, 2022
Official implementation of EfficientPose

EfficientPose This is the official implementation of EfficientPose. We based our work on the Keras EfficientDet implementation xuannianz/EfficientDet

2 May 17, 2022
Implementation of Change-Based Exploration Transfer (C-BET)

Implementation of Change-Based Exploration Transfer (C-BET), as presented in Interesting Object, Curious Agent: Learning Task-Agnostic Exploration.

Simone Parisi 29 Dec 04, 2022
[CVPR 2021] 'Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator'

[CVPR2021] Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator Overview This is the entire codebase for the paper

35 Dec 01, 2022
PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and Multi-Step Knowledge Distillation

PocketNet This is the official repository of the paper: PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and M

Fadi Boutros 40 Dec 22, 2022
[RSS 2021] An End-to-End Differentiable Framework for Contact-Aware Robot Design

DiffHand This repository contains the implementation for the paper An End-to-End Differentiable Framework for Contact-Aware Robot Design (RSS 2021). I

Jie Xu 60 Jan 04, 2023
How to Train a GAN? Tips and tricks to make GANs work

(this list is no longer maintained, and I am not sure how relevant it is in 2020) How to Train a GAN? Tips and tricks to make GANs work While research

Soumith Chintala 10.8k Dec 31, 2022
Predicting Event Memorability from Contextual Visual Semantics

Predicting Event Memorability from Contextual Visual Semantics

0 Oct 06, 2021
VGGFace2-HQ - A high resolution face dataset for face editing purpose

The first open source high resolution dataset for face swapping!!! A high resolution version of VGGFace2 for academic face editing purpose

Naiyuan Liu 232 Dec 29, 2022
The Ludii general game system, developed as part of the ERC-funded Digital Ludeme Project.

The Ludii General Game System Ludii is a general game system being developed as part of the ERC-funded Digital Ludeme Project (DLP). This repository h

Digital Ludeme Project 50 Jan 04, 2023
Speeding-Up Back-Propagation in DNN: Approximate Outer Product with Memory

Approximate Outer Product Gradient Descent with Memory Code for the numerical experiment of the paper Speeding-Up Back-Propagation in DNN: Approximate

2 Mar 02, 2022