Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

Overview

Consistent Depth of Moving Objects in Video

teaser

This repository contains training code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

This is not an officially supported Google product.

Installing Dependencies

We provide both conda and pip installations for dependencies.

  • To install with conda, run
conda create --name dynamic-video-depth --file ./dependencies/conda_packages.txt
  • To install with pip, run
pip install -r ./dependencies/requirements.txt

Training

We provide two preprocessed video tracks from the DAVIS dataset. To download the pre-trained single-image depth prediction checkpoints, as well as the example data, run:

bash ./scripts/download_data_and_depth_ckpt.sh

This script will automatically download and unzip the checkpoints and data. If you would like to download manually

To train using the example data, run:

bash ./experiments/davis/train_sequence.sh 0 --track_id dog

The first argument indicates the GPU id for training, and --track_id indicates the name of the track. ('dog' and 'train' are provided.)

After training, the results should look like:

Video Our Depth Single Image Depth

Dataset Preparation:

To help with generating custom datasets for training, We provide examples of preparing the dataset from DAVIS, and two sequences from ShutterStock, which are showcased in our paper.

The general work flow for preprocessing the dataset is:

  1. Calibrate the scale of camera translation, transform the camera matrices into camera-to-world convention, and save as individual files.

  2. Calculate flow between pairs of frames, as well as occlusion estimates.

  3. Pack flow and per-frame data into training batches.

To be more specific, example codes are provided in .scripts/preprocess

We provide the triangulation results here and here. You can download them in a single script by running:

bash ./scripts/download_triangulation_files.sh

Davis data preparation

  1. Download the DAVIS dataset here, and unzip it under ./datafiles.

  2. Run python ./scripts/preprocess/davis/generate_frame_midas.py. This requires trimesh to be installed (pip install trimesh should do the trick). This script projects the triangulated 3D points to calibrate camera translation scales.

  3. Run python ./scripts/preprocess/davis/generate_flows.py to generate optical flows between pairs of images. This stage requires RAFT, which is included as a submodule in this repo.

  4. Run python ./scripts/preprocess/davis/generate_sequence_midas.py to pack camera calibrations and images into training batches.

ShutterStock Videos

  1. Download the ShutterStock videos here and here.

  2. Cast the videos as images, put them under ./datafiles/shutterstock/images, and rename them to match the file names in ./datafiles/shutterstock/triangulation. Note that not all frames are triangulated; time stamp of valid frames are recorded in the triangulation file name.

  3. Run python ./scripts/preprocess/shutterstock/generate_frame_midas.py to pack per-frame data.

  4. Run python ./scripts/preprocess/shutterstock/generate_flows.py to generate optical flows between pairs of images.

  5. Run python ./scripts/preprocess/shutterstock/generate_sequence_midas.py to pack flows and per-frame data into training batches.

  6. Example training script is located at ./experiments/shutterstock/train_sequence.sh

Comments
  • question about the Pre-processing

    question about the Pre-processing

    Can you provide the code for preprocessing part? I wonder for dynamic video, how to get accurate camera pose and K? I see you use DAVIS for example, I want to know how to deal with other videos in this dataset.

    opened by Robertwyq 11
  • Parameter finetuning vs Output finetuning

    Parameter finetuning vs Output finetuning

    It seems that running gradient descent for the depth prediction network makes up the majority of the runtime of this method. The current MiDaS implementation (v3?) contains 1.3 GB of parameters, most of which are for the DPT-Large (https://github.com/isl-org/DPT) backbone.

    In your research, did you experiment with performance differences between 'parameter finetuning' and just simple 'output finetuning' for the depth predictions (like as discussed in the GLNet paper (https://arxiv.org/pdf/1907.05820.pdf))?

    I would also be curious about whether as a middle ground, maybe just finetuning the 'head' of the MiDaS network would be sufficient, and leave the much larger set of backbone parameters locked.

    Thanks!

    opened by carsonswope 0
  • How to get the triangulation files for customized videos?

    How to get the triangulation files for customized videos?

    Thanks for sharing this great work!

    I was wondering how to obtain the triangulation files when using my own videos. For example, the dog.intrinsics.txt, dog.matrices.txt, and the dog.obj.

    Are they calculated from colmap? Could you please provide some instructions to get them?

    opened by Cogito2012 0
  • Question about the colmap parameter setting and image resize need to convert the camera pose

    Question about the colmap parameter setting and image resize need to convert the camera pose

    This is very useful work, thanks. I use colmap automatic_reconstructor --camera_model FULL_OPENCV to process the dog training set in DAVIS to get the camera pose, then replacing ./datafiles/DAVIS/triangulation/, other training codes have not changed, but the depth result of each frame has become much worse. How to set the specific parameters of colmap preprocessing? In addition, the image is resized to a small image during training, does the camera pose information obtained by colmap need to be transformed according to resize?

    opened by mayunchao1994 2
  • Question about triangulation results file

    Question about triangulation results file

    This is a great project, Thanks for your work. I have download triangulation results from your link, but i only found dog.intrinsics.txt and train.intrinsics.txt, In DAVIS-2017-trainval-Full-Resolution.zip file, There are 90 files in it, I was wondering if you could share all the triangulation files about Davis and ShutterStock dataset, Thanks very much.

    opened by aiforworlds 0
  • Can not reproduce training result

    Can not reproduce training result

    As it has been mentioned in issue #9 "DAVIS datafiles uncomplete": "datafiles.tar in provided "Google Drive" download link consists only triangulation data. There are no "JPEGImages/1080p" and "Annotation//1080p" folders that "python ./scripts/preprocess/davis/generate_frame_midas.py" refers to." So, I manually downloaded missing data from https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-Unsupervised-trainval-Full-Resolution.zip After that the structure as follow:

    ├── datafiles
        ├── DAVIS
            ├── Annotations  --- missing in supplied download links, downloaded manually from DAVIS datasets 
                ├── 1080p
                    ├── dog
                    ├── train
            ├── JPEGImages  --- missing in supplied download links, downloaded manually from DAVIS datasets 
                ├── 1080p
                    ├── dog
                    ├── train
            ├── triangulation -- data from supplied link
    

    Only after that I could successfully performed all steps of suggested in "Davis data preparation":

    1. Run python ./scripts/preprocess/davis/generate_frame_midas.py.
    2. Run python ./scripts/preprocess/davis/generate_flows.py
    3. Run python ./scripts/preprocess/davis/generate_sequence_midas.py

    However still couldn't reproduce the presented result, running: bash ./experiments/davis/train_sequence.sh 0 --track_id dog

    Output & Stacktrace:

    
    D:\dynamic-video-depth-main>bash ./experiments/davis/train_sequence.sh 0 --track_id dog
    python train.py --net scene_flow_motion_field --dataset davis_sequence --track_id train --log_time --epoch_batches 2000 --epoch 20 --lr 1e-6 --html_logger --vali_batches 150 --batch_size 1 --optim adam --vis_batches_vali 4 --vis_every_vali 1 --vis_every_train 1 --vis_batches_train 5 --vis_at_start --tensorboard --gpu 0 --save_net 1 --workers 4 --one_way --loss_type l1 --l1_mul 0 --acc_mul 1 --disp_mul 1 --warm_sf 5 --scene_lr_mul 1000 --repeat 1 --flow_mul 1 --sf_mag_div 100 --time_dependent --gaps 1,2,4,6,8 --midas --use_disp --logdir './checkpoints/davis/sequence/' --suffix 'track_{track_id}_{loss_type}_wreg_{warm_reg}_acc_{acc_mul}_disp_{disp_mul}_flowmul_{flow_mul}_time_{time_dependent}_CNN_{use_cnn}_gap_{gaps}_Midas_{midas}_ud_{use_disp}' --test_template './experiments/davis/test_cmd.txt' --force_overwrite --track_id dog
      File "train.py", line 106
        str_warning, f'ignoring the gpu set up in opt: {opt.gpu}. Will use all gpus in each node.')
                                                                                                 ^
    SyntaxError: invalid syntax
    

    Noticed that there is no folder named ".checkpoints"

    Similar issue has been mentioned in issue #8 "SyntaxError: invalid syntax"

    Specs: Windows 10 Anaconda: conda 4.11.0 Python 3.7.10 GPU 12Gb Quadro M6000 All specified dependencies including RAFT are installed

    opened by makemota 0
  • DAVIS datafiles uncomplete?

    DAVIS datafiles uncomplete?

    "datafiles.tar" in provided "Google Drive" download link consists only triangulation data. There are no "JPEGImages/1080p" and "Annotation//1080p" folders that "python ./scripts/preprocess/davis/generate_frame_midas.py" refers to:

    ---
    data_list_root = "./datafiles/DAVIS/JPEGImages/1080p"
    camera_path = "./datafiles/DAVIS/triangulation"
    mask_path = './datafiles/DAVIS/Annotations/1080p'
    ---
    
    opened by semel1 1
Releases(sig2021_code_release)
Owner
Google
Google ❤️ Open Source
Google
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

111 Dec 29, 2022
DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency

[CVPR19] DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency (Oral paper) Authors: Kuang-Jui Hsu, Yen-Yu Lin, Yung-Yu Chuang PDF:

Kuang-Jui Hsu 139 Dec 22, 2022
Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

Cheng Zhang 66 Nov 16, 2022
PRIME: A Few Primitives Can Boost Robustness to Common Corruptions

PRIME: A Few Primitives Can Boost Robustness to Common Corruptions This is the official repository of PRIME, the data agumentation method introduced i

Apostolos Modas 34 Oct 30, 2022
Mask-invariant Face Recognition through Template-level Knowledge Distillation

Mask-invariant Face Recognition through Template-level Knowledge Distillation This is the official repository of "Mask-invariant Face Recognition thro

Fadi Boutros 35 Dec 06, 2022
The backbone CSPDarkNet of YOLOX.

YOLOX-Backbone The backbone CSPDarkNet of YOLOX. In this project, you can enjoy: CSPDarkNet-S CSPDarkNet-M CSPDarkNet-L CSPDarkNet-X CSPDarkNet-Tiny C

Jianhua Yang 9 Aug 22, 2022
Face recognition system using MTCNN, FACENET, SVM and FAST API to track participants of Big Brother Brasil in real time.

BBB Face Recognizer Face recognition system using MTCNN, FACENET, SVM and FAST API to track participants of Big Brother Brasil in real time. Instalati

Rafael Azevedo 232 Dec 24, 2022
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Introduction PyTorch3D provides efficient, reusable components for 3D Computer Vision research with PyTorch. Key features include: Data structure for

Facebook Research 6.8k Jan 01, 2023
Classify the disease status of a plant given an image of a passion fruit

Passion Fruit Disease Detection I tried to create an accurate machine learning models capable of localizing and identifying multiple Passion Fruits in

3 Nov 09, 2021
Artificial Neural network regression model to predict the energy output in a combined cycle power plant.

Energy_Output_Predictor Artificial Neural network regression model to predict the energy output in a combined cycle power plant. Abstract Energy outpu

1 Feb 11, 2022
Finetune alexnet with tensorflow - Code for finetuning AlexNet in TensorFlow >= 1.2rc0

Finetune AlexNet with Tensorflow Update 15.06.2016 I revised the entire code base to work with the new input pipeline coming with TensorFlow = versio

Frederik Kratzert 766 Jan 04, 2023
Repository for code and dataset for our EMNLP 2021 paper - “So You Think You’re Funny?”: Rating the Humour Quotient in Standup Comedy.

AI-OpenMic Dataset The dataset is available for download via the follwing link. Repository for code and dataset for our EMNLP 2021 paper - “So You Thi

6 Oct 26, 2022
codebase for "A Theory of the Inductive Bias and Generalization of Kernel Regression and Wide Neural Networks"

Eigenlearning This repo contains code for replicating the experiments of the paper A Theory of the Inductive Bias and Generalization of Kernel Regress

Jamie Simon 45 Dec 02, 2022
Pytorch implementation of MixNMatch

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation [Paper] Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Le

910 Dec 30, 2022
EfficientNetV2 implementation using PyTorch

EfficientNetV2-S implementation using PyTorch Train Steps Configure imagenet path by changing data_dir in train.py python main.py --benchmark for mode

Jahongir Yunusov 86 Dec 29, 2022
BabelCalib: A Universal Approach to Calibrating Central Cameras. In ICCV (2021)

BabelCalib: A Universal Approach to Calibrating Central Cameras This repository contains the MATLAB implementation of the BabelCalib calibration frame

Yaroslava Lochman 55 Dec 30, 2022
PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages

PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages Abstract NLP applications for code-mixed (CM) or mix-li

Mohsin Ali, Mohammed 1 Nov 12, 2021
Snscrape-jsonl-urls-extractor - Extracts urls from jsonl produced by snscrape

snscrape-jsonl-urls-extractor extracts urls from jsonl produced by snscrape Usag

1 Feb 26, 2022
DI-smartcross - Decision Intelligence Platform for Traffic Crossing Signal Control

DI-smartcross DI-smartcross - Decision Intelligence Platform for Traffic Crossin

OpenDILab 213 Jan 02, 2023
This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations,

labml.ai Deep Learning Paper Implementations This is a collection of simple PyTorch implementations of neural networks and related algorithms. These i

labml.ai 16.4k Jan 09, 2023