Apply our monocular depth boosting to your own network!

Overview

MergeNet - Boost Your Own Depth

Boost custom or edited monocular depth maps using MergeNet

Input Original result After manual editing of base
patchselection patchselection patchselection

You can find our Google Colaboratory notebook here. Open In Colab

In this repository, we present a stand-alone implementation of our merging operator we use in our recent work:

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

S. Mahdi H. Miangoleh*, Sebastian Dille*, Long Mai, Sylvain Paris, Yağız Aksoy. Video, Main pdf, Supplementary pdf, Project Page. Github repo.

If you are an artist:

Although we are presenting few simple examples here, both low-resolution and high-resolution depth maps can be freely edited using any program before merging with our method.

Feel free to experiment and share your results with us!

If you are a researcher developing a new (CNN-based) Monocular Depth Estimation method:

This repository is a full implementation of our double-estimation framework. Double estimation uses a base-resolution result and a high-resolution result. The optimum high-resolution for a given image, R20 resolution, depends on the receptive field size of your network (the training resolution is a good approximation) and the image content. The code for R20 computation is also provided here.

To demonstrate the high-resolution performance of your network, you can simply generate the base and high-res estimates on any dataset and use this repository to apply our double estimation method to your own work.

Our Github repo for the main project also includes the implementation of our detail-focused monocular depth performance metric D^3R.

Mix'n'match depths from different networks or use your own custom-edited ones.

In the image below, we show that choosing a different base estimate can improve the depth for the city:

Input Base and details from [MiDaS][1] Base from [LeRes][2] and details from [MiDaS][1]
patchselection patchselection patchselection

To get the optimal result for a given scene, you may want to try multiple methods in both low- and high-resolutions and pick your favourite for each case.

Input Base from [MiDaS v3 / DPT][3] Base from [MiDaS v3 / DPT][3] and details from [MiDaS v2][1]
patchselection patchselection patchselection

Moreover, you can simply edit the base image before merging using any image editing tool for more creative control:

Input Base and details from [MiDaS][1] With edited base from [MiDaS][1]
patchselection patchselection patchselection

How does it work?

merge

This repository lets you combine two input depth maps with certain characteristics.

Low-res base depth

The network uses the base estimate as the main structure of the scene. Typically this is the default-resolution result of a monocular depth estimation network at around 300x300 resolution.

This base estimate is a good candidate for editing due to its low-resolution nature.

Monocular depth estimation methods with geometric consistency optimizations can be used as the base estimation to merge details onto a consistent base.

High-res depth with details

The merging operation transfers the details from this high-resolution depth map onto the structure provided by the low-resolution base pair.

The high-resolution input does not need structural consistency and is typically generated by feeding the input image at a much higher resolution than the training resolution of a given monocular depth estimation network.

You can compute the optimal high-resolution estimation size for a given image using our R20 resolution calculator, also provided in this repository. You can also simply use 2x or 3x resolution to simply add more details.

For more information on this project:

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

S. Mahdi H. Miangoleh*, Sebastian Dille*, Long Mai, Sylvain Paris, Yağız Aksoy. Main pdf, Supplementary pdf, Project Page. Github repo.

video

Citation

This implementation is provided for academic use only. Please cite our paper if you use this code or any of the models.

@INPROCEEDINGS{Miangoleh2021Boosting,
author={S. Mahdi H. Miangoleh and Sebastian Dille and Long Mai and Sylvain Paris and Ya\u{g}{\i}z Aksoy},
title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
journal={Proc. CVPR},
year={2021},
}

Credits

The "Merge model" code skeleton (./pix2pix folder) was adapted from the [pytorch-CycleGAN-and-pix2pix][4] repository.
[1]: https://github.com/intel-isl/MiDaS/tree/v2
[2]: https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS
[3]: https://github.com/isl-org/DPT
[4]: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix \

Owner
Computational Photography Lab @ SFU
Computational Photography Lab at Simon Fraser University, lead by @yaksoy
Computational Photography Lab @ SFU
Public implementation of "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression" from CoRL'21

Self-Supervised Reward Regression (SSRR) Codebase for CoRL 2021 paper "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression "

19 Dec 12, 2022
Code for weakly supervised segmentation of a single class

SingleClassRL Implementation of weak single object segmentation from paper "Regularized Loss for Weakly Supervised Single Class Semantic Segmentation"

16 Nov 14, 2022
Py4fi2nd - Jupyter Notebooks and code for Python for Finance (2nd ed., O'Reilly) by Yves Hilpisch.

Python for Finance (2nd ed., O'Reilly) This repository provides all Python codes and Jupyter Notebooks of the book Python for Finance -- Mastering Dat

Yves Hilpisch 1k Jan 05, 2023
A Fast Knowledge Distillation Framework for Visual Recognition

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022
Codes of paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling"

Unseen Object Amodal Instance Segmentation (UOAIS) Seunghyeok Back, Joosoon Lee, Taewon Kim, Sangjun Noh, Raeyoung Kang, Seongho Bak, Kyoobin Lee This

GIST-AILAB 92 Dec 13, 2022
Voice control for Garry's Mod

WIP: Talonvoice GMod integrations Very work in progress voice control demo for Garry's Mod. HOWTO Install https://talonvoice.com/ Press https://i.imgu

Meta Construct 5 Nov 15, 2022
Linear image-to-image translation

Linear (Un)supervised Image-to-Image Translation Examples for linear orthogonal transformations in PCA domain, learned without pairing supervision. Tr

Eitan Richardson 40 Aug 31, 2022
An implementation of the "Attention is all you need" paper without extra bells and whistles, or difficult syntax

Simple Transformer An implementation of the "Attention is all you need" paper without extra bells and whistles, or difficult syntax. Note: The only ex

29 Jun 16, 2022
A Haskell kernel for IPython.

IHaskell You can now try IHaskell directly in your browser at CoCalc or mybinder.org. Alternatively, watch a talk and demo showing off IHaskell featur

Andrew Gibiansky 2.4k Dec 29, 2022
Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

SMU A Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE arXiv https://arxiv.org/abs/211

Fuhang 5 Jan 18, 2022
OMNIVORE is a single vision model for many different visual modalities

Omnivore: A Single Model for Many Visual Modalities [paper][website] OMNIVORE is a single vision model for many different visual modalities. It learns

Meta Research 451 Dec 27, 2022
Code and experiments for "Deep Neural Networks for Rank Consistent Ordinal Regression based on Conditional Probabilities"

corn-ordinal-neuralnet This repository contains the orginal model code and experiment logs for the paper "Deep Neural Networks for Rank Consistent Ord

Raschka Research Group 14 Dec 27, 2022
Repo for our ICML21 paper Unsupervised Learning of Visual 3D Keypoints for Control

Unsupervised Learning of Visual 3D Keypoints for Control [Project Website] [Paper] Boyuan Chen1, Pieter Abbeel1, Deepak Pathak2 1UC Berkeley 2Carnegie

Boyuan Chen 34 Jul 22, 2022
Code for the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem"

Trajectory Transformer Code release for Offline Reinforcement Learning as One Big Sequence Modeling Problem. Installation All python dependencies are

Michael Janner 266 Dec 27, 2022
Adversarial Color Enhancement: Generating Unrestricted Adversarial Images by Optimizing a Color Filter

ACE Please find the preliminary version published at BMVC 2020 in the folder BMVC_version, and its extended journal version in Journal_version. Datase

28 Dec 25, 2022
This project implements "virtual speed" from heart rate monito

ANT+ Virtual Stride Based Speed and Distance Monitor Overview This project imple

2 May 20, 2022
Adversarial Texture Optimization from RGB-D Scans (CVPR 2020).

AdversarialTexture Adversarial Texture Optimization from RGB-D Scans (CVPR 2020). Scanning Data Download Please refer to data directory for details. B

Jingwei Huang 153 Nov 28, 2022
SingleVC performs any-to-one VC, which is an important component of MediumVC project.

SingleVC performs any-to-one VC, which is an important component of MediumVC project. Here is the official implementation of the paper, MediumVC.

谷下雨 26 Dec 28, 2022
This is a model made out of Neural Network specifically a Convolutional Neural Network model

This is a model made out of Neural Network specifically a Convolutional Neural Network model. This was done with a pre-built dataset from the tensorflow and keras packages. There are other alternativ

9 Oct 18, 2022
🔥 TensorFlow Code for technical report: "YOLOv3: An Incremental Improvement"

🆕 Are you looking for a new YOLOv3 implemented by TF2.0 ? If you hate the fucking tensorflow1.x very much, no worries! I have implemented a new YOLOv

3.6k Dec 26, 2022