CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

Overview

LED2-Net

This is the PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering".

You can visit our project website and upload your own panorama to see the 3D results!

[Project Website] [Paper (arXiv)]

Prerequisite

This repo is primarily based on PyTorch. You can use the following command to install the dependencies:

pip install -r requirements.txt

Preparing Training Data

Under LED2Net/Dataset, we provide the dataloaders for Matterport3D and Realtor360. The annotation formats of the two datasets follow PanoAnnotator. A detailed description of the format is given in LayoutMP3D.

Under config/, config_mp3d.yaml and config_realtor360.yaml are the configuration files for Matterport3D and Realtor360, respectively.

Matterport3D

To train/val on Matterport3D, please modify the following two items in config_mp3d.yaml:

dataset_image_path: &dataset_image_path '/path/to/image/location'
dataset_label_path: &dataset_label_path '/path/to/label/location'

The dataset_image_path and dataset_label_path directories must follow this folder structure:

  dataset_image_path/
  |-------17DRP5sb8fy/
          |-------00ebbf3782c64d74aaf7dd39cd561175/
                  |-------color.jpg
          |-------352a92fb1f6d4b71b3aafcc74e196234/
                  |-------color.jpg
          .
          .
  |-------gTV8FGcVJC9/
          .
          .
  dataset_label_path/
  |-------mp3d_train.txt
  |-------mp3d_val.txt
  |-------mp3d_test.txt
  |-------label/
          |-------Z6MFQCViBuw_543e6efcc1e24215b18c4060255a9719_label.json
          |-------yqstnuAEVhm_f2eeae1a36f14f6cb7b934efd9becb4d_label.json
          .
          .
          .
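Before training, it can help to confirm that every label file has a matching panorama. The sketch below is not part of this repo: it assumes, based only on the file names shown above, that a label named `<scene>_<pano>_label.json` belongs to `dataset_image_path/<scene>/<pano>/color.jpg`. The helper names `label_to_image_path` and `find_missing_images` are hypothetical.

```python
from pathlib import Path

LABEL_SUFFIX = "_label.json"


def label_to_image_path(label_file, dataset_image_path):
    """Map '<scene>_<pano>_label.json' to '<image root>/<scene>/<pano>/color.jpg'.

    Assumption: this naming convention is inferred from the folder
    listing in this README, not from the dataloader code.
    """
    name = Path(label_file).name
    if not name.endswith(LABEL_SUFFIX):
        raise ValueError(f"unexpected label file name: {name}")
    # Split on the last underscore so the panorama ID is always the final part.
    scene, pano = name[: -len(LABEL_SUFFIX)].rsplit("_", 1)
    return Path(dataset_image_path) / scene / pano / "color.jpg"


def find_missing_images(dataset_image_path, dataset_label_path):
    """Return the expected image paths that do not exist on disk."""
    label_dir = Path(dataset_label_path) / "label"
    missing = []
    for label_file in sorted(label_dir.glob("*" + LABEL_SUFFIX)):
        image = label_to_image_path(label_file, dataset_image_path)
        if not image.is_file():
            missing.append(image)
    return missing
```

Calling `find_missing_images('/path/to/image/location', '/path/to/label/location')` returns an empty list when every label has its panorama in place.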

Then run main.py and specify the config file path:

python main.py --config config/config_mp3d.yaml --mode train # For training
python main.py --config config/config_mp3d.yaml --mode val # For testing

Realtor360

To train/val on Realtor360, please modify the following item in config_realtor360.yaml:

dataset_path: &dataset_path '/path/to/dataset/location'

The dataset_path directory must follow this folder structure:

  dataset_path/
  |-------train.txt
  |-------val.txt
  |-------sun360/
          |-------pano_ajxqvkaaokwnzs/
                  |-------color.png
                  |-------label.json
          .
          .
  |-------istg/
          |-------1/
                  |-------1/
                          |-------color.png
                          |-------label.json
                  |-------2/
                          |-------color.png
                          |-------label.json
                  .
                  .
          .
          .
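A quick way to check a Realtor360 download against the structure above is to look for sample folders that lack either `color.png` or `label.json`. The following is a minimal sketch, not code from this repo; `scan_realtor360` is a hypothetical helper, and it assumes every sample folder should contain exactly those two files, as the listing suggests.

```python
from pathlib import Path

REQUIRED_FILES = ("color.png", "label.json")


def scan_realtor360(dataset_path):
    """Split sample folders into complete and incomplete ones.

    A folder counts as a sample if it holds at least one of the
    required files, and as complete if it holds both.
    Returned paths are relative to dataset_path.
    """
    root = Path(dataset_path)
    candidates = {
        path.parent for name in REQUIRED_FILES for path in root.rglob(name)
    }
    complete, incomplete = [], []
    for folder in sorted(candidates):
        has_all = all((folder / name).is_file() for name in REQUIRED_FILES)
        target = complete if has_all else incomplete
        target.append(folder.relative_to(root))
    return complete, incomplete
```

For example, `complete, incomplete = scan_realtor360('/path/to/dataset/location')` lists the folders under sun360/ and istg/ that are ready to use and those that need attention.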
          
  

Then run main.py and specify the config file path:

python main.py --config config/config_realtor360.yaml --mode train # For training
python main.py --config config/config_realtor360.yaml --mode val # For testing

Run Inference

After finishing the training, you can use the following command to run inference on your own data (xxx.jpg or xxx.png).

python run_inference.py --config YOUR_CONFIG --src SRC_FOLDER/ --dst DST_FOLDER --ckpt XXXXX.pkl

This script will predict the layouts of all images (jpg or png) under SRC_FOLDER/ and store the results as json files under DST_FOLDER/.

Pretrained Weights

We provide the pretrained model of Realtor360 in this link.

Currently, we use DuLa-Net's post-processing for inference. We will release a version using HorizonNet's post-processing later.

Layout Visualization

To visualize the 3D layout, we provide a visualization tool in 360LayoutVisualizer. Please clone it and install the corresponding packages. Then, run the following command:

cd 360LayoutVisualizer/
python visualizer.py --img xxxxxx.jpg --json xxxxxx.json
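To visualize a whole folder of inference results, the command above can be generated once per image. The sketch below only builds the command lines and makes one assumption that this README does not confirm: that the JSON written for `name.jpg` or `name.png` is called `name.json`. `build_visualizer_commands` is a hypothetical helper; only the `--img` and `--json` flags come from the command shown above.

```python
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def build_visualizer_commands(image_dir, json_dir):
    """Build one visualizer.py command per image that has a result JSON.

    Assumption: results are named '<image stem>.json'. Adjust the
    pairing if the inference script uses a different naming scheme.
    """
    commands = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        result = Path(json_dir) / (image.stem + ".json")
        if result.is_file():
            # Absolute paths keep the command valid from any working directory.
            commands.append([
                sys.executable, "visualizer.py",
                "--img", str(image.resolve()),
                "--json", str(result.resolve()),
            ])
    return commands
```

Each returned command can then be run from the visualizer checkout, for instance with `subprocess.run(cmd, cwd="360LayoutVisualizer", check=True)`.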

Citation

@misc{wang2021led2net,
      title={LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering}, 
      author={Fu-En Wang and Yu-Hsuan Yeh and Min Sun and Wei-Chen Chiu and Yi-Hsuan Tsai},
      year={2021},
      eprint={2104.00568},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}