ScaleNet: A Shallow Architecture for Scale Estimation

Last update: Nov 09, 2022

Repository for the code of the ScaleNet paper:

"ScaleNet: A Shallow Architecture for Scale Estimation".
Axel Barroso-Laguna, Yurun Tian, and Krystian Mikolajczyk. arXiv 2021.

[Paper on arxiv]

Prerequisite

Python 3.7 is required to run and train ScaleNet. Use Conda to install the dependencies:

conda create --name scalenet_env python=3.7
conda activate scalenet_env
conda install pytorch==1.2.0 -c pytorch
conda install -c conda-forge tensorboardx opencv tqdm
conda install -c anaconda pandas
conda install -c pytorch torchvision

Scale estimation

run_scalenet.py estimates the scale factor between two input images. As an example, we provide two images, im1.jpg and im2.jpg, in the assets/im_test folder. For a quick test, run:

python run_scalenet.py --im1_path assets/im_test/im1.jpg --im2_path assets/im_test/im2.jpg

Arguments:

im1_path: Path to image A.
im2_path: Path to image B.

The script returns the estimated scale factor from image A to image B (A->B).
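
One possible way to use the estimated factor is to bring both images to a similar scale before matching them. The sketch below only computes the new image sizes. It assumes, without confirmation from this README, that a factor greater than 1 means the scene appears larger in image B than in image A. The function name rescale_to_match is hypothetical and is not part of the repository.

```python
def rescale_to_match(size_a, size_b, scale_a_to_b):
    """Return new (width, height) sizes for images A and B.

    Assumption (not confirmed by the README): scale_a_to_b > 1 means the
    scene appears larger in B than in A. Only the image that shows the scene
    larger is shrunk, so neither image is upsampled.
    """
    if scale_a_to_b <= 0:
        raise ValueError("scale factor must be positive")
    wa, ha = size_a
    wb, hb = size_b
    if scale_a_to_b >= 1.0:
        # B shows the scene larger: shrink B by 1 / s, leave A untouched.
        f = 1.0 / scale_a_to_b
        return (wa, ha), (max(1, round(wb * f)), max(1, round(hb * f)))
    # A shows the scene larger: shrink A by s, leave B untouched.
    f = scale_a_to_b
    return (max(1, round(wa * f)), max(1, round(ha * f))), (wb, hb)
```

If the repository defines the factor in the opposite direction, swap the two branches.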

Training ScaleNet

We provide a list of MegaDepth image pairs and scale factors in the assets folder. We use the undistorted images and the corresponding camera intrinsics and extrinsics preprocessed by D2-Net. You can download them directly from the D2-Net main repository. To train with the default configuration, run:

python train_ScaleNet.py --image_data_path /path/to/megadepth_d2net

There are, however, some important arguments to take into account when training ScaleNet.

Arguments:

image_data_path: Path to the undistorted MegaDepth images from D2-Net.
save_processed_im: ScaleNet center-crops the images and resizes them to a default resolution. The processed images can optionally be stored and loaded during training, which makes training much faster. The stored files can be large, so we suggest keeping them on a disk with plenty of free space. Default: True.
root_precomputed_files: Path where the processed image pairs are saved.
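
To illustrate the center-crop step mentioned under save_processed_im, here is a minimal sketch that computes the largest centered crop box with a given aspect ratio, which would then be resized to the target resolution. The function name center_crop_box and the 256x256 target in the comment are assumptions for illustration. The actual default resolution and cropping code are in the repository and may differ.

```python
def center_crop_box(width, height, target_w, target_h):
    """Largest centered box (left, top, right, bottom) with the target aspect ratio."""
    if min(width, height, target_w, target_h) <= 0:
        raise ValueError("all sizes must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * target_h > height * target_w:
        # Image is wider than the target: keep the full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Image is taller than (or matches) the target: keep the full width.
        crop_w = width
        crop_h = width * target_h // target_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h

# Example with a hypothetical 256x256 target:
# center_crop_box(1600, 1200, 256, 256) gives the box to crop before resizing.
```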

To modify ScaleNet's training or architecture, see the full list of arguments in the train_ScaleNet.py script.

Test ScaleNet - camera pose

In addition to the training code, we provide a template for testing ScaleNet on the camera pose task. In assets/data/test.csv, you can find the test MegaDepth pairs, along with their scale changes and camera poses.
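
The column layout of test.csv is not described here, so a quick way to inspect it before writing your own evaluation is to print its header and first rows. The helper below is a small sketch that uses only the Python standard library and makes no assumption about column names. The name peek_csv is hypothetical.

```python
import csv
import itertools

def peek_csv(path, n_rows=3):
    """Return (column_names, first n_rows rows as dicts) without assuming a schema."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(itertools.islice(reader, n_rows))
        # fieldnames is read from the first line of the file.
        columns = list(reader.fieldnames or [])
    return columns, rows

# Usage, from the repository root:
# columns, rows = peek_csv("assets/data/test.csv")
# print(columns)
```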

Run the following command to test ScaleNet + SIFT in our custom camera pose split:

python test_camera_pose.py --image_data_path /path/to/megadepth_d2net

The test_camera_pose.py script is intended to show the structure of our camera pose experiment. You can replace either the local feature extractor or the scale estimator and obtain your own camera pose results.

BibTeX

If you use this code or the provided training/testing pairs in your research, please cite our paper:

@InProceedings{Barroso-Laguna2021_scale,
    author = {Barroso-Laguna, Axel and Tian, Yurun and Mikolajczyk, Krystian},
    title = {{ScaleNet: A Shallow Architecture for Scale Estimation}},
    booktitle = {Arxiv: },
    year = {2021},
}

Owner

Axel Barroso
