A 2D Visual Localization Framework based on Essential Matrices [ICRA2020]

Last update: Nov 07, 2022

Related tags

Overview

A 2D Visual Localization Framework based on Essential Matrices

This repository provides implementation of our paper accepted at ICRA: To Learn or Not to Learn: Visual Localization from Essential Matrices

To use our code, first download the repository:

git clone [email protected]:GrumpyZhou/visloc-relapose.git

Setup Running Environment

We have tested the code on Linux Ubuntu 16.04.6 under following environments:

Python 3.6 / 3.7
Pytorch 0.4.0 / 1.0 / 1.1 
CUDA 8.0 + CUDNN 8.0v5.1
CUDA 10.0 + CUDNN 10.0v7.5.1.10

The setting we used in the paper is:
Python 3.7 + Pytorch 1.1 + CUDA 10.0 + CUDNN 10.0v7.5.1.10

We recommend to use Anaconda to manage packages. Run following lines to automatically setup a ready environment for our code.

conda env create -f environment.yml  # Notice this one installs latest pytorch version.
conda activte relapose

Otherwise, one can try to download all required packages separately according to their offical documentation.

Prepare Datasets

Our code is flexible for evaluation on various localization datasets. We use Cambridge Landmarks dataset as an example to show how to prepare a dataset:

Create data/ folder
Download original Cambridge Landmarks Dataset and extract it to $CAMBRIDGE_DIR$.

Construct the following folder structure in order to conveniently run all scripts in this repo:

cd visloc-relapose/
mkdir data
mkdir data/datasets_original
cd data/original_datasets
ln -s $CAMBRIDGE_DIR$ CambridgeLandmarks

Download our pairs for training, validation and testing. About the format of our pairs, check readme.
Place the pairs to corresponding folder under data/datasets_original/CambridgeLandmarks.

Pre-save resized 480 images to speed up data loading time for regression models (Optional, but Recommended)

cd visloc-relapose/
python -m utils.datasets.resize_dataset \
	--base_dir data/datasets_original/CambridgeLandmarks \ 
	--save_dir=data/datasets_480/CambridgeLandmarks \
	--resize 480  --copy_txt True

Test your setup by visualizing the data using notebooks/data_loading.ipynb.

7Scenes Datasets

We follow the camera pose label convention of Cambridge Landmarks dataset. Similarly, you can download our pairs for 7Scenes. For other datasets, contact me for information about preprocessing and pair generation.

Feature-based: SIFT + 5-Point Solver

We use the SIFT feature extractor and feature matcher in colmap. One can follow the installation guide to install colmap. We save colmap outputs in database format, see explanation.

Preparing SIFT features

Execute following commands to run SIFT extraction and matching on CambridgeLandmarks:

cd visloc-relapose/
bash prepare_colmap_data.sh  CambridgeLandmarks

Here CambridgeLandmarks is the folder name that is consistent with the dataset folder. So you can also use other dataset names such as 7Scenes if you have prepared the dataset properly in advance.

Evaluate SIFT within our pipeline

Example to run sift+5pt on Cambridge Landmarks:

python -m pipeline.sift_5pt \
        --data_root 'data/datasets_original/' \
        --dataset 'CambridgeLandmarks' \
        --pair_txt 'test_pairs.5nn.300cm50m.vlad.minmax.txt' \
        --cv_ransac_thres 0.5\
        --loc_ransac_thres 5\
        -odir 'output/sift_5pt'\
        -log 'results.dvlad.minmax.txt'

More evaluation examples see: sift_5pt.sh. Check example outputs Visualize SIFT correspondences using notebooks/visualize_sift_matches.ipynb.

Learning-based: Direct Regression via EssNet

The pipeline.relapose_regressor module can be used for both training or testing our regression networks defined under networks/, e.g., EssNet, NCEssNet, RelaPoseNet... We provide training and testing examples in regression.sh. The module allows flexible variations of the setting. For more details about the module options, run python -m pipeline.relapose_regressor -h.

Training

Here we show an example how to train an EssNet model on ShopFacade scene.

python -m pipeline.relapose_regressor \
        --gpu 0 -b 16 --train -val 20 --epoch 200 \
        --data_root 'data/datasets_480' -ds 'CambridgeLandmarks' \
        --incl_sces 'ShopFacade' \
        -rs 480 --crop 448 --normalize \
        --ess_proj --network 'EssNet' --with_ess\
        --pair 'train_pairs.30nn.medium.txt' -vpair 'val_pairs.5nn.medium.txt' \
        -lr 0.0001 -wd 0.000001 \
        --odir  'output/regression_models/example' \
        -vp 9333 -vh 'localhost' -venv 'main' -vwin 'example.shopfacade'

This command produces outputs are available online here.

Visdom (optional)

As you see in the example above, we use Visdom server to visualize the training process. One can adapt the meters to plot inside utils/common/visdom.py. If you DON'T want to use visdom, just remove the last line -vp 9333 -vh 'localhost' -venv 'main' -vwin 'example.shopfacade'.

Trained models and weights

We release all trained models that are used in our paper. One can download them from pretrained regression models. We also provide some pretrained weights on MegaDepth/ScanNet.

Testing

Here is a piece of code to test the example model above.

python -m pipeline.relapose_regressor \
        --gpu 2 -b 16  --test \
        --data_root 'data/datasets_480' -ds 'CambridgeLandmarks' \
        --incl_sces 'ShopFacade' \
        -rs 480 --crop 448 --normalize\
        --ess_proj --network 'EssNet'\
        --pair 'test_pairs.5nn.300cm50m.vlad.minmax.txt'\
        --resume 'output/regression_models/example/ckpt/checkpoint_140_0.36m_1.97deg.pth' \
        --odir 'output/regression_models/example'

This testing code outputs are shown in test_results.txt. For convenience, we also provide notebooks/eval_regression_models.ipynb to perform evaluation.

Hybrid: Learnable Matching + 5-Point Solver

In this method, the code of the NCNet is taken from the original implementation https://github.com/ignacio-rocco/ncnet. We use their pre-trained model but we only use the weights for neighbourhood consensus(NC-Matching), i.e., the 4d-conv layer weights. For convenience, you can download our parsed version nc_ivd_5ep.pth. The models for feature extractor initialization needs to be downloaded from pretrained regression models in advance, if you want to test them.

Testing example for NC-EssNet(7S)+NCM+5Pt (Paper.Tab2)

In this example, we use NCEssNet trained on 7Scenes for 60 epochs to extract features and use the pre-trained NC Matching layer to get the point matches. Finally the 5 point solver calculates the essential matrix. The model is evaluated on CambridgeLandmarks.

# 
python -m pipeline.ncmatch_5pt \
    --data_root 'data/datasets_original' \
    --dataset 'CambridgeLandmarks' \
    --pair_txt 'test_pairs.5nn.300cm50m.vlad.minmax.txt' \
    --cv_ransac_thres 4.0\
    --loc_ransac_thres 15\
    --feat 'output/regression_models/448_normalize/nc-essnet/7scenes/checkpoint_60_0.04m_1.62deg.pth'\
    --ncn 'output/pretrained_weights/nc_ivd_5ep.pth' \    
    --posfix 'essncn_7sc_60ep+ncn'\
    --match_save_root 'output/ncmatch_5pt/saved_matches'\
    --ncn_thres 0.9 \
    --gpu 2\
    -o 'output/ncmatch_5pt/loc_results/Cambridge/essncn_7sc_60ep+ncn.txt'

Example outputs is available in essncn_7sc_60ep+ncn.txt. If you don't want to save THE intermediate matches extracted, remove THE option --match_save_root.

A 2D Visual Localization Framework based on Essential Matrices [ICRA2020]

Related tags

Overview

A 2D Visual Localization Framework based on Essential Matrices

Setup Running Environment

Prepare Datasets

7Scenes Datasets

Feature-based: SIFT + 5-Point Solver

Preparing SIFT features

Evaluate SIFT within our pipeline

Learning-based: Direct Regression via EssNet

Training

Visdom (optional)

Trained models and weights

Testing

Hybrid: Learnable Matching + 5-Point Solver

Testing example for NC-EssNet(7S)+NCM+5Pt (Paper.Tab2)

Owner

Qunjie Zhou

Training Structured Neural Networks Through Manifold Identification and Variance Reduction

Speedy Implementation of Instance-based Learning (IBL) agents in Python

Tools for investing in Python

RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

Monocular 3D Object Detection: An Extrinsic Parameter Free Approach (CVPR2021)

Cognition-aware Cognate Detection

Remote sensing change detection using PaddlePaddle

DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing

Differential fuzzing for the masses!

GPU Programming with Julia - course at the Swiss National Supercomputing Centre (CSCS), ETH Zurich

Transformer Huffman coding - Complete Huffman coding through transformer

Time-series-deep-learning - Developing Deep learning LSTM, BiLSTM models, and NeuralProphet for multi-step time-series forecasting of stock price.

This repository provides a basic implementation of our GCPR 2021 paper "Learning Conditional Invariance through Cycle Consistency"

Code for Contrastive-Geometry Networks for Generalized 3D Pose Transfer

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

Supplementary code for the AISTATS 2021 paper "Matern Gaussian Processes on Graphs".

Simple cross-platform application for DaVinci surgical video frame annotation

YOLTv4 builds upon YOLT and SIMRDWN, and updates these frameworks to use the most performant version of YOLO, YOLOv4

SpeechNAS Better Trade off between Latency and Accuracy for Large Scale Speaker Verification

The code for our NeurIPS 2021 paper "Kernelized Heterogeneous Risk Minimization".