Dense matching library based on PyTorch

Overview

Dense Matching

A general dense matching library based on PyTorch.

For any questions, issues or recommendations, please contact Prune at [email protected]


Highlights

Libraries for implementing, training and evaluating dense matching networks. It includes

  • Common dense matching validation datasets for geometric matching (MegaDepth, RobotCar, ETH3D, HPatches), optical flow (KITTI, Sintel) and semantic matching (TSS, PF-Pascal, PF-Willow, Spair).

  • Scripts to analyse network performance and obtain standard performance scores for matching and pose estimation.

  • General building blocks, including deep networks, optimization, feature extraction and utilities.

  • General training framework for training dense matching networks with

    • Common training datasets for matching networks.
    • Functions to generate random image pairs and their corresponding ground-truth flow, as well as to add moving objects and modify the flow accordingly.
    • Functions for data sampling, processing etc.
    • And much more...
  • Official implementation of GLU-Net (CVPR 2020), GLU-Net-GOCor (NeurIPS 2020), PWC-Net-GOCor (NeurIPS 2020), PDC-Net (CVPR 2021), including trained models and respective results.


Dense Matching Networks

The repo contains the implementation of the following matching models. We provide pre-trained model weights, data preparation, evaluation commands, and results for each dataset and method.

PDC-Net: Learning Accurate Correspondences and When to Trust Them. (CVPR 2021 - ORAL)

Authors: Prune Truong, Martin Danelljan, Luc Van Gool, Radu Timofte

[Paper] [Website] [Poster] [Slides] [Video]

Dense flow estimation is often inaccurate in the case of large displacements or homogeneous regions. For most applications and down-stream tasks, such as pose estimation, image manipulation, or 3D reconstruction, it is crucial to know when and where to trust the estimated matches. In this work, we aim to estimate a dense flow field relating two images, coupled with a robust pixel-wise confidence map indicating the reliability and accuracy of the prediction. We develop a flexible probabilistic approach that jointly learns the flow prediction and its uncertainty. In particular, we parametrize the predictive distribution as a constrained mixture model, ensuring better modelling of both accurate flow predictions and outliers. Moreover, we develop an architecture and training strategy tailored for robust and generalizable uncertainty prediction in the context of self-supervised training.

GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network. (NeurIPS 2020)

Authors: Prune Truong *, Martin Danelljan *, Luc Van Gool, Radu Timofte

[Paper] [Website] [Video]

The feature correlation layer serves as a key neural network module in numerous computer vision problems that involve dense correspondences between image pairs. It predicts a correspondence volume by evaluating dense scalar products between feature vectors extracted from pairs of locations in two images. However, this point-to-point feature comparison is insufficient when disambiguating multiple similar regions in an image, severely affecting the performance of the end task. This work proposes GOCor, a fully differentiable dense matching module, acting as a direct replacement to the feature correlation layer. The correspondence volume generated by our module is the result of an internal optimization procedure that explicitly accounts for similar regions in the scene. Moreover, our approach is capable of effectively learning spatial matching priors to resolve further matching ambiguities.

GLU-Net: Global-Local Universal Network for dense flow and correspondences (CVPR 2020 - ORAL).

Authors: Prune Truong, Martin Danelljan, Radu Timofte

[Paper] [Website] [Poster] [Oral Video] [Teaser Video]



Pre-trained weights

Model Pre-trained model type Description Link
PDCNet megadepth model
GLUNet_GOCor_star megadepth corresponds to GLU-Net-GOCor* in PDCNet model
GLUNet_GOCor dynamic model
GLUNet_GOCor static model
PWCNet_GOCor chairs_things_ft_sintel model
PWCNet_GOCor chairs_things model
GLUNet dynamic model
GLUNet static (CityScape-DPED-ADE) model

To download all of them, run the command bash assets/download_pre_trained_models.sh.

All networks are created in 'model_selection.py'. Weights should be put in pre_trained_models/


Table of Contents

  1. Installation
  2. Test on your own image pairs!
  3. Overview
  4. Benchmarks and results
    1. Correspondence evaluation
      1. MegaDepth
      2. RobotCar
      3. ETH3D
      4. HPatches
      5. KITTI
      6. Sintel
      7. TSS
      8. PF-Pascal
      9. PF-Willow
      10. Spair-71k
    2. Pose estimation
      1. YFCC100M
      2. ScanNet
  5. Training
  6. Acknowledgement
  7. Changelog

1. Installation

Inference runs for torch version >= 1.0

  • Create and activate conda environment with Python 3.x
conda create -n dense_matching_env python=3.7
conda activate dense_matching_env
  • Install all dependencies (except for cupy, see below) by running the following command:
pip install numpy opencv-python torch torchvision matplotlib imageio jpeg4py scipy pandas tqdm gdown pycocotools

Note: CUDA is required to run the code, because the correlation layer is implemented in CUDA using CuPy. CuPy is therefore a required dependency. It can be installed with pip install cupy, or from one of the binary packages listed in the CuPy repository. The code was developed with Python 3.7, PyTorch 1.0 and CUDA 9.0, hence the cupy-cuda90 package below. For another CUDA version, change the package name accordingly.

pip install cupy-cuda90==7.8.0 --no-cache-dir 

Recent versions of CuPy cause issues, so install CuPy version 7.8.0 regardless of the CUDA version. For example, with CUDA 10.0:

pip install cupy-cuda100==7.8.0 --no-cache-dir 
  • This repo includes GOCor as git submodule. You need to pull submodules with
git submodule update --init --recursive
git submodule update --recursive --remote
  • Create admin/local.py by running the following command and update the paths to the dataset. We provide an example admin/local_example.py where all datasets are stored in data/.
python -c "from admin.environment import create_default_local_file; create_default_local_file()"
  • Download pre-trained model weights with the command bash assets/download_pre_trained_models.sh.
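The two CuPy commands above follow the pattern cupy-cuda followed by the CUDA version without the dot. As a minimal sketch, the pinned requirement string could be derived as below. The helper name cupy_requirement is hypothetical and not part of this repo. Only the CUDA 9.0 and 10.0 cases are confirmed by this README, so check that a 7.8.0 wheel exists for any other CUDA version before relying on it.

```python
def cupy_requirement(cuda_version: str, cupy_version: str = "7.8.0") -> str:
    """Build a pinned CuPy requirement string such as 'cupy-cuda90==7.8.0'.

    Hypothetical helper for illustration only. It assumes the wheel naming
    pattern 'cupy-cuda<major><minor>' seen in the commands above.
    """
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '10.0', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"cupy-cuda{major}{minor}=={cupy_version}"


if __name__ == "__main__":
    for version in ("9.0", "10.0"):
        print(f"pip install {cupy_requirement(version)} --no-cache-dir")
```

Running the script prints the two pip commands shown in this section.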

2. Test on your own image pairs!

Possible model choices are: PDCNet, GLUNet_GOCor_star, GLUNet, GLUNet_GOCor, PWCNet, PWCNet_GOCor

Possible pre-trained model choices are: static, dynamic, chairs_things, chairs_things_ft_sintel, megadepth


Note on PDCNet inference options

PDC-Net has multiple inference options. If the model is PDC-Net, add the following options:

  • --confidence_map_R, for computation of the confidence map p_r, default is 1.0
  • --multi_stage_type in
    • 'direct' (D)
    • 'homography_from_quarter_resolution_uncertainty' (H)
    • 'multiscale_homo_from_quarter_resolution_uncertainty' (MS)
  • --ransac_thresh, used for homography and multiscale multi-stages type, default is 1.0
  • --mask_type, for thresholding the estimated confidence map and using the confident matches for internal homography estimation, for homography and multiscale multi-stage types, default is proba_interval_1_above_5
  • --homography_visibility_mask, default is True
  • --scaling_factors, used for multi-scale, default is [0.5, 0.6, 0.88, 1, 1.33, 1.66, 2]

Use direct (D) when image pairs only show limited view-point changes (for example consecutive images of a video, like in the optical flow task). For larger view-point changes, use homography (H) or multi-scale (MS).

For example, to run PDC-Net with multi-scale, add at the end of the command

PDCNet --multi_stage_type multiscale_homo_from_quarter_resolution_uncertainty --mask_type proba_interval_1_above_10
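As a rough sketch, the full command for test_models.py can be assembled from the options listed above. The helper build_test_command is hypothetical. It only concatenates flags that appear in this README and does not check them against the script. The pre-trained model is fixed to megadepth here as an assumption, since that is the only PDCNet weight listed.

```python
import shlex

# Short names used in this README mapped to the documented --multi_stage_type values.
MULTI_STAGE_TYPES = {
    "D": "direct",
    "H": "homography_from_quarter_resolution_uncertainty",
    "MS": "multiscale_homo_from_quarter_resolution_uncertainty",
}


def build_test_command(query, reference, mode="MS",
                       mask_type="proba_interval_1_above_10",
                       write_dir="evaluation/"):
    """Return the argument list for running test_models.py with PDC-Net.

    Hypothetical helper: it only joins flags documented in this README.
    """
    if mode not in MULTI_STAGE_TYPES:
        raise ValueError(f"mode must be one of {sorted(MULTI_STAGE_TYPES)}")
    return [
        "python", "test_models.py",
        "--model", "PDCNet",
        "--pre_trained_model", "megadepth",
        "--path_query_image", query,
        "--path_reference_image", reference,
        "--write_dir", write_dir,
        # The PDC-Net specific options go at the end, after 'PDCNet'.
        "PDCNet",
        "--multi_stage_type", MULTI_STAGE_TYPES[mode],
        "--mask_type", mask_type,
    ]


if __name__ == "__main__":
    cmd = build_test_command("images/piazza_san_marco_0.jpg",
                             "images/piazza_san_marco_1.jpg")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run to execute it
```

Keeping the command as a list rather than a string avoids quoting problems when image paths contain spaces.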

Test on a specific image pair

You can test the networks on a pair of images using test_models.py and the provided trained model weights. You must first choose the model and pre-trained weights to use. The inputs are the paths to the query and reference images. The images are then passed to the network which outputs the corresponding flow field relating the reference to the query image. The query is then warped according to the estimated flow, and a figure is saved.
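The warping step can be illustrated with a small dependency-free sketch. It assumes that the flow is stored as a per-pixel displacement (u, v) on the reference grid, so that reference pixel (x, y) matches query pixel (x + u, y + v), and that both images have the same size. The function warp_query is hypothetical and uses nearest-neighbour sampling for brevity. It is not the repo's PyTorch implementation.

```python
def warp_query(query, flow, fill=0):
    """Warp `query` towards the reference frame using a dense flow field.

    query: 2D list (H x W) of pixel values.
    flow:  2D list (H x W) of (u, v) displacements defined on the reference
           grid: reference pixel (x, y) matches query pixel (x + u, y + v).
    Pixels whose match falls outside the query image receive `fill`.
    """
    height, width = len(query), len(query[0])
    warped = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = flow[y][x]
            # Nearest-neighbour lookup of the matching query location.
            src_x, src_y = round(x + u), round(y + v)
            if 0 <= src_x < width and 0 <= src_y < height:
                warped[y][x] = query[src_y][src_x]
    return warped


if __name__ == "__main__":
    query = [[1, 2, 3],
             [4, 5, 6]]
    # Every reference pixel matches the query pixel one column to its right.
    flow = [[(1, 0)] * 3 for _ in range(2)]
    print(warp_query(query, flow))  # [[2, 3, 0], [5, 6, 0]]
```

The last column is filled with 0 because its matches fall outside the query image.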


For this pair of MegaDepth images (provided to check that the code is working properly) and using PDCNet (MS) trained on the megadepth dataset, the output is:

python test_models.py --model PDCNet --pre_trained_model megadepth --path_query_image images/piazza_san_marco_0.jpg --path_reference_image images/piazza_san_marco_1.jpg --write_dir evaluation/ PDCNet --multi_stage_type multiscale_homo_from_quarter_resolution_uncertainty --mask_type proba_interval_1_above_10

Additional optional arguments: --pre_trained_models_dir (default is pre_trained_models/)


Using GLU-Net-GOCor trained on the dynamic dataset, the output for this ETH3D image pair is:

python test_models.py --model GLUNet_GOCor --pre_trained_model dynamic --path_query_image images/eth3d_query.png --path_reference_image images/eth3d_reference.png --write_dir evaluation/


For baseline GLU-Net, the output is instead:

python test_models.py --model GLUNet --pre_trained_model dynamic --path_query_image images/eth3d_query.png --path_reference_image images/eth3d_reference.png --write_dir evaluation/


And for PWC-Net-GOCor and baseline PWC-Net:

python test_models.py --model PWCNet_GOCor --pre_trained_model chairs_things --path_query_image images/kitti2015_query.png --path_reference_image images/kitti2015_reference.png --write_dir evaluation/


python test_models.py --model PWCNet --pre_trained_model chairs_things --path_query_image images/kitti2015_query.png --path_reference_image images/kitti2015_reference.png --write_dir evaluation/


Demo with videos

TO COME

3. Overview

The framework consists of the following sub-modules.

  • training:
    • actors: Contains the actor classes for the different trainings. The actor class is responsible for passing the input data through the network and calculating the losses. It also contains pre-processing classes, which convert the batch tensor inputs into the inputs needed for training the network.
    • trainers: The main class which runs the training.
    • losses: Contains the loss classes.
  • train_settings: Contains settings files, specifying the training of a network.
  • admin: Includes functions for loading networks, tensorboard etc. and also contains environment settings.
  • datasets: Contains integration of a number of datasets. Additionally, it includes modules to generate synthetic image pairs and their corresponding ground-truth flow as well as to add independently moving objects and modify the flow accordingly.
  • utils_data: Contains functions for processing data, e.g. loading images, data augmentations, sampling frames.
  • utils_flow: Contains functions for working with flow fields, e.g. converting to mapping, warping an array according to a flow, as well as visualization tools.
  • third_party: External libraries needed for training. Added as submodules.
  • models: Contains different layers and network definitions.
  • validation: Contains functions to evaluate and analyze the performance of the networks in terms of predicted flow and uncertainty.
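The conversion between a flow field and a mapping, mentioned for utils_flow above, can be sketched as follows. The assumption is that a flow stores relative displacements and a mapping stores absolute target coordinates. The function names are hypothetical and do not refer to the functions in utils_flow.

```python
def flow_to_mapping(flow):
    """Convert per-pixel displacements (u, v) into absolute target coordinates.

    flow: 2D list (H x W) of (u, v) tuples. Returns a 2D list of (x', y')
    with x' = x + u and y' = y + v.
    """
    return [[(x + u, y + v) for x, (u, v) in enumerate(row)]
            for y, row in enumerate(flow)]


def mapping_to_flow(mapping):
    """Inverse conversion: subtract the pixel grid from absolute coordinates."""
    return [[(mx - x, my - y) for x, (mx, my) in enumerate(row)]
            for y, row in enumerate(mapping)]


if __name__ == "__main__":
    flow = [[(0.5, -1.0), (2.0, 0.0)],
            [(0.0, 0.0), (-1.0, 1.0)]]
    mapping = flow_to_mapping(flow)
    print(mapping)
    print(mapping_to_flow(mapping) == flow)
```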

4. Benchmarks and results

All paths to the datasets must be provided in file admin/local.py. We provide an example admin/local_example.py where all datasets are stored in data/. You need to update the paths of admin/local.py before running the evaluation.
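Before launching an evaluation, it can help to check that the configured dataset paths exist. The sketch below works on a plain dictionary, which is only a stand-in: the exact layout of admin/local.py is not shown in this README. The function find_missing_paths and the example paths are hypothetical.

```python
import os


def find_missing_paths(dataset_paths):
    """Return the sorted names whose configured path does not exist on disk.

    dataset_paths: dict mapping a dataset name (e.g. 'megadepth') to a path.
    This dict is a stand-in for the settings in admin/local.py.
    """
    return sorted(name for name, path in dataset_paths.items()
                  if not path or not os.path.exists(path))


if __name__ == "__main__":
    # Example values only; replace them with the paths set in admin/local.py.
    paths = {
        "megadepth": "data/megadepth_test_set",
        "hp": "data/hpatches",
    }
    missing = find_missing_paths(paths)
    print("missing dataset paths:", missing if missing else "none")
```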

Note on PDCNet inference options

PDC-Net has multiple inference options. If the model is PDC-Net, add the following options:

  • --confidence_map_R, for computation of the confidence map p_r, default is 1.0
  • --multi_stage_type in
    • 'direct' (D)
    • 'homography_from_quarter_resolution_uncertainty' (H)
    • 'multiscale_homo_from_quarter_resolution_uncertainty' (MS)
  • --ransac_thresh, used for homography and multiscale multi-stages type, default is 1.0
  • --mask_type, for thresholding the estimated confidence map and using the confident matches for internal homography estimation, for homography and multiscale multi-stage types, default is proba_interval_1_above_5
  • --homography_visibility_mask, default is True
  • --scaling_factors, used for multi-scale, default is [0.5, 0.6, 0.88, 1, 1.33, 1.66, 2]

For example, to run PDC-Net with multi-scale, add at the end of the command

PDCNet --multi_stage_type multiscale_homo_from_quarter_resolution_uncertainty --mask_type proba_interval_1_above_10

Note on reproducibility

Results using PDC-Net with multi-stage (homography_from_quarter_resolution_uncertainty, H) or multi-scale (multiscale_homo_from_quarter_resolution_uncertainty, MS) inference rely on RANSAC internally. Results may therefore vary slightly between runs, but should remain within 1-2 %. For pose estimation, the pose is also computed with RANSAC, which adds some variability to the results.

4.1. Correspondence evaluation

Metrics are computed with,

python -u eval_matching.py --datasets dataset_name --model model_name --pre_trained_models pre_trained_model_name --optim_iter optim_step  --local_optim_iter local_optim_iter --save_dir path_to_save_dir --plot False 
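The result tables in this section report AEPE and PCK-k. The sketch below shows how these metrics are commonly defined: AEPE is the mean Euclidean distance between predicted and ground-truth flow over the valid pixels, and PCK-k is the percentage of valid pixels whose error is at most k pixels. Whether eval_matching.py uses an inclusive threshold, and how it masks invalid pixels, are assumptions here. The function names are hypothetical.

```python
import math


def endpoint_errors(pred_flow, gt_flow):
    """Per-pixel Euclidean distance between predicted and ground-truth flow.

    Both inputs are flat lists of (u, v) tuples covering the valid pixels only.
    """
    return [math.hypot(pu - gu, pv - gv)
            for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow)]


def aepe(errors):
    """Average end-point error in pixels."""
    return sum(errors) / len(errors)


def pck(errors, threshold):
    """Percentage of pixels whose error is at most `threshold` pixels."""
    return 100.0 * sum(e <= threshold for e in errors) / len(errors)


if __name__ == "__main__":
    pred = [(1.0, 0.0), (0.0, 0.0), (3.0, 4.0), (0.5, 0.0)]
    gt = [(0.0, 0.0)] * 4
    errors = endpoint_errors(pred, gt)   # [1.0, 0.0, 5.0, 0.5]
    print(aepe(errors))                  # 1.625
    print(pck(errors, 1), pck(errors, 3), pck(errors, 5))  # 75.0 75.0 100.0
```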
MegaDepth

Data preparation: We use the test set provided in RANSAC-Flow. It is composed of 1600 pairs and also includes a csv file ('test1600Pairs.csv') containing the name of image pairs to evaluate and the corresponding ground-truth correspondences. Download everything with

bash assets/download_megadepth_test.sh

The resulting file structure is the following

megadepth_test_set/
└── MegaDepth/
    └── Test/
        ├── test1600Pairs/
        └── test1600Pairs.csv



Evaluation: After updating the path of 'megadepth' and 'megadepth_csv' in admin/local.py, evaluation is run with

python eval_matching.py --datasets megadepth --model PDCNet --pre_trained_models megadepth --optim_iter 3 --local_optim_iter 7 --save_dir path_to_save_dir PDCNet --multi_stage_type multiscale_homo_from_quarter_resolution_uncertainty

Similar results should be obtained:

Model Pre-trained model type PCK-1 (%) PCK-3 (%) PCK-5 (%)
GLU-Net (this repo) static (CityScape-DPED-ADE) 29.51 50.67 56.12
GLU-Net (this repo) dynamic 21.59 52.27 61.91
GLU-Net (paper) dynamic 21.58 52.18 61.78
GLU-Net-GOCor (this repo) static (CityScape-DPED-ADE) 32.24 52.51 58.90
GLU-Net-GOCor (this repo) dynamic 37.23 61.25 68.17
GLU-Net-GOCor (paper) dynamic 37.28 61.18 68.08
---------------- ----------------------------- ------- ------- -------
GLU-Net-GOCor* (paper) megadepth 57.77 78.61 82.24
PDC-Net (D) (this repo) megadepth 68.97 84.03 85.68
PDC-Net (H) (paper) megadepth 70.75 86.51 88.00
PDC-Net (MS) (paper) megadepth 71.81 89.36 91.18

RobotCar

Data preparation: Images can be downloaded from the Visual Localization Challenge (at the bottom of the site), or more precisely here. The CSV file with the ground-truth correspondences can be downloaded from here. The file structure should be the following:

RobotCar
├── img/
└── test6511.csv



Evaluation: After updating the path of 'robotcar' and 'robotcar_csv' in admin/local.py, evaluation is run with

python eval_matching.py --datasets robotcar --model PDCNet --pre_trained_models megadepth --optim_iter 3 --local_optim_iter 7 --save_dir path_to_save_dir PDCNet --multi_stage_type multiscale_homo_from_quarter_resolution_uncertainty

Similar results should be obtained:

Model Pre-trained model type PCK-1 (%) PCK-3 (%) PCK-5 (%)
GLU-Net (paper) static (CityScape-DPED-ADE) 2.30 17.15 33.87
GLU-Net-GOCor (paper) static 2.31 17.62 35.18
GLU-Net-GOCor (paper) dynamic 2.10 16.07 31.66
---------------- ----------------------------- ------- ------- -------
GLU-Net-GOCor* (paper) megadepth 2.33 17.21 33.67
PDC-Net (H) (paper) megadepth 2.54 18.97 36.37
PDC-Net (MS) (paper) megadepth 2.58 18.87 36.19

ETH3D

Data preparation: execute 'bash assets/download_ETH3D.sh' from our GLU-Net repo. It does the following:

  • Create your root directory ETH3D/, create two sub-directories multiview_testing/ and multiview_training/
  • Download the "Low res multi-view, training data, all distorted images" here and unzip them in multiview_training/
  • Download the "Low res multi-view, testing data, all undistorted images" here and unzip them in multiview_testing/
  • We directly provide correspondences for pairs of images taken at different intervals. There is one bundle file for each dataset and each rate of interval, for example "lakeside_every_5_rate_of_3". This means that we sampled the source images every 5 images and the target image is taken at a particular rate from each source image. Download all these files here and unzip them.
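The naming scheme of the bundle files can be parsed with a short sketch. The pattern below is inferred from the single example "lakeside_every_5_rate_of_3" given above and may not cover every file in the download. The function parse_bundle_name is hypothetical.

```python
import re

# <scene>_every_<sampling step>_rate_of_<interval rate>
_BUNDLE_PATTERN = re.compile(
    r"^(?P<scene>.+)_every_(?P<every>\d+)_rate_of_(?P<rate>\d+)$"
)


def parse_bundle_name(name):
    """Split a bundle name like 'lakeside_every_5_rate_of_3' into its parts.

    Returns (scene, every, rate): source images are sampled every `every`
    images, and the target image is taken `rate` images after each source.
    """
    match = _BUNDLE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised bundle name: {name!r}")
    return match["scene"], int(match["every"]), int(match["rate"])


if __name__ == "__main__":
    print(parse_bundle_name("lakeside_every_5_rate_of_3"))  # ('lakeside', 5, 3)
```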

As illustration, your root ETH3D directory should be organised as follows:

/ETH3D/
       multiview_testing/
                        lakeside/
                        sand_box/
                        storage_room/
                        storage_room_2/
                        tunnel/
       multiview_training/
                        delivery_area/
                        electro/
                        forest/
                        playground/
                        terrains/
       info_ETH3D_files/

The organisation of your directories is important, since the bundle files contain the relative paths to the images, from the ETH3D root folder.



Evaluation: for each interval rate (3,5,7,9,11,13,15), we compute the metrics for each of the sub-datasets (lakeside, delivery area and so on). The final metrics are the average over all datasets for each rate. After updating the path of 'eth3d' in admin/local.py, evaluation is run with

python eval_matching.py --datasets eth3d --model PDCNet --pre_trained_models megadepth --optim_iter 3 --local_optim_iter 7 --save_dir path_to_save_dir PDCNet --multi_stage_type direct

AEPE for different rates of intervals between image pairs.
Method Pre-trained model type rate=3 rate=5 rate=7 rate=9 rate=11 rate=13 rate=15
LiteFlowNet chairs-things 1.66 2.58 6.05 12.95 29.67 52.41 74.96
PWC-Net chairs-things 1.75 2.10 3.21 5.59 14.35 27.49 43.41
PWC-Net-GOCor chairs-things 1.70 1.98 2.58 4.22 10.32 21.07 38.12
--------------- ------------------------ -------- -------- -------- -------- --------- --------- ---------
DGC-Net 2.49 3.28 4.18 5.35 6.78 9.02 12.23
GLU-Net static 1.98 2.54 3.49 4.24 5.61 7.55 10.78
GLU-Net dynamic 2.01 2.46 2.98 3.51 4.30 6.11 9.08
GLU-Net-GOCor dynamic 1.93 2.28 2.64 3.01 3.62 4.79 7.80
--------------- ------------------------ -------- -------- -------- -------- --------- --------- ---------
GLU-Net-GOCor* megadepth 1.68 1.92 2.18 2.43 2.89 3.31 4.27
PDC-Net (D) (paper) megadepth 1.60 1.79 2.03 2.26 2.58 2.92 3.69
PDC-Net (H) megadepth 1.58 1.77 1.98 2.24 2.56 2.91 3.73
PDC-Net (MS) megadepth 1.60 1.79 2.00 2.26 2.57 2.90 3.56

PCK-1 for different rates of intervals between image pairs:

Note that the PCKs are computed per image and then averaged per sequence. The final metric is the average over all sequences. It corresponds to the '_per_image' results in the output metric file. This is not the metric used in the PDC-Net paper, where the PCKs are computed per sequence instead, using the PDC-Net direct approach (corresponding to the '_per_dataset' results in the output metric file).
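The difference between the two averaging schemes can be seen in a small sketch. It assumes that the per-sequence variant pools the valid pixels of all images in a sequence before thresholding, which is an interpretation of the description above rather than a reading of the evaluation code. The function names are hypothetical.

```python
def pck(errors, threshold):
    """Percentage of errors that are at most `threshold` pixels."""
    return 100.0 * sum(e <= threshold for e in errors) / len(errors)


def pck_per_image(images_errors, threshold):
    """Compute the PCK of each image, then average over the images."""
    scores = [pck(errors, threshold) for errors in images_errors]
    return sum(scores) / len(scores)


def pck_per_dataset(images_errors, threshold):
    """Pool the pixels of all images in the sequence, then compute one PCK."""
    pooled = [e for errors in images_errors for e in errors]
    return pck(pooled, threshold)


if __name__ == "__main__":
    # One sequence with two images that have different numbers of valid pixels.
    sequence = [
        [0.5, 0.5],            # image 1: 2 valid pixels, both within 1 px
        [2.0, 2.0, 2.0, 0.5],  # image 2: 4 valid pixels, one within 1 px
    ]
    print(pck_per_image(sequence, 1))    # 62.5
    print(pck_per_dataset(sequence, 1))  # 50.0
```

The two values differ whenever images contribute different numbers of valid pixels, which is why the numbers reported here are not directly comparable with those in the paper.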

Method Pre-trained model type rate=3 rate=5 rate=7 rate=9 rate=11 rate=13 rate=15
LiteFlowNet chairs-things 61.63 56.55 49.83 42.00 33.14 26.46 21.22
PWC-Net chairs-things 58.50 52.02 44.86 37.41 30.36 24.75 19.89
PWC-Net-GOCor chairs-things 58.93 53.10 46.91 40.93 34.58 29.25 24.59
--------------- ------------------------ -------- -------- -------- -------- --------- --------- ---------
DGC-Net
GLU-Net static 50.55 43.08 36.98 32.45 28.45 25.06 21.89
GLU-Net dynamic 46.27 39.28 34.05 30.11 26.69 23.73 20.85
GLU-Net-GOCor dynamic 47.97 41.79 36.81 33.03 29.80 26.93 23.99
--------------- ------------------------ -------- -------- -------- -------- --------- --------- ---------
GLU-Net-GOCor* megadepth 59.40 55.15 51.18 47.86 44.46 41.78 38.91
PDC-Net (D) megadepth 61.82 58.41 55.02 52.40 49.61 47.43 45.01
PDC-Net (H) megadepth 62.63 59.29 56.09 53.31 50.69 48.46 46.17
PDC-Net (MS) megadepth 62.29 59.14 55.87 53.23 50.59 48.45 46.17

PCK-5 for different rates of intervals between image pairs:

Method Pre-trained model type rate=3 rate=5 rate=7 rate=9 rate=11 rate=13 rate=15
LiteFlowNet chairs-things 92.79 90.70 86.29 78.50 66.07 55.05 46.29
PWC-Net chairs-things 92.64 90.82 87.32 81.80 72.95 64.07 55.47
PWC-Net-GOCor chairs-things 92.81 91.45 88.96 85.53 79.44 72.06 64.92
--------------- ------------------------ -------- -------- -------- -------- --------- --------- ---------
DGC-Net 88.50 83.25 78.32 73.74 69.23 64.28 58.66
GLU-Net static 91.22 87.91 84.23 80.74 76.84 72.35 67.77
GLU-Net dynamic 91.45 88.57 85.64 83.10 80.12 76.66 73.02
GLU-Net-GOCor dynamic 92.08 89.87 87.77 85.88 83.69 81.12 77.90
--------------- ------------------------ -------- -------- -------- -------- --------- --------- ---------
GLU-Net-GOCor* megadepth 93.03 92.13 91.04 90.19 88.98 87.81 85.93
PDC-Net (D) (paper) megadepth 93.47 92.72 91.84 91.15 90.23 89.45 88.10
PDC-Net (H) megadepth 93.50 92.71 91.93 91.16 90.35 89.52 88.32
PDC-Net (MS) megadepth 93.47 92.69 91.85 91.15 90.33 89.55 88.43

HPatches

Data preparation: Download the data with

bash assets/download_hpatches.sh

The corresponding csv files for each viewpoint ID with the path to the images and the homography parameters relating the pairs are listed in assets/.



Evaluation: After updating the path of 'hp' in admin/local.py, evaluation is run with

python eval_matching.py --datasets hp --model GLUNet_GOCor --pre_trained_models static --optim_iter 3 --local_optim_iter 7 --save_dir path_to_save_dir

Similar results should be obtained:

Model Pre-trained model type AEPE PCK-1 (%) PCK-3 (%) PCK-5 (%)
DGC-Net [Melekhov2019] - 33.26 12.00 - 58.06
GLU-Net (this repo) static 25.05 39.57 71.45 78.60
GLU-Net (paper) static 25.05 39.55 - 78.54
GLU-Net-GOCor (this repo) static 20.16 41.49 74.12 81.46
GLU-Net-GOCor (paper) static 20.16 41.55 - 81.43
--------------- ------------------------ -------- -------- -------- --------
PDCNet (D) (this repo) megadepth 19.40 43.94 78.51 85.81
PDCNet (H) (this repo) megadepth 17.51 48.69 82.71 89.44

KITTI

Data preparation: Both KITTI-2012 and 2015 datasets are available here


Evaluation: After updating the path of 'kitti2012' and 'kitti2015' in admin/local.py, evaluation is run with

python eval_matching.py --datasets kitti2015 --model PDCNet --pre_trained_models megadepth --optim_iter 3 --local_optim_iter 7 PDCNet --multi_stage_type direct

Similar results should be obtained:

Model Pre-trained model type KITTI-2012 AEPE KITTI-2012 F1 (%) KITTI-2015 AEPE KITTI-2015 F1 (%)
PWC-Net-GOCor (this repo) chairs-things 4.12 19.58 10.33 31.23
PWC-Net-GOCor (paper) chairs-things 4.12 19.31 10.33 30.53
PWC-Net-GOCor (this repo) chairs-things ft sintel 2.60 9.69 7.64 21.36
PWC-Net-GOCor (paper) chairs-things ft sintel 2.60 9.67 7.64 20.93
---------------- ------------------------- ------------ ------------- ------------ -----------
GLU-Net (this repo) static 3.33 18.91 9.79 37.77
GLU-Net (this repo) dynamic 3.12 19.73 7.59 33.92
GLU-Net (paper) dynamic 3.14 19.76 7.49 33.83
GLU-Net-GOCor (this repo) dynamic 2.62 15.17 6.63 27.58
GLU-Net-GOCor (paper) dynamic 2.68 15.43 6.68 27.57
---------------- ------------------------- ------------ ------------- ------------ -----------
GLU-Net-GOCor* (paper) megadepth 2.26 9.89 5.53 18.27
PDC-Net (D) (paper and this repo) megadepth 2.08 7.98 5.22 15.13
PDC-Net (H) (this repo) megadepth 2.16 8.19 5.31 15.23
PDC-Net (MS) (this repo) megadepth 2.16 8.13 5.40 15.33

Sintel

Data preparation: Download the data with

bash assets/download_sintel.sh

Evaluation: After updating the path of 'sintel' in admin/local.py, evaluation is run with

python eval_matching.py --datasets sintel --model PDCNet --pre_trained_models megadepth --optim_iter 3 --local_optim_iter 7 --save_dir path_to_save_dir PDCNet --multi_stage_type direct

Similar results should be obtained:

Model Pre-trained model type AEPE PCK-1 / dataset (%) PCK-5 / dataset (%) AEPE PCK-1 / dataset (%) PCK-5 / dataset (%)
PWC-Net-GOCor (this repo) chairs-things 2.38 82.18 94.14 3.70 77.36 91.20
PWC-Net-GOCor (paper) chairs-things 2.38 82.17 94.13 3.70 77.34 91.20
PWC-Net-GOCor (paper) chairs-things ft sintel (1.74) (87.93) (95.54) (2.28) (84.15) (93.71)
--------------- -------------------------------- -------- ------------- -------------- -------- ------------- --------------
GLU-Net (this repo) dynamic 4.24 62.21 88.47 5.49 58.10 85.16
GLU-Net (paper) dynamic 4.25 62.08 88.40 5.50 57.85 85.10
GLU-Net-GOCor (this repo) dynamic 3.77 67.11 90.47 4.85 63.36 87.76
GLU-Net-GOCor (paper) dynamic 3.80 67.12 90.41 4.90 63.38 87.69
--------------- -------------------------------- -------- ------------- -------------- -------- ------------- --------------
GLU-Net-GOCor* (paper) megadepth 3.12 80.00 92.68 4.46 73.10 88.94
PDC-Net (D) (this repo) megadepth 3.30 85.06 93.38 4.48 78.07 90.07
PDC-Net (H) (this repo) megadepth 3.38 84.95 93.35 4.50 77.62 90.07
PDC-Net (MS) (this repo) megadepth 3.40 84.85 93.33 4.54 77.41 90.06

TSS

Data preparation: To download the images, run:

bash assets/download_tss.sh

Evaluation: After updating the path of 'tss' in admin/local.py, evaluation is run with

python eval_matching.py --datasets TSS --model GLUNet_GOCor --pre_trained_models static --optim_iter 3 --local_optim_iter 7 --flipping_condition True --save_dir path_to_save_dir

Similar results should be obtained:

Model Pre-trained model type FGD3Car JODS PASCAL All
Semantic-GLU-Net [GLUNet] 94.4 75.5 78.3 82.8
GLU-Net (our repo) Static 93.2 73.69 71.1 79.33
GLU-Net (paper) Static 93.2 73.3 71.1 79.2
GLU-Net-GOCor (our repo, GOCor iter=3, 3) Static 94.6 77.9 77.7 83.4
GLU-Net-GOCor (our repo, GOCor iter=3, 7) Static 94.6 77.6 77.1 83.1
GLU-Net-GOCor (paper) Static 94.6 77.9 77.7 83.4

PF-Pascal

Data preparation: To download the images, run:

bash assets/download_pf_pascal.sh

Evaluation: After updating the path of 'PFPascal' in admin/local.py, evaluation is run with

python eval_matching.py --datasets PFPascal --model GLUNet_GOCor --pre_trained_models static --optim_iter 3 --local_optim_iter 7 --flipping_condition True --save_dir path_to_save_dir

PF-Willow

Data preparation: To download the images, run:

bash assets/download_pf_willow.sh

Evaluation: After updating the path of 'PFWillow' in admin/local.py, evaluation is run with

python eval_matching.py --datasets PFWillow --model GLUNet_GOCor --pre_trained_models static --optim_iter 3 --local_optim_iter 7 --flipping_condition True --save_dir path_to_save_dir

Spair-71k

Data preparation: To download the images, run:

bash assets/download_spair.sh

Evaluation: After updating the path of 'spair' in admin/local.py, evaluation is run with

python eval_matching.py --datasets spair --model GLUNet_GOCor --pre_trained_models static --optim_iter 3 --local_optim_iter 7 --flipping_condition True --save_dir path_to_save_dir

4.2 Pose estimation

Metrics are computed with

python -u eval_pose_estimation.py --datasets dataset_name --model model_name --pre_trained_models pre_trained_model_name --optim_iter optim_step --local_optim_iter local_optim_iter --estimate_at_quarter_reso True --mask_type_for_pose_estimation proba_interval_1_above_10 --save_dir path_to_save_dir --plot False

YFCC100M

Data preparation: The ground truth for YFCC is provided in the file assets/yfcc_test_pairs_with_gt_original.txt (from the SuperGlue repo). Images can be downloaded from the OANet repo and moved to the desired location with

bash assets/download_yfcc.sh

File structure should be

YFCC
└──  images/
       ├── buckingham_palace/
       ├── notre_dame_front_facade/
       ├── reichstag/
       └── sacre_coeur/



Evaluation: After updating the path of 'yfcc' in admin/local.py, compute metrics on YFCC100M with PDC-Net multiscale (MS) using the command:

python -u eval_pose_estimation.py --datasets YFCC --model PDCNet --pre_trained_models megadepth --optim_iter 3  --local_optim_iter 7 --estimate_at_quarter_reso True --mask_type_for_pose_estimation proba_interval_1_above_10 --save_dir path_to_save_dir --plot False PDCNet --multi_stage_type multiscale_homo_from_quarter_resolution_uncertainty --mask_type proba_interval_1_above_10 

You should get similar metrics (not exactly the same because of RANSAC):

Model mAP @5 mAP @10 mAP @20 Run-time (s)
PDC-Net (D) 60.52 70.91 80.30 0.
PDC-Net (H) 63.90 73.00 81.22 0.74
PDC-Net (MS) 65.18 74.21 82.42 2.55

ScanNet

Data preparation: Go to the ScanNet GitHub repo to download the ScanNet test set (100 scenes). You will need to extract the raw sensor data from the 100 .sens files (one per scene in the test set) using the SensReader tool. We use the ground truth from the SuperGlue repo, provided here in the file assets/scannet_test_pairs_with_gt.txt.



Evaluation: After updating the path of 'scannet_test' in admin/local.py, compute metrics on ScanNet with PDC-Net multiscale (MS) using the command:

python -u eval_pose_estimation.py --datasets scannet --model PDCNet --pre_trained_models megadepth --optim_iter 3  --local_optim_iter 7 --estimate_at_quarter_reso True --mask_type_for_pose_estimation proba_interval_1_above_10 --save_dir path_to_save_dir --plot False PDCNet --multi_stage_type multiscale_homo_from_quarter_resolution_uncertainty --mask_type proba_interval_1_above_10 

You should get similar metrics (not exactly the same because of RANSAC):

Model mAP @5 mAP @10 mAP @20
PDC-Net (D) 39.93 50.17 60.87
PDC-Net (H) 42.87 53.07 63.25
PDC-Net (MS) 42.40 52.83 63.13

5. Training

Quick Start

The installation should have generated a local configuration file "admin/local.py". If the file was not generated, run python -c "from admin.environment import create_default_local_file; create_default_local_file()" to generate it. Next, set the path to the training workspace, i.e. the directory where the model weights and checkpoints will be saved. Also set the paths to the datasets you want to use (they must be downloaded beforehand, see below). If all the dependencies have been correctly installed, you can train a network using the run_training.py script in the correct conda environment.

conda activate dense_matching_env
python run_training.py train_module train_name

Here, train_module is the sub-module inside train_settings and train_name is the name of the train setting file to be used.

For example, you can train using the included default PDCNet_stage1 settings by running:

python run_training.py PDCNet PDCNet_stage1
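The relation between the two positional arguments and the settings file can be sketched as follows. It assumes that settings files live at train_settings/<train_module>/<train_name>.py, which matches the description above. How run_training.py actually imports them is not shown in this README, and both helper names are hypothetical.

```python
import os


def settings_module_name(train_module, train_name):
    """Dotted module path, e.g. 'train_settings.PDCNet.PDCNet_stage1'."""
    return f"train_settings.{train_module}.{train_name}"


def settings_file_path(train_module, train_name, root="."):
    """File path at which that settings module would be expected on disk."""
    return os.path.join(root, "train_settings", train_module, f"{train_name}.py")


if __name__ == "__main__":
    print(settings_module_name("PDCNet", "PDCNet_stage1"))
    path = settings_file_path("PDCNet", "PDCNet_stage1")
    # Run from the repo root to check that the settings file is present.
    print(path, "(found)" if os.path.isfile(path) else "(not found)")
```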

Training datasets downloading

DPED-CityScape-ADE

These are the same image pairs as used in the GLU-Net repo. For training, we use a combination of the DPED, CityScapes and ADE-20K datasets. The DPED training dataset is composed of only approximately 5,000 sets of images taken by four different cameras. We use the images from two cameras, resulting in around 10,000 images. CityScapes adds about 23,000 images. We complement these with a random sample of ADE-20K images with a minimum resolution of 750 x 750. This results in 40,000 original images, which are used to create pairs of training images by applying geometric transformations to them. The paths to the original images as well as the geometric transformation parameters are given in the csv files 'assets/csv_files/homo_aff_tps_train_DPED_CityScape_ADE.csv' and 'assets/csv_files/homo_aff_tps_test_DPED_CityScape_ADE.csv'.

The structure of the ADE-20K dataset has apparently changed, and the paths provided in the csv files are no longer valid. I am working on a fix for the ADE-20K images. In the meantime, use 'assets/csv_files/homo_aff_tps_train_DPED_CityScape.csv' and 'assets/csv_files/homo_aff_tps_test_DPED_CityScape.csv' to exclude the ADE images (you do not need to download the ADE-20K dataset in that case). The resulting training data contains 31K images. The performance of the resulting trained model might differ slightly.

  1. Download the original images
  • Download the DPED dataset (54 GB) ==> images are created in original_images/

  • Download the CityScapes dataset

    • download 'leftImg8bit_trainvaltest.zip' (11GB, left 8-bit images - train, val, and test sets, 5000 images) ==> images are created in CityScape/
    • download 'leftImg8bit_trainextra.zip' (44GB, left 8-bit images - trainextra set, 19998 images) ==> images are created in CityScape_extra/
  • Download the ADE-20K dataset (3.8 GB, 20,210 images) ==> images are created in ADE20K_2016_07_26/

Put all the datasets in the same directory. As an illustration, your root training directory should be organised as follows:

training_datasets/
    ├── original_images/
    ├── CityScape/
    ├── CityScape_extra/
    └── ADE20K_2016_07_26/
  2. Save the synthetic image pairs and flows to disk
    During training, the pairs of synthetic images could be created on the fly from this set of original images at each epoch. However, this dataset generation takes time and, since no augmentation is applied at each epoch, one can also create the dataset in advance and save it to disk. During training, the image pairs composing the training dataset are then simply loaded from disk before being passed through the network, which is a lot faster. To generate the training dataset and save it to disk:
python assets/save_training_dataset_to_disk.py --image_data_path /directory/to/original/training_datasets/ \
    --csv_path assets/csv_files/homo_aff_tps_train_DPED_CityScape_ADE.csv --save_dir /path/to/save_dir --plot True

It will create the image pairs and corresponding flow fields in save_dir/images and save_dir/flow respectively.

  3. Add the paths in admin/local.py as 'training_cad_520' and 'validation_cad_520'
COCO

This is useful for adding moving objects. Download the images along with annotations from here. The root folder should be organized as follows. Then add the path in admin/local.py as 'coco'.

coco_root
    ├── annotations
    │   ├── instances_train2014.json
    │   └── instances_train2017.json
    └── images
        ├── train2014
        └── train2017
MegaDepth

We use the reconstructions provided in the D2-Net repo. You can download the undistorted reconstructions and aggregated scene information folder directly here - Google Drive.

File structure should be the following:

MegaDepth
├── Undistorted_Sfm
└── scene_info

Then add the path in admin/local.py as 'megadepth_training'.
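
For orientation, the dataset entries named in this section could sit in admin/local.py roughly as sketched below. This is an illustration only: the class name EnvironmentSettings and the attribute workspace_dir are assumptions, all paths are placeholders, and the file generated by the installation may be organised differently.

```python
class EnvironmentSettings:
    """Illustrative layout of admin/local.py; the generated file may differ."""

    def __init__(self):
        # Assumed attribute name for the training workspace (weights, checkpoints).
        self.workspace_dir = "/path/to/workspace"
        # Attribute names below are the ones listed in this section; paths are placeholders.
        self.training_cad_520 = "/path/to/save_dir"
        self.validation_cad_520 = "/path/to/validation_save_dir"
        self.coco = "/path/to/coco_root"
        self.megadepth_training = "/path/to/MegaDepth"


settings = EnvironmentSettings()
# List any entry that was left empty, as a quick sanity check before training.
unset = [name for name, value in vars(settings).items() if not value]
print(unset)
```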

Training scripts

The framework currently contains the training code for the following matching networks. The setting files can be used to train the networks, or to look up the exact training details.

Warp Consistency: TO COME

PDC-Net
  • PDCNet.PDCNet_stage1: The default settings used for first stage network training with fixed backbone weights. We initialize the backbone VGG-16 with pre-trained ImageNet weights. We first train on synthetically generated image pairs from the DPED, CityScape and ADE datasets (pre-computed and saved), to which we add independently moving objects and perturbations.

  • PDCNet.PDCNet_stage2: The default settings used for training the final PDC-Net model. This setting fine-tunes all layers in the model trained using PDCNet_stage1 (including the feature backbone). As training dataset, we use a combination of the same dataset as in stage 1 and image pairs from the MegaDepth dataset with their sparse ground-truth correspondence data.

  • PDCNet.GLUNet_GOCor_star_stage1: Same settings as for PDCNet_stage1, but with a different model (non-probabilistic baseline). The loss is changed accordingly to the L1 loss instead of the negative log-likelihood loss.

  • PDCNet.GLUNet_GOCor_star_stage2: The default settings used for training the final GLU-Net-GOCor* (see PDCNet paper).

GLU-Net
  • GLUNet.GLUNet_static: The default settings used for training the final GLU-Net (of the paper GLU-Net).
    We fix the backbone weights and initialize the backbone VGG-16 with pre-trained ImageNet weights. We train on synthetically generated image pairs from the DPED, CityScape and ADE datasets (pre-computed and saved), which are later (GOCor paper) referred to as the 'static' dataset.

  • GLUNet.GLUNet_dynamic: The default settings used for training the final GLU-Net trained on the dynamic dataset (of the paper GOCor).
    We fix the backbone weights and initialize the backbone VGG-16 with pre-trained ImageNet weights. We train on synthetically generated image pairs from the DPED, CityScape and ADE datasets (pre-computed and saved), to which we add one independently moving object. This dataset is referred to as the 'dynamic' dataset in the GOCor paper.

Training your own networks

To train a custom network using the toolkit, the following components need to be specified in the train settings. For reference, see GLUNet_static.py.

  • Datasets: The datasets to be used for training. A number of standard matching datasets are already available in the datasets module. The dataset class can be passed a processing function, which should perform the necessary processing of the data before batching it, e.g. data augmentations and conversion to tensors.
  • Dataloader: Determines how to sample the batches. Can use specific samplers.
  • Network: The network module to be trained.
  • BatchPreprocessingModule: The pre-processing module that takes the batch and transforms it into the inputs required for training the network. It depends on the network and the training strategy.
  • Objective: The training objective.
  • Actor: The trainer passes the training batch to the actor who is responsible for passing the data through the network correctly, and calculating the training loss. The batch preprocessing is also done within the actor class.
  • Optimizer: Optimizer to be used, e.g. Adam.
  • Scheduler: Scheduler to be used.
  • Trainer: The main class which runs the epochs and saves checkpoints.
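
The division of responsibilities between the actor and the trainer can be sketched as follows. This is a framework-agnostic illustration with toy scalar data: the class names SketchActor and SketchTrainer, the batch keys and the call signatures are hypothetical and are not the toolkit's actual API. In practice the network would be a PyTorch module and the trainer would also run the optimizer and scheduler.

```python
class SketchActor:
    """Preprocesses a batch, runs the network and computes the training loss."""

    def __init__(self, net, objective, batch_processing):
        self.net = net
        self.objective = objective
        self.batch_processing = batch_processing

    def __call__(self, batch):
        batch = self.batch_processing(batch)  # batch preprocessing happens inside the actor
        prediction = self.net(batch["source"], batch["target"])
        loss = self.objective(prediction, batch["flow"])
        return loss, {"loss": loss}


class SketchTrainer:
    """Runs an epoch over a loader and hands every batch to the actor."""

    def __init__(self, actor, loader):
        self.actor = actor
        self.loader = loader

    def train_epoch(self):
        losses = []
        for batch in self.loader:
            loss, _stats = self.actor(batch)
            # A real trainer would back-propagate and step the optimizer here.
            losses.append(loss)
        return sum(losses) / len(losses)


# Toy stand-ins so the sketch runs without any deep learning framework.
def toy_net(source, target):
    return target - source  # "predicts" the displacement as a difference


def toy_objective(prediction, ground_truth):
    return abs(prediction - ground_truth)  # L1 error on scalars


def toy_processing(batch):
    return batch  # identity preprocessing


toy_loader = [
    {"source": 1.0, "target": 3.0, "flow": 2.0},
    {"source": 0.0, "target": 1.0, "flow": 2.0},
]

trainer = SketchTrainer(SketchActor(toy_net, toy_objective, toy_processing), toy_loader)
mean_loss = trainer.train_epoch()
```

Keeping the loss computation inside the actor means that switching the network or the objective does not require any change to the trainer loop.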

6. Acknowledgement

We borrow code from public projects, such as pytracking, GLU-Net, DGC-Net, PWC-Net, NC-Net, Flow-Net-Pytorch, RAFT, CATs...

7. ChangeLog

  • 06/21: Added evaluation code
  • 07/21: Added training code and more options for evaluation
  • 08/21: Fixed memory leak in mixture dataset + forgotten .item() in multiscale loss + added other sampling for megadepth dataset
Owner
Prune Truong
PhD Student in Computer Vision Lab of ETH Zurich