Distilled coarse part of LoFTR adapted for compatibility with TensorRT and embedded devices

Overview

Coarse LoFTR TRT

Google Colab demo notebook

This project provides a deep learning model for local feature matching between two images that can run on embedded devices such as the NVidia Jetson Nano 2GB with reasonable accuracy and performance (about 5 FPS). The algorithm is based on the coarse part of "LoFTR: Detector-Free Local Feature Matching with Transformers", but the model uses a reduced number of ResNet and coarse transformer layers, which gives much lower memory consumption and better performance. The required level of accuracy was achieved by applying the knowledge distillation technique and training on the BlendedMVS dataset.
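For intuition, here is a minimal sketch of the distillation idea, assuming the original LoFTR coarse module acts as a frozen teacher and the reduced model as a student; the loss form and names here are illustrative, not the exact implementation in this repository:

import torch
import torch.nn.functional as F

def distillation_loss(student_conf, teacher_conf):
    # L2 distance between the coarse confidence matrices of student and teacher;
    # the teacher output is detached so only the student receives gradients.
    return F.mse_loss(student_conf, teacher_conf.detach())

# Hypothetical training step:
# with torch.no_grad():
#     teacher_conf = teacher(img0, img1)   # original LoFTR coarse module (frozen)
# student_conf = student(img0, img1)       # reduced ResNet + coarse transformer
# loss = distillation_loss(student_conf, teacher_conf)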

The code is based on the original LoFTR repository, but was adapted for compatibility with TensorRT; in particular, dependencies on einsum and einops were removed.
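As an illustration of this kind of change, an einsum-based attention score computation can be expressed with plain permute and matmul operations, which export cleanly to ONNX and TensorRT. The shapes follow LoFTR's attention convention (N batch, L/S sequence lengths, H heads, D head dimension); the exact rewrites in this repository may differ:

import torch

def attention_scores_einsum(q, k):
    # q: [N, L, H, D], k: [N, S, H, D] -> scores: [N, L, S, H]
    return torch.einsum("nlhd,nshd->nlsh", q, k)

def attention_scores_matmul(q, k):
    # Same computation without einsum: move heads next to batch, use matmul.
    qh = q.permute(0, 2, 1, 3)          # [N, H, L, D]
    kh = k.permute(0, 2, 3, 1)          # [N, H, D, S]
    scores = torch.matmul(qh, kh)       # [N, H, L, S]
    return scores.permute(0, 2, 3, 1)   # [N, L, S, H]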

Model weights

Weights for the PyTorch model, ONNX model and TensorRT engine files are located in the weights folder.

Weights for the original LoFTR coarse module can be downloaded using the original URL provided by the paper authors; currently only the outdoor-ds file is supported.

Demo

There is a demo application that can be run with the webcam.py script. It accepts the following parameters:

  • --weights - The path to PyTorch model weights, for example 'weights/LoFTR_teacher.pt' or 'weights/outdoor_ds.ckpt'
  • --trt - The path to the TensorRT engine, for example 'weights/LoFTR_teacher.trt'
  • --onnx - The path to the ONNX model, for example 'weights/LoFTR_teacher.onnx'
  • --original - If specified, the original LoFTR model will be used; it can be combined only with the --weights parameter
  • --camid - OpenCV webcam video capture ID, usually 0 or 1, default 0
  • --device - Selects the runtime back-end, CPU or CUDA; default is CUDA

Sample command line:

python3 webcam.py --trt=weights/LoFTR_teacher.trt --camid=0

The demo application shows a window with a pair of images captured by the camera. Initially both images are the same. Choose a view of interest and press the s key: that view is remembered and shown as the left image. Then change the view and press the p key to take a snapshot of the feature matching result; corresponding features are marked with the same numbers in the two images. Press the p key again to change the view and repeat the feature matching process. The application also shows a real-time FPS counter so you can estimate the model performance.
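A simplified sketch of this interaction loop (hypothetical variable names and quit key; the actual implementation is in webcam.py):

import cv2

cap = cv2.VideoCapture(0)      # --camid value
ref_frame = None               # view remembered with the 's' key
show_matches = False           # snapshot state toggled with the 'p' key

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow('demo', frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('s'):
        ref_frame = frame.copy()          # remember the left view
    elif key == ord('p'):
        show_matches = not show_matches   # take / release a matching snapshot
    elif key == ord('q'):
        break                             # hypothetical quit key

cap.release()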

Training

To repeat the training procedure, use the low-res set of the BlendedMVS dataset. After downloading it, run the train.py script to start the training process. The script accepts the following parameters:

  • --path - Path to the dataset
  • --checkpoint_path - Where to store log information and checkpoints, default value is 'weights'
  • --weights - Path to the LoFTR teacher model weights, default value is 'weights/outdoor_ds.ckpt'

Sample command line:

python3 train.py --path=/home/user/datasets/BlendedMVS --checkpoint_path=weights/experiment1/

Please use the train/settings.py script to configure the training process. Note that the following parameters are enabled by default:

self.batch_size = 32
self.batch_size_divider = 8  # Used for gradient accumulation
self.use_amp = True
self.epochs = 35
self.epoch_size = 5000

This set of parameters was chosen for training on an Nvidia GTX1060 GPU, a low-end consumer card. The use_amp parameter enables automatic mixed precision, which reduces memory consumption and training time. Gradient accumulation is enabled with the batch_size_divider parameter: the actual per-iteration batch size is 32/8 = 4, and gradients from 8 such micro-batches are accumulated to simulate a batch size of 32. Moreover, the actual size of an epoch is reduced with the epoch_size parameter: on every epoch only 5000 dataset elements are randomly picked from the whole dataset.
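A minimal sketch of this AMP plus gradient accumulation pattern, with model, loader, optimizer and compute_loss as placeholders (the real loop is implemented in train.py):

import torch

def train_epoch(model, loader, optimizer, compute_loss, accumulation_steps=8):
    """One epoch with mixed precision and gradient accumulation.

    With batch_size=32 and batch_size_divider=8, each loader batch holds
    32 // 8 = 4 samples and the optimizer steps once per 8 batches.
    """
    scaler = torch.cuda.amp.GradScaler()
    optimizer.zero_grad()
    for step, (img0, img1, target) in enumerate(loader):
        with torch.cuda.amp.autocast():
            # Scale the loss so accumulated gradients average over the micro-batches.
            loss = compute_loss(model(img0, img1), target) / accumulation_steps
        scaler.scale(loss).backward()
        if (step + 1) % accumulation_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()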

Paper

@misc{kolodiazhnyi2022local,
      title={Local Feature Matching with Transformers for low-end devices}, 
      author={Kyrylo Kolodiazhnyi},
      year={2022},
      eprint={2202.00770},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

LoFTR Paper:

@article{sun2021loftr,
  title={{LoFTR}: Detector-Free Local Feature Matching with Transformers},
  author={Sun, Jiaming and Shen, Zehong and Wang, Yuang and Bao, Hujun and Zhou, Xiaowei},
  journal={{CVPR}},
  year={2021}
}