Distilled coarse part of LoFTR adapted for compatibility with TensorRT and embedded devices

Overview

Coarse LoFTR TRT

Google Colab demo notebook

This project provides a deep learning model for local feature matching between two images. It can run on embedded devices such as the NVIDIA Jetson Nano 2GB with reasonable accuracy and performance (5 FPS). The algorithm is based on the coarse part of "LoFTR: Detector-Free Local Feature Matching with Transformers", but the model has fewer ResNet and coarse transformer layers, which gives much lower memory consumption and better performance. The required level of accuracy was achieved by applying knowledge distillation and training on the BlendedMVS dataset.

The code is based on the original LoFTR repository but was adapted for compatibility with TensorRT; in particular, the dependencies on einsum and einops were removed.

Model weights

Weights for the PyTorch model, ONNX model and TensorRT engine files are located in the weights folder.

Weights for the original LoFTR coarse module can be downloaded from the original URL provided by the paper authors. Currently only the outdoor-ds file is supported.

Demo

There is a demo application that can be run with the webcam.py script. It accepts the following parameters:

  • --weights - The path to PyTorch model weights, for example 'weights/LoFTR_teacher.pt' or 'weights/outdoor_ds.ckpt'
  • --trt - The path to the TensorRT engine, for example 'weights/LoFTR_teacher.trt'
  • --onnx - The path to the ONNX model, for example 'weights/LoFTR_teacher.onnx'
  • --original - If specified, the original LoFTR model is used; can be used only with the --weights parameter
  • --camid - OpenCV webcam video capture ID, usually 0 or 1, default 0
  • --device - Selects the runtime back end, CPU or CUDA; default is CUDA

Sample command line:

python3 webcam.py --trt=weights/LoFTR_teacher.trt --camid=0
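
The parameters above could be declared with argparse roughly as follows. This is only a sketch: the real webcam.py may define them differently. The helper names (build_parser, parse_demo_args, select_backend), the lowercase "cuda" default, and the priority order between the three model paths are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented options.
    parser = argparse.ArgumentParser(description="Coarse LoFTR webcam demo (sketch)")
    parser.add_argument("--weights", default=None, help="path to PyTorch model weights")
    parser.add_argument("--trt", default=None, help="path to the TensorRT engine")
    parser.add_argument("--onnx", default=None, help="path to the ONNX model")
    parser.add_argument("--original", action="store_true",
                        help="use the original LoFTR model (requires --weights)")
    parser.add_argument("--camid", type=int, default=0, help="OpenCV video capture ID")
    # Assumption: the device is spelled in lowercase on the command line.
    parser.add_argument("--device", default="cuda", help="runtime back end, cpu or cuda")
    return parser


def parse_demo_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # --original is documented as valid only together with --weights.
    if args.original and args.weights is None:
        parser.error("--original can be used only with --weights")
    return args


def select_backend(args) -> str:
    # Assumed priority when several model paths are given: TensorRT, ONNX, PyTorch.
    if args.trt:
        return "tensorrt"
    if args.onnx:
        return "onnx"
    if args.weights:
        return "pytorch"
    raise ValueError("one of --weights, --trt or --onnx is required")
```

With this sketch, parse_demo_args(["--trt=weights/LoFTR_teacher.trt", "--camid=0"]) selects the TensorRT back end and keeps the default device.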

The demo application shows a window with a pair of images captured by the camera. Initially both images are the same. Choose a view of interest and press the s button: the view is remembered and shown as the left image. Then change the view and press the p button to take a snapshot of the feature matching result; corresponding features are marked with the same numbers in the two images. Pressing the p button again lets you change the view and repeat the feature matching process. The application also shows a real-time FPS counter, so you can estimate the model performance.
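
The key handling described above can be pictured as a small state machine. The sketch below is not taken from webcam.py; the DemoState class and its method names are hypothetical, and frames are represented by arbitrary Python objects instead of camera images.

```python
class DemoState:
    """Tracks the stored left view and whether a matching snapshot is shown."""

    def __init__(self):
        self.reference = None  # view stored with the s key
        self.frozen = False    # True while a matching snapshot is displayed

    def handle_key(self, key, frame):
        if key == "s":
            # Remember the current view; it becomes the left image.
            self.reference = frame
        elif key == "p":
            # The first press freezes the matching result, the next one releases it.
            self.frozen = not self.frozen

    def left_image(self, frame):
        # Until a view is stored, both images are the live frame.
        return frame if self.reference is None else self.reference
```

In a real capture loop the key would come from the window toolkit (for example the value returned by OpenCV's waitKey) and the frame from the camera.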

Training

To repeat the training procedure, use the low-res set of the BlendedMVS dataset. After downloading it, run the train.py script to start the training process. The script accepts the following parameters:

  • --path - Path to the dataset
  • --checkpoint_path - Where to store log information and checkpoints, default value is 'weights'
  • --weights - Path to the LoFTR teacher model weights, default value is 'weights/outdoor_ds.ckpt'

Sample command line:

python3 train.py --path=/home/user/datasets/BlendedMVS --checkpoint_path=weights/experiment1/

Use the train/settings.py script to configure the training process. Note that the following parameters are set by default:

self.batch_size = 32
self.batch_size_divider = 8  # Used for gradient accumulation
self.use_amp = True
self.epochs = 35
self.epoch_size = 5000

This set of parameters was chosen for training on an Nvidia GTX 1060 GPU, which is a low-end consumer card. The use_amp parameter enables automatic mixed precision, which reduces memory consumption and training time. Gradient accumulation is enabled with the batch_size_divider parameter: the actual batch size is 32/8 = 4, and the gradients of 8 such batches are averaged to simulate the larger batch size. The size of the epoch is also reduced with the epoch_size parameter: in every epoch only 5000 elements are randomly picked from the whole dataset.
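
The arithmetic behind these settings can be checked with a small, framework-free sketch. It is not the project's training loop: the one-parameter model, the helper names, and the dataset length of 100000 are hypothetical, and sampling without replacement is an assumption. The sketch shows that averaging the gradients of 8 micro-batches of 4 samples gives the same gradient as one batch of 32 for a mean loss.

```python
import random

BATCH_SIZE = 32
BATCH_SIZE_DIVIDER = 8
EPOCH_SIZE = 5000

# Number of samples processed in one forward/backward pass.
MICRO_BATCH = BATCH_SIZE // BATCH_SIZE_DIVIDER


def grad_mse(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, divider):
    """Average the gradients of `divider` equally sized micro-batches."""
    size = len(batch) // divider
    total = 0.0
    for i in range(divider):
        micro = batch[i * size:(i + 1) * size]
        # Scaling each micro-batch gradient by 1/divider turns the sum into an average.
        total += grad_mse(w, micro) / divider
    return total


def sample_epoch(dataset_len, epoch_size, rng):
    """Pick the element indices used in one epoch (assumed without replacement)."""
    return rng.sample(range(dataset_len), min(epoch_size, dataset_len))


rng = random.Random(0)
data = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(BATCH_SIZE))]
full = grad_mse(0.5, data)
accumulated = accumulated_grad(0.5, data, BATCH_SIZE_DIVIDER)
indices = sample_epoch(100_000, EPOCH_SIZE, rng)
# Complete simulated batches of 32 that fit into one epoch of 5000 elements.
optimizer_steps = EPOCH_SIZE // BATCH_SIZE
```

Here full and accumulated agree up to floating-point rounding, which is why the small batch can stand in for the large one at the cost of more forward and backward passes per weight update.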

Paper

@misc{kolodiazhnyi2022local,
      title={Local Feature Matching with Transformers for low-end devices}, 
      author={Kyrylo Kolodiazhnyi},
      year={2022},
      eprint={2202.00770},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

LoFTR Paper:

@article{sun2021loftr,
  title={{LoFTR}: Detector-Free Local Feature Matching with Transformers},
  author={Sun, Jiaming and Shen, Zehong and Wang, Yuang and Bao, Hujun and Zhou, Xiaowei},
  journal={{CVPR}},
  year={2021}
}