Active learning for Mask R-CNN in Detectron2

Related tags

Deep Learningmaskal
Overview

MaskAL - Active learning for Mask R-CNN in Detectron2

maskAL_framework

Summary

MaskAL is an active learning framework that automatically selects the most-informative images for training Mask R-CNN. By using MaskAL, it is possible to reduce the number of image annotations, without negatively affecting the performance of Mask R-CNN. Generally speaking, MaskAL involves the following steps:

  1. Train Mask R-CNN on a small initial subset of a bigger dataset
  2. Use the trained Mask R-CNN algorithm to make predictions on the unlabelled images of the remaining dataset
  3. Select the most-informative images with a sampling algorithm
  4. Annotate the most-informative images, and then retrain Mask R-CNN on the most informative-images
  5. Repeat step 2-4 for a specified number of sampling iterations

The figure below shows the performance improvement of MaskAL on our dataset. By using MaskAL, the performance of Mask R-CNN improved more quickly and therefore 1400 annotations could be saved (see the black dashed line):

maskAL_graph

Installation

See INSTALL.md

Data preparation and training

Split the dataset in a training set, validation set and a test set. It is not required to annotate every image in the training set, because MaskAL will select the most-informative images automatically.

  1. From the training set, a smaller initial dataset is randomly sampled (the dataset size can be specified in the maskAL.yaml file). The images that do not have an annotation are placed in the annotate subfolder inside the image folder. You first need to annotate these images with LabelMe (json), V7-Darwin (json), Supervisely (json) or CVAT (xml) (when using CVAT, export the annotations to LabelMe 3.0 format). Refer to our annotation procedure: ANNOTATION.md
  2. Step 1 is repeated for the validation set and the test set (the file locations can be specified in the maskAL.yaml file).
  3. After the first training iteration of Mask R-CNN, the sampling algorithm selects the most-informative images (its size can be specified in the maskAL.yaml file).
  4. The most-informative images that don't have an annotation, are placed in the annotate subfolder. Annotate these images with LabelMe (json), V7-Darwin (json), Supervisely (json) or CVAT (xml) (when using CVAT, export the annotations to LabelMe 3.0 format).
  5. OPTIONAL: it is possible to use the trained Mask R-CNN model to auto-annotate the unlabelled images to further reduce annotation time. Activate auto_annotate in the maskAL.yaml file, and specify the export_format (currently supported formats: 'labelme', 'cvat', 'darwin', 'supervisely').
  6. Step 3-5 are repeated for several training iterations. The number of iterations (loops) can be specified in the maskAL.yaml file.

Please note that MaskAL does not work with the default COCO json-files of detectron2. These json-files contain all annotations that are completed before the training starts. Because MaskAL involves an iterative train and annotation procedure, the default COCO json-files lack the desired format.

How to use MaskAL

Open a terminal (Ctrl+Alt+T):

(base) [email protected]:~$ cd maskal
(base) [email protected]:~/maskal$ conda activate maskAL
(maskAL) [email protected]:~/maskal$ python maskAL.py --config maskAL.yaml

Change the following settings in the maskAL.yaml file:
Setting Description
weightsroot The file directory where the weight-files are stored
resultsroot The file directory where the result-files are stored
dataroot The root directory where all image-files are stored
use_initial_train_dir Set this to True when you want to start the active-learning from an initial training dataset. When False, the initial dataset of size initial_datasize is randomly sampled from the traindir
initial_train_dir When use_initial_train_dir is activated: the file directory where the initial training images and annotations are stored
traindir The file directory where the training images and annotations are stored
valdir The file directory where the validation images and annotations are stored
testdir The file directory where the test images and annotations are stored
network_config The Mask R-CNN configuration-file (.yaml) file (see the folder './configs')
pretrained_weights The pretrained weights to start the active-learning. Either specify the network_config (.yaml) or a custom weights-file (.pth or .pkl)
cuda_visible_devices The identifiers of the CUDA device(s) you want to use for training and sampling (in string format, for example: '0,1')
classes The names of the classes in the image annotations
learning_rate The learning-rate to train Mask R-CNN (default value: 0.01)
confidence_threshold Confidence-threshold for the image analysis with Mask R-CNN (default value: 0.5)
nms_threshold Non-maximum suppression threshold for the image analysis with Mask R-CNN (default value: 0.3)
initial_datasize The size of the initial dataset to start the active learning (when use_initial_train_dir is False)
pool_size The number of most-informative images that are selected from the traindir
loops The number of sampling iterations
auto_annotate Set this to True when you want to auto-annotate the unlabelled images
export_format When auto_annotate is activated: specify the export-format of the annotations (currently supported formats: 'labelme', 'cvat', 'darwin', 'supervisely')
supervisely_meta_json When supervisely auto_annotate is activated: specify the file location of the meta.json for supervisely export

Description of the other settings in the maskAL.yaml file: MISC_SETTINGS.md

Please refer to the folder active_learning/config for more setting-files.

Other software scripts

Use a trained Mask R-CNN algorithm to auto-annotate unlabelled images: auto_annotate.py

Argument Description
--img_dir The file directory where the unlabelled images are stored
--network_config Configuration of the backbone of the network
--classes The names of the classes of the annotated instances
--conf_thres Confidence threshold of the CNN to do the image analysis
--nms_thres Non-maximum suppression threshold of the CNN to do the image analysis
--weights_file Weight-file (.pth) of the trained CNN
--export_format Specifiy the export-format of the annotations (currently supported formats: 'labelme', 'cvat', 'darwin', 'supervisely')
--supervisely_meta_json The file location of the meta.json for supervisely export

Example syntax (auto_annotate.py):

python auto_annotate.py --img_dir datasets/train --network_config COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml --classes healthy damaged matured cateye headrot --conf_thres 0.5 --nms_thres 0.2 --weights_file weights/broccoli/model_final.pth --export_format supervisely --supervisely_meta_json datasets/meta.json

Troubleshooting

See TROUBLESHOOTING.md

Citation

See our research article for more information or cross-referencing:

@misc{blok2021active,
      title={Active learning with MaskAL reduces annotation effort for training Mask R-CNN}, 
      author={Pieter M. Blok and Gert Kootstra and Hakim Elchaoui Elghor and Boubacar Diallo and Frits K. van Evert and Eldert J. van Henten},
      year={2021},
      eprint={2112.06586},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url = {https://arxiv.org/abs/2112.06586},
}

License

Our software was forked from Detectron2 (https://github.com/facebookresearch/detectron2). As such, the software will be released under the Apache 2.0 license.

Acknowledgements

The uncertainty calculation methods were inspired by the research of Doug Morrison:
https://nikosuenderhauf.github.io/roboticvisionchallenges/assets/papers/CVPR19/rvc_4.pdf

Two software methods were inspired by the work of RovelMan:
https://github.com/RovelMan/active-learning-framework

MaskAL uses the Bayesian Active Learning (BaaL) software:
https://github.com/ElementAI/baal

Contact

MaskAL is developed and maintained by Pieter Blok.

Code to generate datasets used in "How Useful is Self-Supervised Pretraining for Visual Tasks?"

Synthetic dataset rendering Framework for producing the synthetic datasets used in: How Useful is Self-Supervised Pretraining for Visual Tasks? Alejan

Princeton Vision & Learning Lab 21 Apr 29, 2022
Learning kernels to maximize the power of MMD tests

Code for the paper "Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy" (arXiv:1611.04488; published at ICLR 2017), by Douga

Danica J. Sutherland 201 Dec 17, 2022
Full Resolution Residual Networks for Semantic Image Segmentation

Full-Resolution Residual Networks (FRRN) This repository contains code to train and qualitatively evaluate Full-Resolution Residual Networks (FRRNs) a

Toby Pohlen 274 Oct 27, 2022
The official implementation of ELSA: Enhanced Local Self-Attention for Vision Transformer

ELSA: Enhanced Local Self-Attention for Vision Transformer By Jingkai Zhou, Pich

DamoCV 87 Dec 19, 2022
Noether Networks: meta-learning useful conserved quantities

Noether Networks: meta-learning useful conserved quantities This repository contains the code necessary to reproduce experiments from "Noether Network

Dylan Doblar 33 Nov 23, 2022
WHENet - ONNX, OpenVINO, TFLite, TensorRT, EdgeTPU, CoreML, TFJS, YOLOv4/YOLOv4-tiny-3L

HeadPoseEstimation-WHENet-yolov4-onnx-openvino ONNX, OpenVINO, TFLite, TensorRT, EdgeTPU, CoreML, TFJS, YOLOv4/YOLOv4-tiny-3L 1. Usage $ git clone htt

Katsuya Hyodo 49 Sep 21, 2022
Neural Re-rendering for Full-frame Video Stabilization

NeRViS: Neural Re-rendering for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 9 Jun 17, 2022
Predicting Student Attentiveness using OpenCV

Predicting-Student-Attentiveness-using-OpenCV The model will predict if a student is attentive or not through facial parameter received through the st

Johann Pinto 2 Aug 20, 2022
toroidal - a lightweight transformer library for PyTorch

toroidal - a lightweight transformer library for PyTorch Toroidal transformers are of smaller size and lower weight than the more common E-I types. Th

MathInf GmbH 64 Jan 07, 2023
An executor that loads ONNX models and embeds documents using the ONNX runtime.

ONNXEncoder An executor that loads ONNX models and embeds documents using the ONNX runtime. Usage via Docker image (recommended) from jina import Flow

Jina AI 2 Mar 15, 2022
A list of Machine Learning Art Colabs

ML Visual Art Colabs A list of cool Colabs on Machine Learning Imagemaking or other artistic purposes 3D Ken Burns Effect Ken Burns Effect by Manuel R

Derrick Schultz (he/him) 789 Dec 12, 2022
2021搜狐校园文本匹配算法大赛 分比我们低的都是帅哥队

sohu_text_matching 2021搜狐校园文本匹配算法大赛Top2:分比我们低的都是帅哥队 本repo包含了本次大赛决赛环节提交的代码文件及答辩PPT,提交的模型文件可在百度网盘获取(链接:https://pan.baidu.com/s/1T9FtwiGFZhuC8qqwXKZSNA ,

hflserdaniel 43 Oct 01, 2022
Official implementation of NPMs: Neural Parametric Models for 3D Deformable Shapes - ICCV 2021

NPMs: Neural Parametric Models Project Page | Paper | ArXiv | Video NPMs: Neural Parametric Models for 3D Deformable Shapes Pablo Palafox, Aljaz Bozic

PabloPalafox 109 Nov 22, 2022
Computer Vision is an elective course of MSAI, SCSE, NTU, Singapore

[AI6122] Computer Vision is an elective course of MSAI, SCSE, NTU, Singapore. The repository corresponds to the AI6122 of Semester 1, AY2021-2022, starting from 08/2021. The instructor of this course

HT. Li 5 Sep 12, 2022
[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks, ICLR 2021 (Spotlight) Demo | Paper [NEW!] Time to play with our interac

Shengyu Zhao 373 Jan 02, 2023
Repository for the paper "Online Domain Adaptation for Occupancy Mapping", RSS 2020

RSS 2020 - Online Domain Adaptation for Occupancy Mapping Repository for the paper "Online Domain Adaptation for Occupancy Mapping", Robotics: Science

Anthony 26 Sep 22, 2022
View model summaries in PyTorch!

torchinfo (formerly torch-summary) Torchinfo provides information complementary to what is provided by print(your_model) in PyTorch, similar to Tensor

Tyler Yep 1.5k Jan 05, 2023
Official pytorch implementation of paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).

Dual-Level Collaborative Transformer for Image Captioning This repository contains the reference code for the paper Dual-Level Collaborative Transform

lyricpoem 160 Dec 11, 2022
Unsupervised Feature Loss (UFLoss) for High Fidelity Deep learning (DL)-based reconstruction

Unsupervised Feature Loss (UFLoss) for High Fidelity Deep learning (DL)-based reconstruction Official github repository for the paper High Fidelity De

28 Dec 16, 2022
Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition | paper | dataset | pretrained detection model | Authors: Yi-Chang Che

Yi-Chang Chen 1 Aug 23, 2022