Tutorial on active learning with the Nvidia Transfer Learning Toolkit (TLT).

Overview

Active Learning with the Nvidia TLT

Tutorial on active learning with the Nvidia Transfer Learning Toolkit (TLT).

In this tutorial, we will show you how you can do active learning for object detection with the Nvidia Transfer Learning Toolkit. The task will be object detection of apples in a plantation setting. Accurately detecting and counting fruits is a critical step towards automating harvesting processes. Furthermore, fruit counting can be used to project expected yield and hence to detect low yield years early on.

The structure of the tutorial is as follows:

  1. Prerequisites
    1. Set up Lightly
    2. Set up Nvidia TLT
    3. Data
  2. Active Learning
    1. Initial Sampling
    2. Training and Inference
    3. Active Learning Step
    4. Re-training

To get started, clone this repository to your machine and change the directory.

git clone https://github.com/lightly-ai/NvidiaTLTActiveLearning.git
cd NvidiaTLTActiveLearning

1 Prerequisites

For this tutorial, you require Python 3.6 or higher. You also need to install lightly, numpy and argparse.

pip install -r requirements.txt

1.1 Set up Lightly for Active Learning

To set up lightly for active learning, head to the Lightly Platform and create a free account by logging in. Make sure to get your token by clicking on your e-mail address and selecting "Preferences". You will need the token for the rest of this tutorial.

1.2 Set up Nvidia TLT

To install the Nvidia Transfer Learning Toolkit, follow these instructions. If you want to use custom scripts for training and inference, you can skip this part.

Setting up Nvidia TLT can be done in a few minutes and consists of the following steps:

  1. Install Docker.
  2. Install Nvidia GPU driver v455.xx or above.
  3. Install nvidia docker2.
  4. Get an NGC account and API key.

To make all relevant directories accessible to the Nvidia TLT, you need to mount the current working directory and the yolo_v4/specs directory to the Nvidia TLT docker. You can do so with the provided mount.py script.

python mount.py

Next, you need to specify all training configurations. The Nvidia TLT expects all training configurations in a .txt file which is stored in the yolo_v4/specs/ directory. For the purpose of this tutorial we provide an example in yolo_v4_minneapple.txt. The most important differences to the example script provided by Nvidia are:

  • Anchor Shapes: We made the anchor boxes smaller since the largest bounding boxes in our dataset are only approximately 50 pixels wide.
  • Augmentation Config: We set the output width and height of the augmentations to 704 and 1280 respectively. This corresponds to the shape of our images.
  • Target Class Mapping: For transfer learning, we made a target class mapping from car to apple. This means that every time the model would now predict a car, it predicts an apple instead.

1.3 Data

In this tutorial, we will use the MinneApple fruit detection dataset. It consists of 670 training images of apple trees, annotated for detection and segmentation. The dataset contains images of trees with red and green apples.

Note: The Nvidia TLT expects the data and labels in the KITTI format. This means they expect one folder containing the images and one folder containing the annotations. The name of an image and its corresponding annotation file must be the same apart from the file extension. You can find the MinneApple dataset converted to this format attached to the first release of this tutorial. Alternatively, you can download the files from the official link and convert the labels yourself.

Create a data/ directory, move the downloaded minneapple.zip file there, and unzip it

cd data/
unzip minneapple.zip
cd ..

Here's an example of how the converted labels look like. Note how we use the label car instead of apple because of the target class mapping we had defined in section 1.2.

Car 0. 0 0. 1.0 228.0 6.0 241.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 5.0 228.0 28.0 249.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 30.0 238.0 46.0 256.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 37.0 214.0 58.0 234.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 82.0 261.0 104.0 281.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 65.0 283.0 82.0 301.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 82.0 284.0 116.0 317.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 111.0 274.0 142.0 306.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 113.0 308.0 131.0 331.0 0. 0. 0. 0. 0. 0. 0.

2 Active Learning

Now that the setup is complete, you can start the active learning loop. In general, the active learning loop will consist of the following steps:

  1. Initial sampling: Get an initial set of images to annotate and train on.
  2. Training and inference: Train on the labeled data and make predictions on all data.
  3. Active learning query: Use the predictions to get the next set of images to annotate, go to 2.

We will walk you through all three steps in this tutorial.

To do active learning with Lightly, you first need to upload your dataset to the platform. The command lightly-magic trains a self-supervised model to get good image representations and then uploads the images along with the image representations to the platform. If you want to skip training, you can set trainer.max_epochs=0. In the following command, replace MY_TOKEN with your token from the platform.

You can also upload thumbnails or even just metadata about the images. See this link for more information.

lightly-magic \
    input_dir=./data/raw/images \
    trainer.max_epochs=0 \
    loader.num_workers=8 \
    collate.input_size=512 \
    new_dataset_name="MinneApple" \
    token=MY_TOKEN

The above command will display the id of your dataset. You will need this later in the tutorial.

Once the upload has finished, you can visually explore your dataset in the Lightly Platform.

2.1 Initial Sampling

Now, let's select an initial batch of images which for annotation and training.

Lightly offers different sampling strategies, the most prominent ones being CORESET and RANDOM sampling. RANDOM sampling will preserve the underlying distribution of your dataset well while CORESET maximizes the heterogeneity of your dataset. While exploring our dataset in the Lightly Platform, we noticed many different clusters therefore we choose CORESET sampling to make sure that every cluster is represented in the training data.

Use the active_learning_query.py script to make an initial selection:

python active_learning_query.py \
    --token YOUR_TOKEN \
    --dataset_id YOUR_DATASET_ID \
    --new_tag_name 'initial-selection' \
    --n_samples 100 \
    --method CORESET

The above script roughly performs the following steps:

It creates an API client to communicate with the Lightly API.

# create an api client
client = ApiWorkflowClient(
    token=YOUR_TOKEN,
    dataset_id=YOUR_DATASET_ID,
)

Then, it creates an active learning agent which serves as an interface to do active learning.

# create an active learning agent
al_agent = ActiveLearningAgent(client)

Finally, it creates a sampling configuration, makes an active learning query, and puts the annotated images into the data/train directory.

# make an active learning query
cofnig = SamplerConfig(
    n_samples=100,
    method=SamplingMethod.CORESET,
    name='initial-selection',
)
al_agent.query(config)

# simulate annotation step by copying the data to the data/train directory 
oracle.annotate_images(al_agent.added_set)

The query will automatically create a new tag with the name initial-selection in the Lightly Platform.

You can verify that the number of annotated images is correct like this:

ls data/train/images | wc -l
ls data/train/labels | wc -l

2.2 Training and Inference

Now that we have our annotated training data, let's train an object detection model on it and see how well it works! Use the Nvidia Transfer Learning Toolkit to train a YOLOv4 object detector from the command line. The cool thing about transfer learning is that you don't have to train a model from scratch and therefore require fewer annotated images to get good results.

Start by downloading a pre-trained object detection model from the Nvidia registry.

mkdir -p ./yolo_v4/pretrained_resnet18
ngc registry model download-version nvidia/tlt_pretrained_object_detection:resnet18 \
    --dest ./yolo_v4/pretrained_resnet18

Finetuning the object detector on the sampled training data is as simple as the following command. Make sure to replace YOUR_KEY with the API token you get from your Nvidia account.

mkdir -p $PWD/yolo_v4/experiment_dir_unpruned
tlt yolo_v4 train \
    -e /workspace/tlt-experiments/yolo_v4/specs/yolo_v4_minneapple.txt \
    -r /workspace/tlt-experiments/yolo_v4/experiment_dir_unpruned \
    --gpus 1 \
    -k YOUR_KEY

Now that you have finetuned the object detector on your dataset, you can do inference to see how well it works.

Doing inference on the whole dataset has the advantage that you can easily figure out for which images the model performs poorly or has a lot of uncertainties.

tlt yolo_v4 inference \
    -i /workspace/tlt-experiments/data/raw/images/ \
    -e /workspace/tlt-experiments/yolo_v4/specs/yolo_v4_minneapple.txt \
    -m /workspace/tlt-experiments/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_050.tlt \
    -o /workspace/tlt-experiments/infer_images \
    -l /workspace/tlt-experiments/infer_labels \
    -k MY_KEY

Below you can see two example images after training. It's evident that the model does not perform well on the unlabeled image. Therefore, it makes sense to add more samples to the training dataset.

2.3 Active Learning Step

You can use the inferences from the previous step to determine which images cause the model problems. With Lightly, you can easily select these images while at the same time making sure that your training dataset is not flooded with duplicates.

This section is about how to select the images which complete your training dataset. You can use the active_learning_query.py script again but this time you have to indicate that there already exists a set of preselected images and point the script to where the inferences are stored.

Use CORAL instead of CORESET as a sampling method. CORAL simultaneously maximizes the diversity and the sum of the active learning scores in the sampled data.

python active_learning_query.py \
    --token YOUR_TOKEN \
    --dataset_id YOUR_DATASET_ID \
    --preselected_tag_name 'initial-selection' \
    --new_tag_name 'al-iteration-1' \
    --n_samples 200 \
    --method CORAL

The script works very similarly to before but with one significant difference: This time, all the inferred labels are loaded and used to calculate an active learning score for each sample.

# create a scorer to calculate active learning scores based on model outputs
scorer = ScorerObjectDetection(model_outputs)

The rest of the script is almost the same as for the initial selection:

# create an api client
client = ApiWorkflowClient(
    token=YOUR_TOKEN,
    dataset_id=YOUR_DATASET_ID,
)

# create an active learning agent and set the preselected tag
al_agent = ActiveLearningAgent(
    client,
    preselected_tag_name='initial-selection',
)

# create a sampler configuration
config = SamplerConfig(
    n_samples=200,
    method=SamplingMethod.CORAL,
    name='al-iteration-1',
)

# make an active learning query
al_agent.query(config, scorer)

# simulate the annotation step
oracle.annotate_images(al_agent.added_set)

As before, we can check the number of images in the training set:

ls data/train/images | wc -l
ls data/train/labels | wc -l

2.4 Re-training

You can re-train our object detector on the new dataset to get an even better model. For this, you can use the same command as before. If you want to continue training from the last checkpoint, make sure to replace the pretrain_model_path in the specs file by a resume_model_path:

tlt yolo_v4 train \
    -e /workspace/tlt-experiments/yolo_v4/specs/yolo_v4_minneapple.txt \
    -r /workspace/tlt-experiments/yolo_v4/experiment_dir_unpruned \
    --gpus 1 \
    -k MY_KEY

If you're still unhappy with the performance after re-training the model, you can repeat steps 2.2 and 2.3 and then re-train the model again.

You might also like...
PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM
PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM

Quasi-Recurrent Neural Network (QRNN) for PyTorch Updated to support multi-GPU environments via DataParallel - see the the multigpu_dataparallel.py ex

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)
General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.

AI pipelines for Nvidia Jetson Platform

Jetson Multicamera Pipelines Easy-to-use realtime CV/AI pipelines for Nvidia Jetson Platform. This project: Builds a typical multi-camera pipeline, i.

Simply enable or disable your Nvidia dGPU

EnvyControl (WIP) Simply enable or disable your Nvidia dGPU Usage First clone this repo and install envycontrol with sudo pip install . CLI Turn off y

Deploy optimized transformer based models on Nvidia Triton server

Deploy optimized transformer based models on Nvidia Triton server

A Pythonic library for Nvidia Codec.

A Pythonic library for Nvidia Codec. The project is still in active development; expect breaking changes. Why another Python library for Nvidia Codec?

Deploy optimized transformer based models on Nvidia Triton server

🤗 Hugging Face Transformer submillisecond inference 🤯 and deployment on Nvidia Triton server Yes, you can perfom inference with transformer based mo

Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

mtomo Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation.

Comments
  • Missing line in yolo_v4_mineapple.txt

    Missing line in yolo_v4_mineapple.txt

    Hello there!

    First of all, I just want to say tanks for your well explained demo, but I had some problems following it as I ran into some error, that I just didn't understand. After some search and research I managed to found a missing line into the yolo_v4_mineapple.txt, that should be put after the line 65 (output_height:1280): output_channel: 3.

    So, I just thought I'll leave this here, for posterity. And... now I will close the issue. Thanks!

    opened by Funderburger 0
  • Rework tutorial based on feedback

    Rework tutorial based on feedback

    Rework tutorial based on feedback

    Closes #691.

    • Addresses most of the issues mentioned.
    • Does not yet include a comparison to a random selection as this would take on benchmarking character which is not what this tutorial is intended for.
    opened by philippmwirth 0
  • Advantage over training with all data instead of samples

    Advantage over training with all data instead of samples

    Hi, I just wanna know what is the difference between training with all the 600 samples and training 100 samples first, 200, 300, .... What does active learning step does? it really select the best images or what? I didn't get clear for me.

    Thanks in advance

    opened by leo2105 1
Releases(v1.0-alpha)
Owner
Lightly
Lightly
Improving Non-autoregressive Generation with Mixup Training

MIST Training MIST TRAIN_FILE=/your/path/to/train.json VALID_FILE=/your/path/to/valid.json OUTPUT_DIR=/your/path/to/save_checkpoints CACHE_DIR=/your/p

7 Nov 22, 2022
FairMOT for Multi-Class MOT using YOLOX as Detector

FairMOT-X Project Overview FairMOT-X is a multi-class multi object tracker, which has been tailored for training on the BDD100K MOT Dataset. It makes

Jonathan Tan 33 Dec 28, 2022
Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch

This repository is used to suspend the results of our paper "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement"

ScorpioMiku 19 Sep 30, 2022
COLMAP - Structure-from-Motion and Multi-View Stereo

COLMAP About COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface.

4.7k Jan 07, 2023
Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Features"

EDM-subgenre-classifier This repository contains the code for "Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Fea

11 Dec 20, 2022
[AAAI2022] Source code for our paper《Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning》

SSVC The source code for paper [Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning] samples of the

7 Oct 26, 2022
Transparent Transformer Segmentation

Transparent Transformer Segmentation Introduction This repository contains the data and code for IJCAI 2021 paper Segmenting transparent object in the

谢恩泽 140 Jan 02, 2023
Checking fibonacci - Generating the Fibonacci sequence is a classic recursive problem

Fibonaaci Series Generating the Fibonacci sequence is a classic recursive proble

Moureen Caroline O 1 Feb 15, 2022
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Multipath RefineNet A MATLAB based framework for semantic image segmentation and general dense prediction tasks on images. This is the source code for

Guosheng Lin 575 Dec 06, 2022
Implementation of PyTorch-based multi-task pre-trained models

mtdp Library containing implementation related to the research paper "Multi-task pre-training of deep neural networks for digital pathology" (Mormont

Romain Mormont 27 Oct 14, 2022
Code and data of the EMNLP 2021 paper "Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer"

StyleAttack Code and data of the EMNLP 2021 paper "Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer" Prepare Pois

THUNLP 19 Nov 20, 2022
PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

Stochastic CSLR This is the PyTorch implementation for the ECCV 2020 paper: Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuou

Zhe Niu 28 Dec 19, 2022
This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer Capacitor domain using text similarity indexes: An experimental analysis "

kwd-extraction-study This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer

ping 543f 1 Dec 05, 2022
Extracting knowledge graphs from language models as a diagnostic benchmark of model performance.

Interpreting Language Models Through Knowledge Graph Extraction Idea: How do we interpret what a language model learns at various stages of training?

EPFL Machine Learning and Optimization Laboratory 9 Oct 25, 2022
Rainbow is all you need! A step-by-step tutorial from DQN to Rainbow

Do you want a RL agent nicely moving on Atari? Rainbow is all you need! This is a step-by-step tutorial from DQN to Rainbow. Every chapter contains bo

Jinwoo Park (Curt) 1.4k Dec 29, 2022
WSDM‘2022: Knowledge Enhanced Sports Game Summarization

Knowledge Enhanced Sports Game Summarization Cooming Soon! :) Data will be released after approval process. Code will be published once the author of

Jiaan Wang 14 Jul 13, 2022
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral)

DSA^2 F: Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral) This repo is the official imp

如今我已剑指天涯 46 Dec 21, 2022
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Advanced Image Manipulation Lab @ Samsung AI Center Moscow 4.7k Dec 31, 2022
Semi-supervised Implicit Scene Completion from Sparse LiDAR

Semi-supervised Implicit Scene Completion from Sparse LiDAR Paper Created by Pengfei Li, Yongliang Shi, Tianyu Liu, Hao Zhao, Guyue Zhou and YA-QIN ZH

114 Nov 30, 2022
Official implementation of the paper Label-Efficient Semantic Segmentation with Diffusion Models

Label-Efficient Semantic Segmentation with Diffusion Models Official implementation of the paper Label-Efficient Semantic Segmentation with Diffusion

Yandex Research 355 Jan 06, 2023