A PyTorch version of You Only Look at One-level Feature object detector

Last update: Dec 30, 2022

Overview

PyTorch_YOLOF

The input image must be resized so that its shorter side is 800 and its longer side is less than or equal to 1333.
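The resize rule can be sketched as a small helper. This is a minimal illustration, not the repository's code: the function name `resized_shape` and its parameters are hypothetical, and it assumes the aspect ratio is preserved and the result is rounded to whole pixels.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Return (new_height, new_width) for an aspect-preserving resize."""
    short_side, long_side = min(height, width), max(height, width)
    # First try to bring the shorter side to min_size.
    scale = min_size / short_side
    # If that pushes the longer side past max_size, scale by the longer side instead.
    if long_side * scale > max_size:
        scale = max_size / long_side
    return round(height * scale), round(width * scale)


print(resized_shape(480, 640))   # (800, 1067)
print(resized_shape(500, 1500))  # (444, 1333)
```

In the second call the longer side hits the 1333 limit first, so the shorter side ends up below 800.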

While reproducing YOLOF, I found that it uses many tricks that the baseline RetinaNet doesn't. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, a large learning rate, a big batch size (like 64), and a negative prediction threshold. Is it really fair for YOLOF to use these tricks when comparing with RetinaNet?

In other words, can YOLOF still work without those tricks?

Requirements

We recommend using Anaconda to create a conda environment:

conda create -n yolof python=3.6

Then, activate the environment:

conda activate yolof

Requirements:

pip install -r requirements.txt

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive sample

You can run the following command to visualize positive samples:

python train.py \
        -d voc \
        --batch_size 2 \
        --root path/to/your/dataset \
        --vis_targets

My Ablation Studies

image mask

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: IoU Top4 (Different from the official matcher that uses top4 of L1 distance.)
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip

We ignore the loss of samples that are not inside the image.
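A minimal sketch of the idea, assuming that "not inside the image" means locations that fall in the padded area of a batch. The function name `masked_loss_sum` and the 0/1 mask layout are hypothetical and are not taken from the repository.

```python
def masked_loss_sum(losses, mask):
    """Sum per-sample losses, skipping samples whose mask entry is 0."""
    return sum(loss for loss, keep in zip(losses, mask) if keep)


losses = [0.5, 1.0, 2.0, 4.0]
mask = [1, 1, 0, 0]  # 1 = inside the real image, 0 = padding (assumed layout)
print(masked_loss_sum(losses, mask))  # 1.5
```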

Method	AP	AP50	AP75	APs	APm	APl
w/o mask	28.3	46.7	28.9	13.4	33.4	39.9
w mask	28.4	46.9	29.1	13.5	33.5	39.1

L1 Top4

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip
with image mask

IoU topk: For each label, we choose the K anchor boxes with the highest IoU as the positive samples.

L1 topk: For each label, we choose the K anchor boxes with the smallest L1 distance as the positive samples.
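The two matchers can be compared with a small framework-free sketch. It is only an illustration: the helper names are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the L1 distance is taken over the four corner coordinates, which may differ from the box encoding the repository actually uses.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def l1_distance(a, b):
    """Sum of absolute coordinate differences (assumed distance definition)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def topk_anchors(anchors, gt, k, mode):
    """Return indices of the k anchors matched to one ground-truth box."""
    indices = range(len(anchors))
    if mode == "iou":
        # Highest IoU first.
        order = sorted(indices, key=lambda i: iou(anchors[i], gt), reverse=True)
    elif mode == "l1":
        # Smallest L1 distance first.
        order = sorted(indices, key=lambda i: l1_distance(anchors[i], gt))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return order[:k]


anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30), (1, 1, 11, 11)]
gt = (0, 0, 10, 10)
print(topk_anchors(anchors, gt, 2, "iou"))  # [0, 3]
print(topk_anchors(anchors, gt, 2, "l1"))   # [0, 3]
```

On this toy input both matchers pick the same anchors. They can differ when several anchors of different sizes share a location.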

Method	AP	AP50	AP75	APs	APm	APl
IoU Top4	28.4	46.9	29.1	13.5	33.5	39.1
L1 Top4	28.6	46.9	29.4	13.8	34.0	39.0

RandomShift Augmentation

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip
with image mask

YOLOF takes advantage of the RandomShift augmentation, which is not used in RetinaNet.
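As a rough sketch of what such an augmentation does to the labels: the image is translated by a random offset and the boxes move with it. Only the box side is shown here. The function names, the `max_shift=32` and `prob=0.5` defaults, and the choice to clip boxes and drop empty ones are illustrative assumptions, not the repository's implementation.

```python
import random


def shift_boxes(boxes, dx, dy, width, height):
    """Translate (x1, y1, x2, y2) boxes by (dx, dy), clip to the image, drop empty boxes."""
    shifted = []
    for x1, y1, x2, y2 in boxes:
        nx1 = min(max(x1 + dx, 0), width)
        ny1 = min(max(y1 + dy, 0), height)
        nx2 = min(max(x2 + dx, 0), width)
        ny2 = min(max(y2 + dy, 0), height)
        if nx2 > nx1 and ny2 > ny1:
            shifted.append((nx1, ny1, nx2, ny2))
    return shifted


def random_shift(boxes, width, height, max_shift=32, prob=0.5, rng=random):
    """With probability `prob`, shift all boxes by one random integer offset."""
    if rng.random() >= prob:
        return boxes
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift_boxes(boxes, dx, dy, width, height)


boxes = [(10, 10, 50, 50), (0, 0, 5, 5)]
print(shift_boxes(boxes, -20, 0, 100, 100))  # [(0, 10, 30, 50)]
```

A real pipeline would apply the same `(dx, dy)` to the image pixels so that boxes and image stay aligned.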

Method	AP	AP50	AP75	APs	APm	APl
w/o RandomShift	28.6	46.9	29.4	13.8	34.0	39.0
w/ RandomShift	29.0	47.3	29.8	14.2	34.2	38.9

Fix a bug in dataloader

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

I fixed a bug in the dataloader. Specifically, I set the shuffle in the dataloader as False ...

Method	AP	AP50	AP75	APs	APm	APl
bug	29.0	47.3	29.8	14.2	34.2	38.9
no bug	30.1	49.0	31.0	15.2	36.3	39.8

Ignore samples

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

We ignore negative samples whose IoU with the labels is higher than the ignore threshold (igt).
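The rule can be sketched as a three-way label assignment. This is an illustration under assumptions: the function name `assign_labels`, the `1 / 0 / -1` label encoding, and the per-sample inputs are hypothetical, and which boxes the IoU is computed from is left to the caller.

```python
def assign_labels(max_ious, is_positive, ignore_thresh=0.7):
    """Return 1 for positives, -1 for ignored negatives, 0 for ordinary negatives."""
    labels = []
    for iou_value, positive in zip(max_ious, is_positive):
        if positive:
            labels.append(1)
        elif iou_value > ignore_thresh:
            # A negative that overlaps a label too much contributes no loss.
            labels.append(-1)
        else:
            labels.append(0)
    return labels


max_ious = [0.9, 0.75, 0.3, 0.1]
is_positive = [True, False, False, False]
print(assign_labels(max_ious, is_positive))  # [1, -1, 0, 0]
```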

Method	AP	AP50	AP75	APs	APm	APl
no igt	30.1	49.0	31.0	15.2	36.3	39.8
igt=0.7

Decode boxes

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y

Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor

Method-2 follows the operation used in YOLOF.
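The difference between the two decodings shows up in a short sketch. The function name `decode_center` and the `(x_anchor, y_anchor, w_anchor, h_anchor)` anchor layout are assumptions made for illustration.

```python
def decode_center(anchor, t_x, t_y, scale_by_anchor):
    """Decode a predicted box center from an anchor and the offsets (t_x, t_y)."""
    x_anchor, y_anchor, w_anchor, h_anchor = anchor
    if scale_by_anchor:
        # Method-2: offsets are expressed in units of the anchor size.
        return x_anchor + t_x * w_anchor, y_anchor + t_y * h_anchor
    # Method-1: offsets are added directly.
    return x_anchor + t_x, y_anchor + t_y


anchor = (100.0, 100.0, 32.0, 64.0)
print(decode_center(anchor, 0.5, -0.25, scale_by_anchor=False))  # (100.5, 99.75)
print(decode_center(anchor, 0.5, -0.25, scale_by_anchor=True))   # (116.0, 84.0)
```

With Method-2 the same network output moves the center further for larger anchors.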

Method	AP	AP50	AP75	APs	APm	APl
Method-1
Method-2

Train

sh train.sh

You can change the configuration in train.sh.

If you just want to check which anchor box is assigned to the positive sample, you can run:

python train.py --cuda -d voc --batch_size 8 --vis_targets

Adjust the commands above as needed for your own setup.

Test

python test.py -d [select a dataset: voc or coco] \
               --cuda \
               -v [select a model] \
               --weight [ Please input the path to model dir. ] \
               --img_size 800 \
               --root path/to/dataset/ \
               --show

You can run the above command to visualize the detection results on the dataset.

Comments

fix typo

When I run the eval process on VOC dataset, an error occurs:

Traceback (most recent call last):
  File "eval.py", line 126, in <module>
    voc_test(model, data_dir, device, transform)
  File "eval.py", line 42, in voc_test
    display=True)
TypeError: __init__() got an unexpected keyword argument 'data_root'

I discovered that this was due to a typo and simply fixed it. Everything is going well now.

opened by guohanli 1

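The label generation function is written incorrectly

The label generation logic in the source code is:
1. Use the L1 distance between the predicted boxes and the gt to select the top-k anchors, then use the L1 distance between the anchors and the gt to select the top-k anchors, and take these as the pre-selected positive anchors.
2. Match the pre-selected positive anchors to the gt by IoU, and filter out the pre-selected positive anchors whose anchor IoU is less than 0.15.
3. Set the anchors corresponding to predicted boxes whose IoU with the gt is <= 0.7 as negative anchors.
(You only used the anchors, with no pre-selection, and did not use the predicted boxes.)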

opened by Mr-Z-NewStar 11


Releases(YOLOF-weight)

YOLOF-weight(Mar 20, 2022)

Owner

Jianhua Yang
