A PyTorch version of You Only Look at One-level Feature object detector

Overview

PyTorch_YOLOF

A PyTorch version of You Only Look at One-level Feature object detector.

The input image must be resized to have their shorter side being 800 and their longer side less or equal to 1333.

During reproducing the YOLOF, I found many tricks used in YOLOF but the baseline RetinaNet dosen't use those tricks. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, large learning rate, big batchsize(like 64), negative prediction threshold. Is it really fair that YOLOF use these tricks to compare with RetinaNet?

In a other word, whether the YOLOF can still work without those tricks?

Requirements

  • We recommend you to use Anaconda to create a conda environment:
conda create -n yolof python=3.6
  • Then, activate the environment:
conda activate yolof
  • Requirements:
pip install -r requirements.txt 

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive sample

You can run following command to visualize positiva sample:

python train.py \
        -d voc \
        --batch_size 2 \
        --root path/to/your/dataset \
        --vis_targets

My Ablation Studies

image mask

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: IoU Top4 (Different from the official matcher that uses top4 of L1 distance.)
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip

We ignore the loss of samples who are not in image.

Method AP AP50 AP75 APs APm APl
w/o mask 28.3 46.7 28.9 13.4 33.4 39.9
w mask 28.4 46.9 29.1 13.5 33.5 39.1

L1 Top4

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

IoU topk: We choose the topK of IoU between anchor boxes and labels as the positive samples.

L1 topk: We choose the topK of L1 distance between anchor boxes and labels as the positive samples.

Method AP AP50 AP75 APs APm APl
IoU Top4 28.4 46.9 29.1 13.5 33.5 39.1
L1 Top4 28.6 46.9 29.4 13.8 34.0 39.0

RandomShift Augmentation

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

YOLOF takes advantage of RandomShift augmentation which is not used in RetinaNet.

Method AP AP50 AP75 APs APm APl
w/o RandomShift 28.6 46.9 29.4 13.8 34.0 39.0
w/ RandomShift 29.0 47.3 29.8 14.2 34.2 38.9

Fix a bug in dataloader

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

I fixed a bug in dataloader. Specifically, I set the shuffle in dataloader as False ...

Method AP AP50 AP75 APs APm APl
bug 29.0 47.3 29.8 14.2 34.2 38.9
no bug 30.1 49.0 31.0 15.2 36.3 39.8

Ignore samples

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

We ignore those negative samples whose IoU with labels are higher the ignore threshold (igt).

Method AP AP50 AP75 APs APm APl
no igt 30.1 49.0 31.0 15.2 36.3 39.8
igt=0.7

Decode boxes

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y

Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor

The Method-2 is following the operation used in YOLOF.

Method AP AP50 AP75 APs APm APl
Method-1
Method-2

Train

sh train.sh

You can change the configurations of train.sh.

If you just want to check which anchor box is assigned to the positive sample, you can run:

python train.py --cuda -d voc --batch_size 8 --vis_targets

According to your own situation, you can make necessary adjustments to the above run commands

Test

python test.py -d [select a dataset: voc or coco] \
               --cuda \
               -v [select a model] \
               --weight [ Please input the path to model dir. ] \
               --img_size 800 \
               --root path/to/dataset/ \
               --show

You can run the above command to visualize the detection results on the dataset.

You might also like...
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling

⚠️ ‎‎‎ A more recent and actively-maintained version of this code is available in ivadomed Stacked Hourglass Network with a Multi-level Attention Mech

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks
implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks

YOLOR implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks To reproduce the results in the paper, please us

 You Only 👀 One Sequence
You Only 👀 One Sequence

You Only 👀 One Sequence TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO obje

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021

LoFTR-with-train-script LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021 (with train script --- unofficial ---). About Megadepth

A Pytorch Implementation of [Source data‐free domain adaptation of object detector through domain

A Pytorch Implementation of Source data‐free domain adaptation of object detector through domain‐specific perturbation Please follow Faster R-CNN and

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks Please follow Faster R-CNN and DAF to complete the enviro

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

Comments
  • fix typo

    fix typo

    When I run the eval process on VOC dataset, an error occurs:

    Traceback (most recent call last):
      File "eval.py", line 126, in <module>
        voc_test(model, data_dir, device, transform)
      File "eval.py", line 42, in voc_test
        display=True)
    TypeError: __init__() got an unexpected keyword argument 'data_root'
    

    I discovered that this was due to a typo and simply fixed it. Everything is going well now.

    opened by guohanli 1
  • 标签生成函数写得有问题

    标签生成函数写得有问题

    源码中的标签生成逻辑是: 1.利用预测框与gt的l1距离筛选出topk个锚点,再利用锚点与gt的l1距离筛选出topk个锚点,将之作为预选正例锚点。 2.将预选正例锚点依据iou与gt匹配,滤除与锚点iou小于0.15的预选正例锚点 3.将gt与预测框iou<=0.7的预测框对应锚点设置为负例锚点 (而您只用了锚点,没有预选,也没用预测框)

    opened by Mr-Z-NewStar 11
Owner
Jianhua Yang
I love anime!!I love ACG!! The universe is so big,I want to fly and wander.
Jianhua Yang
City-seeds - A random generator of cultural characteristics intended to spark ideas and help draw threads

City Seeds This is a random generator of cultural characteristics intended to sp

Aydin O'Leary 2 Mar 12, 2022
'A C2C E-COMMERCE TRUST MODEL BASED ON REPUTATION' Python implementation

Project description A library providing functionalities to calculate reputation and degree of trust on C2C ecommerce platforms. The work is fully base

Davide Bigotti 2 Dec 14, 2022
CIFAR-10 Photo Classification

Image-Classification CIFAR-10 Photo Classification CIFAR-10_Dataset_Classfication CIFAR-10 Photo Classification Dataset CIFAR is an acronym that stand

ADITYA SHAH 1 Jan 05, 2022
automated systems to assist guarding corona Virus precautions for Closed Rooms (e.g. Halls, offices, etc..)

Automatic-precautionary-guard automated systems to assist guarding corona Virus precautions for Closed Rooms (e.g. Halls, offices, etc..) what is this

badra 0 Jan 06, 2022
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

STCN Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [a

Rex Cheng 456 Dec 12, 2022
Official repository for MixFaceNets: Extremely Efficient Face Recognition Networks

MixFaceNets This is the official repository of the paper: MixFaceNets: Extremely Efficient Face Recognition Networks. (Accepted in IJCB2021) https://i

Fadi Boutros 51 Dec 13, 2022
The codes and models in 'Gaze Estimation using Transformer'.

GazeTR We provide the code of GazeTR-Hybrid in "Gaze Estimation using Transformer". We recommend you to use data processing codes provided in GazeHub.

65 Dec 27, 2022
adversarial_multi_armed_bandit_variable_plays

Adversarial Multi-Armed Bandit with Variable Plays This code is for paper: Adversarial Online Learning with Variable Plays in the Evasion-and-Pursuit

Yiyang Wang 1 Oct 28, 2021
Machine Learning Model deployment for Container (TensorFlow Serving)

try_tf_serving ├───dataset │ ├───testing │ │ ├───paper │ │ ├───rock │ │ └───scissors │ └───training │ ├───paper │ ├───rock

Azhar Rizki Zulma 5 Jan 07, 2022
A simple python module to generate anchor (aka default/prior) boxes for object detection tasks.

PyBx WIP A simple python module to generate anchor (aka default/prior) boxes for object detection tasks. Calculated anchor boxes are returned as ndarr

thatgeeman 4 Dec 15, 2022
Pytorch code for our paper "Feedback Network for Image Super-Resolution" (CVPR2019)

Feedback Network for Image Super-Resolution [arXiv] [CVF] [Poster] Update: Our proposed Gated Multiple Feedback Network (GMFN) will appear in BMVC2019

Zhen Li 539 Jan 06, 2023
Streamlit tool to explore coco datasets

What is this This tool given a COCO annotations file and COCO predictions file will let you explore your dataset, visualize results and calculate impo

Jakub Cieslik 75 Dec 16, 2022
Code for unmixing audio signals in four different stems "drums, bass, vocals, others". The code is adapted from "Jukebox: A Generative Model for Music"

Status: Archive (code is provided as-is, no updates expected) Disclaimer This code is a based on "Jukebox: A Generative Model for Music" Paper We adju

Wadhah Zai El Amri 24 Dec 29, 2022
Video Swin Transformer - PyTorch

Video-Swin-Transformer-Pytorch This repo is a simple usage of the official implementation "Video Swin Transformer". Introduction Video Swin Transforme

Haofan Wang 116 Dec 20, 2022
Diverse graph algorithms implemented using JGraphT library.

# 1. Installing Maven & Pandas First, please install Java (JDK11) and Python 3 if they are not already. Next, make sure that Maven (for importing J

See Woo Lee 3 Dec 17, 2022
Turning pixels into virtual points for multimodal 3D object detection.

Multimodal Virtual Point 3D Detection Turning pixels into virtual points for multimodal 3D object detection. Multimodal Virtual Point 3D Detection, Ti

Tianwei Yin 204 Jan 08, 2023
TensorFlow Metal Backend on Apple Silicon Experiments (just for fun)

tf-metal-experiments TensorFlow Metal Backend on Apple Silicon Experiments (just for fun) Setup This is tested on M1 series Apple Silicon SOC only. Te

Timothy Liu 161 Jan 03, 2023
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Paper Links: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentati

Hust Visual Learning Team 253 Dec 21, 2022
A library that can print Python objects in human readable format

objprint A library that can print Python objects in human readable format Install pip install objprint Usage op Use op() (or objprint()) to print obj

319 Dec 25, 2022
GNN4Traffic - This is the repository for the collection of Graph Neural Network for Traffic Forecasting

GNN4Traffic - This is the repository for the collection of Graph Neural Network for Traffic Forecasting

564 Jan 02, 2023