Implementation of "Scaled-YOLOv4: Scaling Cross Stage Partial Network" using PyTorch framwork.

Overview

YOLOv4-large

This is the implementation of "Scaled-YOLOv4: Scaling Cross Stage Partial Network" using PyTorch framwork.

Model Test Size APtest AP50test AP75test APStest APMtest APLtest batch1 throughput
YOLOv4-P5 896 51.4% 69.9% 56.3% 33.1% 55.4% 62.4% 41 fps
YOLOv4-P5 TTA 52.5% 70.3% 58.0% 36.0% 52.4% 62.3% -
YOLOv4-P6 1280 54.3% 72.3% 59.5% 36.6% 58.2% 65.5% 30 fps
YOLOv4-P6 TTA 54.9% 72.6% 60.2% 37.4% 58.8% 66.7% -
YOLOv4-P7 1536 55.4% 73.3% 60.7% 38.1% 59.5% 67.4% 15 fps
YOLOv4-P7 TTA 55.8% 73.2% 61.2% 38.8% 60.1% 68.2% -
Model Test Size APval AP50val AP75val APSval APMval APLval weights
YOLOv4-P5 896 51.2% 69.8% 56.2% 35.0% 56.2% 64.0% yolov4-p5.pt
YOLOv4-P5 TTA 52.5% 70.2% 57.8% 38.5% 57.2% 64.0% -
YOLOv4-P5 (+BoF) 896 51.7% 70.3% 56.7% 35.9% 56.7% 64.3% yolov4-p5_.pt
YOLOv4-P5 (+BoF) TTA 52.8% 70.6% 58.3% 38.8% 57.4% 64.4% -
YOLOv4-P6 1280 53.9% 72.0% 59.0% 39.3% 58.3% 66.6% yolov4-p6.pt
YOLOv4-P6 TTA 54.4% 72.3% 59.6% 39.8% 58.9% 67.6% -
YOLOv4-P6 (+BoF) 1280 54.4% 72.7% 59.5% 39.5% 58.9% 67.3% yolov4-p6_.pt
YOLOv4-P6 (+BoF) TTA 54.8% 72.6% 60.0% 40.6% 59.1% 68.2% -
YOLOv4-P6 (+BoF*) 1280 54.7% 72.9% 60.0% 39.4% 59.2% 68.3%
YOLOv4-P6 (+BoF*) TTA 55.3% 73.2% 60.8% 40.5% 59.9% 69.4% -
YOLOv4-P7 1536 55.0% 72.9% 60.2% 39.8% 59.9% 68.4% yolov4-p7.pt
YOLOv4-P7 TTA 55.5% 72.9% 60.8% 41.1% 60.3% 68.9% -
Model Test Size APval AP50val AP75val APSval APMval APLval
YOLOv4-P6-attention 1280 54.3% 72.3% 59.6% 38.7% 58.9% 66.6%

Installation

# create the docker container, you can change the share memory size if you have more.
nvidia-docker run --name yolov4_csp -it -v your_coco_path/:/coco/ -v your_code_path/:/yolo --shm-size=64g nvcr.io/nvidia/pytorch:20.06-py3

# install mish-cuda, if you use different pytorch version, you could try https://github.com/thomasbrandon/mish-cuda
cd /
git clone https://github.com/JunnYu/mish-cuda
cd mish-cuda
python setup.py build install

# go to code folder
cd /yolo

Testing

# download {yolov4-p5.pt, yolov4-p6.pt, yolov4-p7.pt} and put them in /yolo/weights/ folder.
python test.py --img 896 --conf 0.001 --batch 8 --device 0 --data coco.yaml --weights weights/yolov4-p5.pt
python test.py --img 1280 --conf 0.001 --batch 8 --device 0 --data coco.yaml --weights weights/yolov4-p6.pt
python test.py --img 1536 --conf 0.001 --batch 8 --device 0 --data coco.yaml --weights weights/yolov4-p7.pt

You will get following results:

# yolov4-p5
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.51244
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.69771
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.56180
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.35021
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.56247
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.63983
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.38530
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.64048
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.69801
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.55487
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.74368
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.82826
# yolov4-p6
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.53857
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.72015
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.59025
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.39285
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.58283
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66580
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.39552
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.66504
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.72141
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.59193
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.75844
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.83981
# yolov4-p7
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.55046
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.72925
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.60224
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.39836
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.59854
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.68405
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.40256
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.66929
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.72943
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.59943
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.76873
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.84460

Training

We use multiple GPUs for training. {YOLOv4-P5, YOLOv4-P6, YOLOv4-P7} use input resolution {896, 1280, 1536} for training respectively.

# yolov4-p5
python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --img 896 896 --data coco.yaml --cfg yolov4-p5.yaml --weights '' --sync-bn --device 0,1,2,3 --name yolov4-p5
python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --img 896 896 --data coco.yaml --cfg yolov4-p5.yaml --weights 'runs/exp0_yolov4-p5/weights/last_298.pt' --sync-bn --device 0,1,2,3 --name yolov4-p5-tune --hyp 'data/hyp.finetune.yaml' --epochs 450 --resume

If your training process stucks, it due to bugs of the python. Just Ctrl+C to stop training and resume training by:

# yolov4-p5
python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --img 896 896 --data coco.yaml --cfg yolov4-p5.yaml --weights 'runs/exp0_yolov4-p5/weights/last.pt' --sync-bn --device 0,1,2,3 --name yolov4-p5 --resume

Citation

@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
    title     = {{Scaled-YOLOv4}: Scaling Cross Stage Partial Network},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {13029-13038}
}

Acknowledgements

Expand
Owner
Kin-Yiu, Wong
Kin-Yiu, Wong
Prososdy Morph: A python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.

ProMo (Prosody Morph) Questions? Comments? Feedback? Chat with us on gitter! A library for manipulating pitch and duration in an algorithmic way, for

Tim 71 Jan 02, 2023
Benchmark for the generalization of 3D machine learning models across different remeshing/samplings of a surface.

Discretization Robust Correspondence Benchmark One challenge of machine learning on 3D surfaces is that there are many different representations/sampl

Nicholas Sharp 10 Sep 30, 2022
YOLO5Face: Why Reinventing a Face Detector (https://arxiv.org/abs/2105.12931)

Introduction Yolov5-face is a real-time,high accuracy face detection. Performance Single Scale Inference on VGA resolution(max side is equal to 640 an

DeepCam Shenzhen 1.4k Jan 07, 2023
Links to works on deep learning algorithms for physics problems, TUM-I15 and beyond

Links to works on deep learning algorithms for physics problems, TUM-I15 and beyond

Nils Thuerey 1.3k Jan 08, 2023
Oscar and VinVL

Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks VinVL: Revisiting Visual Representations in Vision-Language Models Updates

Microsoft 938 Dec 26, 2022
🤗 Paper Style Guide

🤗 Paper Style Guide (Work in progress, send a PR!) Libraries to Know booktabs natbib cleveref Either seaborn, plotly or altair for graphs algorithmic

Hugging Face 66 Dec 12, 2022
SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer

SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer A novel graph neural network (GNN) based model (termed SlideGraph+

28 Dec 24, 2022
This is a re-implementation of TransGAN: Two Pure Transformers Can Make One Strong GAN (CVPR 2021) in PyTorch.

TransGAN: Two Transformers Can Make One Strong GAN [YouTube Video] Paper Authors: Yifan Jiang, Shiyu Chang, Zhangyang Wang CVPR 2021 This is re-implem

Ahmet Sarigun 79 Jan 05, 2023
ktrain is a Python library that makes deep learning and AI more accessible and easier to apply

Overview | Tutorials | Examples | Installation | FAQ | How to Cite Welcome to ktrain News and Announcements 2020-11-08: ktrain v0.25.x is released and

Arun S. Maiya 1.1k Jan 02, 2023
Solving SMPL/MANO parameters from keypoint coordinates.

Minimal-IK A simple and naive inverse kinematics solver for MANO hand model, SMPL body model, and SMPL-H body+hand model. Briefly, given joint coordin

Yuxiao Zhou 305 Dec 30, 2022
Deep Learning with PyTorch made easy 🚀 !

Deep Learning with PyTorch made easy 🚀 ! Carefree? carefree-learn aims to provide CAREFREE usages for both users and developers. It also provides a c

381 Dec 22, 2022
PyTorch implementation of EfficientNetV2

[NEW!] Check out our latest work involution accepted to CVPR'21 that introduces a new neural operator, other than convolution and self-attention. PyTo

Duo Li 375 Jan 03, 2023
FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack

FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack Case study of the FCA. The code can be find in FCA. Cas

IDRL 21 Dec 15, 2022
NasirKhusraw - The TSP solved using genetic algorithm and show TSP path overlaid on a map of the Iran provinces & their capitals.

Nasir Khusraw : Travelling Salesman Problem The TSP solved using genetic algorithm. This project show TSP path overlaid on a map of the Iran provinces

J Brave 2 Sep 01, 2022
The official MegEngine implementation of the ICCV 2021 paper: GyroFlow: Gyroscope-Guided Unsupervised Optical Flow Learning

[ICCV 2021] GyroFlow: Gyroscope-Guided Unsupervised Optical Flow Learning This is the official implementation of our ICCV2021 paper GyroFlow. Our pres

MEGVII Research 36 Sep 07, 2022
Office source code of paper UniFuse: Unidirectional Fusion for 360$^\circ$ Panorama Depth Estimation

UniFuse (RAL+ICRA2021) Office source code of paper UniFuse: Unidirectional Fusion for 360$^\circ$ Panorama Depth Estimation, arXiv, Demo Preparation I

Alibaba 47 Dec 26, 2022
MetaTTE: a Meta-Learning Based Travel Time Estimation Model for Multi-city Scenarios

MetaTTE: a Meta-Learning Based Travel Time Estimation Model for Multi-city Scenarios This is the official TensorFlow implementation of MetaTTE in the

morningstarwang 4 Dec 14, 2022
Official code repository for the EMNLP 2021 paper

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization PyTorch code for the EMNLP 2021 paper "Integrating Visuospatia

Adyasha Maharana 23 Dec 19, 2022
Convert human motion from video to .bvh

video_to_bvh Convert human motion from video to .bvh with Google Colab Usage 1. Open video_to_bvh.ipynb in Google Colab Go to https://colab.research.g

Dene 306 Dec 10, 2022
DIRL: Domain-Invariant Representation Learning

DIRL: Domain-Invariant Representation Learning Domain-Invariant Representation Learning (DIRL) is a novel algorithm that semantically aligns both the

Ajay Tanwani 30 Nov 07, 2022