# YOLOv4-large
This is the implementation of "Scaled-YOLOv4: Scaling Cross Stage Partial Network" using the PyTorch framework.
| Model | Test Size | AP<sup>test</sup> | AP<sub>50</sub><sup>test</sup> | AP<sub>75</sub><sup>test</sup> | AP<sub>S</sub><sup>test</sup> | AP<sub>M</sub><sup>test</sup> | AP<sub>L</sub><sup>test</sup> | batch 1 throughput | 
|---|---|---|---|---|---|---|---|---|
| YOLOv4-P5 | 896 | 51.4% | 69.9% | 56.3% | 33.1% | 55.4% | 62.4% | 41 fps | 
| YOLOv4-P5 | TTA | 52.5% | 70.3% | 58.0% | 36.0% | 52.4% | 62.3% | - | 
| YOLOv4-P6 | 1280 | 54.3% | 72.3% | 59.5% | 36.6% | 58.2% | 65.5% | 30 fps | 
| YOLOv4-P6 | TTA | 54.9% | 72.6% | 60.2% | 37.4% | 58.8% | 66.7% | - | 
| YOLOv4-P7 | 1536 | 55.4% | 73.3% | 60.7% | 38.1% | 59.5% | 67.4% | 15 fps | 
| YOLOv4-P7 | TTA | 55.8% | 73.2% | 61.2% | 38.8% | 60.1% | 68.2% | - | 
| Model | Test Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | AP<sub>75</sub><sup>val</sup> | AP<sub>S</sub><sup>val</sup> | AP<sub>M</sub><sup>val</sup> | AP<sub>L</sub><sup>val</sup> | weights | 
|---|---|---|---|---|---|---|---|---|
| YOLOv4-P5 | 896 | 51.2% | 69.8% | 56.2% | 35.0% | 56.2% | 64.0% | yolov4-p5.pt | 
| YOLOv4-P5 | TTA | 52.5% | 70.2% | 57.8% | 38.5% | 57.2% | 64.0% | - | 
| YOLOv4-P5 (+BoF) | 896 | 51.7% | 70.3% | 56.7% | 35.9% | 56.7% | 64.3% | yolov4-p5_.pt | 
| YOLOv4-P5 (+BoF) | TTA | 52.8% | 70.6% | 58.3% | 38.8% | 57.4% | 64.4% | - | 
| YOLOv4-P6 | 1280 | 53.9% | 72.0% | 59.0% | 39.3% | 58.3% | 66.6% | yolov4-p6.pt | 
| YOLOv4-P6 | TTA | 54.4% | 72.3% | 59.6% | 39.8% | 58.9% | 67.6% | - | 
| YOLOv4-P6 (+BoF) | 1280 | 54.4% | 72.7% | 59.5% | 39.5% | 58.9% | 67.3% | yolov4-p6_.pt | 
| YOLOv4-P6 (+BoF) | TTA | 54.8% | 72.6% | 60.0% | 40.6% | 59.1% | 68.2% | - | 
| YOLOv4-P6 (+BoF*) | 1280 | 54.7% | 72.9% | 60.0% | 39.4% | 59.2% | 68.3% | |
| YOLOv4-P6 (+BoF*) | TTA | 55.3% | 73.2% | 60.8% | 40.5% | 59.9% | 69.4% | - | 
| YOLOv4-P7 | 1536 | 55.0% | 72.9% | 60.2% | 39.8% | 59.9% | 68.4% | yolov4-p7.pt | 
| YOLOv4-P7 | TTA | 55.5% | 72.9% | 60.8% | 41.1% | 60.3% | 68.9% | - | 
| Model | Test Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | AP<sub>75</sub><sup>val</sup> | AP<sub>S</sub><sup>val</sup> | AP<sub>M</sub><sup>val</sup> | AP<sub>L</sub><sup>val</sup> | 
|---|---|---|---|---|---|---|---|
| YOLOv4-P6-attention | 1280 | 54.3% | 72.3% | 59.6% | 38.7% | 58.9% | 66.6% | 
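The weights column above lists the released checkpoint files. As a minimal sketch of loading one for a quick forward pass, assuming the checkpoint is a dict that stores the model object under a `model` key (adjust if the released format differs):

```python
# Minimal sketch: load a released checkpoint and run a dummy forward pass.
# Run this from the repository root so the pickled model classes can be found.
# The 'model' key and the (inference_out, train_out) return format are assumptions
# based on YOLOv5-style code; the input size matches the YOLOv4-P5 row above.
import torch

device = torch.device('cuda')  # GPU required: the mish-cuda activation needs CUDA tensors
ckpt = torch.load('weights/yolov4-p5.pt', map_location=device)
model = ckpt['model'].float().eval().to(device)  # cast to FP32 in case weights are stored in FP16

img = torch.zeros(1, 3, 896, 896, device=device)  # dummy input at the P5 test resolution
with torch.no_grad():
    pred = model(img)[0]  # raw predictions before NMS
print(pred.shape)
```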
## Installation
```bash
# Create the Docker container; you can increase the shared memory size if you have more available.
nvidia-docker run --name yolov4_csp -it -v your_coco_path/:/coco/ -v your_code_path/:/yolo --shm-size=64g nvcr.io/nvidia/pytorch:20.06-py3

# Install mish-cuda; if you use a different PyTorch version, you could try https://github.com/thomasbrandon/mish-cuda
cd /
git clone https://github.com/JunnYu/mish-cuda
cd mish-cuda
python setup.py build install

# Go to the code folder
cd /yolo
```
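After the install finishes, a quick sanity check (a sketch, assuming the package exposes `MishCuda` as in the upstream mish-cuda project) can confirm the CUDA extension built correctly:

```python
# Sanity check: verify the mish-cuda extension built and the GPU is visible.
import torch
from mish_cuda import MishCuda

assert torch.cuda.is_available(), 'no CUDA device visible inside the container'
act = MishCuda()
x = torch.randn(8, 16, 32, 32, device='cuda')
print(act(x).shape)  # expected: torch.Size([8, 16, 32, 32])
```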
## Testing
```bash
# Download {yolov4-p5.pt, yolov4-p6.pt, yolov4-p7.pt} and put them in the /yolo/weights/ folder.
python test.py --img 896 --conf 0.001 --batch 8 --device 0 --data coco.yaml --weights weights/yolov4-p5.pt
python test.py --img 1280 --conf 0.001 --batch 8 --device 0 --data coco.yaml --weights weights/yolov4-p6.pt
python test.py --img 1536 --conf 0.001 --batch 8 --device 0 --data coco.yaml --weights weights/yolov4-p7.pt
```
You will get the following results:
```
# yolov4-p5
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.51244
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.69771
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.56180
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.35021
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.56247
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.63983
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.38530
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.64048
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.69801
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.55487
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.74368
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.82826
# yolov4-p6
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.53857
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.72015
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.59025
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.39285
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.58283
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66580
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.39552
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.66504
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.72141
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.59193
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.75844
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.83981
# yolov4-p7
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.55046
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.72925
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.60224
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.39836
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.59854
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.68405
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.40256
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.66929
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.72943
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.59943
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.76873
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.84460
```
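These numbers are the standard COCO metrics printed by pycocotools; test.py produces the summary above. Purely for reference, a minimal sketch of how such a summary is generated from a COCO-format detections file (the JSON file name here is hypothetical):

```python
# Minimal sketch: evaluate a COCO-format detections JSON with pycocotools.
# Paths are illustrative; test.py already handles this evaluation for you.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('/coco/annotations/instances_val2017.json')  # ground-truth annotations
coco_dt = coco_gt.loadRes('detections_val2017.json')        # hypothetical detections file

coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints an AP/AR table like the one above
```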
## Training
We use multiple GPUs for training. {YOLOv4-P5, YOLOv4-P6, YOLOv4-P7} use input resolutions {896, 1280, 1536} for training, respectively.
```bash
# yolov4-p5
python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --img 896 896 --data coco.yaml --cfg yolov4-p5.yaml --weights '' --sync-bn --device 0,1,2,3 --name yolov4-p5
python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --img 896 896 --data coco.yaml --cfg yolov4-p5.yaml --weights 'runs/exp0_yolov4-p5/weights/last_298.pt' --sync-bn --device 0,1,2,3 --name yolov4-p5-tune --hyp 'data/hyp.finetune.yaml' --epochs 450 --resume
```
If your training process gets stuck, it is due to a Python bug. Just press Ctrl+C to stop training and resume it with:
```bash
# yolov4-p5
python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --img 896 896 --data coco.yaml --cfg yolov4-p5.yaml --weights 'runs/exp0_yolov4-p5/weights/last.pt' --sync-bn --device 0,1,2,3 --name yolov4-p5 --resume
```
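For reference, the training commands above rely on PyTorch's DistributedDataParallel with one process per GPU, plus synchronized batch norm via `--sync-bn`. A minimal conceptual sketch of that pattern (not train.py itself) looks like:

```python
# Conceptual sketch of the multi-GPU pattern driven by torch.distributed.launch:
# one process per GPU, NCCL backend, SyncBatchNorm, DDP-wrapped model.
# Launch with: python -m torch.distributed.launch --nproc_per_node 4 this_script.py
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # filled in by torch.distributed.launch
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend='nccl', init_method='env://')

# Stand-in model; train.py builds the real YOLOv4 network from the --cfg file.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 32, 3), torch.nn.BatchNorm2d(32)).cuda(args.local_rank)
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)  # what --sync-bn enables
model = DDP(model, device_ids=[args.local_rank])
```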
## Citation
```
@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
    title     = {{Scaled-YOLOv4}: Scaling Cross Stage Partial Network},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {13029-13038}
}
```