The project was to detect traffic signs, based on the Megengine framework.

Last update: Dec 02, 2022

Related tags

Deep Learning trafficsign

Overview

trafficsign

赛题

旷视AI智慧交通开源赛道，初赛1/177，复赛1/12。
本赛题为复杂场景的交通标志检测，对五种交通标志进行识别。

框架

megengine

算法方案

网络框架
- atss + resnext101_32x8d
训练阶段
- 图片尺寸
  最终提交版本输入图片尺寸为(1500,2100)
- 多尺度训练（最终提交版本未采用）
  起初我们将短边设为(1024, 1056, 1088, 1120, 1152, 1184, 1216, 1248, 1280, 1312, 1344, 1376, 1408)，随机选取短边后，长边按比例缩放，并使长边长度小于1800，从而进行多尺度训练，取得了很好的效果。不过后期的mosaic和mixup在增强时对图片进行了缩放，实则隐含了多尺度训练，且效果优于上述方法，所以我们最终去掉了多尺度训练。
- 数据增强
  - mosaic增强
    
    随机选择四张图片，对图片进行随机平移10%，尺度缩放(0.5,2.0)，shear 0.1，最后将四张图片进行组合。
  - mixup增强
    
    随机选取两张图进行叠加，我们最终选用的比例是0.5 * 原图+0.5 * 新图片，同时其进行缩放(0.5,2.0)。
    
    下图为mosaic+mixup示例图：
  - 随机水平翻转
    
    直接对图片进行翻转，会导致第三个类别“arr_l”（左转线）和右转线混淆，故我们添加了class-aware的翻转，遇到有“arr_l”类的图片则不进行翻转。
  - 基于Albumentations库的各种增强（最终提交版本未采用）
    
    我们尝试了ShiftScaleRotate（验证集+0.5）、CLANE（验证集+1.0）、RandomBrightnessContrast等，但组合起来测试集提点欠佳，所以最后没用。
  - gridmask增强（最终提交版本未采用）
    
    生成一个和原图相同分辨率的mask（每个grid上全为0或全为1），然后将该mask与原图相乘得到一个图像。提点欠佳，所以没采用。
  - 类别平衡采样（最终提交版本未采用）
    
    使用类别平衡采样后，效果不是很好，这可能是因为数据集本身没有严重的类别不均衡。下面是我们统计的每个类别在图片中出现的频率。
    
    红灯直行线左转线禁止行驶禁止停车
    
    频率 0.356 0.228 0.201 0.257 0.485
多尺度测试
- 多尺度测试图片尺寸
  
  最后提交版本(2100,2700),(2100,2800),(2400,3200)，如果继续增加尺度，map还会继续提高。
- topk—nms
  
  对上述三个尺度生成的结果先进行nms，再将得到的结果框与剩下所有框进行topk—nms（保留与当前结果框iou大于0.85的topk的框，把这些框的坐标进行融合），参数设置vote_thresh=0.85, k=5。
网络结构
- 加上增强后，backbone从res50到res101再到resx101有稳定涨点。
- 我们还在backbone部分尝试了dcn和gcnet，验证集收效甚微，最终没有采用。

	红灯	直行线	左转线	禁止行驶	禁止停车
频率	0.356	0.228	0.201	0.257	0.485

模型训练与测试

数据集位置

/path/to/ 
    |->traffic   
    |    |images     
    |    |annotations->|train.json     
    |    |             |val.json     
    |    |             |test.json

训练测试

在加上增强后，我们训练了36个epoch。

pip3 install --user -r requirements.txt

export PYTHONPATH=your_path/trafficsign:$PYTHONPATH

cd weights && wget https://data.megengine.org.cn/models/weights/atss_resx101_coco_2x_800size_45dot6_b3a91b36.pkl

python3 tools/train.py -n 4 -b 2 -f configs/atss_resx101_final.py -d your_datasetpath -w weights/atss_resx101_coco_2x_800size_45dot6_b3a91b36.pkl

python3 tools/test_final.py -n 4 -se 35 -f configs/atss_resx101_final.py -d your_datasetpath

(-n 能抢到几张卡就写几吧qaq)

备注

以上提到的所有方法，无论最终是否采用，代码中均有实现。

感谢

https://github.com/MegEngine/Models/tree/master/official/vision/detection

https://github.com/MegEngine/YOLOX

The project was to detect traffic signs, based on the Megengine framework.

Related tags

Overview

trafficsign

赛题

框架

算法方案

模型训练与测试

备注

感谢

Owner

Material for my PyConDE & PyData Berlin 2022 Talk "5 Steps to Speed Up Your Data-Analysis on a Single Core"

A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

Digitalizing-Prescription-Image - PIRDS - Prescription Image Recognition and Digitalizing System is a OCR make with Tensorflow

[CVPR'21] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

PyTorch implementation of the supervised learning experiments from the paper Model-Agnostic Meta-Learning (MAML)

The coda and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)

PyTorch implementation for the Neuro-Symbolic Sudoku Solver leveraging the power of Neural Logic Machines (NLM)

Benchmarking the robustness of Spatial-Temporal Models

"MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction" (CVPRW 2022) & (Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB)

Scalable machine learning based time series forecasting

SGoLAM - Simultaneous Goal Localization and Mapping

My implementation of DeepMind's Perceiver

Analyzes your GitHub Profile and presents you with a report on how likely you are to become the next MLH Fellow!

Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless Networks.

Vector Neurons: A General Framework for SO(3)-Equivariant Networks

Manifold Alignment for Semantically Aligned Style Transfer

T-LOAM: Truncated Least Squares Lidar-only Odometry and Mapping in Real-Time

GeDML is an easy-to-use generalized deep metric learning library

The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.