A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection

Overview

Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection

1. 介绍

image

用以替代 NMS,在所有 bbox 中挑选出最优的集合。 NMS 仅考虑了 bbox 的得分,然后根据 IOU 来去除重叠的 bbox。而 Confluence 则是利用曼哈顿距离作为 bbox 之间的重合度,并根据置信度加权的曼哈顿距离还作为最优 bbox 的选择依据。

2. 算法原理

2.1 曼哈顿距离

两点的曼哈顿距离就是坐标值插的 L1 范数:

image

推广到两个 bbox 对的哈曼顿距离则为:

image

该算法便是以曼哈顿距离作为两个 bbox 的重合度,曼哈顿距离小于一定值的的 bbox 则被认为是一个 cluster。

2.2 归一化

因为 bbox 有个各样的 size 和 position,所以直接计算曼哈顿距离就没有可比性,没有标准的度量。所以需要对其进行归一化:

image

2.3 置信度加权曼哈顿距离

NMS在去除重合 bbox 是仅考虑其置信度的高低,Condluence 则同时考虑了曼哈顿距离和置信度,构成一个置信度加权曼哈顿距离:

image

3. 算法实现

image

算法:

(1)针对每个类别挑出属于该类别的 bbox 集合 B

(2)遍历 B 中所有的 bbox bi,并计算 bi 和其他 boox的 曼哈顿距离 p,并归一化

2.1 选出 p < 2 的集合,作为一个 cluster,并计算加权曼哈顿距离 wp。 

2.2 在该 cluster 中挑选出最小的 wp 作为 bi 的 wp。 

(3)遍历完毕后,挑出 wp 最小的 bi 作为最优 bbox,添加进最终结果集合中,并将其从 B 去除

(4)把与最优 bbox 的曼哈顿距离小于阈值 MD 的的 bbox 从 B 中去除

(5)不断重复 (2) - (4),每次都选出一个最优 bbox,知道 B 为空

注意:

(1)原文伪代码第 5 行:optimalConfuence 初始化成一个比较大的值就可以,不一定要是 Ip

(2)原文伪代码第 12 行:应该是 Proximity / si

4. 实验结果

image

5. 代码解析

5.1 YOLOv3/4 的后处理

这个接口可以直接处理 YOLOv3/4 的 yolo 层的输出进行后处理

confluence_process(prediction, conf_thres=0.1, wp_thres=0.6)

支持多标签和单标签,并把数据重组后进行 confluence/NMS 处理

# Detections matrix nx6 (xyxy, conf, cls)
if multi_label:
    i, j = (x[:, 5:] > conf_thres).nonzero().t()
    x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
else:  # best class only
    conf, j = x[:, 5:].max(1, keepdim=True)
    x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

5.2 Confluence 算法

confluence(prediction, class_num, wp_thres=0.6)

给所有目标添加上序号

index = np.arange(0, len(prediction), 1).reshape(-1,1)
infos = np.concatenate((prediction, index), 1)

不同类别单独处理,并遍历所有的剩余目标集合 B,直到集合为空,对应上面伪代码的(1)-(2)

for c in range(class_num):       
    pcs = infos[infos[:, 5] == c]             
    while (len(pcs)):                      
        n = len(pcs)       
        xs = pcs[:, [0, 2]]
        ys = pcs[:, [1, 3]]             
        ps = []        
        # 遍历 pcs,计算每一个box 和其余 box 的 p 值,然后聚类成簇,再根据 wp 挑出 best
        confluence_min = 10000
        best = None
        for i, pc in enumerate(pcs):

计算所有目标与其他目标的曼和顿距离 p 和加权曼哈顿距离 wp,p < 2 的目标作为一个 cluster,其中最小的 wp 作为该 cluster 的 wp

index_other = [j for j in range(n) if j!= i]
x_t = xs[i]
x_t = np.tile(x_t, (n-1, 1))
x_other = xs[index_other]
x_all = np.concatenate((x_t, x_other), 1)
.
.
.
# wp
wp = p / pc[4]
wp = wp[p < 2]

if (len(wp) == 0):
    value = 0
else:
    value = wp.min()

选出最小的 wp,确定目标

# select the bbox which has the smallest wp as the best bbox
if (value < confluence_min):
   confluence_min = value
   best = i  

然后把与目标的曼哈顿距离小于阈值的目标和本身都从集合 B 中去除

keep.append(int(pcs[best][6])) 
if (len(ps) > 0):               
    p = ps[best]
    index_ = np.where(p < wp_thres)[0]
    index_ = [i if i < best else i +1 for i in index_]
else:
    index_ = []
    
# delect the bboxes whose Manhattan Distance is below the predefined MD
index_eff = [j for j in range(n) if (j != best and j not in index_)]            
pcs = pcs[index_eff]

最后继续重复遍历集合 B,直到集合为空。

仓库里我放了一张测试照片和原始检测结果,大家可以直接用来调试 confluence 函数。

Credits:

https://arxiv.org/pdf/2012.00257.pdf

ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

ViViT is a collection of numerical tricks to efficiently access curvature from the generalized Gauss-Newton (GGN) matrix based on its low-rank structure. Provided functionality includes computing

Felix Dangel 12 Dec 08, 2022
An Artificial Intelligence trying to drive a car by itself on a user created map

An Artificial Intelligence trying to drive a car by itself on a user created map

Akhil Sahukaru 17 Jan 13, 2022
Unified file system operation experience for different backend

megfile - Megvii FILE library Docs: http://megvii-research.github.io/megfile megfile provides a silky operation experience with different backends (cu

MEGVII Research 76 Dec 14, 2022
Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

Implicit Internal Video Inpainting Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation paper | project

202 Dec 30, 2022
Code used to generate the results appearing in "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Train longer, generalize better - Big batch training This is a code repository used to generate the results appearing in "Train longer, generalize bet

Elad Hoffer 145 Sep 16, 2022
Learning to Segment Instances in Videos with Spatial Propagation Network

Learning to Segment Instances in Videos with Spatial Propagation Network This paper is available at the 2017 DAVIS Challenge website. Check our result

Jingchun Cheng 145 Sep 28, 2022
Level Based Customer Segmentation

level_based_customer_segmentation Level Based Customer Segmentation Persona Veri Seti kullanılarak müşteri segmentasyonu yapılmıştır. KOLONLAR : PRICE

Buse Yıldırım 6 Dec 21, 2021
An open source Jetson Nano baseboard and tools to design your own.

My Jetson Nano Baseboard This basic baseboard gives the user the foundation and the flexibility to design their own baseboard for the Jetson Nano. It

NVIDIA AI IOT 57 Dec 29, 2022
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

Phil Wang 4.4k Jan 03, 2023
Pytorch implementation of Implicit Behavior Cloning.

Implicit Behavior Cloning - PyTorch (wip) Pytorch implementation of Implicit Behavior Cloning. Install conda create -n ibc python=3.8 pip install -r r

Kevin Zakka 49 Dec 25, 2022
Repository containing the PhD Thesis "Formal Verification of Deep Reinforcement Learning Agents"

Getting Started This repository contains the code used for the following publications: Probabilistic Guarantees for Safe Deep Reinforcement Learning (

Edoardo Bacci 5 Aug 31, 2022
Official repo for QHack—the quantum machine learning hackathon

Note: This repository has been frozen while we consider the submissions for the QHack Open Hackathon. We hope you enjoyed the event! Welcome to QHack,

Xanadu 118 Jan 05, 2023
Neural Scene Graphs for Dynamic Scene (CVPR 2021)

Implementation of Neural Scene Graphs, that optimizes multiple radiance fields to represent different objects and a static scene background. Learned representations can be rendered with novel object

151 Dec 26, 2022
This repo is about to create the Streamlit application for given ML model.

HR-Attritiion-using-Streamlit This repo is about to create the Streamlit application for given ML model. Problem Statement: Managing peoples at workpl

Pavan Giri 0 Dec 10, 2021
Hcpy - Interface with Home Connect appliances in Python

Interface with Home Connect appliances in Python This is a very, very beta inter

Trammell Hudson 116 Dec 27, 2022
The Turing Change Point Detection Benchmark: An Extensive Benchmark Evaluation of Change Point Detection Algorithms on real-world data

Turing Change Point Detection Benchmark Welcome to the repository for the Turing Change Point Detection Benchmark, a benchmark evaluation of change po

The Alan Turing Institute 85 Dec 28, 2022
A Broad Study on the Transferability of Visual Representations with Contrastive Learning

A Broad Study on the Transferability of Visual Representations with Contrastive Learning This repository contains code for the paper: A Broad Study on

Ashraful Islam 29 Nov 09, 2022
Process JSON files for neural recording sessions using Medtronic's BrainSense Percept PC neurostimulator

percept_processing This code processes JSON files for streamed neural data using Medtronic's Percept PC neurostimulator with BrainSense Technology for

Maria Olaru 3 Jun 06, 2022
Yas CRNN model training - Yet Another Genshin Impact Scanner

Yas-Train Yet Another Genshin Impact Scanner 又一个原神圣遗物导出器 介绍 该仓库为 Yas 的模型训练程序 相关资料 MobileNetV3 CRNN 使用 假设你会设置基本的pytorch环境。 生成数据集 python main.py gen 训练

wormtql 18 Jan 08, 2023
A PyTorch implementation of "From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network" (ICCV2021)

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network The official code of VisionLAN (ICCV2021). VisionLAN successfully a

81 Dec 12, 2022