Exploration of yolact model structure
2022-07-18 06:52:00 【fegggye】
YOLACT is an efficient real-time instance segmentation model published in 2019. I have been studying it recently and hope to gain a deeper understanding of it.
Address of thesis :https://arxiv.org/abs/1904.02689
Source code :https://github.com/dbolya/yolact.git
Background theory
First, the YOLACT architecture diagram:

The feature-extraction backbone can be resnet101, resnet50, or even vgg16. It is followed by 3 branches: one outputs the target location, one outputs the mask coefficients, and one outputs the classification confidence. Each target is therefore described by 4 (location) + k (mask coefficients) + c (classification confidences) parameters.
The general detection steps are:
1. Take C3, C4 and C5 from the backbone;
2. Generate P3, P4 and P5 with the FPN, then generate P6 and P7 from P5;
3. Feed P3 to Protonet to generate k prototypes (proto) of size 138*138;
4. Feed each of P3~P7 to the Prediction Head, which produces W*H*a (a is the number of anchors) locations (4), mask coefficients (k) and confidences (c) per level:
loc:[None,W*H*a,4]
mask:[None,W*H*a,k]
conf:[None,W*H*a,81]
5. Apply Fast NMS to the results above;
6. Combine the Fast NMS output with the k 138*138 prototypes from Protonet (weighted sum, cropping, thresholding) to obtain the final detection result.
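To make the shapes in step 4 concrete, here is a minimal sketch that counts the predictions produced by the five FPN levels. The feature-map side lengths (69, 35, 18, 9, 5 for a 550*550 input) and a=3 anchors per location are assumptions based on the default YOLACT configuration, not values stated in this post; k=32 is likewise assumed here, and 81 is the class count from the conf shape above (80 COCO classes plus background).

```python
# Assumed feature-map side lengths of P3..P7 for a 550x550 input
# (default YOLACT config; these numbers are an assumption, not from the post).
FPN_SIZES = {"P3": 69, "P4": 35, "P5": 18, "P6": 9, "P7": 5}
ANCHORS_PER_CELL = 3   # a: assumed number of anchors per location
K = 32                 # assumed number of prototypes / mask coefficients
NUM_CLASSES = 81       # 80 COCO classes + background


def count_priors(sizes, anchors_per_cell):
    """Total number of anchors: sum over levels of W*H*a."""
    return sum(side * side * anchors_per_cell for side in sizes.values())


def head_output_shapes(sizes, anchors_per_cell, k, num_classes):
    """Shapes of the concatenated loc / mask / conf outputs for one image."""
    n = count_priors(sizes, anchors_per_cell)
    return {"loc": (n, 4), "mask": (n, k), "conf": (n, num_classes)}


shapes = head_output_shapes(FPN_SIZES, ANCHORS_PER_CELL, K, NUM_CLASSES)
for name, shape in shapes.items():
    print(name, shape)
```

Under these assumptions every image yields 19248 candidate detections before NMS, which is why the later filtering step matters so much.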
Structure of the Prediction Head network (right side of the figure):

Protonet is a convolutional network whose final output is a 138*138*k feature map (the prototypes, proto). Its structure is shown below:

The mask coefficients confused me at first. It turns out that they are the weights applied to the k prototypes produced by Protonet. No matter how many targets the network detects, Protonet always outputs k prototypes of size 138*138, which is easy to understand. Suppose the network detects 8 targets: it then produces 8 vectors of length k. The k values of each vector are multiplied with the k prototypes respectively and the products are summed, giving 8 combined results (the Assembly step). The formula is as follows.

P is the prototype tensor (138*138*k), C is the mask coefficient vector of one target (1*k), and M is the assembled result for that target.
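The assembly step can be sketched in NumPy as follows. This is a minimal illustration on random stand-in data, assuming the YOLACT formulation M = sigmoid(P C^T); the function names `sigmoid` and `assemble_masks` are hypothetical helpers, and k = 32 with 8 targets are illustrative values.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def assemble_masks(proto, coeffs):
    """Combine prototypes into per-target masks.

    proto:  (H, W, k) prototype maps P
    coeffs: (n, k) mask coefficients C, one row per target
    returns (H, W, n) masks M = sigmoid(P @ C^T)
    """
    return sigmoid(proto @ coeffs.T)


# Random stand-in data (illustrative shapes: k = 32 prototypes, 8 targets).
rng = np.random.default_rng(0)
proto = rng.standard_normal((138, 138, 32))
coeffs = rng.standard_normal((8, 32))

masks = assemble_masks(proto, coeffs)
print(masks.shape)

# Same result for one target, written as an explicit weighted sum over k.
manual = sigmoid(np.sum(proto * coeffs[0], axis=2))
print(np.allclose(masks[:, :, 0], manual))
```

Writing the assembly as a single matrix product is what keeps this step cheap: all targets are combined in one operation regardless of how many were detected.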
Model test
The test uses the model with the resnet50 backbone, i.e. the weights file yolact_resnet50_54_800000.pth; please download it yourself.
1. Load the required modules
from yolact import Yolact
from utils.augmentations import BaseTransform, FastBaseTransform, Resize
from data import cfg, set_cfg, set_dataset
import numpy as np
import torch
import torch.backends.cudnn as cudnn
from torch.autograd import Variable
import os
import matplotlib.pyplot as plt
import cv2
from pylab import *
import matplotlib.patches as patches
%matplotlib inline
2. Define the relevant parameters
CONFIG='yolact_resnet50_config'  # YOLACT config that selects the resnet50 backbone
MODEL_PATH='yolact_resnet50_54_800000.pth'  # downloaded pretrained weights
PIC_PATH='dogs.jpeg'  # test image path
set_cfg(CONFIG)  # function provided by the YOLACT project to load the named config
# Let's take a look at the test image first
image = plt.imread(PIC_PATH)
plt.imshow(image)
3. Load the model and predict
# Forward (inference) pass of the model
def evalimage(net:Yolact, path:str, save_path:str=None):
    '''
    net: the YOLACT network
    path: path of the input image
    save_path: not used in this function for now
    returns preds: the model output for the single input image
    '''
    frame = torch.from_numpy(cv2.imread(path)).cuda().float()
    batch = FastBaseTransform()(frame.unsqueeze(0))
    preds = net(batch)
    return preds
# Run the prediction; preds holds the model output
with torch.no_grad():
    cudnn.benchmark = True
    cudnn.fastest = True
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    net = Yolact()
    net.load_weights(MODEL_PATH)
    net.eval()
    preds=evalimage(net,PIC_PATH)
preds is the output of the model. Let's take a look at what it contains.
# Inspect the prediction results
for key in preds[0].keys():
    if key=='class' or key=='score':
        print(key,':',preds[0][key].shape,'\t',preds[0][key])
    else:
        print(key,':',preds[0][key].shape)
--------------------------------------------------------------------------------------------------------------
mask : torch.Size([8, 32])
proto : torch.Size([138, 138, 32])
score : torch.Size([8]) tensor([0.9464, 0.9457, 0.9455, 0.3396, 0.1167, 0.1127, 0.0791, 0.0610])
class : torch.Size([8]) tensor([16, 16, 16, 21, 21, 18, 16, 21])
box : torch.Size([8, 4])
---------------------------------------------------------------------------------------------------------------
From the output above, the model returns 5 items:
1) mask: the mask coefficients. The network detected 8 targets, and each target has one 32-dimensional mask coefficient vector
2) proto: the Protonet output, here 138*138*32, i.e. 32 feature maps of size 138*138
3) score: the confidence of the 8 targets (the first 3 are above 94%)
4) class: the classification results of the 8 targets (the corresponding class_id)
5) box: the location of the 8 targets (x_min,y_min,x_max,y_max)
From this we know that the network detected 8 targets, 3 of them with high confidence (above 90%). Why are there exactly 8 detections? Mainly because the network applies Fast NMS to the candidate detections internally and then filters the merged results. I have not studied Fast NMS in depth.
In the end, instance segmentation relies mainly on 3 of these outputs: proto, mask and box.
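For readers who want an idea of what the Fast NMS step does, here is a minimal single-class sketch in NumPy. It is not the YOLACT source code: the helper names `pairwise_iou` and `fast_nms` and the IoU threshold of 0.5 are assumptions for illustration. The idea is to sort boxes by score, compute all pairwise IoUs at once, and drop every box whose IoU with any higher-scoring box exceeds the threshold. Unlike classic NMS, a box that is itself suppressed can still suppress others, which is the approximation that makes the procedure fully vectorizable.

```python
import numpy as np


def pairwise_iou(boxes):
    """IoU between every pair of boxes; boxes is (n, 4) as x_min, y_min, x_max, y_max."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / union


def fast_nms(boxes, scores, iou_threshold=0.5):
    """Single-class Fast NMS sketch; returns indices of kept boxes, best first."""
    order = np.argsort(-scores)        # sort by descending score
    iou = pairwise_iou(boxes[order])
    iou = np.triu(iou, k=1)            # keep only IoU with higher-scoring boxes
    max_iou = iou.max(axis=0)          # per box: largest overlap with a better box
    return order[max_iou <= iou_threshold]


boxes = np.array([
    [0.10, 0.10, 0.50, 0.50],
    [0.12, 0.12, 0.52, 0.52],   # near-duplicate of the first box
    [0.60, 0.60, 0.90, 0.90],
])
scores = np.array([0.90, 0.80, 0.70])
keep = fast_nms(boxes, scores)
print(keep)
```

In this toy input the second box overlaps the first almost completely and is removed, while the third box does not overlap anything and survives.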
4. Model output display
1) Prototype display (32 prototypes)
proto=preds[0]['proto'].cpu().numpy()
plt.figure(figsize=(4*1.38*2,8*1.38*2))  # 8 rows, 4 columns
for i in range(32):
    plt.subplot(8, 4, i + 1)
    plt.imshow(proto[:,:,i])
    plt.axis('off')
plt.show()
Here you can see the 32 prototype maps. Some emphasize the foreground (row 1 column 2, row 5 column 3), some emphasize the background (row 2 column 4, row 8 column 4), some emphasize the left side (rows 1 and 2, column 3), and some emphasize the right side (row 4 column 3). Every YOLACT mask is generated from these 32 (k) prototype maps with a different weighting, and the weighting is determined by the mask coefficients.
2) Assembled mask display
# Convert the model outputs to numpy arrays
box=preds[0]['box'].cpu().numpy()
score=preds[0]['score'].cpu().numpy()
class_id=preds[0]['class'].cpu().numpy()
mask=preds[0]['mask'].cpu().numpy()
##############################
de_num=mask.shape[0]  # number of detected targets
col=4  # show 4 columns
row=int(np.ceil(de_num/col))  # number of rows (plt.subplot needs an integer)
plt.figure(figsize=(col*1.38*2,row*1.38*2))
for j in range(de_num):
    result=proto*mask[j]  # weight each prototype by its mask coefficient
    result=np.sum(result,2)  # sum over the k prototypes
    result=1/(1+np.exp(-result))  # sigmoid, applied after the weighted sum as in the assembly formula
    plt.subplot(row,col,j+1)
    title='class_id:'+str(class_id[j])+' '+str(score[j])
    plt.title(title,color='red',fontsize='large',fontweight='bold')
    plt.imshow(result)
    plt.axis('off')
    # Draw the bounding box; box holds relative coordinates, scaled here to the 138*138 map
    currentAxis=plt.gca()
    x_min=int(box[j][0]*138)
    y_min=int(box[j][1]*138)
    x_max=int(box[j][2]*138)
    y_max=int(box[j][3]*138)
    rect=patches.Rectangle((x_min, y_min),x_max-x_min,y_max-y_min,linewidth=1,edgecolor='r',facecolor='none')
    currentAxis.add_patch(rect)
plt.show()
The 8 detections are displayed with their boxes and the assembled Protonet results. Cropping each result to its box and thresholding it gives the final binary mask.
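The cropping and thresholding step can be sketched as follows. This is a minimal illustration on a toy mask, not the YOLACT implementation: the helper name `crop_and_threshold` and the threshold of 0.5 are assumptions, and the box is taken to be in relative [0, 1] coordinates, consistent with the multiplication by 138 in the display code.

```python
import numpy as np


def crop_and_threshold(mask, box, threshold=0.5):
    """Zero everything outside the box, then binarize.

    mask: (H, W) assembled mask with values in [0, 1]
    box:  (x_min, y_min, x_max, y_max) in relative [0, 1] coordinates
    """
    h, w = mask.shape
    # Scale to pixel coordinates and clamp to the mask borders.
    x_min, y_min = max(int(box[0] * w), 0), max(int(box[1] * h), 0)
    x_max, y_max = min(int(box[2] * w), w), min(int(box[3] * h), h)
    cropped = np.zeros_like(mask)
    cropped[y_min:y_max, x_min:x_max] = mask[y_min:y_max, x_min:x_max]
    return (cropped > threshold).astype(np.uint8)


# Toy 138x138 mask with a high response everywhere,
# so only the crop limits the result.
toy_mask = np.full((138, 138), 0.9)
toy_box = np.array([0.25, 0.25, 0.75, 0.75])
binary = crop_and_threshold(toy_mask, toy_box)
print(binary.shape, int(binary.sum()))
```

With real outputs, `mask` would be one of the assembled results and `box` the matching row of the box output; the binary map can then be resized to the original image size for overlaying.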