
Exploration of yolact model structure

2022-07-18 06:52:00 【fegggye】

YOLACT is an efficient real-time instance segmentation model whose paper was first posted in 2019. I have recently been studying it, hoping to understand it more deeply.

Paper: https://arxiv.org/abs/1904.02689

Source code: https://github.com/dbolya/yolact.git

Related theory

First, a structure diagram of YOLACT:

The feature extraction network (backbone) can be resnet101, resnet50, or even vgg16. It is followed by 3 branches: one outputs the target location, one outputs the mask coefficients, and one outputs the classification confidence. Each target is therefore described by 4 (location) + k (mask coefficients) + c (classification confidence) parameters.

The general detection steps are:

1. Take C3, C4 and C5 from the backbone.

2. Generate P3, P4 and P5 with the FPN network, then generate P6 and P7 from P5.

3. Pass P3 through Protonet to generate k prototypes (proto) of size 138*138.

4. Pass each of P3~P7 through the Prediction Head network, which produces the location (4), mask coefficients (k) and confidence (c) for W*H*a anchors (a is the number of anchors per position):

loc: [None,W*H*a,4]

mask: [None,W*H*a,k]

conf: [None,W*H*a,81]

5. Apply FastNMS to the results above.

6. Combine the FastNMS output with the k 138*138 prototypes from Protonet (weighted sum, cropping, thresholding) to obtain the final detection result.
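To make the shapes in step 4 concrete, here is a minimal sketch that counts how many anchors the Prediction Head produces in total. The feature map sizes (69, 35, 18, 9, 5 for P3~P7 with a 550*550 input) and a = 3 anchors per position are assumptions that the text above does not state; k = 32 and 81 classes match the outputs shown later in this post.

```python
# Assumed values (not stated in the text above): a 550x550 input, which gives
# P3..P7 feature maps of side 69, 35, 18, 9 and 5, and a = 3 anchors per position.
FEATURE_MAP_SIDES = [69, 35, 18, 9, 5]  # P3, P4, P5, P6, P7
ANCHORS_PER_CELL = 3                    # a
NUM_PROTOTYPES = 32                     # k
NUM_CLASSES = 81                        # c, as in conf: [None, W*H*a, 81]


def head_output_shapes(sides, anchors, k, c):
    """Per-image shapes of loc, mask and conf after concatenating all FPN levels."""
    total = sum(side * side * anchors for side in sides)
    return {"loc": (total, 4), "mask": (total, k), "conf": (total, c)}


shapes = head_output_shapes(FEATURE_MAP_SIDES, ANCHORS_PER_CELL,
                            NUM_PROTOTYPES, NUM_CLASSES)
for name, shape in shapes.items():
    print(name, shape)
```

Under these assumptions every candidate carries 4 + 32 + 81 = 117 values, and FastNMS later reduces the candidates to a handful of detections.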

Structure diagram of the Prediction Head network (right side):

Protonet is a convolutional network whose final output is a 138*138*k feature map (the proto). Here is a structure diagram of Protonet:

At first I did not quite understand the mask coefficients. It turns out that they are the weights applied to the k prototypes produced by Protonet. No matter how many targets the network detects, Protonet always outputs k prototypes of size 138*138. Suppose the network detects 8 targets: it then produces 8 vectors of length k. For each target, the k values of its vector are multiplied with the k prototypes one by one and the products are summed, giving 8 combined results (the Assembly step). In the paper's notation the formula is:

M = sigmoid(P * C^T)

P is the proto (138*138*k), C is the mask coefficient vector of one target (1*k), and M is the combined result for that target.
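The weighted combination can be sketched with NumPy on random data. This is only an illustration of the idea, not the repository's code; the names `assemble_masks`, `proto` and `coeffs` are hypothetical, and the sizes (138*138, k = 32, 8 targets) are borrowed from the outputs shown later in this post.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def assemble_masks(proto, coeffs):
    """Combine prototypes into per-target masks.

    proto:  (H, W, k) prototype maps from Protonet
    coeffs: (n, k) mask coefficients, one row per detected target
    returns (H, W, n) masks with values in (0, 1)
    """
    # Matrix product over the k axis = weighted sum of the k prototypes.
    return sigmoid(proto @ coeffs.T)


rng = np.random.default_rng(0)
proto = rng.standard_normal((138, 138, 32))   # stand-in for Protonet output
coeffs = rng.standard_normal((8, 32))         # stand-in for 8 detections
masks = assemble_masks(proto, coeffs)
print(masks.shape)
```

Note that the sigmoid is applied after the weighted sum, so each target gets one 138*138 map of values between 0 and 1.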

Model test

The model uses the resnet50 backbone, i.e. yolact_resnet50_54_800000.pth. Please download it yourself.

1. Load the related modules

from yolact import Yolact
from utils.augmentations import BaseTransform, FastBaseTransform, Resize
from data import cfg, set_cfg, set_dataset
import numpy as np
import torch
import torch.backends.cudnn as cudnn
from torch.autograd import Variable
import os
import matplotlib.pyplot as plt
import cv2
from pylab import *
import matplotlib.patches as patches
%matplotlib inline

2. Define the relevant parameters

CONFIG='yolact_resnet50_config' # YOLACT config that uses the resnet50 backbone
MODEL_PATH='yolact_resnet50_54_800000.pth' # downloaded pre-trained weights
PIC_PATH='dogs.jpeg' # test image path
set_cfg(CONFIG) # function provided by the yolact project to select the config

# Let's take a look at this picture first 
image = plt.imread(PIC_PATH)
plt.imshow(image)

3. Load the model and predict

# Model inference (forward pass)
def evalimage(net:Yolact, path:str, save_path:str=None):
    '''
    net: the YOLACT network
    path: path of the input image
    save_path: not used in this function for now
    returns preds: the model output for this image
    '''
    frame = torch.from_numpy(cv2.imread(path)).cuda().float()
    batch = FastBaseTransform()(frame.unsqueeze(0))
    preds = net(batch)
    return preds

# Run the prediction; preds is the result computed by the model
with torch.no_grad():
    cudnn.benchmark = True
    cudnn.fastest = True
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    net = Yolact()
    net.load_weights(MODEL_PATH)
    net.eval()
    preds=evalimage(net,PIC_PATH)

preds is the output of the model. Let's take a look at what it contains.

# Inspect the prediction results
for key in preds[0].keys():
    if key=='class'or key=='score':
        print(key,':',preds[0][key].shape,'\t',preds[0][key])
    else:
        print(key,':',preds[0][key].shape)

--------------------------------------------------------------------------------------------------------------
mask : torch.Size([8, 32])
proto : torch.Size([138, 138, 32])
score : torch.Size([8]) 	 tensor([0.9464, 0.9457, 0.9455, 0.3396, 0.1167, 0.1127, 0.0791, 0.0610])
class : torch.Size([8]) 	 tensor([16, 16, 16, 21, 21, 18, 16, 21])
box : torch.Size([8, 4])
---------------------------------------------------------------------------------------------------------------

From the output above we can see that the model returns 5 entries:

1) mask: the mask coefficients. The network detected 8 targets, and each target has one 32-dimensional mask coefficient vector.

2) proto: the Protonet output, here 138*138*32, i.e. 32 feature maps of size 138*138.

3) score: the confidence of the 8 targets (the first 3 are above 94%).

4) class: the classification results of the 8 targets (the corresponding class_id).

5) box: the location of the 8 targets (x_min,y_min,x_max,y_max).

From this we know that the network detected 8 targets, 3 of them with a confidence above 90%. Why are there exactly 8 results? Mainly because the network applies Fast NMS to the raw detections internally and only the filtered results are returned. I have not studied Fast NMS in depth.
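For readers who want a rough picture of Fast NMS, the following is a minimal single-class NumPy sketch of the idea described in the paper: sort boxes by score, compute the pairwise IoU matrix, keep only its upper triangle, and drop any box whose largest IoU with a higher-scored box exceeds a threshold. The function names and the threshold of 0.5 are assumptions for illustration; the repository's implementation works per class on GPU tensors and differs in detail.

```python
import numpy as np


def pairwise_iou(boxes):
    """boxes: (n, 4) as x_min, y_min, x_max, y_max. Returns the (n, n) IoU matrix."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def fast_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, in descending score order."""
    order = np.argsort(-scores)                      # highest score first
    iou = np.triu(pairwise_iou(boxes[order]), k=1)   # row i vs. lower-scored column j
    max_iou = iou.max(axis=0)                        # worst overlap with any better box
    return order[max_iou <= iou_threshold]


# Hypothetical boxes: the second overlaps the first heavily, the third is separate.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
keep = fast_nms(boxes, scores)
print(keep)
```

Unlike classic NMS, a box that is itself suppressed can still suppress others here, which is what lets the whole step run as a few matrix operations.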

In the end, instance segmentation relies mainly on 3 of these outputs: proto, mask and box.

4. Model output display

1) Proto display (32 prototypes)

proto=preds[0]['proto'].cpu().numpy()
plt.figure(figsize=(4*1.38*2,8*1.38*2)) # 8 rows, 4 columns
for i in range(32):
    plt.subplot(8, 4, i + 1)
    plt.imshow(proto[:,:,i])
    plt.axis('off')
plt.show()

Here you can see the 32 feature maps. Some of them emphasize the foreground (row 1 column 2, row 5 column 3), some the background (row 2 column 4, row 8 column 4), some the left side (rows 1 and 2, column 3), and some the right side (row 4 column 3). Every mask YOLACT produces is generated from these 32 (k) feature maps with a different weighting, and the weighting is determined by the mask coefficients.

2) Model output display

# Convert the model outputs to numpy format
box=preds[0]['box'].cpu().numpy()
score=preds[0]['score'].cpu().numpy()
class_id=preds[0]['class'].cpu().numpy()
mask=preds[0]['mask'].cpu().numpy()
##############################
de_num=mask.shape[0] # number of detected targets
col=4 # show 4 columns
row=int(np.ceil(de_num/col)) # number of rows; plt.subplot needs an integer
plt.figure(figsize=(col*1.38*2,row*1.38*2))
for j in range(de_num):
    result=np.sum(proto*mask[j],2) # weight each proto by its mask coefficient, then sum over k
    result=1/(1+np.exp(-result)) # sigmoid applied to the summed result
    plt.subplot(row,col,j+1)
    title='class_id:'+str(class_id[j])+' '+str(score[j])
    plt.title(title,color='red',fontsize='large',fontweight='bold')
    plt.imshow(result)
    plt.axis('off')
    # Draw the bounding box; box holds relative coordinates, scaled here to the 138*138 map
    currentAxis=plt.gca()
    x_min=int(box[j][0]*138)
    y_min=int(box[j][1]*138)
    x_max=int(box[j][2]*138)
    y_max=int(box[j][3]*138)
    rect=patches.Rectangle((x_min, y_min),x_max-x_min,y_max-y_min,linewidth=1,edgecolor='r',facecolor='none')
    currentAxis.add_patch(rect)
plt.show()

This displays the 8 detected locations together with the combined Protonet results. Cropping each result to its box and thresholding it then gives the final binary mask.
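The cropping and thresholding step can be sketched as follows. This is a simplified illustration, not the repository's code: the helper name `crop_and_threshold` is hypothetical, the box is assumed to be in relative coordinates (as in the plotting code above), and the threshold of 0.5 is an assumed value.

```python
import numpy as np


def crop_and_threshold(mask, box, threshold=0.5):
    """Zero out everything outside the box, then binarize.

    mask: (H, W) assembled mask with values in [0, 1]
    box:  (x_min, y_min, x_max, y_max) in relative coordinates
    returns a uint8 array of 0/1 with the same shape as mask
    """
    h, w = mask.shape
    x_min, y_min, x_max, y_max = box
    # Convert relative coordinates to pixel indices and clamp to the image.
    c0, c1 = max(int(x_min * w), 0), min(int(np.ceil(x_max * w)), w)
    r0, r1 = max(int(y_min * h), 0), min(int(np.ceil(y_max * h)), h)
    cropped = np.zeros_like(mask)
    cropped[r0:r1, c0:c1] = mask[r0:r1, c0:c1]
    return (cropped > threshold).astype(np.uint8)


# Hypothetical input: a uniformly confident mask and a centered box.
demo_mask = np.full((138, 138), 0.9)
demo_box = (0.25, 0.25, 0.75, 0.75)
binary = crop_and_threshold(demo_mask, demo_box)
print(binary.shape, binary.sum())
```

With real data, `demo_mask` would be one of the `result` arrays from the loop above and `demo_box` the matching row of `box`.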
