
PyTorch depthwise separable convolution and MobileNet_v1

2022-07-18 21:10:00 【Jiang Junze】

1. Depthwise separable convolution
2. Advantages and innovations
3. Network structure
PyTorch implementation

1. Depthwise separable convolution

Depthwise separable convolution introduces a new idea: each input channel is convolved with its own kernel, and the ordinary convolution operation is decomposed into two steps, a depthwise step and a pointwise step.

The ordinary convolution process

Suppose the input has shape $N \times H \times W \times C$, where $N$ is the batch size, and there are $k$ convolution kernels of size $3 \times 3$. With pad=1 and stride=1, an ordinary convolution produces an output of shape $N \times H \times W \times k$.
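
The reason pad=1 and stride=1 preserve the spatial size follows from the standard output-size formula for a convolution. As a short worked check, write $f$ for the kernel size, $p$ for the padding and $s$ for the stride (these three symbols are introduced here only for illustration):

$$H_{out} = \left\lfloor \frac{H + 2p - f}{s} \right\rfloor + 1 = \left\lfloor \frac{H + 2 \cdot 1 - 3}{1} \right\rfloor + 1 = H$$

The same holds for $W$, so only the channel dimension changes, from $C$ to $k$.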

The depthwise process

Depthwise means splitting the $N \times H \times W \times C$ input into $\text{group} = C$ groups and applying a $3 \times 3$ convolution to each group separately. This collects the spatial features of each channel, i.e. the depthwise features.

The pointwise process

Pointwise means applying $k$ ordinary $1 \times 1$ convolutions to the $N \times H \times W \times C$ feature map produced by the depthwise step. This collects the features at each point across channels, i.e. the pointwise features. The final output of depthwise + pointwise is also $N \times H \times W \times k$.
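
The two steps can be sketched in plain Python, without any library, so that every loop is visible. This is only an illustrative sketch, not the implementation used later in this article: the function names `depthwise_conv3x3` and `pointwise_conv1x1` are hypothetical, the batch dimension is omitted, and the data is laid out as C x H x W nested lists rather than H x W x C.

```python
def depthwise_conv3x3(x, kernels):
    """Convolve each channel with its own 3x3 kernel (pad=1, stride=1).

    x: C x H x W nested lists; kernels: C kernels, each 3 x 3.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):                      # channels never mix here
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in range(3):
                    for dj in range(3):
                        ii, jj = i + di - 1, j + dj - 1
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding
                            acc += x[c][ii][jj] * kernels[c][di][dj]
                out[c][i][j] = acc
    return out


def pointwise_conv1x1(x, weights):
    """Mix channels at every pixel with k 1x1 kernels.

    x: C x H x W nested lists; weights: k x C.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[c] * x[c][i][j] for c in range(C)) for j in range(W)]
             for i in range(H)]
            for w in weights]


# Toy input with C=2, H=W=4: channel 0 is all ones, channel 1 is all twos.
x = [[[1.0] * 4 for _ in range(4)], [[2.0] * 4 for _ in range(4)]]
box = [[1.0] * 3 for _ in range(3)]                 # sums the 3x3 neighbourhood
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

dw = depthwise_conv3x3(x, [box, identity])          # still 2 x 4 x 4
y = pointwise_conv1x1(dw, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # k=3

print(len(y), len(y[0]), len(y[0][0]))  # 3 4 4
```

The depthwise step leaves the channel count at C and only looks at spatial neighbourhoods; the pointwise step leaves H and W untouched and only combines channels, which is how the output ends up with k channels.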

2. Advantages and innovations

Depthwise + pointwise can be approximately regarded as a single convolution layer:

Ordinary convolution: 3x3 Conv + BN + ReLU
MobileNet convolution: 3x3 Depthwise Conv + BN + ReLU, followed by 1x1 Pointwise Conv + BN + ReLU

Computational acceleration

Fewer parameters

Suppose the input has 3 channels and the output must have 256 channels. There are two ways to do this:

Apply a 3×3 convolution with 256 kernels directly. The parameter count is 3×3×3×256 = 6,912.
Use the DW operation and complete it in two steps. The parameter count is 3×3×3 + 3×1×1×256 = 795 (one 3×3 kernel for each of the 3 input feature maps, then 256 pointwise 1×1 kernels over 3 channels). The depth multiplier of the depthwise convolution is usually taken as 1.
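
The two counts above can be reproduced with a few lines of arithmetic. This is a minimal sketch with hypothetical helper names (`conv_params`, `separable_params`); it ignores bias terms, matching the `bias=False` setting in the PyTorch code of this article, and assumes a depth multiplier of 1.

```python
def conv_params(c_in, c_out, kernel=3):
    """Weights of an ordinary kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def separable_params(c_in, c_out, kernel=3):
    """Weights of a depthwise convolution plus a pointwise 1x1 convolution."""
    depthwise = kernel * kernel * c_in   # one kernel per input channel
    pointwise = c_in * 1 * 1 * c_out     # c_out kernels of size 1x1 over c_in channels
    return depthwise + pointwise


standard = conv_params(3, 256)
separable = separable_params(3, 256)
print(standard, separable)  # 6912 795
```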

Fewer multiplications

Compare the number of multiplications for each kind of convolution:

Ordinary convolution: $H \times W \times C \times k \times 3 \times 3$
Depthwise: $H \times W \times C \times 3 \times 3$
Pointwise: $H \times W \times C \times k$

Splitting the convolution into depthwise + pointwise therefore compresses the computation of an ordinary convolution to the following fraction:
$\frac{\text{depthwise}+\text{pointwise}}{\text{conv}}=\frac{H \times W \times C \times 3 \times 3+H \times W \times C \times k}{H \times W \times C \times k \times 3 \times 3}=\frac{1}{k}+\frac{1}{3 \times 3}$
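
As a quick numerical check of this ratio, the sketch below evaluates both sides exactly with `fractions.Fraction`. The layer size (H = W = 112, C = 32, k = 64) is an arbitrary illustrative choice and the helper names are hypothetical.

```python
from fractions import Fraction


def conv_mults(h, w, c, k, f=3):
    """Multiplications of an ordinary f x f convolution (stride 1, same size)."""
    return h * w * c * k * f * f


def separable_mults(h, w, c, k, f=3):
    """Multiplications of the depthwise step plus the pointwise step."""
    return h * w * c * f * f + h * w * c * k


h, w, c, k = 112, 112, 32, 64   # illustrative values only
ratio = Fraction(separable_mults(h, w, c, k), conv_mults(h, w, c, k))
print(ratio)  # 73/576
```

Here 73/576 equals 1/64 + 1/9, roughly 0.127, so for this layer the separable form needs about one eighth of the multiplications. Since $\frac{1}{k}$ is small for wide layers, the saving is dominated by the $\frac{1}{3 \times 3}$ term.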

Separating channels and regions

An ordinary convolution considers channels and spatial regions at the same time. A depthwise separable convolution first considers only the regions and then the channels, so the two are handled separately.

3. Network structure

MobileNet v1 uses depthwise separable convolutions for acceleration. Its structure is as follows:

First, a standard 3×3 convolution layer with stride 2 extracts features.
Then a series of depthwise separable convolutions (DW + PW convolutions) extract further features.
Finally, an average pooling layer, a fully connected layer, and a softmax function produce the final output.
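
To see how the feature map shrinks through this structure, the sketch below traces the spatial size layer by layer. It uses the stride schedule of the PyTorch implementation in this article and assumes a 224×224 input, which the article itself does not state; `conv_out_size` and `trace_sizes` are hypothetical helper names. Only the 3×3 layers are listed, because the 1×1 pointwise convolutions (stride 1, no padding) do not change the spatial size.

```python
def conv_out_size(size, kernel=3, stride=1, pad=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1


# Strides of the 3x3 layers, in order.
STRIDES = [
    2,                 # standard 3x3 convolution
    1, 2, 1, 2, 1,     # depthwise convolutions of the first five blocks
    2, 1, 1, 1, 1, 1,  # blocks with 512 output channels
    2, 1,              # blocks with 1024 output channels
]


def trace_sizes(size, strides=STRIDES):
    """Return the spatial size before the first layer and after each layer."""
    sizes = [size]
    for s in strides:
        size = conv_out_size(size, stride=s)
        sizes.append(size)
    return sizes


sizes = trace_sizes(224)  # 224 is an assumed input resolution
print(sizes[-1])  # 7
```

Five stride-2 layers halve the resolution five times, so a 224×224 input reaches the average pooling layer as a 7×7 map.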

PyTorch implementation

import torch
import torch.nn as nn


def conv_bn(in_channel, out_channel, stride=1):
    """Standard convolution block: Conv + BN + activation."""
    return nn.Sequential(
        nn.Conv2d(in_channel, out_channel, 3, stride, 1, bias=False),
        nn.BatchNorm2d(out_channel),
        nn.ReLU6(inplace=True)
    )
    
def conv_dsc(in_channel, out_channel, stride=1):
    """Depthwise separable convolution: DW + BN + activation, then 1x1 Conv + BN + activation."""
    return nn.Sequential(
        nn.Conv2d(in_channel, in_channel, 3, stride, 1, groups=in_channel, bias=False),
        nn.BatchNorm2d(in_channel),
        nn.ReLU6(inplace=True),

        nn.Conv2d(in_channel, out_channel, 1, 1, 0, bias=False),
        nn.BatchNorm2d(out_channel),
        nn.ReLU6(inplace=True),
    )

class MobileNetV1(nn.Module):
    def __init__(self, in_dim=3, num_classes=1000):
        super(MobileNetV1, self).__init__()
        self.num_classes = num_classes
        # Spatial sizes in the comments below assume a 224x224 input.
        self.stage1 = nn.Sequential(
            # 224 -> 112
            conv_bn(in_dim, 32, 2),
            conv_dsc(32, 64, 1),

            # 112 -> 56
            conv_dsc(64, 128, 2),
            conv_dsc(128, 128, 1),

            # 56 -> 28
            conv_dsc(128, 256, 2),
            conv_dsc(256, 256, 1),
        )

        # 28 -> 14
        self.stage2 = nn.Sequential(
            conv_dsc(256, 512, 2),
            conv_dsc(512, 512, 1),
            conv_dsc(512, 512, 1),
            conv_dsc(512, 512, 1),
            conv_dsc(512, 512, 1),
            conv_dsc(512, 512, 1),
        )

        # 14 -> 7
        self.stage3 = nn.Sequential(
            conv_dsc(512, 1024, 2),
            conv_dsc(1024, 1024, 1),
        )

        # Global average pooling to 1x1, followed by the classifier.
        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(1024, self.num_classes)

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.avg(x)
        x = torch.flatten(x, 1)  # (N, 1024, 1, 1) -> (N, 1024)
        x = self.fc(x)
        # Raw logits are returned; apply softmax outside the model if probabilities are needed.
        return x