A roundup of 7 vision MLPs (Part 1)
2022-07-19 05:47:00 【byzy】
If the MSA (multi-head self-attention) part is removed from a vision Transformer, can the performance stay at the same level? In other words, is it feasible to solve vision tasks with MLPs alone? These questions motivate vision MLPs.
1. EANet (External Attention)
Paper: https://arxiv.org/pdf/2105.02358.pdf



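For an input feature map $F \in \mathbb{R}^{N \times d}$, external attention is computed as

$$A = \mathrm{Norm}(F M_k^{T}), \qquad F_{out} = A M_v,$$

where $M_k$ and $M_v$ are learnable parameters, independent of the input. Norm is double normalization (applied over columns and rows respectively). With $\tilde{\alpha} = F M_k^{T}$:

$$\hat{\alpha}_{i,j} = \frac{\exp(\tilde{\alpha}_{i,j})}{\sum_{k} \exp(\tilde{\alpha}_{k,j})}, \qquad \alpha_{i,j} = \frac{\hat{\alpha}_{i,j}}{\sum_{k} \hat{\alpha}_{i,k}}$$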
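A minimal NumPy sketch of external attention with double normalization may help make the shapes concrete. The sizes `N`, `d`, `S` and the random initialisation are illustrative assumptions, and the multi-head and projection layers of the real model are left out.

```python
import numpy as np

def double_norm(attn, eps=1e-9):
    """Softmax over the token axis (columns), then L1-normalise each row."""
    attn = attn - attn.max(axis=0, keepdims=True)        # numerical stability
    attn = np.exp(attn)
    attn = attn / attn.sum(axis=0, keepdims=True)        # softmax over the N tokens
    return attn / (attn.sum(axis=1, keepdims=True) + eps)  # L1 norm over the S memory slots

def external_attention(F, M_k, M_v):
    """F: (N, d) input features; M_k, M_v: (S, d) learnable external memories."""
    A = double_norm(F @ M_k.T)   # (N, S) attention map
    return A @ M_v               # (N, d) output features

# Illustrative sizes (assumptions): 16 tokens, 8 channels, 4 memory slots
rng = np.random.default_rng(0)
N, d, S = 16, 8, 4
F = rng.standard_normal((N, d))
M_k = rng.standard_normal((S, d))
M_v = rng.standard_normal((S, d))
out = external_attention(F, M_k, M_v)
print(out.shape)
```

Because `M_k` and `M_v` do not depend on `F`, the cost is linear in the number of tokens `N`.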

2. MLP-Mixer
Paper: https://arxiv.org/pdf/2105.01601.pdf
Mixer Layer

Here each MLP has two layers, with a GELU activation function in between.
Network structure
Divide the image into $S$ non-overlapping patches, then project each patch to dimension $C$ to obtain $X \in \mathbb{R}^{S \times C}$, which is fed into the Mixer. Each Mixer layer contains 2 MLPs: the first acts on the columns of $X$ (all columns share parameters), the second acts on the rows (all rows share parameters).

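The Mixer formulas ($S$ is the number of patches, $\sigma$ is GELU):

$$U_{*,i} = X_{*,i} + W_2\,\sigma\big(W_1\,\mathrm{LayerNorm}(X)_{*,i}\big), \quad i = 1, \dots, C$$

$$Y_{j,*} = U_{j,*} + W_4\,\sigma\big(W_3\,\mathrm{LayerNorm}(U)_{j,*}\big), \quad j = 1, \dots, S$$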
Mixer does not use position embeddings, because the token-mixing MLP is sensitive to the order of the input tokens and can therefore learn location information.
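The two MLPs of a Mixer layer can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: biases and the learnable LayerNorm scale/shift are omitted, GELU uses the tanh approximation, the hidden widths `D_S` and `D_C` are arbitrary, and the channel-mixing weights are stored transposed so they can be applied to row vectors.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # normalise over the channel axis; no learnable scale/shift in this sketch
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mixer_layer(X, W1, W2, W3, W4):
    """X: (S, C). W1: (D_S, S), W2: (S, D_S) for token mixing;
    W3: (C, D_C), W4: (D_C, C) for channel mixing."""
    # token mixing: the same MLP is applied to every column (channel) of X
    U = X + W2 @ gelu(W1 @ layer_norm(X))
    # channel mixing: the same MLP is applied to every row (token) of U
    return U + gelu(layer_norm(U) @ W3) @ W4

rng = np.random.default_rng(0)
S, C, D_S, D_C = 6, 4, 8, 16          # illustrative sizes (assumptions)
X = rng.standard_normal((S, C))
W1, W2 = rng.standard_normal((D_S, S)), rng.standard_normal((S, D_S))
W3, W4 = rng.standard_normal((C, D_C)), rng.standard_normal((D_C, C))

Y = mixer_layer(X, W1, W2, W3, W4)
perm = np.arange(S)[::-1]             # reverse the token order
Y_perm = mixer_layer(X[perm], W1, W2, W3, W4)
# Token mixing is not permutation-equivariant, so the two results differ
print(np.allclose(Y_perm, Y[perm]))
```

The final comparison illustrates the order sensitivity mentioned above: feeding the tokens in a different order does not simply reorder the output.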
3. CycleMLP
Paper: https://arxiv.org/pdf/2107.10224.pdf
It is essentially an improvement on MLP-Mixer above.
The main problems of conventional MLPs: (1) a spatial MLP cannot adapt to different input sizes; (2) a channel MLP cannot capture spatial interactions.
Model structure

Patch Embedding
A window of size 7 (stride 4) divides the image into overlapping patches. Each patch is then mapped to a high-dimensional feature by a linear layer.
Between stages there is a transition part, which reduces the number of tokens and increases the channel dimension.
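As a quick check on how many tokens the overlapping patch embedding produces, the standard strided-window size formula can be applied. The padding value of 2 and the 224-pixel input are assumptions for illustration; the text above only gives the window size and stride.

```python
def conv_output_size(size, kernel=7, stride=4, padding=2):
    """Number of window positions along one axis for a strided, padded window."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed 224x224 input image
h = conv_output_size(224)
w = conv_output_size(224)
num_tokens = h * w
print(h, w, num_tokens)
```

Since the stride (4) is smaller than the window (7), neighbouring patches share pixels, which is what makes them overlapping.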
CycleMLP block

Channel MLP consists of 2 linear layers (channel FC) with a GELU. Channel FC is independent of the input image size, but its receptive field is only 1 pixel.
Compared with a conventional MLP, CycleMLP uses Cycle FC layers, which let MLP-like models handle input images of different sizes. Three Cycle FC operators are applied in parallel.

Cycle FC output (defined in terms of the receptive field size):


Pseudo-kernel: the region obtained by projecting the sampling points onto the spatial plane.
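The idea of sampling each channel from a slightly shifted pixel and then applying an ordinary channel FC can be sketched as below. This is a simplified, hypothetical version along the width axis only: the offset rule `(c mod S_W) - S_W // 2` and the zero padding at the borders are assumptions made for illustration and may differ from the paper's exact definition.

```python
import numpy as np

def cycle_fc(X, weight, bias, S_W=3):
    """X: (H, W, C_in); weight: (C_in, C_out); bias: (C_out,).
    Channel c is read from the pixel shifted by (c mod S_W) - S_W // 2 along the
    width axis (assumed rule), zeros outside the image; then a channel FC is applied."""
    H, Wd, C_in = X.shape
    shifted = np.zeros_like(X)
    for c in range(C_in):
        off = (c % S_W) - S_W // 2
        # shifted[:, j, c] = X[:, j + off, c] wherever j + off lies inside the image
        src_lo, src_hi = max(0, off), min(Wd, Wd + off)
        shifted[:, src_lo - off:src_hi - off, c] = X[:, src_lo:src_hi, c]
    return shifted @ weight + bias   # same weights at every spatial position

rng = np.random.default_rng(0)
weight = rng.standard_normal((3, 5))
bias = rng.standard_normal(5)

# The same parameters work for inputs of different spatial sizes
out_small = cycle_fc(rng.standard_normal((4, 6, 3)), weight, bias)
out_large = cycle_fc(rng.standard_normal((7, 9, 3)), weight, bias)
print(out_small.shape, out_large.shape)
```

With `S_W=1` every offset is zero and the operator reduces to a plain channel FC, so the parameter count matches channel FC while the receptive field grows with `S_W`.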
4. gMLP
Paper: https://arxiv.org/pdf/2105.08050.pdf


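gMLP (g stands for gating) consists of $L$ identical blocks. Each block is defined as follows:

$$Z = \sigma(XU), \qquad \tilde{Z} = s(Z), \qquad Y = \tilde{Z}V,$$

where $\sigma$ is the activation function, $s(\cdot)$ captures spatial interactions (when $s$ is the identity mapping, the block is an ordinary two-layer MLP), and $\odot$ denotes element-wise multiplication. The model does not need position embeddings, because positional information can be captured by $s(\cdot)$.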
The simplest choice for capturing spatial interactions is a linear layer:

$$f_{W,b}(Z) = WZ + b, \qquad s(Z) = Z \odot f_{W,b}(Z).$$

Here $s(\cdot)$ is called the SGU (spatial gating unit). It is somewhat like SE (see the third entry in the post on 5 kinds of 2D attention), except that the pooling is replaced by a linear layer.
An equally effective method is to split $Z$ along the channel dimension into two parts $Z_1$ and $Z_2$, then

$$s(Z) = Z_1 \odot f_{W,b}(Z_2).$$
In addition, a tiny attention mechanism can be added to the SGU; the corresponding model is called aMLP.
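A gMLP block with the channel-split SGU can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the residual connection and normalization layers of the full model are omitted, GELU uses the tanh approximation, the bias `b` is taken to be one value per token with shape `(n, 1)`, and all sizes are arbitrary.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def sgu(Z, W, b):
    """Spatial gating unit with channel split.
    Z: (n, d_ffn); W: (n, n) mixes tokens; b: (n, 1) per-token bias (assumed shape)."""
    Z1, Z2 = np.split(Z, 2, axis=-1)   # two halves along the channel dimension
    return Z1 * (W @ Z2 + b)           # element-wise gate, (n, d_ffn / 2)

def gmlp_block(X, U, V, W, b):
    """X: (n, d); U: (d, d_ffn); V: (d_ffn / 2, d)."""
    Z = gelu(X @ U)                    # channel projection + activation
    Z_tilde = sgu(Z, W, b)             # spatial interaction
    return Z_tilde @ V                 # project back to d channels

rng = np.random.default_rng(0)
n, d, d_ffn = 5, 4, 8                  # illustrative sizes (assumptions)
X = rng.standard_normal((n, d))
U = rng.standard_normal((d, d_ffn))
V = rng.standard_normal((d_ffn // 2, d))
W = rng.standard_normal((n, n))
b = np.ones((n, 1))
Y = gmlp_block(X, U, V, W, b)
print(Y.shape)
```

Note that with `W` set to zero and `b` set to one, the gate is all ones and the SGU simply passes `Z1` through, so no tokens are mixed.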