Contrastive learning loss functions (RINCE / ReLIC / ReLICv2)
2022-07-19 05:47:00 【byzy】
The loss function most commonly used in contrastive learning is InfoNCE.
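As a reference point, here is a minimal InfoNCE sketch in PyTorch (the `(N, 1+K)` score layout and the function name are illustrative, not from any of the papers below):

```python
import torch
import torch.nn.functional as F

def info_nce(scores: torch.Tensor) -> torch.Tensor:
    """InfoNCE: -log of the softmax probability of the positive score.

    scores: (N, 1 + K) tensor; column 0 holds the positive-pair score s+,
    columns 1..K hold the negative-pair scores s-_i.
    """
    labels = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return F.cross_entropy(scores, labels)
```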
1. Robust Contrastive Learning against Noisy Views (RINCE)
Original paper: https://arxiv.org/pdf/2201.04309.pdf
This paper proposes RINCE, a loss that is robust to noisy views (such as over-augmented images, over-dubbed video, or video misaligned with its caption) without needing to explicitly estimate the noise.
The paper notes that RINCE is a contrastive lower bound on mutual information expressed via the Wasserstein dependency measure, whereas InfoNCE is a lower bound on mutual information expressed via KL divergence.
Let the data distribution be $\mathcal{D}$ and the noisy dataset be $\mathcal{D}_\eta$, in which the observed label $\hat{y}$ equals the correct label with probability $1-\eta$. The goal is then to minimize

$$R_l^\eta(f)=\mathbb{E}_{\mathcal{D}_\eta}\big[\,l(f(x),\hat{y})\,\big]$$

where $l$ is the binary cross-entropy loss.
A symmetric loss function is robust to noise. A loss $l$ is symmetric if it satisfies

$$l(f(x),1)+l(f(x),-1)=c \qquad (1)$$

where $c$ is a constant and $f(x)$ is the prediction score produced for $x$ (the gradient of $l$ with respect to $f$ should likewise be symmetric).
A symmetric contrastive loss has the following form:

$$\mathcal{L}_{\text{sym}}=-e^{s^+}+\lambda\Big(e^{s^+}+\sum_{i=1}^{K}e^{s_i^-}\Big) \qquad (2)$$

where $s^+$ and $s_i^-$ are the scores of the positive and negative sample pairs, respectively, and the weight $\lambda$ reflects the relative importance of positive versus negative samples. InfoNCE does not satisfy this symmetry condition in its gradient.
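As a quick sanity check of condition (1): a pairwise loss of the form $l(s,y)=-y\,e^{s}$ (my reading of the base loss behind Eq. (2), so treat it as an assumption) sums to a constant over the two labels, while binary cross-entropy does not:

```python
import math

def sym_loss(s: float, y: int) -> float:
    """Candidate symmetric pairwise loss: l(s, y) = -y * exp(s)."""
    return -y * math.exp(s)

def bce(s: float, y: int) -> float:
    """Binary cross-entropy on logit s with label y in {+1, -1}."""
    p = 1.0 / (1.0 + math.exp(-s))
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

for s in (-2.0, 0.0, 3.0):
    print(sym_loss(s, 1) + sym_loss(s, -1),  # always 0.0: satisfies Eq. (1)
          bce(s, 1) + bce(s, -1))            # grows with |s|: not symmetric
```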
The RINCE loss is as follows:

$$\mathcal{L}_{\text{RINCE}}^{q,\lambda}=-\frac{e^{q\cdot s^+}}{q}+\frac{\Big(\lambda\big(e^{s^+}+\sum_{i=1}^{K}e^{s_i^-}\big)\Big)^{q}}{q}$$

where $q$ and $\lambda$ both lie in $(0,1]$ (experiments show the results are not sensitive to the exact value of $\lambda$).
When $q=1$, RINCE exactly matches the symmetric form of Eq. (2) (in this case the underlying pairwise loss satisfies Eq. (1) with $c=0$):

$$\mathcal{L}_{\text{RINCE}}^{q=1,\lambda}=-e^{s^+}+\lambda\Big(e^{s^+}+\sum_{i=1}^{K}e^{s_i^-}\Big)$$

and the loss is robust to noise. As $q\to 0$, RINCE asymptotically approaches InfoNCE.
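A direct sketch of the RINCE formula above in PyTorch, using the same illustrative `(N, 1+K)` score layout as the InfoNCE sketch earlier (the default hyperparameter values here are illustrative):

```python
import torch

def rince(scores: torch.Tensor, q: float = 0.5, lam: float = 0.5) -> torch.Tensor:
    """RINCE: -exp(q * s+) / q + (lam * (exp(s+) + sum_i exp(s-_i)))**q / q.

    q = 1 gives the fully symmetric, noise-robust form;
    q -> 0 recovers InfoNCE up to the additive constant log(lam).
    """
    s_pos = scores[:, 0]                                   # positive-pair scores s+
    pos = -torch.exp(q * s_pos) / q
    neg = (lam * torch.exp(scores).sum(dim=1)).pow(q) / q  # sum includes e^{s+}, as in Eq. (2)
    return (pos + neg).mean()
```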
Regardless of the values of $q$ and $\lambda$, the loss decreases as the positive-pair score rises and the negative-pair scores fall.
In terms of gradients:

- when $q=1$, RINCE pays more attention to easy-positive samples (positive pairs with high scores);
- InfoNCE ($q\to 0$) pays more attention to hard-positive samples (positive pairs with low scores);
- both pay attention to hard-negative samples (negative pairs with high scores).

Therefore InfoNCE converges faster when there is no noise, while RINCE with larger $q$ is more robust to noise. In fact, for $q\in(0,1]$, RINCE interpolates between these two behaviors.
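These gradient claims can be checked numerically with autograd, reusing the `rince` sketch above (the scores here are made up):

```python
import torch
import torch.nn.functional as F

# Row 0: an easy positive (s+ = 5), row 1: a hard positive (s+ = 0.5); one negative each.
scores = torch.tensor([[5.0, 1.0], [0.5, 1.0]])

s1 = scores.clone().requires_grad_(True)
rince(s1, q=1.0, lam=0.5).backward()          # symmetric RINCE

s2 = scores.clone().requires_grad_(True)
labels = torch.zeros(2, dtype=torch.long)
F.cross_entropy(s2, labels).backward()        # InfoNCE

print(s1.grad[:, 0])  # RINCE (q=1): far larger |gradient| on the easy positive
print(s2.grad[:, 0])  # InfoNCE: larger |gradient| on the hard positive
```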
2. Representation Learning via Invariant Causal Mechanisms (ReLIC)
Original paper: https://arxiv.org/pdf/2010.07922.pdf
This paper divides the data into content and style (for example, when classifying whether an image shows a dog, the dog is the content, while background, lighting, and similar factors are style); the learned representation should depend only on the content.
Data augmentation is used to change the style while preserving the content (e.g., image rotation, grayscale conversion, cropping, and translation), forming positive samples.
$$\mathcal{L}_{\text{ReLIC}}=-\sum_{(a_{lk},\,a_{qt})}\log\frac{\exp\big(\phi\big(f(a_{lk}(x_i)),f(a_{qt}(x_i))\big)\big)}{\sum_{m}\exp\big(\phi\big(f(a_{lk}(x_i)),f(a_{qt}(x_m))\big)\big)}+\alpha\sum_{(a_{lk},\,a_{qt})}\mathrm{KL}\Big(p^{do(a_{lk})},\,p^{do(a_{qt})}\Big)$$

In the expression above, $f$ denotes a neural network; $\phi$ is a similarity-related critic, often taken as $\phi(u,v)=\langle g(u),g(v)\rangle$, where $g$ is a fully connected layer; $a_{lk}$ and $a_{qt}$ are a pair of augmentations, i.e., $a_{lk},a_{qt}\in\mathcal{A}$.
The first term is the usual contrastive loss; the second is the augmentation-invariance penalty (invariance loss), which asks the prediction to stay unchanged when only the style changes; this term reduces intra-class distances.
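A minimal sketch of a ReLIC-style objective under my assumptions: `z1`, `z2` are projection outputs $g(f(\cdot))$ for two augmentations of the same batch, `alpha` and `tau` are illustrative hyperparameters, and the invariance penalty is approximated as a KL between the instance-classification distributions induced by the two views:

```python
import torch
import torch.nn.functional as F

def relic_loss(z1: torch.Tensor, z2: torch.Tensor,
               alpha: float = 1.0, tau: float = 0.1) -> torch.Tensor:
    """Contrastive term plus a KL invariance penalty across two views.

    z1, z2: (N, d) projections of two augmentations of the same batch;
    the positive for row i of z1 is row i of z2.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits12 = z1 @ z2.t() / tau           # phi(f(a_lk(x_i)), f(a_qt(x_m)))
    logits21 = z2 @ z1.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)

    # First term: the usual contrastive loss (diagonal entries are positives).
    contrast = F.cross_entropy(logits12, labels) + F.cross_entropy(logits21, labels)

    # Second term: invariance penalty -- KL between the distributions the two
    # augmented views induce over instances (a stand-in for p^{do(a)}).
    log_p = F.log_softmax(logits12, dim=1)
    log_q = F.log_softmax(logits21, dim=1)
    invariance = F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)

    return contrast + alpha * invariance
```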
This paper also explains why self-supervised learning succeeds, proving the following: let $\mathcal{T}$ be the set of downstream tasks, and let the pretext task be finer-grained than every task in $\mathcal{T}$. If the representation learned through the pretext task is related only to the content, then this representation generalizes to all downstream tasks in $\mathcal{T}$.
3. Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? (ReLICv2)
Original paper: https://arxiv.org/pdf/2201.05119.pdf
The loss function of ReLICv2 is similar to ReLIC's: its first term is exactly the content after the $\log$ in ReLIC's first term, and its second term is exactly ReLIC's second term, with $\mathrm{sg}(\cdot)$ denoting stop-gradient, which avoids degenerate solutions during optimization.
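In PyTorch, $\mathrm{sg}(\cdot)$ corresponds to `.detach()`; a tiny illustration with made-up tensors of how the stopped branch receives no gradient:

```python
import torch

def sg(t: torch.Tensor) -> torch.Tensor:
    """Stop-gradient: passes the value through, blocks the gradient."""
    return t.detach()

z_online = torch.randn(4, 8, requires_grad=True)
z_target = torch.randn(4, 8, requires_grad=True)

# Applying sg() to one branch means only z_online is updated, which
# helps avoid degenerate (collapsed) solutions during optimization.
loss = ((z_online - sg(z_target)) ** 2).mean()
loss.backward()
print(z_online.grad is not None)  # True
print(z_target.grad is None)      # True: no gradient reached the target branch
```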
The difference from ReLIC lies in how positive and negative samples are chosen: positive samples are generated by multi-crop augmentation and saliency-based background removal, followed by the standard SimCLR augmentation scheme (see the posts "Overview of self-supervised learning" and "Some examples of contrastive learning 2"); negative samples could use hard-negative sampling, but this paper selects them uniformly at random within the batch.
Appendix: the KL divergence between two discrete probability distributions:

$$\mathrm{KL}(P\,\|\,Q)=\sum_{x}P(x)\log\frac{P(x)}{Q(x)}$$

Usually $P$ is the true data distribution and $Q$ is the predicted distribution.
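For instance, with two made-up discrete distributions:

```python
import torch

P = torch.tensor([0.7, 0.2, 0.1])   # "true" distribution
Q = torch.tensor([0.5, 0.3, 0.2])   # "predicted" distribution

kl = (P * (P / Q).log()).sum()      # KL(P || Q) = sum_x P(x) log(P(x) / Q(x))
print(kl.item())                    # about 0.085 nats
```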