In-depth understanding of machine learning - Imbalanced learning: sample sampling techniques - [Synthetic sampling techniques: the ADASYN sampling method]
2022-07-19 03:12:00 【von Neumann】
Series contents: 《In-Depth Understanding of Machine Learning》 general table of contents
Like the Borderline-SMOTE algorithm, ADASYN (Adaptive Synthetic Sampling) is an improved SMOTE algorithm. It was proposed in 2008. Its main idea is to use the density distribution of the samples to determine how often each minority sample is used as the main sample, synthesizing more training data for the minority-class samples that are harder to learn, so as to correct, as far as possible, the negative effects of the imbalanced class distribution.
The ADASYN algorithm first determines the number of new minority samples to generate, namely $N^+\times \text{SR}$. Then, in the original training set $S$, it finds the $K$ nearest neighbors of each minority sample $x_i^+, i=1, 2, \cdots, N^+$, where the number of majority-class samples among the nearest neighbors of the $i$-th minority sample is denoted $N_i^\text{major}$. The proportion parameter $\Gamma_i$ of each minority sample is then given by:
$$\Gamma_i=\frac{N_i^\text{major}}{Z\times K}$$
where $Z$ is the normalization factor that guarantees $\sum\Gamma_i=1$. Once the proportion parameters are determined, the frequency with which each minority sample is selected as the main sample is given by:
$$g_i=\Gamma_i\times N^+\times\text{SR}$$
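As a rough numerical illustration of the two formulas, the sketch below computes $\Gamma_i$ and $g_i$ from a list of majority-neighbor counts. The function name `adasyn_allocation` and the example counts are hypothetical, not from the original article. The only derived step is $Z=\sum_i N_i^\text{major}/K$, which follows directly from requiring $\sum\Gamma_i=1$.

```python
import numpy as np

def adasyn_allocation(n_major, k, sr):
    """Return (gamma, g) given per-sample majority-neighbor counts N_i^major."""
    n_major = np.asarray(n_major, dtype=float)
    n_plus = n_major.size                  # N^+, number of minority samples
    total = n_major.sum()
    if total == 0:
        raise ValueError("no minority sample has a majority-class neighbor")
    z = total / k                          # normalization factor: sum(gamma) == 1
    gamma = n_major / (z * k)              # Gamma_i = N_i^major / (Z * K)
    g = gamma * n_plus * sr                # g_i = Gamma_i * N^+ * SR
    return gamma, g

# Hypothetical counts for four minority samples with K = 5 and SR = 2.
n_major = [0, 1, 4, 5]
gamma, g = adasyn_allocation(n_major, k=5, sr=2.0)
print(gamma, g, g.sum())
```

Note that the $g_i$ produced this way are real-valued; the article does not say how they are turned into integer counts, so any rounding rule is a choice left to the implementation.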
It is easy to see from the formula for $g_i$ that, like Borderline-SMOTE, the ADASYN algorithm pays more attention to minority samples located near the decision boundary: they are selected as main samples far more often than those lying deep inside the minority-class region. Of course, this also further amplifies the spread of noise in the minority class. The specific flow of the ADASYN algorithm is as follows:
ADASYN sampling method
Input: training set $S=\{(x_i, y_i), i=1, 2, \cdots, N, y_i\in\{+, -\}\}$; number of majority-class samples $N^-$ and number of minority-class samples $N^+$, where $N^-+N^+=N$; imbalance ratio $\text{IR}=\frac{N^-}{N^+}$; sampling rate $\text{SR}$; number of nearest neighbors $K$
Output: oversampled training set $S'=\{(x_i, y_i), i=1, 2, \cdots, N+N^+\times \text{SR}, y_i\in\{+, -\}\}$
(1) From the training set $S$, take out all majority-class and minority-class samples to form the majority-class training set $S^-$ and the minority-class training set $S^+$
(2) Initialize the set of newly generated samples $S^\text{New}$ as empty
(3) for $i=1:N^+$
(4) $\quad$ Find the corresponding sample $x_i$ in $S^+$
(5) $\quad$ Find the $K$ nearest neighbors of $x_i$ in $S$, and record the number of majority-class samples among them as $N_i^\text{major}$
(6) $\quad$ Calculate its proportion parameter: $\Gamma_i=\frac{N_i^\text{major}}{Z\times K}$ (the normalization factor $Z$ requires $N_i^\text{major}$ for all $i$)
(7) $\quad$ Calculate its main-sample frequency: $g_i=\Gamma_i\times N^+\times\text{SR}$
(8) for $i=1:N^+$
(9) $\quad$ Select the main sample $x_i$ from $S^+$
(10) $\quad$ for $j=1:g_i$
(11) $\qquad$ Call the SMOTE algorithm to generate a new sample $x_i^\text{new}$ from the main sample $x_i$
(12) $\qquad$ Add $x_i^\text{new}$ to $S^\text{New}$: $S^\text{New}=S^\text{New}\cup \{x_i^\text{new}\}$
(13) return the oversampled training set $S'=S\cup S^\text{New}$
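The flow above can be sketched in Python with NumPy alone. This is a minimal illustration under stated assumptions, not a reference implementation. The name `adasyn` and the parameters `minority` and `seed` are hypothetical. Neighbors are found by brute-force Euclidean distance. Each $g_i$ is rounded to the nearest integer, which the article does not specify. The SMOTE step interpolates toward a random one of the $K$ nearest minority-class neighbors, which is the usual SMOTE rule. The original samples are kept in the result, so its size is consistent with the $N+N^+\times\text{SR}$ stated in the Output line, up to rounding.

```python
import numpy as np

def adasyn(X, y, sr=1.0, k=5, minority=1, seed=0):
    """Sketch of the ADASYN flow; returns the oversampled (X', y')."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    idx_min = np.flatnonzero(y == minority)
    X_min = X[idx_min]                        # S^+
    n_plus = len(X_min)
    if n_plus < 2:
        raise ValueError("need at least two minority samples to interpolate")

    # Steps (3)-(5): majority-class count among the K nearest neighbors in S.
    n_major = np.zeros(n_plus)
    for i, idx in enumerate(idx_min):
        d = np.linalg.norm(X - X[idx], axis=1)
        d[idx] = np.inf                       # exclude the sample itself
        nn = np.argsort(d)[:k]
        n_major[i] = np.sum(y[nn] != minority)

    # Steps (6)-(7): Gamma_i = N_i / (Z * K) with Z = sum(N_i) / K.
    total = n_major.sum()
    gamma = n_major / total if total > 0 else np.zeros(n_plus)
    # Assumption: g_i is rounded to the nearest integer count.
    g = np.rint(gamma * n_plus * sr).astype(int)

    # Steps (8)-(12): SMOTE-style interpolation, g_i times per main sample.
    new = []
    for i, x in enumerate(X_min):
        if g[i] == 0:
            continue
        d = np.linalg.norm(X_min - x, axis=1)
        d[i] = np.inf
        nn_min = np.argsort(d)[:min(k, n_plus - 1)]
        for _ in range(g[i]):
            neighbor = X_min[rng.choice(nn_min)]
            new.append(x + rng.random() * (neighbor - x))

    # Step (13): original set plus the newly generated minority samples.
    X_new = np.vstack(new) if new else np.empty((0, X.shape[1]))
    y_new = np.full(len(X_new), minority, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])

# Toy 1-D data: six majority points, five minority points.
X = np.array([[0.], [1.], [2.], [3.], [4.], [5.],
              [5.5], [6.], [10.], [10.5], [11.]])
y = np.array([0] * 6 + [1] * 5)
X_over, y_over = adasyn(X, y, sr=2.0, k=3)
```

In this toy data the minority points at 10, 10.5 and 11 have only minority-class neighbors, so their $\Gamma_i$ is 0 and every synthetic sample is generated from the two minority points next to the majority class. This is the boundary-focused behaviour described above.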