
Detailed explanation of multiple linear regression

2022-07-19 11:00:00 TT ya

I am still a beginner; these notes record what I have learned, and I hope they also help other newcomers. Corrections from more experienced readers are very welcome. (Any infringing material will be removed on request.)

Contents

One. Problem description

Two. Problem analysis

Three. Solving the problem: finding w and b

1. Vector form

2. Objective function

3. Setting the derivative to zero

4. Final model

Four. A hidden problem: XX^T may not be full rank

Five. Solving the hidden problem: regularization

1. L1 regularization: Lasso regression

2. L2 regularization: ridge regression

Six. Variants and applications of linear regression

Seven. Python implementation

1. Multiple linear regression

2. Ridge regression

3. Lasso regression

Eight. Linear models: from regression to classification


One. Problem description

Suppose we have a data set D in which each sample is described by d attributes, i.e. \boldsymbol{x} = (x_{1}; x_{2}; ...; x_{d}), where x_{i} is the value of sample \boldsymbol{x} on the i-th attribute. Each sample \boldsymbol{x}_{i} has a corresponding target value y_{i}.

Now a new sample \boldsymbol{x}_{j} arrives, and we want to know its target value y_{j}.


Two. Problem analysis

We use the data set D to learn a linear model that predicts y_{j} for \boldsymbol{x}_{j}, that is, we look for the w and b in f(\boldsymbol{x}_{i}) = \boldsymbol{w}^{T}\boldsymbol{x}_{i} + b that best fit the data.


Three. Solving the problem: finding w and b

We can solve this with the least squares method.

1. Vector form

First combine w and b into a single vector \hat{\boldsymbol{w}} = (\boldsymbol{w}; b) of size (d+1) \times 1.

Then rewrite the data matrix X as X = \begin{pmatrix} \boldsymbol{x}_{1} & \boldsymbol{x}_{2} & ... & \boldsymbol{x}_{m}\\ 1 & 1 & ... & 1 \end{pmatrix}, of size (d+1) \times m.

Finally, write the labels as a vector as well: \boldsymbol{y} = (y_{1}; y_{2}; ...; y_{m}).

2. Objective function

\hat{\boldsymbol{w}}^{*} = \underset{\hat{\boldsymbol{w}}}{\arg\min}\,(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})

Let E = (\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}}).

3. Setting the derivative to zero

\frac{\partial E}{\partial \hat{\boldsymbol{w}}} = 2X(X^{T}\hat{\boldsymbol{w}}-\boldsymbol{y})=0 \;\Rightarrow\; \hat{\boldsymbol{w}}^{*}=(XX^{T})^{-1}X\boldsymbol{y}

4. Final model

\hat{\boldsymbol{x}}_{i} = (\boldsymbol{x}_{i}; 1)

f(\hat{\boldsymbol{x}}_{i}) = \hat{\boldsymbol{x}}_{i}^{T}(XX^{T})^{-1}X\boldsymbol{y}
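
As a quick sanity check of this closed-form solution, here is a minimal NumPy sketch on hypothetical toy data, following this note's convention that X has one column per sample plus a row of ones:

import numpy as np

# hypothetical toy data: m = 5 samples, d = 2 attributes, exactly linear
np.random.seed(0)
x = np.random.rand(5, 2)                    # one sample per row
y = 3 * x[:, 0] - 2 * x[:, 1] + 1           # true w = (3, -2), true b = 1

# build X with one column per sample and a final row of ones for the bias b
X = np.vstack([x.T, np.ones(x.shape[0])])   # shape (d+1) x m

# closed-form least squares: w_hat = (X X^T)^{-1} X y
w_hat = np.linalg.inv(X @ X.T) @ X @ y
print(w_hat)                                # approximately [ 3. -2.  1.]

# prediction for a new sample x_j
x_j = np.array([0.5, 0.5])
print(np.append(x_j, 1.0) @ w_hat)          # approximately 1.5

In practice np.linalg.solve or np.linalg.lstsq is preferable to an explicit inverse, but the explicit form mirrors the formula above.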


Four. A hidden problem: XX^{T} may not be full rank

If XX^{T} is not a full-rank matrix, the optimal \hat{\boldsymbol{w}} is not unique, so which solution should we choose as \hat{\boldsymbol{w}}?

For example, when the number of samples is small and the number of attributes is large (possibly even larger than the number of samples), XX^{T} is not full rank and the equations admit multiple solutions for \hat{\boldsymbol{w}}.
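
A minimal sketch of this situation with hypothetical numbers (more attributes than samples), using NumPy to check the rank:

import numpy as np

# hypothetical case: only m = 3 samples but d = 5 attributes
np.random.seed(0)
x = np.random.rand(3, 5)
X = np.vstack([x.T, np.ones(3)])        # shape (d+1) x m = 6 x 3

G = X @ X.T                             # the 6 x 6 matrix XX^T
print(np.linalg.matrix_rank(G))         # 3, far below 6: not full rank
# inverting G here would fail or be numerically meaningless,
# so the closed-form solution above no longer picks out a unique w_hat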


Five. Solving the hidden problem: regularization

Regularization selects a model that balances low empirical risk against low model complexity.

1. L1 regularization: Lasso regression

Add the term \lambda \sum_{i=1}^{d}|w_{i}| to the objective function.

The objective then becomes \underset{\hat{\boldsymbol{w}}}{\arg\min}\left((\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})+\lambda \sum_{i=1}^{d}|w_{i}|\right)

The first term is the empirical risk discussed above; the second term controls the complexity of the model.

Here \lambda>0 controls the strength of the penalty: as \lambda \rightarrow \infty, \hat{\boldsymbol{w}}\rightarrow 0; as \lambda \rightarrow 0, \hat{\boldsymbol{w}}\rightarrow (XX^{T})^{-1}X\boldsymbol{y}.

This is called Lasso regression.

Picture the case of only two attributes: the contours of the squared-error term and the contours of the L1 penalty tend to meet on a coordinate axis, which means one of the attributes receives a zero coefficient. L1 regularization therefore performs a kind of feature selection and yields sparser solutions than the L2 regularization below, i.e. the resulting \boldsymbol{w} has fewer non-zero entries.

2. L2 regularization: ridge regression

Add the term \lambda \sum_{i=1}^{d}w_{i}^{2} to the objective function.

The objective then becomes \underset{\hat{\boldsymbol{w}}}{\arg\min}\left((\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})+\lambda \sum_{i=1}^{d}w_{i}^{2}\right)

This is also known as ridge regression.
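
One reason ridge regression resolves the full-rank issue above is that its closed-form solution adds \lambda I to XX^{T}, which is invertible for any \lambda>0. A minimal sketch under this note's conventions (and, for simplicity, penalizing the bias term as well, which the formula above does not do):

import numpy as np

# reuse the rank-deficient case: m = 3 samples, d = 5 attributes
np.random.seed(0)
x = np.random.rand(3, 5)
y = np.random.rand(3)
X = np.vstack([x.T, np.ones(3)])                   # (d+1) x m

lam = 1.0
# ridge closed form: w_hat = (XX^T + lam * I)^{-1} X y
w_hat = np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), X @ y)
print(w_hat)                                        # a unique solution despite d + 1 > m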

L2 regularization shrinks the coefficients more evenly: it does not set any of them exactly to zero (so it does not reduce the number of terms), but it keeps all of them small and balanced. This is the principal difference from L1 regularization.
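
To illustrate the sparsity contrast, here is a minimal sketch on hypothetical synthetic data that counts non-zero coefficients for Lasso and Ridge fitted on the same data (the alpha values are arbitrary choices for the illustration):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# hypothetical synthetic data: 50 samples, 20 attributes, only 3 truly useful
np.random.seed(0)
x = np.random.randn(50, 20)
y = 3 * x[:, 0] - 2 * x[:, 1] + x[:, 2] + 0.1 * np.random.randn(50)

lasso = Lasso(alpha=0.1).fit(x, y)
ridge = Ridge(alpha=0.1).fit(x, y)

print('non-zero Lasso coefficients:', np.sum(lasso.coef_ != 0))   # only a few: a sparse solution
print('non-zero Ridge coefficients:', np.sum(ridge.coef_ != 0))   # essentially all 20, just shrunk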


Six. Variants and applications of linear regression

If the problem does not follow a linear model directly, we can instead make the linear predictor approximate some quantity derived from y (a transformation of y).

For example, log-linear regression: \ln y_{i}=\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b

More generally, y=g^{-1}(\boldsymbol{w}^{T}\boldsymbol{x}+b), where g is a monotone, differentiable link function; this is called a generalized linear model.
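
A minimal sketch of log-linear regression on hypothetical data: fit an ordinary linear model to \ln y, then exponentiate the prediction to get back to the original scale:

import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data with an exponential relationship y = exp(2*x + 1), plus a little noise
np.random.seed(0)
x = np.random.rand(100, 1)
y = np.exp(2 * x[:, 0] + 1 + 0.05 * np.random.randn(100))

model = LinearRegression().fit(x, np.log(y))   # regress ln(y) on x
print(model.coef_, model.intercept_)           # roughly [2.] and 1
y_pred = np.exp(model.predict(x))              # map predictions back to the original scale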


Seven. Python implementation

1. Multiple linear regression

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# x, y: the attribute (feature) data and the label data, prepared beforehand
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)
model = LinearRegression()
model.fit(X_train, Y_train)
score = model.score(X_test, Y_test)        # R^2 score on the held-out test set
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

2. Ridge regression

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# x, y: the attribute (feature) data and the label data, prepared beforehand
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)
model = Ridge(alpha=1)                     # alpha plays the role of lambda above
model.fit(X_train, Y_train)
score = model.score(X_test, Y_test)
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

3. Lasso regression

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# x, y: the attribute (feature) data and the label data, prepared beforehand
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)
model = Lasso(alpha=0.1)                   # alpha plays the role of lambda above
model.fit(X_train, Y_train)
score = model.score(X_test, Y_test)
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

Eight. Linear models: from regression to classification

Everything above uses a linear model to solve regression problems. In fact, linear models can also be used for classification, via logistic regression (also called log-odds regression).
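
As a minimal sketch on hypothetical two-class data (see the linked post for the derivation), the scikit-learn interface mirrors the regression examples above:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# hypothetical two-class data: the label depends on a linear score of the features
np.random.seed(0)
x = np.random.randn(200, 3)
y = (x[:, 0] - 2 * x[:, 1] > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)
model = LogisticRegression()
model.fit(X_train, Y_train)
print('Test accuracy:', model.score(X_test, Y_test))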

See: Logistic Regression (TT ya's blog on CSDN).


Comments and corrections are welcome. Thank you!
