
Detailed explanation of multiple linear regression

2022-07-19 11:00:00 TT ya

I am still a beginner; these notes record what I have learned, and I hope they also help other newcomers. Corrections from more experienced readers are very welcome. (Any infringing material will be removed on request.)

Contents

One. Problem description

Two. Problem analysis

Three. Solving the problem: finding w and b

1. Vector form

2. Objective function

3. Setting the derivative to zero

4. Final model

Four. A hidden problem: XX^T may not be full rank

Five. Solving the hidden problem: regularization

1. L1 regularization: Lasso regression

2. L2 regularization: ridge regression

Six. Variants and applications of linear regression

Seven. Python implementation

1. Multiple linear regression

2. Ridge regression

3. Lasso regression

Eight. Linear models: from regression to classification


One. Problem description

Suppose we have a data set D in which each sample is described by d attributes, i.e. \boldsymbol{x} = (x_{1}; x_{2}; ...; x_{d}), where x_{i} is the value of sample \boldsymbol{x} on the i-th attribute. Each sample \boldsymbol{x}_{i} has a corresponding target value y_{i}.

Now a new sample \boldsymbol{x}_{j} arrives, and we want to know its target value y_{j}.


Two. Problem analysis

We use the data set D to learn a linear model that predicts y_{j} for \boldsymbol{x}_{j}, that is, we look for the w and b in f(\boldsymbol{x}_{i}) = \boldsymbol{w}^{T}\boldsymbol{x}_{i} + b that best fit the data.


Three. Solving the problem: finding w and b

We can solve this with the least squares method.

1. Vector form

First combine w and b into a single vector \hat{\boldsymbol{w}} = (\boldsymbol{w}; b) of size (d+1) \times 1.

Then rewrite the data matrix X as X = \begin{pmatrix} \boldsymbol{x}_{1} & \boldsymbol{x}_{2} & ... & \boldsymbol{x}_{m}\\ 1 & 1 & ... & 1 \end{pmatrix}, of size (d+1) \times m.

Finally, write the labels as a vector as well: \boldsymbol{y} = (y_{1}; y_{2}; ...; y_{m}).

2. Objective function

\hat{\boldsymbol{w}}^{*} = \underset{\hat{\boldsymbol{w}}}{\arg\min}\,(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})

Let E = (\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}}).

3. Setting the derivative to zero

\frac{\partial E}{\partial \hat{\boldsymbol{w}}} = 2X(X^{T}\hat{\boldsymbol{w}}-\boldsymbol{y})=0 \;\Rightarrow\; \hat{\boldsymbol{w}}^{*}=(XX^{T})^{-1}X\boldsymbol{y}

4. Final model

\hat{\boldsymbol{x}}_{i} = (\boldsymbol{x}_{i}; 1)

f(\hat{\boldsymbol{x}}_{i}) = \hat{\boldsymbol{x}}_{i}^{T}(XX^{T})^{-1}X\boldsymbol{y}
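
As a quick sanity check of this closed-form solution, here is a minimal NumPy sketch on hypothetical toy data, following this note's convention that X has one column per sample plus a row of ones:

import numpy as np

# hypothetical toy data: m = 5 samples, d = 2 attributes, exactly linear
np.random.seed(0)
x = np.random.rand(5, 2)                    # one sample per row
y = 3 * x[:, 0] - 2 * x[:, 1] + 1           # true w = (3, -2), true b = 1

# build X with one column per sample and a final row of ones for the bias b
X = np.vstack([x.T, np.ones(x.shape[0])])   # shape (d+1) x m

# closed-form least squares: w_hat = (X X^T)^{-1} X y
w_hat = np.linalg.inv(X @ X.T) @ X @ y
print(w_hat)                                # approximately [ 3. -2.  1.]

# prediction for a new sample x_j
x_j = np.array([0.5, 0.5])
print(np.append(x_j, 1.0) @ w_hat)          # approximately 1.5

In practice np.linalg.solve or np.linalg.lstsq is preferable to an explicit inverse, but the explicit form mirrors the formula above.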


Four. A hidden problem: XX^{T} may not be full rank

If XX^{T} is not a full-rank matrix, the optimal \hat{\boldsymbol{w}} is not unique, so which solution should we choose as \hat{\boldsymbol{w}}?

For example, when the number of samples is small and the number of attributes is large (possibly even larger than the number of samples), XX^{T} is not full rank and the equations admit multiple solutions for \hat{\boldsymbol{w}}.
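
A minimal sketch of this situation with hypothetical numbers (more attributes than samples), using NumPy to check the rank:

import numpy as np

# hypothetical case: only m = 3 samples but d = 5 attributes
np.random.seed(0)
x = np.random.rand(3, 5)
X = np.vstack([x.T, np.ones(3)])        # shape (d+1) x m = 6 x 3

G = X @ X.T                             # the 6 x 6 matrix XX^T
print(np.linalg.matrix_rank(G))         # 3, far below 6: not full rank
# inverting G here would fail or be numerically meaningless,
# so the closed-form solution above no longer picks out a unique w_hat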


Five. Solving the hidden problem: regularization

Regularization selects a model that balances low empirical risk against low model complexity.

1. L1 regularization: Lasso regression

Add the term \lambda \sum_{i=1}^{d}|w_{i}| to the objective function.

The objective then becomes \underset{\hat{\boldsymbol{w}}}{\arg\min}\left((\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})+\lambda \sum_{i=1}^{d}|w_{i}|\right)

The first term is the empirical risk discussed above; the second term controls the complexity of the model.

Here \lambda>0 controls the strength of the penalty: as \lambda \rightarrow \infty, \hat{\boldsymbol{w}}\rightarrow 0; as \lambda \rightarrow 0, \hat{\boldsymbol{w}}\rightarrow (XX^{T})^{-1}X\boldsymbol{y}.

This is called Lasso regression.

Picture the case of only two attributes: the contours of the squared-error term and the contours of the L1 penalty tend to meet on a coordinate axis, which means one of the attributes receives a zero coefficient. L1 regularization therefore performs a kind of feature selection and yields sparser solutions than the L2 regularization below, i.e. the resulting \boldsymbol{w} has fewer non-zero entries.

2. L2 regularization: ridge regression

Add the term \lambda \sum_{i=1}^{d}w_{i}^{2} to the objective function.

The objective then becomes \underset{\hat{\boldsymbol{w}}}{\arg\min}\left((\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-X^{T}\hat{\boldsymbol{w}})+\lambda \sum_{i=1}^{d}w_{i}^{2}\right)

This is also known as ridge regression.
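
One reason ridge regression resolves the full-rank issue above is that its closed-form solution adds \lambda I to XX^{T}, which is invertible for any \lambda>0. A minimal sketch under this note's conventions (and, for simplicity, penalizing the bias term as well, which the formula above does not do):

import numpy as np

# reuse the rank-deficient case: m = 3 samples, d = 5 attributes
np.random.seed(0)
x = np.random.rand(3, 5)
y = np.random.rand(3)
X = np.vstack([x.T, np.ones(3)])                   # (d+1) x m

lam = 1.0
# ridge closed form: w_hat = (XX^T + lam * I)^{-1} X y
w_hat = np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), X @ y)
print(w_hat)                                        # a unique solution despite d + 1 > m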

L2 regularization shrinks the coefficients more evenly: it does not set any of them exactly to zero (so it does not reduce the number of terms), but it keeps all of them small and balanced. This is the principal difference from L1 regularization.
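
To illustrate the sparsity contrast, here is a minimal sketch on hypothetical synthetic data that counts non-zero coefficients for Lasso and Ridge fitted on the same data (the alpha values are arbitrary choices for the illustration):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# hypothetical synthetic data: 50 samples, 20 attributes, only 3 truly useful
np.random.seed(0)
x = np.random.randn(50, 20)
y = 3 * x[:, 0] - 2 * x[:, 1] + x[:, 2] + 0.1 * np.random.randn(50)

lasso = Lasso(alpha=0.1).fit(x, y)
ridge = Ridge(alpha=0.1).fit(x, y)

print('non-zero Lasso coefficients:', np.sum(lasso.coef_ != 0))   # only a few: a sparse solution
print('non-zero Ridge coefficients:', np.sum(ridge.coef_ != 0))   # essentially all 20, just shrunk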


Six. Variants and applications of linear regression

If the problem does not follow a linear model directly, we can instead make the linear predictor approximate some quantity derived from y (a transformation of y).

For example, log-linear regression: \ln y_{i}=\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b

More generally, y=g^{-1}(\boldsymbol{w}^{T}\boldsymbol{x}+b), where g is a monotone, differentiable link function; this is called a generalized linear model.
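
A minimal sketch of log-linear regression on hypothetical data: fit an ordinary linear model to \ln y, then exponentiate the prediction to get back to the original scale:

import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data with an exponential relationship y = exp(2*x + 1), plus a little noise
np.random.seed(0)
x = np.random.rand(100, 1)
y = np.exp(2 * x[:, 0] + 1 + 0.05 * np.random.randn(100))

model = LinearRegression().fit(x, np.log(y))   # regress ln(y) on x
print(model.coef_, model.intercept_)           # roughly [2.] and 1
y_pred = np.exp(model.predict(x))              # map predictions back to the original scale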


Seven. Python implementation

1. Multiple linear regression

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# x, y: the attribute (feature) data and the label data, prepared beforehand
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)
model = LinearRegression()
model.fit(X_train, Y_train)
score = model.score(X_test, Y_test)        # R^2 score on the held-out test set
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

2. Ridge regression

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# x, y: the attribute (feature) data and the label data, prepared beforehand
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)
model = Ridge(alpha=1)                     # alpha plays the role of lambda above
model.fit(X_train, Y_train)
score = model.score(X_test, Y_test)
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

3. Lasso regression

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# x, y: the attribute (feature) data and the label data, prepared beforehand
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)
model = Lasso(alpha=0.1)                   # alpha plays the role of lambda above
model.fit(X_train, Y_train)
score = model.score(X_test, Y_test)
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

Eight. Linear models: from regression to classification

Everything above uses a linear model to solve regression problems. In fact, linear models can also be used for classification, via logistic regression (also called log-odds regression).
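
As a minimal sketch on hypothetical two-class data (see the linked post for the derivation), the scikit-learn interface mirrors the regression examples above:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# hypothetical two-class data: the label depends on a linear score of the features
np.random.seed(0)
x = np.random.randn(200, 3)
y = (x[:, 0] - 2 * x[:, 1] > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)
model = LogisticRegression()
model.fit(X_train, Y_train)
print('Test accuracy:', model.score(X_test, Y_test))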

See: Logistic Regression (TT ya's blog on CSDN).


Comments and corrections are welcome. Thank you!
