Sklearn machine learning foundation (linear regression, under fitting, over fitting, ridge regression, model loading and saving)
2022-07-26 08:48:00 【Natural color】
Catalog
1. Linear model
1.1 Loss function
1.2 The normal equation of the least squares method
1.3 Gradient descent of the least squares method (universal)
1.4 Normal equation predicts Boston house prices
1.5 Gradient descent predicts Boston house prices
1.6 Regression performance evaluation
2. Underfitting and overfitting
3. Ridge regression (linear regression with regularization)
4. Model loading and saving
1. Linear model

Matrix multiplication satisfies the requirements of the linear regression computation.
1.1 Loss function

Optimization and iteration is the process of searching for the most suitable weights.
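The loss minimized here is the least-squares (mean squared error) loss. A minimal NumPy sketch with toy data (the data and the helper name `least_squares_loss` are illustrative, not from this article):

```python
import numpy as np

# Least-squares loss: mean of the squared differences between
# the predictions X @ w and the true targets y.
def least_squares_loss(X, w, y):
    residuals = X @ w - y
    return np.mean(residuals ** 2)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is the bias term
y = np.array([2.0, 3.0, 4.0])                       # y = 1 + 1*x exactly
w = np.array([1.0, 1.0])
print(least_squares_loss(X, w, y))  # 0.0
```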
1.2 The normal equation of the least squares method
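The normal equation gives the closed-form least-squares solution w = (XᵀX)⁻¹Xᵀy. A sketch on synthetic data (`normal_equation` is an illustrative helper, not a scikit-learn API):

```python
import numpy as np

# Closed-form least squares: solve (X^T X) w = X^T y rather than
# explicitly inverting the matrix, which is numerically safer.
def normal_equation(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # bias column + one feature
true_w = np.array([3.0, -2.0])
y = X @ true_w  # noiseless targets, so the exact weights are recoverable
print(normal_equation(X, y))  # close to [ 3. -2.]
```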

1.3 Gradient descent of the least squares method (universal)
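Gradient descent minimizes the same loss iteratively, which scales to data where the normal equation is too expensive. A sketch with an illustrative helper and toy data (learning rate and iteration count are arbitrary):

```python
import numpy as np

# Batch gradient descent for least squares.
# Gradient: d/dw mean((Xw - y)^2) = (2/n) * X^T (Xw - y)
def gradient_descent(X, y, lr=0.1, n_iter=2000):
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iter):
        w -= lr * (2.0 / n) * X.T @ (X @ w - y)
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
true_w = np.array([3.0, -2.0])
y = X @ true_w
print(gradient_descent(X, y))  # converges toward [ 3. -2.]
```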

1.4 Normal equation predicts Boston house prices
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
def myliner():
    '''
    Linear regression prediction of house prices
    :return:
    '''
    # Get data (note: load_boston was removed in scikit-learn 1.2)
    lb = load_boston()
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(lb.data,
                                                        lb.target, test_size=0.2)
    # Standardization -- should the target values be standardized too? Yes!
    # Feature standardization
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)
    # Target standardization
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.reshape(-1, 1))  # reshape the targets to 2-D
    y_test = std_y.transform(y_test.reshape(-1, 1))
    # Prediction
    # Solve by the normal equation
    lr = LinearRegression()
    lr.fit(x_train, y_train)
    print(lr.coef_)
    # Predict test-set house prices; inverse_transform undoes the earlier standardization
    y_predict = std_y.inverse_transform(lr.predict(x_test))
    print("Predicted price of each house in the test set:", y_predict)
    return None

if __name__ == '__main__':
    myliner()

1.5 Gradient descent predicts Boston house prices
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
def myliner():
    '''
    Linear regression prediction of house prices
    :return:
    '''
    # Get data (note: load_boston was removed in scikit-learn 1.2)
    lb = load_boston()
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)
    # Standardization -- should the target values be standardized too? Yes!
    # Feature standardization
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)
    # Target standardization
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.reshape(-1, 1))  # reshape the targets to 2-D
    y_test = std_y.transform(y_test.reshape(-1, 1))
    # Prediction
    # Solve by gradient descent
    sgd = SGDRegressor()
    sgd.fit(x_train, y_train.ravel())  # SGDRegressor expects a 1-D target
    print(sgd.coef_)
    # predict() returns a 1-D array; reshape it before inverse-transforming
    y_predict = std_y.inverse_transform(sgd.predict(x_test).reshape(-1, 1))
    print("Predicted price of each house in the test set:", y_predict)
    return None

if __name__ == '__main__':
    myliner()

1.6 Regression performance evaluation
Small-scale data: LinearRegression (cannot solve the fitting problem) and others
Large-scale data: SGDRegressor
sklearn.metrics.mean_squared_error
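A usage sketch of mean_squared_error with toy numbers (not from this article):

```python
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
# mean of the squared errors: (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
print(mean_squared_error(y_true, y_pred))  # 0.375
```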


2. Underfitting and overfitting

2.1 Solutions
Underfitting:
The model learns too few features from the data; increase the number of features.
Overfitting:
There are too many original features, some of them noisy; the model becomes too complex because it tries to account for every training data point.
Solutions:
Perform feature selection and eliminate highly correlated features (hard to do)
Cross-validation (so that all the data is used for training) -- for detection
L2 regularization (understand): reduce the weights of the higher-order terms
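To see L2 regularization shrinking the higher-order weights, here is a toy comparison on synthetic, nearly linear data (the degree and alpha values are illustrative choices, not from this article):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = 1.5 * x.ravel() + rng.normal(scale=0.1, size=20)  # nearly linear data

# Degree-9 polynomial features invite over-fitting
X_poly = PolynomialFeatures(degree=9, include_bias=False).fit_transform(x)

plain = LinearRegression().fit(X_poly, y)
ridge = Ridge(alpha=1.0).fit(X_poly, y)

# L2 regularization keeps the coefficient magnitudes much smaller
print(np.abs(plain.coef_).sum())
print(np.abs(ridge.coef_).sum())
```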
3. Ridge regression (linear regression with regularization)
sklearn.linear_model.Ridge

The stronger the regularization, the smaller the weights, approaching 0.
The regression coefficients obtained by ridge regression are more practical and reliable. In addition, it narrows the fluctuation range of the estimated parameters, making them more stable. This is of great practical value in research on ill-conditioned data.
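The shrinking effect of alpha can be checked directly on synthetic data (the alpha values below are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Larger alpha => stronger regularization => smaller weights
for alpha in (0.1, 10.0, 1000.0):
    rd = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.abs(rd.coef_).sum())
```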
from sklearn.datasets import load_boston
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
def myliner():
    '''
    Linear regression prediction of house prices
    :return:
    '''
    # Get data (note: load_boston was removed in scikit-learn 1.2)
    lb = load_boston()
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)
    # Standardization -- should the target values be standardized too? Yes!
    # Feature standardization
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)
    # Target standardization
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.reshape(-1, 1))  # reshape the targets to 2-D
    y_test = std_y.transform(y_test.reshape(-1, 1))
    # Prediction
    # Solve by ridge regression
    rd = Ridge(alpha=1.0)
    rd.fit(x_train, y_train)
    print(rd.coef_)
    # Predict test-set house prices; inverse_transform undoes the earlier standardization
    y_predict = std_y.inverse_transform(rd.predict(x_test))
    print("Predicted price of each house in the test set:", y_predict)
    print("Mean squared error of ridge regression:", mean_squared_error(std_y.inverse_transform(y_test), y_predict))
    return None

if __name__ == '__main__':
    myliner()

4. Model loading and saving
import pickle

# Save the trained model
with open('rd.pickle', 'wb') as fw:
    pickle.dump(rd, fw)
# Load the model
with open('rd.pickle', 'rb') as fr:
    new_rd = pickle.load(fr)

from sklearn.datasets import load_boston
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import pickle
def myliner():
    '''
    Linear regression prediction of house prices
    :return:
    '''
    # Get data (note: load_boston was removed in scikit-learn 1.2)
    lb = load_boston()
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)
    # Standardization -- should the target values be standardized too? Yes!
    # Feature standardization
    std_x = StandardScaler()
    x_train = std_x.fit_transform(x_train)
    x_test = std_x.transform(x_test)
    # Target standardization
    std_y = StandardScaler()
    y_train = std_y.fit_transform(y_train.reshape(-1, 1))  # reshape the targets to 2-D
    y_test = std_y.transform(y_test.reshape(-1, 1))
    # Prediction
    # Solve by ridge regression
    rd = Ridge(alpha=1.0)
    rd.fit(x_train, y_train)
    print(rd.coef_)
    # Save the trained model
    with open('rd.pickle', 'wb') as fw:
        pickle.dump(rd, fw)
    # Load the model
    with open('rd.pickle', 'rb') as fr:
        new_rd = pickle.load(fr)
    # Predict test-set house prices with the reloaded model; inverse_transform undoes the earlier standardization
    y_predict = std_y.inverse_transform(new_rd.predict(x_test))
    print("Predicted price of each house in the test set:", y_predict)
    print("Mean squared error of ridge regression:", mean_squared_error(std_y.inverse_transform(y_test), y_predict))
    return None

if __name__ == '__main__':
    myliner()
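Besides pickle, scikit-learn models are often persisted with joblib, which handles large NumPy arrays efficiently. A minimal sketch (the filename and toy data are arbitrary):

```python
import numpy as np
import joblib
from sklearn.linear_model import Ridge

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
rd = Ridge(alpha=1.0).fit(X, y)

joblib.dump(rd, 'rd.joblib')       # save the trained model
new_rd = joblib.load('rd.joblib')  # load it back
print(np.allclose(new_rd.predict(X), rd.predict(X)))  # True
```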