Andrew Ng's Machine Learning Post-Class Exercises: Logistic Regression
2022-07-26 04:15:00 【Yizhou YZ】
Logistic regression
Logistic regression is a classification algorithm; its defining property is that its output always lies between 0 and 1.
We will build a logistic regression model to predict whether a student is admitted to a university. Imagine you are a university administrator who must decide, from an applicant's scores on two exams, whether to admit them. You have a training set from previous applicants: for each one, the two exam scores and the final admission decision. To complete this prediction task, we build a classification model that estimates the probability of admission from the two exam scores.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
path = 'ex2data1.txt'
data = pd.read_csv(path, names=['Exam1', 'Exam2', 'Admitted'])    # the file has no header row
data.head()
|   | Exam1 | Exam2 | Admitted |
|---|---|---|---|
| 0 | 34.623660 | 78.024693 | 0 |
| 1 | 30.286711 | 43.894998 | 0 |
| 2 | 35.847409 | 72.902198 | 0 |
| 3 | 60.182599 | 86.308552 | 1 |
| 4 | 79.032736 | 75.344376 | 1 |
# .isin is pandas boolean indexing on a DataFrame column: it selects the rows whose value is in the given list
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
fig,ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=80, c='g', marker='o', label='Admitted')    # s sets the marker size
ax.scatter(negative['Exam1'], negative['Exam2'], s=80, c='r', marker='x', label='Not Admitted')
ax.legend(loc=1)    # legend in the upper-right corner
ax.set_xlabel("Exam 1 Score")
ax.set_ylabel("Exam 2 Score")
plt.show()

Sigmoid function
g denotes the logistic function, an S-shaped curve (the sigmoid function), given by:
$$g(z)=\frac{1}{1+e^{-z}}$$
Composing it with the linear model gives the hypothesis of the logistic regression model:
$$h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}X}}$$
def sigmoid(z):
    # np.exp applied to an array computes e**z element-wise, so sigmoid works on scalars and arrays alike
    return 1 / (1 + np.exp(-z))
# Test the sigmoid function above
nums = np.arange(-5,5)
fig,ax = plt.subplots(figsize=(12,8))
ax.plot(nums,sigmoid(nums),c='r')
plt.show()
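Besides the plot, a couple of spot checks confirm that the output stays strictly between 0 and 1 and equals 0.5 at zero:
print(sigmoid(0))                      # 0.5
print(sigmoid(np.array([-10, 10])))    # approximately [4.54e-05, 0.99995]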

Next, write the cost function used to evaluate a set of parameters.
Cost function:
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]$$
Note on NumPy products: using * as matrix multiplication requires both operands to be np.matrix, while dot works on ndarray operands, where the number of columns of the first array must equal the number of rows of the second.
def cost(theta, X, y):
    # cross-entropy cost; np.matrix is used here so that * means matrix multiplication
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))
    return np.sum(first - second) / len(X)
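The same cost can also be written with plain ndarrays and the @ operator, which avoids np.matrix entirely (a minimal sketch; cost_vec is a name introduced here, not used elsewhere in the exercise):
def cost_vec(theta, X, y):
    # vectorized cross-entropy with 1-D arrays; equivalent to cost() above
    h = sigmoid(X @ theta)          # shape (m,)
    y_flat = y.ravel()              # shape (m,)
    return -np.mean(y_flat * np.log(h) + (1 - y_flat) * np.log(1 - h))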
# Data processing
try:
    data.insert(0, 'Ones', 1)    # add an intercept column of ones; the try/except keeps the cell re-runnable
except ValueError:
    pass
cols = data.shape[1]
X = data.iloc[:,0:cols-1]
y = data.iloc[:,cols-1:cols]
X = np.array(X.values)
y = np.array(y.values)
theta = np.zeros(X.shape[1])
X.shape,y.shape,theta.shape
((100, 3), (100, 1), (3,))
cost(theta, X, y)
0.6931471805599453
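This value is exactly log 2: with θ initialized to zeros, the hypothesis is sigmoid(0) = 0.5 for every sample, so each term of the cost is −log 0.5. A quick check:
np.log(2)    # 0.6931471805599453, matching the initial cost above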
Gradient descent method
- This is batch gradient descent.
- Vectorized form: $\frac{1}{m} X^T\left(\mathrm{sigmoid}(X\theta) - y\right)$
$$\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}$$
# Compute the gradient of the cost (one entry per parameter)
def gradient(theta, X, y):
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    parameters = int(theta.ravel().shape[1])
    grad = np.zeros(parameters)
    error = sigmoid(X * theta.T) - y
    for i in range(parameters):
        term = np.multiply(error, X[:,i])
        grad[i] = np.sum(term) / len(X)
    return grad
gradient(theta, X, y)
array([ -0.1 , -12.00921659, -11.26284221])
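The loop above can also be collapsed into the vectorized form $\frac{1}{m}X^T(\mathrm{sigmoid}(X\theta)-y)$ shown earlier (a minimal sketch with plain ndarrays; gradient_vec is a name introduced here):
def gradient_vec(theta, X, y):
    # (1/m) * X^T (sigmoid(X theta) - y), with theta and y treated as 1-D arrays
    error = sigmoid(X @ theta) - y.ravel()
    return X.T @ error / len(X)

gradient_vec(theta, X, y)    # should match the array returned by gradient() above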
# Gradient descent
def gradientDescent(theta, X, y, alpha, iters):
    costs = np.zeros(iters)
    for i in range(iters):    # one update of every parameter per iteration
        theta = theta - alpha * gradient(theta, X, y)
        costs[i] = cost(theta, X, y)
    return theta, costs
Visualization
theta, costs = gradientDescent(theta, X, y, 0.001, 1000)
print(theta)
print(costs[-1])
fig,ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(1000),costs,'r')
ax.set_xlabel("Iterations")
ax.set_ylabel("Cost")
ax.set_title("Error vs. Iteration")
plt.show()
[-0.06946097 0.01090733 0.00099135]
0.6249857589104834

Parameter fitting with an advanced optimization algorithm
With unscaled features and this small learning rate, plain gradient descent converges slowly (the final cost above is still about 0.62), so we instead hand the cost and gradient functions to SciPy's optimizer.
import scipy.optimize as opt
result = opt.minimize(fun=cost,x0=theta,args=(X,y),jac=gradient,method='TNC')
result    # result.x holds the fitted parameters θ
fun: 0.20349910741165939
jac: array([-3.40179217e-05, -4.36399521e-04, -2.11471183e-04])
message: 'Converged (|f_n-f_(n-1)| ~= 0)'
nfev: 25
nit: 10
status: 1
success: True
x: array([-25.25870569, 0.20700684, 0.20226198])
cost(result.x, X, y)    # cost at the fitted parameters
0.20349910741165939
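The fitted parameters can also score an individual applicant. For example, exam scores of 45 and 85 give an admission probability of roughly 0.78 under the parameters found above (a small sketch; the exact value depends on the fit):
# predicted admission probability for an applicant with exam scores 45 and 85
prob = sigmoid(np.array([1, 45, 85]) @ result.x)
print(prob)    # roughly 0.78 with the parameters above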
Making predictions with the trained model parameters
def predict(X, theta):
    # predict 1 (admitted) when the estimated probability is at least 0.5, otherwise 0
    return (sigmoid(X @ theta.T) >= 0.5).astype(int)    # cast the boolean array to int
from sklearn.metrics import classification_report
y_pred = predict(X,result.x)
print(classification_report(y,y_pred))
precision recall f1-score support
0 0.87 0.85 0.86 40
1 0.90 0.92 0.91 60
accuracy 0.89 100
macro avg 0.89 0.88 0.88 100
weighted avg 0.89 0.89 0.89 100
Visualizing the decision boundary
Having obtained the parameters θ, setting Xθ = 0 gives the decision-boundary function:
$$\theta_0+\theta_1x_1+\theta_2x_2=0$$
$$\frac{\theta_0}{\theta_2}+\frac{\theta_1}{\theta_2}x_1+x_2=0$$
$$x_2=-\frac{\theta_0}{\theta_2}-\frac{\theta_1}{\theta_2}x_1$$
# First draw the original data
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
fig,ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=80, c='g', marker='o', label='Admitted')    # s sets the marker size
ax.scatter(negative['Exam1'], negative['Exam2'], s=80, c='r', marker='x', label='Not Admitted')
ax.legend(loc=1)    # legend in the upper-right corner
ax.set_xlabel("Exam 1 Score")
ax.set_ylabel("Exam 2 Score")
# Draw the decision boundary
theta_res = result.x
exam_x = np.arange(X[:,1].min(),X[:,1].max(),0.01)
theta_res = - theta_res / theta_res[2]    # divide by -θ_2 so the boundary reads x2 = theta_res[0] + theta_res[1] * x1
print(theta_res)
exam_y = theta_res[0]+theta_res[1]*exam_x
ax.plot(exam_x,exam_y)
plt.show()
[124.88113404 -1.02345898 -1. ]

Regularized logistic regression
path = 'ex2data2.txt'
data2 = pd.read_csv(path, header=None, names=['Test 1', 'Test 2', 'Accepted'])
data2.head()
|   | Test 1 | Test 2 | Accepted |
|---|---|---|---|
| 0 | 0.051267 | 0.69956 | 1 |
| 1 | -0.092742 | 0.68494 | 1 |
| 2 | -0.213710 | 0.69225 | 1 |
| 3 | -0.375000 | 0.50219 | 1 |
| 4 | -0.513250 | 0.46564 | 1 |
positive = data2[data2['Accepted'].isin([1])]
negative = data2[data2['Accepted'].isin([0])]
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Test 1'], positive['Test 2'], s=50, c='b', marker='o', label='Accepted')
ax.scatter(negative['Test 1'], negative['Test 2'], s=50, c='r', marker='x', label='Rejected')
ax.legend()
ax.set_xlabel('Test 1 Score')
ax.set_ylabel('Test 2 Score')
plt.show()

# Feature mapping: build polynomial combinations of the two test scores so the model can fit a non-linear boundary
degree = 5
x1 = data2['Test 1']
x2 = data2['Test 2']
data2.insert(3, 'Ones', 1)
for i in range(1, degree):
    for j in range(0, i):
        # F{i}{j} = x1^(i-j) * x2^j, a term of total degree i (here i runs from 1 to 4)
        data2['F' + str(i) + str(j)] = np.power(x1, i-j) * np.power(x2, j)
data2.drop('Test 1', axis=1, inplace=True)
data2.drop('Test 2', axis=1, inplace=True)
data2.head()
|   | Accepted | Ones | F10 | F20 | F21 | F30 | F31 | F32 | F40 | F41 | F42 | F43 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.051267 | 0.002628 | 0.035864 | 0.000135 | 0.001839 | 0.025089 | 0.000007 | 0.000094 | 0.001286 | 0.017551 |
| 1 | 1 | 1 | -0.092742 | 0.008601 | -0.063523 | -0.000798 | 0.005891 | -0.043509 | 0.000074 | -0.000546 | 0.004035 | -0.029801 |
| 2 | 1 | 1 | -0.213710 | 0.045672 | -0.147941 | -0.009761 | 0.031616 | -0.102412 | 0.002086 | -0.006757 | 0.021886 | -0.070895 |
| 3 | 1 | 1 | -0.375000 | 0.140625 | -0.188321 | -0.052734 | 0.070620 | -0.094573 | 0.019775 | -0.026483 | 0.035465 | -0.047494 |
| 4 | 1 | 1 | -0.513250 | 0.263426 | -0.238990 | -0.135203 | 0.122661 | -0.111283 | 0.069393 | -0.062956 | 0.057116 | -0.051818 |
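As an aside, scikit-learn can generate a similar polynomial expansion with sklearn.preprocessing.PolynomialFeatures instead of the hand-written loop (a sketch only, not what this exercise uses; note it produces every term of total degree up to 4, so its columns are a superset of the F-columns above):
from sklearn.preprocessing import PolynomialFeatures

# expand the two raw test scores into all terms x1^a * x2^b with a + b <= 4 (bias column included)
poly = PolynomialFeatures(degree=4, include_bias=True)
mapped = poly.fit_transform(np.column_stack([x1, x2]))
print(mapped.shape)    # (m, 15) for m training samples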
Regularized cost function
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$$
def costReg(theta, X, y, learningRate):
    # regularized cross-entropy cost; the learningRate argument plays the role of λ here
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))
    reg = (learningRate / (2 * len(X))) * np.sum(np.power(theta[:,1:theta.shape[1]], 2))    # θ_0 is not regularized
    return np.sum(first - second) / len(X) + reg
To minimize this cost with gradient descent, note that the regularization term does not include θ_0, so the update rule splits into two cases:
$$\begin{aligned} &\text{Repeat until convergence } \{ \\ &\quad \theta_0 := \theta_0 - a\,\frac{1}{m}\sum_{i=1}^{m}\left[h_{\theta}(x^{(i)})-y^{(i)}\right]x_0^{(i)} \\ &\quad \theta_j := \theta_j - a\left(\frac{1}{m}\sum_{i=1}^{m}\left[h_{\theta}(x^{(i)})-y^{(i)}\right]x_j^{(i)} + \frac{\lambda}{m}\theta_j\right) \\ &\} \end{aligned}$$
For j = 1, 2, …, n the update can be rearranged as:
$$\theta_j := \theta_j\left(1-a\frac{\lambda}{m}\right)-a\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}$$
def gradientReg(theta, X, y, learningRate):
    # gradient of the regularized cost; again learningRate plays the role of λ
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    parameters = int(theta.ravel().shape[1])
    grad = np.zeros(parameters)
    error = sigmoid(X * theta.T) - y
    for i in range(parameters):
        term = np.multiply(error, X[:,i])
        if i == 0:
            grad[i] = np.sum(term) / len(X)    # θ_0 is not regularized
        else:
            grad[i] = (np.sum(term) / len(X)) + ((learningRate / len(X)) * theta[:,i])
    return grad
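As with the unregularized case, the loop can be vectorized; a minimal sketch that simply zeroes the regularization on θ_0 (gradientReg_vec is a name introduced here, assuming theta and y are flat ndarrays):
def gradientReg_vec(theta, X, y, lam):
    # (1/m) X^T (sigmoid(X theta) - y) plus (lam/m) * theta, skipping theta[0]
    error = sigmoid(X @ theta) - y.ravel()
    grad = X.T @ error / len(X)
    reg = (lam / len(X)) * theta
    reg[0] = 0    # do not regularize the intercept term
    return grad + reg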
# initialization
# set X and y (remember from above that we moved the label to column 0)
cols = data2.shape[1]
X2 = data2.iloc[:,1:cols]
y2 = data2.iloc[:,0:1]
# convert to numpy arrays and initialize the parameter array theta
X2 = np.array(X2.values)
y2 = np.array(y2.values)
theta2 = np.zeros(X2.shape[1])
# Initialize the regularization parameter (named learningRate in the functions above)
learningRate = 1
costReg(theta2, X2, y2, learningRate)
0.6931471805599454
gradientReg(theta2, X2, y2, learningRate)
array([0.00847458, 0.01878809, 0.05034464, 0.01150133, 0.01835599,
0.00732393, 0.00819244, 0.03934862, 0.00223924, 0.01286005,
0.00309594])
Fitting the parameters θ with an advanced optimization algorithm
result2 = opt.fmin_tnc(func=costReg, x0=theta2, fprime=gradientReg, args=(X2, y2, learningRate))
result2
(array([ 0.53010248, 0.29075567, -1.60725764, -0.5821382 , 0.01781027,
-0.21329508, -0.40024142, -1.37144139, 0.02264303, -0.9503358 ,
0.0344085 ]),
22,
1)
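The predict function from the first part can be reused to check the training accuracy of the regularized model (a sketch; the exact number depends on the feature mapping and λ chosen above):
theta_reg = result2[0]    # fitted parameters returned by fmin_tnc
y2_pred = predict(X2, theta_reg)
print((y2_pred == y2.ravel()).mean())    # fraction of training samples classified correctly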
# A high-level Python library such as scikit-learn can also solve this problem
from sklearn import linear_model    # scikit-learn's linear models, including logistic regression
model = linear_model.LogisticRegression(penalty='l2', C=1.0)
model.fit(X2, y2.ravel())
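The fitted estimator exposes score(), which returns the mean accuracy on the given data (note that LogisticRegression applies L2 regularization by default, with C being the inverse regularization strength):
print(model.score(X2, y2.ravel()))    # mean training accuracy of the scikit-learn model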