Andrew Ng's Machine Learning Post-Class Exercises: Logistic Regression
2022-07-26 04:15:00 【Yizhou YZ】
Logistic regression
Logistic regression is a classification algorithm; its defining property is that its output always lies between 0 and 1.
We will build a logistic regression model to predict whether a student is admitted to a university. Imagine you are a university administrator who must decide, from an applicant's scores on two exams, whether to admit them. You have a training set from previous applicants: for each one, the two exam scores and the final admission decision. To complete this prediction task, we build a classification model that estimates the probability of admission from the two exam scores.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
path = 'ex2data1.txt'
data = pd.read_csv(path, names=['Exam1', 'Exam2', 'Admitted'])    # the file has no header row
data.head()
|   | Exam1 | Exam2 | Admitted |
|---|---|---|---|
| 0 | 34.623660 | 78.024693 | 0 |
| 1 | 30.286711 | 43.894998 | 0 |
| 2 | 35.847409 | 72.902198 | 0 |
| 3 | 60.182599 | 86.308552 | 1 |
| 4 | 79.032736 | 75.344376 | 1 |
# .isin is pandas boolean indexing on a DataFrame column: it selects the rows whose value is in the given list
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
fig,ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=80, c='g', marker='o', label='Admitted')    # s sets the marker size
ax.scatter(negative['Exam1'], negative['Exam2'], s=80, c='r', marker='x', label='Not Admitted')
ax.legend(loc=1)    # legend in the upper-right corner
ax.set_xlabel("Exam 1 Score")
ax.set_ylabel("Exam 2 Score")
plt.show()

Sigmoid function
g denotes the logistic function, an S-shaped curve (the sigmoid function), given by:
$$g(z)=\frac{1}{1+e^{-z}}$$
Composing it with the linear model gives the hypothesis of the logistic regression model:
$$h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}X}}$$
def sigmoid(z):
    # np.exp applied to an array computes e**z element-wise, so sigmoid works on scalars and arrays alike
    return 1 / (1 + np.exp(-z))
# Test the sigmoid function above
nums = np.arange(-5,5)
fig,ax = plt.subplots(figsize=(12,8))
ax.plot(nums,sigmoid(nums),c='r')
plt.show()
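Besides the plot, a couple of spot checks confirm that the output stays strictly between 0 and 1 and equals 0.5 at zero:
print(sigmoid(0))                      # 0.5
print(sigmoid(np.array([-10, 10])))    # approximately [4.54e-05, 0.99995]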

Next, write the cost function used to evaluate a set of parameters.
Cost function:
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]$$
Note on NumPy products: using * as matrix multiplication requires both operands to be np.matrix, while dot works on ndarray operands, where the number of columns of the first array must equal the number of rows of the second.
def cost(theta, X, y):
    # cross-entropy cost; np.matrix is used here so that * means matrix multiplication
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))
    return np.sum(first - second) / len(X)
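The same cost can also be written with plain ndarrays and the @ operator, which avoids np.matrix entirely (a minimal sketch; cost_vec is a name introduced here, not used elsewhere in the exercise):
def cost_vec(theta, X, y):
    # vectorized cross-entropy with 1-D arrays; equivalent to cost() above
    h = sigmoid(X @ theta)          # shape (m,)
    y_flat = y.ravel()              # shape (m,)
    return -np.mean(y_flat * np.log(h) + (1 - y_flat) * np.log(1 - h))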
# Data processing
try:
    data.insert(0, 'Ones', 1)    # add an intercept column of ones; the try/except keeps the cell re-runnable
except ValueError:
    pass
cols = data.shape[1]
X = data.iloc[:,0:cols-1]
y = data.iloc[:,cols-1:cols]
X = np.array(X.values)
y = np.array(y.values)
theta = np.zeros(X.shape[1])
X.shape,y.shape,theta.shape
((100, 3), (100, 1), (3,))
cost(theta, X, y)
0.6931471805599453
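This value is exactly log 2: with θ initialized to zeros, the hypothesis is sigmoid(0) = 0.5 for every sample, so each term of the cost is −log 0.5. A quick check:
np.log(2)    # 0.6931471805599453, matching the initial cost above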
Gradient descent method
- This is batch gradient descent.
- Vectorized form: $\frac{1}{m} X^T\left(\mathrm{sigmoid}(X\theta) - y\right)$
$$\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}$$
# Compute the gradient of the cost (one entry per parameter)
def gradient(theta, X, y):
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    parameters = int(theta.ravel().shape[1])
    grad = np.zeros(parameters)
    error = sigmoid(X * theta.T) - y
    for i in range(parameters):
        term = np.multiply(error, X[:,i])
        grad[i] = np.sum(term) / len(X)
    return grad
gradient(theta, X, y)
array([ -0.1 , -12.00921659, -11.26284221])
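The loop above can also be collapsed into the vectorized form $\frac{1}{m}X^T(\mathrm{sigmoid}(X\theta)-y)$ shown earlier (a minimal sketch with plain ndarrays; gradient_vec is a name introduced here):
def gradient_vec(theta, X, y):
    # (1/m) * X^T (sigmoid(X theta) - y), with theta and y treated as 1-D arrays
    error = sigmoid(X @ theta) - y.ravel()
    return X.T @ error / len(X)

gradient_vec(theta, X, y)    # should match the array returned by gradient() above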
# Gradient descent
def gradientDescent(theta, X, y, alpha, iters):
    costs = np.zeros(iters)
    for i in range(iters):    # one update of every parameter per iteration
        theta = theta - alpha * gradient(theta, X, y)
        costs[i] = cost(theta, X, y)
    return theta, costs
Visualization
theta, costs = gradientDescent(theta, X, y, 0.001, 1000)
print(theta)
print(costs[-1])
fig,ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(1000),costs,'r')
ax.set_xlabel("Iterations")
ax.set_ylabel("Cost")
ax.set_title("Error vs. Iteration")
plt.show()
[-0.06946097 0.01090733 0.00099135]
0.6249857589104834

Parameter fitting with an advanced optimization algorithm
With unscaled features and this small learning rate, plain gradient descent converges slowly (the final cost above is still about 0.62), so we instead hand the cost and gradient functions to SciPy's optimizer.
import scipy.optimize as opt
result = opt.minimize(fun=cost,x0=theta,args=(X,y),jac=gradient,method='TNC')
result    # result.x holds the fitted parameters θ
fun: 0.20349910741165939
jac: array([-3.40179217e-05, -4.36399521e-04, -2.11471183e-04])
message: 'Converged (|f_n-f_(n-1)| ~= 0)'
nfev: 25
nit: 10
status: 1
success: True
x: array([-25.25870569, 0.20700684, 0.20226198])
cost(result.x, X, y)    # cost at the fitted parameters
0.20349910741165939
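The fitted parameters can also score an individual applicant. For example, exam scores of 45 and 85 give an admission probability of roughly 0.78 under the parameters found above (a small sketch; the exact value depends on the fit):
# predicted admission probability for an applicant with exam scores 45 and 85
prob = sigmoid(np.array([1, 45, 85]) @ result.x)
print(prob)    # roughly 0.78 with the parameters above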
Making predictions with the trained model parameters
def predict(X, theta):
    # predict 1 (admitted) when the estimated probability is at least 0.5, otherwise 0
    return (sigmoid(X @ theta.T) >= 0.5).astype(int)    # cast the boolean array to int
from sklearn.metrics import classification_report
y_pred = predict(X,result.x)
print(classification_report(y,y_pred))
precision recall f1-score support
0 0.87 0.85 0.86 40
1 0.90 0.92 0.91 60
accuracy 0.89 100
macro avg 0.89 0.88 0.88 100
weighted avg 0.89 0.89 0.89 100
Visualizing the decision boundary
Having obtained the parameters θ, setting Xθ = 0 gives the decision-boundary function:
$$\theta_0+\theta_1x_1+\theta_2x_2=0$$
$$\frac{\theta_0}{\theta_2}+\frac{\theta_1}{\theta_2}x_1+x_2=0$$
$$x_2=-\frac{\theta_0}{\theta_2}-\frac{\theta_1}{\theta_2}x_1$$
# First draw the original data
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
fig,ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=80, c='g', marker='o', label='Admitted')    # s sets the marker size
ax.scatter(negative['Exam1'], negative['Exam2'], s=80, c='r', marker='x', label='Not Admitted')
ax.legend(loc=1)    # legend in the upper-right corner
ax.set_xlabel("Exam 1 Score")
ax.set_ylabel("Exam 2 Score")
# Draw the decision boundary
theta_res = result.x
exam_x = np.arange(X[:,1].min(),X[:,1].max(),0.01)
theta_res = - theta_res / theta_res[2]    # divide by -θ_2 so the boundary reads x2 = theta_res[0] + theta_res[1] * x1
print(theta_res)
exam_y = theta_res[0]+theta_res[1]*exam_x
ax.plot(exam_x,exam_y)
plt.show()
[124.88113404 -1.02345898 -1. ]

Regularized logistic regression
path = 'ex2data2.txt'
data2 = pd.read_csv(path, header=None, names=['Test 1', 'Test 2', 'Accepted'])
data2.head()
|   | Test 1 | Test 2 | Accepted |
|---|---|---|---|
| 0 | 0.051267 | 0.69956 | 1 |
| 1 | -0.092742 | 0.68494 | 1 |
| 2 | -0.213710 | 0.69225 | 1 |
| 3 | -0.375000 | 0.50219 | 1 |
| 4 | -0.513250 | 0.46564 | 1 |
positive = data2[data2['Accepted'].isin([1])]
negative = data2[data2['Accepted'].isin([0])]
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Test 1'], positive['Test 2'], s=50, c='b', marker='o', label='Accepted')
ax.scatter(negative['Test 1'], negative['Test 2'], s=50, c='r', marker='x', label='Rejected')
ax.legend()
ax.set_xlabel('Test 1 Score')
ax.set_ylabel('Test 2 Score')
plt.show()

# Feature mapping: build polynomial combinations of the two test scores so the model can fit a non-linear boundary
degree = 5
x1 = data2['Test 1']
x2 = data2['Test 2']
data2.insert(3, 'Ones', 1)
for i in range(1, degree):
    for j in range(0, i):
        # F{i}{j} = x1^(i-j) * x2^j, a term of total degree i (here i runs from 1 to 4)
        data2['F' + str(i) + str(j)] = np.power(x1, i-j) * np.power(x2, j)
data2.drop('Test 1', axis=1, inplace=True)
data2.drop('Test 2', axis=1, inplace=True)
data2.head()
|   | Accepted | Ones | F10 | F20 | F21 | F30 | F31 | F32 | F40 | F41 | F42 | F43 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.051267 | 0.002628 | 0.035864 | 0.000135 | 0.001839 | 0.025089 | 0.000007 | 0.000094 | 0.001286 | 0.017551 |
| 1 | 1 | 1 | -0.092742 | 0.008601 | -0.063523 | -0.000798 | 0.005891 | -0.043509 | 0.000074 | -0.000546 | 0.004035 | -0.029801 |
| 2 | 1 | 1 | -0.213710 | 0.045672 | -0.147941 | -0.009761 | 0.031616 | -0.102412 | 0.002086 | -0.006757 | 0.021886 | -0.070895 |
| 3 | 1 | 1 | -0.375000 | 0.140625 | -0.188321 | -0.052734 | 0.070620 | -0.094573 | 0.019775 | -0.026483 | 0.035465 | -0.047494 |
| 4 | 1 | 1 | -0.513250 | 0.263426 | -0.238990 | -0.135203 | 0.122661 | -0.111283 | 0.069393 | -0.062956 | 0.057116 | -0.051818 |
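As an aside, scikit-learn can generate a similar polynomial expansion with sklearn.preprocessing.PolynomialFeatures instead of the hand-written loop (a sketch only, not what this exercise uses; note it produces every term of total degree up to 4, so its columns are a superset of the F-columns above):
from sklearn.preprocessing import PolynomialFeatures

# expand the two raw test scores into all terms x1^a * x2^b with a + b <= 4 (bias column included)
poly = PolynomialFeatures(degree=4, include_bias=True)
mapped = poly.fit_transform(np.column_stack([x1, x2]))
print(mapped.shape)    # (m, 15) for m training samples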
Regularized cost function
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$$
def costReg(theta, X, y, learningRate):
    # regularized cross-entropy cost; the learningRate argument plays the role of λ here
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))
    reg = (learningRate / (2 * len(X))) * np.sum(np.power(theta[:,1:theta.shape[1]], 2))    # θ_0 is not regularized
    return np.sum(first - second) / len(X) + reg
To minimize this cost with gradient descent, note that the regularization term does not include θ_0, so the update rule splits into two cases:
$$\begin{aligned} &\text{Repeat until convergence } \{ \\ &\quad \theta_0 := \theta_0 - a\,\frac{1}{m}\sum_{i=1}^{m}\left[h_{\theta}(x^{(i)})-y^{(i)}\right]x_0^{(i)} \\ &\quad \theta_j := \theta_j - a\left(\frac{1}{m}\sum_{i=1}^{m}\left[h_{\theta}(x^{(i)})-y^{(i)}\right]x_j^{(i)} + \frac{\lambda}{m}\theta_j\right) \\ &\} \end{aligned}$$
For j = 1, 2, …, n the update can be rearranged as:
$$\theta_j := \theta_j\left(1-a\frac{\lambda}{m}\right)-a\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}$$
def gradientReg(theta, X, y, learningRate):
    # gradient of the regularized cost; again learningRate plays the role of λ
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    parameters = int(theta.ravel().shape[1])
    grad = np.zeros(parameters)
    error = sigmoid(X * theta.T) - y
    for i in range(parameters):
        term = np.multiply(error, X[:,i])
        if i == 0:
            grad[i] = np.sum(term) / len(X)    # θ_0 is not regularized
        else:
            grad[i] = (np.sum(term) / len(X)) + ((learningRate / len(X)) * theta[:,i])
    return grad
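As with the unregularized case, the loop can be vectorized; a minimal sketch that simply zeroes the regularization on θ_0 (gradientReg_vec is a name introduced here, assuming theta and y are flat ndarrays):
def gradientReg_vec(theta, X, y, lam):
    # (1/m) X^T (sigmoid(X theta) - y) plus (lam/m) * theta, skipping theta[0]
    error = sigmoid(X @ theta) - y.ravel()
    grad = X.T @ error / len(X)
    reg = (lam / len(X)) * theta
    reg[0] = 0    # do not regularize the intercept term
    return grad + reg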
# initialization
# set X and y (remember from above that we moved the label to column 0)
cols = data2.shape[1]
X2 = data2.iloc[:,1:cols]
y2 = data2.iloc[:,0:1]
# convert to numpy arrays and initialize the parameter array theta
X2 = np.array(X2.values)
y2 = np.array(y2.values)
theta2 = np.zeros(X2.shape[1])
# Initialize the regularization parameter (named learningRate in the functions above)
learningRate = 1
costReg(theta2, X2, y2, learningRate)
0.6931471805599454
gradientReg(theta2, X2, y2, learningRate)
array([0.00847458, 0.01878809, 0.05034464, 0.01150133, 0.01835599,
0.00732393, 0.00819244, 0.03934862, 0.00223924, 0.01286005,
0.00309594])
Fitting the parameters θ with an advanced optimization algorithm
result2 = opt.fmin_tnc(func=costReg, x0=theta2, fprime=gradientReg, args=(X2, y2, learningRate))
result2
(array([ 0.53010248, 0.29075567, -1.60725764, -0.5821382 , 0.01781027,
-0.21329508, -0.40024142, -1.37144139, 0.02264303, -0.9503358 ,
0.0344085 ]),
22,
1)
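The predict function from the first part can be reused to check the training accuracy of the regularized model (a sketch; the exact number depends on the feature mapping and λ chosen above):
theta_reg = result2[0]    # fitted parameters returned by fmin_tnc
y2_pred = predict(X2, theta_reg)
print((y2_pred == y2.ravel()).mean())    # fraction of training samples classified correctly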
# A high-level Python library such as scikit-learn can also solve this problem
from sklearn import linear_model    # scikit-learn's linear models, including logistic regression
model = linear_model.LogisticRegression(penalty='l2', C=1.0)
model.fit(X2, y2.ravel())
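The fitted estimator exposes score(), which returns the mean accuracy on the given data (note that LogisticRegression applies L2 regularization by default, with C being the inverse regularization strength):
print(model.score(X2, y2.ravel()))    # mean training accuracy of the scikit-learn model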