Wu Enda Machine Learning: Chapters 6-7
2022-07-19 06:42:00 【Watermelon that loves programming】
Wu Enda Machine Learning: Chapters 6-7
Since Chapter 5 mainly covers Octave syntax, and nowadays we mostly use Python for AI programming, Chapter 5 is not summarized here. Interested readers can look it up on their own.
Chapter 6
6-1 Classification
Classification appears in many real-life settings, such as spam filtering, online fraud detection, and tumor prediction. The positive class is usually labeled 1 and the negative class 0. For multi-class problems the labels can be 0, 1, 2, 3, and so on.
For tumor prediction, if the output of the hypothesis is greater than 0.5, we predict the positive class; otherwise we predict the negative class.
Sometimes our prediction algorithm (for example, linear regression) may output values greater than 1 or less than 0, which is clearly odd for a classification problem. This is where the logistic regression algorithm comes in: it keeps the prediction between 0 and 1 (despite the word "regression" in its name, it is a classification algorithm).
6-2 Hypothesis Representation
The logistic function is defined as follows (the logistic function is generally considered the same as the sigmoid function):
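The original post shows the formula as an image; for reference, the standard form from the course is:

$$ h_\theta(x) = g(\theta^{T}x), \qquad g(z) = \frac{1}{1 + e^{-z}} $$

g(z) maps any real number into the interval (0, 1), so hθ(x) can be read as the estimated probability that y = 1 for input x.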
We can illustrate this with a simple tumor example.
6-3 Decision Boundaries
In logistic regression, when the prediction is greater than 0.5 we classify the example as positive; otherwise as negative. From the graph of the sigmoid function we can see that when z is greater than 0 the prediction is greater than 0.5, and when z is less than 0 the prediction is less than 0.5.
In the following illustration, we build a hypothesis and find that when -3 + x1 + x2 >= 0, y is predicted to be 1, and when -3 + x1 + x2 < 0, y is predicted to be 0. So the decision boundary is the line x1 + x2 = 3.
In another example we can add higher-order terms. Let θ0 = -1, θ1 = 0, θ2 = 0, θ3 = 1, θ4 = 1. The decision boundary is then the circle where x1 squared plus x2 squared equals 1.
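As a small illustrative sketch of the two boundaries above (the function and variable names are my own, not from the original post), the following Python snippet checks on which side of each decision boundary a point falls:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_linear(x1, x2):
    # theta = [-3, 1, 1]: predict y = 1 when -3 + x1 + x2 >= 0, i.e. x1 + x2 >= 3
    return int(sigmoid(-3 + x1 + x2) >= 0.5)

def predict_circular(x1, x2):
    # theta = [-1, 0, 0, 1, 1]: predict y = 1 when x1^2 + x2^2 >= 1
    return int(sigmoid(-1 + x1**2 + x2**2) >= 0.5)

print(predict_linear(2, 2))        # x1 + x2 = 4 >= 3  -> 1
print(predict_circular(0.5, 0.5))  # 0.25 + 0.25 < 1   -> 0
```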
6-4 Cost Function
Previously we introduced the cost function. Here we use a slightly different notation, cost: when the predicted value differs from the true value, we want the hypothesis to pay a price.
With this cost (defined below), gradient descent can be guaranteed to converge, because the resulting cost function is convex.
The cost function is defined as follows:
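The definition appears as an image in the original post; written out for reference, the piecewise form from the lecture is:

$$
\mathrm{Cost}\big(h_\theta(x), y\big) =
\begin{cases}
-\log\big(h_\theta(x)\big) & \text{if } y = 1 \\
-\log\big(1 - h_\theta(x)\big) & \text{if } y = 0
\end{cases}
$$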
From this we can see how the cost relates to the hypothesis: if y = 1 and hθ(x) = 1, then cost = 0; but if y = 1 and hθ(x) approaches 0, the cost approaches infinity.

6-5 Simplified Cost Function and Gradient Descent
In this section we combine a simplified form of the cost function with gradient descent to obtain the complete logistic regression algorithm.
First, let us write down the cost function again.
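For reference, the simplified (combined) form taught in the course is:

$$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[\, y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
$$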
To fit the parameters, we need to drive this cost function down to its minimum.
Then we apply gradient descent to find the values of θ that minimize the cost.
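A minimal NumPy sketch of this loop (the function and variable names are my own; X is assumed to already include a leading column of ones for the intercept term):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Fit logistic regression parameters theta with batch gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = sigmoid(X @ theta)         # hypothesis for all m training examples
        gradient = X.T @ (h - y) / m   # partial derivatives of J(theta)
        theta -= alpha * gradient      # simultaneous update of every theta_j
    return theta
```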
6-6 Advanced Optimization
There are more sophisticated optimization algorithms (conjugate gradient, BFGS, L-BFGS) that can bring the cost function to convergence more efficiently. Their advantages and disadvantages are as follows. For the rest of the course, the teacher suggests that we avoid implementing these low-level algorithms ourselves and reinventing the wheel: understanding the theory and calling a well-tested library is usually enough.
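In that spirit, a common way to do this in Python is to hand the cost and its gradient to an off-the-shelf optimizer such as scipy.optimize.minimize. This is only a sketch under my own naming; BFGS is one of the advanced methods mentioned in the lecture:

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient for logistic regression."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    h = np.clip(h, 1e-10, 1 - 1e-10)        # avoid log(0)
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return cost, grad

# toy data with some overlap between the classes; the first column of ones is the intercept term
X = np.array([[1.0, 0.5], [1.0, 2.5], [1.0, 2.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

result = minimize(cost_and_grad, x0=np.zeros(X.shape[1]),
                  args=(X, y), method='BFGS', jac=True)
print(result.x)  # the optimized theta
```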
6-7 Multi-class Classification: One-vs-All
In this lesson, we will discuss how to use logical regression to solve multi classification problems . In real life , Multiple classification problems make it very common , Such as weather conditions .
For a multi-class problem, say with three classes, we treat one class as the positive class and lump the other two together as the negative class, then train a binary classifier. Repeating this three times, once per class, gives us three classifiers; to classify a new example we pick the class whose classifier outputs the highest probability.
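A minimal sketch of the one-vs-all scheme in Python (the names are my own; train_binary stands for any routine that fits a single logistic regression classifier, such as the gradient descent sketch above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_all(X, y, num_classes, train_binary):
    """Train one logistic regression classifier per class."""
    all_theta = []
    for c in range(num_classes):
        y_binary = (y == c).astype(float)     # current class -> 1, everything else -> 0
        all_theta.append(train_binary(X, y_binary))
    return np.array(all_theta)

def predict_one_vs_all(all_theta, X):
    """For each example, pick the class whose classifier gives the highest probability."""
    probabilities = sigmoid(X @ all_theta.T)  # shape (m, num_classes)
    return np.argmax(probabilities, axis=1)
```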
Chapter 7
7-1 The Problem of Overfitting
When the hypothesis fits the training data so well that the curve passes through every training point perfectly, this is overfitting. It is not actually a good thing: with too many features the cost function may be driven close to 0 on the training set, but the model cannot generalize to new examples and cannot predict well. Underfitting is the opposite problem: the model has almost no predictive power and cannot even fit the training points.
In the figure below, the first plot is underfitting, the second is just right, and the third is overfitting.
When overfitting occurs, we have two options. First, reduce the number of features, that is, delete some of the feature variables. Second, regularization: keep all the features but shrink the values of some parameters.
7-2 Cost Function with Regularization
We add a penalty term to the cost function to reduce the possibility of overfitting.
For example, in housing price prediction, if the samples have 100 features, we can add a regularization term to the cost function that penalizes all of the parameters.
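The regularized cost function shown in the accompanying image can be written out as follows (this is the linear regression form used in the lecture):

$$
J(\theta) = \frac{1}{2m}\left[\, \sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda \sum_{j=1}^{n}\theta_j^{2} \right]
$$

Note that the regularization sum starts at j = 1, so the intercept term θ0 is not penalized.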

7-3 Regularized Linear Regression
For linear regression we previously derived two algorithms, one based on gradient descent and the other on the normal equation. We now extend both to regularized linear regression.
This is the cost function with the regularization term added.
This is the gradient descent update without regularization.
Now we add the regularization term 
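For reference, the regularized update for j ≥ 1 from the lecture (θ0 keeps its unregularized update) is:

$$
\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}
$$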
For the normal equation we can also add a regularization term. An interesting side effect: once the regularization term is added (with λ > 0), the matrix that has to be inverted is guaranteed to be invertible, even when it would not be otherwise.
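The regularized normal equation from the lecture takes the form below, where the matrix next to λ is the (n+1)×(n+1) identity with its top-left entry set to 0 so that θ0 is not regularized:

$$
\theta = \left(X^{T}X + \lambda
\begin{bmatrix}
0 & & & \\
 & 1 & & \\
 & & \ddots & \\
 & & & 1
\end{bmatrix}\right)^{-1} X^{T} y
$$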