Andrew Ng (Wu Enda) Machine Learning, Chapters 1-2

2022-07-19 06:42:00 【Watermelon that loves programming】

A post-lecture summary of chapters 1-2 of Andrew Ng's machine learning course.

1-1 Welcome to the Machine Learning course

A brief introduction to machine learning and its applications in everyday life.
Machine learning: a multidisciplinary field that has emerged over roughly the past 20 years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other disciplines. A machine learning program keeps improving its own performance by learning.
Applications of machine learning in everyday life: data mining; applications that cannot be programmed by hand (such as unmanned aircraft and handwriting recognition); and personalized programs (such as recommender systems).

1-2 What is machine learning?

A more formal, modern definition of machine learning: a computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. In the checkers example, the experience E is the program playing tens of thousands of games against itself, the task T is playing checkers, and the performance measure P is the probability of winning against a new opponent.

1-3 Supervised learning

The lecture explains supervised learning with a house price prediction example: we give the algorithm a dataset that contains the correct answers, and the goal of the algorithm is to produce more correct answers for new inputs.
Supervised learning is divided into regression problems and classification problems.
A regression problem fits a curve to the (x, y) data so that the error is minimized, and predicts a continuous-valued output. A classification problem predicts a discrete-valued output, that is, it assigns each input to a class. In short: if the output is continuous, it is a regression problem; if the output is discrete, it is a classification problem.

1-4 Unsupervised learning

The data used in unsupervised learning differs from the data used in supervised learning: it has no labels, so we do not know what each example in the dataset means. Unsupervised learning finds structure in the data by grouping it, which is what a clustering algorithm does. One example is Google News, which groups the news stories it collects into topics.
One of the most interesting examples is the cocktail party algorithm: two or more people talk at the same time, so the microphone recordings contain a mix of their voices, and machine learning is used to separate them.
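
To make the idea of grouping unlabeled data concrete, here is a minimal sketch of k-means, one common clustering algorithm. The text above does not name a specific algorithm, so k-means, the function name `kmeans_1d`, the sample points, and the initial centers are all assumptions chosen for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Minimal k-means on 1-D data; `centers` holds the initial guesses."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled data: two visibly separate groups of values.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(points, centers=[0.0, 10.0])
print(centers)
print(clusters)
```

No labels are passed in; the two groups come only from how close the values are to each other.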

2-1 Model representation

Video 2-1 uses the example of predicting house prices with linear regression to explain how a model is represented in supervised learning: given the value of an input feature, the model predicts the price of the house.
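
For linear regression with a single input feature, the model is usually written as the linear hypothesis below. The symbols are an assumption based on the theta0 and theta1 used later in these notes: x is the input feature (for example the size of the house) and h is the predicted price.

```latex
h_\theta(x) = \theta_0 + \theta_1 x
```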

2-2 Cost function

The cost function (also known as the squared error function) is used to work out which straight line best fits our data. To bring the predicted values as close as possible to the true values, we choose the parameters of the prediction function that make the cost as small as possible.
h(x) is our prediction function. The factor of 1/2 in front of the cost function is there only to make the expression simpler after taking the derivative.
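
The formula these remarks describe is presumably the standard squared-error cost for linear regression, shown here as a reconstruction rather than a copy of the missing image. The notation is assumed: m is the number of training examples and (x^(i), y^(i)) is the i-th example.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2
```

Squaring each error produces a factor of 2 when differentiating, which cancels the 1/2.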

2-3 Cost function (1)

This section applies the cost function to a simple example to build intuition for what it does.
Setting theta1 to 1 and theta0 to 0 gives a straight line that fits the data exactly, so the value of the cost function is 0.

Next, set theta1 to 0.5 and theta0 to 0. The fitted line now has a smaller slope, and the value of the cost function is approximately 0.58.
Setting theta1 to 0 and theta0 to 0 gives a horizontal line, and the cost function is approximately 2.3.
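
The three values above can be checked with a few lines of Python. This is a minimal sketch: the training set is not given in the text, so the points (1, 1), (2, 2), (3, 3) are an assumption, chosen because they reproduce the quoted costs. The function names `predict` and `compute_cost` are also illustrative.

```python
def predict(theta0, theta1, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def compute_cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    total = sum((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Assumed training set (not stated in the text): three points on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

for theta1 in (1.0, 0.5, 0.0):
    # Prints J = 0.00, 0.58 and 2.33 respectively.
    print(f"theta1 = {theta1}: J = {compute_cost(0.0, theta1, xs, ys):.2f}")
```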

We can also assign a range of values to theta1 and plot the resulting values of the cost function. The optimization goal of our learning algorithm is to minimize the cost function, and in this example the minimum is reached when theta1 is 1.

2-4 Cost function (2)

If both theta0 and theta1 are allowed to vary, the graph of the cost function is a bowl-shaped surface.
In the contour plot, each ellipse connects the points (theta0, theta1) that have the same value of the cost function. Trying different values of theta0 and theta1 shows which combinations fit the data better.

2-5 Gradient descent

Gradient descent is a very common method for minimizing the cost function J, and it is widely used in many areas of machine learning.
The outline of gradient descent is: start from some initial theta0 and theta1, then keep changing them to reduce J until we end up at a minimum.

Plotting the cost function against theta0 and theta1 gives a surface.
Think of that surface as a mountain, and look for the fastest way down.

The gradient descent algorithm repeats its update step until it converges. α is called the learning rate and controls how big a step we take on each iteration of gradient descent; it is explained in more detail in the following sections. theta0 and theta1 must be updated simultaneously, which is very important.
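
Written out, the update step with a simultaneous update looks roughly like this. It is a reconstruction of the standard rule, not a copy of the missing image, and the temporary names temp0 and temp1 are only illustrative.

```latex
\begin{aligned}
&\text{repeat until convergence:} \\
&\quad \text{temp}_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) \\
&\quad \text{temp}_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) \\
&\quad \theta_0 := \text{temp}_0 \\
&\quad \theta_1 := \text{temp}_1
\end{aligned}
```

Both derivatives are evaluated at the old values of theta0 and theta1 before either parameter is overwritten, which is what "simultaneous" means here.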

2-6 Key points of gradient descent

This section focuses on the learning rate α and the derivative term in gradient descent.

If α is too small, each step moves only a little, and it takes a long time to reach the lowest point.
If α is too large, gradient descent can overshoot the minimum, and it may fail to converge or even diverge.

Once the minimum is reached, the derivative term is zero, so the next step of gradient descent does not change the value of theta.
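
The effect of α can be seen on a toy problem. The sketch below minimizes J(theta) = theta², a cost chosen here purely for illustration (it does not come from the lectures); its derivative is 2 * theta and its minimum is at theta = 0. The function name and the specific α values are likewise assumptions.

```python
def gradient_descent_1d(alpha, theta=1.0, steps=20):
    """Minimize the toy cost J(theta) = theta**2, whose derivative is 2 * theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta


small = gradient_descent_1d(alpha=0.01)  # creeps toward 0 slowly
good = gradient_descent_1d(alpha=0.4)    # converges quickly
large = gradient_descent_1d(alpha=1.1)   # overshoots further on every step

# Starting exactly at the minimum, the derivative is 0 and theta never moves.
stuck = gradient_descent_1d(alpha=0.4, theta=0.0)

print(abs(small), abs(good), abs(large), stuck)
```

After 20 steps the small α is still far from 0, the moderate α is essentially at 0, and the large α has moved further away than where it started.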

2-7 Gradient descent for linear regression

This section combines gradient descent with the cost function to obtain the learning algorithm for linear regression.

It starts from the gradient descent algorithm and the linear regression model covered earlier.
Expanding the derivative term gives one partial derivative for each parameter.
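
For reference, the standard result is reconstructed below, assuming the linear hypothesis and the squared-error cost with the 1/2 factor (m is the number of training examples, and the superscript (i) indexes them; this notation is assumed, not taken from the text).

```latex
\begin{aligned}
h_\theta(x) &= \theta_0 + \theta_1 x, \qquad
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) &= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) \\
\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) &= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x^{(i)}
\end{aligned}
```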

Substituting the derivative term for each parameter into the gradient descent algorithm gives the update rules for linear regression (note that theta0 and theta1 are updated simultaneously).
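
Putting the pieces together, a minimal sketch of gradient descent for single-feature linear regression might look like this. The training data, the learning rate, the iteration count, and the function name `gradient_descent` are all assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error h(x_i) - y_i for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the squared-error cost J.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Assumed training set: three points on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

On this data the parameters approach theta0 = 0 and theta1 = 1, the line that fits the three points exactly. The tuple assignment is one simple way to keep the update simultaneous, since the right-hand side is fully evaluated before either name is rebound.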