Li Hongyi Machine Learning 2022.7.15 -- Gradient Descent
2022-07-19 15:09:00 【ww9878】
Introduction to gradient descent
In solving an optimization problem, we need to find the set of parameters θ* that makes the loss function as small as possible.
First, pick arbitrary initial values w^0 and b^0, then update them to obtain w^1 and b^1:
w^1 = w^0 − η ∂L/∂w, b^1 = b^0 − η ∂L/∂b, with the partial derivatives evaluated at (w^0, b^0).
η is the learning rate and is set manually. Repeat this step, updating w^i and b^i until they no longer change. This iterative process of updating w^i and b^i is gradient descent.
The partial-derivative terms in the update rule are the gradient.
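To make the update rule concrete, here is a minimal sketch of gradient descent for a linear model y = b + w·x with a squared-error loss; the data values, learning rate, and iteration count are made up purely for illustration.

```python
# Minimal gradient-descent sketch for a linear model y = b + w * x
# with squared-error loss. Data and hyperparameters are illustrative only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 5.0, 7.2, 8.9]   # roughly y = 1 + 2x

w, b = 0.0, 0.0            # w^0, b^0: arbitrary starting values
eta = 0.01                 # learning rate, set manually

for step in range(1000):
    # Partial derivatives of L(w, b) = sum_n (y_n - (b + w * x_n))^2
    grad_w = sum(-2.0 * (yn - (b + w * xn)) * xn for xn, yn in zip(x, y))
    grad_b = sum(-2.0 * (yn - (b + w * xn)) for xn, yn in zip(x, y))
    # Move opposite to the gradient, scaled by the learning rate
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)  # converges near the least-squares fit, roughly w ≈ 2, b ≈ 1.1
```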
Learning rate η
The learning rate η must be tuned carefully. If it is too large, the update easily overshoots and gets stuck bouncing around some position without reaching the minimum; if it is too small, each step moves only a short distance and the result comes out very slowly. As pictured:
Usually at the start, when we are still far from the lowest point, a larger value of η can be chosen; as we move closer and closer to the lowest point, the value of η can be lowered appropriately. Different parameters also need different learning rates.
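One simple way to realize this is a time-decaying learning rate; the 1/√(t+1) decay below is the schedule that also appears in the Adagrad derivation, and the starting value 0.1 is just an illustrative choice.

```python
import math

eta = 0.1  # initial learning rate (illustrative value)

def eta_t(t):
    # Decaying learning rate: large steps at the start, smaller steps later
    return eta / math.sqrt(t + 1)

print([round(eta_t(t), 4) for t in range(5)])
# [0.1, 0.0707, 0.0577, 0.05, 0.0447]
```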
Adagrad Algorithm
Adagrad divides each parameter's learning rate by the root mean square of that parameter's previous derivatives: w^{t+1} = w^t − (η^t / σ^t) g^t. Here σ^t is the root mean square of all the previous derivatives of the parameter, and it is different for every parameter.
The larger the gradient g^t, the larger the step; but the larger σ^t (the accumulated past gradients), the smaller the step. The two seem to contradict each other. Considering the comparison across parameters, the best step size should be (first derivative) / (second derivative): proportional to the first derivative and inversely proportional to the second derivative, so the larger the second derivative, the smaller the parameter update should be. Only by taking the second derivative into account can the step truly reflect the distance to the lowest point; Adagrad uses the root mean square of the past first derivatives to approximate the effect of the second derivative without computing it.
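Below is a minimal Adagrad sketch for a single parameter, assuming the usual simplified form w^{t+1} = w^t − η·g^t / √(Σ (g^i)^2); the loss function L(w) = (w − 3)^2 and all values are made up for illustration.

```python
import math

def grad(w):
    # Derivative of the illustrative loss L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0
eta = 1.0
sum_g2 = 0.0  # accumulated squared gradients of this parameter

for t in range(200):
    g = grad(w)
    sum_g2 += g * g
    sigma = math.sqrt(sum_g2)   # grows with the history of gradients
    w -= (eta / sigma) * g      # larger past gradients -> smaller step

print(w)  # approaches the minimum at w = 3
```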
Stochastic gradient descent
Stochastic gradient descent is faster than the gradient descent above. Pick one example x^n, either randomly or in order, compute the loss on that single example, and update the gradient immediately.
Where ordinary gradient descent takes one step after seeing all the examples, stochastic gradient descent has already taken many steps.
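A sketch of the difference, reusing the illustrative linear model from the earlier gradient-descent example: instead of summing the gradient over all examples before one update, the parameters are updated after every single example.

```python
import random

x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 5.0, 7.2, 8.9]
w, b, eta = 0.0, 0.0, 0.01

for epoch in range(200):
    examples = list(zip(x, y))
    random.shuffle(examples)     # visit examples in random order
    for xn, yn in examples:
        err = yn - (b + w * xn)
        # Update immediately from the loss on this single example
        w += eta * 2.0 * err * xn
        b += eta * 2.0 * err

print(w, b)  # hovers near the full-batch solution, roughly w ≈ 2, b ≈ 1.1
```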
Feature scaling
Scale the ranges of the input features so that different features have the same range.
When the input value xi is large, the same change in wi has a large influence on the output. As the diagram shows, x2 has a large influence on the loss function, so the loss surface is steep in the w2 direction.
Scaling method

For the i-th dimension, compute the mean mi and the standard deviation σi. Then for the r-th example, take the i-th input x_i^r, subtract the mean mi, and divide by the standard deviation σi: x_i^r ← (x_i^r − mi) / σi. After this, the mean of every dimension is 0 and every variance is 1.
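A minimal sketch of this standardization, assuming the data is stored as a list of examples, each a list of feature values (the layout and the numbers are illustrative).

```python
# Standardize every feature dimension to mean 0 and variance 1.
data = [
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 600.0],
]  # illustrative examples; the second feature has a much larger range

n_dims = len(data[0])
for i in range(n_dims):
    column = [example[i] for example in data]
    m_i = sum(column) / len(column)                                        # mean of dimension i
    sigma_i = (sum((v - m_i) ** 2 for v in column) / len(column)) ** 0.5   # std of dimension i
    for example in data:
        example[i] = (example[i] - m_i) / sigma_i                          # (x_i^r - m_i) / sigma_i

print(data)  # each column now has mean 0 and variance 1
```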
Mathematical theory behind gradient descent
(There are some parts I still don't fully understand; these are provisional notes.)
Taylor expansion
If h(x) has derivatives of all orders in a neighbourhood of the point x = x0 (i.e. it is infinitely differentiable), then within that neighbourhood:
h(x) = h(x0) + h′(x0)(x − x0) + h″(x0)/2! · (x − x0)^2 + … = Σ_{k=0}^{∞} h^(k)(x0)/k! · (x − x0)^k
When x is very close to x0, we have h(x) ≈ h(x0) + h′(x0)(x − x0). This formula is the expansion of h(x) in powers of (x − x0) near the point x = x0, also known as the Taylor expansion.
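A quick numerical check of this first-order approximation, using an arbitrary smooth function h(x) = sin(x) around x0 = 1 (the function and the point are chosen only for illustration).

```python
import math

# First-order Taylor approximation: h(x) ≈ h(x0) + h'(x0) * (x - x0)
h = math.sin
h_prime = math.cos   # derivative of sin is cos
x0 = 1.0

for x in (1.1, 1.01, 1.001):
    approx = h(x0) + h_prime(x0) * (x - x0)
    print(x, h(x), approx, abs(h(x) - approx))  # the error shrinks as x gets closer to x0
```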
Multivariate Taylor expansion
For a function of two variables, near the point (x0, y0):
h(x, y) ≈ h(x0, y0) + ∂h(x0, y0)/∂x · (x − x0) + ∂h(x0, y0)/∂y · (y − y0)
Based on the Taylor expansion, inside a sufficiently small red circle centred at the point (a, b), the loss function can be simplified by the Taylor expansion:
L(θ) ≈ s + u(θ1 − a) + v(θ2 − b), where s = L(a, b), u = ∂L(a, b)/∂θ1, and v = ∂L(a, b)/∂θ2.
To minimize the loss function inside the circle, (θ1 − a, θ2 − b) should point in the direction opposite to (u, v), with its length limited by the radius of the circle.
Substituting u and v back in gives exactly the gradient descent update:
θ1 = a − η·u = a − η ∂L(a, b)/∂θ1, θ2 = b − η·v = b − η ∂L(a, b)/∂θ2.
Derivation process via the Taylor expansion:
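Using the notation s, u, v introduced above, here is a brief sketch of the derivation (minimizing the local approximation over a small circle of radius d; the constant s does not affect the minimization):

```latex
% Inside a circle of radius d centred at (a, b), write
% \Delta\theta_1 = \theta_1 - a and \Delta\theta_2 = \theta_2 - b, so that
% L(\theta) \approx s + u\,\Delta\theta_1 + v\,\Delta\theta_2.
% Minimizing the inner product (u, v)\cdot(\Delta\theta_1, \Delta\theta_2)
% subject to \Delta\theta_1^2 + \Delta\theta_2^2 \le d^2 picks the vector of
% length d pointing opposite to (u, v):
\[
\begin{aligned}
\begin{bmatrix}\Delta\theta_1\\ \Delta\theta_2\end{bmatrix}
  &= -\eta\begin{bmatrix}u\\ v\end{bmatrix},
  \qquad \eta = \frac{d}{\sqrt{u^2+v^2}},\\[4pt]
\begin{bmatrix}\theta_1\\ \theta_2\end{bmatrix}
  &= \begin{bmatrix}a\\ b\end{bmatrix} - \eta\begin{bmatrix}u\\ v\end{bmatrix}
   = \begin{bmatrix}a\\ b\end{bmatrix}
     - \eta\begin{bmatrix}\partial L(a,b)/\partial\theta_1\\[2pt]
                          \partial L(a,b)/\partial\theta_2\end{bmatrix}.
\end{aligned}
\]
```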

Limitations of gradient descent
During gradient descent, the update stops wherever the partial derivatives equal 0, so it can easily stop at a point that is not the minimum (for example a local minimum, a saddle point, or a plateau where the gradient is close to 0).
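A tiny illustration of this failure mode, on the made-up function L(w) = w^3: its derivative 3w^2 vanishes at w = 0 even though w = 0 is not a minimum, so gradient descent started nearby grinds to a halt there.

```python
# Gradient descent can stop where the derivative is 0 even if it is not a minimum.
# Illustrative loss: L(w) = w**3, whose derivative 3*w**2 is 0 at w = 0.
w = 1.0
eta = 0.1
for _ in range(10000):
    w -= eta * 3.0 * w ** 2   # the step stalls as w nears the inflection point at 0

print(w)  # close to 0, which is not a minimum of w**3
```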