Coursera deep learning notes
2022-07-19 07:24:00 【Alex Tech Bolg】
Setting up your Machine Learning Application
Train/Dev/Test sets
- The train/dev split doesn't have to be 70%/30%; the dev set just needs to be big enough to evaluate models reliably.
- Make sure the dev and test sets come from the same distribution. (Deep learning models need lots of training data, so the training set is sometimes grown with web-crawled data from a different distribution; as long as dev and test stay matched, this rule of thumb keeps progress on the algorithm fast.)
- If you don't need an unbiased estimate of final performance, it's fine to have only a train and a dev set.
Bias / Variance
- Optimal (Bayes) error: the lowest error any model could achieve on the problem.
- Compare the train-set and dev-set errors against the optimal (Bayes) error to diagnose high bias (train error far above Bayes error), high variance (dev error far above train error), both, or neither. For example, with Bayes error near 0%, train 15% / dev 16% suggests high bias, while train 1% / dev 11% suggests high variance; a rough decision rule is sketched below.
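As a hedged illustration (the function, threshold, and numbers are my own, not from the course), the gap comparison above can be written as a rough decision rule:

```python
def diagnose(train_err, dev_err, bayes_err=0.0, tol=0.02):
    """Rough bias/variance diagnosis from error rates.

    tol is an illustrative threshold for what counts as a 'big' gap.
    """
    high_bias = (train_err - bayes_err) > tol    # underfitting the train set
    high_variance = (dev_err - train_err) > tol  # failing to generalize to dev
    return high_bias, high_variance

# diagnose(0.15, 0.16) -> (True, False)   # high bias
# diagnose(0.01, 0.11) -> (False, True)   # high variance
# diagnose(0.15, 0.30) -> (True, True)    # both
```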
Basic Recipe for Machine Learning
- Lecture flowchart: if the model has high bias, try a bigger network or training longer; if it has high variance, try getting more data or adding regularization; iterate until both are acceptable.
Regularizing your Neural Network
Regularization
- L2 regularization (Frobenius norm), also called weight decay: add (λ / 2m) · Σ_l ‖W[l]‖_F² to the cost (see the sketch after this list).
- L1 regularization: penalizes Σ|w| instead, which tends to make the weights sparse.
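A minimal numpy sketch of the L2 penalty and its gradient (function and variable names are mine, not from the notes); the (λ/m)·W term added to each dW is what makes this "weight decay":

```python
import numpy as np

def l2_penalty_and_grads(weights, grads, m, lambd):
    """Add the L2 (Frobenius-norm) term to the cost and to each gradient.

    weights: list of weight matrices W[l]; grads: matching list of dW[l];
    m: number of training examples; lambd: regularization strength.
    """
    # Cost term: (lambda / 2m) * sum over layers of ||W[l]||_F^2
    penalty = (lambd / (2 * m)) * sum(np.sum(W ** 2) for W in weights)
    # Each gradient picks up (lambda / m) * W[l] -- the "weight decay" term
    reg_grads = [dW + (lambd / m) * W for W, dW in zip(weights, grads)]
    return penalty, reg_grads
```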

Why Regularization Reduces Overfitting?
- First intuition: if λ is set to a very large value, many weights w are driven toward 0, so the model goes from a complex neural network to a much simpler one; in the extreme case it behaves almost like a linear model.
- Second intuition: take tanh as an example. Increasing λ shrinks w, so z is confined to a small range around 0, where tanh is approximately linear; a network of nearly linear units can only represent nearly linear functions, which limits overfitting.
Dropout Regularization
Implement dropout (inverted dropout)

- Inverted dropout divides the kept activations by keep_prob so that E[z] is unchanged, which means no extra scaling is needed at prediction time (see the sketch after this list).
- In each training iteration, a different random set of nodes is dropped (set to 0).
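A minimal sketch of inverted dropout for one layer's activations (names and the keep_prob value are illustrative, not from the notes):

```python
import numpy as np

def dropout_forward(a, keep_prob=0.8, training=True):
    """Inverted dropout applied to the activations a of one layer."""
    if not training:
        return a                              # no dropout at test time
    d = np.random.rand(*a.shape) < keep_prob  # keep each unit with prob keep_prob
    a = a * d                                 # zero out the dropped units
    a = a / keep_prob                         # rescale so E[z] is unchanged
    return a
```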
Making predictions at test time
- Use no dropout at test time; applying dropout during prediction would only add noise to the predictions.
Understanding Dropout

- Intuition: a unit (the purple node in the lecture slide) cannot rely on any single input, because each input can be randomly eliminated, so dropout spreads out the weights. This has the effect of shrinking the weights, similar to L2 regularization.
- Downside: the cost function J is no longer well defined, because dropout is random at every iteration, so the loss curve can't be used to check that training is working. Common practice: first turn dropout off and verify that the loss curve decreases cleanly, then turn dropout back on and compare the final results.
Other Regularization Methods
Data augmentation (e.g., flips, random crops, and small distortions of existing images as cheap extra training data)
Early stopping (stop training when the dev-set error starts to rise)
Setting Up your Optimization Problem
Normalizing Inputs
- Normalize the training data, then use the same μ and σ to normalize the test data, so that train and test data go through the same transformation (see the sketch below).
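A minimal sketch (names are mine, not from the notes): fit μ and σ on the training set only, then apply that one transformation to every split:

```python
import numpy as np

def fit_normalizer(X_train):
    """Compute per-feature mean and std on the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8   # guard against zero variance
    return mu, sigma

def normalize(X, mu, sigma):
    """Apply the training-set mu/sigma to any split (train, dev, or test)."""
    return (X - mu) / sigma

# mu, sigma = fit_normalizer(X_train)
# X_train = normalize(X_train, mu, sigma)
# X_test  = normalize(X_test,  mu, sigma)   # same transformation as train
```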
Why normalize inputs?
- Rough intuition: the cost function is more "round" and easier to optimize when the features are on similar scales.
- Normalizing matters most when feature scales differ dramatically; if the features already come in on similar scales this step is less important, though performing it almost never does any harm.
Vanishing / Exploding Gradients
- If activations or gradients grow or shrink exponentially as a function of L (the number of layers), their values become extremely large or extremely small, which makes training very difficult (see the toy demo below).
- The exponentially small case is especially bad: gradient descent takes tiny steps and needs a very long time to learn anything.
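A toy numpy demo of the effect (purely illustrative, not from the notes): a deep "network" whose layers all multiply by the same factor makes activations explode for a factor above 1 and vanish for a factor below 1:

```python
import numpy as np

L, n = 50, 4                 # 50 layers, 4 units per layer
x = np.ones(n)

for scale in (1.5, 0.5):
    W = scale * np.eye(n)    # every layer multiplies activations by `scale`
    a = x.copy()
    for _ in range(L):
        a = W @ a            # linear layers only, to isolate the effect
    print(scale, a[0])       # 1.5**50 ~ 6e8 (explodes), 0.5**50 ~ 9e-16 (vanishes)
```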
Weight Initialization for Deep Networks
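A hedged sketch of the variance-scaled initialization this lecture covers (He initialization for ReLU, Xavier for tanh), which keeps z[l] from growing or shrinking with depth; layer sizes and names below are illustrative:

```python
import numpy as np

def init_weights(layer_dims, activation="relu"):
    """Scale initial weight variance by fan-in to keep z[l] well behaved.

    He init (ReLU): Var = 2 / n_prev;  Xavier init (tanh): Var = 1 / n_prev.
    """
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        var = (2.0 if activation == "relu" else 1.0) / n_prev
        params[f"W{l}"] = np.random.randn(n_curr, n_prev) * np.sqrt(var)
        params[f"b{l}"] = np.zeros((n_curr, 1))
    return params

# params = init_weights([784, 128, 64, 10])  # illustrative layer sizes
```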