Deep Learning 7: Deep Feedforward Networks
2022-07-19 08:12:00 【Water W】
This article follows the previous one, Deep Learning 6: Linear Regression Implementation 2 (Water W's blog on CSDN).
Catalog
Deep feedforward networks
  Artificial neural networks
    1. Human brain neural networks
    2. A simple network for solving the XOR problem
    3. Neural network structure
  Feedforward neural networks
    1. Structure and representation of feedforward neural networks
    2. Hidden units: activation functions
    3. Output units
    4. Parameter learning in feedforward neural networks
  Back-propagation algorithm
    1. The chain rule of differentiation
    2. The back-propagation algorithm
Deep feedforward network
Artificial neural network
1. Human brain neural networks
- The human brain is made up of neurons, glial cells, neural stem cells, and blood vessels.
- A neuron (nerve cell) is the most basic unit of the brain's nervous system.
- The brain's nervous system contains nearly 86 billion neurons.
- Each neuron connects to other neurons through thousands of synapses.
- These neurons link into a huge, complex network whose total wiring length can reach thousands of kilometers.

(1) Neuronal structure:

(2) Information transmission between neurons:

(3) The artificial neuron:

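As a point of reference, a minimal LaTeX restatement of the standard artificial-neuron model; the symbols ($w_i$ for weights, $b$ for the bias, $f$ for the activation function) follow common convention and are assumptions here, not taken from the original figure:

```latex
% Standard artificial neuron (notation assumed): inputs x_1..x_d,
% weights w_1..w_d, bias b, activation function f, output a.
\[
  z = \sum_{i=1}^{d} w_i x_i + b = \boldsymbol{w}^{\top}\boldsymbol{x} + b,
  \qquad
  a = f(z)
\]
```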
(4) Artificial neural networks:

2. A simple network for solving the XOR problem
(1) The perceptron on the AND, OR, NOT, and XOR problems:
A single-layer network with a single neuron and input [𝑥1; 𝑥2] (the input layer is not counted in the number of layers), using a step activation function. Such a perceptron can realize AND, OR, and NOT, but XOR is not linearly separable, so no single-layer perceptron can solve it.

(2) A two-layer perceptron: a simple neural network
The input is still [𝑥1; 𝑥2], and the network now contains two layers:

A solution to the XOR problem:

Explanation: nonlinear space transformation. The hidden layer maps the four XOR inputs into a new feature space in which the two classes become linearly separable, so the output unit can then separate them with a line. A minimal sketch of such a solution follows.

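The sketch below, in numpy, shows one valid choice of weights and thresholds (an OR unit and an AND unit feeding the output); this is an illustrative assumption, not necessarily the exact solution in the original figure:

```python
import numpy as np

def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return (z >= 0).astype(int)

def xor_net(x):
    """Two-layer perceptron for XOR.

    Hidden unit h1 computes OR(x1, x2), h2 computes AND(x1, x2);
    the output unit computes h1 AND (NOT h2), which equals XOR.
    """
    W1 = np.array([[1.0, 1.0],    # h1: x1 + x2 - 0.5 >= 0  -> OR
                   [1.0, 1.0]])   # h2: x1 + x2 - 1.5 >= 0  -> AND
    b1 = np.array([-0.5, -1.5])
    W2 = np.array([1.0, -1.0])    # y: h1 - h2 - 0.5 >= 0
    b2 = -0.5
    h = step(W1 @ x + b1)         # hidden layer
    return step(W2 @ h + b2)      # output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", xor_net(np.array(x, dtype=float)))
# prints 0, 1, 1, 0
```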
3. Neural network structure
(1) The universal approximation theorem:

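One standard form of the theorem (after Cybenko, 1989, and Hornik et al., 1991; the exact wording of the original slide is not available, so this is a reconstruction):

```latex
% Universal approximation theorem (one standard form).
% sigma is any "squashing" function, e.g. the logistic sigmoid.
\[
\text{Let } \sigma \text{ be a squashing function. For any continuous }
f : K \to \mathbb{R} \text{ on a compact set } K \subset \mathbb{R}^{d}
\text{ and any } \epsilon > 0,
\]
\[
\exists\, m \in \mathbb{N},\; v_i, b_i \in \mathbb{R},\;
\boldsymbol{w}_i \in \mathbb{R}^{d} \text{ such that }
F(\boldsymbol{x}) = \sum_{i=1}^{m} v_i\,
\sigma\!\left(\boldsymbol{w}_i^{\top}\boldsymbol{x} + b_i\right)
\text{ satisfies }
\sup_{\boldsymbol{x}\in K} \lvert F(\boldsymbol{x}) - f(\boldsymbol{x})\rvert < \epsilon .
\]
```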
(2) Applying the universal approximation theorem to neural networks:
According to the universal approximation theorem, a neural network with a linear output layer and at least one hidden layer whose activation function has a "squashing" property can approximate, to arbitrary precision, any continuous function defined on a bounded closed subset of real space, provided the hidden layer has enough neurons. A neural network can therefore be used as a "universal" function: it can perform complex feature transformations, or approximate a complex conditional distribution.
(3) Why depth:
A network with a single hidden layer can approximate any function, but it may have to be huge: in the worst case, an exponential number of hidden units is needed [Barron, 1993]. As depth increases, representational power grows exponentially: a deep rectifier network with 𝑑 inputs, depth 𝑙, and 𝑛 units per hidden layer can carve the input into a number of linear regions on the order of $O\!\left(\binom{n}{d}^{d(l-1)} n^{d}\right)$ [Montúfar et al., 2014], i.e., expressive power that is exponential in the depth.
Deeper networks also generalize better: model performance keeps improving as depth increases.
Increasing the number of parameters does not by itself guarantee a better model: deeper models tend to perform better not merely because they are bigger, but because the function to be learned is well expressed as a composition of many simpler functions.
(4) Common neural network structures:



Other structural design considerations: besides depth and width, neural network structures vary in several other ways.
- Changing the connections between layers
  * Each unit in one layer is connected only to a small subset of the units in the next layer
  * This can greatly reduce the number of parameters
  * The specific connection pattern depends strongly on the problem at hand
- Adding skip connections
  * Direct connections are established between layer 𝑖 and layer 𝑖 + 2, or even higher layers
  * This makes it easier for gradients to flow from the output layer back to layers closer to the input, which helps optimization; a sketch follows this list
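A minimal sketch of a skip connection in plain numpy (the layer sizes, the ReLU activation, and the 0.1 weight scale are illustrative assumptions): the activations of layer 𝑖 are added directly to the output of layer 𝑖 + 2, giving the gradient a path around the two weight layers.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def block_with_skip(a, W1, b1, W2, b2):
    """Two fully connected layers plus a skip connection.

    a : activations from layer i, shape (n,)
    The output of layer i+2 is f(W2 f(W1 a + b1) + b2) + a,
    so gradients can flow through the identity term unimpeded.
    """
    h = relu(W1 @ a + b1)          # layer i+1
    return relu(W2 @ h + b2) + a   # layer i+2 with skip from layer i

rng = np.random.default_rng(0)
n = 4
a = rng.normal(size=n)
W1, W2 = rng.normal(size=(n, n)) * 0.1, rng.normal(size=(n, n)) * 0.1
b1, b2 = np.zeros(n), np.zeros(n)
print(block_with_skip(a, W1, b1, W2, b2))
```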
Feedforward neural networks
1. Structure and representation of feedforward neural networks:
- The feedforward neural network (FNN) was the earliest simple artificial neural network to be invented.
- A feedforward neural network is often called a multilayer perceptron (MLP), although that name is not entirely accurate: the activation functions are usually continuous, not the discontinuous step function used by the perceptron.
- Layer 0 is the input layer, the last layer is the output layer, and the layers in between are called hidden layers.
- Signals propagate in one direction, from the input layer to the output layer; there is no feedback anywhere in the network, so it can be represented by a directed acyclic graph.
Symbolic representation of feedforward neural networks:


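The formulas below are a reconstruction following common convention; the symbols 𝑾, 𝒃, 𝒛, 𝒂, 𝑓 match the ones used later in this article. Information propagates layer by layer as:

```latex
% Forward propagation of an L-layer feedforward network.
% W^(l), b^(l): weights and biases of layer l; f_l: its activation.
\[
\boldsymbol{a}^{(0)} = \boldsymbol{x}, \qquad
\boldsymbol{z}^{(l)} = \boldsymbol{W}^{(l)} \boldsymbol{a}^{(l-1)} + \boldsymbol{b}^{(l)}, \qquad
\boldsymbol{a}^{(l)} = f_l\!\left(\boldsymbol{z}^{(l)}\right), \qquad l = 1, \dots, L
\]
% The network output is \hat{y} = a^(L).
```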
2. Hidden units: activation functions
The design of hidden units is a very active research area, but there are as yet no definitive guiding principles. Desirable properties of an activation function:
* A continuous, differentiable (non-differentiable at only a few points) nonlinear function. Differentiable activation functions allow network parameters to be learned directly with numerical optimization methods.
* The activation function and its derivative should be as simple as possible, to improve the computational efficiency of the network.
* The range of the derivative of the activation function should lie in a suitable interval, neither too large nor too small; otherwise the efficiency and stability of training suffer.
(1) Sigmoid-type functions:

(2) The rectified linear unit (ReLU) and its extensions:

(3) Other activation functions:

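A minimal sketch in numpy of common members of the families named above; the leaky-ReLU slope 0.01 and the ELU parameter γ = 1.0 are common defaults chosen here as assumptions:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes z into (-1, 1), zero-centered."""
    return np.tanh(z)

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: small slope alpha for z < 0, avoids dead units."""
    return np.where(z >= 0, z, alpha * z)

def elu(z, gamma=1.0):
    """Exponential linear unit: smooth for z < 0."""
    return np.where(z >= 0, z, gamma * (np.exp(z) - 1.0))

def softplus(z):
    """Softplus: a smooth approximation to ReLU, log(1 + e^z)."""
    return np.log1p(np.exp(z))

z = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, relu, leaky_relu, elu, softplus):
    print(f.__name__, np.round(f(z), 3))
```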
3. Output units
(1) Linear output units
* Linear output units are often used to produce the mean of a conditional Gaussian distribution
* Suitable for continuous-value prediction (regression) problems
* Under a Gaussian assumption, maximizing the likelihood (minimizing the negative log-likelihood) is equivalent to minimizing the mean squared error, so a linear output unit can use the squared-error loss
  $\mathcal{L} = \frac{1}{2N}\sum_{n=1}^{N}\left(y^{(n)} - \hat{y}^{(n)}\right)^{2}$,
  where $y^{(n)}$ is the true value, $\hat{y}^{(n)}$ is the predicted value, and $N$ is the number of samples.


(2) Sigmoid output units
* Sigmoid output units are often used to output a Bernoulli distribution
* Suitable for binary classification problems
* A sigmoid output unit can use the cross-entropy loss:
  $\mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N}\left[y^{(n)}\log\hat{y}^{(n)} + \left(1-y^{(n)}\right)\log\left(1-\hat{y}^{(n)}\right)\right]$

(3) Softmax output units
* Softmax output units are often used to output a multinoulli (categorical) distribution
* Suitable for multi-class classification problems
* A softmax output unit can use the cross-entropy loss:
  $\mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C} y_c^{(n)}\log \hat{y}_c^{(n)}$

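A minimal numpy sketch of the three output units and their matching losses; the small constant eps is an assumption added for numerical safety and is not part of the formulas above:

```python
import numpy as np

eps = 1e-12  # guard against log(0); an implementation detail

def mse_loss(y, y_hat):
    """Squared-error loss for linear output units (regression)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 0.5 * np.mean((y - y_hat) ** 2)

def binary_cross_entropy(y, y_hat):
    """Cross-entropy for sigmoid output units (binary classification)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))

def softmax(z):
    """Softmax with max-subtraction for numerical stability."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def categorical_cross_entropy(y_onehot, z):
    """Cross-entropy for softmax output units (multi-class)."""
    p = softmax(z)
    return -np.mean(np.sum(y_onehot * np.log(p + eps), axis=-1))

# tiny usage examples
print(mse_loss([1.0, 2.0], [0.9, 2.2]))
print(binary_cross_entropy([1, 0], [0.8, 0.3]))
print(categorical_cross_entropy(np.eye(3)[[0, 2]],
                                np.array([[2., 0., 0.], [0., 1., 3.]])))
```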
4. Parameter learning in feedforward neural networks
Learning criterion
* Suppose the neural network uses the cross-entropy loss. For a sample (𝒙, 𝑦), its loss function, written here in the standard one-hot form, is $\mathcal{L}(\boldsymbol{y}, \hat{\boldsymbol{y}}) = -\boldsymbol{y}^{\top}\log\hat{\boldsymbol{y}}$, where 𝒚 is the one-hot vector of the label 𝑦.
Gradient descent
* Given the learning criterion and the training samples, the network parameters can be learned by gradient descent.
* The partial derivative of each parameter could be computed one at a time via the chain rule, but that is inefficient.
* Neural network training therefore usually uses the back-propagation algorithm to compute all gradients efficiently.
Back propagation algorithm
1. The chain rule of differentiation

2. The back-propagation algorithm
Given a sample (𝒙, 𝒚), suppose the output of the neural network is 𝒚̂ and the loss function is 𝐿(𝒚, 𝒚̂). Gradient descent requires the partial derivative of the loss function with respect to every parameter.
How can the partial derivatives of the parameters of a feedforward neural network be computed efficiently? This is the job of the back-propagation (BP) algorithm.
Consider the partial derivatives with respect to the parameters 𝑾(𝑙) and 𝒃(𝑙) of layer 𝑙. Since 𝒛(𝑙) = 𝑾(𝑙)𝒂(𝑙−1) + 𝒃(𝑙), by the chain rule:




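Written out with the error term $\boldsymbol{\delta}^{(l)} \equiv \partial L / \partial \boldsymbol{z}^{(l)}$, the standard form of these equations (a reconstruction consistent with the notation above) is:

```latex
% Back-propagation equations for layer l (f_l' applied element-wise,
% \odot denotes the element-wise product).
\[
\boldsymbol{\delta}^{(L)} =
  \frac{\partial L(\boldsymbol{y}, \hat{\boldsymbol{y}})}{\partial \boldsymbol{z}^{(L)}},
\qquad
\boldsymbol{\delta}^{(l)} =
  f_l'\!\left(\boldsymbol{z}^{(l)}\right) \odot
  \left(\boldsymbol{W}^{(l+1)}\right)^{\!\top} \boldsymbol{\delta}^{(l+1)}
\]
\[
\frac{\partial L}{\partial \boldsymbol{W}^{(l)}} =
  \boldsymbol{\delta}^{(l)} \left(\boldsymbol{a}^{(l-1)}\right)^{\!\top},
\qquad
\frac{\partial L}{\partial \boldsymbol{b}^{(l)}} = \boldsymbol{\delta}^{(l)}
\]
```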
The training procedure in code:

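A minimal end-to-end sketch of the procedure, assuming a two-layer network with sigmoid activations and the cross-entropy loss, trained on XOR; the hidden width, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data: one sample per row
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

# parameters: 2 inputs -> 4 hidden -> 1 output (sizes are illustrative)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))
lr = 1.0  # illustrative learning rate

for it in range(5000):
    # forward pass: z(l) = a(l-1) W(l) + b(l), a(l) = f(z(l))
    z1 = X @ W1 + b1; a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2; y_hat = sigmoid(z2)

    # backward pass: for sigmoid output + cross-entropy, delta2 = y_hat - y
    delta2 = (y_hat - Y) / len(X)
    dW2 = a1.T @ delta2; db2 = delta2.sum(axis=0, keepdims=True)
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)   # chain rule through layer 1
    dW1 = X.T @ delta1; db1 = delta1.sum(axis=0, keepdims=True)

    # gradient descent step
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(np.round(y_hat.ravel(), 3))  # should approach [0, 1, 1, 0]
```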