Machine Learning Open Course Notes (4): Neural Networks -- Representation


Motivation

For non-linear classification problems, classifying with logistic regression requires constructing many high-order polynomial features, which leads to far too many parameters to learn and therefore excessive model complexity.

Neural Networks

The figure shows a simple neural network. Each circle represents a neuron; each neuron receives the outputs of the neurons in the previous layer as its input and sends its own output to the next layer. The first neuron drawn in each layer is the bias unit: its value is fixed at 1, it is usually written as +1, and it is drawn with a dashed line.

Symbol Description:

    • $a_i^{(j)}$ denotes the $i$-th neuron in layer $j$; for example, $a_1^{(2)}$ denotes the 1st neuron in the 2nd layer
    • $\theta^{(j)}$ denotes the weight matrix mapping layer $j$ to layer $j+1$; for example, $\theta^{(1)}$ is the weight matrix from the first layer to the second layer
    • $\theta^{(j)}_{uv}$ denotes the weight of the connection from the $v$-th neuron in layer $j$ to the $u$-th neuron in layer $j+1$; for example, $\theta^{(1)}_{23}$ is the weight from the 3rd neuron in the first layer to the 2nd neuron in the second layer. Note that the subscript $uv$ refers to the weight of the connection $v \to u$, not $u \to v$
    • In general, if layer $j$ has $s_j$ neurons (not counting the bias unit) and layer $j+1$ has $s_{j+1}$ neurons (also not counting the bias unit), then the weight matrix $\theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$, as checked in the short sketch below
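For example, here is a quick NumPy sketch (with made-up layer sizes and random values) checking this dimension rule:

```python
import numpy as np

# Hypothetical sizes: s_j = 3 neurons in layer j, s_{j+1} = 4 neurons in layer j+1
s_j, s_j_plus_1 = 3, 4

# theta^(j) maps the (s_j + 1)-vector (activations plus bias unit) to layer j+1
theta_j = np.random.randn(s_j_plus_1, s_j + 1)

a_j = np.concatenate(([1.0], np.random.rand(s_j)))  # prepend the bias unit +1
z_next = theta_j @ a_j

print(theta_j.shape)  # (4, 4), i.e. (s_{j+1}, s_j + 1)
print(z_next.shape)   # (4,),   one value per neuron in layer j+1
```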
Forward propagation (FP)

The values of the neurons in one layer are computed from the values of the neurons in the previous layer. For example, the neurons in the second layer are updated as follows:

$$a_1^{(2)} = g(\theta_{10}^{(1)}x_0 + \theta_{11}^{(1)}x_1 + \theta_{12}^{(1)}x_2 + \theta_{13}^{(1)}x_3)$$

$$a_2^{(2)} = g(\theta_{20}^{(1)}x_0 + \theta_{21}^{(1)}x_1 + \theta_{22}^{(1)}x_2 + \theta_{23}^{(1)}x_3)$$

$$a_3^{(2)} = g(\theta_{30}^{(1)}x_0 + \theta_{31}^{(1)}x_1 + \theta_{32}^{(1)}x_2 + \theta_{33}^{(1)}x_3)$$

$$a_4^{(2)} = g(\theta_{40}^{(1)}x_0 + \theta_{41}^{(1)}x_1 + \theta_{42}^{(1)}x_2 + \theta_{43}^{(1)}x_3)$$

where $g(z)$ is the sigmoid function, i.e. $g(z) = \frac{1}{1+e^{-z}}$.

1. Vectorized implementation

Viewing the above update formulas in vector form, define

$a^{(1)}=x=\left[\begin{matrix}x_0\\ x_1\\ x_2\\ x_3\end{matrix}\right]$ $z^{(2)}=\left[\begin{matrix}z_1^{(2)}\\ z_2^{(2)}\\ z_3^{(2)}\\ z_4^{(2)}\end{matrix}\right]$ $\theta^{(1)}=\left[\begin{matrix}\theta^{(1)}_{10}& \theta^{(1)}_{11}& \theta^{(1)}_{12}& \theta^{(1)}_{13}\\ \theta^{(1)}_{20}& \theta^{(1)}_{21}& \theta^{(1)}_{22}& \theta^{(1)}_{23}\\ \theta^{(1)}_{30}& \theta^{(1)}_{31}& \theta^{(1)}_{32}& \theta^{(1)}_{33}\\ \theta^{(1)}_{40}& \theta^{(1)}_{41}& \theta^{(1)}_{42}& \theta^{(1)}_{43}\end{matrix}\right]$

The update formulas can then be simplified to

$$z^{(2)}=\theta^{(1)}a^{(1)}$$

$$a^{(2)}=g(z^{(2)})$$

$$z^{(3)}=\theta^{(2)}a^{(2)}$$

$$a^{(3)}=g(z^{(3)})=h_\theta(x)$$

As you can see, we compute the values of the second layer from the values of the first layer, then compute the values of the third layer from the second layer, obtaining the predicted output. The computation always proceeds forward, which is where the name forward propagation comes from.
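As a concrete illustration, here is a minimal NumPy sketch of this vectorized forward pass (the layer sizes and weight values below are made up; note that a bias unit $+1$ is prepended to each layer's activations before multiplying by the next weight matrix):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Forward propagation through a feed-forward network.

    x      : input vector of shape (n,), without the bias unit
    thetas : list of weight matrices; thetas[j] maps layer j+1 to layer j+2
    """
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # add the bias unit a_0 = +1
        z = theta @ a                   # z^(j+1) = theta^(j) a^(j)
        a = sigmoid(z)                  # a^(j+1) = g(z^(j+1))
    return a                            # last activation = h_theta(x)

# Example with made-up weights: 3 inputs -> 4 hidden neurons -> 1 output
theta1 = np.random.randn(4, 4)  # shape (s_2, s_1 + 1)
theta2 = np.random.randn(1, 5)  # shape (s_3, s_2 + 1)
h = forward_propagate(np.array([0.5, -1.2, 3.0]), [theta1, theta2])
print(h)  # predicted output h_theta(x)
```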

2. Connection to logistic regression

Consider a neural network with no hidden layer, where $x=\left[\begin{matrix}x_0\\ x_1\\ x_2\\ x_3\end{matrix}\right]$, $\theta=\left[\begin{matrix}\theta_0 & \theta_1 & \theta_2 & \theta_3\end{matrix}\right]$. Then $h_\theta(x)=a_1^{(2)}=g(z^{(2)})=g(\theta x)=g(x_0\theta_0+x_1\theta_1+x_2\theta_2+x_3\theta_3)$, which is exactly the hypothesis function of logistic regression! This relationship shows that logistic regression is a special neural network with no hidden layer, and that a neural network is, in a sense, a generalization of logistic regression.
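The same point in code, with hypothetical weight values: with no hidden layer, the forward pass is exactly the logistic regression hypothesis $g(\theta x)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 2.0, 0.5, -3.0])  # hypothetical weights theta_0 .. theta_3
x = np.array([1.0, 0.2, -0.7, 1.5])       # features, with x_0 = 1 as the bias term

# Output of the no-hidden-layer network: a_1^(2) = g(theta . x)
a_1_2 = sigmoid(theta @ x)

# Logistic regression hypothesis: h_theta(x) = g(theta . x) -- the same quantity
h = sigmoid(np.dot(theta, x))

print(np.isclose(a_1_2, h))  # True
```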

Neural network examples

For the linearly non-separable classification problem shown in the figure, where (0,0) and (1,1) belong to one class and (0,1) and (1,0) belong to the other, a neural network can solve the problem (see 5 below). First we need a few simple neural networks (1-4); their function is clear from each diagram combined with its truth table, so we do not elaborate further.

1. Implementing the AND operation

2. Implementing the OR operation

3. Implementing the NOT operation

4. Implementing the (NOT x1) AND (NOT x2) operation

5. Combining them to implement XNOR = NOT (x1 XOR x2)

This neural network combines the earlier AND unit (in red), the (NOT x1) AND (NOT x2) unit (in cyan), and the OR unit (in orange). From the truth table we can see that the network successfully puts (0,0) and (1,1) into one class and (1,0) and (0,1) into the other, neatly solving this linearly non-separable problem.
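Below is a small sketch of this construction, using the weight values commonly shown in Ng's course for these logical units; they are one workable choice, not the only one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One common choice of weights for the logical units (any sufficiently
# large weights with the same signs would work just as well):
THETA_AND      = np.array([-30.0,  20.0,  20.0])  # x1 AND x2
THETA_NOT_BOTH = np.array([ 10.0, -20.0, -20.0])  # (NOT x1) AND (NOT x2)
THETA_OR       = np.array([-10.0,  20.0,  20.0])  # x1 OR x2

def unit(theta, *inputs):
    a = np.concatenate(([1.0], inputs))  # prepend the bias unit +1
    return sigmoid(theta @ a)

def xnor(x1, x2):
    a1 = unit(THETA_AND, x1, x2)         # hidden neuron 1 (red)
    a2 = unit(THETA_NOT_BOTH, x1, x2)    # hidden neuron 2 (cyan)
    return unit(THETA_OR, a1, a2)        # output neuron (orange)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xnor(x1, x2)))   # prints 1, 0, 0, 1: the XNOR truth table
```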

Cost function of neural networks (with regularization term)

$$J(\theta) = -\frac{1}{m}\left[\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K} y^{(i)}_{k}\log\left(h_\theta(x^{(i)})\right)_k + (1-y^{(i)}_k)\log\left(1-\left(h_\theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum\limits_{l=1}^{L-1}\sum\limits_{i=1}^{s_l}\sum\limits_{j=1}^{s_{l+1}}\left(\theta_{ji}^{(l)}\right)^2$$

Symbol Description:

    • $m$ - the number of training examples
    • $K$ - the number of neurons in the last (output) layer, which also equals the number of classes ($K \geq 3$ for the multi-class case)
    • $y_k^{(i)}$ - the $k$-th component of the output of the $i$-th training example (a vector of length $K$)
    • $(h_\theta(x^{(i)}))_k$ - the $k$-th component of the neural network's prediction for the $i$-th training example (a vector of length $K$)
    • $L$ - the total number of layers in the neural network (including the input and output layers)
    • $\theta^{(l)}$ - the weight matrix from layer $l$ to layer $l+1$
    • $s_l$ - the number of neurons in layer $l$; note that in the regularization term the index $i$ starts from 1, so the weights of the bias neurons are not regularized
    • $s_{l+1}$ - the number of neurons in layer $l+1$
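A minimal NumPy sketch of this cost function, assuming the predictions have already been computed by forward propagation, the labels are one-hot encoded, and the first column of each weight matrix holds the bias weights (the variable names here are made up):

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    """Regularized cost J(theta) for a neural network.

    H      : (m, K) matrix of predictions, H[i, k] = (h_theta(x^(i)))_k
    Y      : (m, K) one-hot encoded labels, Y[i, k] = y_k^(i)
    thetas : list of weight matrices theta^(1) .. theta^(L-1),
             with column 0 holding the bias weights
    lam    : regularization parameter lambda
    """
    m = H.shape[0]

    # Cross-entropy term, summed over examples i and output units k
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

    # Regularization term: squared weights of every layer,
    # excluding the bias column (first column of each matrix)
    reg = sum(np.sum(theta[:, 1:] ** 2) for theta in thetas)

    return cost + lam / (2 * m) * reg
```

Only the columns after the bias column are squared and summed, matching the fact that the indices in the regularization sums start from 1.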
References

[1] Andrew Ng. Machine Learning, Coursera, Week 4.

[2] Neural Networks. https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

[3] The Nature of Code. http://natureofcode.com/book/chapter-10-neural-networks/

[4] A Basic Introduction to Neural Networks. http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
