Machine Learning Open Course Notes (4): Neural Networks -- Representation


Motivation

For non-linear classification problems, classifying with logistic regression requires constructing many high-order polynomial features, which leads to far too many parameters to learn and therefore excessive model complexity.

Neural Networks

The figure shows a simple neural network. Each circle represents a neuron; each neuron receives the outputs of the neurons in the previous layer as its input and sends its own output to the next layer. The first neuron drawn in each layer is the bias unit: its value is fixed at 1, it is usually written as +1, and it is drawn with a dashed line.

Symbol Description:

    • $a_i^{(j)}$ denotes the $i$-th neuron in layer $j$; for example, $a_1^{(2)}$ denotes the 1st neuron in the 2nd layer
    • $\theta^{(j)}$ denotes the weight matrix mapping layer $j$ to layer $j+1$; for example, $\theta^{(1)}$ is the weight matrix from the first layer to the second layer
    • $\theta^{(j)}_{uv}$ denotes the weight of the connection from the $v$-th neuron in layer $j$ to the $u$-th neuron in layer $j+1$; for example, $\theta^{(1)}_{23}$ is the weight from the 3rd neuron in the first layer to the 2nd neuron in the second layer. Note that the subscript $uv$ refers to the weight of the connection $v \to u$, not $u \to v$
    • In general, if layer $j$ has $s_j$ neurons (not counting the bias unit) and layer $j+1$ has $s_{j+1}$ neurons (also not counting the bias unit), then the weight matrix $\theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$, as checked in the short sketch below
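For example, here is a quick NumPy sketch (with made-up layer sizes and random values) checking this dimension rule:

```python
import numpy as np

# Hypothetical sizes: s_j = 3 neurons in layer j, s_{j+1} = 4 neurons in layer j+1
s_j, s_j_plus_1 = 3, 4

# theta^(j) maps the (s_j + 1)-vector (activations plus bias unit) to layer j+1
theta_j = np.random.randn(s_j_plus_1, s_j + 1)

a_j = np.concatenate(([1.0], np.random.rand(s_j)))  # prepend the bias unit +1
z_next = theta_j @ a_j

print(theta_j.shape)  # (4, 4), i.e. (s_{j+1}, s_j + 1)
print(z_next.shape)   # (4,),   one value per neuron in layer j+1
```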
Forward propagation (FP)

The values of the neurons in one layer are computed from the values of the neurons in the previous layer. For example, the neurons in the second layer are updated as follows:

$$a_1^{(2)} = g(\theta_{10}^{(1)}x_0 + \theta_{11}^{(1)}x_1 + \theta_{12}^{(1)}x_2 + \theta_{13}^{(1)}x_3)$$

$$a_2^{(2)} = g(\theta_{20}^{(1)}x_0 + \theta_{21}^{(1)}x_1 + \theta_{22}^{(1)}x_2 + \theta_{23}^{(1)}x_3)$$

$$a_3^{(2)} = g(\theta_{30}^{(1)}x_0 + \theta_{31}^{(1)}x_1 + \theta_{32}^{(1)}x_2 + \theta_{33}^{(1)}x_3)$$

$$a_4^{(2)} = g(\theta_{40}^{(1)}x_0 + \theta_{41}^{(1)}x_1 + \theta_{42}^{(1)}x_2 + \theta_{43}^{(1)}x_3)$$

where $g(z)$ is the sigmoid function, i.e. $g(z) = \frac{1}{1+e^{-z}}$.

1. Vectorized implementation

Viewing the above update formulas in vector form, define

$a^{(1)}=x=\left[\begin{matrix}x_0\\ x_1\\ x_2\\ x_3\end{matrix}\right]$ $z^{(2)}=\left[\begin{matrix}z_1^{(2)}\\ z_2^{(2)}\\ z_3^{(2)}\\ z_4^{(2)}\end{matrix}\right]$ $\theta^{(1)}=\left[\begin{matrix}\theta^{(1)}_{10}& \theta^{(1)}_{11}& \theta^{(1)}_{12}& \theta^{(1)}_{13}\\ \theta^{(1)}_{20}& \theta^{(1)}_{21}& \theta^{(1)}_{22}& \theta^{(1)}_{23}\\ \theta^{(1)}_{30}& \theta^{(1)}_{31}& \theta^{(1)}_{32}& \theta^{(1)}_{33}\\ \theta^{(1)}_{40}& \theta^{(1)}_{41}& \theta^{(1)}_{42}& \theta^{(1)}_{43}\end{matrix}\right]$

The update formulas can then be simplified to

$$z^{(2)}=\theta^{(1)}a^{(1)}$$

$$a^{(2)}=g(z^{(2)})$$

$$z^{(3)}=\theta^{(2)}a^{(2)}$$

$$a^{(3)}=g(z^{(3)})=h_\theta(x)$$

As you can see, we compute the values of the second layer from the values of the first layer, then compute the values of the third layer from the second layer, obtaining the predicted output. The computation always proceeds forward, which is where the name forward propagation comes from.
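As a concrete illustration, here is a minimal NumPy sketch of this vectorized forward pass (the layer sizes and weight values below are made up; note that a bias unit $+1$ is prepended to each layer's activations before multiplying by the next weight matrix):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Forward propagation through a feed-forward network.

    x      : input vector of shape (n,), without the bias unit
    thetas : list of weight matrices; thetas[j] maps layer j+1 to layer j+2
    """
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # add the bias unit a_0 = +1
        z = theta @ a                   # z^(j+1) = theta^(j) a^(j)
        a = sigmoid(z)                  # a^(j+1) = g(z^(j+1))
    return a                            # last activation = h_theta(x)

# Example with made-up weights: 3 inputs -> 4 hidden neurons -> 1 output
theta1 = np.random.randn(4, 4)  # shape (s_2, s_1 + 1)
theta2 = np.random.randn(1, 5)  # shape (s_3, s_2 + 1)
h = forward_propagate(np.array([0.5, -1.2, 3.0]), [theta1, theta2])
print(h)  # predicted output h_theta(x)
```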

2. Connection to logistic regression

Consider a neural network with no hidden layer, where $x=\left[\begin{matrix}x_0\\ x_1\\ x_2\\ x_3\end{matrix}\right]$, $\theta=\left[\begin{matrix}\theta_0 & \theta_1 & \theta_2 & \theta_3\end{matrix}\right]$. Then $h_\theta(x)=a_1^{(2)}=g(z^{(2)})=g(\theta x)=g(x_0\theta_0+x_1\theta_1+x_2\theta_2+x_3\theta_3)$, which is exactly the hypothesis function of logistic regression! This relationship shows that logistic regression is a special neural network with no hidden layer, and that a neural network is, in a sense, a generalization of logistic regression.
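The same point in code, with hypothetical weight values: with no hidden layer, the forward pass is exactly the logistic regression hypothesis $g(\theta x)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 2.0, 0.5, -3.0])  # hypothetical weights theta_0 .. theta_3
x = np.array([1.0, 0.2, -0.7, 1.5])       # features, with x_0 = 1 as the bias term

# Output of the no-hidden-layer network: a_1^(2) = g(theta . x)
a_1_2 = sigmoid(theta @ x)

# Logistic regression hypothesis: h_theta(x) = g(theta . x) -- the same quantity
h = sigmoid(np.dot(theta, x))

print(np.isclose(a_1_2, h))  # True
```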

Neural network examples

For the linearly non-separable classification problem shown in the figure, where (0,0) and (1,1) belong to one class and (0,1) and (1,0) belong to the other, a neural network can solve the problem (see 5 below). First we need a few simple neural networks (1-4); their function is clear from each diagram combined with its truth table, so we do not elaborate further.

1. Implementing the AND operation

2. Implementing the OR operation

3. Implementing the NOT operation

4. Implementing the (NOT x1) AND (NOT x2) operation

5. Combining them to implement XNOR = NOT (x1 XOR x2)

This neural network combines the earlier AND unit (in red), the (NOT x1) AND (NOT x2) unit (in cyan), and the OR unit (in orange). From the truth table we can see that the network successfully puts (0,0) and (1,1) into one class and (1,0) and (0,1) into the other, neatly solving this linearly non-separable problem.
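Below is a small sketch of this construction, using the weight values commonly shown in Ng's course for these logical units; they are one workable choice, not the only one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One common choice of weights for the logical units (any sufficiently
# large weights with the same signs would work just as well):
THETA_AND      = np.array([-30.0,  20.0,  20.0])  # x1 AND x2
THETA_NOT_BOTH = np.array([ 10.0, -20.0, -20.0])  # (NOT x1) AND (NOT x2)
THETA_OR       = np.array([-10.0,  20.0,  20.0])  # x1 OR x2

def unit(theta, *inputs):
    a = np.concatenate(([1.0], inputs))  # prepend the bias unit +1
    return sigmoid(theta @ a)

def xnor(x1, x2):
    a1 = unit(THETA_AND, x1, x2)         # hidden neuron 1 (red)
    a2 = unit(THETA_NOT_BOTH, x1, x2)    # hidden neuron 2 (cyan)
    return unit(THETA_OR, a1, a2)        # output neuron (orange)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xnor(x1, x2)))   # prints 1, 0, 0, 1: the XNOR truth table
```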

Cost function of neural networks (with regularization term)

$$J(\theta) = -\frac{1}{m}\left[\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K} y^{(i)}_{k}\log\left(h_\theta(x^{(i)})\right)_k + (1-y^{(i)}_k)\log\left(1-\left(h_\theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum\limits_{l=1}^{L-1}\sum\limits_{i=1}^{s_l}\sum\limits_{j=1}^{s_{l+1}}\left(\theta_{ji}^{(l)}\right)^2$$

Symbol Description:

    • $m$ - the number of training examples
    • $K$ - the number of neurons in the last (output) layer, which also equals the number of classes ($K \geq 3$ for the multi-class case)
    • $y_k^{(i)}$ - the $k$-th component of the output of the $i$-th training example (a vector of length $K$)
    • $(h_\theta(x^{(i)}))_k$ - the $k$-th component of the neural network's prediction for the $i$-th training example (a vector of length $K$)
    • $L$ - the total number of layers in the neural network (including the input and output layers)
    • $\theta^{(l)}$ - the weight matrix from layer $l$ to layer $l+1$
    • $s_l$ - the number of neurons in layer $l$; note that in the regularization term the index $i$ starts from 1, so the weights of the bias neurons are not regularized
    • $s_{l+1}$ - the number of neurons in layer $l+1$
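A minimal NumPy sketch of this cost function, assuming the predictions have already been computed by forward propagation, the labels are one-hot encoded, and the first column of each weight matrix holds the bias weights (the variable names here are made up):

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    """Regularized cost J(theta) for a neural network.

    H      : (m, K) matrix of predictions, H[i, k] = (h_theta(x^(i)))_k
    Y      : (m, K) one-hot encoded labels, Y[i, k] = y_k^(i)
    thetas : list of weight matrices theta^(1) .. theta^(L-1),
             with column 0 holding the bias weights
    lam    : regularization parameter lambda
    """
    m = H.shape[0]

    # Cross-entropy term, summed over examples i and output units k
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

    # Regularization term: squared weights of every layer,
    # excluding the bias column (first column of each matrix)
    reg = sum(np.sum(theta[:, 1:] ** 2) for theta in thetas)

    return cost + lam / (2 * m) * reg
```

Only the columns after the bias column are squared and summed, matching the fact that the indices in the regularization sums start from 1.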
References

[1] Andrew Ng. Machine Learning, Coursera, Week 4.

[2] Neural Networks. https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

[3] The Nature of Code. http://natureofcode.com/book/chapter-10-neural-networks/

[4] A Basic Introduction to Neural Networks. http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
