Machine Learning: Neural Networks 1


Notes organized from Week 4 of Andrew Ng's Machine Learning course.

Directory:

    • Why use neural networks?
    • Model representation of neural networks 1
    • Model representation of neural networks 2
    • Example 1
    • Example 2
    • Multi-classification problems

1. Why use neural networks?

Consider what happens when we have many features, like $x_1, x_2, x_3, \dots, x_{100}$.

Suppose we use a non-linear model that includes polynomial terms up to degree 2. For a non-linear classification problem, the logistic regression hypothesis would look like:

$g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 x_2 + \cdots)$

With $n$ raw features, there are approximately $\frac{n^2}{2}$ such terms, i.e. $O(n^2)$ features; with degree-3 polynomial terms the count grows even faster, to $O(n^3)$.

The consequences of such a large number of features are: (1) overfitting becomes more likely; (2) computation becomes expensive.

To give a more extreme example: in image problems, each pixel is a feature. Even a 50×50 image (already a very small picture) has 2,500 features if it is grayscale, or 7,500 if it is RGB, with each feature taking values from 0 to 255.

For such an image, including all quadratic terms gives roughly 3 million features; running logistic regression on that many features is computationally very expensive.
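
As a quick sanity check of these counts, here is a minimal Python sketch (the function name is just for illustration):

```python
# Count of distinct quadratic terms x_i * x_j (including squares) for n features:
# n*(n+1)/2, which is roughly n^2/2.
def num_quadratic_terms(n: int) -> int:
    return n * (n + 1) // 2

print(num_quadratic_terms(100))    # 5050 for the x_1 ... x_100 example
print(num_quadratic_terms(2500))   # 3126250, about 3 million for a 50x50 grayscale image
```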

This is where neural networks come in.

2. Model representation of neural networks 1

The basic structure of the neural network is as follows:

$x_0, x_1, x_2, x_3$ are the input units; $x_0$ is known as the bias unit and is set to 1;

$\theta$ denotes the weights (the parameters) connecting one layer to the next;

$h_\theta(x)$ is the output of the network;

For the network structure below, we use these definitions and formulas:

$a_i^{(j)}$: the activation (i.e. the value) of unit $i$ in layer $j$; the layers between input and output are called hidden layers

$s_j$: the number of units in layer $j$ (not counting the bias unit)

$\theta^{(j)}$: the weight matrix controlling the mapping from layer $j$ to layer $j+1$; $\theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$ (here, for example, $\theta^{(1)}$ is $3 \times 4$)

The formula for $a^{(2)}$ is:

$a_1^{(2)} = g(\theta_{10}^{(1)} x_0 + \theta_{11}^{(1)} x_1 + \theta_{12}^{(1)} x_2 + \theta_{13}^{(1)} x_3)$

$a_2^{(2)} = g(\theta_{20}^{(1)} x_0 + \theta_{21}^{(1)} x_1 + \theta_{22}^{(1)} x_2 + \theta_{23}^{(1)} x_3)$

$a_3^{(2)} = g(\theta_{30}^{(1)} x_0 + \theta_{31}^{(1)} x_1 + \theta_{32}^{(1)} x_2 + \theta_{33}^{(1)} x_3)$

In the same vein, for the output layer:

$h_\theta(x) = a_1^{(3)} = g(\theta_{10}^{(2)} a_0^{(2)} + \theta_{11}^{(2)} a_1^{(2)} + \theta_{12}^{(2)} a_2^{(2)} + \theta_{13}^{(2)} a_3^{(2)})$
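
As an illustration of these unit-by-unit formulas, here is a minimal NumPy sketch; the weights are random placeholders, not values from the course:

```python
import numpy as np

def g(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))   # theta^(1): maps layer 1 (x0..x3) to layer 2
Theta2 = rng.standard_normal((1, 4))   # theta^(2): maps layer 2 (a0..a3) to layer 3
x = np.array([1.0, 0.5, -1.2, 0.7])    # x0 = 1 is the bias unit

# Compute each hidden unit separately, mirroring the formulas for a_i^(2).
a2 = np.array([g(Theta1[i] @ x) for i in range(3)])
a2 = np.concatenate(([1.0], a2))       # prepend the bias unit a_0^(2) = 1
h = g(Theta2[0] @ a2)                  # h_theta(x) = a_1^(3)
print(h)
```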

3. Model representation of neural networks 2

Forward propagation: vectorized implementation

To vectorize the formulas above, first define:

$z_1^{(2)} = \theta_{10}^{(1)} x_0 + \theta_{11}^{(1)} x_1 + \theta_{12}^{(1)} x_2 + \theta_{13}^{(1)} x_3$

$a_1^{(2)} = g(z_1^{(2)})$

In vector form:

$a^{(1)} = x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}$, $z^{(2)} = \begin{bmatrix} z^{(2)}_1 \\ z^{(2)}_2 \\ z^{(2)}_3 \end{bmatrix}$, $\theta^{(1)} = \begin{bmatrix} \theta^{(1)}_{10} & \theta^{(1)}_{11} & \theta^{(1)}_{12} & \theta^{(1)}_{13} \\ \theta^{(1)}_{20} & \theta^{(1)}_{21} & \theta^{(1)}_{22} & \theta^{(1)}_{23} \\ \theta^{(1)}_{30} & \theta^{(1)}_{31} & \theta^{(1)}_{32} & \theta^{(1)}_{33} \end{bmatrix}$

So:

$z^{(2)} = \theta^{(1)} a^{(1)}$

$a^{(2)} = g(z^{(2)})$

Adding the bias unit $a^{(2)}_0 = 1$:

$z^{(3)} = \theta^{(2)} a^{(2)}$

$a^{(3)} = h_\theta(x) = g(z^{(3)})$

This is the vectorized form of forward propagation.
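
The same computation, vectorized, as a minimal NumPy sketch (dimensions follow the 3-input, 3-hidden-unit network above; the weights are placeholders):

```python
import numpy as np

def g(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Vectorized forward propagation through one hidden layer."""
    a1 = np.concatenate(([1.0], x))       # a^(1) = x with bias unit x0 = 1
    z2 = Theta1 @ a1                      # z^(2) = theta^(1) a^(1)
    a2 = np.concatenate(([1.0], g(z2)))   # a^(2) = g(z^(2)), plus bias a_0^(2) = 1
    z3 = Theta2 @ a2                      # z^(3) = theta^(2) a^(2)
    return g(z3)                          # a^(3) = h_theta(x)

rng = np.random.default_rng(0)
h = forward(np.array([0.5, -1.2, 0.7]),
            rng.standard_normal((3, 4)),  # theta^(1): 3 x 4
            rng.standard_normal((1, 4)))  # theta^(2): 1 x 4
print(h)
```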

Each layer's activations $a^{(j)}$ learn different features, built on top of those of the layer before.

4. Example 1

First, look at a classification problem, XOR/XNOR: for $x_1, x_2 \in \{0,1\}$, XOR outputs $y = 1$ when $x_1$ and $x_2$ differ ((0,1) or (1,0)) and $y = 0$ when they are the same; the target here is the complement, $y = x_1 \text{ XNOR } x_2$.

Start with the simpler sub-problem, AND:

A network with a single unit and suitably chosen weights produces the correct classification results.

Similarly, for OR, we can design such a network and get the right results; a sketch of both units follows.
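
A minimal sketch of these two single-unit networks, using the hand-picked weight values shown in the course lectures (large weights saturate the sigmoid toward 0 or 1):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single sigmoid units with hand-picked weights; the output is ~0 or ~1.
def AND(x1, x2):
    return g(-30 + 20 * x1 + 20 * x2)

def OR(x1, x2):
    return g(-10 + 20 * x1 + 20 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(AND(x1, x2)), round(OR(x1, x2)))
```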

5. Example 2

Continuing from the examples above, NOT can also be computed by a single-unit network:

Now let's go back to the problem mentioned at the start: XNOR.

When we combine these simple pieces (AND, OR, NOT), we get a network structure that solves the XNOR problem, as the sketch below shows:
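
A minimal sketch, again with the hand-picked lecture weights: the hidden layer computes $x_1 \text{ AND } x_2$ and $(\text{NOT } x_1) \text{ AND } (\text{NOT } x_2)$, and the output unit ORs them:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def NOT(x1):
    return g(10 - 20 * x1)               # ~1 when x1 = 0, ~0 when x1 = 1

def XNOR(x1, x2):
    a1 = g(-30 + 20 * x1 + 20 * x2)      # hidden unit 1: x1 AND x2
    a2 = g(10 - 20 * x1 - 20 * x2)       # hidden unit 2: (NOT x1) AND (NOT x2)
    return g(-10 + 20 * a1 + 20 * a2)    # output unit: a1 OR a2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(XNOR(x1, x2)))   # 1 exactly when x1 == x2
```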

6. Multi-classification problems

To solve a multi-classification problem with a neural network, we again use the one-vs-all idea. In binary classification the output is either 0 or 1; in a multi-class problem the output is a one-hot vector, $h_\theta(x) \in \mathbb{R}^K$, where $K$ is the number of classes.

For example, for a 4-class problem, the output might be:

Category 1: $\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$, category 2: $\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$, category 3: $\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, etc.

That is, rather than having $h_\theta(x)$ output a single label such as 1, 2, 3, or 4, the network outputs one component per class.
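
To turn such an output into a predicted class, take the index of the largest component. A minimal sketch with a made-up output vector:

```python
import numpy as np

# Hypothetical network output for a 4-class problem: one score per class.
h = np.array([0.05, 0.10, 0.02, 0.90])

predicted = int(np.argmax(h))                  # index of the largest output
one_hot = np.eye(4, dtype=int)[predicted]      # the corresponding one-hot vector
print(predicted, one_hot)                      # 3 [0 0 0 1]
```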
