Using Python for Deep Neural Networks 2

I. Derivatives and Gradients, Properties of Matrix Operations, and the Scientific Computing Library NumPy

1.1 Experiment Contents

Although I wanted to introduce as few mathematical concepts as possible in this experiment (too many can make people give up), I still seem to have failed. There is really no way around it: to truly learn deep learning, some mathematical foundation (calculus, linear algebra, probability theory, information theory, etc.) is (almost) indispensable. You may be able to learn how to build a model without this depth of understanding, but it will be difficult to debug when the model goes wrong or fails to train to a good result.

However, to understand the basic concepts of deep learning (as opposed to doing research in the field), the required mathematics is not very difficult. You should be able to master it quickly.

So in this lab, we introduce the mathematical knowledge this course will involve and numpy, the Python module to be used in the "Image Recognition" project.

Warning: the mathematical explanations in this experiment are only intended to give you a better understanding of the concepts in this course. Some of them are not rigorous and should not be treated as a substitute for a mathematics textbook.

1.2 Experimental Knowledge points
    • Derivatives, partial derivatives, gradients, the chain rule
    • The basic rules of matrix operations
    • An introduction to basic NumPy operations
1.3 Experimental environment
    • Python 2.7
    • NumPy 1.12.1
Ii. Experimental Steps

2.1 Derivatives, Partial Derivatives, Gradients, and the Chain Rule for Composite Functions

2.1.1 The Rate of Change of a Function Value with Its Independent Variable: the Derivative

We learned in high school mathematics that the rate at which a function's value changes with its independent variable is the derivative. The derivative actually measures how strongly a variable influences the function value: the larger the derivative's magnitude, the greater the effect each change in the variable has on the final function value. When the derivative is positive, increasing the variable increases the function value; when the derivative is negative, increasing the variable decreases it.
Derivatives of common functions:

    Original function f    Derivative f'
    any constant           0
    x                      1
    e^x                    e^x
    x^2                    2*x
    1/x                    -1/x^2
    ln(x)                  1/x
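As a quick sanity check (not part of the original text, and written with print() so it also runs under Python 3), the entries in this table can be verified numerically with a central finite difference; the helper numeric_derivative and the step size h are my own illustrative choices:

```python
import numpy as np

def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
# Compare each numeric derivative with the formula from the table above.
print(numeric_derivative(lambda t: t,       x))  # ~ 1
print(numeric_derivative(np.exp,            x))  # ~ e^2
print(numeric_derivative(lambda t: t**2,    x))  # ~ 2*x = 4
print(numeric_derivative(lambda t: 1.0 / t, x))  # ~ -1/x^2 = -0.25
print(numeric_derivative(np.log,            x))  # ~ 1/x = 0.5
```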
2.1.2 From a Single Variable to Multiple Variables: Partial Derivatives

All the functions listed above have only one independent variable. If there are several independent variables, how do we take the derivative? For example, for the function f=x+y, how do we measure the effect of x and y on the function value f separately?
Mathematics introduces the concept of the partial derivative for this. For a multivariable function f, finding the partial derivative of f with respect to one of its variables x is very simple: treat the other variables, which are unrelated to x, as constants, and then differentiate using the single-variable rules. This gives the partial derivative of f with respect to x. For example:

For f=x+2y, the partial derivative of f with respect to x is 1, and the partial derivative with respect to y is 2.
For f=x*y, the partial derivative of f with respect to x is y, and the partial derivative with respect to y is x.
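The same numeric check works for partial derivatives: hold the other variable fixed and differentiate one variable at a time. The helper names partial_x and partial_y below are my own:

```python
def partial_x(f, x, y, h=1e-6):
    # Treat y as a constant and differentiate with respect to x.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # Treat x as a constant and differentiate with respect to y.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

f = lambda x, y: x + 2 * y
g = lambda x, y: x * y

print(partial_x(f, 3.0, 5.0))  # ~ 1
print(partial_y(f, 3.0, 5.0))  # ~ 2
print(partial_x(g, 3.0, 5.0))  # ~ y = 5
print(partial_y(g, 3.0, 5.0))  # ~ x = 3
```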

2.1.3 The Direction of Fastest Change of a Multivariable Function: the Gradient

In 2.1.1 we mentioned that for a single-variable function, the sign of the derivative represents the "direction" in which the variable affects the function value: making it larger or smaller. How do we express this direction for multivariable functions? This is where the concept of the gradient comes in:

The gradient is a vector whose length equals the number of independent variables; each of its elements is the partial derivative of the function with respect to the corresponding variable.

For example, for the function f=x*y, the gradient vector is (y,x). For specific values of the variables, such as the point x=1,y=1, the gradient vector is (1,1); and at x=10,y=-20, the gradient vector is (-20,10).

The gradient is a vector that points in the direction in which the function value increases fastest (recall the loss-function graph in the first experiment, where the gradient points in the uphill direction).
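To make the "fastest direction" claim concrete, here is a small sketch of my own: starting from the point (1,1) of f=x*y, take a tiny step in each of 360 directions and record which direction raises the function value the most; it turns out to be the direction of the gradient (1,1), normalized.

```python
import math

def f(x, y):
    return x * y

x0, y0 = 1.0, 1.0
grad = (y0, x0)            # gradient of f = x*y is (y, x); here (1, 1)
step = 1e-3

best_dir, best_val = None, -float("inf")
# Try a unit-length step in each of 360 directions and record the best one.
for k in range(360):
    ang = math.radians(k)
    dx, dy = math.cos(ang), math.sin(ang)
    val = f(x0 + step * dx, y0 + step * dy)
    if val > best_val:
        best_dir, best_val = (dx, dy), val

# The winning direction is (approximately) the normalized gradient.
norm = math.hypot(grad[0], grad[1])
print(best_dir)
print((grad[0] / norm, grad[1] / norm))
```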

2.1.4 The Chain Rule for Differentiating Composite Functions

The differentiation and partial differentiation described above apply to "simple functions". For "composite functions", such as the following:

    1. f1(x) = 1/x
    2. f2(x) = e^x
    3. f = f1(f2(x))

The function f is a composite function formed by "concatenating" f1 and f2: the output of f2 is the input of f1.

One way to differentiate a composite function is to expand it first; the function above becomes f = 1/(e^x), which can then be differentiated by the rules for simple functions. The process is as follows:
f' = -1/((e^x)^2) * (e^x)' = -(e^x)/((e^x)^2) = -1/(e^x)
That is, f' = -1/(e^x).

In fact, in the derivation above we already used the chain rule (链式法则) without being aware of it. The chain rule allows us to differentiate a composite function part by part instead of all at once. This matters for programming, because it makes differentiating complex functions easy.
It may sound a little complicated in words. Taking f as an example: when we need the derivative with respect to the argument x, we can first treat f2(x) as a single variable f2 and differentiate f1 with respect to f2, obtaining the first part of the derivative, -1/(f2^2); then differentiate f2 with respect to x, obtaining the second part, e^x. Multiplying the two parts gives the derivative of the whole composite function. Substituting the actual expression for f2 into the first part, the first part becomes -1/((e^x)^2) and the second part is e^x; multiplying them gives the final result -1/(e^x).

Now you may think the chain rule is complicated and tedious, but in the next experiment you will find that it is really powerful. In fact, the deep neural network we implement at the end is built on repeated application of the chain rule for derivatives.
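The two-part computation just described can be written directly in code. This sketch (my own, reusing the text's f1 and f2) multiplies the derivative of f1 with respect to f2 by the derivative of f2 with respect to x:

```python
import math

def f2(x):          # inner function: e^x
    return math.exp(x)

def df1_df2(u):     # derivative of f1(u) = 1/u with respect to u: -1/u^2
    return -1.0 / (u * u)

def df2_dx(x):      # derivative of f2(x) = e^x: e^x
    return math.exp(x)

def df_dx(x):
    # Chain rule: multiply the two parts, evaluated at u = f2(x).
    return df1_df2(f2(x)) * df2_dx(x)

x = 1.5
print(df_dx(x))             # chain-rule result
print(-1.0 / math.exp(x))   # closed form -1/(e^x) from the text: same value
```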

2.2 Matrices and the Basic Properties of Their Operations

If you have taken an undergraduate linear algebra course, you may feel indifferent to matrices, or even disgusted by their strange-looking rules. But I hope you can change your view of matrices and linear algebra: don't let bad textbooks and teachers' bad slides ruin the huge boost linear algebra can give you (yes, that is no exaggeration). Matrices are extremely useful: they appear in almost every corner of modern science, and all the more so in deep learning.

Limited by space, this section introduces only the necessary matrix-related knowledge. For more linear algebra, please study it through other means (English textbooks are recommended).

2.2.1 The Representation of a Matrix

An m*n matrix is an array of m rows and n columns. For example (the concrete matrices were shown in a figure in the original):

a is a 3*2 matrix, b is a 2*3 matrix, c is a 3*1 matrix, and d is a 1*2 matrix.

Since c has only one column, we also call c a column vector; since d has only one row, we also call d a row vector. In this course, "vector" refers to a column vector by default.
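Since the original figure is not reproduced here, the following sketch (using NumPy, which is introduced in section 2.3, with element values of my own choosing) builds arrays of the stated shapes:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])   # 3*2 matrix: 3 rows, 2 columns
b = np.array([[1, 2, 3], [4, 5, 6]])     # 2*3 matrix
c = np.array([[1], [2], [3]])            # 3*1 matrix: a column vector
d = np.array([[1, 2]])                   # 1*2 matrix: a row vector

for name, m in [("a", a), ("b", b), ("c", c), ("d", d)]:
    print(name, m.shape)
```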

2.2.2 The Rules of Matrix Operations
    1. Multiplication of a scalar and a matrix
      Multiplying a scalar (which you can simply think of as a single number) by a matrix multiplies each element of the matrix by that scalar.
    2. The transpose of a matrix
      The transpose operation is denoted by adding an apostrophe (') to the upper-right corner of the matrix.

      Transposing flips the matrix and changes its shape. Pay attention to which axis the transpose flips around.

    3. Addition and subtraction between matrices
      Addition and subtraction between matrices require the two matrices involved to be the same size; the result is obtained by adding or subtracting the corresponding elements of the two matrices.

    4. The source of matrix magic: multiplication between matrices
      The multiplication of matrices is somewhat complex, but you saw it in the first experiment. Matrix multiplication expresses the combination of the parameters and the variables of a system of linear equations (it has many other interpretations; if you are interested, please explore them yourself).

      The specific rule of matrix multiplication: take all the elements of row i of the first matrix and all the elements of column j of the second matrix, multiply them pairwise and sum the products; the result is the element at row i, column j of the result matrix.
      This description is hard to absorb in one reading; please study it carefully together with the examples in the picture.
      Matrix multiplication first requires the sizes of the two matrices to be "compatible": specifically, the number of columns of the first matrix must equal the number of rows of the second matrix. In the pictured example, the first matrix has 2 columns and the second matrix has 2 rows, so "all the elements of row i of the first matrix" and "all the elements of column j of the second matrix" correspond one to one.
      The result matrix has as many rows as the first matrix and as many columns as the second matrix.
      Matrix multiplication does not satisfy the commutative law! After swapping the positions of the two matrices, their dimensions are no longer necessarily compatible, and even when they are, the result is generally different from before. Try a few examples yourself.
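All the rules in this section, including the failure of the commutative law, can be checked with NumPy (introduced below in section 2.3); the matrices A and B are my own examples:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(2 * A)          # scalar * matrix: every element doubled
print(A.T)            # transpose: flipped across the diagonal
print(A + B)          # addition: element-wise, same-size matrices only
print(A.dot(B))       # matrix multiplication: row-by-column sums
print(B.dot(A))       # swapped order: a different result in general
print(np.array_equal(A.dot(B), B.dot(A)))  # False: not commutative
```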

2.3 Scientific Computing Library NumPy

The implementation of our deep neural network requires many mathematical operations, especially matrix operations. As you have seen, the matrix (multiplication) operation is complex, and programming it by hand is difficult and error prone. To solve these problems we will use numpy, the scientific computing library for Python. With numpy, our code will be much simpler and much faster.

2.3.1 Using NumPy

The lab environment already has NumPy installed; it can be imported with an import statement. To simplify the code, after importing we name numpy np:

    1. import numpy as np
    2. print np.__version__ # view the NumPy version

When you use NumPy for calculations, enter the top command in a terminal and you will find several "identical" Python processes running, because NumPy automatically parallelizes computation to increase speed.

>> top

Below are some examples; please try them yourself in a Python shell.

2.3.2 NumPy Basic data types

The data type in numpy is called ndarray (i.e. n-dimensional array). Creating an ndarray is very simple:

    1. import numpy as np
    2. array=np.array([1,2,3], dtype=np.uint8)
    3. print array

A Python list is passed to the np.array() function. Note that the dtype parameter is optional; it specifies the type and width of the resulting array's elements, here an unsigned 8-bit integer.

2.3.3 Quickly create matrices
mat1=np.zeros((2,3))

np.zeros() quickly creates an all-zeros matrix of the specified dimensions; note that the parameter passed in is a tuple.

High-dimensional matrices in 2.3.4 NumPy

"Matrices" generally refers to two-dimensional matrices with rows and columns, but NumPy also supports higher-dimensional matrices, such as the following:

    1. nd=np.zeros((1,2,3,4))
    2. print nd.shape
    3. print nd.size

nd can be seen as a high-dimensional matrix with dimensions 1x2x3x4. ndarray.shape stores the "shape" of the array, i.e. the length of each of its dimensions. ndarray.size is the product of the lengths of all dimensions, i.e. the number of elements in the array.

2.3.5 Standard matrix Operations

The first thing to note is that the operations in numpy are not exactly the same as those in mathematics. In fact, numpy provides not only the standard operations but also additional operation types and features that make programming convenient.

Let's look at the standard matrix operations first:

  1. Multiplication of a scalar and a matrix
    1. scalar=2
    2. mat=np.zeros((2,3))
    3. mat1=scalar*mat
  2. Matrix transpose
    1.  mat=np.zeros((2,3))
    2.  tmat=mat.T
    3.  print mat.shape, tmat.shape
    4.  mat3=np.array((1,2,3))
    5.  tmat3=mat3.T
    6.  print mat3.shape, tmat3.shape
    For two-dimensional matrices, ndarray.T gives the transpose. For high-dimensional matrices, ndarray.T completely reverses the order of the dimensions.
  3. Matrix addition
    1.  mat1=np.array([[1,2],[3,4]])
    2.  mat2=np.array([[1,0],[0,1]])
    3.  mat3=mat1+mat2
  4. Matrix multiplication
    1. mat1=np.array([[1,2],[3,4]])
    2. mat2=np.array([[5,6],[7,8]])
    3. mat3=mat1.dot(mat2)
    Note the change here: matrix multiplication cannot use * directly; it is done through the .dot() function.
2.3.6 Extended Operations

numpy's built-in extended operations are easy to use.

  1. Element-wise multiplication of two matrices

    1.  mat1=np.array([[1,2],[3,4]])
    2.  mat2=np.array([[5,6],[7,8]])
    3.  mat=mat1*mat2

    Note that the two matrices being multiplied element-wise must be the same size.

  2. Scalar and matrix addition

    1.  scalar=2
    2.  mat=np.array([[1,2],[3,4]])
    3.  mat1=scalar+mat

    Adding a scalar to a matrix is equivalent to adding the scalar to every element of the matrix.

  3. Manipulating the dimensions of a high-dimensional matrix

    1.  mat3=np.zeros((1,2,3))
    2.  tmat3=mat3.transpose(0,2,1)
    3.  print mat3.shape, tmat3.shape

    Sometimes we want to change the order of a high-dimensional matrix's dimensions, but ndarray.T can only reverse them completely, which may not meet our needs. In that case we can call ndarray.transpose(), whose parameters give the order in which the dimensions of the original matrix are rearranged. The example here leaves dimension 0 unchanged and swaps dimensions 1 and 2.

  4. Broadcast: the "widening" operation
    Translating NumPy's broadcast literally can be misleading. Based on broadcast's practical role in NumPy, I personally prefer to read it as "widening" (extending to a wider matrix). Its specific role is:
    When adding or subtracting two matrices of different sizes (for example, adding a column vector to each column of a matrix), the operation cannot be performed directly. The straightforward approach is to loop over each column of the matrix and add the column vector to it, which makes the code complex. NumPy instead performs an automatic broadcast operation: it first "widens" the column vector into a matrix of the same size, each of whose columns is a copy of the original column vector, and then performs the operation. As follows:
    1. mat1=np.zeros((3,2))
    2. vec=np.array([[1],[2],[3]])
    3. print mat1+vec
    The same applies to row vectors and high-dimensional matrices.
    For a more detailed description, please refer to the numpy documentation: broadcasting
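The widening works the same way along the other axis. In this sketch of my own, a 1x2 row vector is copied down the rows of a 3x2 matrix before the addition:

```python
import numpy as np

mat = np.zeros((3, 2))
row = np.array([[10, 20]])   # shape (1, 2): one row, two columns

# Broadcasting "widens" the row vector down the 3 rows before adding.
result = mat + row
print(result)
# Every row of the result equals the original row vector.
print((result == row).all())
```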
2.3.7 Miscellaneous Operations

This section describes some other miscellaneous operations that are used in later projects.

  1. Generating random data

    1.  rannum=np.random.randn(5,10)

    The np.random.randn() function generates a matrix of the specified size whose numbers all follow the standard normal distribution.

    1.  l=[1,2,3]
    2.  np.random.shuffle(l)
    3.  print l

    The np.random.shuffle() function accepts a Python list or a numpy ndarray and randomly shuffles the elements of the array in place.

  2. Summing a matrix

    1.  a=np.random.randn(3,2)
    2.  print np.sum(a)

    The np.sum() function sums all the elements in the matrix.

  3. The axis in NumPy
    We used "dimension" to describe the shape of a matrix, which is easy to confuse with the dimension (length) of the vectors mentioned earlier. NumPy has another concept, the "axis", which corresponds to the "dimension" of a shape: it is the "direction" along which an operation on a matrix is performed. This is hard to describe in words, so let's look at an example:

    1.  a=np.zeros((3,2))
    2.  a=a+1
    3.  print np.sum(a,axis=0)
    4.  print np.sum(a,axis=1)

    np.sum(a,axis=0) sums matrix a along the first axis; the effect is to sum each column of the matrix. np.sum(a,axis=1) sums a along the second axis; the effect is to sum each row.
    This may still be hard to grasp; please try a few more examples yourself.

  4. The exponential function with base e

    1.  a=np.random.randn(3,2)
    2.  print np.exp(a)

    np.exp() returns, for each element x of the input, the value of e raised to the power x.

  5. The index of the largest element in an array

    1.  a=[1,2,3,4,3,2,1]
    2.  print np.argmax(a)

    np.argmax() returns the index of the largest element in a Python list or numpy ndarray.
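The axis behavior described in item 3 above is easier to see on a matrix whose entries differ; the concrete values here are my own example:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # shape (3, 2): 3 rows, 2 columns

print(np.sum(a))             # 21: all elements
print(np.sum(a, axis=0))     # [ 9 12]: collapse the rows, one sum per column
print(np.sum(a, axis=1))     # [ 3  7 11]: collapse the columns, one sum per row
```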

Iii. Summary of the experiment

The content of this experiment has been trimmed as much as I can; only what will be used in later projects is retained. I hope you can understand the knowledge above. It may be difficult for some, but mathematics is one of the finest embodiments of human intelligence, and it is an important foundation of deep learning and of artificial intelligence as a whole.
If you feel the content of this experiment is too simple or not well explained, please consult other materials to learn the relevant content.

In this experiment, we studied:

    1. The derivative measures an independent variable's ability to influence the function value.
    2. The partial derivative measures the ability of one independent variable of a multivariable function to influence the function value.
    3. The gradient is a vector that points in the direction in which the function value increases fastest.
    4. The chain rule says that, for a composite function, the derivative can be computed part by part and then "chained" together.
    5. A vector can be regarded as a special form of matrix.
    6. Matrix multiplication is closely related to systems of linear equations.
    7. The ndarray in the NumPy library makes matrix operations convenient.
Iv. homework After class
      1. Please think back to each of the points of knowledge in this experiment to make sure you understand them well.
      2. NumPy is a very famous, often used library, worth your further study, please yourself continue to learn the other things in NumPy: NumPy official website
