Python Machine Learning Library NumPy

Source: Internet
Author: User

NumPy is a Python package that is well suited to scientific computing. It is a very common base library when we use Python for machine learning programming.

This article is an introductory tutorial on it.

Introduction

NumPy is a fundamental package for scientific computing with Python. It contains the following:

    • A powerful N-dimensional array structure

    • Sophisticated (broadcasting) functions

    • Tools for integrating C/C++ and Fortran code

    • Linear algebra, Fourier transform, and random number capabilities

Besides its scientific uses, NumPy can also serve as an efficient multi-dimensional container for generic data. Because arbitrary data types can be defined, NumPy can integrate seamlessly and efficiently with many kinds of databases.

Getting NumPy

Since NumPy is a Python package, you first need a Python environment on your machine. If you do not have one yet, please look up how to install it for your platform.

For information on how to obtain NumPy, please also refer to the package installation instructions on the scipy.org website. This article does not repeat them.

The author recommends installing Python packages with pip. The command is as follows:


pip3 install numpy


The code in this article is validated and tested in the following environment:

    • Hardware: MacBook Pro 2015

    • OS: macOS High Sierra

    • Language: Python 3.6.2

    • Package: NumPy 1.13.3

Here you can get all the source code for this article: https://github.com/paulQuei/numpy_tutorial

In addition:

    • For the sake of simplicity, this article verifies results with Python's print function

    • For brevity, we always import the package under the name np (import numpy as np)

Basic attributes and array creation

NumPy is built around a homogeneous multidimensional array, whose elements can be indexed by subscripts. In NumPy, a dimension is called an axis (plural: axes), and the number of dimensions is called the rank.

For example:

Here is an array with rank 1, whose axis has length 3:

[1, 2, 3]

Here is an array with rank 2. Its first axis has length 2 and its second axis has length 3:

[[1, 2, 3],
 [4, 5, 6]]

We can create a NumPy array with the array function, for example:


a = np.array([1, 2, 3])
b = np.array([(1, 2, 3), (4, 5, 6)])


Note that the square brackets here are required. The following notation is incorrect:


a = np.array(1, 2, 3, 4)  # WRONG!!!
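The difference can be checked directly. The sketch below is an illustration, not part of the original sources; the names ok and failed are made up for the example. Because the exception type raised by the wrong call depends on the NumPy version, it catches both TypeError and ValueError.

```python
import numpy as np

# Correct: the elements are passed as one sequence.
ok = np.array([1, 2, 3, 4])
print(ok.shape)

# Incorrect: four separate positional arguments instead of one sequence.
try:
    np.array(1, 2, 3, 4)
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print("np.array(1, 2, 3, 4) failed:", type(exc).__name__)
```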


The NumPy array class is called ndarray, and it is also known by the alias numpy.array. This is not the same as array.array from the Python standard library, which only handles one-dimensional arrays. An ndarray has the following attributes:

    • ndarray.ndim: The number of dimensions of the array. In the Python world, the number of dimensions is called the rank

    • ndarray.shape: The dimensions of the array. This is a tuple of integers whose length equals the number of dimensions (ndim). For example, the shape of a one-dimensional array of length n is (n,), and the shape of a matrix with n rows and m columns is (n, m)

    • ndarray.size: The total number of elements in the array

    • ndarray.dtype: The type of the elements in the array, such as numpy.int32, numpy.int16, or numpy.float64

    • ndarray.itemsize: The size of each element in the array, in bytes

    • ndarray.data: The buffer that stores the array elements. Usually we access elements by subscript and do not need to touch the buffer directly

Let's take a look at the code example:


# create_array.py
import numpy as np
a = np.array([1, 2, 3])
b = np.array([(1, 2, 3), (4, 5, 6)])
print('a=')
print(a)
print("a's ndim {}".format(a.ndim))
print("a's shape {}".format(a.shape))
print("a's size {}".format(a.size))
print("a's dtype {}".format(a.dtype))
print("a's itemsize {}".format(a.itemsize))
print('')
print('b=')
print(b)
print("b's ndim {}".format(b.ndim))
print("b's shape {}".format(b.shape))
print("b's size {}".format(b.size))
print("b's dtype {}".format(b.dtype))
print("b's itemsize {}".format(b.itemsize))


The following is the output of this code:


a=
[1 2 3]
a's ndim 1
a's shape (3,)
a's size 3
a's dtype int64
a's itemsize 8

b=
[[1 2 3]
 [4 5 6]]
b's ndim 2
b's shape (2, 3)
b's size 6
b's dtype int64
b's itemsize 8


We can also specify the type of the element when creating an array, for example:


c = np.array([[1, 2], [3, 4]], dtype=complex)
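As a small illustration of how the element type affects the array, the following sketch creates one complex array and one 32-bit float array and inspects dtype and itemsize. The array f and the choice of np.float32 are assumptions added for the example.

```python
import numpy as np

# Python's complex type maps to a 128-bit complex number (two 64-bit floats).
c = np.array([[1, 2], [3, 4]], dtype=complex)
print(c.dtype, c.itemsize)

# A NumPy type can be given as well; float32 uses 4 bytes per element.
f = np.array([1, 2, 3], dtype=np.float32)
print(f.dtype, f.itemsize)
```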



For more on the parameters of the array function, see the documentation of numpy.array.

Note: NumPy itself supports multidimensional arrays and many element types. However, arrays with three or more dimensions are harder to visualize, and matrix operations are what we use most in machine learning programming. Therefore, the examples in the rest of this article are mainly based on one-dimensional and two-dimensional numeric arrays.

Creating specific arrays

In real projects we often need arrays with particular contents, and NumPy provides helper functions for this:

    • zeros: Creates an array whose elements are all 0

    • ones: Creates an array whose elements are all 1

    • empty: Creates an array without initializing it, so its contents are indeterminate

    • arange: Creates an array from a range and a step size

    • linspace: Creates an array from a range and a number of elements

    • random: Generates random numbers


# create_specific_array.py
import numpy as np
a = np.zeros((2, 3))
print('np.zeros((2,3)) = \n{}\n'.format(a))
b = np.ones((2, 3))
print('np.ones((2,3)) = \n{}\n'.format(b))
c = np.empty((2, 3))
print('np.empty((2,3)) = \n{}\n'.format(c))
d = np.arange(1, 2, 0.3)
print('np.arange(1, 2, 0.3) = \n{}\n'.format(d))
e = np.linspace(1, 2, 7)
print('np.linspace(1, 2, 7) = \n{}\n'.format(e))
f = np.random.random((2, 3))
print('np.random.random((2,3)) = \n{}\n'.format(f))


The output of this code is as follows:


np.zeros((2,3)) =
[[0. 0. 0.]
 [0. 0. 0.]]

np.ones((2,3)) =
[[1. 1. 1.]
 [1. 1. 1.]]

np.empty((2,3)) =
[[1. 1. 1.]
 [1. 1. 1.]]

np.arange(1, 2, 0.3) =
[1.  1.3 1.6 1.9]

np.linspace(1, 2, 7) =
[1.         1.16666667 1.33333333 1.5        1.66666667 1.83333333 2.        ]

np.random.random((2,3)) =
[[0.5744616  0.58700653 0.59609648]
 [0.0417809  0.23810732 0.38372978]]
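The difference between arange and linspace is easiest to see side by side. In this minimal sketch (the names by_step and by_count and the chosen values are illustrative), arange takes a step and excludes the stop value, while linspace takes a number of elements and includes the stop value by default.

```python
import numpy as np

# arange(start, stop, step): the stop value 2 is excluded.
by_step = np.arange(1, 2, 0.25)

# linspace(start, stop, num): the stop value 2 is included by default.
by_count = np.linspace(1, 2, 5)

print(by_step)
print(by_count)
```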


Shape manipulation

Besides creating arrays from scratch, we often need to build new structures from arrays we already have. The following functions help with this:

    • reshape: Generates a new array from an existing array and a specified shape

    • vstack: Stacks multiple arrays in the vertical direction (v for vertical); the dimensions of the arrays must match

    • hstack: Stacks multiple arrays in the horizontal direction (h for horizontal); the dimensions of the arrays must match

    • hsplit: Splits an array in the horizontal direction

    • vsplit: Splits an array in the vertical direction

Let's take a few examples to illustrate this.

To make testing easier, we first create some data:

    • zero_line: an array of one row containing three 0s

    • one_column: an array of one column containing three 1s

    • a: a matrix of 2 rows and 3 columns

    • b: an integer array over the interval [11, 20)


# shape_manipulation.py
import numpy as np
zero_line = np.zeros((1, 3))
one_column = np.ones((3, 1))
print("zero_line = \n{}\n".format(zero_line))
print("one_column = \n{}\n".format(one_column))
a = np.array([(1, 2, 3), (4, 5, 6)])
b = np.arange(11, 20)
print("a = \n{}\n".format(a))
print("b = \n{}\n".format(b))


With the output we can see their structure:


zero_line =
[[0. 0. 0.]]

one_column =
[[1.]
 [1.]
 [1.]]

a =
[[1 2 3]
 [4 5 6]]

b =
[11 12 13 14 15 16 17 18 19]


Array b was originally a one-dimensional array. We now use the reshape method to turn it into a matrix of 3 rows and 3 columns:


# shape_manipulation.py
b = b.reshape(3, -1)
print("b.reshape(3, -1) = \n{}\n".format(b))


The second parameter here is set to -1, which means that this dimension is computed automatically from the number of elements. Since the array originally had 9 elements, the result is exactly a 3x3 matrix. The output of this code is as follows:


b.reshape(3, -1) =
[[11 12 13]
 [14 15 16]
 [17 18 19]]
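The behavior of -1 can be sketched on a different array. The 12-element array v below is an assumption made for illustration: the inferred dimension is the total size divided by the dimensions given explicitly, and reshape raises a ValueError when that division is not exact.

```python
import numpy as np

v = np.arange(12)

# 12 elements in 3 rows: the -1 is inferred as 4 columns.
three_rows = v.reshape(3, -1)
# 12 elements in 6 columns: the -1 is inferred as 2 rows.
six_columns = v.reshape(-1, 6)
print(three_rows.shape, six_columns.shape)

# 12 is not divisible by 5, so no valid shape exists.
try:
    v.reshape(5, -1)
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)
```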


We then use the vstack function to stack the three arrays in the vertical direction:


# shape_manipulation.py
c = np.vstack((a, b, zero_line))
print("c = np.vstack((a, b, zero_line)) = \n{}\n".format(c))



The output of this code is as follows. Compare the data structure before and after stacking:


c = np.vstack((a, b, zero_line)) =
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [11. 12. 13.]
 [14. 15. 16.]
 [17. 18. 19.]
 [ 0.  0.  0.]]


In the same way, we can stack arrays horizontally with hstack. To make this possible, we first need to adjust the structure of array a:


# shape_manipulation.py
a = a.reshape(3, 2)
print("a.reshape(3, 2) = \n{}\n".format(a))
d = np.hstack((a, b, one_column))
print("d = np.hstack((a, b, one_column)) = \n{}\n".format(d))


The output of this code is as follows. Again, compare the data structure before and after stacking:


a.reshape(3, 2) =
[[1 2]
 [3 4]
 [5 6]]

d = np.hstack((a, b, one_column)) =
[[ 1.  2. 11. 12. 13.  1.]
 [ 3.  4. 14. 15. 16.  1.]
 [ 5.  6. 17. 18. 19.  1.]]


Note that if the structures of the arrays are incompatible, the stacking fails. For example, the following line raises an error if it is uncommented:


# shape_manipulation.py
# np.vstack((a, b))  # ValueError: dimensions not match


This is because array a now has 2 columns while array b has 3 columns, so they cannot be stacked vertically.
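The compatibility rule can be verified with a short, self-contained sketch. The arrays here are rebuilt with arange and reshape so that the snippet runs on its own; the exact error message depends on the NumPy version, so only the exception type is relied upon.

```python
import numpy as np

a = np.arange(1, 7).reshape(3, 2)     # 3 rows, 2 columns
b = np.arange(11, 20).reshape(3, 3)   # 3 rows, 3 columns

# Vertical stacking needs the same number of columns: 2 != 3, so it fails.
try:
    np.vstack((a, b))
    vstack_failed = False
except ValueError as exc:
    vstack_failed = True
    print("vstack failed:", exc)

# Horizontal stacking needs the same number of rows: 3 == 3, so it works.
h = np.hstack((a, b))
print(h.shape)
```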

Next we look at splitting. First, we split the array d in the horizontal direction into 3 arrays. Then we print the middle one (the one with subscript 1):


# shape_manipulation.py
e = np.hsplit(d, 3)  # Split d into 3
print("e = np.hsplit(d, 3) = \n{}\n".format(e))
print("e[1] = \n{}\n".format(e[1]))


This code output is as follows:


e = np.hsplit(d, 3) =
[array([[1., 2.],
       [3., 4.],
       [5., 6.]]), array([[11., 12.],
       [14., 15.],
       [17., 18.]]), array([[13.,  1.],
       [16.,  1.],
       [19.,  1.]])]

e[1] =
[[11. 12.]
 [14. 15.]
 [17. 18.]]


If the requested number of parts does not divide the original array evenly, the operation fails:


# np.hsplit(d, 4)  # ValueError: array split does not result in an equal division
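The failure is easy to reproduce. As an aside that goes beyond the functions listed in this article, NumPy also offers np.array_split, which accepts a number of parts that does not divide the array evenly and returns parts of unequal size. The array d below is a stand-in with the same shape (3 rows, 6 columns) as the one in the text.

```python
import numpy as np

d = np.arange(18).reshape(3, 6)

# 6 columns cannot be divided into 4 equal parts.
try:
    np.hsplit(d, 4)
    hsplit_failed = False
except ValueError as exc:
    hsplit_failed = True
    print("hsplit failed:", exc)

# array_split tolerates this and makes the first parts one column wider.
parts = np.array_split(d, 4, axis=1)
widths = [p.shape[1] for p in parts]
print(widths)
```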



Besides giving the number of equal parts, we can also give the column positions at which to split. The following splits array d after the 1st and after the 3rd column:



# shape_manipulation.py
f = np.hsplit(d, (1, 3))  # Split d after the 1st and the 3rd column
print("f = np.hsplit(d, (1, 3)) = \n{}\n".format(f))


The output of this code is as follows. The array d is split into three arrays containing 1, 2, and 3 columns respectively:


f = np.hsplit(d, (1, 3)) =
[array([[1.],
       [3.],
       [5.]]), array([[ 2., 11.],
       [ 4., 14.],
       [ 6., 17.]]), array([[12., 13.,  1.],
       [15., 16.,  1.],
       [18., 19.,  1.]])]
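One way to read the index form of hsplit is as a shorthand for column slices. The sketch below uses a stand-in array d of the same shape (3 rows, 6 columns) and checks that the three parts match d[:, :1], d[:, 1:3], and d[:, 3:].

```python
import numpy as np

d = np.arange(18).reshape(3, 6)

f = np.hsplit(d, (1, 3))
shapes = [part.shape for part in f]
print(shapes)

# Each part equals the corresponding column slice of d.
same_as_slices = (
    np.array_equal(f[0], d[:, :1])
    and np.array_equal(f[1], d[:, 1:3])
    and np.array_equal(f[2], d[:, 3:])
)
print(same_as_slices)
```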


Finally, we split the array d in the vertical direction. Likewise, if the requested number of parts does not divide the array evenly, the operation fails:


# shape_manipulation.py
g = np.vsplit(d, 3)
print("np.vsplit(d, 3) = \n{}\n".format(g))
# np.vsplit(d, 2)  # ValueError: array split does not result in an equal division


np.vsplit(d, 3) produces three arrays, each containing a single row:


np.vsplit(d, 3) =
[array([[ 1.,  2., 11., 12., 13.,  1.]]), array([[ 3.,  4., 14., 15., 16.,  1.]]), array([[ 5.,  6., 17., 18., 19.,  1.]])]
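The shape of each piece can be checked directly. In this small sketch (d is a stand-in array with 3 rows and 6 columns), every piece returned by vsplit keeps two dimensions, with shape (1, 6), and stacking the pieces again with vstack restores the original array.

```python
import numpy as np

d = np.arange(18).reshape(3, 6)

pieces = np.vsplit(d, 3)
piece_shape = pieces[0].shape      # two-dimensional: one row, six columns
row_shape = pieces[0][0].shape     # indexing the row gives a 1-D array
print(piece_shape, row_shape)

# vsplit and vstack are inverse operations here.
restored = np.vstack(pieces)
print(np.array_equal(restored, d))
```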


Indexing

Next we look at how to access the data in a NumPy array.

Again, to make testing easier, we first create a one-dimensional array. Its contents are the integers in the interval [100, 200).

At the most basic level, we can access an element of the array with array[index], which should be familiar to anyone with a bit of programming experience.


# array_index.py
import numpy as np
base_data = np.arange(100, 200)
print("base_data\n={}\n".format(base_data))
print("base_data[10] = {}\n".format(base_data[10]))


The above code output is as follows:


base_data
=[100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189
 190 191 192 193 194 195 196 197 198 199]

base_data[10] = 110


In NumPy, we can create an array containing several subscripts and use it to fetch the corresponding elements of the target array, as shown below:


# array_index.py
every_five = np.arange(0, 100, 5)
print("base_data[every_five] = \n{}\n".format(base_data[every_five]))


every_five is an array of the subscripts we want to fetch: 0, 5, 10, and so on up to 95. Placing it in the square brackets returns all elements at those subscripts. The result is as follows:


base_data[every_five] =
[100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195]


The array of subscripts can be one-dimensional or, of course, multidimensional. Suppose we want a 2x2 matrix whose contents are the elements at subscripts 1, 2, 10, and 20 of the target array. We can write this:


# array_index.py
a = np.array([(1, 2), (10, 20)])
print("a = \n{}\n".format(a))
print("base_data[a] = \n{}\n".format(base_data[a]))


This code output is as follows:


a =
[[ 1  2]
 [10 20]]

base_data[a] =
[[101 102]
 [110 120]]
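The rule illustrated above is that the result takes the shape of the subscript array, not of the target array. A minimal sketch under that reading (the names base, idx, picked, and from_list are made up for the example); a plain Python list of subscripts works as well:

```python
import numpy as np

base = np.arange(100, 200)

# A 2x2 array of subscripts produces a 2x2 result.
idx = np.array([(1, 2), (10, 20)])
picked = base[idx]
print(picked.shape)
print(picked)

# A plain list of subscripts produces a one-dimensional result.
from_list = base[[0, 5, 10]]
print(from_list)
```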


So far the target array has been one-dimensional. Next, we convert it into a 10x10 two-dimensional array.


# array_index.py
base_data2 = base_data.reshape(10, -1)
print("base_data2 = np.reshape(base_data, (10, -1)) = \n{}\n".format(base_data2))


The reshape function was described earlier, so you should be able to predict its result:


base_data2 = np.reshape(base_data, (10, -1)) =
[[100 101 102 103 104 105 106 107 108 109]
 [110 111 112 113 114 115 116 117 118 119]
 [120 121 122 123 124 125 126 127 128 129]
 [130 131 132 133 134 135 136 137 138 139]
 [140 141 142 143 144 145 146 147 148 149]
 [150 151 152 153 154 155 156 157 158 159]
 [160 161 162 163 164 165 166 167 168 169]
 [170 171 172 173 174 175 176 177 178 179]
 [180 181 182 183 184 185 186 187 188 189]
 [190 191 192 193 194 195 196 197 198 199]]


For a two-dimensional array:

    • If we specify only one subscript, the result is still an array (a whole row).

    • If we specify two subscripts, the result is a single element.


We can also use -1 to refer to the last element.


# array_index.py
print("base_data2[2] = \n{}\n".format(base_data2[2]))
print("base_data2[2, 3] = \n{}\n".format(base_data2[2, 3]))
print("base_data2[-1, -1] = \n{}\n".format(base_data2[-1, -1]))


For higher-dimensional arrays the principle is the same, and readers can work it out for themselves.

The output of this code is as follows:


base_data2[2] =
[120 121 122 123 124 125 126 127 128 129]

base_data2[2, 3] =
123

base_data2[-1, -1] =
199
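These three access patterns can be summarized in a short sketch. The array grid is a stand-in for the 10x10 array used above, and the variable names are illustrative.

```python
import numpy as np

grid = np.arange(100, 200).reshape(10, -1)

row = grid[2]         # one subscript: a one-dimensional array (a whole row)
item = grid[2, 3]     # two subscripts: a single element
last = grid[-1, -1]   # negative subscripts count from the end

print(row.shape)
print(item)
print(last == grid[9, 9])
```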


In addition, we can specify a range with ":", for example 2:5. Writing only ":" means the full range.

Take a look at the following code:


# array_index.py
print("base_data2[2, :] = \n{}\n".format(base_data2[2, :]))
print("base_data2[:, 3] = \n{}\n".format(base_data2[:, 3]))
print("base_data2[2:5, 2:4] = \n{}\n".format(base_data2[2:5, 2:4]))



The meaning of these three expressions is:

    • Get all the elements of the row with subscript 2

    • Get all the elements of the column with subscript 3

    • Get all the elements in the rows with subscripts in [2, 5) and the columns with subscripts in [2, 4)

Observe the following output carefully:


base_data2[2, :] =
[120 121 122 123 124 125 126 127 128 129]

base_data2[:, 3] =
[103 113 123 133 143 153 163 173 183 193]

base_data2[2:5, 2:4] =
[[122 123]
 [132 133]
 [142 143]]
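One property of ":" ranges that the text above does not cover, but that is worth knowing: a slice is a view onto the original data rather than a copy, so writing through the slice changes the original array. The sketch below uses a stand-in array grid; the names block and independent are made up for the example.

```python
import numpy as np

grid = np.arange(100, 200).reshape(10, -1)

# Rows 2..4 and columns 2..3 (the end of each range is excluded).
block = grid[2:5, 2:4]
print(block.shape)

# The slice shares memory with grid, so this also changes grid[2, 2].
block[0, 0] = -1
print(grid[2, 2])

# An explicit copy is independent of the original array.
independent = grid[2:5, 2:4].copy()
independent[0, 0] = 0
print(grid[2, 2])
```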



Mathematical operations

NumPy also contains many mathematical functions. Here are some examples; for more functions, see the contents of the NumPy manual:


# operation.py
import numpy as np
base_data = (np.random.random((5, 5)) - 0.5) * 100
print("base_data = \n{}\n".format(base_data))
print("np.amin(base_data) = {}".format(np.amin(base_data)))
print("np.amax(base_data) = {}".format(np.amax(base_data)))
print("np.average(base_data) = {}".format(np.average(base_data)))
print("np.sum(base_data) = {}".format(np.sum(base_data)))
print("np.sin(base_data) = \n{}".format(np.sin(base_data)))



This code output is as follows:


base_data =
[[ -9.63895991   6.9292461   -2.35654712 -48.45969283  13.56031937]
 [-39.75875796 -43.21031705 -49.27708561   6.80357128  33.71975059]
 [ 36.32228175  30.92546582 -41.63728955  28.68799187   6.44818484]
 [  7.71568596  43.24884701 -14.90716555  -9.24092252   3.69738718]
 [-31.90994273  34.06067289  18.47830413 -16.02495202 -44.84625246]]

np.amin(base_data) = -49.277085606595726
np.amax(base_data) = 43.24884701268845
np.average(base_data) = -3.22680706079886
np.sum(base_data) = -80.6701765199715
np.sin(base_data) =
[[ 0.21254814  0.60204578 -0.70685739  0.9725159   0.8381861 ]
 [-0.88287359  0.69755541  0.83514527  0.49721505  0.74315189]
 [-0.98124746 -0.47103234  0.7149727  -0.40196147  0.16425187]
 [ 0.99045239 -0.66943662 -0.71791164 -0.18282139 -0.5276184 ]
 [-0.4741657   0.47665553 -0.36278223  0.31170676 -0.76041722]]
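The functions above reduce the whole array to a single number. As a supplementary sketch, they also accept an axis argument that reduces along one dimension only. The small matrix m is chosen so that the results are easy to verify by hand; it is not taken from the original example.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(m)                  # over all elements
column_sums = np.sum(m, axis=0)    # collapse the rows: one value per column
row_sums = np.sum(m, axis=1)       # collapse the columns: one value per row
row_minima = np.amin(m, axis=1)
row_averages = np.average(m, axis=1)

print(total, column_sums, row_sums, row_minima, row_averages)
```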


Matrix

Now let's look at using NumPy arrays as matrices.

First, we create a 5x5 matrix of random integers. There are two ways to get the transpose of a matrix: the .T attribute or the transpose function. In addition, the dot function can be used to multiply matrices. The sample code is as follows:


# matrix.py
import numpy as np
base_data = np.floor((np.random.random((5, 5)) - 0.5) * 100)
print("base_data = \n{}\n".format(base_data))
print("base_data.T = \n{}\n".format(base_data.T))
print("base_data.transpose() = \n{}\n".format(base_data.transpose()))
matrix_one = np.ones((5, 5))
print("matrix_one = \n{}\n".format(matrix_one))
minus_one = np.dot(matrix_one, -1)
print("minus_one = \n{}\n".format(minus_one))
print("np.dot(base_data, minus_one) = \n{}\n".format(np.dot(base_data, minus_one)))


The output of this code is as follows:


base_data =
[[-49.  -5.  11. -13. -41.]
 [ -6. -33. -33. -47.  -4.]
 [-38.  26.  28. -18.  18.]
 [ -3. -19. -15. -39.  45.]
 [-43.   6.  18. -15. -21.]]

base_data.T =
[[-49.  -6. -38.  -3. -43.]
 [ -5. -33.  26. -19.   6.]
 [ 11. -33.  28. -15.  18.]
 [-13. -47. -18. -39. -15.]
 [-41.  -4.  18.  45. -21.]]

base_data.transpose() =
[[-49.  -6. -38.  -3. -43.]
 [ -5. -33.  26. -19.   6.]
 [ 11. -33.  28. -15.  18.]
 [-13. -47. -18. -39. -15.]
 [-41.  -4.  18.  45. -21.]]

matrix_one =
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

minus_one =
[[-1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1.]
 [-1. -1. -1. -1. -1.]]

np.dot(base_data, minus_one) =
[[  97.   97.   97.   97.   97.]
 [ 123.  123.  123.  123.  123.]
 [ -16.  -16.  -16.  -16.  -16.]
 [  31.   31.   31.   31.   31.]
 [  55.   55.   55.   55.   55.]]
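Two details of the example above are easy to miss, and the following sketch tries to make them explicit on small matrices chosen for hand verification (x and y are not from the original code). First, np.dot performs matrix multiplication, whereas the * operator multiplies element by element. Second, when one argument of np.dot is a scalar, as in np.dot(matrix_one, -1), the call is an ordinary multiplication by that scalar.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
y = np.array([[0, 1],
              [1, 0]])

elementwise = x * y          # multiplies matching positions
matrix_product = np.dot(x, y)  # rows of x times columns of y
scaled = np.dot(x, -1)       # scalar argument: same as x * -1

print(elementwise)
print(matrix_product)
print(scaled)

# .T and transpose() give the same result.
print(np.array_equal(x.T, x.transpose()))
```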



Random numbers

At the end of this article, let's look at the use of random numbers.

Random numbers are a feature we use very frequently in programming, for example to generate demo data, or to shuffle the order of existing data so that it can be split into modeling data and validation data.

The numpy.random package contains many random number algorithms. Here are four of the most common usages:


# rand.py
import numpy as np
print("random: {}\n".format(np.random.random(20)))
print("rand: {}\n".format(np.random.rand(3, 4)))
print("randint: {}\n".format(np.random.randint(0, 100, 20)))
print("permutation: {}\n".format(np.random.permutation(np.arange(20))))


The four usages are:

    1. Generate 20 random numbers, each in the interval [0.0, 1.0)

    2. Generate random numbers in an array of the specified shape

    3. Generate a specified number (20) of random integers within a specified range ([0, 100))

    4. Randomly shuffle the order of existing data ([0, 1, 2, ..., 19])

The output of this code is as follows:


random: [0.62956026 0.56816277 0.30903156 0.50427765 0.92117724 0.43044905
 0.54591323 0.47286235 0.93241333 0.32636472 0.14692983 0.02163887
 0.85014782 0.20164791 0.76556972 0.15137427 0.14626625 0.60972522
 0.2995841  0.27569573]

rand: [[0.38629927 0.43779617 0.96276889 0.80018417]
 [0.67656892 0.97189483 0.13323458 0.90663724]
 [0.99440473 0.85197677 0.9420241  0.79598706]]

randint: [ 8 ... 98  0  6 55]

permutation: [15  3 18 14 19 16  1  0  4 10 17  5  2  6 12  9 11 13  7]
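The shuffle-and-split use case mentioned at the start of this section can be sketched as follows. This is only one possible approach, not the author's code: the seed 42 and the 15/5 split point are arbitrary assumptions, and np.random.RandomState is used so that the shuffle is reproducible from run to run.

```python
import numpy as np

data = np.arange(20)

# A generator with a fixed (arbitrary) seed gives the same shuffle every run.
rng = np.random.RandomState(42)
shuffled = rng.permutation(data)

# Hypothetical split: first 15 elements for modeling, last 5 for validation.
split_at = 15
modeling, validation = shuffled[:split_at], shuffled[split_at:]

print(modeling)
print(validation)
```

Every element of data ends up in exactly one of the two parts, and creating a new RandomState with the same seed reproduces the same split.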
