In Python, we often use arrays to manipulate data, which can greatly improve the efficiency of data processing.
As with vectorized operations in R, array operations simplify working with data, and in Python the NumPy module provides array and vector calculations.
Let's look at the simple example below.
import numpy as np
data = np.array([2, 5, 6, 8, 3])  # build a simple array
print(data)
Results:
[2 5 6 8 3]
data1 = np.array([[2, 5, 6, 8, 3], np.arange(5)])  # build a two-dimensional array
print(data1)
Results:
[[2 5 6 8 3]
[0 1 2 3 4]]
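Besides np.array and np.arange, NumPy has other constructors. The short sketch below uses np.zeros, np.ones and reshape, which the text above does not cover; the variable names are only for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 array of 0.0 (float64 by default)
ones = np.ones(4, dtype=np.int64)   # four 1s with an explicit integer dtype
seq = np.arange(0, 10, 2)           # like range(): start, stop (exclusive), step
grid = np.arange(6).reshape(2, 3)   # reshape a 1-D array into 2 rows x 3 columns

print(zeros)
print(ones)
print(seq)
print(grid)
```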
We can also check an array's dimensions and element data type using its shape and dtype attributes:
print(data.shape)
print(data.dtype)
print(data1.shape)
print(data1.dtype)
Results:
(5,)
int32
(2, 5)
int32
As you can see, data is a one-dimensional array with 5 elements, and its data type is int32 (32-bit integer).
data1 is a two-dimensional array with 2 rows of 5 elements each, also of type int32. (The default integer type depends on the platform; on many systems you will see int64 instead.)
An easy way to tell the number of dimensions is to count the opening square brackets at the start of the printed result: each level of brackets corresponds to one dimension. For example, data1 prints starting with [[, so it is two-dimensional.
Other array attributes include the following:
array.ndim: the number of dimensions (1 for a one-dimensional array, 2 for a two-dimensional array)
array.size: the total number of elements in the array
array.itemsize: the size in bytes of each element
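A quick check of these attributes on the two arrays from the earlier example (nbytes is an extra attribute not mentioned above, shown only to relate size and itemsize):

```python
import numpy as np

data = np.array([2, 5, 6, 8, 3])
data1 = np.array([[2, 5, 6, 8, 3], np.arange(5)])

print(data.ndim, data1.ndim)   # number of dimensions: 1 and 2
print(data.size, data1.size)   # total element count: 5 and 10
print(data.itemsize)           # bytes per element: 4 for int32, 8 for int64
print(data.nbytes == data.size * data.itemsize)  # total bytes = size * itemsize
```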
Next, let's look at the data types an array can hold.
Basic data types in NumPy:
| Type | Description |
| --- | --- |
| bool | Boolean (True or False), stored in one byte |
| int_ | Default integer type, whose size depends on the platform (typically int32 or int64) |
| int8 | Signed integer, 1 byte, -128 to 127 |
| int16 | Signed integer, 2 bytes, -32768 to 32767 |
| int32 | Signed integer, 4 bytes, -2**31 to 2**31-1 |
| int64 | Signed integer, 8 bytes, -2**63 to 2**63-1 |
| uint8 | Unsigned integer, 0 to 255 |
| uint16 | Unsigned integer, 0 to 65535 |
| uint32 | Unsigned integer, 0 to 2**32-1 |
| uint64 | Unsigned integer, 0 to 2**64-1 |
| float16 | Half-precision float: 16 bits, 1 sign bit, 5 exponent bits, 10 mantissa bits |
| float32 | Single-precision float: 32 bits, 1 sign bit, 8 exponent bits, 23 mantissa bits |
| float64 or float | Double-precision float: 64 bits, 1 sign bit, 11 exponent bits, 52 mantissa bits |
| complex64 | Complex number represented by two 32-bit floats (real and imaginary parts) |
| complex128 or complex | Complex number represented by two 64-bit floats (real and imaginary parts) |
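The ranges in the table can be checked from code. This sketch requests a dtype at creation time, converts with astype, and queries limits with np.iinfo and np.finfo (functions not mentioned in the original text):

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int8)  # request a specific dtype at creation
b = a.astype(np.float64)                # astype returns a converted copy
print(a.dtype, b.dtype)                 # int8 float64

print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # integer limits of int8
print(np.iinfo(np.int32).max == 2**31 - 1)           # upper limit of int32
print(np.finfo(np.float32).bits)                     # width of float32 in bits

c = np.array([1.7, -2.7]).astype(np.int32)  # float -> int truncates toward zero
print(c)
```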
Basic array operations
Arrays support the usual arithmetic operations (addition, subtraction, multiplication and division), applied element by element.
arr = np.arange(10)
arr1 = np.arange(1, 11)
print(arr * 2)
Results:
[ 0  2  4  6  8 10 12 14 16 18]
print(arr + arr1)
Results:
[ 1  3  5  7  9 11 13 15 17 19]
Note that adding two arrays element by element requires them to have the same length (more generally, compatible shapes).
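"Compatible shapes" refers to NumPy's broadcasting rules, which the text above does not cover. A minimal sketch: two 1-D arrays of different lengths raise a ValueError, while a (5,) row can be added to every row of a (2, 5) array.

```python
import numpy as np

arr = np.arange(10)
error_message = None
try:
    arr + np.arange(3)          # lengths 10 and 3 are not compatible
except ValueError as exc:
    error_message = str(exc)    # NumPy reports that the operands cannot be broadcast
print(error_message)

m = np.arange(10).reshape(2, 5)       # shape (2, 5)
row = np.array([10, 20, 30, 40, 50])  # shape (5,)
total = m + row                       # row is broadcast across both rows of m
print(total)
```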
Next, let's look at array indexing.
arr = np.arange(10)
Index a single element by its position:
print(arr[5])
The result is:
5
Slicing:
print(arr[5:8])
The result is:
[5 6 7]
You can also modify data through an index:
arr[5] = 120
print(arr)
The result is:
[  0   1   2   3   4 120   6   7   8   9]
You can see that the element at index 5 has become 120.
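One detail worth knowing when modifying data through indexes: a slice is a view onto the original array, not a copy, so writing to the slice changes the original. The sketch below shows this, the explicit .copy() alternative, and row/column indexing on a 2-D array (the names are illustrative):

```python
import numpy as np

arr = np.arange(10)
part = arr[5:8]       # a slice is a view onto the same memory, not a copy
part[:] = 64          # assigning to the view changes arr as well
print(arr)

arr2 = np.arange(10)
part2 = arr2[5:8].copy()  # an explicit copy is independent of arr2
part2[:] = 64
print(arr2)               # arr2 is unchanged

grid = np.arange(12).reshape(3, 4)
print(grid[1, 2])     # row 1, column 2
print(grid[:2, 1:3])  # first two rows, columns 1 and 2
```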
Arrays also support comparison operations, which produce Boolean arrays:
arr = np.arange(5)
name = np.array(['a', 'b', 'b', 'c', 'a'])
print(name == 'a')
The result is:
[ True False False False  True]
Each element that satisfies the condition is marked True.
We can then use the Boolean array produced from name as a condition to select elements of arr:
print(arr[name == 'a'])
The result is:
[0 4]
The elements of arr at the positions where name equals 'a' are printed.
Combining multiple conditions:
result = (name == 'a') | (name == 'c')
print(result)
print(name[result])
The result is:
[ True False False  True  True]
['a' 'c' 'a']
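Conditions are combined with the operators &, | and ~, not with Python's and, or and not keywords, and each comparison needs its own parentheses because & and | bind more tightly than ==. A Boolean array can also appear on the left side of an assignment. A small sketch (the data array is a made-up example):

```python
import numpy as np

arr = np.arange(5)
name = np.array(['a', 'b', 'b', 'c', 'a'])

not_a = ~(name == 'a')                    # negate a condition with ~
print(arr[not_a])                         # elements where name is not 'a'
print(arr[(name == 'a') & (arr > 0)])     # both conditions must hold

data = np.array([3, -1, 4, -1, 5])
data[data < 0] = 0    # set every negative element to 0
print(data)
```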
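Next, let's look at universal functions (ufuncs), which apply an operation element by element.
Some ufuncs operate on a single array (unary ufuncs), such as np.abs, np.sqrt and np.exp.
Others operate on two or more arrays (binary ufuncs), such as np.add, np.multiply and np.maximum.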
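A minimal sketch of both kinds of ufunc, using a few standard NumPy functions on small illustrative arrays:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 3.0, 10.0])

# Unary ufuncs act on each element of a single array.
print(np.sqrt(x))                      # element-wise square roots: 1, 2, 3
print(np.abs(np.array([-1, 2, -3])))   # absolute values: 1, 2, 3

# Binary ufuncs combine two arrays element by element.
print(np.add(x, y))       # same as x + y: 3, 7, 19
print(np.maximum(x, y))   # larger of each pair: 2, 4, 10
```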
Related functions
np.meshgrid generates coordinate matrices from coordinate vectors:
a, b = np.meshgrid(np.arange(1, 5), np.arange(2, 4))
print(a)
print(b)
The result is:
[[1 2 3 4]
[1 2 3 4]]
[[2 2 2 2]
[3 3 3 3]]
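Both outputs have shape (2, 4), that is, (length of the second vector, length of the first vector): a repeats the first vector in every row, and b repeats each value of the second vector across a row.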
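A typical use of meshgrid is evaluating a function of two variables at every grid point without a loop. The function a * 10 + b below is an arbitrary illustration:

```python
import numpy as np

xs = np.arange(1, 5)   # 4 x-coordinates
ys = np.arange(2, 4)   # 2 y-coordinates
a, b = np.meshgrid(xs, ys)

print(a.shape)         # (len(ys), len(xs)) = (2, 4)
z = a * 10 + b         # evaluate the function at every (x, y) pair at once
print(z)
```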
np.where is a vectorized version of the ternary expression x if condition else y:
arr1 = np.arange(5)
arr2 = np.arange(20, 25)
condition = np.array([1, 0, 1, 0, 0])
result = np.where(condition, arr1, arr2)
print(arr1)
print(arr2)
print(result)
The result is:
[0 1 2 3 4]
[20 21 22 23 24]
[0 21 2 23 24]
As you can see, where condition is 1 (true), result takes the element from arr1, and where it is 0 (false), it takes the element from arr2.
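In practice the condition is usually a Boolean array built from a comparison, and either branch may be a scalar, which is broadcast to the array's shape. A small sketch with made-up data:

```python
import numpy as np

arr = np.array([3, -2, 0, 7, -5])
clipped = np.where(arr > 0, arr, 0)            # keep positives, replace the rest with 0
labels = np.where(arr > 0, 'pos', 'non-pos')   # both branches can be scalars
print(clipped)
print(labels)
```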
Mathematical and statistical methods
Arrays also support statistical functions such as sum, mean and std:
arr = np.random.randint(1, 20, 10)  # 10 random integers from 1 to 19
print(arr)
print(np.mean(arr))
print(np.sum(arr))
print(np.std(arr))
The result is:
[19 14 8 13 13 10 10 9 19 7]
12.2
122
4.01995024845
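Other commonly used statistical methods include min, max, argmin, argmax, var, cumsum and cumprod. Most of them can also be called as array methods (for example arr.sum()) and accept an axis argument to compute along columns or rows.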
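A sketch of the axis argument on a small 2-D array: axis=0 reduces down each column, axis=1 across each row, and argmax returns a position in the flattened array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())                  # over all elements
print(m.sum(axis=0))            # down each column
print(m.mean(axis=1))           # across each row
print(m.max(), m.argmax())      # largest value and its flattened index
print(np.cumsum([1, 2, 3, 4]))  # running total
```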
Statistics on Boolean arrays
arr = np.arange(-20, 10)
result = (arr > 5).sum()
print(arr)
print(result)
The result is:
[-20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3
  -2  -1   0   1   2   3   4   5   6   7   8   9]
4
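Because True counts as 1 and False as 0, sum() returns the number of elements that satisfy the condition (here 6, 7, 8 and 9).
Boolean arrays also provide other methods, such as any() and all().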
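A minimal sketch of these Boolean reductions on the same array; mean() is included as an extra illustration of the fraction of True values:

```python
import numpy as np

arr = np.arange(-20, 10)
mask = arr > 5

print(mask.sum())    # number of True values
print(mask.any())    # True if at least one element satisfies the condition
print(mask.all())    # True only if every element satisfies it
print(mask.mean())   # fraction of elements satisfying the condition
```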
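Reading and storing data: NumPy can write arrays to disk and read them back, for example with np.save/np.load (binary .npy files) and np.savetxt/np.loadtxt (text files).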
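A round-trip sketch of both formats. The file names arr.npy and arr.txt are hypothetical, and a temporary directory is used so the example leaves nothing behind:

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, 'arr.npy')   # hypothetical file name
    np.save(npy_path, arr)                    # binary .npy keeps dtype and shape
    loaded = np.load(npy_path)

    txt_path = os.path.join(tmp, 'arr.txt')   # hypothetical file name
    np.savetxt(txt_path, arr.reshape(2, 5), fmt='%d', delimiter=',')  # plain text
    loaded_txt = np.loadtxt(txt_path, delimiter=',', dtype=int)

print(np.array_equal(arr, loaded))
print(np.array_equal(arr.reshape(2, 5), loaded_txt))
```

The .npy format is the simpler choice for NumPy-only workflows; text files are easier to inspect or share with other tools but do not preserve the dtype on their own.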
Common linear algebra functions
arr = np.array([np.random.randint(1, 10, 5), np.random.randint(10, 20, 5)])
print(arr)
print(np.dot(arr, 2))
The result is:
[[4 6 5 1 6]
[14 16 11 10 18]]
[[8 12 10 2 12]
[28 32 22 20 36]]
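np.dot performs matrix multiplication when both arguments are 2-D arrays. When one argument is a scalar, as in this example, it simply multiplies every element by that scalar.
Other linear algebra functions, such as inv (inverse), det (determinant) and solve (solving a linear system), are provided in the numpy.linalg module.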
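A sketch of actual matrix multiplication with np.dot and the @ operator, contrasted with element-wise *, plus np.linalg.inv and np.linalg.solve on a small 2x2 example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(np.dot(A, B))   # matrix product
print(A @ B)          # the @ operator gives the same result
print(A * B)          # element-wise product, NOT matrix multiplication

inv = np.linalg.inv(A)                  # matrix inverse
print(np.allclose(A @ inv, np.eye(2)))  # A times its inverse is the identity

x = np.linalg.solve(A, np.array([5, 11]))  # solve A x = [5, 11]
print(x)                                   # approximately [1, 2]
```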
Finally, let's look at how to generate random numbers in NumPy.
We have already used random number generation in many of the examples above. np.random.random draws floats uniformly from [0, 1):
arr = np.random.random(10)
print(arr)
Result is
[0.90051063 0.72818635 0.00411373 0.13154345 0.45513344 0.9700776
0.42150977 0.27728599 0.50888291 0.62288808]
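Other random number functions include np.random.randint (random integers, used above), np.random.randn (samples from the standard normal distribution) and np.random.normal (normal distribution with a given mean and standard deviation).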
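Random results like the ones above differ on every run. A sketch of making them reproducible: np.random.seed fixes the legacy np.random functions, and np.random.default_rng (the newer Generator API, available since NumPy 1.17) takes a seed directly. The seed values are arbitrary.

```python
import numpy as np

np.random.seed(42)            # fix the seed for the legacy np.random functions
first = np.random.random(3)
np.random.seed(42)            # same seed, same sequence
second = np.random.random(3)
print(np.array_equal(first, second))

rng = np.random.default_rng(0)            # Generator with its own seed
ints = rng.integers(1, 20, size=10)       # like randint: low inclusive, high exclusive
normals = rng.normal(loc=0.0, scale=1.0, size=(2, 3))
print(ints)
print(normals.shape)
```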
Once you understand the NumPy operations above, basic data manipulation tasks should pose no major problems.