Data Analysis and Presentation: Getting Started with the NumPy Library
These are my notes from the course "Python Data Analysis and Display" taught by Song Tian of Beijing Institute of Technology. The course is well focused and clearly structured, and I would like to thank the instructor for the excellent explanations.
Getting started with the NumPy library: data dimensions
A dimension is the way a group of data is organized. Organizing data along dimensions establishes specific relationships among the items, so that together they express a meaning.
One-dimensional data:
One-dimensional data consists of items in a peer relationship, ordered or unordered, arranged linearly. It corresponds to concepts such as lists, arrays, and sets.
Lists and arrays: both are ordered structures holding a group of data.
Differences:
List: elements may have different data types
Array: all elements have the same data type
Two-dimensional data:
Two-dimensional data is a combination of several pieces of one-dimensional data.
A table is a typical example of two-dimensional data, and its header row is part of that data.
Multi-dimensional data:
Multi-dimensional data is formed by extending one- or two-dimensional data along a new dimension, for example by adding a time dimension to a table.
High-dimensional data:
High-dimensional data uses only the most basic binary relationship to express complex structure among data. It is organized as key-value pairs.
Python representation of data dimensions
One-dimensional data: list (ordered) and set (unordered) types
Two-dimensional data: list type (a list of lists)
Multi-dimensional data: list type (nested lists)
High-dimensional data: dictionary type, or a data representation format (JSON, XML, YAML)
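The mapping above can be sketched with built-in Python types. This is only an illustration: the variable names and the sample values (students, scores, terms) are made up and do not come from the course.

```python
import json

# One-dimensional data: a list keeps order, a set does not.
one_d_ordered = [3.14, 2.72, 1.62]
one_d_unordered = {3.14, 2.72, 1.62}

# Two-dimensional data: a list of rows; the first row is the header.
two_d = [
    ["name", "score"],
    ["alice", 90],
    ["bob", 85],
]

# Multi-dimensional data: the same table repeated along a new (time) axis.
multi_d = [
    [["alice", 90], ["bob", 85]],  # term 1 (hypothetical)
    [["alice", 93], ["bob", 88]],  # term 2 (hypothetical)
]

# High-dimensional data: key-value pairs, here a dict serialized to JSON.
high_d = {"course": "numpy", "students": [{"name": "alice", "score": 90}]}
high_d_json = json.dumps(high_d)

print(two_d[1][1])
print(multi_d[1][0][1])
print(json.loads(high_d_json)["students"][0]["name"])
```

Each extra dimension adds one level of list nesting, which is why nested lists quickly become awkward and a dedicated array type is attractive.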
NumPy array object: ndarray
NumPy is an open-source foundational library for scientific computing in Python. It provides a powerful n-dimensional array object (ndarray), broadcasting functions, tools for integrating C/C++/Fortran code, and functions for linear algebra, Fourier transforms, and random number generation. NumPy is the basis of data processing and scientific computing libraries such as SciPy and pandas.
Importing NumPy:
import numpy as np
The alias can be omitted or changed, but the conventional alias np shown above is recommended.
Benefits of introducing ndarray:
Example: calculate A^2 + B^3, where A and B are one-dimensional arrays.
def pySum():
    a = [0, 1, 2, 3, 4]
    b = [9, 8, 7, 6, 5]
    c = []
    for i in range(len(a)):
        c.append(a[i]**2 + b[i]**3)
    return c

print(pySum())

import numpy as np

def npSum():
    a = np.array([0, 1, 2, 3, 4])
    b = np.array([9, 8, 7, 6, 5])
    c = a**2 + b**3
    return c

print(npSum())
Array objects remove the explicit loops needed for element-by-element operations, so a one-dimensional vector can be treated more like a single value. Because the array object is a dedicated, optimized type, it also speeds up this kind of computation.
Observation: in scientific computing, all data along a dimension usually has the same type.
Because an array object uses a single data type, it saves computation time and storage space.
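A rough way to check the speed claim is to time both versions on larger inputs. The sketch below is not from the course: the function names, the input size n, and the repeat count are arbitrary choices, and the measured times depend on the machine, so no expected timings are shown.

```python
import timeit

import numpy as np

def py_sum(a, b):
    # Plain-Python version: one explicit pass over the elements.
    return [x**2 + y**3 for x, y in zip(a, b)]

def np_sum(a, b):
    # NumPy version: the element-wise loop runs inside NumPy.
    return a**2 + b**3

n = 1000  # hypothetical size, small enough that int64 cannot overflow
list_a, list_b = list(range(n)), list(range(n))
arr_a = np.arange(n, dtype=np.int64)
arr_b = np.arange(n, dtype=np.int64)

# Both versions must agree before their speed is compared.
same = py_sum(list_a, list_b) == np_sum(arr_a, arr_b).tolist()

t_list = timeit.timeit(lambda: py_sum(list_a, list_b), number=200)
t_array = timeit.timeit(lambda: np_sum(arr_a, arr_b), number=200)
print(same)
print(f"list: {t_list:.4f} s, ndarray: {t_array:.4f} s")
```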
N-dimensional array object: ndarray
An ndarray is a multi-dimensional array object made up of two parts: the actual data, and metadata describing that data (dimensions, data type, and so on). An ndarray generally requires all elements to have the same type (homogeneous), and array indices start at 0.
Use np.array() to create an ndarray (in interactive output an ndarray is displayed under the name array). When printed, an ndarray appears in [] form with elements separated by spaces.
- Axis: a dimension along which data is stored
- Rank: the number of axes
Example: generate an ndarray
In [1]: import numpy as np

In [2]: a = np.array([[0,1,2,3,4],
   ...:               [9,8,7,6,5]])

In [3]: a
Out[3]:
array([[0, 1, 2, 3, 4],
       [9, 8, 7, 6, 5]])

In [4]: print(a)
[[0 1 2 3 4]
 [9 8 7 6 5]]
Attributes of the ndarray object
Attribute | Description
.ndim | Rank, that is, the number of axes (dimensions)
.shape | The dimensions of the ndarray object; for a matrix, n rows and m columns
.size | The number of elements in the ndarray object, equal to n * m from .shape
.dtype | The element type of the ndarray object
.itemsize | The size of each element in the ndarray object, in bytes
Example: Test ndarray attributes
In [5]: a.ndim
Out[5]: 2

In [6]: a.shape
Out[6]: (2, 5)

In [7]: a.dtype
Out[7]: dtype('int32')

In [8]: a.itemsize
Out[8]: 4
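The same attributes can be read in a plain script. In this sketch the dtype is set explicitly to np.int32, an assumption made so that the result does not depend on the platform's default integer type. It also reads .nbytes, which is not listed in the table above.

```python
import numpy as np

# dtype is fixed explicitly so the values below are platform independent
a = np.array([[0, 1, 2, 3, 4],
              [9, 8, 7, 6, 5]], dtype=np.int32)

print(a.ndim)      # number of axes
print(a.shape)     # length of each axis
print(a.size)      # total number of elements
print(a.dtype)     # element type
print(a.itemsize)  # bytes per element
print(a.nbytes)    # bytes used by the element data: size * itemsize
```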
Element type of ndarray
Data type | Description
bool | Boolean, True or False
intc | Same as the C int type, generally int32 or int64
intp | Integer used for indexing, same as the C ssize_t type, int32 or int64
int8 | 8-bit integer; range [-128, 127]
int16 | 16-bit integer; range [-32768, 32767]
int32 | 32-bit integer; range [-2^31, 2^31 - 1]
int64 | 64-bit integer; range [-2^63, 2^63 - 1]
uint8 | 8-bit unsigned integer; range [0, 255]
uint16 | 16-bit unsigned integer; range [0, 65535]
uint32 | 32-bit unsigned integer; range [0, 2^32 - 1]
uint64 | 64-bit unsigned integer; range [0, 2^64 - 1]
float16 | 16-bit half-precision floating point: 1 sign bit, 5 exponent bits, 10 mantissa bits
float32 | 32-bit single-precision floating point: 1 sign bit, 8 exponent bits, 23 mantissa bits
float64 | 64-bit double-precision floating point: 1 sign bit, 11 exponent bits, 52 mantissa bits
complex64 | Complex type; the real and imaginary parts are both 32-bit floating point numbers
complex128 | Complex type; the real and imaginary parts are both 64-bit floating point numbers
A floating point number has the form (sign) mantissa * 2^exponent. A complex number has the form real part (.real) + j * imaginary part (.imag).
Comparison: Python syntax itself supports only the integer, floating point, and complex types. Why does ndarray support so many element types?
- Scientific computing involves large amounts of data and places high demands on storage and performance.
- Fine-grained element types help NumPy use storage space sensibly and optimize performance.
- Fine-grained element types help programmers estimate the scale of a program reasonably.
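The ranges and bit layouts in the type table can be checked from NumPy itself with np.iinfo and np.finfo. This is a small sketch that prints the values instead of restating them; the choice of types to inspect is arbitrary.

```python
import numpy as np

# Integer ranges follow from the bit width.
for t in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Floating point layout: total bits, exponent bits, mantissa bits.
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, info.nexp, info.nmant)

# A complex64 element stores two float32 values.
print(np.dtype(np.complex64).itemsize)
```

Knowing the exact width of each element is what lets a programmer estimate memory use as the number of elements times itemsize.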
Non-homogeneous ndarray object
An ndarray can also be composed of non-homogeneous objects. Its elements then have the object type, which cannot take full advantage of NumPy, so avoid such arrays whenever possible.
Example: a non-homogeneous ndarray has the dtype object.
In [9]: x = np.array([[0,1,2,3,4],
   ...:               [9,8,7,6]], dtype=object)  # rows of unequal length; NumPy 1.24 and later require dtype=object here

In [10]: x.shape
Out[10]: (2,)

In [11]: x.dtype
Out[11]: dtype('O')

In [12]: x
Out[12]: array([list([0, 1, 2, 3, 4]), list([9, 8, 7, 6])], dtype=object)

In [13]: x.itemsize
Out[13]: 8

In [14]: x.size
Out[14]: 2
Creating and transforming ndarrays
(1) Create an ndarray from the Python list and tuple types.
x = np.array(list/tuple)
x = np.array(list/tuple, dtype=np.float32)
When dtype is not specified, np.array() infers a dtype from the data.
Example: Create an ndarray
In [15]: x = np.array([0, 1, 2, 3])  # create from a list

In [16]: print(x)
[0 1 2 3]

In [17]: x = np.array((4, 5, 6, 7))  # create from a tuple

In [18]: print(x)
[4 5 6 7]

In [19]: x = np.array([[1, 2], [9, 8], (0.1, 0.2)])  # create from a mix of lists and tuples

In [20]: print(x)
[[1.  2. ]
 [9.  8. ]
 [0.1 0.2]]
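The dtype inference described above can be observed directly. This is a minimal sketch with arbitrary variable names and sample values. The exact integer width chosen for the first array depends on the platform, so only its kind is discussed.

```python
import numpy as np

ints = np.array([0, 1, 2, 3])                       # all integers: an integer dtype
mixed = np.array([1, 2.5, 3])                       # one float promotes every element to float64
forced = np.array([0, 1, 2, 3], dtype=np.float32)   # an explicit dtype overrides inference

print(ints.dtype, mixed.dtype, forced.dtype)
```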
(2) Use NumPy functions such as arange, ones, and zeros to create an ndarray.
Function | Description
np.arange(n) | Similar to range(); returns an ndarray with elements from 0 to n-1
np.ones(shape) | Generates an array of all ones with the given shape; shape is a tuple
np.zeros(shape) | Generates an array of all zeros with the given shape; shape is a tuple
np.full(shape, val) | Generates an array with the given shape in which every element is val
np.eye(n) | Creates an n * n identity matrix: ones on the diagonal, zeros elsewhere
np.ones_like(a) | Generates an array of all ones with the same shape as array a
np.zeros_like(a) | Generates an array of all zeros with the same shape as array a
np.full_like(a, val) | Generates an array with the same shape as array a in which every element is val
Other NumPy functions can also create an ndarray:
Function | Description
np.linspace() | Fills an array with evenly spaced values between a start value and an end value
np.concatenate() | Joins two or more arrays into a new array
Example: Create an ndarray
In [21]: np.arange(10)
Out[21]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]: np.ones((3,6))
Out[22]:
array([[ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.]])

In [23]: np.zeros((3,6),dtype=np.int32)
Out[23]:
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

In [24]: np.eye(5)
Out[24]:
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

In [25]: x = np.ones((2,3,4))

In [26]: print(x)
[[[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]]

In [27]: x.shape
Out[27]: (2, 3, 4)

In [28]: a = np.linspace(1, 10, 4)

In [29]: a
Out[29]: array([  1.,   4.,   7.,  10.])

In [30]: b = np.linspace(1, 10, 4, endpoint=False)

In [31]: b
Out[31]: array([ 1.  ,  3.25,  5.5 ,  7.75])

In [32]: c = np.concatenate((a,b))

In [33]: c
Out[33]: array([  1.  ,   4.  ,   7.  ,  10.  ,   1.  ,   3.25,   5.5 ,   7.75])
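The session above does not exercise np.full or the *_like functions, so here is a short sketch of them. The template array a and the fill values 7 and 9 are arbitrary choices for illustration.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)  # template array

filled = np.full((2, 3), 7)   # every element is 7
ones = np.ones_like(a)        # same shape and dtype as a, all ones
zeros = np.zeros_like(a)      # same shape and dtype as a, all zeros
nines = np.full_like(a, 9)    # same shape and dtype as a, all nines

print(filled)
print(ones.dtype, zeros.shape, nines[0, 0])
```

The *_like functions are convenient when a result buffer must match an existing array without repeating its shape and dtype by hand.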
(3) Create an ndarray from a byte stream (raw bytes).
(4) Read data in a specific format from a file to create an ndarray.
Transforming ndarrays
For the created ndarray, you can perform dimension transformation and element type conversion.
Dimension transformation of the ndarray
Method | Description
.reshape(shape) | Returns an array with the given shape without changing the elements; the original array is unchanged
.resize(shape) | Same as .reshape(), but modifies the original array
.swapaxes(ax1, ax2) | Swaps two of the array's n axes
.flatten() | Reduces the dimensionality of the array, returning a flattened one-dimensional array; the original array is unchanged
In [34]: a = np.ones((2,3,4), dtype=np.int32)

In [35]: a.reshape((3,8))
Out[35]:
array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1]])

In [36]: a
Out[36]:
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

In [37]: a.resize((3,8))

In [38]: a
Out[38]:
array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1]])

In [39]: a = np.ones((2,3,4), dtype=np.int32)

In [40]: a.flatten()
Out[40]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [41]: a
Out[41]:
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

In [42]: b = a.flatten()

In [43]: b
Out[43]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
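An array of all ones hides what .swapaxes() does and cannot show that .flatten() returns a copy. The sketch below uses np.arange(24) instead, so that every element is distinct. The chosen axes and the test value 100 are arbitrary.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# swapaxes exchanges two axes: element a[i, j, k] is found at s[k, j, i].
s = a.swapaxes(0, 2)
print(s.shape)
print(s[3, 2, 1] == a[1, 2, 3])

# flatten returns a copy, so writing to it leaves the original untouched.
f = a.flatten()
f[0] = 100
print(a[0, 0, 0])
```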
Type conversion of ndarray Array
new_a = a.astype(new_type)
Example: array type conversion
In [44]: a = np.ones((2,3,4), dtype=int)  # the np.int alias was removed in NumPy 1.24; the built-in int is equivalent

In [45]: a
Out[45]:
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

In [46]: b = a.astype(float)  # likewise, use float in place of the removed np.float alias

In [47]: b
Out[47]:
array([[[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]],

       [[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]]])
By default, astype() always creates a new array (a copy of the original data), even when the source and target types are the same.
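The copy behavior of astype() can be confirmed with np.shares_memory. This is a small sketch with arbitrary values. The second part shows, as a side note not covered in the notes above, that converting floats to an integer type discards the fractional part.

```python
import numpy as np

a = np.ones((2, 3), dtype=np.int32)

# Same target type, yet a distinct array backed by its own memory.
b = a.astype(np.int32)
print(b is a, np.shares_memory(a, b))

# Converting float to integer drops the fractional part.
c = np.array([1.7, -1.7]).astype(np.int32)
print(c)
```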
Conversion from an ndarray to a list
ls = a.tolist()
Example: converting an ndarray to a list
In [48]: a = np.full((2,3,4), 25, dtype=np.int32)

In [49]: a
Out[49]:
array([[[25, 25, 25, 25],
        [25, 25, 25, 25],
        [25, 25, 25, 25]],

       [[25, 25, 25, 25],
        [25, 25, 25, 25],
        [25, 25, 25, 25]]])

In [50]: a.tolist()
Out[50]:
[[[25, 25, 25, 25], [25, 25, 25, 25], [25, 25, 25, 25]],
 [[25, 25, 25, 25], [25, 25, 25, 25], [25, 25, 25, 25]]]
Operating on ndarrays: indexing and slicing
Indexing: retrieving the element at a specific position in an array
Slicing: retrieving a subset of an array's elements
Indexing and slicing of one-dimensional arrays: similar to Python lists
In [51]: a = np.array([9, 8, 7, 6, 5])

In [52]: a[2]
Out[52]: 7

In [53]: a[1:4:2]  # start:stop (exclusive):step, three values separated by colons; indices count up from 0 on the left, or down from -1 on the right
Out[53]: array([8, 6])
Index of multi-dimensional arrays:
In [54]: a = np.arange(24).reshape((2,3,4))

In [55]: a
Out[55]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [56]: a[1,2,3]  # one index value per dimension, separated by commas
Out[56]: 23

In [57]: a[0,1,2]
Out[57]: 6

In [58]: a[-1,-2,-3]
Out[58]: 17
Multi-dimensional array slicing:
In [59]: a[:, 1, -3]  # use : to select an entire dimension
Out[59]: array([ 5, 17])

In [60]: a[:, 1:3, :]  # each dimension is sliced the same way as a one-dimensional array
Out[60]:
array([[[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [61]: a[:, :, ::2]  # each dimension can use a step to skip elements
Out[61]:
array([[[ 0,  2],
        [ 4,  6],
        [ 8, 10]],

       [[12, 14],
        [16, 18],
        [20, 22]]])
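Two details of the indexing rules above are easy to miss: a negative index is just another name for a position, and an integer index removes its axis while a slice keeps it. The sketch below checks both on the same 2 x 3 x 4 array. The last part relies on standard NumPy behavior that the notes do not mention: a basic slice is a view, so writing through it changes the original array.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# A negative index counts from the end of its axis.
print(a[-1, -2, -3] == a[1, 1, 1])

# Slicing keeps the sliced axes; integer indexing drops them.
print(a[:, 1:3, :].shape)
print(a[:, 1, -3].shape)

# A basic slice is a view: writing through it changes the original array.
view = a[0, :, ::2]
view[...] = -1
print(a[0])
```

When an independent subset is needed, calling .copy() on the slice avoids modifying the source array by accident.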
ndarray operations: operations between an array and a scalar
An operation between an array and a scalar is applied to every element of the array.
Example: divide each element of a by the mean of a
In [62]: a.mean()
Out[62]: 11.5

In [63]: a = a/a.mean()

In [64]: a
Out[64]:
array([[[ 0.        ,  0.08695652,  0.17391304,  0.26086957],
        [ 0.34782609,  0.43478261,  0.52173913,  0.60869565],
        [ 0.69565217,  0.7826087 ,  0.86956522,  0.95652174]],

       [[ 1.04347826,  1.13043478,  1.2173913 ,  1.30434783],
        [ 1.39130435,  1.47826087,  1.56521739,  1.65217391],
        [ 1.73913043,  1.82608696,  1.91304348,  2.        ]]])
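The same rule holds for every arithmetic operator, which is easier to see on a very small array. This is a minimal sketch with arbitrary sample values.

```python
import numpy as np

a = np.array([1, 2, 3])

print(a * 2)         # each element doubled
print(a ** 2)        # each element squared
print(a - a.mean())  # each element minus the mean of a
```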
NumPy unary functions
Functions that perform element-wise operations on the data in an ndarray
Function | Description
np.abs(x) np.fabs(x) | Calculates the absolute value of each element in the array
np.sqrt(x) | Calculates the square root of each element in the array
np.square(x) | Calculates the square of each element in the array
np.log(x) np.log10(x) np.log2(x) | Calculates the natural logarithm, base-10 logarithm, and base-2 logarithm of each element in the array
np.ceil(x) np.floor(x) | Calculates the ceiling or floor of each element in the array
np.rint(x) | Rounds each element in the array to the nearest integer
np.modf(x) | Returns the fractional and integer parts of each element in the array as two separate arrays
np.cos(x) np.cosh(x) np.sin(x) np.sinh(x) np.tan(x) np.tanh(x) | Calculates the ordinary and hyperbolic trigonometric functions of each element in the array
np.exp(x) | Calculates the exponential of each element in the array
np.sign(x) | Calculates the sign of each element in the array: 1 (+), 0, -1 (-)
Example: unary functions
In [65]: a = np.arange(24).reshape((2,3,4))

In [66]: np.square(a)
Out[66]:
array([[[  0,   1,   4,   9],
        [ 16,  25,  36,  49],
        [ 64,  81, 100, 121]],

       [[144, 169, 196, 225],
        [256, 289, 324, 361],
        [400, 441, 484, 529]]], dtype=int32)

In [67]: a = np.sqrt(a)

In [68]: a
Out[68]:
array([[[ 0.        ,  1.        ,  1.41421356,  1.73205081],
        [ 2.        ,  2.23606798,  2.44948974,  2.64575131],
        [ 2.82842712,  3.        ,  3.16227766,  3.31662479]],

       [[ 3.46410162,  3.60555128,  3.74165739,  3.87298335],
        [ 4.        ,  4.12310563,  4.24264069,  4.35889894],
        [ 4.47213595,  4.58257569,  4.69041576,  4.79583152]]])

In [69]: np.modf(a)
Out[69]:
(array([[[ 0.        ,  0.        ,  0.41421356,  0.73205081],
         [ 0.        ,  0.23606798,  0.44948974,  0.64575131],
         [ 0.82842712,  0.        ,  0.16227766,  0.31662479]],

        [[ 0.46410162,  0.60555128,  0.74165739,  0.87298335],
         [ 0.        ,  0.12310563,  0.24264069,  0.35889894],
         [ 0.47213595,  0.58257569,  0.69041576,  0.79583152]]]),
 array([[[ 0.,  1.,  1.,  1.],
         [ 2.,  2.,  2.,  2.],
         [ 2.,  3.,  3.,  3.]],

        [[ 3.,  3.,  3.,  3.],
         [ 4.,  4.,  4.,  4.],
         [ 4.,  4.,  4.,  4.]]]))
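The rounding functions in the table are easiest to tell apart on a few signed, non-integer inputs. This is a short sketch with arbitrarily chosen values. Note that np.rint sends exact halves to the nearest even integer.

```python
import numpy as np

x = np.array([-1.5, -0.2, 0.0, 2.5, 3.7])

print(np.fabs(x))   # absolute values
print(np.ceil(x))   # round up
print(np.floor(x))  # round down
print(np.rint(x))   # nearest integer; exact halves go to the even neighbour
print(np.sign(x))   # -1, 0 or 1

frac, whole = np.modf(x)  # fractional parts first, integer parts second
print(frac, whole)
```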
NumPy binary functions
Function | Description
+ - * / ** | Performs the corresponding operation on each pair of elements of the two arrays
np.maximum(x, y) np.fmax() np.minimum(x, y) np.fmin() | Element-wise maximum/minimum
np.mod(x, y) | Element-wise modulo operation
np.copysign(x, y) | Assigns the sign of each element of array y to the corresponding element of array x
> < >= <= == != | Element-wise comparison that produces a Boolean array
Example: NumPy binary functions
In [70]: a = np.arange(24).reshape((2,3,4))

In [71]: b = np.sqrt(a)

In [72]: a
Out[72]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [73]: b
Out[73]:
array([[[ 0.        ,  1.        ,  1.41421356,  1.73205081],
        [ 2.        ,  2.23606798,  2.44948974,  2.64575131],
        [ 2.82842712,  3.        ,  3.16227766,  3.31662479]],

       [[ 3.46410162,  3.60555128,  3.74165739,  3.87298335],
        [ 4.        ,  4.12310563,  4.24264069,  4.35889894],
        [ 4.47213595,  4.58257569,  4.69041576,  4.79583152]]])

In [74]: np.maximum(a,b)
Out[74]:
array([[[  0.,   1.,   2.,   3.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.]],

       [[ 12.,  13.,  14.,  15.],
        [ 16.,  17.,  18.,  19.],
        [ 20.,  21.,  22.,  23.]]])

In [75]: a > b
Out[75]:
array([[[False, False,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]],

       [[ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]]], dtype=bool)
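The session above covers np.maximum and comparison. np.mod and np.copysign are sketched below on small signed inputs, chosen arbitrarily so that the sign handling is visible. The last line illustrates the difference between np.maximum and np.fmax when one operand is NaN, a detail the table does not mention.

```python
import numpy as np

x = np.array([5, -7, 9])
y = np.array([3, 3, -2])

print(np.maximum(x, y))   # larger of each pair
print(np.mod(x, y))       # remainder takes the sign of the divisor, like Python's %
print(np.copysign(x, y))  # magnitude of x with the sign of y, returned as floats
print(x > y)              # element-wise comparison gives a Boolean array

# maximum propagates NaN, while fmax ignores it when the other value is a number.
print(np.maximum(np.nan, 1.0), np.fmax(np.nan, 1.0))
```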
NumPy Data Access and functions