This article gives a detailed introduction, with examples, to the NumPy and pandas modules in Python. It is offered as a reference for readers who need it, and I hope you find it helpful.
This chapter covers the two most important modules for scientific computing in Python: numpy and pandas. Almost every data analysis module builds on these two.
First, NumPy & Pandas Features
NumPy (Numerical Python) is an open-source numerical computing extension for Python. It can store and manipulate large matrices much more efficiently than Python's own nested lists, which can also be used to represent matrices. It is often said that NumPy turns Python into a free, more powerful MATLAB.
NumPy features: open source, numerical computing extension, ndarray, multi-dimensional operations, matrix data types, vectorized processing, and a sophisticated library of operations. It is designed for rigorous numerical processing.
pandas: a library created for data analysis tasks.
Characteristics:
Fast computation: the core of NumPy is implemented in C, and pandas is built on top of NumPy, so it can be thought of as an extension of NumPy.
Low resource consumption: both rely on matrix (array) operations, which are much faster than Python's built-in dictionaries or lists.
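The speed claim above can be checked with a rough timing sketch. The array size and the sum-of-squares workload below are arbitrary choices for illustration, and the measured times depend on the machine, so no timings are quoted here.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size, chosen only for illustration
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Sum of squares computed element by element over a Python list.
list_result = sum(x * x for x in py_list)
# The same computation as one vectorized NumPy expression.
array_result = int(np.sum(np_array ** 2))

list_time = timeit.timeit(lambda: sum(x * x for x in py_list), number=10)
array_time = timeit.timeit(lambda: np.sum(np_array ** 2), number=10)

print("same result:", list_result == array_result)
print(f"list: {list_time:.4f}s  numpy: {array_time:.4f}s")
```

On typical machines the NumPy version is expected to be noticeably faster, but the exact ratio varies.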
Second, installation
There are two ways to install them: the first uses the Anaconda distribution, and the second uses the pip command.
1. Installing with the Anaconda distribution
Using Python for scientific computing means installing the required modules one by one, and those modules may depend on other packages or libraries, which makes installation and setup relatively cumbersome. Fortunately, some projects do this work for you: they compile the modules needed for scientific computing and package them as a distribution. Anaconda is one of the most popular scientific computing distributions.
Installing Anaconda is equivalent to installing Python, IPython, the Spyder integrated development environment, and a number of packages.
On macOS and Linux, installing Anaconda simply adds one more folder (~/anaconda) to the home directory; on Windows it also writes to the registry. During installation, the installer adds the bin directory to PATH (on Linux/macOS it writes to ~/.bashrc; on Windows it modifies the system PATH variable). You can also do this yourself. Taking Linux/macOS as an example, PATH can be set after installation as follows:
# Add Anaconda's bin directory to PATH; depending on the version, it may also be ~/anaconda3/bin
echo 'export PATH=~/anaconda2/bin:$PATH' >> ~/.bashrc
# Reload .bashrc so the change takes effect immediately
source ~/.bashrc
macOS environment variable settings:
➜ export PATH=~/anaconda2/bin:$PATH
➜ conda -V
conda 4.3.30
After configuring PATH, you can run which conda or conda --version to check that it is set correctly. If the Python 2.7 version is installed, running python --version or python -V prints Python 2.7.12 :: Anaconda 4.1.1 (64-bit), which shows that the default environment of this release is Python 2.7.
Running conda list in a terminal shows which packages are installed.
Conda's package management is easy to understand; this part of its functionality is similar to pip.
2. Setting the editor environment and templates
My editor is PyCharm, which lets you set up the development environment and file templates for rapid development.
Anaconda settings:
Fixed template settings:
# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: ${NAME}.py
@time: ${DATE} ${TIME}
"""
3. Installing with the pip command
NumPy installation
macOS
# Using Python 3+:
pip3 install numpy
# Using Python 2+:
pip install numpy
Linux Ubuntu & Debian
Run in a terminal:
sudo apt-get install python-numpy
pandas installation
macOS
# Using Python 3+:
pip3 install pandas
# Using Python 2+:
pip install pandas
Linux Ubuntu & Debian
Run in a terminal:
sudo apt-get install python-pandas
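Whichever installation method is used, one quick way to confirm that it worked is to import both modules and print their versions. The sketch below is only an illustration; the status dictionary is a helper name introduced here, not part of either library.

```python
import importlib

status = {}  # hypothetical helper: module name -> version string or None
for name in ("numpy", "pandas"):
    try:
        module = importlib.import_module(name)
        status[name] = module.__version__
    except ImportError:
        status[name] = None  # the module is not installed in this environment

for name, version in status.items():
    print(name, "version:", version if version else "not installed")
```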
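Third, NumPy
The examples below are developed in the Anaconda distribution environment by default.
1. NumPy properties
A few properties of a NumPy array are introduced here: ndim (number of dimensions), shape (rows and columns), and size (number of elements).
To use numpy, first import the module:
import numpy as np  # use the shorthand np for convenience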
Convert a list to a matrix:
array = np.array([[1,2,3],[2,3,4]])  # convert a list to a matrix
print(array)
"""
array([[1, 2, 3],
       [2, 3, 4]])
"""
Full code:
# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: np_attr.py
@time: 18/8/26 10:41
"""
import numpy as np  # use the shorthand np for convenience

# Convert a list to a matrix
array = np.array([[1, 2, 3], [4, 5, 6]])
print(array)
Print output:
[[1 2 3]
 [4 5 6]]
Several properties of NumPy
Next, let's look at the values of these properties:
print('number of dim:', array.ndim)  # number of dimensions
# number of dim: 2
print('shape:', array.shape)  # rows and columns
# shape: (2, 3)
print('size:', array.size)  # number of elements
# size: 6
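To see how these three properties change with the number of dimensions, here is a small sketch using a one-dimensional and a three-dimensional array. The variable names v and cube are chosen only for this illustration.

```python
import numpy as np

v = np.array([1, 2, 3])       # 1-D array with 3 elements
cube = np.zeros((2, 3, 4))    # 3-D array: 2 blocks of 3 rows and 4 columns

print(v.ndim, v.shape, v.size)           # 1 (3,) 3
print(cube.ndim, cube.shape, cube.size)  # 3 (2, 3, 4) 24
```

Note that size is always the product of the entries in shape.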
2. Creating NumPy arrays
Keywords
array: create an array
dtype: specify the data type
zeros: create data that is all 0
ones: create data that is all 1
empty: create uninitialized data (the values are arbitrary, often close to 0)
arange: create data over a specified range
linspace: create evenly spaced data (a line segment)
Create an array
a = np.array([2,23,4])  # 1-D list
print(a)
# [ 2 23  4]
Specify the data dtype
a = np.array([2,23,4], dtype=int)  # np.int was an alias of the built-in int and was removed in NumPy 1.24
print(a.dtype)
# int64 (on most 64-bit platforms)
a = np.array([2,23,4], dtype=np.int32)
print(a.dtype)
# int32
a = np.array([2,23,4], dtype=float)  # np.float was removed as well; use float or np.float64
print(a.dtype)
# float64
a = np.array([2,23,4], dtype=np.float32)
print(a.dtype)
# float32
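One practical reason to choose a dtype is memory use. The sketch below compares the per-element and total byte counts of the same data stored as float64 and float32, using astype to convert; the names a64 and a32 are illustrative only.

```python
import numpy as np

a64 = np.array([2, 23, 4], dtype=np.float64)
a32 = a64.astype(np.float32)  # astype returns a converted copy

print(a64.dtype, a64.itemsize, a64.nbytes)  # float64 8 24
print(a32.dtype, a32.itemsize, a32.nbytes)  # float32 4 12
```

The smaller type halves the memory footprint at the cost of precision.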
Create specific data
a = np.array([[2,23,4],[2,32,4]])  # 2-D matrix, 2 rows and 3 columns
print(a)
"""
[[ 2 23  4]
 [ 2 32  4]]
"""
Create an all-zeros array
a = np.zeros((3,4))  # all zeros, 3 rows and 4 columns
"""
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
"""
Create an all-ones array; you can also specify the dtype of the data:
a = np.ones((3,4), dtype=int)  # all ones, 3 rows and 4 columns
"""
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])
"""
Create an empty array. Its contents are uninitialized, so each value is arbitrary; in practice the values are often numbers very close to zero:
a = np.empty((3,4))  # uninitialized data, 3 rows and 4 columns
"""
array([[ 0.00000000e+000, 4.94065646e-324, 9.88131292e-324, 1.48219694e-323],
       [ 1.97626258e-323, 2.47032823e-323, 2.96439388e-323, 3.45845952e-323],
       [ 3.95252517e-323, 4.44659081e-323, 4.94065646e-323, 5.43472210e-323]])
"""
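Because np.empty does not initialize its contents, a common pattern is to allocate the array and then assign every element before reading it. The following is a minimal sketch of that pattern; the name buf and the fill value 0.5 are arbitrary.

```python
import numpy as np

buf = np.empty((3, 4))  # contents are arbitrary until assigned
buf[:] = 0.5            # overwrite every element before reading any of them

print(buf.shape)                 # (3, 4)
print(bool(np.all(buf == 0.5)))  # True
```

If the array needs a known starting value, np.zeros or np.ones is the safer choice.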
Use arange to create a sequence of values:
a = np.arange(10,20,2)  # 10 to 19, step 2
"""
array([10, 12, 14, 16, 18])
"""
Use reshape to change the shape of the data:
# a = np.arange(12)  # [ 0  1  2  3  4  5  6  7  8  9 10 11]
a = np.arange(12).reshape((3,4))  # 3 rows and 4 columns, 0 to 11
"""
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
"""
Use linspace to create line-segment data:
a = np.linspace(1,10,20)  # start at 1, end at 10, split into 20 values to form a line segment
"""
array([  1.        ,   1.47368421,   1.94736842,   2.42105263,
         2.89473684,   3.36842105,   3.84210526,   4.31578947,
         4.78947368,   5.26315789,   5.73684211,   6.21052632,
         6.68421053,   7.15789474,   7.63157895,   8.10526316,
         8.57894737,   9.05263158,   9.52631579,  10.        ])
"""
reshape can be applied here as well:
a = np.linspace(1,10,20).reshape((5,4))  # change the shape
"""
array([[  1.        ,   1.47368421,   1.94736842,   2.42105263],
       [  2.89473684,   3.36842105,   3.84210526,   4.31578947],
       [  4.78947368,   5.26315789,   5.73684211,   6.21052632],
       [  6.68421053,   7.15789474,   7.63157895,   8.10526316],
       [  8.57894737,   9.05263158,   9.52631579,  10.        ]])
"""
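The difference between linspace and arange is easy to miss: linspace takes a number of points and includes both endpoints, while arange takes a step and excludes the stop value. The sketch below checks this on the values used above; the names points, step, and ticks are illustrative.

```python
import numpy as np

points = np.linspace(1, 10, 20)  # 20 samples, both endpoints included
step = (10 - 1) / (20 - 1)       # spacing implied by 19 equal gaps

print(points[0], points[-1])               # 1.0 10.0
print(np.allclose(np.diff(points), step))  # True

ticks = np.arange(10, 20, 2)     # half-open range: the stop value 20 is excluded
print(ticks[-1])                 # 18
```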
3. Basic NumPy operations
Let's start with a script to understand the corresponding calculations and how they are written:
# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: np_yunsuan.py
@time: 18/8/26 23:37
"""
import numpy as np
a = np.array([10, 20, 30, 40])  # array([10, 20, 30, 40])
b = np.arange(4)                # array([0, 1, 2, 3])
Several basic operations of NumPy
In the code above, a and b are both array variables (matrices). Each is a matrix of 1 row and 4 columns, and the elements of b run from 0 to 3. To subtract one matrix from the other, try:
c = a - b  # array([10, 19, 28, 37])
Running this script gives the result of subtracting the corresponding elements, that is, [10, 19, 28, 37]. Addition and multiplication of corresponding elements are written in the same way:
c = a + b  # array([10, 21, 32, 43])
c = a * b  # array([  0,  20,  60, 120])
NumPy provides many mathematical functions, such as the trigonometric functions, which are easy to call when we need to apply a function to every element of a matrix (taking the sin function as an example):
c = 10 * np.sin(a)  # array([-5.44021111,  9.12945251, -9.88031624,  7.4511316 ])
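To make "applied to every element" concrete, the sketch below compares the vectorized call with an explicit Python loop using math.sin. The names vectorized and looped are chosen for this illustration only.

```python
import math

import numpy as np

a = np.array([10, 20, 30, 40])

vectorized = 10 * np.sin(a)                      # one call, applied elementwise
looped = [10 * math.sin(x) for x in a.tolist()]  # the same work, one element at a time

print(np.allclose(vectorized, looped))  # True
```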
All of the calculations above work on one-dimensional matrices, that is, matrices with a single row. To work with multi-row, multi-dimensional matrices, we need to modify the opening script:
a = np.array([[1,1],[0,1]])
b = np.arange(4).reshape((2,2))
print(a)
# array([[1, 1],
#        [0, 1]])
print(b)
# array([[0, 1],
#        [2, 3]])
The matrices a and b constructed here both have 2 rows and 2 columns. The reshape operation rebuilds the shape of a matrix; the new shape is the tuple given in parentheses. One difference to note is that matrix multiplication in NumPy comes in two kinds. The first is the elementwise multiplication shown earlier. The second is standard matrix multiplication, in which each row is multiplied by each column to produce the corresponding element:
c_dot = np.dot(a, b)
# array([[2, 4],
#        [2, 3]])
dot can also be written as a method call:
c_dot_2 = a.dot(b)
# array([[2, 4],
#        [2, 3]])
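The two kinds of multiplication can be placed side by side. This sketch also uses the @ operator (available in Python 3.5 and later), which the text above does not mention but which gives the same result as np.dot for 2-D arrays.

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.arange(4).reshape((2, 2))

elementwise = a * b      # multiplies matching positions: [[0, 1], [0, 3]]
matrix_product = a @ b   # rows times columns: [[2, 4], [2, 3]]

print(elementwise)
print(matrix_product)
print(np.array_equal(matrix_product, np.dot(a, b)))  # True
```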
Next we write a new script to look at the use of sum(), min(), and max():
import numpy as np
a = np.random.random((2,4))
print(a)
# array([[0.94692159, 0.20821798, 0.35339414, 0.2805278 ],
#        [0.04836775, 0.04023552, 0.44091941, 0.21665268]])
Because the numbers are randomly generated, your results may differ. The second line of the script makes a a randomly generated matrix of 2 rows and 4 columns, in which each element is a random number between 0 and 1. On this randomly generated matrix we can sum the elements and find the extreme values, as follows:
np.sum(a)  # 4.4043622002745959
np.min(a)  # 0.23651223533671784
np.max(a)  # 0.90438450240606416
These compute the sum of all elements in the matrix, the minimum value, and the maximum value, respectively. You can use the print() function to print and check each value.
If you need to perform the operation per row or per column, pass the axis argument in the code above. When axis is 0, the operation is applied column by column (one result per column); when axis is 1, it is applied row by row (one result per row).
To make this clearer, we continue with the example above:
print("a =", a)
# a = [[0.23651224 0.41900661 0.84869417 0.46456022]
#      [0.60771087 0.9043845  0.36603285 0.55746074]]
print("sum =", np.sum(a, axis=1))
# sum = [1.96877324 2.43558896]
print("min =", np.min(a, axis=0))
# min = [0.23651224 0.41900661 0.36603285 0.46456022]
print("max =", np.max(a, axis=1))
# max = [0.84869417 0.9043845 ]
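Since the random example is hard to check by eye, here is a small deterministic sketch of the same axis behavior. The matrix m is made up for this illustration.

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

col_sums = np.sum(m, axis=0)  # one value per column: 6, 8, 10, 12
row_sums = np.sum(m, axis=1)  # one value per row: 10, 26
col_mins = np.min(m, axis=0)  # smallest value in each column: 1, 2, 3, 4
row_maxs = np.max(m, axis=1)  # largest value in each row: 4, 8

print(col_sums, row_sums, col_mins, row_maxs)
```

A 2x4 input gives 4 results for axis=0 and 2 results for axis=1, matching the shapes of the outputs in the random example.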
Matrix multiplication review
Two matrices can be multiplied only when the number of columns of the left matrix equals the number of rows of the right matrix. The method is to take the first row of the left matrix and multiply it with each column of the right matrix in turn: the products of the first row's elements with the corresponding elements of the first column are added together to give the first element of the result, the products with the second column give the second element, and so on. The second row is then multiplied with the columns of the right matrix in the same way, and so on for the remaining rows.
Example:
Matrix A =
1 2 3
4 5 6
7 8 0
Matrix B =
1 2 1
1 1 2
2 1 1
Find AB.
The final result is
AB =
9 7 8
21 19 20
15 22 23
Calculate using NumPy:
e = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 0]])
f = np.array([[1, 2, 1], [1, 1, 2], [2, 1, 1]])
res_dot = np.dot(e, f)
print(res_dot)
Printed result:
[[ 9  7  8]
 [21 19 20]
 [15 22 23]]
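The row-by-column procedure from the review can also be written out by hand and compared with np.dot. The function matmul_by_hand below is a hypothetical helper written for this illustration, not a NumPy API, and it is far slower than np.dot on large inputs.

```python
import numpy as np


def matmul_by_hand(left, right):
    """Multiply two 2-D arrays with explicit loops (illustrative only)."""
    rows, inner = left.shape
    inner_right, cols = right.shape
    if inner != inner_right:
        # Columns of the left matrix must equal rows of the right matrix.
        raise ValueError("columns of left must equal rows of right")
    result = np.zeros((rows, cols), dtype=left.dtype)
    for i in range(rows):           # each row of the left matrix
        for j in range(cols):       # each column of the right matrix
            for k in range(inner):  # accumulate the products for element (i, j)
                result[i, j] += left[i, k] * right[k, j]
    return result


e = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 0]])
f = np.array([[1, 2, 1], [1, 1, 2], [2, 1, 1]])

by_hand = matmul_by_hand(e, f)
print(by_hand)
print(np.array_equal(by_hand, np.dot(e, f)))  # True
```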