Detailed introduction to the NumPy and pandas modules in Python (with examples)

Source: Internet
Author: User
This article gives a detailed introduction to the NumPy and pandas modules in Python, with examples. We hope it is a useful reference for anyone who needs it.

This chapter covers the two most important modules for scientific computing in Python: NumPy and pandas. Almost any data analysis work relies on one or both of them.

First, NumPy & Pandas Features

NumPy (Numeric Python) is an open-source numerical extension for Python. It can store and manipulate large matrices much more efficiently than Python's own nested lists, which can also be used to represent matrices. It is often said that NumPy turns Python into a free and more powerful MATLAB.

NumPy features: open source, a numerical computing extension, the ndarray type, multi-dimensional operations, typed numeric arrays, vectorized processing, and a sophisticated library of mathematical routines. It is designed for rigorous numerical processing.
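The following is a minimal sketch of the nested-list comparison above. It uses a small matrix chosen for illustration. With a nested list, element-wise work needs explicit loops, but the equivalent ndarray handles it in one vectorized expression:

```python
import numpy as np

# A matrix stored as a Python nested list: element-wise work needs explicit loops.
nested = [[1, 2, 3], [4, 5, 6]]
doubled_list = [[2 * x for x in row] for row in nested]

# The same matrix as an ndarray: a single vectorized expression does the work.
arr = np.array(nested)
doubled_arr = 2 * arr

print(doubled_list)          # [[2, 4, 6], [8, 10, 12]]
print(doubled_arr.tolist())  # [[2, 4, 6], [8, 10, 12]]
```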

pandas: a library created for data analysis.

Characteristics:

    • Fast computation: NumPy's core is implemented in C, and pandas is built on top of NumPy, extending it with labeled data structures such as Series and DataFrame.

    • Lower resource usage: because they operate on whole arrays (matrix operations), they are much faster than Python's own dictionaries or lists.
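To illustrate how pandas builds on NumPy, here is a small sketch. The data values and the labels "r1", "r2", "a", "b" and "c" are made up for the example. A DataFrame wraps an ndarray with row and column labels, and the values can be retrieved as a plain ndarray:

```python
import numpy as np
import pandas as pd

# Illustrative data: a 2 x 3 NumPy array.
values = np.array([[1, 2, 3], [4, 5, 6]])

# pandas wraps the array in a DataFrame with row and column labels.
df = pd.DataFrame(values, index=["r1", "r2"], columns=["a", "b", "c"])
print(df)

# Columns can be addressed by label; the reduction runs on the underlying array.
print(df["b"].sum())  # 7

# The data can be pulled back out as a plain ndarray.
print(type(df.to_numpy()))  # <class 'numpy.ndarray'>
```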

Second, installation

There are two ways to install them: the first is to use the Anaconda distribution, and the second is to use the pip command.

1. Installing with the Anaconda distribution

Using Python for scientific computing requires installing each required module one by one. Those modules may depend on other packages or libraries, which makes installation and use relatively cumbersome. Fortunately, others have done this work for us: they compile the modules needed for scientific computing and package them as a distribution. Anaconda is one of the most popular scientific computing distributions.

Installing Anaconda is equivalent to installing Python, IPython, the Spyder integrated development environment, a set of packages, and so on.

On Mac and Linux, installing Anaconda essentially adds one folder to your home directory (~/anaconda). On Windows it also writes to the registry. During installation, the installer adds the bin directory to PATH (on Linux/Mac by writing to ~/.bashrc, on Windows to the system PATH variable). You can also do this yourself. Taking Linux/Mac as an example, the commands to set PATH after installation are:

# Add the Anaconda bin directory to PATH; depending on the version it may be ~/anaconda3/bin instead
echo 'export PATH="$HOME/anaconda2/bin:$PATH"' >> ~/.bashrc
# Reload .bashrc so the change takes effect immediately
source ~/.bashrc

Setting the environment variable on macOS:

➜ export PATH=~/anaconda2/bin:$PATH
➜ conda -V
conda 4.3.30

After configuring PATH, you can check that it is correct with which conda or conda --version. If you installed the Python 2.7 version, running python --version or python -V prints Python 2.7.12 :: Anaconda 4.1.1 (64-bit), because that distribution's default environment is Python 2.7.

Running conda list in a terminal shows which packages are installed.

Conda's package management is easy to understand. This part of its functionality is similar to pip.

2. Setting the editor environment and templates

I use PyCharm as my editor. It lets you set up a development environment and file templates for rapid development.

Anaconda settings:

File template settings:

# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: ${NAME}.py
@time: ${DATE} ${TIME}
"""

3. Installing with pip

NumPy installation

macOS

# For Python 3:
pip3 install numpy
# For Python 2:
pip install numpy

Linux (Ubuntu & Debian)

Run in a terminal:

sudo apt-get install python-numpy  # for Python 3 the package is python3-numpy

pandas installation

macOS

# For Python 3:
pip3 install pandas
# For Python 2:
pip install pandas

Linux (Ubuntu & Debian)

Run in a terminal:

sudo apt-get install python-pandas  # for Python 3 the package is python3-pandas

Third, NumPy

The examples below use the Anaconda distribution as the development environment.

1. NumPy attributes

Several attributes of a NumPy array:

    • ndim: number of dimensions

    • shape: size along each dimension (the number of rows and columns for a 2-D array)

    • size: total number of elements

To use NumPy, first import the module:

import numpy as np  # use the np shorthand for convenience

Convert a list to a matrix:

array = np.array([[1, 2, 3], [2, 3, 4]])  # convert the list to a matrix
print(array)
"""
[[1 2 3]
 [2 3 4]]
"""

The complete script:

# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: np_attr.py
@time: 18/8/26 10:41
"""
import numpy as np  # use the np shorthand for convenience

# Convert a list to a matrix
array = np.array([[1, 2, 3], [4, 5, 6]])
print(array)

Printed output:

[[1 2 3]
 [4 5 6]]

Next, let's look at the values of these attributes:

print('number of dim:', array.ndim)  # number of dimensions
# number of dim: 2
print('shape:', array.shape)         # rows and columns
# shape: (2, 3)
print('size:', array.size)           # number of elements
# size: 6
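The 2-D example above only shows rows and columns. A 3-D array clarifies what shape means in general: it holds one entry per dimension, and size is the product of those entries. The shape (2, 3, 4) below is an arbitrary choice:

```python
import numpy as np

# A 3-D array: 2 blocks, each with 3 rows and 4 columns (the values 0..23).
cube = np.arange(24).reshape((2, 3, 4))

print(cube.ndim)   # 3
print(cube.shape)  # (2, 3, 4)
print(cube.size)   # 24, the product of the entries in shape
```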

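2. Creating NumPy arrays

Keywords

    • array: create an array

    • dtype: specify the data type

    • zeros: create an array of all zeros

    • ones: create an array of all ones

    • empty: create an uninitialized array (its values are arbitrary, often close to 0)

    • arange: create values over a specified range with a given step

    • linspace: create evenly spaced values between two endpoints (a line segment)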

Create an array

a = np.array([2, 23, 4])  # 1-D array from a list
print(a)
# [ 2 23  4]

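Specifying the data type (dtype)

a = np.array([2, 23, 4], dtype=int)  # np.int was removed in NumPy 1.24; use the built-in int
print(a.dtype)
# int64 (platform dependent)
a = np.array([2, 23, 4], dtype=np.int32)
print(a.dtype)
# int32
a = np.array([2, 23, 4], dtype=float)  # likewise, use float instead of the removed np.float
print(a.dtype)
# float64
a = np.array([2, 23, 4], dtype=np.float32)
print(a.dtype)
# float32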
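The dtype can also be changed after the array is created, and NumPy infers one from the data when none is given. The sketch below uses astype for the conversion. That method is not mentioned in the original text; it is a standard ndarray method:

```python
import numpy as np

a = np.array([2, 23, 4])       # dtype is inferred from the integer values
f = a.astype(np.float32)       # astype returns a converted copy; a is unchanged
print(f.dtype, f.itemsize)     # float32 4  (itemsize is bytes per element)

# Mixing ints and floats makes NumPy choose float64 for the whole array.
mixed = np.array([1, 2.5])
print(mixed.dtype)             # float64
```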

Create a 2-D array with specific data

a = np.array([[2, 23, 4], [2, 32, 4]])  # 2-D matrix: 2 rows, 3 columns
print(a)
"""
[[ 2 23  4]
 [ 2 32  4]]
"""

Create an array of all zeros:

a = np.zeros((3, 4))  # all zeros, 3 rows, 4 columns
"""
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
"""

Create an array of all ones. As with the other creation functions, you can also specify the dtype:

a = np.ones((3, 4), dtype=int)  # all ones, 3 rows, 4 columns
"""
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])
"""

Create an empty array. It is not initialized, so its values are whatever happens to be in memory. In this run they are numbers close to zero:

a = np.empty((3, 4))  # uninitialized, 3 rows, 4 columns
"""
array([[0.00000000e+000, 4.94065646e-324, 9.88131292e-324, 1.48219694e-323],
       [1.97626258e-323, 2.47032823e-323, 2.96439388e-323, 3.45845952e-323],
       [3.95252517e-323, 4.44659081e-323, 4.94065646e-323, 5.43472210e-323]])
"""
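Since np.empty does not initialize its memory, you should write every element before reading from it. The sketch below shows one safe pattern, filling the buffer with fill(). The value 7.0 is arbitrary. It contrasts this with np.zeros, which guarantees its contents:

```python
import numpy as np

# np.empty only allocates memory; its contents are arbitrary until written.
buf = np.empty((3, 4))
buf.fill(7.0)              # overwrite every element before reading it

# np.zeros guarantees zeros, so prefer it when you actually need zeros.
z = np.zeros((3, 4))

print(buf[0, 0], z.sum())  # 7.0 0.0
```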

Use arange to create a sequence with a fixed step:

a = np.arange(10, 20, 2)  # from 10 up to (but not including) 20, step 2
"""
array([10, 12, 14, 16, 18])
"""

Use reshape to change the shape of the data:

a = np.arange(12)
# [ 0  1  2  3  4  5  6  7  8  9 10 11]
a = np.arange(12).reshape((3, 4))  # 3 rows, 4 columns, values 0 to 11
"""
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
"""
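reshape never adds or drops elements, so the product of the new shape must equal the element count. The sketch below shows that rule. It also shows the -1 placeholder, which lets NumPy infer one dimension. The shapes (2, 6) and (5, 3) are illustrative choices:

```python
import numpy as np

a = np.arange(12)                 # 12 elements: 0..11

# reshape keeps all 12 elements, so the new shape's product must be 12.
m = a.reshape((3, 4))
print(m.shape)                    # (3, 4)

# -1 tells NumPy to infer that dimension from the element count (12 / 6 = 2).
n = a.reshape((-1, 6))
print(n.shape)                    # (2, 6)

# A shape whose product is not 12 raises ValueError.
try:
    a.reshape((5, 3))
except ValueError as err:
    print("reshape failed:", err)
```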

Use linspace to create evenly spaced values between two endpoints (a line segment):

a = np.linspace(1, 10, 20)  # start at 1, end at 10, 20 evenly spaced points
"""
array([ 1.        ,  1.47368421,  1.94736842,  2.42105263,
        2.89473684,  3.36842105,  3.84210526,  4.31578947,
        4.78947368,  5.26315789,  5.73684211,  6.21052632,
        6.68421053,  7.15789474,  7.63157895,  8.10526316,
        8.57894737,  9.05263158,  9.52631579, 10.        ])
"""

The linspace result can be reshaped in the same way:

a = np.linspace(1, 10, 20).reshape((5, 4))  # change the shape to 5 rows, 4 columns
"""
array([[ 1.        ,  1.47368421,  1.94736842,  2.42105263],
       [ 2.89473684,  3.36842105,  3.84210526,  4.31578947],
       [ 4.78947368,  5.26315789,  5.73684211,  6.21052632],
       [ 6.68421053,  7.15789474,  7.63157895,  8.10526316],
       [ 8.57894737,  9.05263158,  9.52631579, 10.        ]])
"""
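arange and linspace look similar but are parameterized differently. With arange you choose the step and the stop value is excluded. With linspace you choose the number of points and the stop value is included by default. NumPy's arange documentation recommends linspace when the step is not an integer. A small sketch, with endpoints chosen only for illustration:

```python
import numpy as np

# arange: you choose the step, and the stop value (1) is excluded.
stepped = np.arange(0, 1, 0.25)          # 0, 0.25, 0.5, 0.75

# linspace: you choose the number of points, and the stop value is included.
spaced = np.linspace(0, 1, 5)            # 0, 0.25, 0.5, 0.75, 1.0

# endpoint=False makes linspace leave out the stop value, matching arange here.
print(np.allclose(stepped, np.linspace(0, 1, 4, endpoint=False)))  # True
print(len(stepped), len(spaced))         # 4 5
```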

3. Basic operations of NumPy

Let's start with a script to see how the calculations are written and what they produce:

# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: np_yunsuan.py
@time: 18/8/26 23:37
"""
import numpy as np

a = np.array([10, 20, 30, 40])  # array([10, 20, 30, 40])
b = np.arange(4)                # array([0, 1, 2, 3])

Several basic operations of NumPy

In the code above, a and b are both arrays, each a matrix with 1 row and 4 columns, and the elements of b are 0 to 3. To subtract one matrix from the other, try typing:

c = a - b  # array([10, 19, 28, 37])

This gives the element-by-element difference, [10, 19, 28, 37]. Element-wise addition and multiplication are written the same way:

c = a + b  # array([10, 21, 32, 43])
c = a * b  # array([  0,  20,  60, 120])
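Element-wise arithmetic also works when one operand is a single number, and comparisons are element-wise too. NumPy applies the scalar to every element (broadcasting). The sketch below reuses a and b from the script. The operand 5, the power 2 and the threshold 3 are arbitrary:

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)

# A scalar operand is applied to every element (NumPy broadcasts it).
print(a - 5)    # [ 5 15 25 35]
print(b ** 2)   # [0 1 4 9]

# Comparisons are element-wise as well and return a boolean array.
print(b < 3)    # [ True  True  True False]
```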

NumPy provides many mathematical functions, such as the trigonometric functions. They make it easy to apply a function to every element of a matrix. Taking sin as an example:

c = 10 * np.sin(a)  # array([-5.44021111,  9.12945251, -9.88031624,  7.4511316 ])
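To see what "applies to every element" means, the sketch below checks np.sin against an explicit loop that uses the standard library's math.sin. The two should agree up to floating-point rounding:

```python
import math

import numpy as np

a = np.array([10, 20, 30, 40])

# np.sin works on the whole array in one call (a NumPy "universal function").
vectorized = 10 * np.sin(a)

# The same values computed one element at a time with the standard library.
looped = [10 * math.sin(x) for x in a]

print(np.allclose(vectorized, looped))  # True
```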

All of the calculations above use one-dimensional, single-row matrices. To work with matrices that have several rows and dimensions, we modify the opening script:

a = np.array([[1, 1], [0, 1]])
b = np.arange(4).reshape((2, 2))
print(a)
# [[1 1]
#  [0 1]]
print(b)
# [[0 1]
#  [2 3]]

The matrices a and b built here both have 2 rows and 2 columns. reshape rebuilds a matrix into the shape given in parentheses. Note that NumPy has two kinds of matrix multiplication. The first is the element-wise multiplication shown earlier. The second is standard matrix multiplication, in which each element of the result comes from multiplying the corresponding row by the corresponding column:

c_dot = np.dot(a, b)
# array([[2, 4],
#        [2, 3]])

dot can also be called as a method of the array:

c_dot_2 = a.dot(b)
# array([[2, 4],
#        [2, 3]])
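Here are both kinds of multiplication on the same a and b, side by side. The sketch also shows the @ operator (Python 3.5 and later), which is not mentioned in the original. For 2-D arrays it gives the same result as np.dot:

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.arange(4).reshape((2, 2))

elementwise = a * b   # multiplies entries in matching positions
matrix = a @ b        # @ (Python 3.5+) is matrix multiplication, like np.dot

print(elementwise)
# [[0 1]
#  [0 3]]
print(matrix)
# [[2 4]
#  [2 3]]
print(np.array_equal(matrix, np.dot(a, b)))  # True
```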

Next, a new script shows how sum(), min() and max() are used:

import numpy as np

a = np.random.random((2, 4))
print(a)
# [[0.94692159 0.20821798 0.35339414 0.2805278 ]
#  [0.04836775 0.04023552 0.44091941 0.21665268]]

Because the numbers are random, your results will differ. The line a = np.random.random((2, 4)) creates a matrix with 2 rows and 4 columns whose elements are random numbers in [0, 1). On such a matrix we can sum the elements and find the extreme values:

np.sum(a)  # 4.4043622002745959
np.min(a)  # 0.23651223533671784
np.max(a)  # 0.90438450240606416

These compute the sum of all elements, the minimum, and the maximum, respectively. In a script, wrap each call in print() to display the value. The values shown come from a different run than the matrix printed above; they match the matrix printed in the next example.
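If you want the "random" matrix to be the same on every run, you can seed the generator. The sketch below swaps in np.random.default_rng, NumPy's newer Generator API (NumPy 1.17 and later), in place of np.random.random. The seed 0 is an arbitrary choice:

```python
import numpy as np

# Seeding the generator makes the "random" matrix identical on every run.
rng = np.random.default_rng(seed=0)
a = rng.random((2, 4))            # 2 x 4 floats drawn from [0, 1)

total, smallest, largest = np.sum(a), np.min(a), np.max(a)
print(total, smallest, largest)

# A second generator with the same seed reproduces exactly the same matrix.
again = np.random.default_rng(seed=0).random((2, 4))
print(np.array_equal(a, again))   # True
```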

To compute per row or per column, pass an axis argument to these functions. With axis=0, each column is the unit of computation (one result per column). With axis=1, each row is the unit (one result per row).

To make this clearer, continue with the example:

print("a =", a)
# a = [[0.23651224 0.41900661 0.84869417 0.46456022]
#      [0.60771087 0.9043845  0.36603285 0.55746074]]
print("sum =", np.sum(a, axis=1))
# sum = [1.96877324 2.43558896]
print("min =", np.min(a, axis=0))
# min = [0.23651224 0.41900661 0.36603285 0.46456022]
print("max =", np.max(a, axis=1))
# max = [0.84869417 0.9043845 ]
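With random values it can be hard to check the axis rule by eye. The sketch below repeats it on a small fixed matrix whose values were chosen arbitrarily, so you can verify each result by hand:

```python
import numpy as np

m = np.array([[1, 5, 2],
              [4, 0, 6]])

print(np.sum(m, axis=0))  # [5 5 8]   one result per column
print(np.sum(m, axis=1))  # [ 8 10]   one result per row
print(np.min(m, axis=0))  # [1 0 2]
print(np.max(m, axis=1))  # [5 6]
```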

Matrix multiplication review

Two matrices can be multiplied only when the number of columns of the left matrix equals the number of rows of the right matrix. To compute the product, take the first row of the left matrix and multiply it by each column of the right matrix in turn. Multiply the first row element by element with the first column and add the products to get the first element of the result. Do the same with the second column to get the second element, and so on. Then repeat with the second row of the left matrix, and so on.
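The row-by-column rule above can be written directly as loops. The following is a minimal sketch in plain Python. The function name matmul_by_hand is hypothetical and exists only for illustration. The result is checked against np.dot using the 2 x 2 matrices from the earlier example:

```python
import numpy as np


def matmul_by_hand(left, right):
    """Hypothetical helper: row-by-column matrix product on plain Python lists."""
    n_rows, n_inner = len(left), len(left[0])
    if n_inner != len(right):
        raise ValueError("columns of left must equal rows of right")
    n_cols = len(right[0])
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):          # each row of the left matrix
        for j in range(n_cols):      # times each column of the right matrix
            result[i][j] = sum(left[i][k] * right[k][j] for k in range(n_inner))
    return result


left = [[1, 1], [0, 1]]
right = [[0, 1], [2, 3]]
print(matmul_by_hand(left, right))   # [[2, 4], [2, 3]]
print(np.dot(left, right).tolist())  # [[2, 4], [2, 3]]
```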

Example:

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{pmatrix}$$

Find AB.

The result is:

$$AB = \begin{pmatrix} 9 & 7 & 8 \\ 21 & 19 & 20 \\ 15 & 22 & 23 \end{pmatrix}$$
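Writing out a few entries shows where these numbers come from. Each entry of AB is the sum of products of one row of A with one column of B:

```latex
\begin{aligned}
(AB)_{ij} &= \sum_{k=1}^{3} A_{ik} B_{kj} \\
(AB)_{11} &= 1 \cdot 1 + 2 \cdot 1 + 3 \cdot 2 = 9 \\
(AB)_{12} &= 1 \cdot 2 + 2 \cdot 1 + 3 \cdot 1 = 7 \\
(AB)_{13} &= 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 1 = 8 \\
(AB)_{33} &= 7 \cdot 1 + 8 \cdot 2 + 0 \cdot 1 = 23
\end{aligned}
```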

Calculate using NumPy:

import numpy as np

e = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 0]])
f = np.array([[1, 2, 1], [1, 1, 2], [2, 1, 1]])
res_dot = np.dot(e, f)
print(res_dot)

Printing results:

[[ 9  7  8]
 [21 19 20]
 [15 22 23]]
