Detailed introduction to the NumPy and pandas modules in Python (with examples)

Last Update:2018-08-29 Source: Internet

Author: User


This article gives a detailed introduction to the NumPy and pandas modules in Python, with examples. It should be a useful reference for anyone who needs it.

This chapter covers the two most important modules for scientific computing in Python: numpy and pandas. Nearly every data analysis toolkit relies on both of them.

First, NumPy & Pandas Features

NumPy (Numeric Python) is an open-source numerical computing extension for Python. It can store and process large matrices far more efficiently than Python's own nested list structure, which can also be used to represent matrices. It is said that with NumPy, Python becomes the equivalent of a free and more powerful MATLAB system.

NumPy features: open source, numerical computing extension, the ndarray type, multi-dimensional operations, numeric matrix data types, vectorized processing, and a sophisticated library of operations. It is designed for rigorous numerical processing.

pandas: a library created for data analysis tasks.

Characteristics:

Fast computation: the performance-critical parts of NumPy and pandas are written in C. pandas is built on top of NumPy and can be seen as an extension of it.
Low resource consumption: both work with array (matrix) operations, which are much faster than Python's own dictionaries or lists.
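To make the contrast with plain lists concrete, here is a minimal sketch that computes the same squares once with a Python list comprehension and once with a single vectorized NumPy expression. The variable names are illustrative only and do not come from the original article.

```python
import numpy as np

values = list(range(1000))   # plain Python list
arr = np.arange(1000)        # equivalent NumPy array

# Pure-Python approach: an explicit loop over every element.
squares_list = [v * v for v in values]

# NumPy approach: one vectorized expression evaluated in compiled code.
squares_arr = arr * arr

print(squares_list[:5])            # [0, 1, 4, 9, 16]
print(squares_arr[:5].tolist())    # [0, 1, 4, 9, 16]
```

Both versions produce the same numbers; the NumPy version avoids the per-element Python loop, which is where the speed advantage typically comes from.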

Second, installation

There are two ways to install them: the first uses the Anaconda distribution, and the second uses the pip command.

1. Installing with the Anaconda distribution

Using Python for scientific computing means installing the required modules one by one, and these modules may depend on other packages or libraries, which makes installation and use fairly cumbersome. Fortunately, others have already done this work: they compile the modules needed for scientific computing and package them as a distribution. Anaconda is one of the most widely used scientific computing distributions.

Installing Anaconda is equivalent to installing Python, IPython, the Spyder integrated development environment, and a collection of packages.

On Mac and Linux, installing Anaconda essentially just adds one folder (~/anaconda) to the home directory; on Windows it also writes to the registry. During installation, the installer adds the bin directory to PATH (on Linux/Mac by writing to ~/.bashrc, on Windows by editing the system PATH variable). You can also do this step yourself. On Linux/Mac, for example, PATH is set after installation as follows:

# Add the Anaconda bin directory to PATH; depending on the version, it may also be ~/anaconda3/bin
echo 'export PATH="$HOME/anaconda2/bin:$PATH"' >> ~/.bashrc
# Reload .bashrc so the change takes effect immediately
source ~/.bashrc

Setting the environment variable on Mac:

➜ export PATH=~/anaconda2/bin:$PATH
➜ conda -V
conda 4.3.30

After configuring PATH, you can run which conda or conda --version to check that it is set up correctly. If you installed the Python 2.7 version, running python --version or python -V prints Python 2.7.12 :: Anaconda 4.1.1 (64-bit), which shows that the default environment of that distribution is Python 2.7.

In a terminal, run conda list to see which packages are installed.

Conda's package management is easy to understand; this part of its functionality is similar to pip.

2. Setting the editor environment and templates

My editor is PyCharm, which can set up development environments and file templates for rapid development.

Anaconda settings:

Fixed template settings:

# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: ${NAME}.py
@time: ${DATE} ${TIME}
"""

3. Installing with the pip command

NumPy Installation

MacOS

# With Python 3:
pip3 install numpy
# With Python 2:
pip install numpy

Linux Ubuntu & Debian

Run in a terminal:

sudo apt-get install python-numpy

Pandas installation

MacOS

# With Python 3:
pip3 install pandas
# With Python 2:
pip install pandas

Linux Ubuntu & Debian

Run in a terminal:

sudo apt-get install python-pandas
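After installing, it can be useful to confirm that both packages are visible to the interpreter you actually run. The following is a small sketch using only the standard library (Python 3.8 or later is assumed for importlib.metadata); the helper name installed_version is hypothetical and not part of NumPy or pandas.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("numpy", "pandas"):
    # Prints the package name followed by its version, or None if not installed.
    print(name, installed_version(name))
```

If either line prints None, the package was probably installed for a different Python interpreter than the one running the script.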

Third, NumPy

The examples below are developed in the Anaconda environment by default.

1. NumPy Properties

Several NumPy properties:

ndim: number of dimensions
shape: number of rows and columns
size: number of elements

To use NumPy, first import the module:

import numpy as np  # use the shorthand np for convenience

The list is converted to a matrix:

array = np.array([[1, 2, 3], [2, 3, 4]])  # convert the list to a matrix
print(array)
"""
array([[1, 2, 3],
       [2, 3, 4]])
"""

Full code run:

# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: np_attr.py
@time: 18/8/26 10:41
"""
import numpy as np  # use the shorthand np for convenience

# Convert the list to a matrix
array = np.array([[1, 2, 3], [4, 5, 6]])
print(array)

Print output:

[[1 2 3]
 [4 5 6]]

Several properties of NumPy

We then look at the results of these properties:

print('number of dim:', array.ndim)  # number of dimensions
# number of dim: 2
print('shape:', array.shape)  # rows and columns
# shape: (2, 3)
print('size:', array.size)  # number of elements
# size: 6
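The same three properties apply to arrays with more than two dimensions. As a brief sketch (the array name cube is illustrative only), a 2 x 3 x 4 array reports three dimensions, and its size is the product of the entries in its shape:

```python
import numpy as np

# A three-dimensional array: 2 blocks, each with 3 rows and 4 columns.
cube = np.zeros((2, 3, 4))

print(cube.ndim)    # 3
print(cube.shape)   # (2, 3, 4)
print(cube.size)    # 24, i.e. 2 * 3 * 4
```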

2. Create an array of NumPy

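Keywords

array: create an array
dtype: specify the data type
zeros: create an array filled with 0
ones: create an array filled with 1
empty: create an uninitialized array (the values are arbitrary, often close to 0)
arange: create values over a specified range with a given step
linspace: create evenly spaced values over an interval (a line segment)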

Create an array

a = np.array([2, 23, 4])  # 1D list
print(a)
# [ 2 23  4]

Specifying Data Dtype

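# Note: the aliases np.int and np.float were removed in NumPy 1.24;
# use np.int64 / np.float64 (or the built-in int / float) instead.
a = np.array([2, 23, 4], dtype=np.int64)
print(a.dtype)
# int64
a = np.array([2, 23, 4], dtype=np.int32)
print(a.dtype)
# int32
a = np.array([2, 23, 4], dtype=np.float64)
print(a.dtype)
# float64
a = np.array([2, 23, 4], dtype=np.float32)
print(a.dtype)
# float32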
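The dtype determines how many bytes each element occupies and how values are converted. The sketch below is not from the original article; it uses the standard itemsize attribute and astype method to show both effects, with illustrative variable names.

```python
import numpy as np

a64 = np.array([2, 23, 4], dtype=np.int64)
a32 = np.array([2, 23, 4], dtype=np.int32)

# itemsize is the number of bytes used per element.
print(a64.itemsize)   # 8
print(a32.itemsize)   # 4

# Converting floats to integers with astype drops the fractional part
# (truncation toward zero), so choose the dtype deliberately.
floats = np.array([1.9, -2.7, 3.5])
truncated = floats.astype(np.int32)
print(truncated.tolist())   # [1, -2, 3]
```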

Create specific data

a = np.array([[2, 23, 4], [2, 32, 4]])  # 2D matrix, 2 rows and 3 columns
print(a)
"""
[[ 2 23  4]
 [ 2 32  4]]
"""

Create an all-zeros array

a = np.zeros((3, 4))  # all zeros, 3 rows and 4 columns
"""
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
"""

Create an all-ones array. You can also specify the dtype of the data:

a = np.ones((3, 4), dtype=int)  # all ones, 3 rows and 4 columns
"""
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])
"""

Create an empty array. np.empty does not initialize the memory, so each value is arbitrary; in practice the values are often numbers close to zero:

a = np.empty((3, 4))  # uninitialized data, 3 rows and 4 columns
"""
array([[  0.00000000e+000,   4.94065646e-324,   9.88131292e-324,
          1.48219694e-323],
       [  1.97626258e-323,   2.47032823e-323,   2.96439388e-323,
          3.45845952e-323],
       [  3.95252517e-323,   4.44659081e-323,   4.94065646e-323,
          5.43472210e-323]])
"""
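Because the contents of an array from np.empty are not defined, a common pattern is to use it only as a buffer that is completely overwritten before being read. A minimal sketch of that pattern, with an illustrative variable name:

```python
import numpy as np

# np.empty only allocates memory; the initial contents are unspecified.
buffer = np.empty((3, 4))

# Overwrite every element before reading any of them.
for row in range(buffer.shape[0]):
    buffer[row, :] = row * 10.0

print(buffer.tolist())
# [[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0], [20.0, 20.0, 20.0, 20.0]]
```

When the array needs defined starting values, np.zeros or np.ones is the safer choice.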

Use arange to create a sequence of regularly stepped values:

a = np.arange(10, 20, 2)  # values from 10 to 19, step 2
"""
array([10, 12, 14, 16, 18])
"""

Use reshape to change the shape of the data:

# a = np.arange(12)  # [ 0  1  2  3  4  5  6  7  8  9 10 11]
a = np.arange(12).reshape((3, 4))  # 3 rows and 4 columns, 0 to 11
"""
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
"""
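reshape only rearranges existing elements, so the new shape must account for exactly the same number of elements. The sketch below goes slightly beyond the original text: it shows the standard -1 placeholder, which lets NumPy infer one dimension, and the ValueError raised when the sizes do not match.

```python
import numpy as np

a = np.arange(12)             # 12 elements: 0 to 11

m = a.reshape((3, 4))         # 3 * 4 == 12, so this works
n = a.reshape((2, -1))        # -1 lets NumPy infer the second dimension

print(m.shape)                # (3, 4)
print(n.shape)                # (2, 6)

try:
    a.reshape((5, 3))         # 5 * 3 == 15 != 12
except ValueError as exc:
    print("reshape failed:", exc)
```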

Use linspace to create evenly spaced data over an interval:

a = np.linspace(1, 10, 20)  # start at 1, end at 10, split into 20 values to form a line segment
"""
array([  1.        ,   1.47368421,   1.94736842,   2.42105263,
         2.89473684,   3.36842105,   3.84210526,   4.31578947,
         4.78947368,   5.26315789,   5.73684211,   6.21052632,
         6.68421053,   7.15789474,   7.63157895,   8.10526316,
         8.57894737,   9.05263158,   9.52631579,  10.        ])
"""

reshape can be applied here in the same way:

a = np.linspace(1, 10, 20).reshape((5, 4))  # change the shape
"""
array([[  1.        ,   1.47368421,   1.94736842,   2.42105263],
       [  2.89473684,   3.36842105,   3.84210526,   4.31578947],
       [  4.78947368,   5.26315789,   5.73684211,   6.21052632],
       [  6.68421053,   7.15789474,   7.63157895,   8.10526316],
       [  8.57894737,   9.05263158,   9.52631579,  10.        ]])
"""
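The spacing in the linspace output follows from the arguments: with both endpoints included (the default), 20 values leave 19 gaps, so each step is (10 - 1) / 19, roughly 0.4737. A short sketch that checks this, with illustrative variable names:

```python
import numpy as np

start, stop, num = 1, 10, 20
a = np.linspace(start, stop, num)

# With both endpoints included there are num - 1 equal gaps.
step = (stop - start) / (num - 1)

print(np.allclose(np.diff(a), step))   # True
print(a[0], a[-1])                     # 1.0 10.0
print(len(a))                          # 20
```

This differs from arange, where the step is given directly and the stop value is excluded.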

3. Basic operations in NumPy

Let's start with a script to understand the corresponding calculations and how they are written:

# -*- coding: utf-8 -*-
"""
@author: Corwien
@file: np_yunsuan.py
@time: 18/8/26 23:37
"""
import numpy as np

a = np.array([10, 20, 30, 40])  # array([10, 20, 30, 40])
b = np.arange(4)                # array([0, 1, 2, 3])

Several basic operations of NumPy

In the code above, a and b are two variables that hold matrices. Both are one-dimensional arrays with 4 elements (one row of 4 columns), and the elements of b run from 0 to 3. To subtract one matrix from the other, try typing:

c = a - b  # array([10, 19, 28, 37])

Running this gives the result of subtracting the corresponding elements, that is, [10, 19, 28, 37]. Addition and multiplication of corresponding elements are written in the same way:

c = a + b  # array([10, 21, 32, 43])
c = a * b  # array([  0,  20,  60, 120])

NumPy also provides many mathematical functions, such as the trigonometric functions, which are easy to call when we need to apply a function to every element of a matrix (taking the sin function as an example):

c = 10 * np.sin(a)  # array([-5.44021111,  9.12945251, -9.88031624,  7.4511316 ])
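The element-wise behaviour described above can be checked in one short script. This sketch reuses the same a and b; the power and comparison lines are extra illustrations that follow the same element-wise rule and are not taken from the original text.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)                 # [0, 1, 2, 3]

# Arithmetic operators act on matching positions of the two arrays.
print((a - b).tolist())          # [10, 19, 28, 37]
print((a + b).tolist())          # [10, 21, 32, 43]
print((a * b).tolist())          # [0, 20, 60, 120]

# Powers and comparisons are element-wise too.
print((b ** 2).tolist())         # [0, 1, 4, 9]
print((b < 3).tolist())          # [True, True, True, False]

# Functions such as np.sin are applied to every element.
c = 10 * np.sin(a)
print(c.shape)                   # (4,)
```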

All of the calculations above operate on one-dimensional matrices, that is, a single row. To work with multi-row, multi-dimensional matrices, we need to modify the script from the beginning:

a = np.array([[1, 1], [0, 1]])
b = np.arange(4).reshape((2, 2))

print(a)
# array([[1, 1],
#        [0, 1]])

print(b)
# array([[0, 1],
#        [2, 3]])

The matrices a and b constructed here both have 2 rows and 2 columns. The reshape operation rebuilds the matrix into the shape given in parentheses. One difference to note is that matrix multiplication in NumPy comes in two kinds: the first is the element-wise multiplication shown earlier, and the other is standard matrix multiplication, in which each row is multiplied by each column to produce the corresponding element:

c_dot = np.dot(a, b)
# array([[2, 4],
#        [2, 3]])

In addition, dot can also be written as a method call:

c_dot_2 = a.dot(b)
# array([[2, 4],
#        [2, 3]])
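To see the two kinds of multiplication side by side, the sketch below applies both to the same a and b. It also uses the @ operator, which is not mentioned in the original text but is the standard Python 3.5+ spelling of matrix multiplication for NumPy arrays.

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.arange(4).reshape((2, 2))     # [[0, 1], [2, 3]]

elementwise = a * b                  # multiply matching positions
matrix_product = np.dot(a, b)        # rows of a times columns of b
operator_form = a @ b                # same result as np.dot for 2D arrays

print(elementwise.tolist())          # [[0, 1], [0, 3]]
print(matrix_product.tolist())       # [[2, 4], [2, 3]]
print(operator_form.tolist())        # [[2, 4], [2, 3]]
```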

Next we define a new script to look at how sum(), min(), and max() are used:

import numpy as np

a = np.random.random((2, 4))
print(a)
# array([[0.94692159,  0.20821798,  0.35339414,  0.2805278 ],
#        [0.04836775,  0.04023552,  0.44091941,  0.21665268]])

Because the numbers are randomly generated, your results may differ. The second line of the script generates a matrix a of 2 rows and 4 columns in which each element is a random number between 0 and 1. In this randomly generated matrix we can sum the elements and find the extreme values as follows:

np.sum(a)  # 4.4043622002745959
np.min(a)  # 0.23651223533671784
np.max(a)  # 0.90438450240606416

These calls compute, respectively, the sum of all elements in the matrix, the minimum value, and the maximum value. You can use print() to display and check each value.

To perform the operation on rows or columns, set the axis argument in the code above. When axis is 0, the operation is carried out column by column; when axis is 1, it is carried out row by row.

To make this clearer, we continue with the example above:

print("a =", a)
# a = [[0.23651224  0.41900661  0.84869417  0.46456022]
#      [0.60771087  0.9043845   0.36603285  0.55746074]]

print("sum =", np.sum(a, axis=1))
# sum = [1.96877324  2.43558896]

print("min =", np.min(a, axis=0))
# min = [0.23651224  0.41900661  0.36603285  0.46456022]

print("max =", np.max(a, axis=1))
# max = [0.84869417  0.9043845 ]
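Since the random values change on every run, a fixed matrix makes the effect of axis easier to verify by hand. The following sketch uses a small integer matrix chosen purely for illustration:

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# axis=0 collapses the rows: one result per column.
print(np.sum(m, axis=0).tolist())   # [6, 8, 10, 12]
print(np.min(m, axis=0).tolist())   # [1, 2, 3, 4]

# axis=1 collapses the columns: one result per row.
print(np.sum(m, axis=1).tolist())   # [10, 26]
print(np.max(m, axis=1).tolist())   # [4, 8]

# Without axis, the whole matrix is reduced to a single number.
print(int(np.sum(m)))               # 36
```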

Matrix multiplication Review

In matrix multiplication, two matrices can be multiplied only when the number of columns of the left matrix equals the number of rows of the right matrix. The procedure is to take the first row of the left matrix and combine it with each column of the right matrix in turn. Multiplying the first row element by element with the first column and adding the products gives the first element of the result; doing the same with the first row and the second column gives the next element. The second row is then combined with each column of the right matrix in the same way, and so on.
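The shape requirement can be observed directly in NumPy. In this sketch (the names left and right are illustrative), a 2 x 3 matrix times a 3 x 4 matrix gives a 2 x 4 result, while reversing the order fails because 4 columns do not match 2 rows:

```python
import numpy as np

left = np.ones((2, 3))     # 2 rows, 3 columns
right = np.ones((3, 4))    # 3 rows, 4 columns

# 3 columns on the left match 3 rows on the right: result is 2 x 4.
product = np.dot(left, right)
print(product.shape)       # (2, 4)
print(product[0, 0])       # 3.0, the sum of three products 1 * 1

# Reversed order: 4 columns on the left do not match 2 rows on the right.
try:
    np.dot(right, left)
except ValueError as exc:
    print("cannot multiply:", exc)
```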

Example:

Matrix A =
1  2  3
4  5  6
7  8  0

Matrix B =
1  2  1
1  1  2
2  1  1

Find AB.

The final result is

AB =
 9   7   8
21  19  20
15  22  23

Calculate using NumPy:

import numpy as np

e = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 0]])
f = np.array([[1, 2, 1], [1, 1, 2], [2, 1, 1]])
res_dot = np.dot(e, f)
print(res_dot)

Printing results:

[[ 9  7  8]
 [21 19 20]
 [15 22 23]]
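The row-by-column rule can also be written out with plain Python loops and compared against np.dot. This is only a sketch for understanding; the helper name matmul_loops is hypothetical, and in practice np.dot is the tool to use.

```python
import numpy as np


def matmul_loops(left, right):
    """Multiply two matrices given as nested lists, row by column."""
    inner = len(right)
    if any(len(row) != inner for row in left):
        raise ValueError("columns of left must equal rows of right")
    rows, cols = len(left), len(right[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Row i of left times column j of right, products summed.
            result[i][j] = sum(left[i][k] * right[k][j] for k in range(inner))
    return result


e = [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
f = [[1, 2, 1], [1, 1, 2], [2, 1, 1]]

manual = matmul_loops(e, f)
print(manual)                                  # [[9, 7, 8], [21, 19, 20], [15, 22, 23]]
print(np.array_equal(manual, np.dot(e, f)))    # True
```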

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
