Python Pandas Introduction

Last Update:2016-10-03 Source: Internet

Author: User


pandas is built on top of NumPy, so the vast majority of NumPy methods can be applied to pandas objects.
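As a minimal sketch of what this means in practice, NumPy functions can be called directly on a pandas object, and the labels are kept. The Series s and its labels are illustrative and do not come from the article.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the article.
s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

# A NumPy element-wise function returns a Series with the same index.
roots = np.sqrt(s)
print(roots)

# NumPy reductions also accept a Series.
total = np.sum(s)
print(total)
```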

The two pandas data structures used most often are Series and DataFrame.

A Series is a one-dimensional, array-like object that holds a set of values together with a set of labels (the index) associated with them.

import pandas as pd

object = pd.Series([2, 5, 8, 9])
print(object)

The result is:

0    2
1    5
2    8
3    9
dtype: int64

The output shows the labels on the left and the values on the right. We can get each part through the values and index attributes.

print(object.values)
print(object.index)

The result is:

[2 5 8 9]
RangeIndex(start=0, stop=4, step=1)

We can also supply our own labels.

object = pd.Series([2, 5, 8, 9], index=['a', 'b', 'c', 'd'])
print(object)

The result is:

a    2
b    5
c    8
d    9
dtype: int64

We can also filter a Series with a boolean condition:

print(object[object > 5])

The result is:

c    8
d    9
dtype: int64

You can also think of a Series as a dictionary and use in to check whether a label exists:

print('a' in object)

The result is:

True

Note that in checks the labels, not the values, so a value cannot be found this way:

print(2 in object)

The result is:

False
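To test for a value rather than a label, one option is to check the underlying array or to use isin. A minimal sketch, rebuilding the labelled Series under the name s (a name chosen here only for illustration):

```python
import pandas as pd

s = pd.Series([2, 5, 8, 9], index=['a', 'b', 'c', 'd'])

label_found = 'a' in s           # membership test on the labels
value_as_label = 2 in s          # 2 is a value, not a label
value_found = 2 in s.values      # membership test on the values
has_value = s.isin([2]).any()    # the same check written with isin

print(label_found, value_as_label, value_found, has_value)
```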

Some useful Series methods and attributes:

isnull and notnull detect missing values in the data.

name and index.name give a name to the Series and to its index.
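A minimal sketch of these methods and attributes. The data and the names population and state are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series([1.5, np.nan, 3.0], index=['x', 'y', 'z'])

missing = s.isnull()    # True where a value is missing
present = s.notnull()   # the element-wise opposite

# name and index.name are plain attributes that label the Series and its index.
s.name = 'population'
s.index.name = 'state'
print(s)
```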

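The DataFrame is the other main data structure. It is a table of rows and columns, similar to the data frame in R.

data = {'year': [2000, 2001, 2002, 2003],
        'income': [3000, 3500, 4500, 6000]}
# Keep the dict in data (it is reused below) and store the DataFrame under a new name.
frame = pd.DataFrame(data)
print(frame)

The result is:

   income  year
0    3000  2000
1    3500  2001
2    4500  2002
3    6000  2003

Older pandas versions sort the dict keys alphabetically, as shown here. Recent versions keep the insertion order and print year first.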

data1 = pd.DataFrame(data, columns=['year', 'income', 'outcome'],
                     index=['a', 'b', 'c', 'd'])
print(data1)

The result is:

   year  income outcome
a  2000    3000     NaN
b  2001    3500     NaN
c  2002    4500     NaN
d  2003    6000     NaN

The column outcome does not exist in data, so it is filled with NaN.

Several ways to index

print(data1['year'])
print(data1.year)

Both forms are equivalent and select a column. The result is:

a    2000
b    2001
c    2002
d    2003
Name: year, dtype: int64

A row is selected in a different way, by label with loc (older pandas versions also offered ix, which has since been removed):

print(data1.loc['a'])

The result is:

year       2000
income     3000
outcome     NaN
Name: a, dtype: object

Rows can also be selected with a slice:

print(data1[1:3])

The result is:

   year  income outcome
b  2001    3500     NaN
c  2002    4500     NaN
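Rows can be selected either by label or by position. A minimal sketch using loc and iloc, with the outcome column left out for brevity. The name df is chosen here for illustration.

```python
import pandas as pd

data = {'year': [2000, 2001, 2002, 2003],
        'income': [3000, 3500, 4500, 6000]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

row_by_label = df.loc['a']       # row with the label 'a'
row_by_position = df.iloc[0]     # first row, whatever its label
label_slice = df.loc['b':'c']    # label slices include both endpoints
position_slice = df.iloc[1:3]    # position slices exclude the stop

print(row_by_label)
print(label_slice)
```

Both slices select rows b and c. The difference only matters at the end of the range, where the label slice includes its last label and the position slice stops one short.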

Adding and removing columns

import numpy as np

data1['money'] = np.arange(4)

This adds a column named money:

   year  income outcome  money
a  2000    3000     NaN      0
b  2001    3500     NaN      1
c  2002    4500     NaN      2
d  2003    6000     NaN      3

del data1['outcome']

After the column is deleted, the result is:

   year  income  money
a  2000    3000      0
b  2001    3500      1
c  2002    4500      2
d  2003    6000      3
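The del statement changes the DataFrame in place. When the original should stay untouched, drop with the columns argument returns a new object instead. A minimal sketch with a reduced version of the data (the name df is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [2000, 2001], 'income': [3000, 3500]},
                  index=['a', 'b'])
df['money'] = np.arange(2)             # add a column in place

trimmed = df.drop(columns=['money'])   # new DataFrame without the column
print(trimmed)
print(df)                              # df still has the money column
```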

Primary index objects in pandas and their corresponding indexed methods and properties

The reindex method builds a new object whose rows follow a new index:

data = {'year': [2000, 2001, 2002, 2003],
        'income': [3000, 3500, 4500, 6000]}
data1 = pd.DataFrame(data, columns=['year', 'income', 'outcome'],
                     index=['a', 'b', 'c', 'd'])
data2 = data1.reindex(['a', 'b', 'c', 'd', 'e'])
print(data2)

The result contains a new row e whose values are all NaN.

data2 = data1.reindex(['a', 'b', 'c', 'd', 'e'], method='ffill')
print(data2)

With method='ffill', the new row e is filled forward with the values of the preceding row d.
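The difference between the two calls can be checked on a smaller table. This is a minimal sketch with illustrative data, not the article's exact frame:

```python
import pandas as pd

df = pd.DataFrame({'year': [2000, 2001], 'income': [3000, 3500]},
                  index=['a', 'b'])

plain = df.reindex(['a', 'b', 'c'])                    # row c is all NaN
filled = df.reindex(['a', 'b', 'c'], method='ffill')   # row c copies row b

print(plain)
print(filled)
```

Forward filling only works when the original index is sorted, which is the case for the labels used here.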

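Dropping entries and filtering

print(data1.drop(['a']))

The result contains every row except a.

print(data1[data1['year'] > 2001])

The result contains only rows c and d, where year is greater than 2001.

print(data1.loc[['a', 'b'], ['year', 'income']])

The result contains rows a and b and the columns year and income.

# data1.columns[:2] picks the first two columns by position
print(data1.loc[data1.year > 2000, data1.columns[:2]])

The result contains rows b, c and d and the first two columns, year and income.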
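A minimal sketch that runs these selections with loc and stores each result so it can be inspected. The outcome values 1 to 4 and the name df are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'year': [2000, 2001, 2002, 2003],
                   'income': [3000, 3500, 4500, 6000],
                   'outcome': [1, 2, 3, 4]},       # assumed values
                  index=['a', 'b', 'c', 'd'])

dropped = df.drop(['a'])                             # remove row a
recent = df[df['year'] > 2001]                       # boolean filter on rows
subset = df.loc[['a', 'b'], ['year', 'income']]      # rows and columns by label
first_two = df.loc[df.year > 2000, df.columns[:2]]   # filtered rows, first two columns

print(first_two)
```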

The detailed index filtering method is as follows:

Arithmetic operations on DataFrames

data = {'year': [2000, 2001, 2002, 2003],
        'income': [3000, 3500, 4500, 6000]}
data1 = pd.DataFrame(data, columns=['year', 'income', 'outcome'],
                     index=['a', 'b', 'c', 'd'])
data2 = pd.DataFrame(data, columns=['year', 'income', 'outcome'],
                     index=['a', 'b', 'c', 'd'])
data1['outcome'] = range(1, 5)
data2 = data2.reindex(['a', 'b', 'c', 'd', 'e'])
print(data1.add(data2, fill_value=0))

In the result, rows a to d hold the element-wise sums. A value that is missing on only one side is treated as 0, so outcome keeps the values 1 to 4. Row e is NaN in every column because neither operand has a value there.
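The effect of fill_value is easiest to see next to the plain + operator. A minimal sketch with two small illustrative frames (left, right and the column x are hypothetical names):

```python
import pandas as pd

left = pd.DataFrame({'x': [1.0, 2.0]}, index=['a', 'b'])
right = pd.DataFrame({'x': [10.0, 20.0, 30.0]}, index=['a', 'b', 'c'])

plain = left + right                     # row c is NaN: it is missing on the left
filled = left.add(right, fill_value=0)   # the missing left value counts as 0

print(plain)
print(filled)
```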

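Sorting a DataFrame

# The source repeats the label 'one'; the third column is assumed to be 'two'.
data = pd.DataFrame(np.arange(10).reshape((2, 5)), index=['c', 'a'],
                    columns=['one', 'four', 'two', 'three', 'five'])
print(data)

The result is a table with two rows, c and a, holding the values 0 to 4 and 5 to 9.

print(data.sort_index())

sort_index() sorts by row label, so row a now comes before row c.

print(data.sort_index(axis=1))

With axis=1 the columns are sorted by label instead: five, four, one, three, two.

print(data.sort_values(by='one'))

sort_values() sorts the rows by the values in a column, ascending by default. Column one is already in ascending order (0, then 5), so the row order does not change.

print(data.sort_values(by='one', ascending=False))

With ascending=False the rows are sorted in descending order of column one, so row a comes first.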
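A minimal sketch that stores each sorted result so the ordering can be checked. The third column label two is an assumption, because the source repeats the label one.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(10).reshape((2, 5)), index=['c', 'a'],
                    columns=['one', 'four', 'two', 'three', 'five'])

by_row_label = data.sort_index()          # rows ordered a, c
by_col_label = data.sort_index(axis=1)    # columns in alphabetical order
descending = data.sort_values(by='one', ascending=False)

print(list(by_row_label.index))
print(list(by_col_label.columns))
print(descending)
```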

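Summary and descriptive statistics

# The source repeats the label 'one'; the third column is assumed to be 'two'.
data = pd.DataFrame(np.arange(10).reshape((2, 5)), index=['c', 'a'],
                    columns=['one', 'four', 'two', 'three', 'five'])
print(data.describe())

The result lists the count, mean, standard deviation, minimum, quartiles and maximum of each column.

print(data.sum())

The result is the sum of each column: 5, 7, 9, 11 and 13.

print(data.sum(axis=1))

With axis=1 the result is the sum of each row: 10 for row c and 35 for row a.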
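A minimal sketch of the same reductions with the results kept in variables. The third column label two is again an assumption.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(10).reshape((2, 5)), index=['c', 'a'],
                    columns=['one', 'four', 'two', 'three', 'five'])

col_sums = data.sum()          # one value per column
row_sums = data.sum(axis=1)    # one value per row
summary = data.describe()      # one row per statistic, one column per column

print(col_sums)
print(row_sums)
print(summary.loc['mean'])
```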

Detailed reduction method

Related descriptive statistic functions

Correlation coefficients and covariance

# The source repeats the label 'one'; the third column is assumed to be 'two'.
data = pd.DataFrame(np.random.random(20).reshape((4, 5)), index=['c', 'a', 'b', 'c'],
                    columns=['one', 'four', 'two', 'three', 'five'])
print(data)

The result is a 4 x 5 table of random numbers between 0 and 1, so the values below change on every run.

print(data.one.corr(data.three))

The correlation coefficient of one and three is:

0.706077105725

print(data.one.cov(data.three))

The covariance of one and three is:

0.0677896135613

print(data.corrwith(data.one))

The result is the correlation coefficient of column one with every column.
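With random data the numbers cannot be reproduced, so here is a minimal sketch on fixed, illustrative values where the results can be worked out by hand. The frame df and its values are not from the article.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                   'three': [2.0, 4.0, 6.0, 8.0],   # exactly 2 * one
                   'five': [4.0, 3.0, 2.0, 1.0]})   # decreases as one increases

r = df.one.corr(df.three)      # Pearson correlation of two columns
c = df.one.cov(df.three)       # sample covariance (divides by n - 1)
r_all = df.corrwith(df.one)    # correlation of every column with one

print(r, c)
print(r_all)
```

Column three is an exact multiple of one, so their correlation is 1, while five runs in the opposite direction and correlates at -1.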

Unique values, value counts, and membership

data = pd.Series(['a', 'a', 'b', 'b', 'b', 'c', 'd', 'd'])

print(data.unique())

The result is:

['a' 'b' 'c' 'd']

print(data.isin(['b']))

The result is:

0    False
1    False
2     True
3     True
4     True
5    False
6    False
7    False
dtype: bool

print(data.value_counts(sort=False))

The result is shown below. With sort=False the entries are not ordered by count, and their order can differ between pandas versions.

d    2
c    1
b    3
a    2
dtype: int64
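The boolean mask returned by isin is mostly used as a filter, and value_counts sorts by count when sort is left at its default. A minimal sketch on the same data:

```python
import pandas as pd

data = pd.Series(['a', 'a', 'b', 'b', 'b', 'c', 'd', 'd'])

uniques = list(data.unique())     # distinct values in order of appearance
counts = data.value_counts()      # sorted by count, largest first
mask = data.isin(['b', 'c'])      # True where the value is b or c
subset = data[mask]               # keep only the matching entries

print(counts)
print(subset)
```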

Missing value handling

data = pd.Series(['a', 'a', 'b', np.nan, 'b', 'c', np.nan, 'd'])

print(data.isnull())

The result is:

0    False
1    False
2    False
3     True
4    False
5    False
6     True
7    False
dtype: bool

print(data.dropna())

The result is:

0    a
1    a
2    b
4    b
5    c
7    d
dtype: object

print(data.ffill())

The result is:

0    a
1    a
2    b
3    b
4    b
5    c
6    c
7    d
dtype: object

print(data.fillna(0))

The result is:

0    a
1    a
2    b
3    0
4    b
5    c
6    0
7    d
dtype: object
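A minimal sketch that checks the behaviour described above: dropna keeps the original labels of the surviving entries, and ffill copies the previous value into each gap.

```python
import numpy as np
import pandas as pd

data = pd.Series(['a', 'a', 'b', np.nan, 'b', 'c', np.nan, 'd'])

n_missing = data.isnull().sum()   # number of missing entries
dropped = data.dropna()           # labels 3 and 6 disappear, the rest keep theirs
forward = data.ffill()            # each gap takes the previous value

print(n_missing)
print(list(dropped.index))
print(forward.tolist())
```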

Hierarchical indexes

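A hierarchical index gives an axis two or more index levels, which makes it possible to index data in multiple dimensions.

data = pd.Series(np.random.randn(10),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])
print(data)

The result is a Series of ten random values, labelled by an outer level (a to d) and an inner level (1 to 3).

print(data.index)

The result is:

MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])

Recent pandas versions print the same index as a list of tuples.

print(data['c'])

The result is the part of the Series under the outer label c, indexed by the inner labels 1 and 2.

print(data[:, 2])

The result contains the entries whose inner label is 2, one for each of a, b, c and d.

print(data.unstack())

unstack() moves the inner index level into the columns, turning the Series into a DataFrame. Combinations that do not occur in the data, such as (c, 3) and (d, 1), become NaN.

print(data.unstack().stack())

stack() is the inverse of unstack().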
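A minimal sketch of these selections on fixed values, so each result can be checked. The values 0.0 to 9.0 replace the random numbers and are an assumption made for the example. Depending on the pandas version, stack() may keep the NaN cells that unstack() introduced, so the sketch calls dropna() before comparing.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(10.0),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

outer = data['c']          # entries under the outer label c
inner = data.loc[:, 2]     # entries whose inner label is 2
wide = data.unstack()      # inner level becomes the columns
back = wide.stack()        # columns go back into an inner index level

print(outer)
print(inner)
print(wide)
print(back.dropna().tolist() == data.tolist())
```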

Knowing this, you should be able to do some regular data processing.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
