Python Data Analysis with pandas: An Introduction to Basic Skills


pandas has two main data structures: Series and DataFrame.

A Series is an object similar to a one-dimensional array. It consists of a set of data and a set of data labels (the index) associated with it. The following session shows how it is used:

In [1]: from pandas import Series, DataFrame

In [2]: import pandas as pd

In [3]: obj = Series([4, 7, -5, 3])

In [4]: import numpy as np

In [5]: obj

Out[5]:

0    4
1    7
2   -5
3    3
dtype: int64

When a Series is displayed, the index appears on the left and the values on the right. If no index is specified, a default integer index starting at 0 is generated. The values and the index can be inspected through the values and index attributes.

In [6]: obj.values

Out[6]: array([ 4,  7, -5,  3])

In [7]: obj.index

Out[7]: RangeIndex(start=0, stop=4, step=1)
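The same steps can be sketched as a standalone script. This is a minimal illustration that assumes pandas is installed and imported under the usual alias pd; the variable names values and index are chosen here for clarity and are not part of the original session.

```python
import pandas as pd

# Build a Series without an explicit index; pandas supplies 0..N-1.
obj = pd.Series([4, 7, -5, 3])

values = obj.values   # the underlying NumPy array
index = obj.index     # the default integer index

print(values.tolist())  # [4, 7, -5, 3]
print(list(index))      # [0, 1, 2, 3]
```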

To use specific labels instead, pass them through the index argument when the Series is created.

In [8]: obj2 = Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

In [9]: obj2

Out[9]:

a    4
b    7
c   -5
d    3
dtype: int64

A value can then be accessed through its label:

In [10]: obj2['a']

Out[10]: 4

NumPy array operations also preserve the link between the index and the values:

In [12]: np.exp(obj2)

Out[12]:

a      54.598150
b    1096.633158
c       0.006738
d      20.085537
dtype: float64
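As a rough, self-contained sketch of label access and index-preserving NumPy operations, the snippet below rebuilds obj2 and applies np.exp to it. The names value_a and exp_obj2 are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

# Look up a single value by its label.
value_a = obj2["a"]

# Applying a NumPy function returns a new Series with the same labels.
exp_obj2 = np.exp(obj2)

print(value_a)               # 4
print(list(exp_obj2.index))  # ['a', 'b', 'c', 'd']
```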

If the data is stored in a dictionary, a Series can be created directly from it. The keys of the dictionary become the index.

In [13]: data = {'name': 'ZHF', 'age': 33, 'city': 'Chengdu'}

In [14]: obj3 = Series(data)

In [15]: obj3

Out[15]:

age          33
city    Chengdu
name        ZHF
dtype: object
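The dictionary case can be sketched as follows. The session above shows the keys in alphabetical order; recent pandas versions should instead keep the insertion order of the dictionary, so this sketch calls sort_index() explicitly when a sorted index is wanted. The name obj3_sorted is an assumption made for this example.

```python
import pandas as pd

data = {"name": "ZHF", "age": 33, "city": "Chengdu"}

# The dictionary keys become the index labels of the Series.
obj3 = pd.Series(data)

# Sort the labels explicitly instead of relying on version-specific ordering.
obj3_sorted = obj3.sort_index()

print(list(obj3_sorted.index))  # ['age', 'city', 'name']
```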

DataFrame:

A DataFrame is a tabular data structure. It has both a row index and a column index, and it can be viewed as a dictionary of Series that share the same index.

In [25]: data = {'city': ['Chongqing', 'Chengdu', 'Beijing'], 'weather': ['rainy', 'sunshaw', 'snow'], 'temperature': [9, 5, -3]}

In [26]: frame = DataFrame(data)

In [27]: frame

Out[27]:

        city  temperature  weather
0  Chongqing            9    rainy
1    Chengdu            5  sunshaw
2    Beijing           -3     snow

Note that the column order of the result differs from the order in which data was written: the pandas version used here sorts the dictionary keys (newer versions keep the insertion order). To control the column order, pass it explicitly through the columns argument of DataFrame:

In [28]: frame = DataFrame(data, columns=['city', 'weather', 'temperature'])

In [29]: frame

Out[29]:

        city  weather  temperature
0  Chongqing    rainy            9
1    Chengdu  sunshaw            5
2    Beijing     snow           -3

The row index can be specified in the same way:

In [30]: frame = DataFrame(data, columns=['city', 'weather', 'temperature'], index=['first', 'second', 'third'])

In [31]: frame

Out[31]:

             city  weather  temperature
first   Chongqing    rainy            9
second    Chengdu  sunshaw            5
third     Beijing     snow           -3

Once the DataFrame has labels, the corresponding rows and columns can be accessed by label.

Access by column label:

In [33]: frame.city

Out[33]:

first     Chongqing
second      Chengdu
third       Beijing
Name: city, dtype: object

Access by row label:

In [41]: frame.loc['first']

Out[41]:

city           Chongqing
weather            rainy
temperature            9
Name: first, dtype: object
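Putting the DataFrame steps together, a minimal sketch might look like the following. It reuses the data from the session (including the weather value 'sunshaw' exactly as it appears there); the names city_col and first_row are assumptions made for this example.

```python
import pandas as pd

data = {
    "city": ["Chongqing", "Chengdu", "Beijing"],
    "weather": ["rainy", "sunshaw", "snow"],
    "temperature": [9, 5, -3],
}

# Fix both the column order and the row labels at construction time.
frame = pd.DataFrame(
    data,
    columns=["city", "weather", "temperature"],
    index=["first", "second", "third"],
)

city_col = frame["city"]        # one column, returned as a Series
first_row = frame.loc["first"]  # one row selected by label, also a Series

print(city_col["second"])        # Chengdu
print(first_row["temperature"])  # 9
```

Bracket access (frame["city"]) is used here instead of attribute access (frame.city) because it also works for column names that are not valid Python identifiers.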

Another common form of input is a nested dictionary (a dictionary of dictionaries).

With this format, the keys of the outer dictionary become the columns, and the keys of the inner dictionaries become the row index.

In [42]: pop = {'cost': {2016: 3000, 2017: 3400, 2018: 5000}, 'need': {2017: 4000, 2018: 6000}}

In [43]: frame3 = DataFrame(pop)

In [44]: frame3

Out[44]:

      cost    need
2016  3000     NaN
2017  3400  4000.0
2018  5000  6000.0

The result can also be transposed:

In [45]: frame3.T

Out[45]:

        2016    2017    2018
cost  3000.0  3400.0  5000.0
need     NaN  4000.0  6000.0
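The nested-dictionary construction and the transpose can be sketched in a few lines. The variable names missing and transposed are illustrative assumptions; the data is the same as in the session.

```python
import pandas as pd

pop = {
    "cost": {2016: 3000, 2017: 3400, 2018: 5000},
    "need": {2017: 4000, 2018: 6000},
}

# Outer keys become columns; inner keys are combined into the row index.
frame3 = pd.DataFrame(pop)

# 'need' has no entry for 2016, so that cell is filled with NaN.
missing = pd.isna(frame3.loc[2016, "need"])

# .T swaps rows and columns.
transposed = frame3.T

print(missing)                 # True
print(list(transposed.index))  # ['cost', 'need']
```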

Basic Features:

1. Reindexing

First, look at the Series created earlier. Its index attribute returns an Index object. Suppose we then try to modify that object with index[1] = 'a':

In [50]: obj.index

Out[50]: RangeIndex(start=0, stop=4, step=1)

In [51]: index = obj.index

In [52]: index[1] = 'a'

This raises the error below: "Index does not support mutable operations". An Index object is immutable, so its elements cannot be changed in place this way.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-336c3a4c2807> in <module>()
----> 1 index[1] = 'a'

/usr/local/lib/python2.7/dist-packages/pandas/core/indexes/base.pyc in __setitem__(self, key, value)
   1722
   1723     def __setitem__(self, key, value):
-> 1724         raise TypeError("Index does not support mutable operations")
   1725
   1726     def __getitem__(self, key):

TypeError: Index does not support mutable operations

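To obtain a Series with a different index, use the reindex method. It returns a new object conformed to the new index: values are kept for labels that already exist, and labels that do not exist in the original are filled with NaN. (reindex does not rename labels; to relabel a Series, assign a complete new index to obj.index.)

In [ ]: obj.reindex(['a', 'b', 'c', 'd', 'e'])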
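The behaviour described above can be checked with a short sketch. To make the effect of reindex visible, it assumes a Series that already carries the labels 'a' to 'd' (unlike the integer-indexed obj in the session); the names raised, reindexed and relabeled are hypothetical.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

# Index objects are immutable: item assignment raises TypeError.
try:
    obj.index[1] = "z"
    raised = False
except TypeError:
    raised = True

# reindex returns a new Series; the unknown label 'e' gets NaN.
reindexed = obj.reindex(["a", "b", "c", "d", "e"])

# To rename labels, replace the whole index on a copy instead.
relabeled = obj.copy()
relabeled.index = ["w", "x", "y", "z"]

print(raised)                    # True
print(pd.isna(reindexed["e"]))   # True
```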

2. Dropping entries from an axis

An entry can be removed with the drop method, whose argument is the index label to discard. drop returns a new object and leaves the original unchanged.

In [64]: obj

Out[64]:

1    4
2    7
3    5
4    3
dtype: int64

In [65]: new = obj.drop(1)

In [66]: new

Out[66]:

2    7
3    5
4    3
dtype: int64
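A minimal sketch of drop on both a Series and a DataFrame follows. The DataFrame part, including the axis=1 call and the name without_weather, is an added illustration of dropping along the column axis and does not appear in the session above.

```python
import pandas as pd

obj = pd.Series([4, 7, 5, 3], index=[1, 2, 3, 4])

# drop takes a label (not a position) and returns a new Series.
new = obj.drop(1)

frame = pd.DataFrame(
    {"city": ["Chongqing", "Chengdu"], "weather": ["rainy", "snow"], "temperature": [9, 5]}
)

# axis=1 drops a column instead of a row.
without_weather = frame.drop("weather", axis=1)

print(list(new.index))                # [2, 3, 4]
print(list(without_weather.columns))  # ['city', 'temperature']
```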

3. Indexing, selection, and filtering

With Python lists and tuples, elements can be extracted by slicing. pandas objects can be sliced in the same way.

In [67]: obj[2:4]

Out[67]:

3    5
4    3
dtype: int64

The DataFrame created earlier can also be sliced, which selects rows by position.

In [81]: frame

Out[81]:

             city  weather  temperature
first   Chongqing    rainy            9
second    Chengdu  sunshaw            5
third     Beijing     snow           -3

In [82]: frame[0:1]

Out[82]:

            city weather  temperature
first  Chongqing   rainy            9

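A single row can also be accessed through ix. (ix exists only in older pandas versions such as the one used here; it was deprecated and then removed in pandas 1.0, where iloc selects by position and loc selects by label.)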

In [83]: frame.ix[1]

Out[83]:

city           Chengdu
weather        sunshaw
temperature          5
Name: second, dtype: object
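For current pandas versions, the same selections can be sketched without ix. This example assumes iloc as the positional replacement for ix, and it uses iloc for the Series slice as well so that the positional meaning is explicit; the variable names are illustrative.

```python
import pandas as pd

obj = pd.Series([4, 7, 5, 3], index=[1, 2, 3, 4])

# Positional slice: the elements at positions 2 and 3 (labels 3 and 4).
tail = obj.iloc[2:4]

frame = pd.DataFrame(
    {
        "city": ["Chongqing", "Chengdu", "Beijing"],
        "weather": ["rainy", "sunshaw", "snow"],
        "temperature": [9, 5, -3],
    },
    index=["first", "second", "third"],
)

first_rows = frame[0:1]          # slicing a DataFrame selects rows
second_row = frame.iloc[1]       # row at position 1
same_row = frame.loc["second"]   # the same row, selected by label

print(list(tail.index))   # [3, 4]
print(second_row.name)    # second
```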

3. Arithmetic operations and data alignment

When two objects are added and their indexes differ, the index of the result is the union of the two indexes. In the two Series below, only the label 'a' appears in both, so only 'a' gets a computed sum; every other label is NaN.

In [84]: s1 = Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

In [85]: s2 = Series([5, 6, 7, 8], index=['x', 'a', 'y', 'z'])

In [86]: s1 + s2

Out[86]:

a    7.0
b    NaN
c    NaN
d    NaN
x    NaN
y    NaN
z    NaN
dtype: float64

4. Filling values in arithmetic methods

As shown above, a label that does not appear in both objects produces NaN after addition. To substitute a fixed value such as 0 for the missing operand, use s1.add(s2, fill_value=0). The missing side is then treated as 0, so each unmatched label keeps the value from the Series that contains it instead of becoming NaN.
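The difference between the + operator and add with fill_value can be seen side by side in the following sketch. It reuses s1 and s2 from the alignment example; the names plain and filled are assumptions made here.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([5, 6, 7, 8], index=["x", "a", "y", "z"])

# Plain addition: labels present in only one Series become NaN.
plain = s1 + s2

# With fill_value=0, the missing operand is treated as 0 before adding.
filled = s1.add(s2, fill_value=0)

print(int(plain.isna().sum()))  # 6
print(filled["b"])              # 2.0
print(filled["x"])              # 5.0
```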

5. Operations between DataFrame and Series

Consider a specific example:

In [111]: frame = DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'), index=['a1', 'a2', 'a3', 'a4'])

In [119]: series = frame.ix[0]

In [120]: series

Out[120]:

b    0
d    1
e    2
Name: a1, dtype: int64

Arithmetic between a DataFrame and a Series matches the index of the Series against the columns of the DataFrame and then applies the operation to every row. Here the Series is subtracted from each row:

In [122]: frame

Out[122]:

    b   d   e
a1  0   1   2
a2  3   4   5
a3  6   7   8
a4  9  10  11

In [123]: frame - series

Out[123]:

    b  d  e
a1  0  0  0
a2  3  3  3
a3  6  6  6
a4  9  9  9
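The row-wise subtraction can be reproduced with the short sketch below. It assumes a current pandas version and therefore takes the first row with iloc[0] rather than ix[0]; the name diff is introduced for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    columns=list("bde"),
    index=["a1", "a2", "a3", "a4"],
)

# First row as a Series; its index is the column labels b, d, e.
series = frame.iloc[0]

# The Series labels are matched to the columns, then subtracted from each row.
diff = frame - series

print(diff.values.tolist())  # [[0, 0, 0], [3, 3, 3], [6, 6, 6], [9, 9, 9]]
```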

6. Sorting and ranking

To sort by the row or column index, use the sort_index method, which returns a new, sorted object.

In [133]: frame

Out[133]:

    e   c   d
a3  0   1   2
a2  3   4   5
a0  6   7   8
a1  9  10  11

Sort by the row index:

In [134]: frame.sort_index()

Out[134]:

    e   c   d
a0  6   7   8
a1  9  10  11
a2  3   4   5
a3  0   1   2

Sort by the column index:

In [135]: frame.sort_index(axis=1)

Out[135]:

     c   d  e
a3   1   2  0
a2   4   5  3
a0   7   8  6
a1  10  11  9

To sort by the values in a particular column, pass the column name through the by parameter of sort_values. In the older pandas version used here, sort_index(by=...) has the same effect, but that form is deprecated in favor of sort_values.

In [139]: frame.sort_values(by='d')

Out[139]:

    e   c   d
a3  0   1   2
a2  3   4   5
a0  6   7   8
a1  9  10  11
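The three kinds of sorting can be sketched together as follows. Because column d is already in ascending order in this frame, the sketch sorts it in descending order (ascending=False) so that the effect of sort_values is visible; that argument and the variable names are additions made for this example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=["a3", "a2", "a0", "a1"],
    columns=["e", "c", "d"],
)

by_rows = frame.sort_index()                         # sort by row labels
by_cols = frame.sort_index(axis=1)                   # sort by column labels
by_d = frame.sort_values(by="d", ascending=False)    # sort by values in column d

print(list(by_rows.index))    # ['a0', 'a1', 'a2', 'a3']
print(list(by_cols.columns))  # ['c', 'd', 'e']
print(by_d["d"].tolist())     # [11, 8, 5, 2]
```

Each call returns a new DataFrame, so frame itself keeps its original order.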
