Pandas has two main data structures: Series and DataFrame.
A Series is a one-dimensional, array-like object made up of a set of values and an associated set of labels, called its index. Here is how it is used:
In [1]: from pandas import Series, DataFrame
In [2]: import pandas as pd
In [3]: obj = Series([4, 7, -5, 3])
In [4]: import numpy as np
In [5]: obj
Out[5]:
0    4
1    7
2   -5
3    3
dtype: int64
When a Series is displayed, the index appears on the left and the values on the right. If no index is specified, a default integer index from 0 to N-1 is created. The values and the index are available through the values and index attributes.
In [6]: obj.values
Out[6]: array([ 4,  7, -5,  3])
In [7]: obj.index
Out[7]: RangeIndex(start=0, stop=4, step=1)
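The steps so far can be condensed into a short script. This is a minimal sketch that uses the pd alias instead of importing Series directly; the variable names values and labels are only illustrative.

```python
import pandas as pd

# Build a Series without an explicit index; pandas assigns 0..n-1.
obj = pd.Series([4, 7, -5, 3])

values = obj.values.tolist()   # underlying data as a plain Python list
labels = list(obj.index)       # default integer labels

print(values)
print(labels)
```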
To use custom labels, pass them through the index argument when creating the Series.
In [8]: obj2 = Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])
In [9]: obj2
Out[9]:
a    4
b    7
c   -5
d    3
dtype: int64
A value can then be looked up by its label:
In [10]: obj2['a']
Out[10]: 4
NumPy array operations also preserve the link between index and value:
In [12]: np.exp(obj2)
Out[12]:
a      54.598150
b    1096.633158
c       0.006738
d      20.085537
dtype: float64
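As a self-contained sketch of label lookup and index-preserving NumPy operations, the snippet below rebuilds obj2 and applies np.exp to it. The names first and scaled are illustrative only.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

first = obj2["a"]        # look up a single value by its label
scaled = np.exp(obj2)    # element-wise function; the labels are carried over

print(first)
print(list(scaled.index))
```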
If the data is already in a dictionary, you can create a Series directly from it. The dictionary keys become the index.
In [13]: data = {'name': 'ZHF', 'age': 33, 'city': 'Chengdu'}
In [14]: obj3 = Series(data)
In [15]: obj3
Out[15]:
age          33
city    Chengdu
name        ZHF
dtype: object
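The sketch below repeats the dictionary example and also shows what happens when an explicit index is passed alongside the dictionary. The label "country" and the name obj4 are made up for illustration. Whether the keys come out sorted or in insertion order depends on the pandas version, so the example does not rely on the order.

```python
import pandas as pd

data = {"name": "ZHF", "age": 33, "city": "Chengdu"}
obj3 = pd.Series(data)

# Passing index= selects and orders the keys; a label missing from the
# dictionary ("country" is a hypothetical label) gets NaN.
obj4 = pd.Series(data, index=["city", "age", "country"])

print(obj3["age"])
print(obj4.isna().tolist())
```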
DataFrame:
A DataFrame is a tabular data structure. It has both a row index and a column index, and can be viewed as a dictionary of Series that share the same index.
In [25]: data = {'city': ['Chongqing', 'Chengdu', 'Beijing'], 'weather': ['rainy', 'sunshaw', 'snow'], 'temperature': [9, 5, -3]}
In [26]: frame = DataFrame(data)
In [27]: frame
Out[27]:
        city  temperature  weather
0  Chongqing            9    rainy
1    Chengdu            5  sunshaw
2    Beijing           -3     snow
In this pandas version the columns come out in sorted order rather than in the order used when defining data. To get a specific column order, pass it through the columns argument of DataFrame:
In [28]: frame = DataFrame(data, columns=['city', 'weather', 'temperature'])
In [29]: frame
Out[29]:
        city  weather  temperature
0  Chongqing    rainy            9
1    Chengdu  sunshaw            5
2    Beijing     snow           -3
The row index can be specified in the same way:
In [30]: frame = DataFrame(data, columns=['city', 'weather', 'temperature'], index=['first', 'second', 'third'])
In [31]: frame
Out[31]:
             city  weather  temperature
first   Chongqing    rainy            9
second    Chengdu  sunshaw            5
third     Beijing     snow           -3
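Putting the columns and index arguments together, a minimal standalone version of the construction above might look like this (using the pd alias rather than the direct DataFrame import):

```python
import pandas as pd

data = {
    "city": ["Chongqing", "Chengdu", "Beijing"],
    "weather": ["rainy", "sunshaw", "snow"],
    "temperature": [9, 5, -3],
}

# columns fixes the column order; index supplies the row labels.
frame = pd.DataFrame(
    data,
    columns=["city", "weather", "temperature"],
    index=["first", "second", "third"],
)

print(list(frame.columns))
print(frame.shape)
```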
With an index in place, rows and columns can be accessed by label.
Access a column by name:
In [33]: frame.city
Out[33]:
first     Chongqing
second      Chengdu
third       Beijing
Name: city, dtype: object
Access a row by label:
In [41]: frame.loc['first']
Out[41]:
city           Chongqing
weather            rainy
temperature            9
Name: first, dtype: object
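The following sketch shows the two access paths side by side on a smaller, two-row frame built only for illustration. Bracket access (frame["city"]) is shown next to attribute access because it also works for column names that are not valid Python identifiers.

```python
import pandas as pd

frame = pd.DataFrame(
    {"city": ["Chongqing", "Chengdu"], "temperature": [9, 5]},
    index=["first", "second"],
)

by_attr = frame.city        # attribute access to a column
by_key = frame["city"]      # bracket access, works for any column name
row = frame.loc["first"]    # one row, returned as a Series

print(by_attr.equals(by_key))
print(row["temperature"])
```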
Another common input format is a nested dictionary (a dictionary of dictionaries).
The outer keys become the columns and the inner keys become the row index.
In [42]: pop = {'cost': {2016: 3000, 2017: 3400, 2018: 5000}, 'need': {2017: 4000, 2018: 6000}}
In [43]: frame3 = DataFrame(pop)
In [44]: frame3
Out[44]:
      cost    need
2016  3000     NaN
2017  3400  4000.0
2018  5000  6000.0
The result can also be transposed:
In [45]: frame3.T
Out[45]:
        2016    2017    2018
cost  3000.0  3400.0  5000.0
need     NaN  4000.0  6000.0
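A compact sketch of the nested-dictionary case: it checks that the inner key missing from 'need' turns into NaN and that transposing swaps rows and columns. The names missing and transposed are illustrative.

```python
import pandas as pd

pop = {
    "cost": {2016: 3000, 2017: 3400, 2018: 5000},
    "need": {2017: 4000, 2018: 6000},
}
frame3 = pd.DataFrame(pop)

missing = pd.isna(frame3.loc[2016, "need"])   # no 2016 entry under "need"
transposed = frame3.T                         # rows and columns swapped

print(missing)
print(transposed.shape)
```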
Basic Features:
Reindexing
First, look at the Series created earlier. Its index attribute returns an Index object. Try to modify that object with index[1] = 'a':
In [50]: obj.index
Out[50]: RangeIndex(start=0, stop=4, step=1)
In [51]: index = obj.index
In [52]: index[1] = 'a'
This raises the error below, "Index does not support mutable operations". An Index object is immutable, so it cannot be modified this way.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-336c3a4c2807> in <module>()
----> 1 index[1]='a'

/usr/local/lib/python2.7/dist-packages/pandas/core/indexes/base.pyc in __setitem__(self, key, value)
   1722
   1723     def __setitem__(self, key, value):
-> 1724         raise TypeError("Index does not support mutable operations")
   1725
   1726     def __getitem__(self, key):

TypeError: Index does not support mutable operations
To conform a Series to a new index, use the obj.reindex method. It returns a new object; any label that was not in the original index gets NaN.
In [ ]: obj.reindex(['a', 'b', 'c', 'd', 'e'])
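To make the behaviour concrete, here is a minimal sketch of both points: the Index refuses item assignment, and reindex returns a new, conformed Series. It assumes a Series that already carries the labels 'a' to 'd' (unlike the default-indexed obj above) so that the effect on existing and new labels is visible. The fill_value argument is a standard reindex parameter, shown here as an optional extra.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

try:
    obj.index[1] = "x"      # Index objects are immutable
    mutated = True
except TypeError:
    mutated = False

# reindex returns a new Series; "e" is a new label, so its value is NaN.
obj_re = obj.reindex(["a", "b", "c", "d", "e"])

# fill_value supplies a default for new labels instead of NaN.
obj_filled = obj.reindex(["a", "b", "c", "d", "e"], fill_value=0)

print(mutated)
print(obj_filled["e"])
```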
Dropping entries from an axis
The drop method discards one or more entries and returns a new object. Its argument is the index label (or a list of labels) to remove.
In [64]: obj
Out[64]:
1    4
2    7
3    5
4    3
dtype: int64
In [65]: new = obj.drop(1)
In [66]: new
Out[66]:
2    7
3    5
4    3
dtype: int64
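A short sketch of drop on both a Series and a DataFrame. The small frame and the names fewer and no_d are made up for illustration; axis=1 is the standard way to drop a column rather than a row.

```python
import pandas as pd

obj = pd.Series([4, 7, 5, 3], index=[1, 2, 3, 4])
new = obj.drop(1)              # remove the entry labelled 1
fewer = obj.drop([2, 3])       # a list drops several labels at once

frame = pd.DataFrame({"b": [0, 3], "d": [1, 4]}, index=["a1", "a2"])
no_d = frame.drop("d", axis=1)   # axis=1 drops a column instead of a row

print(list(new.index))
print(list(no_d.columns))
```

Note that obj itself is unchanged: drop returns a new object unless asked to work in place.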
Indexing, selection, and filtering
Python lists and tuples can be sliced to pick out the elements we want, and pandas objects support slicing in the same way.
In [67]: obj[2:4]
Out[67]:
3    5
4    3
dtype: int64
The DataFrame created earlier can also be sliced by row position.
In [81]: frame
Out[81]:
             city  weather  temperature
first   Chongqing    rainy            9
second    Chengdu  sunshaw            5
third     Beijing     snow           -3
In [82]: frame[0:1]
Out[82]:
            city weather  temperature
first  Chongqing   rainy            9
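A single row can be accessed through ix. Note that ix was removed in pandas 1.0; on current versions, frame.iloc[1] selects the same row by position.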
In [83]: frame.ix[1]
Out[83]:
city           Chengdu
weather        sunshaw
temperature          5
Name: second, dtype: object
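For readers on a current pandas release, the same selections can be sketched with iloc (position-based) and loc (label-based). The two-column frame is a cut-down version of the one above, and the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    {"city": ["Chongqing", "Chengdu", "Beijing"], "temperature": [9, 5, -3]},
    index=["first", "second", "third"],
)

head = frame[0:1]                           # positional row slice, end excluded
by_pos = frame.iloc[1]                      # second row by integer position
by_label = frame.loc["second"]              # the same row by label
label_slice = frame.loc["first":"second"]   # label slices include the end point

print(by_pos["city"])
print(len(label_slice))
```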
Arithmetic operations and data alignment
When two objects are added and their indexes differ, the index of the result is the union of the two indexes. In the two Series below, only the label 'a' appears in both, so only 'a' gets a sum; every other label becomes NaN.
In [84]: s1 = Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
In [85]: s2 = Series([5, 6, 7, 8], index=['x', 'a', 'y', 'z'])
In [86]: s1 + s2
Out[86]:
a    7.0
b    NaN
c    NaN
d    NaN
x    NaN
y    NaN
z    NaN
dtype: float64
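The alignment rule can be checked with a few lines. This sketch rebuilds s1 and s2 and uses dropna, a standard Series method, to keep only the labels that matched; the names total and matched are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([5, 6, 7, 8], index=["x", "a", "y", "z"])

total = s1 + s2            # values are paired by label, not by position
matched = total.dropna()   # keep only labels present in both Series

print(total["a"])
print(list(matched.index))
```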
Filling values in arithmetic methods
As shown above, a label that is missing from one operand produces NaN in the result. To substitute a fixed value such as 0 for the missing operand, use s1.add(s2, fill_value=0). Each label found in only one Series is then computed as if the other Series held 0, so the result contains no NaN.
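A minimal sketch of the fill_value behaviour, reusing the same two Series. Note that the result for a label found in only one Series is that Series' own value (for example 2 for 'b'), not 0.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([5, 6, 7, 8], index=["x", "a", "y", "z"])

# Missing operands are treated as 0 before the addition is performed.
filled = s1.add(s2, fill_value=0)

print(filled["a"], filled["b"], filled["x"])
```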
Operations between DataFrame and Series
Take a look at a specific example:
In [111]: frame = DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'), index=['a1', 'a2', 'a3', 'a4'])
In [119]: series = frame.ix[0]
In [120]: series
Out[120]:
b    0
d    1
e    2
Name: a1, dtype: int64
By default, arithmetic between a DataFrame and a Series matches the index of the Series against the columns of the DataFrame and broadcasts down the rows. Here the Series is subtracted from every row:
In [122]: frame
Out[122]:
    b   d   e
a1  0   1   2
a2  3   4   5
a3  6   7   8
a4  9  10  11
In [123]: frame - series
Out[123]:
    b  d  e
a1  0  0  0
a2  3  3  3
a3  6  6  6
a4  9  9  9
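The sketch below reproduces the row-wise subtraction and, as an addition not covered in the text, shows the column-wise counterpart using the sub method with axis=0. It uses iloc in place of ix so that it runs on current pandas versions; the names by_row, col and by_col are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    columns=list("bde"),
    index=["a1", "a2", "a3", "a4"],
)
series = frame.iloc[0]            # first row; iloc stands in for ix here

# The Series index is matched to the columns and applied to every row.
by_row = frame - series

# To match on the row index instead, use the method form with axis=0.
col = frame["d"]
by_col = frame.sub(col, axis=0)

print(by_row.loc["a2"].tolist())
print(by_col["b"].tolist())
```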
Sorting and ranking
To sort by row or column index, use the sort_index method, which returns a new, sorted object.
In [133]: frame
Out[133]:
    e   c   d
a3  0   1   2
a2  3   4   5
a0  6   7   8
a1  9  10  11
Sort by row index:
In [134]: frame.sort_index()
Out[134]:
    e   c   d
a0  6   7   8
a1  9  10  11
a2  3   4   5
a3  0   1   2
Sort by column index:
In [135]: frame.sort_index(axis=1)
Out[135]:
     c   d  e
a3   1   2  0
a2   4   5  3
a0   7   8  6
a1  10  11  9
To sort by the values in a particular column, pass the column name through the by argument of sort_values. In the older pandas version used here, sort_index(by=...) had the same effect; on current versions use sort_values.
In [139]: frame.sort_values(by='d')
Out[139]:
    e   c   d
a3  0   1   2
a2  3   4   5
a0  6   7   8
a1  9  10  11
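The three sorting calls can be sketched together as follows. Because column d is already in ascending order in this frame, the example sorts it in descending order (ascending=False, a standard sort_values parameter) so that the effect is visible. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    columns=["e", "c", "d"],
    index=["a3", "a2", "a0", "a1"],
)

rows_sorted = frame.sort_index()                    # order by row labels
cols_sorted = frame.sort_index(axis=1)              # order by column labels
desc = frame.sort_values(by="d", ascending=False)   # order by values in column d

print(list(rows_sorted.index))
print(list(cols_sorted.columns))
print(desc["d"].tolist())
```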
Python data analysis with pandas: an introduction to basic features