pandas is built on top of NumPy, so most NumPy functions and methods also work on pandas objects.
pandas has two core data structures: Series and DataFrame.
A Series is a one-dimensional, array-like object that holds a sequence of values together with an associated set of labels, called its index.
import pandas as pd
object = pd.Series([2, 5, 8, 9])
print(object)
The result is:
0    2
1    5
2    8
3    9
dtype: int64
The output shows the index labels on the left and the values on the right.
The values and index attributes give access to each part:
print(object.values)
print(object.index)
The result is:
[2 5 8 9]
RangeIndex(start=0, stop=4, step=1)
We can also supply our own labels:
object = pd.Series([2, 5, 8, 9], index=['a', 'b', 'c', 'd'])
print(object)
The result is:
a    2
b    5
c    8
d    9
dtype: int64
We can also filter a Series with a boolean condition:
print(object[object > 5])
The result is:
c    8
d    9
dtype: int64
You can also think of a Series as a dictionary that maps labels to values, and use in to test whether a label is present:
print('a' in object)
The result is:
True
However, in checks the index labels, not the values, so a value is not found this way:
print(2 in object)
The result is:
False
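As a small sketch of the difference between testing labels and testing values, the example below uses a hypothetical Series named s (not one from the text). It assumes that checking the values through s.values or s.isin is what you want when looking for a value.

```python
import pandas as pd

# Hypothetical example Series, mirroring the labelled one above
s = pd.Series([2, 5, 8, 9], index=["a", "b", "c", "d"])

label_present = "a" in s                       # `in` looks at the index labels
value_as_label = 2 in s                        # 2 is a value, not a label
value_present = bool(2 in s.values)            # search the underlying array
value_present_isin = bool(s.isin([2]).any())   # same check using isin

print(label_present, value_as_label, value_present, value_present_isin)
```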
Some useful Series methods and attributes:
isnull and notnull detect missing values in the data.
The name and index.name attributes set a name for the Series and for its index.
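A minimal sketch of these methods and attributes follows. The Series s, the name "income", and the index name "label" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series([3000.0, np.nan, 4500.0], index=["a", "b", "c"])

missing = s.isnull()    # True where a value is missing
present = s.notnull()   # the opposite mask

s.name = "income"       # name of the Series itself
s.index.name = "label"  # name of its index

print(missing)
print(s)
```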
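A DataFrame is a two-dimensional, table-like data structure, similar to the data frame in R.
data = {'year': [2000, 2001, 2002, 2003],
        'income': [3000, 3500, 4500, 6000]}
# Use a separate name so that the dict in data can be reused below
frame = pd.DataFrame(data)
print(frame)
The result is (recent pandas versions keep the order of the dict keys; older versions sorted the columns alphabetically):
   year  income
0  2000    3000
1  2001    3500
2  2002    4500
3  2003    6000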
data1 = pd.DataFrame(data, columns=['year', 'income', 'outcome'],
                     index=['a', 'b', 'c', 'd'])
print(data1)
The result is:
   year  income outcome
a  2000    3000     NaN
b  2001    3500     NaN
c  2002    4500     NaN
d  2003    6000     NaN
The outcome column does not exist in data, so it is filled with NaN (missing) values.
There are several ways to index a DataFrame.
print(data1['year'])
print(data1.year)
Both forms are equivalent and select a column. The result is:
a    2000
b    2001
c    2002
d    2003
Name: year, dtype: int64
Rows are selected by label with loc (the older ix indexer has been removed from recent pandas versions):
print(data1.loc['a'])
The result is:
year       2000
income     3000
outcome     NaN
Name: a, dtype: object
Rows can also be selected with a positional slice:
print(data1[1:3])
The result is:
   year  income outcome
b  2001    3500     NaN
c  2002    4500     NaN
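In current pandas, rows are usually selected with loc (by label) and iloc (by position). The sketch below contrasts the two on a small hypothetical frame df. Note that a label slice includes its end point, while a positional slice does not.

```python
import pandas as pd

# Hypothetical frame with the same year/income data as above
df = pd.DataFrame(
    {"year": [2000, 2001, 2002, 2003], "income": [3000, 3500, 4500, 6000]},
    index=["a", "b", "c", "d"],
)

row_by_label = df.loc["a"]       # select a row by its label
row_by_position = df.iloc[0]     # select the same row by position

label_slice = df.loc["b":"c"]    # label slice: end label is included
position_slice = df.iloc[1:3]    # positional slice: end position is excluded

print(row_by_label)
print(label_slice)
```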
Adding and removing columns
import numpy as np
data1['money'] = np.arange(4)
After adding the money column, the result is:
   year  income outcome  money
a  2000    3000     NaN      0
b  2001    3500     NaN      1
c  2002    4500     NaN      2
d  2003    6000     NaN      3
del data1['outcome']
After deleting the outcome column, the result is:
   year  income  money
a  2000    3000      0
b  2001    3500      1
c  2002    4500      2
d  2003    6000      3
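del changes the DataFrame in place. As an alternative sketch, drop with the columns argument returns a new frame and leaves the original untouched. The frame df and the variable trimmed are hypothetical names used only here.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"year": [2000, 2001], "income": [3000, 3500]}, index=["a", "b"])

df["money"] = np.arange(2)             # add a column in place
trimmed = df.drop(columns=["money"])   # returns a copy without the column

print(df.columns.tolist())       # the original still has 'money'
print(trimmed.columns.tolist())
```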
pandas has several main Index object types, each with its own methods and properties.
The reindex method builds a new object whose rows conform to a new index:
data = {'year': [2000, 2001, 2002, 2003],
        'income': [3000, 3500, 4500, 6000]}
data1 = pd.DataFrame(data, columns=['year', 'income', 'outcome'],
                     index=['a', 'b', 'c', 'd'])
data2 = data1.reindex(['a', 'b', 'c', 'd', 'e'])
print(data2)
The new label e has no data in data1, so its row is filled with NaN.
data2 = data1.reindex(['a', 'b', 'c', 'd', 'e'], method='ffill')
print(data2)
With method='ffill', row e is filled forward from the nearest preceding label, d.
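The effect of reindex is easier to see on a small Series. The sketch below uses a hypothetical Series s and compares a plain reindex, forward filling, and a constant fill_value. Forward filling assumes the original index is sorted.

```python
import pandas as pd

# Hypothetical Series with a sorted index
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
new_index = ["a", "b", "c", "d"]

plain = s.reindex(new_index)                    # 'd' has no data: NaN
filled = s.reindex(new_index, method="ffill")   # 'd' takes the value of 'c'
zeroed = s.reindex(new_index, fill_value=0)     # 'd' gets a constant

print(plain)
print(filled)
print(zeroed)
```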
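Dropping and filtering by index
print(data1.drop(['a']))
This returns a copy of data1 without row a.
print(data1[data1['year'] > 2001])
This keeps only the rows whose year is greater than 2001 (rows c and d).
print(data1.loc[['a', 'b'], ['year', 'income']])
This selects rows a and b and the columns year and income.
print(data1.loc[data1.year > 2000, data1.columns[:2]])
This selects the rows whose year is greater than 2000 and the first two columns.
loc accepts single labels, lists of labels, slices, and boolean arrays for both rows and columns.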
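The selections above can be checked on a small self-contained frame. This is only a sketch; df and the result names are hypothetical.

```python
import pandas as pd

# Hypothetical frame with the same year/income data as above
df = pd.DataFrame(
    {"year": [2000, 2001, 2002, 2003], "income": [3000, 3500, 4500, 6000]},
    index=["a", "b", "c", "d"],
)

dropped = df.drop(["a"])                          # copy without row 'a'
recent = df[df["year"] > 2001]                    # boolean row filter
subset = df.loc[["a", "b"], ["year", "income"]]   # rows and columns by label
mixed = df.loc[df["year"] > 2000, df.columns[:1]] # boolean rows, first column

print(dropped)
print(recent)
print(subset)
print(mixed)
```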
Arithmetic operations on DataFrames
data = {'year': [2000, 2001, 2002, 2003],
        'income': [3000, 3500, 4500, 6000]}
data1 = pd.DataFrame(data, columns=['year', 'income', 'outcome'],
                     index=['a', 'b', 'c', 'd'])
data2 = pd.DataFrame(data, columns=['year', 'income', 'outcome'],
                     index=['a', 'b', 'c', 'd'])
data1['outcome'] = range(1, 5)
data2 = data2.reindex(['a', 'b', 'c', 'd', 'e'])
print(data1.add(data2, fill_value=0))
With fill_value=0, a value that is missing on only one side is treated as 0 before the addition; where both sides are missing, the result is still NaN.
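A smaller sketch makes the role of fill_value visible. The frames left and right are hypothetical; the plain + operator is shown for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical frames whose indexes only partly overlap
left = pd.DataFrame({"x": [1.0, 2.0]}, index=["a", "b"])
right = pd.DataFrame({"x": [10.0, np.nan, 30.0, np.nan]}, index=["a", "b", "c", "d"])

plain = left + right                      # any missing operand gives NaN
filled = left.add(right, fill_value=0)    # a value missing on one side counts as 0

print(plain)
print(filled)
```

In filled, row b is 2.0 and row c is 30.0, while row d stays NaN because it is missing on both sides.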
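Sorting a DataFrame
data = pd.DataFrame(np.arange(10).reshape((2, 5)), index=['c', 'a'],
                    columns=['one', 'four', 'two', 'three', 'five'])
print(data)
This creates a table with 2 rows and 5 columns holding the integers 0 to 9, with the rows labelled c and a.
print(data.sort_index())
sort_index() sorts the rows by index label, so row a comes before row c.
print(data.sort_index(axis=1))
With axis=1, the columns are sorted by name instead.
print(data.sort_values(by='one'))
sort_values() sorts the rows by the values in the given column, in ascending order by default.
print(data.sort_values(by='one', ascending=False))
Setting ascending=False sorts in descending order.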
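The sorting calls can be verified with a short sketch. The column names below, in particular 'two', are an assumption made so that every column label is unique.

```python
import numpy as np
import pandas as pd

# Column names are assumed; they must be unique for sort_values(by=...) to work
df = pd.DataFrame(
    np.arange(10).reshape((2, 5)),
    index=["c", "a"],
    columns=["one", "four", "two", "three", "five"],
)

by_row_label = df.sort_index()                      # rows ordered a, c
by_col_label = df.sort_index(axis=1)                # columns ordered by name
descending = df.sort_values(by="one", ascending=False)

print(by_row_label)
print(by_col_label)
print(descending)
```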
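Summary and descriptive statistics
data = pd.DataFrame(np.arange(10).reshape((2, 5)), index=['c', 'a'],
                    columns=['one', 'four', 'two', 'three', 'five'])
print(data.describe())
describe() reports summary statistics (count, mean, std, min, quartiles, max) for each column.
print(data.sum())
sum() adds up the values in each column.
print(data.sum(axis=1))
With axis=1, the sum is taken across each row instead.
Reduction methods accept options such as axis and skipna, and pandas offers related descriptive statistics such as mean, median, min, max, std, and var.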
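As a quick check of the reductions, the sketch below computes column and row sums on the same 2x5 table. The column name 'two' is again an assumption.

```python
import numpy as np
import pandas as pd

# Same 2x5 table of the integers 0..9; column names are assumed
df = pd.DataFrame(
    np.arange(10).reshape((2, 5)),
    index=["c", "a"],
    columns=["one", "four", "two", "three", "five"],
)

col_sums = df.sum()          # one value per column
row_sums = df.sum(axis=1)    # one value per row
summary = df.describe()      # count, mean, std, min, quartiles, max

print(col_sums)
print(row_sums)
print(summary)
```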
Correlation coefficients and covariance
data = pd.DataFrame(np.random.random(20).reshape((4, 5)), index=['c', 'a', 'b', 'c'],
                    columns=['one', 'four', 'two', 'three', 'five'])
print(data)
The values are random, so the numbers below will differ from run to run.
print(data.one.corr(data.three))
The correlation coefficient of one and three is:
0.706077105725
print(data.one.cov(data.three))
The covariance of one and three is:
0.0677896135613
print(data.corrwith(data.one))
This gives the correlation coefficient of one with every column.
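Because random data gives different numbers on every run, a sketch with fixed, made-up values is easier to verify by hand. Here y is exactly twice x and z decreases as x increases.

```python
import pandas as pd

# Hypothetical fixed data: y = 2 * x, z moves in the opposite direction
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]})

r_xy = df["x"].corr(df["y"])      # perfectly correlated
r_xz = df["x"].corr(df["z"])      # perfectly anti-correlated
cov_xy = df["x"].cov(df["y"])     # sample covariance, 10 / 3
all_r = df.corrwith(df["x"])      # correlation of x with every column

print(r_xy, r_xz, cov_xy)
print(all_r)
```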
Unique values, membership, and related methods
data = pd.Series(['a', 'a', 'b', 'b', 'b', 'c', 'd', 'd'])
print(data.unique())
The result is:
['a' 'b' 'c' 'd']
print(data.isin(['b']))
The result is:
0    False
1    False
2     True
3     True
4     True
5    False
6    False
7    False
dtype: bool
print(data.value_counts(sort=False))
The result is (the row order and the footer line may differ between pandas versions):
a    2
b    3
c    1
d    2
dtype: int64
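The three methods can be combined in one short sketch. By default value_counts sorts by count in descending order, so the most frequent value comes first. The variable names are hypothetical.

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "b", "b", "c", "d", "d"])

distinct = s.unique()            # distinct values in order of appearance
is_b_or_c = s.isin(["b", "c"])   # boolean mask of membership
counts = s.value_counts()        # occurrences of each value, most frequent first

print(distinct)
print(is_b_or_c.sum())
print(counts)
```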
Handling missing values
data = pd.Series(['a', 'a', 'b', np.nan, 'b', 'c', np.nan, 'd'])
print(data.isnull())
The result is:
0    False
1    False
2    False
3     True
4    False
5    False
6     True
7    False
dtype: bool
print(data.dropna())
The result is:
0    a
1    a
2    b
4    b
5    c
7    d
dtype: object
print(data.ffill())
The result is:
0    a
1    a
2    b
3    b
4    b
5    c
6    c
7    d
dtype: object
print(data.fillna(0))
The result is:
0    a
1    a
2    b
3    0
4    b
5    c
6    0
7    d
dtype: object
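For numeric data, a common choice is to fill gaps with a statistic such as the mean. The sketch below uses a hypothetical numeric Series and is only one possible strategy, not a recommendation for every data set.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric Series with two gaps
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

n_missing = int(s.isnull().sum())   # number of missing values
dropped = s.dropna()                # remove the gaps
carried = s.ffill()                 # carry the previous value forward
mean_filled = s.fillna(s.mean())    # mean() skips NaN by default

print(n_missing)
print(dropped.tolist())
print(carried.tolist())
print(mean_filled.tolist())
```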
Hierarchical indexes
A hierarchical index (MultiIndex) lets you index data on more than one level.
data = pd.Series(np.random.randn(10),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])
print(data)
This prints ten random values, with the outer labels a to d and the inner labels 1 to 3.
print(data.index)
The result, as printed by older pandas versions (newer versions list the label pairs instead), is:
MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])
print(data['c'])
This selects all values whose outer label is c.
print(data[:, 2])
This selects all values whose inner label is 2.
print(data.unstack())
unstack() pivots the inner index level into columns, turning the Series into a DataFrame.
print(data.unstack().stack())
stack() is the inverse of unstack().
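A sketch with fixed values shows how the two levels behave. The Series s is hypothetical, and loc is used for the selections here as an explicit label-based alternative to plain square brackets.

```python
import pandas as pd

# Hypothetical Series with a two-level index and fixed values
s = pd.Series([0, 1, 2, 3], index=[["a", "a", "b", "b"], [1, 2, 1, 2]])

outer_a = s.loc["a"]       # all entries under outer label 'a'
inner_2 = s.loc[:, 2]      # entries with inner label 2, across outer labels

wide = s.unstack()         # inner level becomes the columns
round_trip = wide.stack()  # back to a Series with a two-level index

print(outer_a)
print(inner_2)
print(wide)
print(round_trip)
```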
With these basics, you should be able to handle routine data-processing tasks.