Group calculation:
Group by: split–apply–combine
Split: the data is partitioned into groups under some condition
Apply: a function is applied independently to each group
Combine: the results are combined into a data structure
pokemon = pd.read_csv('pokemon.csv')  # read the file
pokemon['Type 1'].value_counts()      # frequency of each type
grouped1 = pokemon.groupby('Type 1')  # group by Type 1
Calculation:
1. Average: mean()
grouped1.mean()
2. Average of a single column
grouped1['HP'].mean()
3. Sum
grouped1.sum()
4. Median
grouped1.median()
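The calls above can be sketched on a tiny made-up frame (the column names mimic pokemon.csv, but the values are invented for illustration):

```python
import pandas as pd

# hypothetical mini dataset standing in for pokemon.csv
df = pd.DataFrame({
    'Type 1': ['Grass', 'Grass', 'Fire', 'Fire'],
    'HP':     [45,      60,      39,     58],
})

g = df.groupby('Type 1')   # split
hp_mean = g['HP'].mean()   # apply: mean per group, then combine into a Series
hp_sum = g['HP'].sum()     # apply: sum per group
```

Here `hp_mean['Grass']` is (45 + 60) / 2 = 52.5 and `hp_sum['Fire']` is 39 + 58 = 97.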
Grouping, method two: by multiple columns
grouped2 = pokemon.groupby(['Type 1', 'Type 2'])
In calculations, multiple functions can be applied:
grouped2.aggregate(np.mean)
grouped2.aggregate([np.mean, np.median])
To view the result for a single column:
grouped2.aggregate([np.mean, np.median, np.sum])['HP']
Different columns with different functions:
grouped2.agg({'HP': np.mean, 'Attack': np.median})
grouped2.agg({'HP': np.mean, 'Attack': [np.median, np.sum]})
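A minimal sketch of the per-column `agg` dict on invented data (string function names like `'mean'` are equivalent to passing `np.mean` and avoid deprecation warnings in newer pandas):

```python
import pandas as pd

df = pd.DataFrame({
    'Type 1': ['Grass', 'Grass', 'Fire'],
    'HP':     [45, 55, 40],
    'Attack': [49, 61, 52],
})

# one function per column: mean for HP, median for Attack
out = df.groupby('Type 1').agg({'HP': 'mean', 'Attack': 'median'})
```

For the 'Grass' group this yields HP = 50.0 and Attack = 55.0.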
View group sizes:
grouped2.size()
To view specific groupings:
grouped2.groups
Get a group:
grouped2.get_group(('Normal', 'Ground'))  # pass the key as a tuple
Print each group's size:
for name, group in grouped2:
    print(name)
    print(group.shape)
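Iterating a multi-key groupby and fetching one group can be sketched on made-up data; note the group name is a tuple of the key values:

```python
import pandas as pd

df = pd.DataFrame({
    'Type 1': ['Normal', 'Normal', 'Ground'],
    'Type 2': ['Ground', 'Ground', 'Rock'],
    'HP':     [50, 60, 70],
})
g = df.groupby(['Type 1', 'Type 2'])

sizes = {name: group.shape[0] for name, group in g}  # name is a ('Type 1', 'Type 2') tuple
sub = g.get_group(('Normal', 'Ground'))              # the key must be passed as a tuple
```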
Standardize the data (to keep values from getting too large):
For a numeric column: subtract the column mean, then divide by the column's standard deviation
zscore = lambda s: (s - s.mean()) / s.std()
grouped1.transform(zscore)
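A self-contained sketch of the per-group z-score transform on invented numbers:

```python
import pandas as pd

df = pd.DataFrame({
    'Type 1': ['Fire', 'Fire', 'Water', 'Water'],
    'HP':     [40, 60, 30, 50],
})

# subtract each group's mean and divide by its (sample) standard deviation
zscore = lambda s: (s - s.mean()) / s.std()
z = df.groupby('Type 1')['HP'].transform(zscore)
```

`transform` returns a Series aligned with the original rows; the two Fire rows become roughly -0.707 and +0.707.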
Filter:
Some groups have too many samples!
# keep only the groups with fewer than 10 samples
cond1 = lambda s: len(s) < 10
grouped2.filter(cond1).shape
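The filter step can be sketched with a made-up frame where one group is deliberately oversized:

```python
import pandas as pd

# group 'A' has 12 rows, group 'B' has 3
df = pd.DataFrame({
    'Type 1': ['A'] * 12 + ['B'] * 3,
    'HP': range(15),
})

# filter keeps the rows of every group for which the condition is True
small = df.groupby('Type 1').filter(lambda s: len(s) < 10)
```

Only group 'B' survives, so `small` has 3 rows.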
As before, set an index:
pok1 = pokemon.set_index(['Type 1', 'Type 2'])
To group by index:
pok1.groupby(level=[0])
pok1.groupby(level=[0, 1])
pok1.groupby(level=['Type 1', 'Type 2'])
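Grouping by index level, sketched on invented data:

```python
import pandas as pd

df = pd.DataFrame({
    'Type 1': ['Grass', 'Grass', 'Fire'],
    'Type 2': ['Poison', 'Poison', 'Flying'],
    'HP':     [45, 60, 39],
})
pok1 = df.set_index(['Type 1', 'Type 2'])

by_first = pok1.groupby(level=0).sum()       # same as level=['Type 1']
by_both = pok1.groupby(level=[0, 1]).sum()   # group on both index levels
```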
Multi-table operations:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']}, index=[4, 5, 6, 7])
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']}, index=[8, 9, 10, 11])
Data table operations: concatenation
pd.concat([df1, df2])
pd.concat([df1, df2], axis=1)  # axis=1 joins side by side, aligned on the index; axis=0 (the default) stacks rows, aligned by column name
A similar method:
df1.append(df2)  # append joins only one object at a time; deprecated in newer pandas, prefer pd.concat
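A minimal sketch of both concat directions, using small invented frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[2, 3])

rows = pd.concat([df1, df2])          # stack rows, aligned by column name -> 4 rows, 2 columns
cols = pd.concat([df1, df2], axis=1)  # side by side, aligned on the index -> union of indexes, NaN where missing
```

Because the two indexes are disjoint, `cols` has 4 rows and 4 columns, half of them NaN.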
Merging on keys (a primary-key style join):
left = pd.DataFrame({'Key1': ['K0', 'K0', 'K1', 'K2'],
                     'Key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'Key1': ['K0', 'K1', 'K1', 'K2'],
                      'Key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
# how: join type, default 'inner' (equi-join); also 'outer', 'left', 'right'
# on: which key column(s) to join the two sides on
pd.merge(left, right, on='Key1', how='inner')  # join on Key1
pd.merge(left, right, on=['Key1', 'Key2'])
pd.merge(left, right, on=['Key1', 'Key2'], how='left')
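A sketch contrasting `how='inner'` and `how='left'` on invented single-key frames:

```python
import pandas as pd

left = pd.DataFrame({'Key1': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'Key1': ['K0', 'K1', 'K3'], 'C': ['C0', 'C1', 'C3']})

inner = pd.merge(left, right, on='Key1', how='inner')      # only keys present in both: K0, K1
left_join = pd.merge(left, right, on='Key1', how='left')   # all keys from left; K2 gets NaN for C
```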
Rename:
right1 = right.rename(columns={'Key1': 'new_key1', 'Key2': 'new_key2'})
Joining on differently named keys:
pd.merge(left, right1, left_on=['Key1', 'Key2'], right_on=['new_key1', 'new_key2'], how='left')
Joining an index to columns:
left_index = left.set_index(['Key1', 'Key2'])
Prerequisite settings:
# left_index / right_index default to False; True means that side joins on its index
pd.merge(left_index, right1, left_index=True, right_on=['new_key1', 'new_key2'], how='left')
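An index-to-column join can be sketched with a single invented key:

```python
import pandas as pd

left = pd.DataFrame({'Key1': ['K0', 'K1'], 'A': ['A0', 'A1']}).set_index('Key1')
right = pd.DataFrame({'new_key1': ['K0', 'K1'], 'C': ['C0', 'C1']})

# left_index=True: the left side matches on its index; the right side matches on the 'new_key1' column
out = pd.merge(left, right, left_index=True, right_on='new_key1', how='left')
```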
If a CSV file has no column names, you need to define them yourself:
user_info = pd.read_csv('user_info_train.txt', header=None,
                        names=['id', 'sex', 'job', 'education', 'marriage', 'hukou'])  # note names
To view unique values:
id = user_info['id']
id.unique()
len(id.unique())
Pivot rows into columns for display:
a = grouped3['Amountoftrans'].sum()
a.unstack()  # stack() and unstack() are inverse operations!
Rename
a.rename(columns={a.columns[0]: 'Shouru', a.columns[1]: 'Zhichu'}, inplace=True)
Operate on it directly:
a['diff'] = a['Shouru'] - a['Zhichu']
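The unstack / positional-rename / difference steps above can be sketched end to end on an invented transactions frame (the real `grouped3` data is not shown in this section):

```python
import pandas as pd

# grouped sum over two keys, then pivot the inner level ('kind') to columns
a = pd.DataFrame({
    'id':   [1, 1, 2, 2],
    'kind': ['in', 'out', 'in', 'out'],
    'amt':  [10, 4, 7, 3],
}).groupby(['id', 'kind'])['amt'].sum().unstack()

# rename the columns by position, then compute a new column directly
a.rename(columns={a.columns[0]: 'Shouru', a.columns[1]: 'Zhichu'}, inplace=True)
a['diff'] = a['Shouru'] - a['Zhichu']
```

After unstacking, the column order is alphabetical ('in', 'out'), so the diff column is [6, 4].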
Pivot table:
pd.pivot_table(data=pokemon, index='Type 1', columns='Type 2', values=['HP', 'Total'], aggfunc=[np.sum])
pd.pivot_table(data=pokemon, index='Type 1', columns='Type 2', values=['HP', 'Total'], aggfunc=[np.sum, np.mean])
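A runnable sketch of `pivot_table` on an invented frame with the same column names:

```python
import pandas as pd

df = pd.DataFrame({
    'Type 1': ['Grass', 'Grass', 'Fire'],
    'Type 2': ['Poison', 'Poison', 'Flying'],
    'HP':     [45, 60, 39],
})

# rows indexed by Type 1, columns from Type 2, cells = sum of HP
pt = pd.pivot_table(df, index='Type 1', columns='Type 2', values='HP', aggfunc='sum')
```

Cells with no matching (Type 1, Type 2) pair come out as NaN.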
Cross-tabulation:
Calculate frequencies:
pd.crosstab(index=pokemon['Type 1'], columns=pokemon['Type 2'])
pd.crosstab(index=pokemon['Type 1'], columns=pokemon['Type 2'], margins=True)  # margins adds total frequencies
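A sketch of `crosstab` with `margins` on two invented Series:

```python
import pandas as pd

t1 = pd.Series(['Grass', 'Grass', 'Fire'])
t2 = pd.Series(['Poison', 'Poison', 'Flying'])

# counts of each (t1, t2) combination; margins=True adds an 'All' row and column of totals
ct = pd.crosstab(index=t1, columns=t2, margins=True)
```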
Dummy variables:
For categories that carry no numeric meaning and cannot be compared directly:
# within the Type 1 indicator columns, each row has exactly one 1
pd.get_dummies(data=pokemon, columns=['Type 1'])
pd.get_dummies(data=pokemon, columns=['Type 1', 'Type 2'])
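A sketch of one-hot encoding on an invented frame; the original column is replaced by one indicator column per category:

```python
import pandas as pd

df = pd.DataFrame({'Type 1': ['Grass', 'Fire', 'Grass'], 'HP': [45, 39, 60]})
d = pd.get_dummies(df, columns=['Type 1'])  # adds 'Type 1_Fire', 'Type 1_Grass'

# each row has exactly one 1 across the Type 1 indicator columns
row_sums = d.filter(like='Type 1_').sum(axis=1)
```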
"Python Data Analysis", Part 2: Data Calculation