"Python Data Analysis" Part 2: Data Calculation

Source: Internet
Author: User

Grouped calculations:

Groupby follows the split-apply-combine pattern:

Split: divide the data into groups according to some criterion.

Apply: apply a function to each group independently.

Combine: collect the per-group results into a single data structure.
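To make the three steps concrete, the sketch below performs them by hand on a tiny made-up table (the values are invented, not taken from pokemon.csv) and then checks that a single groupby call gives the same answer.

```python
import pandas as pd

# Invented stand-in for the Pokemon data.
df = pd.DataFrame({
    "Type 1": ["Fire", "Water", "Fire", "Grass", "Water"],
    "HP": [39, 44, 58, 45, 59],
})

# Split: one sub-frame per distinct "Type 1" value.
pieces = {key: df[df["Type 1"] == key] for key in df["Type 1"].unique()}

# Apply: compute a statistic on each piece independently.
means = {key: sub["HP"].mean() for key, sub in pieces.items()}

# Combine: gather the per-group results into one Series, sorted by key.
manual = pd.Series(means, name="HP").sort_index()

# groupby performs all three steps in one call (keys are sorted by default).
builtin = df.groupby("Type 1")["HP"].mean()

print(manual)
print(builtin)
```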

import numpy as np
import pandas as pd

pokemon = pd.read_csv('pokemon.csv')    # read the file
pokemon['Type 1'].value_counts()        # number of rows per primary type
grouped1 = pokemon.groupby('Type 1')    # group by primary type

Calculation:

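1. Mean of every column: mean()

grouped1.mean(numeric_only=True)    # without numeric_only=True, pandas >= 2.0 raises a TypeError on text columns such as Name

2. Mean of a single column

grouped1['HP'].mean()

3. Sum

grouped1.sum(numeric_only=True)

4. Median

grouped1.median(numeric_only=True)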
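A minimal sketch of these four calculations on a few invented rows, assuming pandas 2.x. The text column Name is included on purpose, because it is what makes numeric_only=True necessary in recent pandas versions.

```python
import pandas as pd

# Invented rows; "Name" is a text column, as in the real pokemon.csv.
pokemon = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4", "p5"],
    "Type 1": ["Fire", "Water", "Fire", "Grass", "Water"],
    "HP": [39, 44, 38, 45, 50],
    "Attack": [52, 48, 41, 49, 52],
})
grouped1 = pokemon.groupby("Type 1")

# numeric_only=True skips "Name" instead of failing on it.
mean_all = grouped1.mean(numeric_only=True)
hp_mean = grouped1["HP"].mean()              # one column only
totals = grouped1.sum(numeric_only=True)
medians = grouped1.median(numeric_only=True)

print(mean_all, hp_mean, totals, medians, sep="\n\n")
```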

Grouping, method two: by multiple columns

grouped2 = pokemon.groupby(['Type 1', 'Type 2'])

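Apply one or several functions during aggregation (in pandas >= 2.0, text columns such as Name make mean and median fail, so select the numeric columns first, e.g. grouped2[['HP', 'Attack']]):

grouped2.aggregate(np.mean)

grouped2.aggregate([np.mean, np.median])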
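A small sketch with invented data showing the shape of the result when two grouping keys meet a list of functions. It passes the function names as strings ('mean', 'median'); np.mean is also accepted, but recent pandas releases may emit a FutureWarning for NumPy callables, and the strings avoid that.

```python
import pandas as pd

# Invented rows with two type columns.
pokemon = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Normal"],
    "Type 2": ["Poison", "Poison", "Flying", "Flying", "Flying"],
    "HP": [45, 60, 78, 90, 40],
    "Attack": [49, 62, 84, 100, 45],
})
grouped2 = pokemon.groupby(["Type 1", "Type 2"])

# Rows: a MultiIndex of (Type 1, Type 2).
# Columns: a MultiIndex of (original column, function name).
stats = grouped2.aggregate(["mean", "median"])
print(stats)
```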

For a particular column:

grouped2['HP'].aggregate([np.mean, np.median, np.sum])    # select the column first, then aggregate

Apply different functions to different columns:

grouped2.agg({'HP': np.mean, 'Attack': np.median})
grouped2.agg({'HP': np.mean, 'Attack': [np.median, np.sum]})
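A sketch of the dictionary form on invented data. When any dictionary value is a list, pandas returns two-level columns for every entry, so 'HP': 'mean' appears as ('HP', 'mean').

```python
import pandas as pd

# Invented rows, same layout as the previous sketch.
pokemon = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Normal"],
    "Type 2": ["Poison", "Poison", "Flying", "Flying", "Flying"],
    "HP": [45, 60, 78, 90, 40],
    "Attack": [49, 62, 84, 100, 45],
})
grouped2 = pokemon.groupby(["Type 1", "Type 2"])

# One function per column: flat column labels.
one_each = grouped2.agg({"HP": "mean", "Attack": "median"})

# A list for at least one column: (column, function) labels for all.
mixed = grouped2.agg({"HP": "mean", "Attack": ["median", "sum"]})

# Several functions on a single column.
hp_only = grouped2["HP"].agg(["mean", "median", "sum"])

print(one_each, mixed, hp_only, sep="\n\n")
```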

Number of rows in each group:

grouped2.size()

Row labels belonging to each group:

grouped2.groups

Retrieve a single group:

grouped2.get_group(('Normal', 'Ground'))    # pass the key as one tuple
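A minimal sketch of size(), groups and get_group() on invented rows. Note that get_group() takes the whole key as one tuple; a second positional argument would be interpreted as something else, not as part of the key.

```python
import pandas as pd

# Invented rows; names are placeholders.
pokemon = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4", "p5"],
    "Type 1": ["Grass", "Grass", "Normal", "Normal", "Normal"],
    "Type 2": ["Poison", "Poison", "Flying", "Ground", "Ground"],
})
grouped2 = pokemon.groupby(["Type 1", "Type 2"])

sizes = grouped2.size()        # rows per (Type 1, Type 2) pair
labels = grouped2.groups       # dict-like: key tuple -> row index labels
normal_ground = grouped2.get_group(("Normal", "Ground"))

print(sizes)
print(labels)
print(normal_ground)
```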

  

Iterate over the groups and print each group's shape:

for name, group in grouped2:
    print(name)
    print(group.shape)
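A sketch that collects the shapes instead of printing them, on invented rows, and checks that the row counts agree with size(). With two grouping keys, each name is a tuple.

```python
import pandas as pd

# Invented rows; names are placeholders.
pokemon = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4", "p5"],
    "Type 1": ["Grass", "Grass", "Normal", "Normal", "Normal"],
    "Type 2": ["Poison", "Poison", "Flying", "Ground", "Ground"],
})
grouped2 = pokemon.groupby(["Type 1", "Type 2"])

shapes = {}
for name, group in grouped2:
    # name is a key tuple such as ("Normal", "Ground"); group is a sub-DataFrame
    shapes[name] = group.shape

print(shapes)
```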

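Standardize the data (to keep values on a comparable scale):

For each numeric column, subtract the column mean and divide by the column standard deviation. Used with transform(), this happens within each group, and the result is indexed like the original rows.

zscore = lambda s: (s - s.mean()) / s.std()
grouped1.transform(zscore)    # in pandas >= 2.0, select numeric columns first, e.g. grouped1[['HP', 'Attack']]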
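A worked sketch of the group-wise z-score on invented values chosen so the arithmetic is easy to check: each group's HP values are evenly spaced, so they standardize to -1, 0 and 1 (s.std() uses the sample standard deviation, ddof=1).

```python
import pandas as pd

# Invented rows: Fire has mean 40, std 10; Water has mean 80, std 20.
pokemon = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4", "p5", "p6"],
    "Type 1": ["Fire", "Fire", "Fire", "Water", "Water", "Water"],
    "HP": [30, 40, 50, 60, 80, 100],
})
grouped1 = pokemon.groupby("Type 1")

zscore = lambda s: (s - s.mean()) / s.std()

# Select the numeric column(s) first; "Name" cannot be standardized.
hp_z = grouped1[["HP"]].transform(zscore)
print(hp_z)
```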

Filter:

Some groups contain too many samples. filter() keeps or drops whole groups based on a condition:

# keep only the groups with fewer than 10 rows
cond1 = lambda s: len(s) < 10
grouped2.filter(cond1).shape
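A sketch of filter() on invented data. The threshold is 3 rather than 10 only because the toy table is small. The surviving rows keep their original index labels and order.

```python
import pandas as pd

# Invented rows: Water has 4 rows, Fire 2, Ice 1.
pokemon = pd.DataFrame({
    "Type 1": ["Water"] * 4 + ["Fire"] * 2 + ["Ice"],
    "HP": list(range(7)),
})
grouped1 = pokemon.groupby("Type 1")

# The condition receives each group as a DataFrame.
cond = lambda g: len(g) < 3
small = grouped1.filter(cond)
print(small.shape)
```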

First, set an index:

pok1 = pokemon.set_index(['Type 1', 'Type 2'])

Group by index level:

pok1.groupby(level=[0])
pok1.groupby(level=[0, 1])
pok1.groupby(level=['Type 1', 'Type 2'])
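A sketch on invented rows showing that grouping by an index level (by position or by name) matches grouping by the original column.

```python
import pandas as pd

# Invented rows.
pokemon = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Fire"],
    "Type 2": ["Poison", "Poison", "Flying", "Dragon"],
    "HP": [45, 60, 78, 80],
})
pok1 = pokemon.set_index(["Type 1", "Type 2"])

by_first = pok1.groupby(level=0)["HP"].sum()                    # level by position
by_both = pok1.groupby(level=["Type 1", "Type 2"])["HP"].sum()  # levels by name

print(by_first)
print(by_both)
```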

Multi-table operations:

from pandas import DataFrame

df1 = DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                 'B': ['B0', 'B1', 'B2', 'B3'],
                 'C': ['C0', 'C1', 'C2', 'C3'],
                 'D': ['D0', 'D1', 'D2', 'D3']},
                index=[0, 1, 2, 3])
df2 = DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                 'B': ['B4', 'B5', 'B6', 'B7'],
                 'C': ['C4', 'C5', 'C6', 'C7'],
                 'D': ['D4', 'D5', 'D6', 'D7']},
                index=[4, 5, 6, 7])
df3 = DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                 'B': ['B8', 'B9', 'B10', 'B11'],
                 'C': ['C8', 'C9', 'C10', 'C11'],
                 'D': ['D8', 'D9', 'D10', 'D11']},
                index=[8, 9, 10, 11])

Concatenating tables:

pd.concat([df1, df2])

pd.concat([df1, df2], axis=1)    # axis=0 (default) stacks rows and aligns on column names; axis=1 places tables side by side and aligns on the row index
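A reduced sketch of the two concat directions, using two-column versions of df1 and df2. Because the row labels do not overlap, axis=1 produces a block pattern of missing values.

```python
import pandas as pd

# Cut-down versions of df1 and df2 from the text.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

rows = pd.concat([df1, df2])          # 4 rows, columns A and B
cols = pd.concat([df1, df2], axis=1)  # rows 0-3, columns A, B, A, B

print(rows)
print(cols)
```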

A similar method:

df1.append(df2)    # deprecated in pandas 1.4 and removed in 2.0; use pd.concat([df1, df2]) instead

Merging on primary keys:

left = DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                  'key2': ['K0', 'K1', 'K0', 'K1'],
                  'A': ['A0', 'A1', 'A2', 'A3'],
                  'B': ['B0', 'B1', 'B2', 'B3']})
right = DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                   'key2': ['K0', 'K0', 'K0', 'K0'],
                   'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3']})

# how: join type, default 'inner'; 'outer' keeps keys from both sides ('left' and 'right' are also available)
# on: the key column(s) present in both tables
pd.merge(left, right, on='key1', how='inner')    # join on key1

pd.merge(left, right, on=['key1', 'key2'])

pd.merge(left, right, on=['key1', 'key2'], how='left')
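A sketch that counts the rows each join type produces for the left/right tables above (columns B and D dropped for brevity). Joining on key1 alone also shows how the non-key column key2, present on both sides, gets the _x and _y suffixes.

```python
import pandas as pd

left = pd.DataFrame({
    "key1": ["K0", "K0", "K1", "K2"],
    "key2": ["K0", "K1", "K0", "K1"],
    "A": ["A0", "A1", "A2", "A3"],
})
right = pd.DataFrame({
    "key1": ["K0", "K1", "K1", "K2"],
    "key2": ["K0", "K0", "K0", "K0"],
    "C": ["C0", "C1", "C2", "C3"],
})

# Row counts for each join type on the composite key.
counts = {
    how: len(pd.merge(left, right, on=["key1", "key2"], how=how))
    for how in ["inner", "left", "right", "outer"]
}

# Inner join on key1 only: key2 exists on both sides, so it is suffixed.
on_key1 = pd.merge(left, right, on="key1")

print(counts)
print(on_key1)
```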

Rename columns:

right1 = right.rename(columns={'key1': 'new_key1', 'key2': 'new_key2'})

Merge when the key columns have different names on each side:

pd.merge(left, right1, left_on=['key1', 'key2'], right_on=['new_key1', 'new_key2'], how='left')

Merge an index on one side with columns on the other:

left_index = left.set_index(['key1', 'key2'])    # prerequisite: move the keys into the index

# left_index and right_index default to False; left_index=True matches on the left table's index instead of its columns
pd.merge(left_index, right1, left_index=True, right_on=['new_key1', 'new_key2'], how='left')
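A sketch checking that both variants (differently named key columns, and index-versus-columns) keep every left row in a left join. It reuses the left/right layout from the text with columns B and D dropped. Only row counts and missing values are checked, since the index of an index-on-columns merge is less intuitive.

```python
import pandas as pd

left = pd.DataFrame({
    "key1": ["K0", "K0", "K1", "K2"],
    "key2": ["K0", "K1", "K0", "K1"],
    "A": ["A0", "A1", "A2", "A3"],
})
right1 = pd.DataFrame({
    "new_key1": ["K0", "K1", "K1", "K2"],
    "new_key2": ["K0", "K0", "K0", "K0"],
    "C": ["C0", "C1", "C2", "C3"],
})

# Key columns with different names on each side.
by_columns = pd.merge(left, right1, left_on=["key1", "key2"],
                      right_on=["new_key1", "new_key2"], how="left")

# Left keys in a MultiIndex, right keys in columns.
left_index = left.set_index(["key1", "key2"])
by_index = pd.merge(left_index, right1, left_index=True,
                    right_on=["new_key1", "new_key2"], how="left")

print(by_columns)
print(by_index)
```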

If a CSV file has no header row, define the column names yourself:

user_info = pd.read_csv('user_info_train.txt', header=None, names=['id', 'sex', 'job', 'education', 'marriage', 'hukou'])    # note: names
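A sketch of why header=None matters. It assumes the file is comma-separated with six fields per line; the two data lines are invented and read from an in-memory buffer instead of user_info_train.txt. Without header=None, the first data row is silently consumed as column names.

```python
import io

import pandas as pd

# Two invented lines standing in for user_info_train.txt (no header row).
text = "1,1,2,3,1,2\n2,2,1,4,2,1\n"

user_info = pd.read_csv(io.StringIO(text), header=None,
                        names=["id", "sex", "job", "education", "marriage", "hukou"])

# Default header=0: the first data line becomes the column labels.
wrong = pd.read_csv(io.StringIO(text))

print(user_info)
print(wrong)
```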

View unique values:

ids = user_info['id']    # avoid shadowing the built-in id()
ids.unique()
len(ids.unique())
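A sketch on invented ids. unique() returns the distinct values in order of first appearance, and nunique() gives their count directly. nunique() skips NaN by default, while unique() keeps it, so the two counts can differ when values are missing.

```python
import pandas as pd

# Invented ids with repeats.
user_info = pd.DataFrame({"id": [101, 102, 101, 103, 102]})
ids = user_info["id"]

distinct = ids.unique()     # NumPy array, first-appearance order
n_distinct = ids.nunique()  # count of distinct non-missing values

print(distinct, n_distinct)
```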

Move a row index level into columns:

a = grouped3['Amountoftrans'].sum()
a = a.unstack()    # stack() and unstack() are inverse operations
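The excerpt does not show how grouped3 is built, so the sketch below invents a transaction table. The column names id and typeoftrans and the grouping are hypothetical; only Amountoftrans comes from the text. It shows unstack() turning the inner index level into columns and stack() reversing it.

```python
import pandas as pd

# Hypothetical transaction log; "id" and "typeoftrans" are invented names.
trans = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "typeoftrans": [0, 1, 0, 0, 1],
    "Amountoftrans": [100.0, 30.0, 50.0, 20.0, 10.0],
})
grouped3 = trans.groupby(["id", "typeoftrans"])  # hypothetical grouping

a = grouped3["Amountoftrans"].sum()  # Series with a two-level row index
wide = a.unstack()                   # "typeoftrans" values become columns
back = wide.stack()                  # moves them back into the row index

print(wide)
```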

Rename the columns:

a.rename(columns={a.columns[0]: 'shouru', a.columns[1]: 'zhichu'}, inplace=True)    # shouru = income, zhichu = expenses

Compute directly on columns:

a['diff'] = a['shouru'] - a['zhichu']
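A sketch of the rename-then-subtract step on an invented unstacked table. It assumes, as the names suggest, that the first column holds income and the second holds expenses. Column arithmetic is element-wise and aligned on the row index.

```python
import pandas as pd

# Invented unstacked totals; column 0 = income, column 1 = expenses (assumed).
a = pd.DataFrame({0: [100.0, 70.0], 1: [30.0, 10.0]},
                 index=pd.Index([1, 2], name="id"))

a.rename(columns={a.columns[0]: "shouru", a.columns[1]: "zhichu"}, inplace=True)
a["diff"] = a["shouru"] - a["zhichu"]   # row-by-row difference

print(a)
```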

Pivot table:

pd.pivot_table(data=pokemon, index='Type 1', columns='Type 2', values=['HP', 'Total'], aggfunc=[np.sum])
pd.pivot_table(data=pokemon, index='Type 1', columns='Type 2', values=['HP', 'Total'], aggfunc=[np.sum, np.mean])
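A sketch on invented rows showing the layout pivot_table produces when aggfunc is a list. The columns have three levels (function, value column, Type 2 category), and type combinations that never occur are filled with NaN. Function names are passed as strings here, which pandas also accepts.

```python
import pandas as pd

# Invented rows.
pokemon = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", "Poison", "Flying", "Dragon", "Poison"],
    "HP": [45, 60, 78, 80, 40],
    "Total": [318, 405, 534, 600, 320],
})

table = pd.pivot_table(data=pokemon, index="Type 1", columns="Type 2",
                       values=["HP", "Total"], aggfunc=["sum", "mean"])
print(table)
```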

Cross-tabulation:

Count frequencies:

pd.crosstab(index=pokemon['Type 1'], columns=pokemon['Type 2'])
pd.crosstab(index=pokemon['Type 1'], columns=pokemon['Type 2'], margins=True)    # margins=True adds row and column totals
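A sketch of crosstab on invented rows. Unlike the pivot table above, combinations that never occur are counted as 0 rather than NaN. margins=True adds an "All" row and column holding the totals.

```python
import pandas as pd

# Invented rows.
pokemon = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", "Poison", "Flying", "Dragon", "Poison"],
})

counts = pd.crosstab(index=pokemon["Type 1"], columns=pokemon["Type 2"])
with_totals = pd.crosstab(index=pokemon["Type 1"], columns=pokemon["Type 2"],
                          margins=True)

print(counts)
print(with_totals)
```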

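Dummy variables

Category labels have no inherent order or magnitude, so they cannot be compared numerically. get_dummies() turns each category into its own 0/1 indicator column.

# among the Type 1 indicator columns, each row contains exactly one 1
pd.get_dummies(data=pokemon, columns=['Type 1'])
pd.get_dummies(data=pokemon, columns=['Type 1', 'Type 2'])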
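A sketch on invented rows, assuming pandas 2.x, where indicator columns are True/False rather than 0/1 (older versions used integers). Unencoded columns come first, followed by one column per category named "<column>_<category>". A missing Type 2 value gets False in every Type 2 column unless dummy_na=True is passed.

```python
import pandas as pd

# Invented rows; two Pokemon have no secondary type.
pokemon = pd.DataFrame({
    "Name": ["p1", "p2", "p3"],
    "Type 1": ["Grass", "Fire", "Grass"],
    "Type 2": ["Poison", None, None],
    "HP": [45, 39, 60],
})

dummies = pd.get_dummies(data=pokemon, columns=["Type 1", "Type 2"])
print(dummies)
```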
