"Python Data Analysis", Part 2: Data Calculation

Last Update:2017-01-14 Source: Internet

Author: User


Group calculation:

Group by: split–apply–combine

Split: divide the data into groups according to some condition

Apply: apply a function to each group independently

Combine: merge the per-group results into a single data structure
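The three steps can be spelled out by hand and compared with a single groupby call. This is a minimal sketch on a made-up table standing in for the Pokemon data; the values are assumptions for illustration only.

```python
import pandas as pd

# Toy data standing in for the article's Pokemon table (values are made up).
df = pd.DataFrame({
    "Type 1": ["Fire", "Water", "Fire", "Water", "Grass"],
    "HP": [39, 44, 58, 59, 45],
})

# Split: one sub-frame per distinct 'Type 1' value.
pieces = {key: sub for key, sub in df.groupby("Type 1")}

# Apply: compute the mean HP of each piece independently.
means = {key: sub["HP"].mean() for key, sub in pieces.items()}

# Combine: collect the per-group results into a single Series.
manual = pd.Series(means, name="HP").sort_index()

# groupby performs the same three steps in one call.
built_in = df.groupby("Type 1")["HP"].mean()
```

Both `manual` and `built_in` hold one mean per type, indexed by the group key.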

import numpy as np
import pandas as pd
pokemon = pd.read_csv('pokemon.csv')    # read the file
pokemon['Type 1'].value_counts()    # number of rows for each 'Type 1' value
grouped1 = pokemon.groupby('Type 1')    # group by 'Type 1'

Calculation:

1. Mean of every numeric column: .mean()

grouped1.mean(numeric_only=True)    # numeric_only=True skips text columns such as names

2. Mean of a single column

grouped1['HP'].mean()

3. Sum

grouped1.sum()

4. Median

grouped1.median(numeric_only=True)

Grouping, method two: group by multiple columns

grouped2 = pokemon.groupby(['Type 1', 'Type 2'])

Apply one function, or several at once (these calls assume the aggregated columns are numeric):

grouped2.aggregate(np.mean)

grouped2.aggregate([np.mean, np.median])

Keep the results for a single column:

grouped2.aggregate([np.mean, np.median, np.sum])['HP']

Use different functions for different columns:

grouped2.agg({'HP': np.mean, 'Attack': np.median})
grouped2.agg({'HP': np.mean, 'Attack': [np.median, np.sum]})
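To make the shape of the result concrete, here is a small sketch of per-column aggregation on made-up data. It passes the function names as strings ('mean', 'median', 'sum') instead of the NumPy functions; that substitution is a choice made for this example, not something the article prescribes.

```python
import pandas as pd

# Made-up rows; only the column names follow the article.
df = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water"],
    "Type 2": ["Flying", "Flying", "Ice", "Ice"],
    "HP": [60, 80, 50, 70],
    "Attack": [10, 30, 20, 60],
})
grouped = df.groupby(["Type 1", "Type 2"])

# One function per column: the result has one column per dict key.
simple = grouped.agg({"HP": "mean", "Attack": "median"})

# A list of functions for a column gives the result two-level column labels.
nested = grouped.agg({"HP": "mean", "Attack": ["median", "sum"]})
```

In `nested`, a value is addressed by a (column, function) pair, for example `nested.loc[("Water", "Ice"), ("Attack", "sum")]`.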

View the size of each group:

grouped2.size()

View which rows belong to each group:

grouped2.groups

Get a single group:

grouped2.get_group(('Normal', 'Ground'))    # pass the key as a tuple

Print the shape of each group:

for name, group in grouped2:
    print(name)
    print(group.shape)
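The inspection calls above can be tried on a small made-up table. This sketch collects the per-group shapes into a dict rather than printing them, purely so the result is easy to check.

```python
import pandas as pd

# Made-up rows; only the column names follow the article.
df = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water", "Water"],
    "Type 2": ["Flying", "Flying", "Ice", "Ice", "Flying"],
    "HP": [78, 80, 50, 70, 65],
})
grouped = df.groupby(["Type 1", "Type 2"])

sizes = grouped.size()                       # rows per (Type 1, Type 2) pair
one = grouped.get_group(("Water", "Ice"))    # the key is a tuple

# Iterating yields (key, sub-frame) pairs.
shapes = {name: group.shape for name, group in grouped}
```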

Standardize the data (so that columns with large values do not dominate):

For each numeric column, subtract the column's mean within the group from every value and divide by the column's standard deviation within the group.

zscore = lambda s: (s - s.mean()) / s.std()
grouped1.transform(zscore)
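A short worked example shows what the per-group z-score does. The numbers are invented so that each group has an obvious mean and standard deviation, and the transform is applied to a single numeric column.

```python
import pandas as pd

# Two groups on very different scales (made-up values).
df = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Fire", "Water", "Water", "Water"],
    "HP": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
})

def zscore(s):
    # Subtract the group mean and divide by the group standard deviation.
    return (s - s.mean()) / s.std()

# transform returns one value per original row, in the original row order.
standardized = df.groupby("Type 1")["HP"].transform(zscore)
```

After standardizing, both groups land on the same scale (-1, 0, 1) even though the raw Water values are ten times larger.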

Filter:

Groups can be kept or dropped as a whole according to a condition, for example their sample size.

# keep only the groups that have fewer than 10 samples
cond1 = lambda s: len(s) < 10
grouped2.filter(cond1).shape
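The effect of filter is easiest to see on a table small enough to count by hand. In this sketch the threshold is 3 instead of 10, and the data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Fire", "Water", "Grass", "Grass"],
    "HP": [39, 58, 78, 44, 45, 60],
})

# Keep the rows of every group that has fewer than 3 members.
# Fire has 3 rows and is dropped; Water (1) and Grass (2) are kept.
small = df.groupby("Type 1").filter(lambda g: len(g) < 3)
```

The result is an ordinary DataFrame containing the original rows of the surviving groups, not an aggregated table.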

Grouping by index. First, set the index:

pok1 = pokemon.set_index(['Type 1', 'Type 2'])

Then group by index level:

pok1.groupby(level=[0])
pok1.groupby(level=[0, 1])
pok1.groupby(level=['Type 1', 'Type 2'])
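Grouping by index level gives the same groups as grouping by the columns the index was built from. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water"],
    "Type 2": ["Flying", "Dragon", "Ice", "Ice"],
    "HP": [78, 80, 50, 70],
})
indexed = df.set_index(["Type 1", "Type 2"])

by_first = indexed.groupby(level=0)["HP"].sum()                       # outer level only
by_names = indexed.groupby(level=["Type 1", "Type 2"])["HP"].sum()    # both levels, by name
by_columns = df.groupby(["Type 1", "Type 2"])["HP"].sum()             # same keys as columns
```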

Multi-table operations:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D9', 'D9', 'D10', 'D11']},
                   index=[8, 9, 10, 11])


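Concatenating tables:

pd.concat([df1, df2])

pd.concat([df1, df2], axis=1)    # axis=1 places the tables side by side, aligned on the index; axis=0 (the default) stacks them, aligned on column names

A similar method:

df1.append(df2)    # append takes a single argument; it was removed in pandas 2.0, where pd.concat([df1, df2]) does the same job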
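The difference between the two axes shows up clearly when the indexes do not overlap. This sketch uses two-row versions of the tables above to keep the output small.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

# axis=0 (default): stack the rows; columns are matched by name.
rows = pd.concat([df1, df2])

# axis=1: place the tables side by side; rows are matched by index label.
# The indexes share no labels here, so every row is half empty (NaN).
side = pd.concat([df1, df2], axis=1)
```

`rows` has 4 rows and 2 columns, while `side` has 4 rows and 4 columns with NaN wherever one of the tables has no row for that index label.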

Join keys (primary keys):

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})


# how: the join type; the default is 'inner' (only keys found on both sides); 'outer', 'left' and 'right' are the alternatives
# on: the key column(s) used to connect the left and right tables
# inner join on key1
pd.merge(left, right, on='key1', how='inner')

pd.merge(left, right, on=['key1', 'key2'])

pd.merge(left, right, on=['key1', 'key2'], how='left')
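Counting the rows each merge produces makes the join types easier to compare. The sketch below rebuilds the article's left and right tables (with lowercase key names) and runs the three merges.

```python
import pandas as pd

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]})

# Inner join on key1 only: every left row pairs with every right row sharing key1.
# key2 exists on both sides, so it is returned as key2_x and key2_y.
on_key1 = pd.merge(left, right, on="key1", how="inner")

# Inner join on both keys: only (K0, K0) and (K1, K0) exist on both sides.
on_both = pd.merge(left, right, on=["key1", "key2"])

# Left join on both keys: every left row is kept; unmatched rows get NaN in C and D.
left_join = pd.merge(left, right, on=["key1", "key2"], how="left")
```

Here `on_key1` has 5 rows, `on_both` has 3, and `left_join` has 5, two of which have NaN in C and D.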

Rename the key columns:

right1 = right.rename(columns={'key1': 'new_key1', 'key2': 'new_key2'})

Merging when the key columns have different names:

pd.merge(left, right1, left_on=['key1', 'key2'], right_on=['new_key1', 'new_key2'], how='left')

Merging an index with columns:

Prerequisite: move the keys of the left table into its index

left_index = left.set_index(['key1', 'key2'])

# left_index and right_index default to False; left_index=True tells merge to use the index of the left table as its join key
pd.merge(left_index, right1, left_index=True, right_on=['new_key1', 'new_key2'], how='left')
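As a rough check, the index-to-column merge should match the same rows as the column-to-column left join. This sketch rebuilds the two tables with the renamed keys already in place; the name `left_indexed` is introduced here only to avoid reusing the parameter name `left_index`.

```python
import pandas as pd

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})
right1 = pd.DataFrame({"new_key1": ["K0", "K1", "K1", "K2"],
                       "new_key2": ["K0", "K0", "K0", "K0"],
                       "C": ["C0", "C1", "C2", "C3"],
                       "D": ["D0", "D1", "D2", "D3"]})

# Move the left keys into a two-level index.
left_indexed = left.set_index(["key1", "key2"])

# The left side joins on its index, the right side on two ordinary columns.
merged = pd.merge(left_indexed, right1, left_index=True,
                  right_on=["new_key1", "new_key2"], how="left")
```

The number of `right_on` columns has to equal the number of index levels on the left.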

When a CSV file has no header row, define the column names yourself:

user_info = pd.read_csv('user_info_train.txt', header=None, names=['id', 'sex', 'job', 'education', 'marriage', 'hukou'])    # note the names argument

View the unique values:

ids = user_info['id']
ids.unique()
len(ids.unique())
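The same pattern can be tried without the original file by reading from an in-memory string. The three-column layout and the values below are made up; only the use of `header=None` and `names` follows the article.

```python
import io
import pandas as pd

# A header-less CSV held in memory (made-up content, one duplicated id).
raw = io.StringIO("1,0,2\n2,1,3\n1,0,2\n")

# header=None stops pandas from treating the first data row as column names.
user_info = pd.read_csv(raw, header=None, names=["id", "sex", "job"])

unique_ids = user_info["id"].unique()    # array of distinct values
n_unique = user_info["id"].nunique()     # same as len(unique_ids)
```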

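Turn row labels into columns:

a = grouped3['Amountoftrans'].sum()    # grouped3 is not defined in this article; presumably a groupby on two keys
a = a.unstack()    # keep the result so that the rename below sees the new columns
# stack() and unstack() are inverse operations: a.stack() / a.unstack()

Rename the columns:

a.rename(columns={a.columns[0]: 'Shouru', a.columns[1]: 'Zhichu'}, inplace=True)    # shouru = income, zhichu = expenditure

Operate on the columns directly:

a['diff'] = a['Shouru'] - a['Zhichu']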
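Since the transaction data behind `grouped3` is not shown, the following sketch uses a hypothetical table (the column names `id`, `kind`, `amount` and all values are assumptions) to walk through the same unstack, rename, and subtract sequence.

```python
import pandas as pd

trans = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "kind": [0, 1, 1, 0, 1],          # hypothetical flag: 0 = income, 1 = expenditure
    "amount": [100, 30, 20, 50, 80],
})

# Grouping by two keys gives a Series with a two-level index.
totals = trans.groupby(["id", "kind"])["amount"].sum()

# unstack moves the inner index level ('kind') into the columns.
wide = totals.unstack()
wide = wide.rename(columns={0: "income", 1: "expenditure"})
wide["diff"] = wide["income"] - wide["expenditure"]

# stack reverses unstack: the columns go back into the index.
round_trip = totals.unstack().stack()
```

`wide` has one row per id, with income, expenditure, and their difference side by side.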

Pivot table:

pd.pivot_table(data=pokemon, index='Type 1', columns='Type 2', values=['HP', 'Total'], aggfunc=[np.sum])
pd.pivot_table(data=pokemon, index='Type 1', columns='Type 2', values=['HP', 'Total'], aggfunc=[np.sum, np.mean])
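A pivot table can be read as a groupby on two keys followed by unstack. The sketch below checks that on made-up data, using a single value column and the string 'sum' as the aggregation; both simplifications are choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water", "Water"],
    "Type 2": ["Flying", "Flying", "Ice", "Ice", "Flying"],
    "HP": [78, 80, 50, 70, 65],
})

# Rows are 'Type 1', columns are 'Type 2', cells hold the summed HP.
table = pd.pivot_table(data=df, index="Type 1", columns="Type 2",
                       values="HP", aggfunc="sum")

# The same numbers built by hand.
by_hand = df.groupby(["Type 1", "Type 2"])["HP"].sum().unstack()
```

Combinations that never occur in the data, such as Fire with Ice here, come out as NaN.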

Cross-tabulation:

Count frequencies:

pd.crosstab(index=pokemon['Type 1'], columns=pokemon['Type 2'])
pd.crosstab(index=pokemon['Type 1'], columns=pokemon['Type 2'], margins=True)    # margins adds the row and column totals
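On a handful of made-up rows the counts can be verified by eye. With `margins=True`, pandas adds a row and a column labelled 'All' holding the totals.

```python
import pandas as pd

df = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Water", "Water"],
    "Type 2": ["Flying", "Flying", "Ice", "Ice", "Flying"],
})

# How often each (Type 1, Type 2) combination occurs.
counts = pd.crosstab(index=df["Type 1"], columns=df["Type 2"])

# The same table with row and column totals.
with_totals = pd.crosstab(index=df["Type 1"], columns=df["Type 2"], margins=True)
```

Unlike the pivot table, combinations that never occur are reported as 0 rather than NaN.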

Dummy variables

Categories with no meaningful order cannot be compared as numbers, so each category becomes its own 0/1 column.

# among the 'Type 1' columns, each row contains exactly one 1
pd.get_dummies(data=pokemon, columns=['Type 1'])
pd.get_dummies(data=pokemon, columns=['Type 1', 'Type 2'])
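A three-row example shows how the encoded columns are named and confirms that each row is flagged in exactly one of them. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["a", "b", "c"],
                   "Type 1": ["Fire", "Water", "Fire"]})

# 'Type 1' is replaced by one indicator column per category,
# named '<column>_<category>'; other columns are left untouched.
dummies = pd.get_dummies(data=df, columns=["Type 1"])

type_cols = [c for c in dummies.columns if c.startswith("Type 1_")]
flags_per_row = dummies[type_cols].sum(axis=1)    # 1 for every row
```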


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
