Python Data Analysis Pandas

Source: Internet
Author: User

Most students who do data analysis start with Excel, the most highly rated tool in the Microsoft Office suite.

But when the amount of data is very large, Excel is powerless. The third-party Python package pandas greatly extends Excel's functionality; it takes a little time to get started, but it really is an essential tool for big data!

1. Read data from a file

Pandas supports reading data in multiple formats; the most common are Excel files, csv files, and txt files.

names specifies the column names, and delimiter specifies the separator between the columns.

It is best to prefix the file name with r, which makes it a raw string so that backslashes are not escaped.

import numpy as np
import pandas as pd

# Read a txt file
df = pd.read_table(r'C:\Users\wangbin10\Desktop\jhinfo.txt',
                   names=['Dvid', 'CID', 'DT', 'Atimes'], delimiter='\t')
# Read an Excel file
df = pd.read_excel(r'E:\log\token0722.xlsx', 'Sheet1')
# Read a csv file
df = pd.read_csv(r'E:\log\channel_add\channel_add11.txt',
                 names=['Cha', 'Dvid', 'DT', 'Act', 'ISNA'], delimiter='\001')
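As a minimal runnable sketch of the same call, reading from an in-memory buffer instead of the file paths above (the sample values are made up for illustration):

```python
import io
import pandas as pd

# A small tab-separated sample standing in for a real log file
raw = "d001\tc147\t2016-05-01\t3\nd002\tc148\t2016-05-02\t1\n"

# names supplies the column names; delimiter gives the separator
df = pd.read_csv(io.StringIO(raw),
                 names=['Dvid', 'CID', 'DT', 'Atimes'], delimiter='\t')
```

Anything file-like works where a path is accepted, which makes snippets like this easy to test without touching the disk.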
2. Writing data to a file

Once we have obtained the data we need through various pivots, drill-downs, and transformations, we can save it to a file.

groupd = df.groupby('CID')['Atimes']
# Save to a csv file, keeping the index
groupd.mean().to_csv(r'E:\log\channel_add\group10.csv', index=True)
# Write to an Excel file
df3.to_excel(r'E:\log\token0722v1.xlsx')
# Save to a csv file without the index
df.to_csv(r'E:\log\lost.txt', index=False)
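A small self-contained sketch of the round trip, writing to an in-memory buffer rather than a real path (the frame contents are invented for the example):

```python
import io
import pandas as pd

df = pd.DataFrame({'CID': ['c147', 'c147', 'c148'], 'Atimes': [1, 3, 2]})

# index=False drops the row index from the output
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
```

The first line of csv_text is the header row, followed by one line per row of the frame.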
3. Screening of data

Often we don't need all the data in a file, just part of it; pandas provides many ways to slice it out.

# Logical filtering
df2 = df[df['CID'].isin(['c147', 'c148'])]
df2 = df[df['Atimes'] > 1]
df2 = df[df['is_new'] == 'T']
# Select the column labelled 'A'
df['A']
# Select the first three rows
df[0:3]
# Select rows and columns by position
df.iloc[3:5, 1:2]
df.iloc[[1, 2, 5], [0, 2]]
df.iat[1, 1]
# Filter by label
df.loc['20160101':'20160131', ['DT', 'Atimes']]
# Complex filtering
data.loc[(data['Gender'] == 'Female') & (data['Education'] == 'not Graduate') & (data['Loan_status'] == 'Y'),
         ['Gender', 'Education', 'Loan_status']]
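A runnable sketch of the filtering styles above on a tiny invented frame:

```python
import pandas as pd

df = pd.DataFrame({'CID': ['c147', 'c148', 'c149'],
                   'Atimes': [1, 2, 3]})

# Boolean mask: rows whose CID is in the given list
by_cid = df[df['CID'].isin(['c147', 'c148'])]
# Boolean mask: rows where Atimes is greater than 1
by_times = df[df['Atimes'] > 1]
# Position-based selection: first two rows, first column
sub = df.iloc[0:2, 0:1]
```

The boolean expressions each return a new DataFrame; the original df is untouched.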
4. Sorting Data

# Sort by row or column labels
df.sort_index(axis=1, ascending=False)
# Sort by values
df.sort_values('dt')
data.sort_values(['applicantincome', 'coapplicantincome'])
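A minimal runnable sketch of both kinds of sort, on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'DT': ['2016-03', '2016-01', '2016-02'],
                   'Atimes': [2, 9, 5]})

# Sort rows by the values in a column (descending)
by_value = df.sort_values('DT', ascending=False)
# Sort the columns by their labels (descending)
by_label = df.sort_index(axis=1, ascending=False)
```

sort_values orders the rows; sort_index with axis=1 reorders the columns.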
5. Data Conversion

# Force a type conversion
data[row['feature']] = data[row['feature']].astype(np.float64)
# Assign a value by label / by position
df.at[dates[0], 'A'] = 0
df.iat[0, 1] = 0
# Drop duplicate rows
df.drop_duplicates()
# Drop any row that contains a missing value
df1.dropna(how='any')
# Fill missing values
df1.fillna(value=5)
# Append rows to df
df.append(df2)
# Pivot the columns into rows
stacked = df2.stack()
# Pivot the rows back into columns
stacked.unstack()
# Assign a value to a slice
df.loc[1:3, ['B', 'C']] = 1.0
# Add a month column by truncating the date string
f = lambda x: x[:7]
df['Month'] = df['DT'].apply(f)
# Add a platform column
df['Platform'] = np.where(df['Dvid'].str.len() >= 32, 'IOS', 'Android')
# Delete rows by index; to delete a column, specify the axis
df.drop(index)
df.drop('Dvid', axis=1)
# Use where to generate a new field
os = np.where(df['DT'].str.startswith('2016-05'), '2016-05', '2016-06')
# Strip an unwanted suffix from a string column
df['Act2'] = df['Act1'].str.rstrip('_ios')
# Use numpy for data conversion
df['News'] = np.where(df['_C1'].str.find('\"news\"') == -1, 'F', 'T')
# Split a string into a list, then derive a new column with a function
df2['Act'] = df2['Act'].str.split(',')
df2['Actimes'] = df2['Act'].apply(date_change)
# Replace hive's special delimiter
df['Act'] = df['Act'].str.replace('\002', ',')
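A self-contained sketch of a few of these conversions on an invented frame (the 32-character device-id rule is taken from the snippet above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Dvid': ['a' * 32, 'b' * 16],
                   'DT': ['2016-05-14', '2016-06-02'],
                   'Atimes': ['1', '2']})

# Force a type conversion from string to integer
df['Atimes'] = df['Atimes'].astype(np.int64)
# Derive a month column by truncating the date string
df['Month'] = df['DT'].apply(lambda x: x[:7])
# Derive a platform column from the device-id length
df['Platform'] = np.where(df['Dvid'].str.len() >= 32, 'IOS', 'Android')
```

apply runs the function on each element of the column; np.where vectorises the if/else over the whole column at once.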
6. Data Overview

df.head()
df.tail()
df.index
df.columns
df.values
df.describe()
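A quick runnable sketch of the overview calls, on made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({'Atimes': [1, 2, 3, 4]})

# count, mean, std, min, quartiles, max for each numeric column
summary = df.describe()
# The first two rows
first_rows = df.head(2)
```

describe returns a DataFrame itself, so individual statistics can be pulled out by label.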
7. Data Connection Merging

# Inner join (intersection) of two DataFrames
df = pd.merge(df1, df2, on='dvid')
# Merge datasets by column with pd.merge(); a left join on differently named keys
df4 = pd.merge(df2, df3, left_on=['dvid'], right_on=['divice_id'], how='left')
# Concatenate the rows of several DataFrames into a new one
dfs1 = pd.concat([df1, df2, df3, df4, df5])
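A minimal sketch of the join behaviours, with invented frames and key values:

```python
import pandas as pd

left = pd.DataFrame({'Dvid': ['d1', 'd2'], 'CID': ['c147', 'c148']})
right = pd.DataFrame({'Dvid': ['d1', 'd3'], 'Atimes': [5, 7]})

# Inner join keeps only keys present in both frames
inner = pd.merge(left, right, on='Dvid')
# Left join keeps every row of the left frame
left_join = pd.merge(left, right, on='Dvid', how='left')
# Concatenation simply stacks the rows
stacked = pd.concat([left, left])
```

Only 'd1' appears in both frames, so the inner join yields one row, while the left join keeps both rows of left and fills the missing Atimes with NaN.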
8. Data Aggregation
Aggregate the column Atimes by CID and take the mean:

groupd = df.groupby('CID')['Atimes']
groupd.mean()
# Group by Month and CID, then compute the mean
groupd = df['Atimes'].groupby([df['Month'], df['CID']])
# Other groupby methods: count(), sum(), prod(), mean(), min(), max(), std(),
# var(), first(), last(), describe(); custom functions also work
# To observe several metrics at once, pass them to agg
groupd.agg(['sum', 'count', 'mean'])
# Tuples let you name the resulting columns
groupd.agg([('Dvid', 'sum'), ('Atimes', 'count')])
# To aggregate several columns in several ways:
functions = ['sum', 'count', 'mean']
groupd[['DT', 'CID']].agg(functions)
# value_counts easily counts any column, in descending order by default
tz_counts = df['CID'].value_counts()
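A runnable sketch of the groupby pattern on a tiny invented frame:

```python
import pandas as pd

df = pd.DataFrame({'CID': ['c147', 'c147', 'c148'],
                   'Atimes': [1, 3, 2]})

grouped = df.groupby('CID')['Atimes']
# One mean per group
means = grouped.mean()
# Several metrics at once via agg
stats = grouped.agg(['sum', 'count', 'mean'])
# value_counts tallies the occurrences of each CID
counts = df['CID'].value_counts()
```

means is a Series indexed by CID; stats is a DataFrame with one column per aggregate.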
9. Pivot Table

PivotTables are an important feature of Excel, and pandas provides pivot-table functionality as well.

# Pivot table
pv_table = pd.pivot_table(df, index=['CID'], columns=['Month'], values=['Atimes'], aggfunc=[np.mean])
# Cross table
cross_table = pd.crosstab(df['CID'], df['Month'])
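A self-contained sketch of both calls on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'CID': ['c147', 'c147', 'c148'],
                   'Month': ['2016-05', '2016-06', '2016-05'],
                   'Atimes': [2, 4, 6]})

# One row per CID, one column per Month, mean of Atimes in each cell
pv = pd.pivot_table(df, index='CID', columns='Month',
                    values='Atimes', aggfunc='mean')
# crosstab counts the co-occurrences instead of aggregating a value column
ct = pd.crosstab(df['CID'], df['Month'])
```

The pivot table holds the aggregated values, with NaN where a CID/Month pair never occurs; the cross table holds occurrence counts.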
