Airline Customer Value analysis

Last Update: 2016-09-12 Source: Internet

Author: User

Based on the extracted data, we carried out data exploration, mainly missing-value analysis and outlier analysis. Inspecting the data showed several problems:

- some ticket prices (fares) are null;
- the minimum fare is 0;
- the minimum discount rate is 0;
- some of these zero-fare records still have a total flight distance greater than 0 kilometers.

A null fare may mean that the customer has no flight record. The other anomalies may come from customers who flew on zero-price (fully discounted) tickets or redeemed loyalty points.
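To see how often each of these cases occurs, the checks can be written as boolean masks. The sketch below uses a hypothetical `count_anomalies()` helper; it is not the author's code. It assumes the air_data.csv column names `SUM_YR_1` and `SUM_YR_2` (yearly fares), `avg_discount` and `SEG_KM_SUM`, and runs on a made-up four-row table.

```python
import numpy as np
import pandas as pd


def count_anomalies(df: pd.DataFrame) -> dict:
    """Count the suspicious record types (column names assumed from air_data.csv)."""
    fare_null = df['SUM_YR_1'].isna() | df['SUM_YR_2'].isna()
    # NaN == 0 is False, so null fares are not also counted as zero fares
    fare_zero = (df['SUM_YR_1'] == 0) & (df['SUM_YR_2'] == 0)
    discount_zero = df['avg_discount'] == 0
    # zero fare, yet the customer still has flight kilometres on record
    zero_fare_but_flown = fare_zero & (df['SEG_KM_SUM'] > 0)
    return {
        'fare_null': int(fare_null.sum()),
        'fare_zero': int(fare_zero.sum()),
        'discount_zero': int(discount_zero.sum()),
        'zero_fare_but_flown': int(zero_fare_but_flown.sum()),
    }


# made-up example rows, not taken from the real data set
toy = pd.DataFrame({
    'SUM_YR_1': [np.nan, 0.0, 0.0, 1200.0],
    'SUM_YR_2': [500.0, 0.0, 0.0, 800.0],
    'avg_discount': [0.8, 0.0, 0.0, 0.7],
    'SEG_KM_SUM': [3000, 0, 1500, 9000],
})
print(count_anomalies(toy))
# {'fare_null': 1, 'fare_zero': 2, 'discount_zero': 2, 'zero_fare_but_flown': 1}
```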

We then compute the number of null values and the maximum and minimum value of each attribute. After that, we clean and transform the data. The data-exploration code is as follows:

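```python
# -*- coding: utf-8 -*-
import pandas as pd

inputfile = 'F:\\python Data Mining \\chapter7\\demo\\data\\air_data.csv'
outputfile = 'F:\\python Data Mining \\chapter7\\demo\\tmp\\tansuo.xls'

data = pd.read_csv(inputfile, encoding='utf-8')

# describe(include='all') covers every column; .T gives one row per attribute
tansuo = data.describe(percentiles=[], include='all').T
# null count = total number of rows minus the non-null count from describe()
tansuo['null'] = len(data) - tansuo['count']
tansuo = tansuo[['null', 'max', 'min']]
tansuo.columns = [u'number of NULL values', u'Maximum Value', u'Minimum Value']
# .xls output needs the xlwt engine, which pandas 2.0 removed; use a .xlsx path there
tansuo.to_excel(outputfile)
# print(tansuo)
```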

The number of null values and the maximum and minimum value of each attribute are saved to the output path (tansuo.xls).

Data cleaning:

Discard the records whose fare is null. From the remaining records, keep those whose fare is not 0, plus those whose average discount rate and total flight kilometers are both 0.

Finally, the results are saved to the Excel document.

The corresponding code is as follows:

```python
# -*- coding: utf-8 -*-
import pandas as pd

inputfile = 'F:\\python Data Mining \\chapter7\\demo\\data\\air_data.csv'
outputfile = 'F:\\python Data Mining \\chapter7\\demo\\tmp\\clean_data.xls'

data = pd.read_csv(inputfile, encoding='utf-8')

# discard records whose fare (SUM_YR_1 or SUM_YR_2) is null
data = data[data['SUM_YR_1'].notnull() & data['SUM_YR_2'].notnull()]

index = data['SUM_YR_1'] != 0   # non-zero fare in year 1
index1 = data['SUM_YR_2'] != 0  # non-zero fare in year 2
# average discount rate and total flight kilometres both zero
index2 = (data['avg_discount'] == 0) & (data['SEG_KM_SUM'] == 0)

clean = data[index | index1 | index2]
# print(clean)
clean.to_excel(outputfile)
```
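You can check the same rule on a small in-memory table before running it on the full file. This sketch uses a hypothetical `clean()` helper and made-up rows, and it assumes the column names match air_data.csv. Row 2 below is the interesting case: its fare is zero but it has a discount and kilometers on record, so the rule drops it.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning rule from the text (column names assumed from air_data.csv)."""
    # 1. drop records whose fare is null in either year
    df = df[df['SUM_YR_1'].notnull() & df['SUM_YR_2'].notnull()]
    # 2. keep records with a non-zero fare in either year ...
    nonzero_fare = (df['SUM_YR_1'] != 0) | (df['SUM_YR_2'] != 0)
    # ... or whose average discount and total kilometres are both zero
    no_activity = (df['avg_discount'] == 0) & (df['SEG_KM_SUM'] == 0)
    return df[nonzero_fare | no_activity]


# made-up example rows, not taken from the real data set
toy = pd.DataFrame({
    'SUM_YR_1': [np.nan, 0.0, 0.0, 1200.0, 0.0],
    'SUM_YR_2': [500.0, 0.0, 0.0, 800.0, 300.0],
    'avg_discount': [0.8, 0.0, 0.5, 0.7, 0.6],
    'SEG_KM_SUM': [3000, 0, 1500, 9000, 2000],
})
print(clean(toy).index.tolist())  # [1, 3, 4]
```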

Because the raw data has too many attributes, data reduction is required.

The reduction works as follows:

We extract the main influencing factors, reduce the data to those attributes, and build the model on the reduced data.

The first factor, L, is a time span computed from two dates.

It can be computed with numpy.timedelta64. Subtracting the two date columns gives a timedelta. Dividing that timedelta by np.timedelta64(30 * 24 * 60, 'm') (30 days expressed in minutes) converts it into a number of 30-day months.

The key lines are:

```python
res = d_load - d_ffp
data['L'] = res.map(lambda x: x / np.timedelta64(30 * 24 * 60, 'm'))
```
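Here is a quick way to confirm what this division produces. Subtracting two datetime Series gives a timedelta Series, and dividing it by 30 days turns it into a fractional number of months. The dates below are invented for illustration. The vectorised division `res / np.timedelta64(30, 'D')` is shown only as an equivalent alternative to `map`; it is not what the author used.

```python
import numpy as np
import pandas as pd

# made-up join dates (FFP_DATE) and observation-window end dates (LOAD_TIME)
d_ffp = pd.to_datetime(pd.Series(['2013-03-16', '2012-06-26']))
d_load = pd.to_datetime(pd.Series(['2014-03-31', '2014-03-31']))
res = d_load - d_ffp  # timedelta Series: 380 days and 643 days

# as in the article: element-wise division by 30 days written in minutes
months_map = res.map(lambda x: x / np.timedelta64(30 * 24 * 60, 'm'))
# vectorised alternative that yields the same values
months_vec = res / np.timedelta64(30, 'D')

print(months_vec.round(2).tolist())  # [12.67, 21.43]
```

The vectorised form avoids one Python-level function call per row, which helps on a large table.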

The full data-reduction code is as follows:

```python
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

inputfile = 'F:\\python Data Mining \\chapter7\\demo\\tmp\\clean_data.xls'
outputfile = 'F:\\python Data Mining \\chapter7\\demo\\tmp\\zs_data.xls'

data = pd.read_excel(inputfile)
# keep only the six attributes needed for L, R, F, M, C
data = data[['LOAD_TIME', 'FFP_DATE', 'LAST_TO_END', 'FLIGHT_COUNT', 'SEG_KM_SUM', 'avg_discount']].copy()
# earlier attempts that did not work:
# data['L'] = pd.datetime(data['LOAD_TIME']) - pd.datetime(data['FFP_DATE'])
# data['L'] = int(((parse(data['LOAD_TIME']) - parse(data['FFP_DATE'])).days) / 30)
# these four lines took me three hours to get right
d_ffp = pd.to_datetime(data['FFP_DATE'])
d_load = pd.to_datetime(data['LOAD_TIME'])
res = d_load - d_ffp
# L: membership length in 30-day months (30 * 24 * 60 minutes = 30 days)
data['L'] = res.map(lambda x: x / np.timedelta64(30 * 24 * 60, 'm'))
data['R'] = data['LAST_TO_END']
data['F'] = data['FLIGHT_COUNT']
data['M'] = data['SEG_KM_SUM']
data['C'] = data['avg_discount']
data = data[['L', 'R', 'F', 'M', 'C']]
data.to_excel(outputfile)
print('Finish')
```

The resulting L, R, F, M and C columns are saved to zs_data.xls.

The next step is to standardize the data and rename the columns:

```python
# -*- coding: utf-8 -*-
import pandas as pd

inputfile = 'F:\\python Data Mining \\chapter7\\demo\\tmp\\zs_data.xls'
outputfile = 'F:\\python Data Mining \\chapter7\\demo\\tmp\\zs_code_data.xls'

# index_col=0: the first column holds the index written by the previous to_excel call
data = pd.read_excel(inputfile, index_col=0)
# z-score standardization; the parentheses are required because '/' binds tighter than '-'
data = (data - data.mean(axis=0)) / data.std(axis=0)
data.columns = ['Z' + i for i in data.columns]
# print(data.columns)
data.to_excel(outputfile)
print('Finish')
```

The standardized columns ZL, ZR, ZF, ZM and ZC are saved to zs_code_data.xls.
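The parentheses in the standardization formula matter. Without them, `data - data.mean(axis=0) / data.std(axis=0)` subtracts mean/std from each value instead of centring and then scaling. The sketch below compares both forms on a made-up two-column table; `zscore()` is a hypothetical helper.

```python
import pandas as pd


def zscore(df: pd.DataFrame) -> pd.DataFrame:
    """Centre each column on its mean and divide by its sample standard deviation."""
    z = (df - df.mean(axis=0)) / df.std(axis=0)
    z.columns = ['Z' + c for c in df.columns]
    return z


# made-up values for two of the five features
toy = pd.DataFrame({'L': [10.0, 20.0, 30.0], 'R': [5.0, 5.0, 20.0]})
z = zscore(toy)
# without parentheses: each value minus (mean / std), which is not a z-score
wrong = toy - toy.mean(axis=0) / toy.std(axis=0)

print(z['ZL'].tolist())     # [-1.0, 0.0, 1.0]
print(wrong['L'].tolist())  # [8.0, 18.0, 28.0]
```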

The next step is to build the model. The goal is to assess customer value, so customers are divided into several groups; based on these customer categories, the number of cluster centers can be set to 5.
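The article stops before the clustering code. What follows is a hedged sketch. It assumes k-means is the clustering method and that the standardized ZL, ZR, ZF, ZM and ZC columns are the input. In practice, scikit-learn's `sklearn.cluster.KMeans(n_clusters=5)` would be the usual choice. This version uses plain NumPy (Lloyd's iterations with a farthest-point initialisation) so it has no extra dependency. It is demonstrated on synthetic data, not on the airline table.

```python
import numpy as np


def _nearest(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest centre (squared Euclidean distance) for every row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X: np.ndarray, k: int = 5, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means with farthest-point initialisation; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # start from one random row, then repeatedly add the row farthest from the chosen centres
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[dist.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        labels = _nearest(X, centers)
        # move each centre to the mean of its members; an empty cluster keeps its old centre
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, _nearest(X, centers)


# synthetic stand-in for the standardized ZL..ZC matrix: 5 well-separated groups of 20 rows
rng = np.random.default_rng(42)
true_group = np.repeat(np.arange(5), 20)
X = true_group[:, None] * 10.0 + rng.normal(scale=0.1, size=(100, 5))

centers, labels = kmeans(X, k=5)
print(centers.round(1))
```

On the real data, each row of `centers` would describe one customer group across the five standardized features.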

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.