Market research and consumer perception analysis with R

Source: Internet
Author: User
Tags rcolorbrewer

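From problem to data: understanding the problem
Understand the client's problem: who is the client (here, an airline)? Communicate, communicate, communicate!
The question must be specific. For the airline: how is the passenger experience? Which aspects need improvement?
Type of problem: comparison, description, clustering, classification, or regression?
What data are needed: existing data, data quality, data still to be collected, independent variables, dependent variables.
Which aspects of satisfaction? Which main competitors?
Internal data? External data?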
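A problem that leadership does not care about has no future!
Questionnaire design
The questionnaire covers the following items (variable name: meaning):
Courtesy: politeness
Friendliness: friendliness
Helpfulness: ability to provide the help needed
Service: food and beverage service
Easy_Reservation: ease of booking tickets
Preferred_Seats: seat selection
Flight_Options: choice of flights
Ticket_Prices: ticket prices
Seat_Comfort: seat comfort
Seat_Roominess: legroom and space around the seat
Overhead_Storage: overhead baggage storage
Clean_Aircraft: cabin cleanliness
Satisfaction: overall satisfaction
Fly_Again: would choose this airline again
Recommend: would recommend this airline to friends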
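From data to information
Workflow: data preparation, data cleaning, modeling, and model evaluation.
Data preprocessing: cleaning, transformation, missing-value imputation, and so on (very important and time-consuming).
Modeling and evaluation:
    Analyzing customer perception: feature extraction with PCA and EFA.
    Evaluation criterion: this is not a predictive evaluation; what matters is whether marketing staff and other stakeholders find the results convincing.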
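As a small illustration of the missing-value step mentioned above, the following is a minimal sketch in base R, assuming median imputation is acceptable for rating-scale items. The `ratings` data frame and the `impute_median()` helper are hypothetical and are not part of the airline data set.

```r
# Hypothetical toy ratings with missing values (not the airline data)
ratings <- data.frame(
  Courtesy     = c(5, NA, 7, 6),
  Friendliness = c(4, 6, NA, 8)
)

# Hypothetical helper: replace a column's missing values with its median
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Apply the helper to every column
ratings_clean <- as.data.frame(lapply(ratings, impute_median))

# Count the remaining missing values per column
colSums(is.na(ratings_clean))
```

Median imputation is only one option; whether it is appropriate depends on why the values are missing.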
Data
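# Load the packages (not every package is needed by the code below)
library(readr)
library(dplyr)
library(corrplot)
# gplots is a visualization package
library(gplots)
# RColorBrewer provides color palettes for plots
# See http://colorbrewer2.org for details
library(RColorBrewer)
# The data can be downloaded from the website
airline <- read.csv("AirlineRating.csv")
# install.packages("tibble")
library(tibble)
glimpse(airline)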
## Observations: 3,000
## Variables: 17
## $ Easy_Reservation <int> 6, 5, 6, 5, 4, 5, 6, 4, 6, 4, 5, 5, 6, 5, 5, ...
## $ Preferred_Seats  <int> 5, 7, 6, 6, 5, 6, 6, 6, 5, 4, 7, 5, 7, 6, 6, ...
## $ Flight_Options   <int> 4, 7, 5, 5, 3, 4, 6, 3, 4, 5, 6, 6, 6, 5, 6, ...
## $ Ticket_Prices    <int> 5, 6, 6, 5, 6, 5, 5, 5, 5, 6, 7, 7, 6, 7, 7, ...
## $ Seat_Comfort     <int> 5, 6, 7, 7, 6, 6, 6, 4, 6, 9, 7, 7, 6, 6, 6, ...
## $ Seat_Roominess   <int> 7, 8, 6, 8, 7, 8, 6, 5, 7, 8, 8, 9, 7, 8, 6, ...
## $ Overhead_Storage <int> 5, 5, 7, 6, 5, 4, 4, 4, 5, 7, 6, 6, 7, 5, 4, ...
## $ Clean_Aircraft   <int> 7, 6, 7, 7, 7, 7, 6, 4, 6, 7, 7, 7, 7, 7, 6, ...
## $ Courtesy         <int> 5, 6, 6, 4, 2, 5, 5, 4, 5, 6, 4, 6, 4, 5, 5, ...
## $ Friendliness     <int> 4, 6, 6, 6, 3, 4, 5, 5, 4, 5, 6, 7, 5, 4, ...
## $ Helpfulness      <int> 6, 5, 6, 4, 4, 5, 5, 4, 3, 5, 5, 6, 5, 4, 5, ...
## $ Service          <int> 6, 5, 6, 5, 3, 5, 5, 5, 3, 5, 6, 6, 5, 5, 4, ...
## $ Satisfaction     <int> 6, 7, 7, 5, 4, 6, 5, 5, 4, 7, 6, 7, 6, 4, 4, ...
## $ Fly_Again        <int> 6, 6, 6, 7, 4, 5, 3, 4, 7, 6, 8, 6, 5, 4, 6, ...
## $ Recommend        <int> 3, 6, 5, 5, 4, 5, 6, 5, 8, 6, 8, 7, 6, 5, 6, ...
## $ ID               <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...
## $ Airline          <fctr> AirlineCo.1, AirlineCo.1, AirlineCo.1, Airli...
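Check correlations
# library(dplyr)
# install.packages("corrplot")
# library(corrplot)
# We use corrplot() to inspect the correlations among the survey items:
# select the survey items
select(airline, Easy_Reservation:Recommend) %>%
  # compute the correlation matrix
  cor() %>%
  # draw the correlation plot with corrplot()
  # order = "hclust" reorders rows and columns by variable similarity,
  # based on the result of hierarchical clustering
  corrplot(order = "hclust")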
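To make the idea behind `order = "hclust"` concrete, here is a minimal base R sketch that reorders a correlation matrix by hierarchical clustering. It assumes `1 - correlation` as the distance and uses simulated data with hypothetical item names; it illustrates the idea and is not necessarily how `corrplot()` does it internally.

```r
set.seed(1)
# Hypothetical data: two groups of related items driven by two latent scores
g1 <- rnorm(200)
g2 <- rnorm(200)
toy <- data.frame(
  A1 = g1 + rnorm(200, sd = 0.5),
  B1 = g2 + rnorm(200, sd = 0.5),
  A2 = g1 + rnorm(200, sd = 0.5),
  B2 = g2 + rnorm(200, sd = 0.5)
)

corr <- cor(toy)
# Treat 1 - correlation as a distance and cluster the variables
hc <- hclust(as.dist(1 - corr))
# Reorder rows and columns so that similar variables sit next to each other
corr_ordered <- corr[hc$order, hc$order]
round(corr_ordered, 2)
```

After reordering, related items form visible blocks along the diagonal, which is what makes the clustered correlation plot easier to read.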

Principal component analysis (PCA)
airline.pc <- select(airline, Easy_Reservation:Recommend) %>%
  # prcomp(): principal component analysis
  prcomp()
summary(airline.pc)
## Importance of components:
##                          PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     4.693 4.2836 1.68335 1.03625 0.88896 0.82333
## Proportion of Variance 0.435 0.3624 0.05596 0.02121 0.01561 0.01339
## Cumulative Proportion  0.435 0.7974 0.85338 0.87458 0.89019 0.90358
##                            PC7     PC8     PC9    PC10    PC11    PC12
## Standard deviation     0.80349 0.78694 0.77536 0.77020 0.74612 0.71831
## Proportion of Variance 0.01275 0.01223 0.01187 0.01172 0.01099 0.01019
## Cumulative Proportion  0.91633 0.92856 0.94043 0.95215 0.96314 0.97333
##                           PC13    PC14    PC15
## Standard deviation     0.69417 0.66650 0.65131
## Proportion of Variance 0.00952 0.00877 0.00838
## Cumulative Proportion  0.98285 0.99162 1.00000
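The "Proportion of Variance" row in a `summary()` of a `prcomp()` fit can be reproduced by hand: each component's variance is its squared standard deviation, divided by the total. The sketch below shows this on R's built-in `USArrests` data, which stands in for the survey items purely for illustration.

```r
# Built-in USArrests data, used only as a stand-in for the survey items
pc <- prcomp(USArrests)

# The variance of each component is its squared standard deviation
pc_var <- pc$sdev^2

# Share of the total variance explained by each component
prop_var <- pc_var / sum(pc_var)

# Running total, matching the "Cumulative Proportion" row
cum_var <- cumsum(prop_var)

round(rbind(prop_var, cum_var), 4)
```

Note that `prcomp()` centers the variables by default but does not scale them (`scale. = FALSE`), so items with larger variance carry more weight unless scaling is requested.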
PCA scree plot
# Scree plot of the principal components
plot(airline.pc, type = "l", main = "PCA scree plot")

PCA biplot
# PCA biplot
biplot(airline.pc, main = "PCA biplot", cex = c(0.5, 1), xlim = c(-0.06, 0.04))

Data aggregation
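# We can use the dplyr functions introduced earlier, together with the
# pipe operator %>%, to make the code more readable:
# select the survey items and the airline factor,
# i.e. drop the ID column
airline.mean <- select(airline, -ID) %>%
  # group the data by Airline
  group_by(Airline) %>%
  # compute the mean of every remaining column
  # (the original used summarise_each(funs(mean)), which is deprecated in recent dplyr versions)
  summarise_all(mean) %>%
  # display the data
  glimpse()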
## Observations: 3
## Variables: 16
## $ Airline          <fctr> AirlineCo.1, AirlineCo.2, AirlineCo.3
## $ Easy_Reservation <dbl> 5.031, 2.939, 2.038
## $ Preferred_Seats  <dbl> 6.025, 2.995, 2.019
## $ Flight_Options   <dbl> 4.996, 2.033, 2.067
## $ Ticket_Prices    <dbl> 5.997, 3.016, 2.058
## $ Seat_Comfort     <dbl> 6.988, 5.009, 7.918
## $ Seat_Roominess   <dbl> 7.895, 3.970, 7.908
## $ Overhead_Storage <dbl> 5.967, 4.974, 7.924
## $ Clean_Aircraft   <dbl> 6.947, 6.050, 7.882
## $ Courtesy         <dbl> 5.016, 7.937, 7.942
## $ Friendliness     <dbl> 4.997, 7.946, 7.914
## $ Helpfulness      <dbl> 5.017, 7.962, 7.954
## $ Service          <dbl> 5.019, 7.956, 7.906
## $ Satisfaction     <dbl> 5.944, 3.011, 7.903
## $ Fly_Again        <dbl> 5.983, 3.008, 7.920
## $ Recommend        <dbl> 6.008, 2.997, 7.929
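Group means like these can also be cross-checked without dplyr. The following is a minimal base R sketch using `aggregate()`; the `toy` data frame and its values are hypothetical and much smaller than the airline data.

```r
# Hypothetical toy ratings for two airlines (not the real data)
toy <- data.frame(
  Airline  = c("A", "A", "B", "B"),
  Courtesy = c(4, 6, 8, 8),
  Service  = c(5, 5, 7, 9)
)

# Mean of every rating column within each airline
toy.mean <- aggregate(. ~ Airline, data = toy, FUN = mean)
toy.mean
```

The formula `. ~ Airline` means "all other columns, grouped by Airline", which plays the same role as grouping and then averaging every column in the pipeline.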
Biplot of PCA results after aggregation
# Biplot of the PCA results after aggregation
airline.mean.pc <- select(airline.mean, Easy_Reservation:Recommend) %>%
  prcomp()
biplot(airline.mean.pc, main = "Biplot of PCA results after aggregation",
       cex = 0.7, expand = 2, xlim = c(-0.8, 1), ylim = c(-0.7, 0.8))

Visualization of the means
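# Use the airline as row names, then drop the corresponding column
# (tibbles do not reliably keep row names, so convert to a plain data frame first)
airline.mean <- as.data.frame(airline.mean)
row.names(airline.mean) <- airline.mean$Airline
airline.mean$Airline <- NULL
# Draw the heat map
heatmap.2(as.matrix(airline.mean),
          col = brewer.pal(9, "YlGn"), trace = "none", key = FALSE,
          dendrogram = "none", cexCol = 0.6, cexRow = 1)
title(main = "Heat map of mean survey ratings by airline")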

Exploratory factor analysis (EFA)
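EFA uncovers the latent constructs behind the questions in a survey.
The result is a factor matrix. The goal is for a small subset of variables to have high factor loadings, while the remaining loadings are all low.
Why use factor analysis:
    It makes the results actionable, which is hard to achieve with PCA.
    It helps refine the survey items: keep the items with high factor loadings.
    It lets us explore whether the relationships among survey items match our expectations.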
# Factor analysis
# GPArotation provides the oblimin rotation used by factanal() below
library(GPArotation)
airline.fa <- airline %>%
  subset(select = Easy_Reservation:Recommend) %>%
  factanal(factors = 3, rotation = "oblimin")
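To show what a loading matrix with high loadings on a small subset of variables looks like, here is a minimal sketch on simulated data. It assumes two hypothetical latent factors and six hypothetical items, and it uses the varimax rotation built into base R rather than the oblimin rotation used in this article, so that no extra package is needed.

```r
set.seed(42)
n <- 500
# Two hypothetical latent factors
f1 <- rnorm(n)
f2 <- rnorm(n)
# Six hypothetical survey items: x1-x3 driven by f1, x4-x6 driven by f2
toy <- data.frame(
  x1 = f1 + rnorm(n, sd = 0.5),
  x2 = f1 + rnorm(n, sd = 0.5),
  x3 = f1 + rnorm(n, sd = 0.5),
  x4 = f2 + rnorm(n, sd = 0.5),
  x5 = f2 + rnorm(n, sd = 0.5),
  x6 = f2 + rnorm(n, sd = 0.5)
)

# factanal() is part of the stats package; varimax is its default rotation
toy.fa <- factanal(toy, factors = 2, rotation = "varimax")

# Hide loadings below 0.3 so the block structure is easy to see
print(toy.fa$loadings, cutoff = 0.3)
```

Each item should load strongly on one factor and only weakly on the other, which is the pattern the analysis looks for in the real survey items.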
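Factor loading heat map
Factor 1: passengers' overall impression
Factor 2: perception of in-flight service
Factor 3: perception of the ticket purchasing experience
# library(gplots)
# library(RColorBrewer)
# Draw the heat map
heatmap.2(airline.fa$loadings,
          col = brewer.pal(9, "YlGn"), trace = "none", key = FALSE,
          dendrogram = "none", cexCol = 0.6, cexRow = 1)
title(main = "Airline satisfaction factor loadings")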

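Heat map of average factor scores
Factor 1: passengers' overall impression
Factor 2: perception of in-flight service
Factor 3: perception of the ticket purchasing experience
# Factor scores
airline.fa <- airline %>%
  subset(select = Easy_Reservation:Recommend) %>%
  factanal(factors = 3, rotation = "oblimin", scores = "Bartlett")
airline.fa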
## 
## Call:
## factanal(x = ., factors = 3, scores = "Bartlett", rotation = "oblimin")
## 
## Uniquenesses:
## Easy_Reservation  Preferred_Seats   Flight_Options    Ticket_Prices 
##            0.233            0.157            0.222            0.173 
##     Seat_Comfort   Seat_Roominess Overhead_Storage   Clean_Aircraft 
##            0.251            0.165            0.253            0.495 
##         Courtesy     Friendliness      Helpfulness          Service 
##            0.219            0.191            0.153            0.161 
##     Satisfaction        Fly_Again        Recommend 
##            0.151            0.111            0.113 
## 
## Loadings:
##                  Factor1 Factor2 Factor3
## Easy_Reservation                  0.941 
## Preferred_Seats                   0.880 
## Flight_Options    0.167           0.803 
## Ticket_Prices                     0.887 
## Seat_Comfort      0.865                 
## Seat_Roominess    0.844  -0.242         
## Overhead_Storage  0.833   0.137  -0.142 
## Clean_Aircraft    0.708                 
## Courtesy                  0.818         
## Friendliness              0.868         
## Helpfulness               0.953         
## Service                   0.922         
## Satisfaction      0.921                 
## Fly_Again         0.943                 
## Recommend         0.942                 
## 
##                Factor1 Factor2 Factor3
## SS loadings      5.316   3.285   3.135
## Proportion Var   0.354   0.219   0.209
## Cumulative Var   0.354   0.573   0.782
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1  1.0000 -0.0494  0.0188
## Factor2 -0.0494  1.0000 -0.7535
## Factor3  0.0188 -0.7535  1.0000
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 769.65 on degrees of freedom.
## The p-value is 3.9e-122
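# Collect the factor scores and attach the airline labels
fa.score <- airline.fa$scores %>%
  data.frame()
fa.score$Airline <- airline$Airline
# Average factor scores by airline
fa.score.mean <- fa.score %>%
  group_by(Airline) %>%
  summarise(Factor1 = mean(Factor1),
            Factor2 = mean(Factor2),
            Factor3 = mean(Factor3))
# Use the airline as row names, then drop the column
# (convert to a plain data frame first, since tibbles do not reliably keep row names)
fa.score.mean <- as.data.frame(fa.score.mean)
row.names(fa.score.mean) <- as.character(fa.score.mean$Airline)
fa.score.mean$Airline <- NULL
heatmap.2(as.matrix(fa.score.mean),
          col = brewer.pal(9, "YlGn"), trace = "none", key = FALSE,
          dendrogram = "none", cexCol = 0.6, cexRow = 1)
title(main = "Airline satisfaction: average factor scores")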

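Information obtained
The company has a competitive advantage in many areas, and its overall customer satisfaction is higher than that of its competitors.
The company is at a clear disadvantage in the ticket purchasing experience; this is where improvement is needed.
Why are passengers who are more satisfied with the ticket purchasing experience less satisfied with in-flight service? Is it because of the passengers' own characteristics, or do airlines that emphasize in-flight service tend, for some reason, to neglect the purchasing experience?
Further study is needed on why the purchasing experience is poor, and on its likely impact: if a poor purchasing experience affects neither current overall satisfaction nor ticket sales, how much should we invest in fixing it?
From information to action
Domain knowledge: explain the relationship between the ticket purchasing experience and the in-flight service experience.
The audience for the information: who will actually put these improvements into practice? Communicate, listen, and show respect.
Storytelling ability.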

Reposted from:

Http://www.xueqing.tv/course/69

Original author

Lin Hui is currently a business data scientist at DuPont and holds a PhD in Statistics from Iowa State University. Lin has served as a statistical consultant for the business school and for the School of Veterinary Medicine. Research interests include predictive models, machine learning, data visualization, marketing survey analysis, consumer behavior analysis, natural language processing and text mining, and health and disease statistics.
