A user stratification research method, illustrated with the layering of marketplace sellers

Source: Internet
Author: User

Article description: a user stratification research method, using marketplace sellers as an example.

Foreword: This article summarizes the author's earlier research experience. Because the work involves sensitive data, the figures have been fictionalized and replaced with symbols, which makes the article somewhat harder to read. The numbers themselves are not the point, however. The focus is the user stratification research method, and the hope is to offer a way of thinking about user stratification together with an operational workflow.

The user layering discussed here differs from ordinary user segmentation. The main difference is that layering is an ordering concept, meaning the layers stand in a progressive relationship to one another, whereas segmentation is a classification concept, meaning the segments are relatively independent of one another. In the broad sense, segmentation includes layering.

The rest of the article uses the layering of marketplace sellers as an example to walk through the whole method. Based on prior knowledge of sellers, the important variables to include in the layering were selected, and the corresponding data for 1 million marketplace sellers were extracted from the BI system.

Recoding variables

First, the data for the 1 million sellers were cleaned and the distributions of the important variables were examined. Some interval variables, such as number of transactions and transaction amount, contained values far outside the normal range, and some ordinal variables, such as seller star rating and shop type, had too few samples at either end. Neither is good for model building, so each variable to be analyzed was regrouped into a number of segments. The principles were to keep as many groups as possible and, for interval variables, to keep the values in each group within the normal range as far as possible so as to avoid outliers. The adjusted grouping table is omitted here.
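The regrouping described above can be sketched roughly as follows. This is only an illustration, not the author's actual procedure: the function name `cap_and_bin`, the capping quantiles, and the choice of equal-frequency bins are all assumptions made for the example.

```python
from bisect import bisect_right


def cap_and_bin(values, n_bins=5, lower_q=0.01, upper_q=0.99):
    """Cap extreme values at two quantiles, then assign bins 1..n_bins.

    Hypothetical helper: the capping quantiles and equal-frequency
    binning are assumptions, not taken from the article.
    """
    ordered = sorted(values)
    n = len(ordered)

    def quantile(q):
        # Simple rank-based quantile on the sorted data (no interpolation).
        return ordered[min(n - 1, max(0, int(q * (n - 1))))]

    lo, hi = quantile(lower_q), quantile(upper_q)
    # Pull outliers back into the [lo, hi] range.
    capped = [min(max(v, lo), hi) for v in values]

    # Interior bin edges are evenly spaced ranks of the capped data.
    capped_sorted = sorted(capped)
    edges = [capped_sorted[k * (n - 1) // n_bins] for k in range(1, n_bins)]

    # bisect_right counts how many edges are <= value, giving bin 0..n_bins-1.
    return [bisect_right(edges, v) + 1 for v in capped]


# Example: 100 ordinary values plus one extreme outlier.
transaction_counts = list(range(1, 101)) + [10_000]
segments = cap_and_bin(transaction_counts, n_bins=4)
```

The outlier ends up in the top segment alongside ordinary high values instead of stretching the scale, which is the effect the regrouping is meant to achieve.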

Three databases of 600,000 records each were then drawn at random from the 1 million records, and four datasets in total were used in the follow-up analysis to make sure the resulting indicators are stable.
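A minimal sketch of drawing repeated random subsamples for stability checks is shown below. The function name `draw_subsamples`, the seed, and the use of simple random sampling without replacement within each subsample are assumptions. The article does not say exactly how the databases were drawn.

```python
import random


def draw_subsamples(ids, n_samples=3, fraction=0.6, seed=42):
    """Draw n_samples random subsamples, each holding `fraction` of the ids.

    Subsamples are drawn independently, so they may overlap with each other.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    size = round(len(ids) * fraction)
    return [rng.sample(ids, size) for _ in range(n_samples)]


# Example with 1,000 hypothetical seller ids instead of 1 million.
seller_ids = list(range(1000))
subsamples = draw_subsamples(seller_ids)
```

Running the same analysis on each subsample and comparing the results is one way to check that the indicators do not depend on a particular draw.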

Factor analysis

Factor analysis was first run on all the important variables to remove multicollinearity among them. After several attempts, the results proved more stable, and more consistent with business experience, once the Alipay transaction volume and transaction variables were dropped. The final KMO value is 0.788 and Bartlett's test of sphericity was also applied; together these indicate that the data are suitable for factor analysis. Six common factors were extracted, in order: GMV and star rating, time, order services, Wangpu shop type, physical goods, and participation in the "elimination of insurance" program (most likely Taobao's consumer protection program). The cumulative variance contribution rate is 90.4% (see table below), so the explanatory power is strong. The share of residuals greater than 0.05 between the original correlation matrix and the reproduced matrix is about 10%, so the fit is also good.

Across repeated runs the factor analysis was very stable. The first three factors have a cumulative variance contribution rate of 60% and are the primary factors; the last three factors contribute a further 30% and are the secondary factors. This is also consistent with day-to-day business experience.

Layering

Factor scores for each sample can be computed on the six factors. The proportion that each factor contributes to the composite score is as follows:

Because the composite factor score is a standardized value, it is converted to a 0-100 index using (X - minimum) / (maximum - minimum) × 100. Sellers are then layered according to the index and the actual business situation: TOP1 is the 10% of sellers with the highest index, TOP2 the next-highest 20%, TOP3 the next-highest 30%, and the remaining 40% with the lowest index form the fourth layer. The method is simple and the data are stable, which favors practical application. The upper and lower limits of each layer are as follows; these limits can be fixed and then revised gradually.
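The path from factor scores to layers can be sketched as below, under stated assumptions. Weighting each factor by its share of the explained variance is a common convention and is assumed here; the article's own proportions are not available. The values in `CONTRIBUTIONS` are invented for illustration (chosen only so that they sum to 0.904), and the function names are hypothetical.

```python
# Made-up variance contributions for six factors (illustrative only).
CONTRIBUTIONS = [0.25, 0.20, 0.15, 0.12, 0.10, 0.084]


def composite_score(factor_scores, contributions):
    """Weighted sum of factor scores; weights are shares of explained variance."""
    total = sum(contributions)
    return sum(f * c / total for f, c in zip(factor_scores, contributions))


def to_index(scores):
    """Min-max rescale: (X - min) / (max - min) * 100."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("all scores are identical; the index is undefined")
    return [100 * (s - lo) / (hi - lo) for s in scores]


def assign_layers(index_values, shares=(0.10, 0.20, 0.30)):
    """Top 10% -> TOP1, next 20% -> TOP2, next 30% -> TOP3, rest -> Layer 4."""
    n = len(index_values)
    # Cumulative rank boundaries for TOP1, TOP2 and TOP3.
    bounds, running = [], 0
    for share in shares:
        running += round(n * share)
        bounds.append(running)
    names = ["TOP1", "TOP2", "TOP3", "Layer 4"]
    # Seller positions ordered from highest index to lowest.
    order = sorted(range(n), key=lambda i: index_values[i], reverse=True)
    layers = [None] * n
    for rank, i in enumerate(order):
        passed = sum(rank >= b for b in bounds)  # boundaries already crossed
        layers[i] = names[passed]
    return layers


# Example: 100 hypothetical composite scores.
index_values = to_index([float(i) for i in range(100)])
layers = assign_layers(index_values)
```

Layering by rank, as here, reproduces the 10/20/30/40 split exactly on the sample at hand. Once the index cutoffs between layers are fixed, new sellers can be assigned by comparing their index with those cutoffs instead of re-ranking everyone.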

Discriminant analysis

The data were split 7:3 into an analysis sample and a validation sample, and discriminant analysis was used to test the four-layer division, with the six factor scores as predictors and the layer as the grouping variable. The resulting discriminant functions contribute significantly to separating the layers; the first discriminant function explains 98.2% of the variance and is the main function. Computed with group covariance matrices, the correct classification rates for the analysis sample, the validation sample, and cross-validation are about 91.6%, so the layering is highly accurate.

The layers of sellers differ clearly on the important variables. The details are as follows:

To find the key variables with the most explanatory power within the six factors, candidates were screened for suitability against business experience, and seven key variables were selected. Discriminant analysis was then run directly on these variables against the layer. The first discriminant function has a variance contribution rate of 97.8%, and, computed with group covariance matrices, the correct classification rates for the analysis sample, the validation sample, and cross-validation are about 85%, which is still a high level.
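The bookkeeping around this validation step, namely the 7:3 split and the correct classification rate, can be sketched as follows. The discriminant model itself is not shown; any classifier that outputs a predicted layer could be plugged in. The function names and the seed are assumptions made for the example.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle and split records into an analysis (70%) and validation (30%) sample."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def correct_rate(actual, predicted):
    """Share of cases whose predicted layer equals the actual layer."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


# Example with ten hypothetical seller ids and four hand-written predictions.
analysis, validation = split_7_3(list(range(10)))
rate = correct_rate(
    ["TOP1", "TOP2", "TOP3", "TOP3"],
    ["TOP1", "TOP2", "TOP2", "TOP3"],
)
```

Comparing `correct_rate` on the analysis sample with the rate on the validation sample shows whether the discriminant functions generalize or merely fit the data they were estimated on.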

Regression analysis

To make the layering easier to apply and to simplify the computation, the seven key variables identified in the discriminant analysis were regressed against the factor composite score index to assess how well they explain it.

The regression gave R, R Square, and adjusted R Square of 0.985, 0.970, and 0.970 respectively. The residual standard error is 2.709, which is small. The Durbin-Watson value is 1.252, some distance from 2, but the independence of the residuals is acceptable. Taken together, the model explains the index very well.

As for multicollinearity, the smallest tolerance, that of seller star rating, is 0.39, and the condition index of the eighth dimension is below 15, which indicates that there is no serious multicollinearity.
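For readers who want to reproduce these diagnostics, the sketch below computes R Square, adjusted R Square, and the Durbin-Watson statistic from observed and fitted values using the standard formulas. The function name and the sample data are illustrative. As a quick consistency check on the reported figures, 0.985 squared is about 0.970.

```python
def regression_diagnostics(y, y_hat, n_predictors):
    """Return (R squared, adjusted R squared, Durbin-Watson) for a fitted model.

    Assumes y_hat comes from a least-squares fit that includes an intercept.
    """
    n = len(y)
    mean_y = sum(y) / n
    residuals = [obs - fit for obs, fit in zip(y, y_hat)]
    ss_res = sum(e * e for e in residuals)          # residual sum of squares
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    # Durbin-Watson: squared successive residual differences over ss_res.
    dw = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n)) / ss_res
    return r2, adj_r2, dw


# Tiny made-up example with one predictor.
r2, adj_r2, dw = regression_diagnostics(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 1.9, 3.1, 3.9, 5.1],
    n_predictors=1,
)
```

The Durbin-Watson statistic ranges from 0 to 4, and values near 2 suggest no first-order autocorrelation in the residuals, which is why the article compares 1.252 with 2.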
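Tolerance and the variance inflation factor (VIF) are reciprocals, so the reported minimum tolerance can be restated as a VIF. The short sketch below does this; the function name is hypothetical, and the cutoff mentioned afterwards is a common rule of thumb, not something stated in the article.

```python
def vif_from_tolerance(tolerance):
    """VIF = 1 / tolerance, where tolerance = 1 - R^2 of the variable on the others."""
    if not 0 < tolerance <= 1:
        raise ValueError("tolerance must lie in (0, 1]")
    return 1 / tolerance


# The smallest tolerance reported in the article (seller star rating).
star_vif = vif_from_tolerance(0.39)
```

A tolerance of 0.39 corresponds to a VIF of roughly 2.6. A frequently used rule of thumb treats tolerance below 0.1 (VIF above 10) as a sign of serious multicollinearity, so this value is comfortably on the safe side.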

The main statistics for each variable are as follows:

Looking jointly at the standardized partial regression coefficients and the partial correlation coefficients, the variables with the greatest influence on the factor composite score index are shop age, GMV over the last three months, and seller star rating.

Factor composite score index = a + b1 × seller star segment + b2 × shop age segment + b3 × last-three-months GMV segment + b4 × shop type + b5 × whether main category + b6 × order service count segment + b7 × whether participating in the "elimination of insurance" program

These seven key variables are therefore well suited to predicting the factor composite score index. Once a new index value has been computed, the seller's layer can be assigned using the upper and lower limits given above.
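Applying the equation in practice could look like the sketch below. Every number in it is hypothetical: the article does not publish the intercept, the coefficients, or the layer limits, so `INTERCEPT`, `COEFFS`, and `CUTOFFS` are placeholders, and the dictionary keys are invented names for the seven variables.

```python
from bisect import bisect_right

# Hypothetical intercept (a) and coefficients (b1..b7); not from the article.
INTERCEPT = 5.0
COEFFS = {
    "star_segment": 4.0,
    "shop_age_segment": 3.0,
    "gmv_3m_segment": 5.0,
    "shop_type": 2.0,
    "main_category": 1.5,
    "order_service_segment": 1.0,
    "protection_flag": 2.5,  # the "elimination of insurance" participation flag
}

# Hypothetical lower limits of TOP3, TOP2 and TOP1 on the 0-100 index.
CUTOFFS = [40.0, 60.0, 80.0]
LAYER_NAMES = ["Layer 4", "TOP3", "TOP2", "TOP1"]


def predict_index(seller):
    """Evaluate a + b1*x1 + ... + b7*x7 for one seller."""
    return INTERCEPT + sum(COEFFS[name] * seller[name] for name in COEFFS)


def layer_for(index_value):
    """Map an index value to a layer; a value equal to a limit joins the upper layer."""
    return LAYER_NAMES[bisect_right(CUTOFFS, index_value)]


# Example seller with made-up segment codes.
seller = {
    "star_segment": 5,
    "shop_age_segment": 4,
    "gmv_3m_segment": 6,
    "shop_type": 1,
    "main_category": 1,
    "order_service_segment": 3,
    "protection_flag": 1,
}
seller_index = predict_index(seller)
seller_layer = layer_for(seller_index)
```

Keeping the coefficients and limits in plain data structures makes it easy to revise them as the limits are amended over time.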

In summary, the research process is as follows:

1. Based on the understanding of users gained in earlier studies, decide which important variables take part in the layering, extract the back-end data, and clean and process it.

2. Run factor analysis on the important variables to reduce dimensionality, and compute the composite factor score.

3. Convert the composite factor score into an index, layer users according to the distribution of the index, and verify the layering with discriminant analysis.

4. Based on each variable's explanatory power and the actual business situation, select key variables as independent variables, take the composite factor score as the dependent variable, and fit a regression equation. The key variables can then be used to compute the composite score and layer users quickly, which is convenient for later business use.

5. Randomly split the back-end data into different databases, repeat the analysis above on each, and so verify the stability of the results repeatedly.

Reflections on the follow-up study

With the whole study complete, the most valuable output is perhaps the final regression equation. Although its explanatory power is very high, it still lacks some important variables that are hard to obtain, such as monthly advertising spend, including Through Train and Diamond Booth; follow-up research will bring these in gradually. This also shows that a user stratification study should consider variables as comprehensively as possible, so that the results have more reference value.

In addition, the final layering results look "bland": on the important variables, each layer is basically a matter of the strong being stronger and the weak weaker, with no distinctive characteristics. This too is a difference between layering research and segmentation research: layering mostly yields a trend. Follow-up work could try unequal-probability sampling to thin out part of the sample. For example, sellers with few orders make up the vast majority, so that part of the sample could be reduced appropriately, which would to some extent balance the roles of the important variables in the layering.

Whichever sample structure is used, it has to be tested in practical application and refined through continuous iteration.
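One simple way to implement the unequal-probability sampling mentioned above is to keep each record with a probability that depends on its stratum. The sketch below does this; the function name, the stratum rule, and the keep rates are all assumptions made for illustration.

```python
import random


def downsample_by_stratum(records, stratum_of, keep_rates, seed=0):
    """Keep each record with the probability assigned to its stratum.

    Strata missing from keep_rates are kept in full (rate 1.0).
    """
    rng = random.Random(seed)
    kept = []
    for record in records:
        rate = keep_rates.get(stratum_of(record), 1.0)
        # rng.random() is in [0, 1), so a rate of 1.0 always keeps the record.
        if rng.random() < rate:
            kept.append(record)
    return kept


def order_stratum(record):
    """Hypothetical rule: fewer than 10 orders counts as a low-order seller."""
    return "low" if record["orders"] < 10 else "high"


# Example: 900 low-order sellers and 100 high-order sellers (made-up data).
sellers = [{"orders": 1} for _ in range(900)] + [{"orders": 500} for _ in range(100)]
sample = downsample_by_stratum(sellers, order_stratum, {"low": 0.2})
```

If population-level figures are needed later, each kept record can be weighted by the inverse of its keep rate to undo the deliberate imbalance.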


