How to locate the target user is a very important topic in any business unit, especially in the case of limited budgets, how to get the maximum benefits of the activity, target User positioning is an important means.
This article describes how to locate the target user based on historical information through the data mining function in SQL Server Analysis Service (SSAs.
Microsoft's SQL Server provides a complete set of solutions for the data platform. It is not only a data engine, but also provides a lot of data tools and services, with the help of the Analysis Service, combined with the massive historical data information in the business system, SQL Server can help us find the patterns and rules in them, so as to make a prediction and Analysis of the target data.
In practice, different mining models are suitable for different problem scenarios. In the same problem mode, multiple models may be suitable for solving this problem, this gives us a lot more contrast for solving the problem, so we can select an optimal model based on the accuracy of each model prediction.
In this article, we use three mining models to compare the target user positioning problems: decision tree, Bayesian, and clustering.Algorithm.
The SQL Server version used in this article is 2012, and the sample database is the data warehouse of adventure works. For details about how to obtain and deploy the sample data warehouse, refer to my article:
Http://www.cnblogs.com/aspnetx/archive/2013/01/30/2883831.html
First, create a data mining project and open SQL data tools, that is, the shell of Visual Studio 2010.
Create a multi-dimensional and data mining analysis service project.
After the project is created, create a data source connection.
Select the deployed adventure works DW connection.
Enter the simulated identity information. This demonstration uses the local administrator account for ease of use (of course, this is not recommended in the production environment ).
Name the data source connection.
Create a data source view.
Select the data source connection you just created.
Select two objects from the list of optional objects on the left. One is the prospectivebuyer table and the other is the view vtargetmail.
Name the data source view.
You can see the two added tables in the data source view.
The left table is a batch of data that has been integrated in the data warehouse to train and verify the model. The table on the right is the table to be predicted. The data in this table is predicted after the model is generated.
Create a new mining structure.
Specify a Mining Model for the mining structure. Select the decision tree model here. Note that a mining structure can have many mining models. However, you must specify a mining model when creating a mining structure using a tool.
Select the created data source view.
Select the case table vtargetmail. Prospectivebuyer is used for prediction. Therefore, ignore this table.
First, select customerkey in the key column to specify the key column.
Then select the input column:
Age
Commutedistance
Englisheducation
Englishoccupation
Gender
Geographykey
Houseownerflag
Maritalstatus
Numbercarsowned
Numberchildrenathome
Region
Totalchildren
Yearlyincome
Finally, select the columns to be selected on the left of the following list. These columns are not used as model considerations, but are used in the drill results of the model.
Addressline1
Addressline2
Datefirstpurchase
Emailaddress
Firstname
Lastname
Confirm the content type. Here there are two changes to be made. One is that the geography key is changed to discrete, and the other is that the bikebuyer is changed to discrete.
Content type reference:
Http://technet.microsoft.com/zh-cn/library/ms174572
Reference for data types:
Http://technet.microsoft.com/zh-cn/library/ms174796
Set the ratio of test data. That is to say, in all historical data, we will decide how much data is used to train the model and how much data is used to test the accuracy of the model.
Here you can specify the percentage or the maximum number of test cases. when both are configured, the system will take the configuration of the minimum test case set.
Name the mining structure and model, and select the option that allows drilling.
So far, a decision tree model has been generated.
Click "Mining Model tag" to view the mining structure and mining model structure more intuitively.
Next, create other mining models based on the created mining structure.
Right-click the mining structure, select new mining model, and add several other models for the mining structure to compare the prediction results.
Name the Mining Model and select a clustering algorithm from the algorithm.
Create another Bayesian model using the same method.
The prompt is displayed here because the Bayesian model does not support continuous variables. Continue.
You can see the three models created and the usage of each field in each model.
Click the process button on the toolbar to deploy and process the model.
This prompt is displayed when the first deployment or project has been modified. Click Yes.
On the processing page, click Run to start processing.
The processing time depends on the size of the training data and the hardware performance of the machine. Of course, this example of adventure works usually takes over 10 seconds.
After the mining structure is processed, let's take a look at each mining model. Click the Mining Model Browser tag. Then, select the tu_decisontree decision tree model in the mining mode.
As you can see, the Microsoft decision tree algorithm creates a series of splits in the tree to generate a Data Mining Model. These splits are represented by "nodes. This algorithm adds a node to the model whenever it finds that the input column is closely related to the predictable column. The attribute that is closer to the root node indicates that the attribute change has a greater impact on the Prediction column.
Right-click each node to drill down to details.
Select the first item to view the columns contained in the model:
You can also select Drilling Model and mining structure model information, so that the column defined on the left will be displayed, so as to facilitate detailed positioning.
Browse the clustering analysis model. Here we can see the relationship between each category. By dragging the slider on the left, we can see the strength of the relationship between each other.
View the Bayesian model to see whether the customer is the target. The key factors analyzed by the model can also be viewed on the left side.
Next, we will test each model. Although many models can be used to solve a problem, only one of them is optimal and the efficiency is the highest. The main purpose of model testing is to use an improvement diagram.
First, click the mining accuracy chart tab to confirm that the data used to promote the graph is use mining model test cases.
Then, click the lift chart label, and the upgrade chart generated based on the test data is displayed.
From the results of the improvement chart, the prediction performance of the decision tree is significantly higher than that of other models.
First, the diagonal line is the result of random probability, and the top line is the optimal prediction result. Therefore, the actual model prediction efficiency is between the two lines.
For more information about the improvement chart, refer to Microsoft's document:
Http://technet.microsoft.com/zh-cn/library/ms175428
After determining that the decision tree model is the optimal model, you can create a screening model if you want to analyze the purchase style differences based on gender.
First, return to the mining models tab, right-click the created decision tree model, and select create model.
On the displayed page, name the new model tu_decisiontree_male and select Microsoft decision tree for the algorithm.
Click OK, right-click the new model, and select set model filter...
In conditions, select gender = m.
Create another tu_decisiontree_female using the same method. In model filter, set conditions to gender = f.
After creating two new filter models, click the improvement chart. Set predict value to 1 to verify the predicted performance of the target customer.
Click lift chart to view the elevation chart.
We can see that the prediction performance of the three decision tree models is higher than that of other models.
After the model test is completed, you can determine that the decision tree is the optimal prediction model based on the target user. The next step is to use the decision tree model to predict the data in a table.
First, select the Mining Model Prediction tab.
Click Select model... Select the previous tu_decisiontree model.
Then click Select case table... Select the instance table prospectivebuyer.
Click OK and you will see that there is no corresponding age column in the table prospectivebuyer. This column is an important column to be referenced during the prediction process, so you can return to the data source view, add a computing column to the prospectivebuyer table.
Right-click the table prospectivebuyer and click New named calculation...
On the displayed page, name the computing member and enter the calculation formula:
You can see that the new column has been added to the end of the field list. The columns added in the data source view can be understood as the Views created in the project in the physical view and table. Therefore, the actual table or view structure is not changed.
Return to the prediction page and drag the age column in the Mining Model to the created calage column in select input table (s.
Click the small icon in the upper left corner of the interface to switch to the design mode.
In the grid list below, select prediction function for source.
Specify an alias result and drag the bike Buyer field from the Mining Model in criteria/argument.
In the grid list below, select tm_decisiontree for source and bike buyer for filed. Finally, specify prosepectivebuyer in the following lines, and then specify calage, firstname, lastname, addressline1, and addressline2 in filed.
Click the tool icon to switch to the result view.
The prediction result of the target data is displayed.
You can also switch to query mode to view the DMX statements generated by the system.
So far, by establishing a data mining model and selecting the optimal model, the data in the target user list is predicted to help the user more efficiently determine the target potential user. Such predictions are very useful in some scenarios. For example, if an enterprise wants to promote an activity recently, the cost is not enough for all customers, in this case, you can decide which customers to push promotion services to effectively reduce the cost of promotion activities and obtain the highest promotion benefits.
The preceding section describes how to use the SQL Server Analysis Service to locate the target user. For more information about how to add this data mining feature to an existing project, see the series of Shopping Basket recommendations I wrote, this section describes how to build a web service and write a simple application to access this service to use the data mining function.
Http://www.cnblogs.com/aspnetx/archive/2013/02/25/2931603.html
For more information about analytic services, visit my blog http://aspnetx.cnblogs.com.
For more information about the model algorithms mentioned in this document, see:
Decision tree model: http://technet.microsoft.com/zh-cn/library/ms175312.aspx
Bayes model: http://technet.microsoft.com/zh-cn/library/ms174806.aspx
Clustering Analysis Model: http://technet.microsoft.com/zh-cn/library/ms174879.aspx
If you have any questions, please reply here or leave a message for me.