If you have a shopping site, how do you recommend products to your customers? This feature in many
E-commerce sites, you can also easily build similar features with SQL Server Analysis Services data mining.
Will be divided into three parts to demonstrate how to implement this function.
- Building a mining model
- Writing a service interface for a mining model
- Develop simple front-end applications
This article describes how to use SQL Server Analysis Services to build a simple mining model based on this issue.
About data mining, it's simply a process of discovering information from massive amounts of data. If business intelligence is divided into three levels: tell you what happened, why it happened, and what will happen in the future. Well, data mining is definitely one of the highest levels of business intelligence, telling you what's going to happen in the future, which is forecasting. And the basis of prediction is based on a large amount of historical data, combined with a certain algorithm, based on probability, to tell you a new data a property trend.
There are many models of data mining, such as Bayesian, Time series, association rules and so on, which can be applied to different model algorithms according to different problem characteristics. For example, this article refers to the product recommendation, is typically suitable for the use of association rules to solve. Typical beer and diaper problems in data mining are largely based on this approach.
To create a mining model projectI note: If you are a C # or related application developers, the following content may seem a little strange, I suggest this section follow the article described in the steps to do, the relevant details I will try to use the developer's perspective to explain. There are some introductions to the developer logic that will help you better understand the data mining project.
Open SQL Server Data Tools, click File->new->project
Select the project type:
Analysis Services multidimensional and Data Mining Models
Add a data source for the project.
, in Solution Explorer, right-click Data Sources and select New data Source:
The data source used here is the Microsoft Official Sample Library Adventure Works DW 2012, which is a sample library of data Warehouse.
For information on how to obtain and deploy these sample libraries, refer to the following list of articles in this article.
Then, add a data source view, right-click Data source Views in the Solution Explorer and select New Data source view.
In relational data source, select the name of the datasource you just created:
Click Next.
Select the table to use in this interface. Based on the official Microsoft Tutorials and sample data, select only the Vassocseqlineitems and vassocseqorders two views.
If you want to use association rules to solve the problem, then your data will conform to a certain structure. The specific structural requirements refer to both views.
Vassocseqorders is the order Main table, which records the basic order information, an order line record.
Vassocseqlineitems is the Order Trust information sheet, which records what items are included in each order.
The two tables are logically related by OrderNumber, and the fields we focus on are model, which records the name of the product.
Once you have selected a table, you can click Finish to complete the settings of the data source view.
The two views are not logically related, but need to be specified here.
The method is to drag and drop the OrderNumber field in the Vassocseqlineitems into the OrderNumber field of the vassocseqorders in the data source view. When you are finished, you can see an arrow pointing to vassocseqorders from the Vassocseqlineitems view.
Here the data source view is set up.
The following establishes a mining model based on association rules. In the Solution Explorer, right-click Mining structures and select New Mining Structure.
The first interface, using a relational database or data warehouse that already exists, so go directly to the next step.
Then specify the mining model to be used in the mining structure in this interface. Click the dropdown box and select Microsoft Association rules, which is the association rule. Click Next.
Here you specify the data source view, the default selection just established can be, directly click Next.
This step is more critical, and understanding is also more around. The main point here is to specify the case table and the nested table. Here, the record of the basic information of the order is the case table, and the record of the order in which the product of the table is a nested table.
In this model, we want to be clear that our case units are orders, not orders in the product. If it is not possible to differentiate which should be the case table and which should be a nested table each time you build the model, you can go back and think about the problem.
Specify the case table and nested table, and click Next.
Here you specify how these columns are used in the analysis model. First specify the key as OrderNumber and model, and then specify the model column as the input column and the predictable column. Because this is to predict what the customer will buy based on the selected product.
Click Next when you are finished specifying the method.
Select the type of column, here according to the system automatically determine the type of choice can be, directly click Next.
This interface is then used primarily to specify the number of test data sets. The default is 30%, which is to keep 30% of the data in the data to verify the correctness of the established mining model. This article does not have a detailed introduction, so first set to 0, and then directly click Finish to complete the Model Setup Wizard.
You can see the built-in mining structure.
Once the model is set up, it needs to be deployed to an instance of the analysis service and then processed.
Right click on the project name of the Solution Explorer root and select Process from the context menu.
The server content is prompted to expire, and the judgment of VS is that as long as your source code is newer than the target server version, this out of date error will be reported even if there is no such library on the target server at all. So choose Yes here.
The first deployment may pop up this box to allow you to specify an account that can be linked to an azimuth data source, where it is usually OK to hit the local administrator's account directly.
This interface appears, directly click on Run to process the mining model.
Then the system will automatically do a lot of work, here the details you can not care about. (Of course, in case it does not report error)
After processing is complete, the mining model is established. The subsequent implementation of the product recommendation function is based on this model to achieve.
Here is a concept, called DMX, perhaps you have never heard of, but you must know SQL, simple SQL is used to check the table, then DMX is used to check the data mining model.
It is the same structure as SQL and the select from. Specific syntax information Everyone is interested to refer to Microsoft's SQL Server online database. In this article, the following will give a simple DMX query, as long as you basically know what it is to do.
Dax sample query:
SELECT flattened
PredictAssociation ([association].[ V Assoc Seq line items],include_statistics,3)
From
[Association]
NATURAL Prediction JOIN
(SELECT ' Touring Tire ' as [Model]
As [v Assoc Seq Line Items]) as T
The general meaning of this statement is that the users who have purchased the product touring tire, according to historical information, may also buy which products.
So let's look at the results of the query:
Model |
Support |
Probability |
AdjustedProbability |
Touring Tire Tube |
1397 |
0.860385925 |
0.783460119 |
Sport-100 |
6171 |
0.267877412 |
0.489939538 |
Patch Kit |
3010 |
0.244040863 |
0.567612365 |
Then let's briefly explain the relevant column information:
support-support degree, the line of the item support degree, here with the traditional support is somewhat different, here, such as the first line, refers to how many orders contain the product touring Tire Tube.
In the Data warehouse, we can use the SQL statement to verify the result
SELECT COUNT (1)
from [AdventureWorksDW2012]. [dbo]. [Vassocseqlineitems]
WHERE [model]= ' Touring Tire Tube '
The returned result is 1397.
probability-probability
The purchase order for touring Tire also contains the number of touring Tire tube. In a nutshell, the probability of P (a| B) of the problem.
Then let's verify the results of the first line and purchase the order quantity for touring Tire tube and touring Tire:
SELECT T1. [OrderNumber]
from [AdventureWorksDW2012]. [dbo]. [Vassocseqlineitems] T1
INNER JOIN
(
SELECT T2. [OrderNumber]
from [AdventureWorksDW2012]. [dbo]. [Vassocseqlineitems] T2
WHERE T2. [model]= ' Touring Tire Tube '
) TZ on T1. Ordernumber=tz. OrderNumber
WHERE T1. [model]= ' Touring Tire '
We can see that the number of results returned by the query is 758 rows.
The order is then calculated to contain only the number of touring tire:
SELECT [OrderNumber]
from [AdventureWorksDW2012]. [dbo]. [Vassocseqlineitems]
WHERE [model]= ' Touring Tire '
The number of record rows returned is 881 rows.
By dividing 758 by 881, the resulting number is exactly 0.860385925.
This probability reflects the fact that the purchase of touring Tire customers will have a great possibility to buy touring Tire Tube, so when the customer buys touring Tire The system automatically recommends touring Tire Tube to it.
You may have seen some of the lines of the DMX statement, of course, some keywords you can temporarily do not care about, when necessary, you can refer to Microsoft's documentation. So expand, for example, customers buy the two specified products, then I according to the two selected products should be recommended to the customer what?
SELECT flattened
PredictAssociation ([association].[ V Assoc Seq line items],include_statistics,3)
From
[Association]
NATURAL Prediction JOIN
(SELECT ' Touring Tire ' as [Model]
UNION SELECT ' Touring Tire Tube ' as [Model]
As [v Assoc Seq Line Items]) as T
See the familiar Union statement, OK, I believe you do not understand DMX, you also see the rule of this statement, these are enough, the subsequent article will be in C # logic code in accordance with this basic statement to dynamically spell DMX statements.
This completes the creation of the data mining model above, which is a basic part of the underlying architecture in a business intelligence project in general.
SQL Server Analysis Services Data Mining