Commodity recommendation using association rules of SQL Server Analysis Services data mining (I)

Source: Internet
Author: User
Tags how to use sql server how to use sql

If you have a shopping website, how do you recommend products to your customers? This function is available on many e-commerce websites. You can easily build similar functions through the data mining feature of SQL Server Analysis Services.

It is divided into three parts to demonstrate how to implement this function.

  1. Build a Mining Model
  2. Write service interfaces for the Mining Model
  3. Develop simple front-end applications

 

This article describes how to use SQL Server Analysis Services to build a simple Mining Model Based on this problem.

 

Data Mining is simply a process of discovering information from massive data. If business intelligence is divided into three levels: Tell you what happened, why it happened, and what will happen in the future. Data Mining is definitely the highest level in business intelligence, telling you what will happen in the future, that is, prediction. The foundation of prediction is to tell you the trend of a certain attribute of a new data based on the massive historical data, combined with certain algorithms, and based on probability.

Many data mining models, such as Bayesian, time series, and association rules, are common models. Different model algorithms can be applied based on different problem features. For example, the product recommendation mentioned in this article is typically suitable for solving with association rules. The typical beer and diapers problems in data mining are basically based on this method.

 

Create a Mining Model Project

Note: If you are a C # or related application developer, the following content may seem unfamiliar. I suggest you follow the steps described in this article, I will try to explain the details from the developer's perspective. The following sections will provide some introductions for developers to better understand data mining projects.

Open SQL server data tools and click File> New> project.

Select the project type:

Analysis Services multidimensional and data mining models

Add a data source for the project.

In Solution Explorer, right-click data sources and choose New Data source:

The data source used here is the official Microsoft Sample Library adventure works DW 2012, which is the sample library of data warehouse.

For more information about how to obtain and deploy these sample libraries, see the following article list.

Then, add the data source view, right-click data source views in Solution Explorer, and select new data source view.

In relational data source, select the name of the data source you just created:

Click Next.

Select the table used in this interface. Based on the official Microsoft tutorials and sample data, you only need to select two views: vassocseqlineitems and vassocseqorders.

If you want to use association rules to solve the problem, your data must conform to a certain structure. For specific structure requirements, refer to the two views.

Vassocseqorders is the main table of orders, which records basic order information and one order line record.

Vassocseqlineitems is the order trust information table, which records the items contained in each order.

The two tables are logically associated with ordernumber. The field we are interested in is model, which records the product name.

After selecting a useful table, you can click Finish Data Source view settings.

The two views have no logical relationship, but you need to specify them here.

In the data source view, drag the ordernumber field in vassocseqlineitems to the ordernumber field of vassocseqorders. You can see an arrow pointing to vassocseqorders from the vassocseqlineitems view.

The Data Source view settings are complete.

The following describes how to create a Mining Model Based on association rules. In Solution Explorer, right-click mining structures and choose new mining structure.

The first interface uses an existing relational database or data warehouse, so you can proceed directly.

Then, specify the mining model to be used for the mining structure on this interface. Click the drop-down box and select Microsoft association rules, that is, the association rule. Click Next.

The Data Source view is specified here. By default, the created view is selected. click Next.

This step is critical, and it is difficult to understand. The case table and nested table must be specified here. Here, the basic order information is recorded in the case table, and the tables that record the products in the order are nested tables.

In this model, we must be clear that our case unit is an order, rather than a product in an order. If you cannot distinguish the case table from the nested table during model creation, you can first look back at this problem.

Specify the case table and nested table, and click Next.

This section describes how to use these columns in the analysis model. First, specify the key columns as ordernumber and model, and then specify the model column as the input column and predictable column. This is to predict the products the customer will purchase based on the selected products.

After the method is specified, click Next.

Select the column type. You can select the column type based on the system's automatic judgment. click Next.

This interface is mainly used to specify the number of test datasets. The default value is 30%, that is, 30% of the data is retained to verify the correctness of the established mining model. This article does not provide a detailed description. Therefore, set it to 0 first, and then click Finish to complete the Model Setting wizard.

You can see the established mining structure.

After the model is created, you need to deploy it to an analytic service instance and process it.

Right-click the project name in the root directory of Solution Explorer and select process from the context menu.

The system will prompt that the server content has expired. vs determines that the out of date error will be reported as long as your source code is newer than the version of the target server, even if the database does not exist on the target server. Select Yes here.

During the first deployment, a prompt box may pop up asking you to specify an account that can link to the location data source. In this case, you can click the account of the local administrator directly.

On this page, click Run to process the mining model.

Then the system will automatically perform a lot of work, so you can ignore the details here. (Of course, if no error is reported)

After processing, the mining model is created. Subsequent product recommendation functions are implemented based on this model.

 

Here we will introduce a concept called DMX. You may have never heard of it, but you must know that SQL is used for Table query. In short, DMX is used to query data mining models.

Like SQL, it is also the structure of select from. If you are interested in specific syntax information, refer to Microsoft's SQL Server online database. In this article, we will provide a simple DMX query. You only need to know what it is.

 

Dax sample query:

Select flattened

Predictassociation ([Association]. [V assoc seq line items], include_statistics, 3)

From

[Association]

Natural prediction join

(Select 'touring tire' as [model]

) As [V assoc seq line items]) as t

The general meaning of this statement is: which products may be purchased by users who have purchased the product touring tire based on historical information.

Let's look at the query results:

Model

Support

Probability

Adjustedprobability

Touring Tire Tube

1397

0.860385925

0.783460119

Sport-100

6171

0.267877412

0.489939538

Patch Kit

3010

0.244040863

0.567612365

 

Next, let's briefly explain the information of the relevant columns:

Support-support, the item Support of the row, which is somewhat different from the traditional support. Here, for example, the first line indicates how many orders contain the product touring tire tube.

In the data warehouse, we can use SQL statements to verify this result.

Select count (1)

From [adventureworksdw2012]. [DBO]. [vassocseqlineitems]

Where [model] = 'touring Tire Tube'

The returned result is 1397.

Probability-Probability

The order for purchasing touring tire also contains the number of touring tire tube. Simply put, it is the P (A | B) Problem in the probability.

We will verify the calculation result of the next line, and purchase the order quantity of touring Tire Tube and touring tire at the same time:

Select T1. [ordernumber]

From [adventureworksdw2012]. [DBO]. [vassocseqlineitems] T1

Inner join

(

Select T2. [ordernumber]

From [adventureworksdw2012]. [DBO]. [vassocseqlineitems] T2

Where T2. [model] = 'touring Tire Tube'

) TZ on t1.ordernumber = tz. ordernumber

Where T1. [model] = 'touring tire'

We can see that the number of results returned by the query is 758 rows.

Then calculate the number of touring tire contained in the order:

Select [ordernumber]

From [adventureworksdw2012]. [DBO]. [vassocseqlineitems]

Where [model] = 'touring tire'

The number of rows returned is 881.

Divide it by 758 and the number is exactly 881.

This probability exactly reflects the fact that customers who buy touring tire are likely to buy touring tire tube, therefore, when the customer buys the touring tire, the system automatically recommends the touring tire tube.

You may have seen some of the DMX statements. Of course, you do not need to worry about some keywords for the moment. If necessary, you can refer to Microsoft's documentation. For example, if a customer buys two specified products, what should I recommend to the customer based on these two products?

Select flattened

Predictassociation ([Association]. [V assoc seq line items], include_statistics, 3)

From

[Association]

Natural prediction join

(Select 'touring tire' as [model]

Union select 'touring Tire Tube 'as [model]

) As [V assoc seq line items]) as t

Let's see the familiar Union statement. Well, if you don't understand DMX, you can also see the rule of this statement. That's enough, in the subsequent articles, we will dynamically spell DMX statements based on this basic statement in the C # logic code.

 

 

 

This completes the establishment of the above data mining model, which is usually the basic part of the underlying architecture of a business intelligence project.

In the future, I will write two more articles on the mining model, one of which is based on this model to develop the service interface, which will facilitate various client applications, winform, Asp.net, Silverlight, and other calls. In the last article, We will select a client program to build a simple interface and call this service to implement prediction query.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.