Big Data era: a summary of knowledge points based on Microsoft Case Database Data Mining (Microsoft Logistic Regression analysis algorithm)

Reprint: http://www.cnblogs.com/zhijianliutang/p/4076587.html

This is the last article in the series on Microsoft data mining algorithms. With it, the series covers every mining algorithm that Microsoft provides in its Business Intelligence (BI) stack. The framework is fully extensible, so you can add custom mining algorithms, but this series does not cover that. It sticks to the algorithms Microsoft ships. These built-in algorithms already cover most commercial data mining scenarios, so once you are comfortable with them you can handle the majority of applications. Each algorithm summary covers the algorithm's principle, its characteristics, its application scenarios, and detailed operating steps. To make reading easier, I have also compiled a table of contents, "Big Data era: simple and easy to read Microsoft Data Mining algorithm summary serial"; click through if you are interested.

This article describes the Microsoft Logistic Regression algorithm. Its principle is the same as that of the Microsoft Neural Network algorithm, but the focus differs. The Microsoft Neural Network algorithm is used for exploratory analysis of existing data toward a given goal, so its emphasis is on analysis. The Microsoft Logistic Regression algorithm focuses on prediction: it produces predicted results based on the patterns found by the neural network analysis.
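Microsoft documents its Logistic Regression algorithm as a variation of the Neural Network algorithm in which the HIDDEN_NODE_RATIO parameter is fixed at 0. The result is a network with no hidden layer, which is why the two algorithms share a principle. The code below is a minimal plain-Python sketch of that idea, not Microsoft's implementation. The inputs feed straight into one sigmoid output unit, which is trained by batch gradient descent on log-loss. The toy data, the learning rate, and the epoch count are all made up for illustration.

```python
import math

def sigmoid(z: float) -> float:
    # Logistic function: maps any real-valued score into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w . x + b) by batch gradient descent on log-loss.

    Equivalent to a network whose inputs connect directly to a single sigmoid
    output unit, with no hidden layer. xs: list of feature vectors; ys: targets in [0, 1].
    """
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of log-loss with respect to the linear score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradients over the batch and take one step downhill.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Predicted probability for a single case.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy data: one scaled input and a binary outcome.
xs = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

Adding a hidden layer to this sketch would turn it back into a (small) neural network. That is exactly the parameter difference between the two Microsoft algorithms.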

Application Scenario Introduction

The application scenarios of this algorithm are the same as those of the Microsoft Neural Network algorithm discussed in the previous article (see that article if anything is unclear). Briefly:

    • Marketing and promotional analysis, such as evaluating the success of a direct mail promotion or a radio advertising campaign.
    • Analysis of manufacturing and industrial processes.
    • Text mining.
    • Any predictive model that analyzes complex relationships between many inputs and relatively few outputs.

In fact, this algorithm complements the Microsoft Neural Network algorithm. As the previous article explained, we sometimes face a pile of data that we want to mine toward some goal without knowing which DM algorithm to choose. In that case we apply the Microsoft Neural Network algorithm first. Once it has revealed the patterns, we use the Microsoft Logistic Regression algorithm to predict outcomes.

Technical preparation

(1) The Microsoft sample data warehouse (AdventureWorksDW2008R2). We use its call center data, which lives in FactCallCenter, the same fact table that the previous Microsoft Neural Network article used. See that article for details.

(2) VS2008, SQL Server, Analysis Services.

Mining goals

In the previous article we used the Microsoft Neural Network algorithm to briefly analyze the call center data in the Microsoft sample database. That analysis showed that the hang-up rate is driven mainly by two factors: the average response time (AverageTimePerIssue) and the shift (Shift). It also suggested a pattern of low hang-up rates on the late-night shift. In this article we use these patterns for further mining.

Two goals:

1. Using these patterns, determine the best value for the average response time. Alternatively, start from a target such as keeping the hang-up rate within 0.05 and determine what the response time should be held to.

2. Determine how to schedule shifts and the best number of staff. For example: how many shifts to run, how many people to put on each, and when each shift should work.
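Goal 1 can be framed as inverting the model. Suppose the hang-up rate p depends on the average time per issue t through a logistic curve, p = 1 / (1 + e^-(w*t + b)). Then the largest t that keeps p at or below a target p* is t = (ln(p*/(1 - p*)) - b) / w, provided w > 0. The sketch below assumes that every other input (shift, operators, calls) is held fixed and folded into the bias b. The coefficients are hypothetical and are not read from the article's model.

```python
import math

def logit(p: float) -> float:
    # Inverse of the logistic function; defined for 0 < p < 1.
    return math.log(p / (1.0 - p))

def max_time_for_target(target_rate, w_time, bias):
    """Largest average time per issue that keeps sigmoid(w_time * t + bias) <= target_rate.

    Assumes w_time > 0 (longer waits raise the hang-up rate) and that all other
    inputs are held fixed and already folded into `bias`.
    """
    if not 0.0 < target_rate < 1.0:
        raise ValueError("target_rate must be strictly between 0 and 1")
    if w_time <= 0:
        raise ValueError("expected a positive coefficient for response time")
    return (logit(target_rate) - bias) / w_time

# Hypothetical coefficients, NOT taken from the article's mining model.
W_TIME, BIAS = 0.05, -8.0
t_max = max_time_for_target(0.05, W_TIME, BIAS)
```

In the article the same question is answered empirically instead, by re-running predictions with scaled response times.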

Operation Steps

(1) We continue with the solution from the previous article and open it directly.

Add a new mining model that uses the Microsoft Logistic Regression algorithm: in the Mining Models panel, right-click and choose New Mining Model. See my previous articles if this step is unfamiliar.

Next, set the input and predictable attributes. By default they match those of the Microsoft Neural Network model. Because we want to predict the hang-up rate and the number of operators, we set ServiceGrade and LevelTwoOperators to "Predict". VS then creates a separate model for each of these two predictable attributes. In other words, the algorithm builds a separate sub-network for each set of predictable attributes.

All other columns are set to "Input".

Then deploy and process the mining model. The next step is to explore the results.

(2) Deploy the project and process the mining model

After deploying the project, click Run to open the "Mining Model Viewer". For this algorithm the viewer shows the same content as the Microsoft Neural Network viewer, so I will not repeat it here. See my previous article if needed.

In terms of how it works, this algorithm is the same as the Microsoft Neural Network algorithm. If we really must contrast them, the Microsoft Logistic Regression algorithm is goal-driven: compared with the neural network algorithm, it reasons toward a specific prediction target. This is somewhat like the relationship between the Microsoft Decision Trees algorithm and the Microsoft Naive Bayes algorithm.

Without further ado, let's continue mining.

We go straight to prediction with the mining model.

Select the mining model, then choose "Singleton Query".

Following the pattern found by the neural network, we set the shift (Shift) to midnight. Then we enter the number of level-two operators (LevelTwoOperators); assume there are 6.

In the "Source" column choose "Prediction Function", and in the "Field" column choose PredictHistogram. Then drag ServiceGrade into the "Criteria/Argument" column.

Click Run to see the hang-up rate that the model predicts for this case: 6 operators on the night shift.

The result is 0.102566737, which means that roughly 10 out of every 100 callers hang up. The remaining values are supporting statistics, such as the number of supporting cases and the probability.
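The "10 out of 100" reading is simply the predicted rate multiplied by a call volume. Here is a small sketch of that conversion, using the predicted ServiceGrade from the singleton query above. The helper name is illustrative.

```python
def expected_hang_ups(rate: float, calls: int) -> float:
    """Expected number of callers who hang up out of `calls`, given a predicted hang-up rate."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a proportion between 0 and 1")
    return rate * calls

# Predicted ServiceGrade for the night shift with 6 level-two operators.
predicted_rate = 0.102566737
per_100 = expected_hang_ups(predicted_rate, 100)
print(round(per_100, 2))  # 10.26
```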

This prediction covers only a single case, and we cannot test every possible staffing level one at a time. The model lets us dig deeper. The call center's current staffing and shift schedule are already fixed, so we can make predictions from this existing data and work out how to adjust next.

We do this:

First, we build prediction-ready rows from the data in the existing table. We group by shift and by whether the day is a holiday, and compute the average number of operators per shift, the average number of calls, and so on. We can use this statement:

Here we also compute the maximum and minimum values for later mining. We then add this statement to the data source view in VS as a named query.
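The statement groups FactCallCenter by Shift and WageType and computes AVG, MIN, and MAX over the numeric columns. Its full form appears as the inner query of the DMX statement later on. As an illustration only, the sketch below shows the same aggregation in plain Python. The input keys follow the FactCallCenter column names, the Avg/Min/Max result names follow the article's query aliases, and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Source column -> alias stem used in the article's query (AvgCalls, MinCalls, ...).
COLUMN_ALIASES = {
    "Orders": "Orders",
    "Calls": "Calls",
    "LevelTwoOperators": "Operators",
    "IssuesRaised": "Issues",
    "AverageTimePerIssue": "TimePerIssue",
}

def summarize_by_shift(rows):
    """Mimic GROUP BY Shift, WageType with AVG/MIN/MAX for each numeric column.

    rows: iterable of dicts keyed by FactCallCenter column names.
    Returns {(shift, wage_type): {"AvgCalls": ..., "MinCalls": ..., ...}}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Shift"], row["WageType"])].append(row)
    summary = {}
    for key, members in groups.items():
        stats = {}
        for column, stem in COLUMN_ALIASES.items():
            values = [m[column] for m in members]
            stats["Avg" + stem] = mean(values)
            stats["Min" + stem] = min(values)
            stats["Max" + stem] = max(values)
        summary[key] = stats
    return summary

# Hypothetical rows, for illustration only.
rows = [
    {"Shift": "midnight", "WageType": "holiday", "Orders": 100, "Calls": 200,
     "LevelTwoOperators": 4, "IssuesRaised": 2, "AverageTimePerIssue": 60},
    {"Shift": "midnight", "WageType": "holiday", "Orders": 120, "Calls": 240,
     "LevelTwoOperators": 6, "IssuesRaised": 4, "AverageTimePerIssue": 80},
    {"Shift": "PM2", "WageType": "weekday", "Orders": 300, "Calls": 400,
     "LevelTwoOperators": 10, "IssuesRaised": 1, "AverageTimePerIssue": 50},
]
summary = summarize_by_shift(rows)
```

One difference is worth knowing. statistics.mean returns fractional averages, whereas SQL Server's AVG over an integer-typed column returns an integer. The SQL averages can therefore be truncated unless the column is cast first.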

Next, go to the Mining Model Prediction panel and use Select Case Table to choose this named query.

Then use Modify Connections to map the model's Calls, Orders, Issues Raised, and LevelTwoOperators columns to the corresponding average columns.

Next, we design the prediction query.

Click Run to see the detailed prediction results:

The results show that the hang-up rate is highest for the Midnight shift on holidays (Holiday), at 0.158. It is lowest for the PM2 (second afternoon) shift on weekdays (Weekday), at 0.1144.
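Picking out the extremes and checking them against a target, such as the 0.1 ceiling discussed next, is a simple ranking step. Below is a minimal sketch that uses only the two group results quoted above; the full result set contains more rows. The function name and the (wage type, shift) key layout are illustrative choices.

```python
def rank_against_target(predictions, target):
    """Sort groups by predicted hang-up rate (highest first) and flag those above target."""
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    return [(key, rate, rate > target) for key, rate in ranked]

# Only the two groups reported in the text; keys are (wage type, shift).
predictions = {
    ("holiday", "midnight"): 0.158,
    ("weekday", "PM2"): 0.1144,
}
result = rank_against_target(predictions, 0.1)
```

Even the best group quoted here sits above 0.1. That is why the next step changes an input rather than just the shift assignment.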

These values may not be what we want, though. Suppose the boss requires the hang-up rate to stay below 0.1: how should we adjust? Our earlier neural network analysis showed that the average response time has a large impact on the hang-up rate. We can therefore lower it to reduce the hang-up rate and improve the service level. For example, we can cut the average response time to 90% or 80% of its current value and predict what the hang-up rate becomes.

We modify the statement behind the data source view to add two columns: the average response time scaled to 90% and to 80%.

With the data source view updated, we use the same method as above to predict the hang-up rate when the average response time is cut to 90%. This time we write the DMX query directly:

    SELECT
        t.[Shift],
        t.[WageType],
        Predict([FactCallCenterReturn].[Service Grade]),
        PredictProbability([FactCallCenterReturn].[Service Grade])
    FROM [FactCallCenterReturn]
    PREDICTION JOIN
    OPENQUERY([Adventure Works DW2008R2],
        'SELECT [Shift], [WageType], [AvgCalls], [AvgIssues], [AvgOperators], [AvgOrders], [Last90TimePerIssue]
         FROM (
             SELECT DISTINCT WageType, Shift,
                 AVG(Orders) AS AvgOrders, MIN(Orders) AS MinOrders, MAX(Orders) AS MaxOrders,
                 AVG(Calls) AS AvgCalls, MIN(Calls) AS MinCalls, MAX(Calls) AS MaxCalls,
                 AVG(LevelTwoOperators) AS AvgOperators, MIN(LevelTwoOperators) AS MinOperators, MAX(LevelTwoOperators) AS MaxOperators,
                 AVG(IssuesRaised) AS AvgIssues, MIN(IssuesRaised) AS MinIssues, MAX(IssuesRaised) AS MaxIssues,
                 AVG(AverageTimePerIssue) AS AvgTimePerIssue,
                 AVG(AverageTimePerIssue) * 0.9 AS Last90TimePerIssue,
                 AVG(AverageTimePerIssue) * 0.8 AS Last80TimePerIssue
             FROM dbo.FactCallCenter
             GROUP BY Shift, WageType
         ) AS [Shifts for Call Center]') AS t
    ON  [FactCallCenterReturn].[Wage Type] = t.[WageType]
    AND [FactCallCenterReturn].[Shift] = t.[Shift]
    AND [FactCallCenterReturn].[Calls] = t.[AvgCalls]
    AND [FactCallCenterReturn].[Issues Raised] = t.[AvgIssues]
    AND [FactCallCenterReturn].[Level Two Operators] = t.[AvgOperators]
    AND [FactCallCenterReturn].[Orders] = t.[AvgOrders]
    AND [FactCallCenterReturn].[Average Time Per Issue] = t.[Last90TimePerIssue]
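The two extra columns in the query above are just the grouped average time per issue multiplied by 0.9 and 0.8. The sketch below shows that what-if transformation in plain Python. It assumes a grouped row is a dict holding an "AvgTimePerIssue" value, and the helper name is hypothetical. The original row is left untouched so that each scenario can be scored separately.

```python
def add_time_scenarios(summary_row, factors=(0.9, 0.8)):
    """Return a copy of a grouped row with scaled AverageTimePerIssue scenarios added.

    Mirrors the AVG(AverageTimePerIssue) * 0.9 / * 0.8 columns
    (Last90TimePerIssue, Last80TimePerIssue) in the DMX query's inner SELECT.
    """
    avg_time = summary_row["AvgTimePerIssue"]
    scenario = dict(summary_row)  # copy, so the baseline row is preserved
    for f in factors:
        scenario[f"Last{round(f * 100)}TimePerIssue"] = avg_time * f
    return scenario

# Hypothetical grouped row, for illustration only.
row = {"Shift": "midnight", "WageType": "holiday", "AvgTimePerIssue": 100.0}
scenario = add_time_scenarios(row)
```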

Look at the results:

The hang-up rate drops compared with the baseline averages, but it still does not meet the boss's requirement of staying below 0.1. So we cut the average response time further, to 80%.

Let's take a look at the prediction results:

This time the hang-up rate falls below 0.1. Adjusting along this pattern, by cutting the average response time to 80%, appears to basically meet the boss's requirement.
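The 90%-then-80% trial is a simple search over reduction factors. The sketch below automates it. It tries factors from mildest to strongest and stops at the first one whose predicted hang-up rate is below the target. `predicted_rate` is a hypothetical stand-in for the mining model: a logistic curve in average time per issue with made-up coefficients. Its outputs are illustrative only and are not the article's results. In practice each step would be a DMX prediction query like the one shown earlier.

```python
import math

def predicted_rate(avg_time, w_time=0.02, bias=-3.9):
    # Hypothetical stand-in for the mining model: logistic in average time per issue,
    # with all other inputs held at their group averages and folded into `bias`.
    return 1.0 / (1.0 + math.exp(-(w_time * avg_time + bias)))

def mildest_reduction(avg_time, target, factors=(1.0, 0.9, 0.8, 0.7)):
    """Return (factor, rate) for the largest factor whose predicted rate is below target.

    Factors are tried from mildest (closest to 1.0) to strongest. Returns None
    if no factor in the list reaches the target.
    """
    for f in sorted(factors, reverse=True):
        rate = predicted_rate(avg_time * f)
        if rate < target:
            return f, rate
    return None

# Hypothetical baseline of 100 time units per issue and a 0.1 ceiling.
best = mildest_reduction(100.0, 0.1)
```

Trying factors in descending order returns the smallest cut that meets the target. This matches the article's reasoning of reducing response time only as much as necessary.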

Interested readers can follow the same approach to analyze and mine further, fine-tuning the number of staff in each position and the shift schedule.
