Big Data era: a summary of knowledge points based on Microsoft Case Database Data Mining (Microsoft Neural Network analysis algorithm)

Source: Internet
Author: User



It has been a while since the last installment of our Microsoft data mining algorithm series; I have been a little busy lately. The previous article covered the theory behind the Neural Network analysis algorithm, so this one is a hands-on walkthrough. We have already summarized a number of the other Microsoft algorithms, and to make them easier to find I have compiled a catalogue outline: "Big Data era: Easy to learn Microsoft Data Mining algorithm Summary serial". The series is meant to work through the data mining (DM) algorithms in Microsoft Business Intelligence. Each article covers a brief statement of the algorithm's principle, its characteristics, its application scenarios, and detailed operating steps, which together should cover most business data mining scenarios. Interested readers can take a look.

The algorithm in this article is the Microsoft Neural Network analysis algorithm, the most complex algorithm in the Microsoft mining series and also the most widely used. Put simply, it imitates the way our brains think in order to pull useful information out of a vast ocean of data. For the principle, see the previous article.

Application Scenario Introduction

The Microsoft Neural Network algorithm has quite a lot of application scenarios. As introduced in the previous article, the main ones are:

    • Marketing and promotional analysis, such as evaluating the success of a direct mail promotion or a radio advertising campaign.

    • Analysis of manufacturing and industrial processes.

    • Text mining.

    • Any predictive model that analyzes complex relationships between multiple inputs and relatively few outputs.

Admittedly, these scenarios are very general and none of them is specific. That is understandable, because the algorithm imitates how a biological brain works: in any setting where there is enough "evidence" for a person to judge the outcome subjectively, the Microsoft Neural Network algorithm can be applied. With a small amount of evidence we can make that judgment ourselves, but faced with an ocean of evidence the human brain struggles to sort out the clues and reach a conclusion. That is where the neural network fits.

Some of the scenarios above can also be mined by algorithms other than the Microsoft Neural Network. For example, deciding whether mail or radio advertising is more effective in marketing is really the best-fit scenario for the Microsoft Decision Trees algorithm, and predicting stock movements from historical data is a typical scenario for the Microsoft Time Series algorithm. But all of these work because we can fix the premise or the scope of the mining up front: in the marketing evaluation, for instance, we are comparing mail against radio ads. There is a special case, though: suppose neither of them actually drove sales, and the company's recent increase in sales or performance came from some factor we have not identified. The Microsoft Decision Trees algorithm has little to offer then, whereas the Microsoft Neural Network algorithm can be used to analyze it.

There is an even more special scenario: when we face a pile of data that has to be mined for some purpose and have no idea which DM algorithm to pick, that too is a case for the Microsoft Neural Network analysis algorithm.

Technical preparation

(1) The Microsoft sample data warehouse (AdventureWorksDW2008R2). We use the call center data in it, a single fact table named FactCallCenter. The data in this table is described in detail in the steps below.

(2) VS2008, SQL Server, Analysis Services.

Purpose of the mining

Many large companies run their own call centers, such as China Mobile's 10086 or China Unicom's 10010. After a call, these centers ask you to rate the service (satisfied, dissatisfied, very dissatisfied) and use that as their service-level measure. The industry has another indicator as well: the hang-up rate (also called the abandon rate), which reflects customer frustration. When you call the service center and choose a human agent, you are put on hold; if you get fed up and hang up, that counts as one hang-up. The ratio of hang-ups to all incoming calls is the hang-up rate. The higher the hang-up rate, the worse the service quality of the call center.
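The definition above is just a ratio, and a few lines of Python make it concrete. This is a minimal sketch; the function name hang_up_rate and its argument names are my own, not anything from the sample database.

```python
def hang_up_rate(abandoned_calls: int, total_calls: int) -> float:
    """Share of incoming calls where the caller hung up before being served."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    if not 0 <= abandoned_calls <= total_calls:
        raise ValueError("abandoned_calls must be between 0 and total_calls")
    return abandoned_calls / total_calls


# 21 hang-ups out of 100 incoming calls.
rate = hang_up_rate(21, 100)
print(f"{rate:.3f}")  # 0.210
```

A value of 0.00 means every call was answered, and 1.00 means every caller hung up.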

The purpose of this mining exercise is to find out which factors affect the hang-up rate. Too few customer service reps? Bad attitude? Voices not sweet enough? Poor service? Knowing that, we can improve the call center's service quality and increase revenue.

Operation Steps

(1) We keep using the solution from the previous articles: open it and add a data source view (see the earlier articles for how). Here is the diagram:

Right-click to view the data details in this table:

Following the official notes for the Microsoft sample database, here are the columns in this fact table and what each one contains:

Column name: Contents

FactCallCenterID: An arbitrary key that is created when the data is imported into the data warehouse.

DateKey: The operation date of the call center. The date is not unique because the vendor provides a separate report for each shift on each operating day.

WageType: Indicates whether the day is a weekday, a weekend, or a holiday.

Shift: Indicates the shift for which the calls are recorded. This call center divides the workday into four shifts: AM, PM1, PM2, and midnight.

LevelOneOperators: The number of level-one operators on duty. Call center employees start at level one.

LevelTwoOperators: The number of level-two operators on duty. Employees must log a certain number of working hours before they are eligible to become level-two operators.

TotalOperators: The total number of operators present during this shift.

Calls: The number of calls received during this shift.

AutomaticResponses: The number of calls handled entirely by automatic call processing (interactive voice response, or IVR).

Orders: The number of orders generated by the calls.

IssuesRaised: The number of issues generated by the calls that require follow-up.

AverageTimePerIssue: The average time required to answer a call.

ServiceGrade: The hang-up rate (abandon rate) for this shift. The hang-up rate is an indicator often used by call centers. The higher the hang-up rate, the worse the customer satisfaction and the greater the likelihood of losing potential orders. The hang-up rate is calculated per shift.

The table above lists the key fields. The ServiceGrade field is the "hang-up rate" discussed earlier, and the fields before it record the call center's operating information. Faced with this information alone we are stuck, because we cannot tell by eye which of these factors drive the value of ServiceGrade. This is where we turn to the Microsoft Neural Network analysis algorithm for mining and analysis.

(2) Create a new mining structure

Let's build the data mining model. The steps are simple and the details are in my earlier blog posts, so here are just the key steps:

Click Next, then set the inputs and outputs:

Since we do not know which factors affect the hang-up rate, we obediently select everything as input, on the principle of better too many than too few. There are two columns we leave out. One is DateKey, the date of the work record; I am fairly sure the indicator has nothing to do with the calendar date (you can select it if you like, but processing will take longer). The other is FactCallCenterID, which is the key and certainly not an input. For outputs we choose ServiceGrade, then Orders (the number of orders generated), which relates to performance, so along the way we can see which factors produce more orders; you can guess why. Finally we choose LevelOneOperators, the number of level-one operators, so we can analyze whether having two operator levels makes any difference.
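The selections above can be written down as a small lookup table, which makes it easy to double-check what goes in and what comes out. This is only a sketch: the role labels are my own shorthand rather than an Analysis Services API, the column names are those of the FactCallCenter table in the sample database, and treating the three outputs as inputs as well is an assumption based on "select everything".

```python
# Role labels are this sketch's own shorthand, not an Analysis Services API.
COLUMN_ROLES = {
    "FactCallCenterID": "key",            # row key, never an input
    "DateKey": "ignore",                  # left out: calendar date assumed irrelevant
    "WageType": "input",
    "Shift": "input",
    "LevelOneOperators": "input+output",  # assumption: outputs also stay inputs
    "LevelTwoOperators": "input",
    "TotalOperators": "input",
    "Calls": "input",
    "AutomaticResponses": "input",
    "Orders": "input+output",
    "IssuesRaised": "input",
    "AverageTimePerIssue": "input",
    "ServiceGrade": "input+output",
}


def columns_with_role(*roles):
    """Return the column names whose role is one of the given labels."""
    return [name for name, role in COLUMN_ROLES.items() if role in roles]


inputs = columns_with_role("input", "input+output")
outputs = columns_with_role("input+output")

print("inputs: ", inputs)
print("outputs:", outputs)
```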

We click Next:

A prompt appears saying that drillthrough is not allowed for the neural network algorithm. That is understandable, because the model is not a linear function: if you drill into one "neuron" node, that neuron in turn depends on other neurons, so drilling through is theoretically meaningless. If this is unclear, see the principle in my previous article.
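To see why one neuron cannot be read on its own, here is a toy forward pass in plain Python. It is a sketch under my own assumptions: the weights are made up, the network has three inputs, two hidden neurons and one output, and tanh and sigmoid are simply the activation functions I picked for the illustration. It is not the model that Analysis Services trains.

```python
import math


def hidden_activations(inputs, hidden_weights):
    """Each hidden neuron takes a weighted sum of ALL inputs, then applies tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)))
        for weights in hidden_weights
    ]


def predict(inputs, hidden_weights, output_weights):
    """The output neuron combines ALL hidden neurons through a sigmoid."""
    hidden = hidden_activations(inputs, hidden_weights)
    total = sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-total))


# Made-up weights: 3 inputs -> 2 hidden neurons -> 1 output.
HIDDEN_WEIGHTS = [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4]]
OUTPUT_WEIGHTS = [1.2, -0.7]

base = [0.2, 0.7, 0.1]
changed = [0.9, 0.7, 0.1]  # only the first input differs

print(hidden_activations(base, HIDDEN_WEIGHTS))
print(hidden_activations(changed, HIDDEN_WEIGHTS))
print(predict(base, HIDDEN_WEIGHTS, OUTPUT_WEIGHTS))
print(predict(changed, HIDDEN_WEIGHTS, OUTPUT_WEIGHTS))
```

Changing a single input moves every hidden neuron at once, so no hidden node corresponds to a clean subset of cases the way a decision tree leaf does.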

Next we deploy the mining model and process it. These steps are simple, so I will skip the explanation.

Results analysis

Without further introduction, let's look straight at the results:

The neural network "model viewer" is very simple, with only one panel, divided into two parts: input and output. Below them are the attribute values of the various variables. By changing the input and output selections you can analyze how different variables affect the output, much like the feature analysis panel of the clustering algorithm.

The input attributes are simple; we can pick any of the attributes we selected earlier:

You can also select a value:

Here we chose the "self-service answer" (automatic responses) attribute, but the values offered are ranges rather than single numbers. This is a characteristic of the neural network worth explaining: for continuous numeric attributes, the Microsoft Neural Network algorithm samples the data and cuts it into ranges, and the boundaries of these ranges do not follow a strict mathematical progression. For example:

Let's look at how the continuous ServiceGrade values are grouped in VS:

The ServiceGrade attribute is theoretically a value between 0.00 (all calls answered) and 1.00 (all callers hang up), but the neural network algorithm groups it as shown in the figure above, into ranges such as 0.0748051948 - 0.09716216215. Although this grouping is mathematically accurate, such ranges may not mean much to business users. To group the numeric values differently, you can create one or more copies of the numeric column and specify how the data mining algorithm should handle the values in each copy. That gets us closer to the analysis we are actually after.
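Here is one way such odd-looking boundaries can arise. The sketch below cuts a list of values into buckets that each hold roughly the same number of rows (equal-frequency binning). The text does not say which discretization method the tool used for this model, so this is an illustration and not a reproduction, and the sample values are made up.

```python
def equal_frequency_edges(values, n_bins):
    """Return n_bins + 1 bucket edges so each bucket holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need at least one value per bin")
    edges = [ordered[0]]
    for i in range(1, n_bins):
        edges.append(ordered[(i * n) // n_bins])
    edges.append(ordered[-1])
    return edges


def bucket_index(value, edges):
    """Index of the bucket with edges[i] <= value < edges[i + 1]; the last bucket is closed."""
    for i in range(len(edges) - 2):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


# Made-up hang-up rates for twelve shifts.
grades = [0.03, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.13, 0.15, 0.18, 0.21]
edges = equal_frequency_edges(grades, 4)
print(edges)
print(bucket_index(0.05, edges), bucket_index(0.21, edges))
```

Because the edges follow the data, a bucket can be narrow where values are dense and wide where they are sparse, which is why the ranges do not look like round business thresholds.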

As we can see, the output is grouped the same way:

Let's tackle the first mining goal above: which factors affect the hang-up rate (ServiceGrade)? We pick the highest range and one low range:

As the panel shows, the output selected is the hang-up rate, ServiceGrade, and the two ranges chosen are 0.030-0.072 and 0.126-0.210. A value of 0.210 means that out of 100 customers who call, 21 get fed up and hang up, which is already very high; the higher the value, the worse the service quality. Looking at the variables below, the first factor affecting the hang-up rate is clearly Average Time Per Issue (the average time to answer).

An "average time to answer" between 44.000 and 70.597 leans toward the low hang-up range of 0.030-0.072. What does that tell us? When a call is typically resolved in this amount of time, the caller's problem gets solved, people are fairly satisfied, and they do not hang up on you.

The second factor is Orders, the number of orders: when it is between 321.940 and 539.000, the hang-up rate is lower. In reality the causality probably runs the other way, with the low hang-up rate leading to more orders.

Now the third factor: when "average time to answer" is between 89.087 and 120.000, the hang-up rate soars to 0.126-0.210. What?! Why on earth? The longer customer service takes to respond, the higher the hang-up rate!

Well... my guess is that in these cases the customer service rep's explanation was unsatisfying: the caller kept trying to get a clear answer, the rep could not give one, and the caller decisively hung up and gave up on you. There is of course another possibility, where the customer keeps harassing the rep on the phone until the rep hangs up. Of course, these are only guesses. We do not care about the process, only the result: in this range the hang-up rate is high, and the picture is the proof.

Out of curiosity, let's compare these two ranges of "average time to answer" directly:

Hey... when the average response time is in the range 44.000-70.597, the hang-up rate is very low and the score is 100! The probability of the low range is 53.48%, while the probability of a high hang-up rate is only 6.18%.

When the average response time is in the range 89.087-120.000, the hang-up rate is very high. The score is 74.01 (the score reflects how reliable this judgment is), and the probability of a high hang-up rate jumps to 45.22%.
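One way to build intuition for these percentages is to count rows directly: among the shifts whose average time falls in a given range, what share ended up in the high hang-up range? The sketch below does that on a handful of made-up rows, so its numbers are not the ones in the viewer, and it is plain counting rather than the calculation the neural network viewer performs.

```python
def share_in_grade_range(rows, time_range, grade_range):
    """Among rows whose average time is in time_range, the share whose grade is in grade_range."""
    lo_t, hi_t = time_range
    lo_g, hi_g = grade_range
    grades = [grade for avg_time, grade in rows if lo_t <= avg_time <= hi_t]
    if not grades:
        return None  # no rows in this time range
    hits = sum(1 for grade in grades if lo_g <= grade <= hi_g)
    return hits / len(grades)


# Made-up (AverageTimePerIssue, ServiceGrade) pairs, one per shift.
rows = [
    (50.0, 0.04), (60.0, 0.05), (65.0, 0.06), (68.0, 0.15),
    (95.0, 0.14), (100.0, 0.18), (110.0, 0.05), (115.0, 0.20),
]

HIGH_GRADE = (0.126, 0.210)  # the high hang-up range from the viewer
SHORT_TIME = (44.000, 70.597)
LONG_TIME = (89.087, 120.000)

print(share_in_grade_range(rows, SHORT_TIME, HIGH_GRADE))
print(share_in_grade_range(rows, LONG_TIME, HIGH_GRADE))
```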

Looking further down, we find an even cuter case; here is a screenshot:

The Shift value is the shift time, and the value shown here is midnight. In the dead of night, the probability of callers hanging up on customer service is very low. What could the reason be? It seems the data in Microsoft's sample database is quite realistic!

I will not analyze the other attributes here; the method is the same. Using the Microsoft Neural Network analysis algorithm, we have found that the most important factor affecting the hang-up rate is Average Time Per Issue (average time to answer). Next we change the input and analyze this factor directly:

Between 44.000 and 70.597 the hang-up rate is exclusively low, and the resulting order volume is most likely 321.940-539.000. Phew... and it turns out the best shift to work is late at night. Then look at the next one:

Switch to the next range... the result is basically unchanged, so no explanation is needed.

In the range after that the picture changes: here, orders between 50.000 and 181.677 already show a trend toward a high hang-up rate.

Whoa... in this range the result is exclusively a high hang-up rate, the shift becomes PM2 (afternoon), and the number of orders drops to 50.000-181.677. It looks like the customer service center should take afternoons off and move everyone to the late-night shift. Heh.

To check this, I browsed the data source view and used a pivot table to verify that our inference is correct; the chart below shows it:

Right... the longer the average response time, the higher the ServiceGrade value, which means the higher the hang-up rate.
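The same pivot-style check can be sketched in a few lines of Python: group the shifts by range of average response time and average the hang-up rate within each group. The rows here are made up for illustration, only the two ranges quoted in this article are used as buckets, and everything else falls into "other".

```python
from collections import defaultdict
from statistics import mean

# Only the two ranges quoted in the article; anything else is "other".
BUCKETS = [
    ("44.000-70.597", 44.000, 70.597),
    ("89.087-120.000", 89.087, 120.000),
]


def bucket_label(avg_time):
    """Name of the range that contains avg_time, or "other"."""
    for label, low, high in BUCKETS:
        if low <= avg_time <= high:
            return label
    return "other"


def mean_grade_by_bucket(rows):
    """Pivot: average ServiceGrade for each AverageTimePerIssue range."""
    grouped = defaultdict(list)
    for avg_time, grade in rows:
        grouped[bucket_label(avg_time)].append(grade)
    return {label: mean(grades) for label, grades in grouped.items()}


# Made-up (AverageTimePerIssue, ServiceGrade) pairs, one per shift.
rows = [(50.0, 0.04), (60.0, 0.06), (80.0, 0.09), (100.0, 0.16), (110.0, 0.20)]

for label, avg_grade in mean_grade_by_bucket(rows).items():
    print(f"{label}: {avg_grade:.3f}")
```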

We can also use the Microsoft Neural Network viewer to validate the conclusion in reverse: change the output to Average Time Per Issue (average time to answer), pick two ranges, and see which variable values influence this attribute:

See... a high hang-up rate of 0.126-0.210 favors an average response time of 89.087-120.000, while a low hang-up rate favors 44.000-70.597.

That is as far as we will take the attribute analysis here; interested readers can continue with the other attributes.
