Microsoft Naive Bayes Algorithm: Identity Classification of Three Kingdoms Characters
Microsoft Naive Bayes is the simplest algorithm in SSAS and is often used as a starting point for understanding the basic groupings in data. In general terms, the processing it performs is classification. The algorithm is called "naive" because it gives every attribute the same importance; none outweighs another. The name Bayes comes from Thomas Bayes, who devised a method of using arithmetic (probability) to understand data. Another way to understand the algorithm is that it treats all attributes as independent of one another. In effect, it simply computes the association between every input attribute and every predictable attribute.
Although the algorithm can be used for both prediction and grouping, it is most commonly used in the early stages of model building, and more often for grouping than for predicting a specific value. Marking every attribute either as input only or as both input and predictable lets the algorithm take all attributes into account when it runs. Marking the attributes can take some effort. It is common to include a large number of attributes as inputs, process the model, and then evaluate the results. If the results do not seem to make sense, we often reduce the number of attributes included so that the strongest relationships are easier to see.
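To make the independence assumption concrete, here is a minimal Python sketch of a naive Bayes classifier over discrete attributes. It is not how SSAS implements the algorithm internally; the function names, the add-one (Laplace) smoothing, and the sample rows are all our own assumptions for illustration.

```python
from collections import Counter, defaultdict


def train(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    # value_counts[(attribute, class_label)][value] -> number of rows
    value_counts = defaultdict(Counter)
    # states[attribute] -> set of distinct values seen for that attribute
    states = defaultdict(set)
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, row[target])][value] += 1
            states[attr].add(value)
    return class_counts, value_counts, states


def predict(model, inputs):
    """Return P(class | inputs) per class, treating inputs as independent."""
    class_counts, value_counts, states = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for attr, value in inputs.items():
            # Assumption: add-one smoothing so unseen values do not zero out
            # the whole product.
            count = value_counts[(attr, label)][value]
            score *= (count + 1) / (n + len(states[attr]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Made-up rows for illustration only; not the article's data set.
rows = [
    {"force": "high", "intelligence": "low", "identity": "general"},
    {"force": "high", "intelligence": "mid", "identity": "general"},
    {"force": "low", "intelligence": "high", "identity": "advisor"},
    {"force": "low", "intelligence": "high", "identity": "advisor"},
]
model = train(rows, target="identity")
probs = predict(model, {"force": "high", "intelligence": "low"})
```

Each attribute contributes one factor to the product, and no attribute is weighted more heavily than another, which is the "naive" part described above.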
If you have a large amount of data and know little about it, Naive Bayes is a reasonable choice. For example, a company might obtain a large volume of sales data by acquiring a competitor. Naive Bayes can serve as a starting point for exploring such data.
Be aware that the algorithm has a significant limitation: it can only handle discrete (or discretized) content types. If the mining structure you select contains columns with any other content type (such as continuous), the Naive Bayes mining model ignores those columns.
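As a rough illustration of what "discretized" means, the sketch below turns a continuous column into a small number of labelled buckets using equal-width ranges. This is only one simple approach chosen for clarity; it is not a description of the discretization methods SSAS itself applies, and the function name and sample values are hypothetical.

```python
def discretize(values, bucket_count):
    """Map each continuous value to a bucket index using equal-width ranges."""
    low, high = min(values), max(values)
    width = (high - low) / bucket_count
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bucket
    buckets = []
    for v in values:
        index = int((v - low) / width)
        # The maximum value would fall just past the last bucket; clamp it.
        buckets.append(min(index, bucket_count - 1))
    return buckets


# Made-up scores for illustration only.
scores = [35, 52, 68, 71, 90, 99]
labels = ["low", "mid", "high"]
discrete = [labels[i] for i in discretize(scores, len(labels))]
```

After this step every value is one of a few named states, which is the kind of column a Naive Bayes model can use.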
The Naive Bayes algorithm has four configurable parameters: MAXIMUM_INPUT_ATTRIBUTES, MAXIMUM_OUTPUT_ATTRIBUTES, MAXIMUM_STATES, and MINIMUM_DEPENDENCY_PROBABILITY. You can change a configured (default) value by entering a new value in the Value column. Each parameter is explained in the description area of the Algorithm Parameters dialog box.
You might wonder how often the default parameter values need adjusting. We found that as we came to understand what each algorithm does, we increasingly preferred to tune the values by hand. Because Naive Bayes is used so frequently in data mining projects, especially in their early stages, we often find ourselves adjusting its parameters. The first three parameters are self-explanatory: lower the configured value to reduce the maximum number of input attributes, output attributes, or states. The meaning of the last parameter, MINIMUM_DEPENDENCY_PROBABILITY, is less obvious. It takes a value between 0 and 1 and limits the size of the content the model generates: raising it reduces the number of attributes (nodes) kept in the model content.
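The effect of a state limit such as MAXIMUM_STATES can be pictured with a short Python sketch. As we understand the documentation, when an attribute has more states than the limit, the most popular states are kept and the rest are treated as missing. The function below only imitates that idea; its name and the sample column are hypothetical, and it is not SSAS code.

```python
from collections import Counter


def cap_states(values, maximum_states):
    """Keep the most frequent states; replace all others with None (missing)."""
    counts = Counter(values)
    keep = {state for state, _ in counts.most_common(maximum_states)}
    return [v if v in keep else None for v in values]


# Made-up column for illustration only.
column = ["b", "a", "a", "c", "a", "b", "d"]
capped = cap_states(column, maximum_states=2)
```

Here "a" and "b" are the two most frequent states, so "c" and "d" become missing values. Lowering the limit therefore trades detail for a smaller, easier-to-read model.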
Now for the example itself. We continue with the solution from the previous article and proceed as follows:
Select the input variables, the predictable variable, and the key. This example uses the sequence number as the key and identity as the predictable variable, and selects five input variables: command, force, intelligence, politics, and charm. Click "OK" to return to the original page, then click "Next".
Check the data properties of each variable (content type and data type), correct any that are wrong, and click "Next".
Change the mining structure name and click the Finish button.
The Mining Model Viewer displays the dependency network, which gives a first view of how the data is distributed.
The "Attribute Profiles" tab shows how the states of each variable are distributed.
The "Attribute Characteristics" tab shows the probability of each basic characteristic within a given group.
The "Attribute Discrimination" tab lets us compare the characteristics of two groups.
Reference documents:
Microsoft Naive Bayes algorithm
http://msdn.microsoft.com/zh-cn/library/ms174806(v=sql.105).aspx