This article is based on the book "Machine Learning" by Professor Zhou Zhihua.
1. Naive Bayesian classifier
The naive Bayes classifier adopts the "attribute conditional independence assumption": given the class label, all attributes are assumed to be mutually independent, i.e., each attribute is assumed to influence the classification result independently.
Let $d$ be the number of attributes and $x_i$ the value of $\boldsymbol{x}$ on the $i$-th attribute; the naive Bayes classifier is then

$h_{nb}(\boldsymbol{x}) = \arg\max_{c \in \mathcal{Y}} P(c) \prod_{i=1}^{d} P(x_i \mid c)$
Let $D_c$ denote the set of class-$c$ samples in training set $D$; for example, the watermelon dataset has two classes, good melon and bad melon. If there are sufficient independent and identically distributed samples, the class prior can be estimated as

$P(c) = \frac{|D_c|}{|D|}$
For discrete attributes, let $D_{c,x_i}$ denote the set of samples in $D_c$ that take the value $x_i$ on the $i$-th attribute; then

$P(x_i \mid c) = \frac{|D_{c,x_i}|}{|D_c|}$

For continuous attributes a probability density function is used instead; assuming $p(x_i \mid c) \sim \mathcal{N}(\mu_{c,i}, \sigma_{c,i}^2)$,

$p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right)$

where $\mu_{c,i}$ and $\sigma_{c,i}^2$ are the mean and variance of the $i$-th attribute over the class-$c$ samples.
To avoid the information carried by the other attributes being wiped out whenever some attribute value never co-occurs with a class in the training set, the probability estimates are usually smoothed, commonly with the Laplacian correction. Let $N$ be the number of possible classes in training set $D$ and $N_i$ the number of possible values of the $i$-th attribute; then

$\hat{P}(c) = \frac{|D_c| + 1}{|D| + N}, \qquad \hat{P}(x_i \mid c) = \frac{|D_{c,x_i}| + 1}{|D_c| + N_i}$
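As a concrete illustration of the counting estimates above, here is a minimal sketch (not from the book) of a naive Bayes classifier for discrete attributes with the Laplacian correction; the toy dataset and attribute values are hypothetical.

```python
from collections import Counter

def train_naive_bayes(X, y, attr_values):
    """Estimate smoothed priors P(c) and conditionals P(x_i | c) by counting.

    X: list of samples, each a list of discrete attribute values
    y: list of class labels
    attr_values: for each attribute i, the set of its possible values (N_i = len(attr_values[i]))
    """
    classes = sorted(set(y))
    N = len(classes)          # number of possible classes
    prior = {}                # prior[c] = (|D_c| + 1) / (|D| + N)
    cond = {}                 # cond[(c, i, v)] = (|D_{c,x_i}| + 1) / (|D_c| + N_i)
    for c in classes:
        Dc = [x for x, label in zip(X, y) if label == c]
        prior[c] = (len(Dc) + 1) / (len(X) + N)
        for i, values in enumerate(attr_values):
            counts = Counter(x[i] for x in Dc)
            Ni = len(values)
            for v in values:
                cond[(c, i, v)] = (counts[v] + 1) / (len(Dc) + Ni)
    return classes, prior, cond

def predict(x, classes, prior, cond):
    """h_nb(x) = argmax_c P(c) * prod_i P(x_i | c)."""
    best_c, best_p = None, -1.0
    for c in classes:
        p = prior[c]
        for i, v in enumerate(x):
            p *= cond[(c, i, v)]
        if p > best_p:
            best_c, best_p = c, p
    return best_c

# Toy usage with made-up data: two discrete attributes, two classes.
X = [["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"], ["rainy", "hot"]]
y = ["good", "bad", "good", "bad"]
attr_values = [{"sunny", "rainy"}, {"hot", "cool"}]
model = train_naive_bayes(X, y, attr_values)
print(predict(["sunny", "hot"], *model))
```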
2. Semi-naive Bayesian classifier
To relax the independence assumption and exploit the interdependence among some of the attributes, "one-dependent estimation" (ODE) is the most common strategy of semi-naive Bayes classifiers: each attribute is allowed to depend on at most one other attribute besides the class. There are three main methods:
(1) SPODE (Super-Parent One-Dependent Estimator): assumes that all attributes depend on the same attribute, called the super-parent; the class node is connected to every attribute.
(2) TAN (Tree Augmented Naive Bayes): built on the maximum weighted spanning tree. The steps are: compute the conditional mutual information $I(x_i, x_j \mid y)$ between each pair of attributes; build a complete graph with the attributes as nodes, setting the weight of the edge between any two nodes to their mutual information; construct the maximum weighted spanning tree of this complete graph, pick a root variable, and orient all edges away from the root; finally, add the class node $y$ and add directed edges from $y$ to every attribute (see the structure-learning sketch after this list).
(3) AODE (Averaged One-Dependent Estimator): builds a SPODE with each attribute in turn as the super-parent, then ensembles the SPODEs that have sufficient training-data support. Like naive Bayes, AODE needs no model selection, and its training amounts to counting the samples that satisfy each condition.
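The tree construction in TAN can be made concrete with a short sketch. The snippet below (a simplified illustration, not the book's code; no smoothing, names are my own) estimates the conditional mutual information between attribute pairs from counts and then grows a maximum weighted spanning tree with a Prim-style greedy step, rooting the tree at attribute 0.

```python
import math
from collections import Counter
from itertools import combinations

def cond_mutual_info(X, y, i, j):
    """Estimate I(x_i; x_j | y) from raw counts (no smoothing, for brevity)."""
    n = len(X)
    cxy = Counter((x[i], x[j], c) for x, c in zip(X, y))
    cx = Counter((x[i], c) for x, c in zip(X, y))
    cy = Counter((x[j], c) for x, c in zip(X, y))
    cc = Counter(y)
    mi = 0.0
    for (vi, vj, c), n_ijc in cxy.items():
        p_ijc = n_ijc / n
        p_ij_given_c = n_ijc / cc[c]
        p_i_given_c = cx[(vi, c)] / cc[c]
        p_j_given_c = cy[(vj, c)] / cc[c]
        mi += p_ijc * math.log(p_ij_given_c / (p_i_given_c * p_j_given_c))
    return mi

def tan_tree(X, y):
    """Maximum weighted spanning tree over the attributes (Prim-style), with
    edges weighted by conditional mutual information; returns directed edges
    (parent, child) oriented away from attribute 0, chosen here as the root."""
    d = len(X[0])
    w = {(i, j): cond_mutual_info(X, y, i, j) for i, j in combinations(range(d), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        best = max(((i, j) for i in in_tree for j in range(d) if j not in in_tree),
                   key=lambda e: w[(min(e), max(e))])
        edges.append(best)       # orient the new edge away from the root
        in_tree.add(best[1])
    return edges                 # the class node y would then point to every attribute
```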
3. Bayesian network
"Belief Nets", using directed acyclic graphs (dags) to characterize dependencies between attributes
4. EM algorithm
The previous discussion assumed that the values of all attribute variables have been observed for every training sample, but in real applications training samples are often incomplete, i.e., some variables are unobserved ("latent variables"). The EM (Expectation-Maximization) algorithm is used in this case. The first step is the expectation (E) step: using the current parameter estimates, compute the expectation of the log-likelihood (i.e., infer the expected values of the latent variables). The second step is the maximization (M) step: find the parameter values that maximize the expected log-likelihood produced by the E step. The two steps alternate until convergence to a local optimum. EM can be viewed as a kind of coordinate descent method.
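To make the E/M alternation concrete, here is a minimal EM sketch for a two-component one-dimensional Gaussian mixture (the mixture model is only an illustrative choice, not an example from the book): the E step computes the posterior responsibilities of the latent component assignments, and the M step re-estimates the parameters that maximize the resulting expected log-likelihood.

```python
import math

def em_gmm(data, iters=50):
    """EM for a 2-component 1-D Gaussian mixture (latent variable: component id)."""
    # Crude initial parameter guesses.
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E step: responsibility resp[n][k] = P(component k | x_n, current params).
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M step: re-estimate pi, mu, var to maximize the expected log-likelihood.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
    return pi, mu, var

# Toy usage with two obvious clusters around 1.0 and 5.0.
data = [0.9, 1.1, 1.0, 4.9, 5.1, 5.0]
print(em_gmm(data))
```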