1.3 Logarithmic computation
§ Multiplying many small probabilities causes floating-point underflow
§ Because log(xy) = log(x) + log(y), the product can be converted into a sum by taking logarithms
§ Because log is a monotonically increasing function, the class with the highest score does not change
Therefore, the log form is the one commonly used in practice:
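The formula this colon presumably introduced is the standard log-space decision rule (written here with $\hat{P}$ for the estimated probabilities and $t_1, \dots, t_{n_d}$ for the tokens of document $d$):

$$c_{\text{map}} = \arg\max_{c \in \mathbb{C}} \Big[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k \mid c) \Big]$$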
1.4 Zero-probability problem
If a term never occurs in a class in the training data, then any document containing that term gets probability p = 0 for that class.
That is, once a zero probability appears, the whole product becomes 0 and the class can no longer be decided.
Workaround: add-one (Laplace) smoothing.
§ Before smoothing: the plain maximum-likelihood estimate (see the formulas after this list)
§ After smoothing: add 1 to every count (add-one / Laplace smoothing)
§ B is the number of distinct words (here, the vocabulary size |V| = B)
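Written out in standard notation (a reconstruction; $T_{ct}$ denotes the number of occurrences of term $t$ in the training documents of class $c$):

$$\text{before smoothing:}\quad \hat{P}(t \mid c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}} \qquad\qquad \text{after smoothing:}\quad \hat{P}(t \mid c) = \frac{T_{ct} + 1}{\sum_{t' \in V} T_{ct'} + B}$$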
1.5 Two common models
First, the two independence assumptions of the Naive Bayes model need to be mentioned: positional independence and conditional independence.
The two common models are the multinomial model and the Bernoulli model. The former uses the number of occurrences of each term; the latter only records whether a term appears or not, i.e., the features are 0/1.
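To make the 0/1 distinction concrete, here is a minimal sketch of the two feature representations (the token list is just an illustration, borrowed from the test document in section 1.7):

```python
from collections import Counter

tokens = ["Chinese", "Chinese", "Chinese", "Tokyo", "Japan"]

# Multinomial model: how many times each term occurs
multinomial_features = dict(Counter(tokens))      # {'Chinese': 3, 'Tokyo': 1, 'Japan': 1}

# Bernoulli model: only whether each term occurs (0/1)
bernoulli_features = {t: 1 for t in set(tokens)}  # {'Chinese': 1, 'Tokyo': 1, 'Japan': 1}
```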
1.6 Algorithm Process
Training process (estimate the priors and the smoothed conditional probabilities; see the sketch below):
Test/application/classification (score each class in log space and take the highest-scoring class):
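The original notes likely showed the textbook pseudocode for these two steps (TrainMultinomialNB / ApplyMultinomialNB in Manning et al.). Below is a minimal, runnable Python sketch under that reading; all function and variable names are mine, with add-one smoothing and log-space scoring as in sections 1.3 and 1.4:

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs):
    """docs: list of (tokens, label) pairs.
    Returns class priors, smoothed conditional probabilities, and the vocabulary."""
    n_docs = len(docs)
    docs_per_class = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)   # T_ct: occurrences of term t in class c
    vocab = set()
    for tokens, label in docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)
    B = len(vocab)                       # vocabulary size |V|
    prior = {c: docs_per_class[c] / n_docs for c in docs_per_class}
    cond_prob = {}
    for c in docs_per_class:
        total = sum(term_counts[c].values())
        # add-one smoothing: (T_ct + 1) / (sum over t' of T_ct' + B)
        cond_prob[c] = {t: (term_counts[c][t] + 1) / (total + B) for t in vocab}
    return prior, cond_prob, vocab

def apply_multinomial_nb(prior, cond_prob, vocab, tokens):
    """Score each class in log space and return the highest-scoring one."""
    best_class, best_score = None, float("-inf")
    for c in prior:
        score = math.log(prior[c])
        for t in tokens:
            if t in vocab:               # terms unseen in training are skipped
                score += math.log(cond_prob[c][t])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```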
1.7 Example
First step: parameter estimation.
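The numbers below assume this is the standard worked example from Manning et al.'s Introduction to Information Retrieval (training set: d1 = "Chinese Beijing Chinese", d2 = "Chinese Chinese Shanghai", d3 = "Chinese Macao", all in class c = China; d4 = "Tokyo Japan Chinese", not in c; test document d5 = "Chinese Chinese Chinese Tokyo Japan"), an assumption consistent with the description of d5 below. With add-one smoothing:

$$\hat{P}(c) = 3/4, \qquad \hat{P}(\bar{c}) = 1/4$$
$$\hat{P}(\text{Chinese} \mid c) = (5+1)/(8+6) = 3/7$$
$$\hat{P}(\text{Tokyo} \mid c) = \hat{P}(\text{Japan} \mid c) = (0+1)/(8+6) = 1/14$$
$$\hat{P}(\text{Chinese} \mid \bar{c}) = (1+1)/(3+6) = 2/9$$
$$\hat{P}(\text{Tokyo} \mid \bar{c}) = \hat{P}(\text{Japan} \mid \bar{c}) = (1+1)/(3+6) = 2/9$$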
Then, the second step: classification.
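Under the same assumption, the two class scores are:

$$\hat{P}(c \mid d_5) \propto 3/4 \cdot (3/7)^3 \cdot 1/14 \cdot 1/14 \approx 0.0003$$
$$\hat{P}(\bar{c} \mid d_5) \propto 1/4 \cdot (2/9)^3 \cdot 2/9 \cdot 2/9 \approx 0.0001$$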
Therefore, the classifier assigns the test document d5 to the class c = China: the three occurrences of the positive indicator "Chinese" in d5 outweigh the negative weight of "Japan" and "Tokyo".
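Running the sketch from section 1.6 on this (assumed) training set reproduces the result:

```python
train_docs = [
    (["Chinese", "Beijing", "Chinese"], "China"),
    (["Chinese", "Chinese", "Shanghai"], "China"),
    (["Chinese", "Macao"], "China"),
    (["Tokyo", "Japan", "Chinese"], "not China"),
]
prior, cond_prob, vocab = train_multinomial_nb(train_docs)
d5 = ["Chinese", "Chinese", "Chinese", "Tokyo", "Japan"]
print(apply_multinomial_nb(prior, cond_prob, vocab, d5))  # prints: China
```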