The Naive Bayes algorithm is based on Bayes' theorem, which is as follows:
\[P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{P(Y) \cdot P(X \mid Y)}{P(X)}\]
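As a quick illustration with made-up numbers (a hypothetical spam-filtering example, not taken from any dataset): let $X$ be the event that a message contains the word "free", and suppose $P(Y=\text{spam}) = 0.3$, $P(X \mid Y=\text{spam}) = 0.8$, and $P(X \mid Y=\text{ham}) = 0.2$. Then
\[P(Y=\text{spam} \mid X) = \frac{0.3 \times 0.8}{0.3 \times 0.8 + 0.7 \times 0.2} = \frac{0.24}{0.38} \approx 0.63\]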
When Naive Bayes is applied, each dimension of the feature vector $X$ is regarded as a random variable, i.e. $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, and $Y \in \{y_1, \ldots, y_K\}$ is the corresponding class label. For a given input $X$, Naive Bayes predicts its category $Y$ as follows:
\[\begin{aligned}
P(Y = y_k \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)
&= \frac{P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, Y = y_k)}{\sum_i P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y = y_i) \, P(Y = y_i)} \\
&= \frac{P(Y = y_k) \cdot P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y = y_k)}{P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)} \\
&= \frac{P(Y = y_k) \cdot \prod_i P(X_i = x_i \mid Y = y_k)}{P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)}
\end{aligned}\]
The second-to-last step (the factorization into $\prod_i P(X_i = x_i \mid Y = y_k)$) uses the conditional independence of the features. Because $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$ is a fixed value for a given input, we can compute $P(Y = y_k \mid X_1 = x_1, \ldots, X_n = x_n)$ separately for every class $y_k$ and take the largest one. The predicted category $\hat{y}$ of the sample $X$ is therefore:
\[\hat{y} = \arg\max_{y_k} P(Y = y_k) \prod_i P(X_i = x_i \mid Y = y_k)\]
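To make the prediction rule concrete, here is a minimal Python sketch of the $\arg\max$ above. The function name `predict` and the dictionary-based parameter layout are my own choices for illustration, and the computation is done in log space to avoid numerical underflow (an implementation detail, not part of the formula).

```python
import math

def predict(x, priors, likelihoods):
    """Return the class y_k maximizing P(Y=y_k) * prod_i P(X_i=x_i | Y=y_k).

    x:           tuple of feature values (x_1, ..., x_n)
    priors:      dict mapping class label -> P(Y=y_k)
    likelihoods: dict mapping class label -> list of dicts, one per feature,
                 each mapping a feature value -> P(X_i=x_i | Y=y_k)
    """
    best_label, best_log_score = None, float("-inf")
    for y_k, prior in priors.items():
        # Sum of logs instead of a product of probabilities to avoid underflow.
        log_score = math.log(prior)
        for i, x_i in enumerate(x):
            # Tiny floor probability for feature values never seen with this class.
            log_score += math.log(likelihoods[y_k][i].get(x_i, 1e-12))
        if log_score > best_log_score:
            best_label, best_log_score = y_k, log_score
    return best_label
```

Since the denominator $P(X_1 = x_1, \ldots, X_n = x_n)$ is the same for every class, it is simply omitted here, exactly as in the formula.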
Here are a few points worth noting:
1) Naive Bayes is a generative model: it models the joint probability $P(X, Y)$, and then, for a given input $X$, obtains the posterior estimate of $Y$.
2) The word "naive" refers to conditional independence: given the class, the features are assumed to be conditionally independent of one another. The step from the second row to the third row in the derivation above relies exactly on this conditional independence.
3) $X_i$ denotes the $i$-th feature as a random variable, and $x_i$ denotes a value of the $i$-th feature; if the $i$-th feature takes its $j$-th possible value, this can be written as $x_{ij}$.
4) Note the maximization over $y_k$ in the last step: because the denominator is identical for every class, it is dropped and only $P(Y = y_k) \prod_i P(X_i = x_i \mid Y = y_k)$ needs to be compared across classes.
According to the above analysis, for a given input $X = x_1, x_2, \ldots, x_n$, to obtain the probability that it belongs to class $Y = y_k,\ k = 1, \ldots, K$, we need to know the following parameters: the class priors $P(Y = y_k)$ and the class-conditional probabilities $P(X_i = x_i \mid Y = y_k)$.
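A minimal sketch of how these parameters might be estimated from discrete training data by simple counting (maximum-likelihood estimates). The function name `fit` and the data layout (a list of feature tuples plus a list of labels) are assumptions for illustration, not a fixed API.

```python
from collections import Counter

def fit(X, y):
    """Estimate P(Y=y_k) and P(X_i=x_i | Y=y_k) by counting.

    X: list of feature tuples, e.g. [(x_1, ..., x_n), ...]
    y: list of class labels, same length as X
    """
    n_samples = len(y)
    n_features = len(X[0])

    # Class priors: P(Y=y_k) = (# samples with label y_k) / (# samples)
    class_counts = Counter(y)
    priors = {y_k: count / n_samples for y_k, count in class_counts.items()}

    # Conditional counts: counts[y_k][i][v] = # samples of class y_k with X_i = v
    counts = {y_k: [Counter() for _ in range(n_features)] for y_k in class_counts}
    for features, y_k in zip(X, y):
        for i, v in enumerate(features):
            counts[y_k][i][v] += 1

    # Conditional probabilities: P(X_i=v | Y=y_k) = count / (# samples with label y_k)
    likelihoods = {
        y_k: [{v: c / class_counts[y_k] for v, c in counts[y_k][i].items()}
              for i in range(n_features)]
        for y_k in class_counts
    }
    return priors, likelihoods
```

The returned `priors` and `likelihoods` can be passed directly to the `predict` sketch above; in practice one would usually add smoothing to avoid zero probabilities for unseen feature values, but that goes beyond what the formulas above describe.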
Naive Bayes