Algorithm Overview
The naive Bayes classifier (NBC) is one of the most widely used classification algorithms. The naive Bayes model is rooted in classical mathematical theory, so it has a solid mathematical foundation and stable classification performance. The NBC model also needs few estimated parameters, is not very sensitive to missing data, and is relatively simple to implement.
Algorithm Hypothesis
Given the target (class) value, the attributes are conditionally independent of one another.
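Written as a formula, the hypothesis says that the class-conditional probability of a whole attribute vector factorizes into per-attribute terms. The notation is an assumption made here for illustration: x^{(j)} is the j-th of N attributes and c_k is one of the class values.

```latex
P\bigl(X = x \mid Y = c_k\bigr) = \prod_{j=1}^{N} P\bigl(X^{(j)} = x^{(j)} \mid Y = c_k\bigr)
```

This is why so few parameters are needed: only the per-attribute distributions for each class have to be estimated, not a joint distribution over all attribute combinations.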
Algorithm Input
Training data: T = {(x1, y1), (x2, y2), ..., (xn, yn)}
Data to be classified: x0 = (x0(1), x0(2), ..., x0(N))^T
Algorithm Output
The classification result y0 for the data to be classified x0, where y0 ∈ {c1, c2, ..., cK}
Algorithm Idea
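In outline, naive Bayes estimates the prior probability P(c) of each class and the conditional probability P(x(j) | c) of each attribute value from the training data, then assigns x0 to the class with the largest product P(c) · P(x0(1) | c) · ... · P(x0(N) | c). The Java sketch below is only a minimal illustration under stated assumptions: nominal attributes, plain relative-frequency estimates without smoothing, and a tiny made-up data set. The class name NaiveBayesSketch and its methods are hypothetical, and this is not Weka's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal naive Bayes sketch for nominal attributes (illustrative, not Weka's code).
class NaiveBayesSketch {
    // classCounts.get(c) = number of training rows with class c
    private final Map<String, Integer> classCounts = new HashMap<>();
    // featureCounts.get(c|j=v) = number of rows of class c whose attribute j has value v
    private final Map<String, Integer> featureCounts = new HashMap<>();
    private int total = 0;

    // Count class and attribute-value frequencies over the training rows.
    void train(String[][] x, String[] y) {
        for (int i = 0; i < x.length; i++) {
            classCounts.merge(y[i], 1, Integer::sum);
            for (int j = 0; j < x[i].length; j++) {
                featureCounts.merge(key(y[i], j, x[i][j]), 1, Integer::sum);
            }
            total++;
        }
    }

    // P(c) * product over j of P(x0(j) | c), using unsmoothed relative frequencies.
    double score(String c, String[] x0) {
        int nc = classCounts.getOrDefault(c, 0);
        if (nc == 0) {
            return 0.0;
        }
        double p = (double) nc / total;
        for (int j = 0; j < x0.length; j++) {
            p *= (double) featureCounts.getOrDefault(key(c, j, x0[j]), 0) / nc;
        }
        return p;
    }

    // Return the class with the highest score for x0.
    String classify(String[] x0) {
        String best = null;
        double bestScore = -1.0;
        for (String c : classCounts.keySet()) {
            double s = score(c, x0);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    private static String key(String c, int j, String v) {
        return c + "|" + j + "=" + v;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: attributes are (outlook, windy), class is "play".
        String[][] x = {
            {"sunny", "false"}, {"overcast", "false"}, {"rainy", "false"},
            {"sunny", "true"}, {"rainy", "true"}, {"sunny", "false"}
        };
        String[] y = {"yes", "yes", "yes", "no", "no", "no"};
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train(x, y);
        System.out.println(nb.classify(new String[] {"overcast", "false"})); // prints "yes"
    }
}
```

Because the estimates are unsmoothed, an attribute value never seen with a class drives that class's score to zero. Practical implementations usually add smoothing to avoid this.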
Run WEKA
The result of running it on weather.nominal.arff is as follows:
It can be seen from the results that there are two classes, so a 2×2 confusion matrix is generated.
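To make the link between the number of classes and the matrix size concrete, here is a small Java sketch that tallies a confusion matrix from actual and predicted class indices and derives the accuracy from it. The class name ConfusionMatrixSketch, its methods, and the label arrays are hypothetical. Rows are taken to be the actual class and columns the predicted class, which is also how Weka lays out its matrix.

```java
// Sketch: build a confusion matrix and accuracy from actual vs. predicted class indices.
class ConfusionMatrixSketch {
    // m[i][j] = number of instances whose actual class is i and predicted class is j
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        int[][] m = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++;
        }
        return m;
    }

    // Accuracy = correctly classified instances (the diagonal) / all instances
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        // Made-up labels for illustration: two classes, encoded as 0 and 1
        int[] actual = {0, 0, 0, 1, 1, 0};
        int[] predicted = {0, 0, 1, 1, 0, 0};
        int[][] m = confusionMatrix(actual, predicted, 2);
        System.out.println(m[0][0] + " " + m[0][1]); // prints "3 1"
        System.out.println(m[1][0] + " " + m[1][1]); // prints "1 1"
        System.out.println("accuracy = " + accuracy(m));
    }
}
```

With K classes the same code yields a K×K matrix. The accuracy computed here equals one minus the error rate.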
Function call code
import java.io.File;
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

// Read the sample
File file = new File("F:\\Program Files (x86)\\Weka-3-7\\data\\weather.nominal.arff");
ArffLoader loader = new ArffLoader();
loader.setFile(file);
Instances ins = loader.getDataSet();
// The last attribute is the class attribute
ins.setClassIndex(ins.numAttributes() - 1);
// Initialize and train the classifier
Classifier cfs = (Classifier) Class.forName("weka.classifiers.bayes.NaiveBayes").newInstance();
cfs.buildClassifier(ins);
// Obtain the classifier result
// (testingEvaluation and testInst are not defined in this excerpt; they are assumed to be set up elsewhere)
testingEvaluation.evaluateModelOnceAndRecordPrediction(cfs, testInst);
// Print the classification result
System.out.println("Classifier accuracy: " + (1 - testingEvaluation.errorRate()));
The running result is as follows:
Classifier accuracy: 0.9583333333333334
Algorithm Application
- Spam filtering systems
- Web page classification
- Text classification
For the spam filtering system, see the paper by Zhou Weicheng, Ma Suxia, and Qi Linhai, "A Machine Learning-Based Intelligent Spam Filtering Method."