Pattern Recognition (vii): Implementing a Naive Bayes Classifier in MATLAB


This series of articles is by the blogger lyunduanmuxue; please indicate the source when reposting:

http://blog.csdn.net/lyunduanmuxue/article/details/20068781

Thank you for your cooperation.


Basic Introduction


Today we introduce a simple and efficient classifier: the naive Bayes classifier (Naive Bayes Classifier).


Students who have studied probability theory should be no strangers to the name Bayes, because an important formula in probability theory bears his name, the "Bayes formula":

P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}
The Bayes classifier is developed from this formula. The word "naive" is added because the classifier makes a strong assumption about the distribution of the data: the features of a sample are assumed to be conditionally independent of one another given the class. This assumption is very strong, yet it does little harm to the applicability of the naive Bayes classifier. In 1997, Domingos and Pazzani proved that the classifier still performs well even when the independence assumption does not hold. One explanation for this phenomenon is that the classifier has fewer parameters to train, so it is less likely to suffer from overfitting.
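To make the assumption concrete (this formulation is standard, though the original post does not spell it out): for a record with feature values x_1, ..., x_n, the classifier scores each class c by its prior multiplied by per-feature likelihoods, which is exactly the factorization that conditional independence buys us:

P(c \mid x_1, \ldots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)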


Implementation Notes


Below, we implement the naive Bayes classifier step by step.


Training the classifier is divided into two steps:

First, calculate the prior probabilities; second, calculate the likelihood functions.

Applying the classifier then simply means computing the posterior probabilities from the priors and likelihoods obtained during training.

The so-called prior probability is just the probability of each class occurring. This is a simple counting problem: compute the proportion of each class in the training data set.

Training the likelihood functions is similar: for each value of each feature, count how often that value occurs within each class.

As for the posterior probability, it is generally not computed in full; only the numerator on the right-hand side of Bayes' formula is evaluated, because the denominator does not depend on the class and is a constant for any given record.
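As a worked example with made-up numbers: if 60 of 100 training records belong to class c1, the prior is P(c1) = 60/100 = 0.6; if 30 of those 60 records have the feature value x = v, the likelihood is P(x = v | c1) = 30/60 = 0.5; the unnormalized posterior of c1 given x = v is then 0.6 × 0.5 = 0.3, which is compared against the corresponding products for the other classes without ever evaluating the denominator P(x = v).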
Code Example

Now that we have a basic understanding of the naive Bayes classifier, let us try to implement one in MATLAB.


First, the prior probabilities are computed:


function priors = nbc_priors(training)
%NBC_PRIORS Calculates the prior for each class using the training data set.
%
%   priors = nbc_priors(training)
%
%   Input:
%       training - a struct representing the training data set
%           training.class    - the class of each data record
%           training.features - the features of each data record
%
%   Output:
%       priors - a struct representing the prior of each class
%           priors.class - the class labels
%           priors.value - the priors of the corresponding classes
%
%   Run nbc_mushroom for some examples.
%
%   Edited by X. Sun
%   My homepage: http://pamixsun.github.io/

% Check the input arguments
if nargin < 1
    error(message('MATLAB:UNIQUE:NotEnoughInputs'));
end

% Extract the class labels
priors.class = unique(training.class);
% Initialize priors.value
priors.value = zeros(1, length(priors.class));
% Calculate the prior of each class as its relative frequency
% in the training set
for i = 1 : length(priors.class)
    priors.value(i) = ...
        sum(training.class == priors.class(i)) / length(training.class);
end

% Sanity check: the priors must sum to 1 (up to floating-point error)
if abs(sum(priors.value) - 1) > 1e-10
    error('Prior error');
end

end
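To see how this behaves, here is a minimal usage sketch; the toy records and class labels below are invented for illustration:

% Hypothetical toy data: 5 records, classes encoded as 1 and 2
training.class    = [1; 1; 1; 2; 2];
training.features = [1 2; 1 1; 2 2; 2 1; 2 2];

priors = nbc_priors(training);
% priors.class is [1; 2] and priors.value is [0.6 0.4],
% i.e. the relative frequency of each class in the training set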

Next, we train the complete naive Bayes classifier:



function [likelihood, priors] = train_nbc(training, featurevalues, addone)
%TRAIN_NBC Trains a naive Bayes classifier using the training data set.
%
%   [likelihood, priors] = train_nbc(training, featurevalues, addone)
%
%   Input:
%       training - a struct representing the training data set
%           training.class    - the class of each data record
%           training.features - the features of each data record
%       featurevalues - a cell array containing the possible values of
%           each feature
%       addone - whether to use add-one smoothing;
%           1 indicates yes, 0 otherwise
%
%   Output:
%       likelihood - a struct representing the likelihood
%           likelihood.matrixcolnames - the feature values
%           likelihood.matrixrownames - the class labels
%           likelihood.matrix         - the likelihood values
%       priors - a struct representing the prior of each class
%           priors.class - the class labels
%           priors.value - the priors of the corresponding classes
%
%   Run nbc_mushroom for some examples.
%
%   Edited by X. Sun
%   My homepage: http://pamixsun.github.io/

% Check the input arguments
if nargin < 2
    error(message('MATLAB:UNIQUE:NotEnoughInputs'));
end
% Set the default value for addone if it is not given
if nargin == 2
    addone = 0;
end

% Calculate the priors
priors = nbc_priors(training);

% Learn each feature by calculating its likelihood given each class
for i = 1 : size(training.features, 2)
    uniquefeaturevalues = featurevalues{i};
    trainingfeaturevalues = training.features(:, i);
    likelihood.matrixcolnames{i} = uniquefeaturevalues;
    likelihood.matrixrownames{i} = priors.class;
    likelihood.matrix{i} = ...
        zeros(length(priors.class), length(uniquefeaturevalues));
    for j = 1 : length(uniquefeaturevalues)
        item = uniquefeaturevalues(j);
        for k = 1 : length(priors.class)
            class = priors.class(k);
            featurevaluesinclass = ...
                trainingfeaturevalues(training.class == class);
            % P(feature i = item | class), with optional add-one smoothing
            likelihood.matrix{i}(k, j) = ...
                (length(featurevaluesinclass(featurevaluesinclass == item)) + 1 * addone) / ...
                (length(featurevaluesinclass) + addone * length(uniquefeaturevalues));
        end
    end
end

end
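A note on the addone flag: with add-one (Laplace) smoothing enabled, the estimate computed in the last expression becomes

\hat{P}(x = v \mid c) = \frac{\#\{x = v \text{ in class } c\} + 1}{N_c + V}

where N_c is the number of training records in class c and V is the number of possible values of the feature (N_c and V are just notation introduced here). The practical effect is that a feature value never observed in some class receives the small probability 1 / (N_c + V) instead of 0, so it cannot zero out the entire product of likelihoods at prediction time.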



Finally, we use the classifier we have trained:


function [predictive, posterior] = predict_nbc(test, priors, likelihood)
%PREDICT_NBC Uses a naive Bayes classifier to predict the class labels
%of the test data set.
%
%   [predictive, posterior] = predict_nbc(test, priors, likelihood)
%
%   Input:
%       test - a struct representing the test data set
%           test.class    - the class of each data record
%           test.features - the features of each data record
%       priors - a struct representing the prior of each class
%           priors.class - the class labels
%           priors.value - the priors of the corresponding classes
%       likelihood - a struct representing the likelihood
%           likelihood.matrixcolnames - the feature values
%           likelihood.matrixrownames - the class labels
%           likelihood.matrix         - the likelihood values
%
%   Output:
%       predictive - the predictive results for the test data set
%           predictive.class - the predicted class of each data record
%       posterior - a struct representing the posterior of each class
%           posterior.class - the class labels
%           posterior.value - the posteriors of the corresponding classes
%
%   Run nbc_mushroom for some examples.
%
%   Edited by X. Sun
%   My homepage: http://pamixsun.github.io/

% Check the input arguments
if nargin < 3
    error(message('MATLAB:UNIQUE:NotEnoughInputs'));
end

posterior.class = priors.class;
% Calculate the posteriors for each test data record
predictive.class = zeros(size(test.features, 1), 1);
posterior.value = zeros(size(test.features, 1), length(priors.class));
for i = 1 : size(test.features, 1)
    record = test.features(i, :);
    % Calculate the (unnormalized) posterior for each possible class
    % of that record
    for j = 1 : length(priors.class)
        class = priors.class(j);
        % Initialize the posterior with the prior value of that class
        posteriorvalue = priors.value(priors.class == class);
        for k = 1 : length(record)
            item = record(k);
            likelihoodvalue = ...
                likelihood.matrix{k}(j, likelihood.matrixcolnames{k}(:) == item);
            posteriorvalue = posteriorvalue * likelihoodvalue;
        end
        % Store the posterior
        posterior.value(i, j) = posteriorvalue;
    end
    % Predict the class with the maximum posterior
    [~, maxindex] = max(posterior.value(i, :));
    predictive.class(i) = posterior.class(maxindex);
end

end
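Putting the three functions together, here is a minimal end-to-end sketch. The data set is a made-up toy example; the original code comments point to an nbc_mushroom script for real examples, which is not reproduced here.

% Hypothetical toy data: 6 records, 2 discrete features, classes 1 and 2
training.class    = [1; 1; 1; 2; 2; 2];
training.features = [1 1; 1 2; 2 1; 2 2; 2 2; 1 2];

% The possible values of each feature
featurevalues = {[1 2], [1 2]};

% Train with add-one smoothing enabled
[likelihood, priors] = train_nbc(training, featurevalues, 1);

% Classify two unseen records
test.features = [1 1; 2 2];
[predictive, posterior] = predict_nbc(test, priors, likelihood);
disp(predictive.class);   % predicted class label for each test record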
