Naive Bayes & KNN


1. Naive Bayes

First of all, be clear that the naive Bayes method is used for classification tasks.

In machine learning, whenever a classification problem is encountered, every method focuses on two things: the features of the input vector to be classified and the features of each category in the training set.

What varies is the number of features, the number of categories, and the number of training samples.

When the naive Bayes method deals with this problem, it uses a probabilistic idea: an input vector may belong to category 1 or it may belong to category 2, and the method stays open-minded about this, but in the end it must choose, and it chooses the category with the larger probability.

In text classification, the features of the problem are all the distinct (non-repeated) words. The samples are all the documents, and the categories can be spam versus normal mail, or whatever categories a person cares to define.

So, naturally, a table similar to a DataFrame should take shape in our minds. This is in fact the document-word matrix: each row represents one document, each column is a distinct word, and the value at each position indicates how many times that word appears in that document. This is the training set.
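As a rough illustration of this matrix (the helper names and the toy documents below are hypothetical, not from the original text), it could be built in Python along these lines:

def build_vocabulary(documents):
    # Collect the set of non-repeating words across all documents.
    vocab = set()
    for doc in documents:
        vocab.update(doc)
    return sorted(vocab)

def doc_to_vector(vocab, doc):
    # One row of the document-word matrix: how often each word occurs in the document.
    index = {word: i for i, word in enumerate(vocab)}
    vec = [0] * len(vocab)
    for word in doc:
        if word in index:
            vec[index[word]] += 1
    return vec

# Hypothetical toy training set: tokenized documents and their labels.
docs = [["cheap", "pills", "buy", "now"], ["meeting", "agenda", "attached"]]
labels = [1, 0]   # 1 = spam, 0 = normal mail
vocab = build_vocabulary(docs)
train_matrix = [doc_to_vector(vocab, d) for d in docs]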

What the naive Bayes method now needs is the left-hand side of the famous Bayes formula:

P(ci | w) = P(w | ci) · P(ci) / P(w)

The meaning of the left-hand side is essentially: given that I have a document containing these particular words, what is the probability that the document belongs to category 1, and what is the probability that it belongs to category 2?

The solution is then quite intuitive. The denominator P(w) does not need to be computed, since it is the same for every category; only the numerator matters. P(ci) is also relatively easy: just count how often each category occurs in the training set. The difficulty is P(w|ci), where w is a whole vector. This is where the first important assumption of the naive Bayes approach comes in: the features are conditionally independent of each other. The term can then be expanded into a product, so we only need to count the probability of each individual word appearing in that category and multiply the probabilities together to get the result.
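Written out, the conditional-independence assumption means the class-conditional term factorizes over the individual words w1, ..., wn of the document:

P(w | ci) = P(w1 | ci) · P(w2 | ci) · ... · P(wn | ci)

so only the per-word conditional probabilities have to be estimated from the training set.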

Another small problem is that during the actual calculation some probability may be so small that it rounds to 0, which makes the whole product 0. Laplace smoothing is therefore needed: by default, the initial count of each word is set to 1 and the denominator of the conditional probability starts at 2. This is called the Bayesian estimate of the conditional probability.
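A minimal training sketch following exactly this smoothing convention (counts start at 1, denominators at 2; the function and variable names are illustrative, and two classes labeled 0 and 1 are assumed):

import numpy as np

def train_naive_bayes(train_matrix, labels):
    # train_matrix: document-word count matrix; labels: 0 or 1 per document.
    train_matrix = np.asarray(train_matrix, dtype=float)
    labels = np.asarray(labels)
    num_docs, num_words = train_matrix.shape
    p_class1 = labels.sum() / num_docs              # prior probability P(c1)
    # Smoothing as described above: word counts start at 1, denominators at 2.
    num0, num1 = np.ones(num_words), np.ones(num_words)
    denom0, denom1 = 2.0, 2.0
    for i in range(num_docs):
        if labels[i] == 1:
            num1 += train_matrix[i]
            denom1 += train_matrix[i].sum()
        else:
            num0 += train_matrix[i]
            denom0 += train_matrix[i].sum()
    p0_vec = num0 / denom0                          # P(word | class 0)
    p1_vec = num1 / denom1                          # P(word | class 1)
    return p0_vec, p1_vec, p_class1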

There is another problem in an actual computer program, called underflow. The cause is the same: every probability value in the product is too small. The remedy is to take logarithms. Since in the end we only compare the sizes of the two categories' probability values, the monotonicity of the logarithm function guarantees that the comparison result is still correct.
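Continuing the sketch above, the comparison can then be done on sums of logarithms instead of products (again, illustrative code assuming the two classes 0 and 1):

import numpy as np

def classify_naive_bayes(test_vec, p0_vec, p1_vec, p_class1):
    # Summing logs replaces the product of per-word probabilities,
    # so tiny values no longer underflow to zero.
    test_vec = np.asarray(test_vec, dtype=float)
    log_p1 = np.sum(test_vec * np.log(p1_vec)) + np.log(p_class1)
    log_p0 = np.sum(test_vec * np.log(p0_vec)) + np.log(1.0 - p_class1)
    return 1 if log_p1 > log_p0 else 0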

In the actual program logic, we first obtain the class-conditional probabilities of all words in the training set, giving two class probability vectors and the prior probability; then we build the word vector for each document in the test set and multiply it directly against those vectors.
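Tying the hypothetical sketches above together, the overall flow might look like this:

p0_vec, p1_vec, p_class1 = train_naive_bayes(train_matrix, labels)
test_vec = doc_to_vector(vocab, ["buy", "cheap", "pills"])
print(classify_naive_bayes(test_vec, p0_vec, p1_vec, p_class1))   # 1, i.e. classified as spam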

2. KNN (k-nearest neighbor method)

KNN is also applied to classification problems, but its decision rule is not the probabilistic approach of naive Bayes; it is the most conventional voting model (the minority obeys the majority).

The core part of the KNN method is how to judge which training vectors are the nearest neighbors of the test vector. The most common choice is the Euclidean distance; other distances such as the Manhattan and Minkowski distances can also be used. From this it can be seen that in KNN the feature representation of the training set must be converted into numerical form.

The rest is easy to understand: sort by distance, select the K nearest training vectors, and then vote, assigning the test vector to the category that appears most often among those neighbors.
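A compact sketch of this procedure using Euclidean distance (a plain brute-force version; the names are illustrative):

import numpy as np
from collections import Counter

def knn_classify(test_vec, train_matrix, labels, k):
    train_matrix = np.asarray(train_matrix, dtype=float)
    test_vec = np.asarray(test_vec, dtype=float)
    # Euclidean distance from the test vector to every training vector.
    distances = np.sqrt(((train_matrix - test_vec) ** 2).sum(axis=1))
    # Take the k nearest neighbors and let them vote on the category.
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]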

There are two remaining issues. One is the choice of K: it should be neither too large nor too small. If K is too small, noise points are easily mistaken for normal points, commonly known as overfitting; if K is too large, the whole model becomes too simple and the approximation error of learning increases. The extreme case is K = N, where no matter what the input instance is, it is simply predicted as the category with the highest proportion in the training set. In general it is appropriate to take a relatively small value for K.

The second is the computation of KNN. A disadvantage of KNN is that the amount of computation is too large. To improve computational efficiency, a new data structure called the kd-tree is introduced. The average search complexity of a kd-tree is O(log N), which is efficient when the number of training instances is much larger than the spatial dimension. For details of how the search works, see Hangyuan Li, "Statistical learning method", p. 44.
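If SciPy is available, the same nearest-neighbor search can be delegated to its kd-tree implementation, scipy.spatial.cKDTree (only the cKDTree constructor and its query method are SciPy's API; the wrapper function below is a hypothetical sketch):

import numpy as np
from collections import Counter
from scipy.spatial import cKDTree

def knn_classify_kdtree(test_vec, train_matrix, labels, k):
    # Build the kd-tree once from the training matrix; each query then
    # needs only O(log N) comparisons on average.
    tree = cKDTree(np.asarray(train_matrix, dtype=float))
    _, idx = tree.query(np.asarray(test_vec, dtype=float), k=k)
    idx = np.atleast_1d(idx)
    votes = Counter(labels[i] for i in idx)
    return votes.most_common(1)[0][0]

The tree is built once, so the speed-up over the brute-force version comes from reusing it across many queries.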
