Machine Learning: A Brief Review of Common Interview Algorithms

Source: Internet
Author: User

When looking for a job in the IT industry, machine learning positions are an option alongside ordinary software development. Many computer science graduate students come into contact with the field. If your research direction is machine learning, data mining, or something similar, and you are genuinely interested in it, such a position is worth considering. Machine learning is likely to remain an important tool until machines reach human-level intelligence, and as technology develops, demand for people with these skills should keep growing.

Supervised learning

Classification

1. KNN (k Nearest Neighbors) algorithm:

  Key formula: $d=\left(\sum (x_i-x_{test})^2\right)^{\frac{1}{2}}$
  Pseudo code:
Calculate the distance from each point in the training set to the current point
Select the k points with the smallest distances
Return the most frequent category among those k points as the predicted class of the current point

import operator
import numpy as np

def knn(in_x, data_set, labels, k):
    # Euclidean distance from in_x to every row of data_set
    diff_mat = np.tile(in_x, (data_set.shape[0], 1)) - data_set
    distances = (diff_mat ** 2).sum(axis=1) ** 0.5
    sorted_dist_indicies = distances.argsort()
    class_count = {}
    for i in range(k):  # vote among the k nearest neighbors
        vote_label = labels[sorted_dist_indicies[i]]
        class_count[vote_label] = class_count.get(vote_label, 0) + 1
    sorted_class_count = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)
    return sorted_class_count[0][0]  # most frequent label

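As a usage sketch of the kNN steps above, the snippet below classifies two query points against a tiny, made-up training set. The data, the labels "A" and "B", and the function name `knn_classify` are illustrative assumptions, and the distance is computed with NumPy broadcasting rather than `tile`.

```python
from collections import Counter

import numpy as np


def knn_classify(in_x, data_set, labels, k):
    """Return the majority label among the k training points nearest to in_x."""
    # Euclidean distance from in_x to every row (broadcasting replaces tile).
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = distances.argsort()[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two clusters labeled "A" and "B".
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]

print(knn_classify(np.array([0.1, 0.2]), group, labels, k=3))  # prints B
print(knn_classify(np.array([0.9, 0.9]), group, labels, k=3))  # prints A
```

With k=3, each query has two neighbors from its own cluster and one from the other, so the majority vote decides the label.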
2. Decision Tree:

Key formula: $H(x)=-\sum p(x_i)\log_2 p(x_i)$
The key step in a decision tree is choosing which attribute to branch on according to information entropy, so learn the entropy formula and make sure you understand it thoroughly.
Pseudo code:
Check whether every item in the dataset belongs to the same category:
    If so, return the class label
    Otherwise:
        Find the best feature for splitting the dataset
        Create a branch node
        For each subset of the partition:
            Call createBranch and add the returned result to the branch node
        Return the branch node
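The pseudo code above could be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the nested-dict tree representation, the helper names, the toy rows, and the extra stopping rule for "no features left" (majority vote) are all assumptions added here. The best feature is taken to be the one that leaves the lowest weighted label entropy after the split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_feature(rows, labels, features):
    """Pick the feature index whose split leaves the lowest weighted entropy."""
    def split_entropy(f):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[f], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(features, key=split_entropy)


def create_branch(rows, labels, features):
    """Recursively build a tree as nested dicts: {feature: {value: subtree}}."""
    if len(set(labels)) == 1:   # every item belongs to the same category
        return labels[0]
    if not features:            # assumed extra rule: fall back to majority vote
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)
    node = {f: {}}
    remaining = [x for x in features if x != f]
    for value in {row[f] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[f] == value]
        sub_rows, sub_labels = zip(*subset)
        node[f][value] = create_branch(list(sub_rows), list(sub_labels), remaining)
    return node


# Hypothetical data: two binary features per sample, labels "yes"/"no".
rows = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 1)]
labels = ["yes", "yes", "no", "no", "no"]
tree = create_branch(rows, labels, [0, 1])
print(tree)
```

On this toy data the tree splits on feature 0 first and then on feature 1, because the split on feature 0 leaves the purer subsets.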
There is only one principle: make the samples at each node carry as few distinct labels as possible. Note the line in the pseudo code above that says "find the best feature for splitting the dataset". How do we find the best feature? The general rule is to make the categories at each branch node as pure as possible, that is, to separate the samples as accurately as possible. As shown in (Figure I), five animals have been fished out of the ocean; we have to determine whether each is a fish and decide which feature to test first.

(Figure I)
To improve recognition accuracy, should we judge first by "can it survive away from land" or by "does it have webbed feet"? We need a yardstick for this. Common choices are information entropy and Gini impurity; here we use the former. Our goal is to select the feature that gives the largest information gain on the labels of the split dataset. Information gain is the label entropy of the original dataset minus the label entropy of the dataset after the split (weighted by subset size). In other words, a large information gain means the entropy has dropped and the dataset has become more ordered. The entropy is computed as shown in (Formula One):

$H=-\sum_{i=1}^{n} p(x_i)\log_2 p(x_i)$
(Formula One)
where n is the number of categories (for example, for a two-class problem, n=2). Compute the proportions $p_1$ and $p_2$ of the two classes in the total sample, and from them compute the information entropy before any attribute has been selected for branching.
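As a worked example of the entropy formula for the two-class case, the sketch below computes the entropy before branching and the information gain of one candidate split. The class proportions (2 of 5 versus 3 of 5) and the subsets produced by the split are assumed values chosen for illustration; they are not given in the text.

```python
from math import log2


def entropy(probabilities):
    """H = -sum(p_i * log2(p_i)), skipping zero-probability classes."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# Assumed two-class sample of 5 items: 2 in one class, 3 in the other.
p1, p2 = 2 / 5, 3 / 5
base = entropy([p1, p2])

# Assumed split by one feature: a subset of 3 items (2 vs 1)
# and a subset of 2 items (all in one class).
after = (3 / 5) * entropy([2 / 3, 1 / 3]) + (2 / 5) * entropy([1.0])
gain = base - after

print(round(base, 3), round(after, 3), round(gain, 3))  # prints 0.971 0.551 0.42
```

The entropy after the split is the average of the subset entropies weighted by subset size, so a pure subset contributes nothing and pulls the total down.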

3. Naive Bayes:

  Key formula: $P(A|B)=\dfrac{P(A\cap B)}{P(B)}$
An important application of machine learning is automatic document classification. In document classification, an entire document, such as an e-mail message, is an instance, and certain elements of the message form its features. We can look at the words that appear in a document and treat the presence or absence of each word as a feature, so there are as many features as there are words in the vocabulary. Suppose the vocabulary has 1000 words. To estimate a good probability distribution you need enough data samples; call the required number N. From statistics, if each feature requires N samples, then 10 features require $N^{10}$ samples, and a vocabulary of 1000 features requires $N^{1000}$ samples. As you can see, the number of samples required grows rapidly with the number of features.
If the features are independent of each other, the number of samples can be reduced from $N^{1000}$
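To make the word-presence features and the key formula concrete, here is a minimal sketch that turns documents into 0/1 vectors and estimates P(word | class) by counting, which is P(A∩B)/P(B) with both probabilities replaced by sample counts. The tokenized e-mails, the "spam" and "ham" labels, and all function names are made up for illustration.

```python
def build_vocabulary(documents):
    """Sorted list of every distinct word across all documents."""
    return sorted({word for doc in documents for word in doc})


def to_presence_vector(vocabulary, doc):
    """One 0/1 feature per vocabulary word: 1 if the word appears in doc."""
    words = set(doc)
    return [1 if word in words else 0 for word in vocabulary]


def conditional_probability(vectors, labels, word_index, target_class):
    """Estimate P(word present | class) = count(word and class) / count(class)."""
    in_class = [v for v, label in zip(vectors, labels) if label == target_class]
    return sum(v[word_index] for v in in_class) / len(in_class)


# Hypothetical tokenized e-mails and their labels.
documents = [
    ["cheap", "pills", "buy", "now"],
    ["meeting", "agenda", "for", "monday"],
    ["buy", "cheap", "watches"],
    ["lunch", "on", "monday"],
]
labels = ["spam", "ham", "spam", "ham"]

vocabulary = build_vocabulary(documents)
vectors = [to_presence_vector(vocabulary, doc) for doc in documents]

cheap = vocabulary.index("cheap")
print(conditional_probability(vectors, labels, cheap, "spam"))  # prints 1.0
print(conditional_probability(vectors, labels, cheap, "ham"))   # prints 0.0
```

Each vector has one entry per vocabulary word, so the number of features equals the vocabulary size, as described above.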
