This article gives a quick, intuitive overview of the most commonly used machine-learning algorithms: no complex theoretical derivation and only brief illustrative code sketches, just enough to show what these algorithms are and how they are applied. The examples are mainly classification problems.
For each algorithm I watched several videos and picked the clearest and most interesting one to present, to keep things easy to follow.
Individual algorithms can then be analyzed in depth when time allows.
Today's algorithms are the following:
- Decision Tree
- Random Forest
- Logistic Regression
- SVM
- Naive Bayes
- K-Nearest Neighbors
- K-Means
- AdaBoost
- Neural Networks
- Markov
1. Decision Tree
At each node, the tree asks a question about some feature and, based on the answer, splits the data into two branches, then continues asking questions further down. The questions are learned from the existing data; when new data comes in, it can be routed down the tree according to these questions until it lands in the appropriate leaf.
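As a minimal sketch of this idea, here is how a decision tree can be trained and queried with scikit-learn; the Iris dataset stands in for "the existing data", and the depth limit is just an illustrative choice:

```python
# A minimal sketch of decision-tree classification with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3)  # each internal node asks one question about a feature
tree.fit(X_train, y_train)                  # the questions are learned from the training data

print(tree.predict(X_test[:5]))             # new data is routed down the tree to a leaf
print("accuracy:", tree.score(X_test, y_test))
```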
2. Random Forest
Randomly sample the source data to form several subsets.
In the figure, the matrix S is the source data with rows 1 through n; a, b, c are the features, and the last column C is the class label.
From S, m sub-matrices are generated at random,
and these m subsets are used to train m decision trees.
To classify new data, put it into all m trees to get m classification results, then count which class appears most often; that majority class is the final prediction.
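A minimal sketch of this bagging-plus-majority-vote idea with scikit-learn's RandomForestClassifier; n_estimators plays the role of m, and the dataset is just a stand-in:

```python
# A minimal sketch of a random forest with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100)  # m = 100 trees, each trained on a random sample
forest.fit(X_train, y_train)

# predict() internally takes the majority vote across the m trees
print(forest.predict(X_test[:5]))
print("accuracy:", forest.score(X_test, y_test))
```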
3. Logistic Regression
When the target to predict is a probability, the output must lie between 0 and 1. A simple linear model cannot guarantee this: over parts of its domain, its values fall outside that interval.
So a model with the S-shaped curve shown here is a better fit.
How do we get such a model?
The model needs to satisfy two conditions: its output is at least 0 and at most 1.
For an output that is at least 0 we could choose an absolute value, a square, or an exponential function; the exponential e^x is always greater than 0.
To also keep the output below 1, use division: with the value itself as the numerator and the value plus 1 as the denominator, the result e^x / (e^x + 1) must be less than 1.
One more transformation gives the logistic regression model: p = 1 / (1 + e^(-z)), where z is a linear function of the input.
The corresponding coefficients can then be estimated from the source data,
which finally yields the S-shaped graph of the logistic function.
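A minimal sketch of both halves of this story: the sigmoid squashing any value into (0, 1), and scikit-learn estimating the coefficients from data. The 1-D toy data is invented for illustration:

```python
# A minimal sketch of logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # always strictly between 0 and 1

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # values near 0, exactly 0.5, near 1

X = np.array([[0.0], [1.0], [1.5], [2.5], [3.0], [4.0]])  # invented 1-D data
y = np.array([0, 0, 0, 1, 1, 1])                          # label 1 roughly when x > 2

model = LogisticRegression()
model.fit(X, y)                            # estimates the coefficients from the data
print(model.predict_proba([[2.2]])[:, 1])  # predicted probability of class 1
```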
4. SVM
Support Vector Machine
To separate the two classes we want to find a hyperplane; the best hyperplane is the one whose margin to the two classes is maximal, where the margin is the distance from the hyperplane to the point nearest it. In the figure, Z2 > Z1, so the green hyperplane is the better one.
The hyperplane is represented as a linear equation: for one class the points give a value greater than or equal to 1, and for the other class a value less than or equal to -1.
The distance from a point to the plane is computed by the formula in the diagram.
The expression for the total margin then follows; to maximize the margin we need to minimize the denominator ||w||, which turns this into an optimization problem.
For example: given three points, find the best hyperplane. Define the weight vector as the difference of the two key points, (2,3) - (1,1),
so the weight vector has the form (a, 2a). Substitute the two points into the hyperplane equation: plugging in (2,3) the value equals 1, and plugging in (1,1) the value equals -1. Solving gives a and the intercept w0, and from them the expression for the hyperplane.
Once a is obtained, substituting it back into (a, 2a) gives the support vector,
and the hyperplane equation with a and w0 plugged in is the support vector machine.
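A minimal sketch of the worked example with scikit-learn's linear SVC; the third point at the origin and the very large C (to approximate a hard margin) are assumptions for illustration:

```python
# A minimal sketch of a linear SVM on the worked example above.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 3.0], [0.0, 0.0]])  # three toy points
y = np.array([-1, 1, -1])                            # their two classes

svm = SVC(kernel="linear", C=1e6)  # a very large C approximates a hard margin
svm.fit(X, y)

print("w  =", svm.coef_[0])        # close to (0.4, 0.8), i.e. (a, 2a) with a = 0.4
print("w0 =", svm.intercept_[0])   # close to -2.2
print("support vectors:", svm.support_vectors_)  # (1,1) and (2,3)
```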
5. Naive Bayes
Consider an application in NLP:
given a piece of text, return its sentiment classification, that is, whether the attitude of the text is positive or negative.
To solve this problem, it is enough to look at only some of the words in the text,
so the text is represented only by certain words and their counts.
The original question is: given a sentence, which class does it belong to?
Applying Bayes' rule turns it into a question that is much simpler to answer.
The question becomes: what is the probability of this sentence appearing given each class? (And of course, don't forget the other two probabilities in the formula.)
For example: the probability that the word "love" appears given the positive class is 0.1, while its probability given the negative class is 0.001.
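A minimal sketch of this sentiment classifier with scikit-learn; the tiny corpus and its labels are invented purely for illustration:

```python
# A minimal sketch of naive Bayes sentiment classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["I love this movie", "what a great film", "I hate it", "this was awful"]
labels = ["positive", "positive", "negative", "negative"]

# CountVectorizer represents each text by its word counts;
# MultinomialNB applies Bayes' rule over those counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["I love it"]))        # -> ['positive']
print(model.predict_proba(["I love it"]))  # class probabilities
```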
6. K-Nearest Neighbors
Given a new data point, look at the k points nearest to it: whichever class holds the majority among those k neighbors is the class the new point belongs to.
For example: to distinguish cats from dogs, we judge by two features, claws and sound. The circles and triangles are already classified; what class does the star represent?
With k = 3, the three lines link the star to its three nearest points. Circles are the majority among them, so the star is a cat.
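A minimal sketch of the cat/dog example with scikit-learn's k-NN; the feature values for "claws" and "sound" are made up:

```python
# A minimal sketch of k-nearest neighbors on the cat/dog example.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 0.9], [0.9, 1.2], [1.1, 0.8],   # cats (the circles)
              [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]])  # dogs (the triangles)
y = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 as in the example
knn.fit(X, y)

star = np.array([[1.2, 1.1]])  # the unknown "star" point
print(knn.predict(star))       # -> ['cat'] by majority vote of the 3 neighbors
```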
7. K-Means
Suppose we want to divide a set of data into three clusters; in the figure, pink marks large values and yellow marks small values.
At the very beginning, initialize the centers; here the simplest choice, 3, 2, and 1, serves as the initial value of each cluster.
For each remaining data point, compute its distance to the three initial centers and assign it to the cluster of the nearest one.
After the points have been assigned, compute the mean of each cluster and use it as the center point of the new round.
After a few rounds, once the grouping no longer changes, the algorithm can stop.
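A minimal NumPy sketch of this loop on invented 1-D data, with 3, 2, and 1 as the initial centers as in the example:

```python
# A minimal sketch of the k-means loop on 1-D toy data.
import numpy as np

data = np.array([1.0, 1.2, 1.9, 2.1, 2.3, 2.9, 3.1, 3.4])
centers = np.array([3.0, 2.0, 1.0])  # initial centers as in the example

for _ in range(10):  # a few rounds suffice for this toy data
    # assign each point to its nearest center
    labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)
    # recompute each center as the mean of its cluster
    new_centers = np.array([data[labels == k].mean() for k in range(3)])
    if np.allclose(new_centers, centers):  # grouping no longer changes: stop
        break
    centers = new_centers

print("centers:", centers)
print("labels:", labels)
```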
8. AdaBoost
AdaBoost is one of the boosting methods.
Boosting combines several classifiers that do not classify well individually to obtain a classifier with much better performance.
In the figure, neither of the two decision trees on the left and right looks very good on its own, but if the same data is put into both and the two results are added together, the credibility of the final answer increases.
An AdaBoost example: handwriting recognition. From the drawing board many features can be extracted, such as the direction of the starting stroke, or the distance between the starting point and the end point.
During training, each feature receives a weight. For example, the starts of a 2 and a 3 look very much alike, so that feature plays a small role in telling them apart and its weight will be small.
This alpha angle, by contrast, is strongly discriminative, so that feature's weight will be large. The final prediction is a comprehensive consideration of the results from all these features.
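A minimal sketch of boosting weak learners with scikit-learn's AdaBoostClassifier, on the handwritten-digits dataset as a stand-in for the handwriting example; the default weak learner is a depth-1 decision "stump":

```python
# A minimal sketch comparing one weak learner to a boosted ensemble.
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
print("single weak learner accuracy:", stump.score(X_test, y_test))

boosted = AdaBoostClassifier(n_estimators=200)  # default weak learner is a depth-1 stump
boosted.fit(X_train, y_train)                   # each round upweights the examples earlier learners got wrong
print("boosted accuracy:", boosted.score(X_test, y_test))
```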
9. Neural Networks
Neural networks suit problems where an input may fall into at least two categories.
An NN consists of several layers of neurons and the connections between them.
The first layer is the input layer, and the last layer is the output layer.
Both the hidden layers and the output layer have their own classifiers.
The input is fed into the network and activates the first layer; the computed scores are passed on, activating the next layer of neurons, until finally each node in the output layer carries a score representing one class. In the example, the input gets classified as class 1.
The same input transferred to different nodes gives different results, because each node has its own weights and bias.
This is forward propagation.
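A minimal NumPy sketch of forward propagation through one hidden layer; all the weights and biases below are invented, since a real network would learn them during training:

```python
# A minimal sketch of forward propagation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2])          # input layer: one input with two features

W1 = np.array([[0.1, 0.8],         # hidden-layer weights (2 inputs -> 3 neurons)
               [0.4, -0.5],
               [-0.3, 0.9]])
b1 = np.array([0.1, 0.0, -0.2])    # hidden-layer biases
h = sigmoid(W1 @ x + b1)           # scores are computed and activate the hidden layer

W2 = np.array([[1.0, -1.0, 0.5],   # output-layer weights (3 hidden -> 2 classes)
               [-0.7, 0.6, 1.2]])
b2 = np.array([0.0, 0.1])
scores = W2 @ h + b2               # one score per class at the output layer

print("class scores:", scores)
print("predicted class:", int(np.argmax(scores)))  # the highest-scoring node wins
```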
10. Markov
A Markov chain is made up of states and transitions.
For example, build a Markov chain from the phrase "the quick brown fox jumps over the lazy dog".
The steps: make each word a state, then compute the probabilities of transitions between states.
These are the probabilities computed from a single sentence; when you run the statistics over a large amount of text, you get a larger state-transition matrix, for example which words can follow "the" and with what probabilities.
The suggestion list of a keyboard input method in everyday life works on the same principle; the model is just more advanced.
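A minimal sketch of building the state-transition probabilities from the example phrase; real input-method models are trained on far more text:

```python
# A minimal sketch of a word-level Markov chain.
from collections import Counter, defaultdict

text = "the quick brown fox jumps over the lazy dog"
words = text.split()

# count transitions between consecutive words (each word is a state)
transitions = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    transitions[current_word][next_word] += 1

# normalize the counts into probabilities: the rows of the transition matrix
for state, nexts in transitions.items():
    total = sum(nexts.values())
    probs = {w: c / total for w, c in nexts.items()}
    print(state, "->", probs)
```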