Statistical Learning Algorithms: An Overview (Part 1)

Source: Internet
Author: User
Tags id3 svm

I have used a number of statistical algorithms here and there, so this post is a simple attempt to organize them. I will try to describe each algorithm model in "elevator pitch" fashion (this is the first installment; more to come, hopefully). My understanding is still shallow and needs further effort; what matters more is reusing the wisdom of others.

Statistical Learning Overview

For statistical learning, I first recommend the book "Statistical Learning Methods" by Li Hang. Here is a one-sentence definition: statistical learning is the discipline in which a computer builds probabilistic models from data and uses those models to predict and analyze data. Two important points stand out: data, and probabilistic models.

A statistical learning method has three elements: model, strategy, and algorithm. The model is the probability function or decision function to be learned. The strategy is the criterion or guideline we define for judging models, so that we can learn or choose the optimal one (without a strategy we could not compare models and make a choice).

The algorithm is the concrete computational procedure used in learning.

1. K-Nearest Neighbor Method

This section briefly introduces the K-nearest neighbor method in two parts: the idea behind it, and its three elements.

The idea behind the K-nearest neighbor method.

The K-nearest neighbor method is mainly a classification method, though it can also be used for regression.

The method is a data-driven version of "birds of a feather flock together": assume that among the K samples most similar to a given sample (that is, its nearest neighbors in feature space), the majority belong to some category; then the sample belongs to that category too. If most of your K closest friends are rich, you are probably rich too (accuracy issues aside, of course).

The three elements.

The idea of K-nearest neighbors embodies three elements. Distance measurement: by what standard do we define how close someone is to you, and hence decide whether they count as a friend. The choice of K: how many friends do we look at to guess your situation. The classification decision rule: usually majority voting, i.e. if most of them are rich, you are rich (you could also want a different decision rule, e.g. if even one of your K friends is rich, decide you belong to the rich; that works too).
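The three elements above can be sketched in a few lines. This is a toy illustration, not a production implementation; the "rich/poor friends" data are made up to mirror the analogy in the text.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points. `train` is a list of (features, label) pairs."""
    # Element 1, distance measurement: Euclidean distance in feature space.
    by_dist = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Element 2, the choice of K: look only at the k closest "friends".
    neighbors = [label for _, label in by_dist[:k]]
    # Element 3, classification decision: majority vote.
    return Counter(neighbors).most_common(1)[0][0]

train = [((1.0, 1.0), "rich"), ((1.2, 0.9), "rich"),
         ((5.0, 5.0), "poor"), ((5.1, 4.8), "poor")]
print(knn_classify(train, (1.1, 1.1), k=3))  # → rich
```

Changing the distance function, the value of k, or the voting rule changes the classifier, which is exactly why those three are called the elements of the method.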

2. Clustering

In this section, we briefly introduce two clustering algorithms: hierarchical clustering and K-means clustering.

Hierarchical clustering builds a hierarchy of groups by continuously merging the most similar groups. In each iteration, the algorithm computes the pairwise distances between groups and merges the two closest groups into a single group.

The K-means method (I used to think K-means and KNN referred to the same algorithm, heh). The working procedure of the K-means algorithm: first, randomly select K of the N data objects as initial cluster centers. Each remaining object is assigned, based on its similarity (distance) to these cluster centers, to the cluster whose center it is most similar to. Then the center of each new cluster (the mean of all objects in the cluster) is recomputed, and the process repeats until the criterion function converges; mean squared error is commonly used as the criterion. The K resulting clusters have the property that each cluster itself is as compact as possible while the clusters are as separated from each other as possible.
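The assign-then-recompute loop described above can be sketched as follows. To keep the toy deterministic it initializes centers from the first K points; real implementations use random or k-means++ initialization, and the two well-separated point groups here are invented for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm sketch: assign each point to its nearest
    center, recompute each center as its cluster's mean, repeat."""
    centers = list(points[:k])  # deterministic toy initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # Update step: recompute each center as the mean of its cluster.
        for j, c in enumerate(clusters):
            if c:
                centers[j] = tuple(sum(xs) / len(c) for xs in zip(*c))
    return centers, clusters

points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centers, clusters = kmeans(points, k=2)
```

For a fixed iteration budget the loop simply runs to completion; a fuller implementation would stop early once the centers (or the mean-squared-error criterion) stop changing.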

3. Naive Bayes

The naive Bayes method is a classification method based on Bayes' theorem and the conditional-independence assumption. The basic procedure: for a given training data set, first learn the joint probability distribution of input and output under the conditional-independence assumption over features; then, for a given input x, use Bayes' theorem to compute the posterior probability of each class and output the class y with the largest posterior probability.

Several terms appear here: Bayes' theorem, the feature conditional-independence assumption, prior probability, posterior probability, and joint probability.

The encyclopedia gives a textual interpretation of Bayes' theorem: usually, the probability of B given that A has occurred is not equal to the probability of A given B, and Bayes' theorem describes the relationship between the two. What is that relationship? It is a formula: the Bayes formula/theorem.

Prior probability: for the instance currently being inferred, without knowing anything specific about it, we infer which class it belongs to based only on previous experience.

Posterior probability: a conditional probability computed after learning some knowledge about the current instance to be inferred.

Conditional-independence assumption: when multiple features are used to infer the category of an instance, the features are assumed to be independent of one another (given the class).

Joint probability: the probability that two events occur together.
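These terms fit together in one line of arithmetic: the posterior is proportional to prior times likelihood, P(class | x) = P(x | class) P(class) / P(x). A worked example, with purely hypothetical spam-filter numbers:

```python
# Hypothetical numbers, invented for illustration only.
prior = {"spam": 0.4, "ham": 0.6}          # P(class), from previous experience
likelihood = {"spam": 0.7, "ham": 0.1}     # P(word "offer" appears | class)

# Evidence P(x): total probability of seeing the word at all.
evidence = sum(prior[c] * likelihood[c] for c in prior)

# Bayes' theorem: posterior = prior * likelihood / evidence.
posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}

# Naive Bayes outputs the class with the largest posterior probability.
best = max(posterior, key=posterior.get)
print(best, posterior)
```

With several features, the conditional-independence assumption lets the likelihood factor into a product of per-feature likelihoods, which is what makes the method "naive" and cheap to train.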

4. Support Vector Machine

The support vector machine (SVM) is a binary classification model, or rather a generic term for a family of methods. I have read some of the material but cannot derive it in depth yet, so let me just state my two impressions: first, the criterion SVM pursues when classifying; second, what to do when the given data are not linearly separable.

Imagine many black points and white points in a two-dimensional space that can be separated by color. A support vector machine not only finds a line that separates them, it finds the line with the largest margin.

How is the maximum margin computed? Intuitively, with just two points, that line is the perpendicular bisector of the segment connecting them.

In higher dimensions, the maximum-margin line becomes, correspondingly, a hyperplane.

What if the data are not linearly separable? Map the data into a higher-dimensional space in which they can be separated; the maximum-margin line there is again a hyperplane.
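A minimal sketch of that mapping idea, with an invented one-dimensional example: the +1 points sit between the −1 points, so no single threshold on x separates them, but after the (assumed, hand-picked) feature map x → (x, x²) the second coordinate alone does.

```python
def phi(x):
    """Feature map from 1-D input into 2-D space: x -> (x, x**2)."""
    return (x, x * x)

# Toy data: class +1 lies between the two class -1 points, so the
# classes are not separable by any threshold in one dimension.
inner = [-1.0, 1.0]   # class +1
outer = [-3.0, 3.0]   # class -1

def classify(x, threshold=5.0):
    """In the mapped space, the horizontal line x2 = 5 separates the
    classes; seen from the original space, that line is the separating
    hyperplane found after mapping."""
    return +1 if phi(x)[1] < threshold else -1
```

A real SVM would not pick this map or threshold by hand: it uses a kernel function to work in the higher-dimensional space implicitly and solves for the maximum-margin hyperplane there.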

5. Maximum Entropy Model

This section briefly covers two parts: entropy, and the maximum entropy principle.

Entropy is a measure of information. Given a symbol, how do you evaluate the amount of information it carries? Shannon first introduced the concept of information entropy in the field of communication; it is defined mainly through its calculation formula, H(p) = -Σ p_i log p_i.

The maximum entropy principle is a criterion for learning probability models. It holds that if many models satisfy the current constraints, the model with the largest entropy is the optimal one.

Intuitively: satisfy the known constraints, and make no extra predictions about what is unknown, treating the unknown outcomes with equal probability.
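The entropy formula and the "equal probability" intuition can be checked directly: among all distributions over two outcomes, the uniform one has the largest entropy.

```python
import math

def entropy(probs):
    """Shannon entropy H(p) = -sum(p_i * log2(p_i)), in bits.
    Terms with p_i = 0 contribute nothing, by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: the uniform (max-entropy) case
print(entropy([0.9, 0.1]))  # about 0.47 bits: a skewed guess
print(entropy([1.0]))       # 0.0 bits: a certain outcome carries no information
```

Maximum entropy learning generalizes this: among all models consistent with the observed constraints, pick the one whose distribution has the largest H.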

6. Decision Tree Model

The decision tree is mainly a classification method (it can also be used for regression), and learning one usually consists of three steps: feature selection, decision tree generation, and decision tree pruning.

In this section, we briefly introduce two parts: the basic concepts of decision trees, and the ID3 algorithm.

A decision tree model is a tree structure that describes the classification of instances. It consists of nodes and directed edges.

There are two types of nodes: internal nodes and leaf nodes. An internal node represents a feature or attribute, and a leaf node represents a class. Classification with a decision tree proceeds as follows: starting from the root node, test the instance on a certain feature and, according to the result, assign the instance to one of the child nodes (each child node corresponds to one value of that feature). Then test and assign the instance recursively until a leaf node is reached.

From this description it can be seen that the closer a feature is to the root, the more it should discriminate between the classes, with the feature at the root being the most discriminative. The ID3 algorithm recursively selects the most discriminative feature at the current node and finally forms a decision tree; the C4.5 algorithm plays the same role, the difference between the two being only the measure used to evaluate features (information gain for ID3, information gain ratio for C4.5).
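ID3's feature-selection measure, information gain, is just "entropy before the split minus weighted entropy after the split". A sketch on an invented four-row dataset where feature 0 perfectly predicts the label and feature 1 tells us nothing:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """ID3's criterion: entropy of the labels minus the weighted
    entropy remaining after splitting on `feature` (a column index)."""
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[feature], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in splits.values())
    return entropy(labels) - remainder

rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # → 1.0 (perfect predictor)
print(information_gain(rows, labels, 1))  # → 0.0 (uninformative)
```

ID3 would place feature 0 at the root here; building the full tree is just applying this selection recursively to each split.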

7. Hidden Markov Model

The hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden (unknown) parameters. The difficulty is to infer the hidden states of the process from the observable outputs, and then use them for further analysis. My understanding matches this description: an HMM fits any scenario where hidden events must be guessed from observed events. Skimming papers online, HMM ideas have been applied in many fields; here is my simple understanding of three of them. Speech recognition: infer the hidden text sequence from the observable acoustic features. Machine translation: guess the hidden word sequence in the second language from the observable word sequence in the first. Part-of-speech tagging: infer the hidden sequence of parts of speech from the observable sequence of words.
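The "observed vs. hidden" split shows up directly in the HMM's basic computation, the forward algorithm, which sums over every possible hidden-state path to score an observation sequence. A sketch with invented numbers (hidden weather, observed food choice):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: probability of the observation sequence under
    an HMM, summing over all hidden state paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Hypothetical model, made up for illustration.
states = ("Hot", "Cold")
start_p = {"Hot": 0.5, "Cold": 0.5}
trans_p = {"Hot": {"Hot": 0.7, "Cold": 0.3},
           "Cold": {"Hot": 0.4, "Cold": 0.6}}
emit_p = {"Hot": {"ice": 0.8, "soup": 0.2},
          "Cold": {"ice": 0.3, "soup": 0.7}}
p = forward(["ice", "ice", "soup"], states, start_p, trans_p, emit_p)
```

The applications listed above swap in different hidden/observed pairs (text vs. acoustics, tags vs. words); decoding the single most likely hidden path uses the closely related Viterbi algorithm, which replaces the sum with a max.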


Impressions

After dabbling in Chinese word segmentation, prediction, and text categorization, I felt I knew something about the models above and wanted to describe them simply in words.

But once I opened the blog and started writing, it turned out to be really hard!

For quite a few of the models, apart from the formulas, I had no way to explain them. For some I only grasped the idea and then used off-the-shelf toolkits to achieve their effect, with at most some tuning. But when it came to explaining things, I found I had nothing to say!

I originally intended to spend an afternoon writing something simple, but once the models were listed I did not know how to proceed, and out of habit started copying and pasting.

Then I found there was no end to it, right down to pasting in pictures!

And for each model there already exist many blog summaries better than anything I could write about any one of them.

So I repositioned myself: for each model, follow my own understanding, note down what is worth remembering, then talk through it and, on that basis, try to make it clear.

In the end, I still have not mastered this on my own. Keep trying!

Learning material:

"Statistical Learning Methods", Li Hang

"Programming Collective Intelligence"

She Yan's blog: the K-means algorithm and the KNN (k-nearest neighbor) algorithm

"Understanding SVM at Three Levels": a popular introduction to support vector machines

