Summary of machine learning methods


Source: http://biostar.blog.sohu.com/61246458.html

Training: training samples -- feature selection -- training -- classifier

Classification: new sample -- feature selection -- classification with the trained classifier -- decision



Most of the early data mining classification applications were based on these methods and on algorithms that work entirely in main memory. Data mining methods today are required to handle large-scale data sets residing on external storage and to scale well. Below is a brief introduction to several major classification methods:



(1) Decision tree



Decision tree induction is a classical classification algorithm. It constructs a decision tree in a top-down, recursive, divide-and-conquer manner, and each node of the tree uses the information-gain measure to select its test attribute. Classification rules can be extracted from the resulting decision tree.
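To make the node-splitting step concrete, here is a minimal Python sketch of selecting a test attribute by information gain; it is only an illustration, not code from the article, and the toy records, attributes, and labels are invented.

```python
# Minimal sketch: pick the attribute with the highest information gain.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting `rows` on attribute `attr_index`."""
    base = entropy(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in splits.values())
    return base - remainder

# Toy data (invented): attribute 0 = outlook, attribute 1 = windy.
rows   = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["play", "play", "stay", "stay"]
best = max(range(2), key=lambda i: information_gain(rows, labels, i))
print("attribute with highest information gain:", best)  # -> 0 (outlook)
```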



(2) KNN method (K-Nearest Neighbor)

The KNN method, the K-nearest-neighbor method, was originally proposed by Cover and Hart in 1968 and is a theoretically mature method. Its idea is simple and intuitive: if, among the K samples most similar to a given sample in feature space (i.e., its nearest neighbors), the majority belong to a certain category, then the sample also belongs to that category. In other words, the method determines the category of the sample to be classified only from the categories of the nearest one or few samples.
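The voting idea can be sketched in a few lines of Python; this is only an illustration under the assumption of Euclidean distance and a simple majority vote, and the toy points and labels are invented.

```python
# Minimal KNN sketch: majority vote among the k nearest training points.
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Assign `query` the majority label among its k nearest training points."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (4.1, 3.9)]
train_labels = ["A", "A", "B", "B"]
print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))  # -> "A"
```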

Although the KNN method also relies on the limit theorem in principle, the classification decision involves only a very small number of neighboring samples, so the problem of imbalanced samples is better avoided. In addition, because KNN relies mainly on the limited samples in the neighborhood rather than on a method that discriminates entire class domains, it is more suitable than other methods for sample sets whose class domains cross or overlap heavily.

The disadvantage of the method is its computational cost: for every text to be classified, the distance to all known samples must be computed in order to find its K nearest neighbors. A common remedy is to edit the known sample points in advance and remove samples that contribute little to classification. In addition, there is a reverse-KNN method that can reduce the computational complexity of the KNN algorithm and improve classification efficiency.

The algorithm is well suited to the automatic classification of class domains with large sample sizes, whereas class domains with small sample sizes are more prone to misclassification.



(3) SVM method

The SVM (Support Vector Machine) method was proposed by Vapnik et al. in 1995 and shows relatively good performance. It is a machine learning method based on statistical learning theory. Through its learning algorithm, SVM automatically finds the support vectors that best discriminate between classes, and the resulting classifier maximizes the margin between classes, so it has better generalization ability and higher classification accuracy. The method needs only the boundary samples of the various class domains to determine the final classification result.

The goal of the SVM algorithm is to find a hyperplane H(d) that separates the data in the training set and whose distance, measured perpendicular to the plane, from the boundaries of the class domains is maximal; for this reason the SVM method is also called the maximum-margin algorithm. Most samples in the sample set are not support vectors, so removing or reducing them has no effect on the classification result, and the SVM method also gives good results for automatic classification with small sample sets.
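As an illustration of maximum-margin training, the following sketch uses the scikit-learn SVC class with a linear kernel; it assumes scikit-learn is installed, and the toy data set is invented. Note how only the support vectors are reported as determining the decision surface.

```python
# Hedged sketch of maximum-margin classification with an off-the-shelf SVM.
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.5, 0.4], [2.5, 2.6], [3.0, 3.0]]   # invented toy data
y = [0, 0, 1, 1]

clf = SVC(kernel="linear", C=1.0)   # linear kernel: find the separating hyperplane
clf.fit(X, y)

# Only the boundary samples (support vectors) determine the decision surface.
print("support vectors:", clf.support_vectors_)
print("prediction for [0.2, 0.1]:", clf.predict([[0.2, 0.1]]))
```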



(4) VSM method

The VSM method, the vector space model, was proposed by Salton et al. in the late 1960s and is the earliest and best-known mathematical model in information retrieval. Its basic idea is to represent a document as a weighted feature vector D = D(t1, w1; t2, w2; ...; tn, wn) and then to determine the category of the sample to be classified by computing text similarity. When text is represented as a vector in this space, the similarity between texts can be expressed by the inner product of their feature vectors.

In practical applications, VSM generally builds the class vector space in advance from the training corpus and the classification scheme. To classify a sample, one only needs to compute the similarity between the sample and each class vector, i.e., the inner product, and then choose the category with the highest similarity as the category of that sample.
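A minimal sketch of this classification step is shown below; the class names, term weights, and the use of the cosine-normalized inner product are illustrative assumptions, not taken from the article.

```python
# Minimal VSM sketch: pick the class whose vector is most similar to the sample.
import math

def cosine(u, v):
    """Cosine similarity: inner product normalized by the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Class vectors built in advance from a training corpus (weights are invented).
class_vectors = {
    "sports":  [0.9, 0.1, 0.0],
    "finance": [0.1, 0.8, 0.4],
}
sample = [0.2, 0.7, 0.5]   # weighted feature vector of the text to classify

best_class = max(class_vectors, key=lambda c: cosine(class_vectors[c], sample))
print(best_class)   # -> "finance"
```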

Because the class space vectors must be built in advance, their quality depends heavily on the feature terms they contain. Studies show that the more non-zero feature terms a category contains, the weaker the expressive power of each individual feature term for that category. Therefore, the VSM method is more suitable than other classification methods for classifying professional, domain-specific literature.



(5) Bayes method

The Bayes method is a pattern classification method for the case where the prior probabilities and the class-conditional probabilities are known; the classification result of a sample depends on the samples of the various class domains as a whole.

Suppose the training sample set is divided into M classes, denoted C = {c1, ..., ci, ..., cm}, and each class has a prior probability P(ci), i = 1, 2, ..., m. When the sample set is very large, P(ci) can be taken as (number of samples in class ci) / (total number of samples). For a sample x to be classified, the class-conditional probability of class ci is P(x|ci); by Bayes' theorem, the posterior probability P(ci|x) of class ci is:

P(ci|x) = P(x|ci) · P(ci) / P(x)    (1)

If P(ci|x) = max_j P(cj|x), i = 1, 2, ..., m, j = 1, 2, ..., m, then x ∈ ci    (2)

Formula (2) is the maximum a posteriori probability decision rule; substituting formula (1) into (2) gives:

If P(x|ci) · P(ci) = max_j ( P(x|cj) · P(cj) ), i = 1, 2, ..., m, j = 1, 2, ..., m, then x ∈ ci

This is the commonly used Bayes classification decision rule. After long study, the Bayes classification method has been shown to be theoretically well founded and is widely used in practice.
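A toy sketch of applying this decision rule is given below; it additionally assumes that P(x|ci) factorizes over independent terms (the independence assumption discussed in the next paragraph), and all priors and conditional probabilities are invented for illustration.

```python
# Toy maximum a posteriori decision: pick the class maximizing P(x|ci) * P(ci).
priors = {"c1": 0.6, "c2": 0.4}                       # P(ci), invented
cond = {                                              # P(term | ci), invented
    "c1": {"goal": 0.05, "match": 0.04, "bank": 0.001},
    "c2": {"goal": 0.002, "match": 0.003, "bank": 0.06},
}

def posterior_score(ci, terms):
    """Unnormalized posterior P(x|ci) * P(ci); P(x) is the same for every class."""
    score = priors[ci]
    for t in terms:
        score *= cond[ci][t]
    return score

x = ["goal", "match"]
decided = max(priors, key=lambda ci: posterior_score(ci, x))
print(decided)   # -> "c1"
```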

The weak point of the Bayes method is that the probability distribution of the class populations and the probability distribution functions (or density functions) of the samples of each class are often unknown, and estimating them requires a sufficiently large sample set. In addition, the Bayes method requires the index terms representing a text to be mutually independent, a condition that real text rarely satisfies, so in practice the method often fails to reach its theoretical optimum.



(6) Neural network

The core of a neural network classification algorithm is the threshold logic unit. A threshold logic unit is an object that accepts a set of weighted input quantities, sums them, and emits an output if this sum meets or exceeds a certain threshold. Given inputs X1, X2, ..., Xn and their weight coefficients W1, W2, ..., Wn, the sum of Xi*Wi produces the activation a = (X1 * W1) + (X2 * W2) + ... + (Xi * Wi) + ... + (Xn * Wn), where Xi is the frequency or another attribute of each record and Wi is the weight coefficient obtained from the feature-evaluation model. Neural networks are learning algorithms based on the principle of empirical risk minimization and have some inherent defects: the number of layers and of neurons is hard to determine, training easily falls into local minima, and over-fitting can occur; these defects are handled well by the SVM algorithm.
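The threshold logic unit itself can be sketched directly; the weights and threshold below are invented, chosen so that the unit behaves like a logical AND of two binary inputs.

```python
# Minimal threshold logic unit: weighted sum compared with a threshold.
def threshold_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Example: with these invented weights the unit acts as a logical AND.
weights, threshold = [0.6, 0.6], 1.0
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", threshold_unit([x1, x2], weights, threshold))
```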

Source: Http://www.cnblogs.com/zhangchaoyang

A summary of machine learning problem methods (big class -- name: keywords)

Supervised classification
  - Decision tree: information gain
  - Classification and regression tree (CART): Gini index, χ2 statistic, pruning
  - Naive Bayes: non-parametric estimation, Bayesian estimation
  - Linear discriminant analysis: Fisher discriminant, eigenvector solution
  - K nearest neighbor: similarity measures (Euclidean distance, city-block distance, edit distance, vector angle, Pearson correlation coefficient)
  - Logistic regression (binary classification): parameter estimation (maximum likelihood estimation), sigmoid function
  - Radial basis function network: non-parametric estimation, regularization theory, sigmoid function
  - Counter-propagation network: unsupervised competitive learning, supervised Widrow-Hoff learning
  - Learning vector quantization network: each output-layer unit is connected to several competitive-layer units
  - Error back-propagation network: sigmoid function, gradient descent method
  - Support vector machine (binary classification): quadratic programming, Lagrange multiplier method, dual problem, optimization, sequential minimal optimization, kernel trick
  - Single-layer perceptron: can only handle linearly separable problems
  - Double hidden-layer perceptron: sufficient to solve any complex classification problem

Unsupervised classification
  - K-means: centroid
  - Chameleon: graph partitioning, relative interconnectivity, relative closeness
  - BIRCH: B-tree, CF triple
  - DBSCAN: core point, density-reachable
  - EM algorithm (Gaussian mixture model): parameter estimation (maximum likelihood estimation)
  - Spectral clustering: graph partitioning, singular value solution, global convergence
  - Self-organizing map network: unsupervised competitive learning

Regression analysis
  - General linear regression: parameter estimation, least squares; generally used for prediction rather than classification
  - Logistic regression (binary classification): parameter estimation (maximum likelihood estimation), sigmoid function

Association rule mining
  - FP-tree: frequent 1-itemsets, FP-tree, conditional pattern base, suffix pattern

Dimensionality reduction
  - Principal component analysis: covariance matrix, singular value decomposition

Recommendation
  - Collaborative filtering: similarity measures for sparse vectors

Method subdivision and application sites

Parameter estimation
  - Maximum likelihood estimation: linear regression (assuming the error follows a normal distribution with mean 0, it reduces to least squares); logistic regression (the extremum of the likelihood function is found by gradient-descent iteration); Gaussian mixture model
  - Non-parametric estimation: radial basis function network

Independence tests (non-parametric hypothesis tests)
  - χ2 test: feature word selection; stopping condition of the classification and regression tree
  - Rank-sum test

Correlation tests
  - Pearson correlation coefficient (assumes X and Y are drawn from normal distributions): text categorization based on the vector space model, user-preference recommendation systems
  - Spearman rank correlation coefficient (non-parametric hypothesis test)

Optimization methods
  - Unconstrained optimization: gradient descent method (maximum likelihood estimation in regression analysis and GMM, support vector machine, linear discriminant analysis); Newton's iterative method and its variants
  - With constraints: convert to an unconstrained problem by the Lagrange multiplier method

Finding eigenvalues/eigenvectors
  - Power method: linear discriminant analysis, dimensionality reduction
  - Singular value decomposition (for symmetric matrices only): principal component analysis, spectral clustering

Information measures
  - Information gain: feature word selection, decision tree
  - Mutual information: feature word selection
  - Cross entropy: feature word selection, modeling and simulation of rare events, multi-modal optimization problems

Kernel functions
  - Polynomial kernel function: SVM, RBF network
  - Gaussian kernel function (radial basis function)
  - Bipolar sigmoid kernel function
  - Unipolar sigmoid function: logistic regression, BP neural network

Covariance
  - Pearson correlation coefficient, PCA

EM algorithm
  - Gaussian mixture model, forward-backward algorithm

Basis functions
  - Gaussian mixture model, radial basis function network

Smoothing algorithms
  - Laplace smoothing: Bayesian classification, hidden Markov model
  - Good-Turing smoothing: hidden Markov model

Hidden Markov model
  - Evaluation problem: forward algorithm
  - Decoding problem: Viterbi algorithm
  - Learning problem: Baum-Welch algorithm
  - Applications: Chinese word segmentation, POS tagging

The Cover theorem states that a complex pattern-classification problem mapped nonlinearly into a high-dimensional space is more likely to be linearly separable than one projected into a low-dimensional space. Both SVM and the RBF network therefore try to map samples from a low-dimensional space into a high-dimensional space and then classify them there.
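A small illustration of this idea (not from the article): the XOR pattern is not linearly separable in two dimensions, but after appending one invented nonlinear feature, x1*x2, a single plane separates the two classes.

```python
# Lift 2-D XOR points into 3-D so that a plane can separate the classes.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def lift(x):
    """Map (x1, x2) into 3-D by appending the product x1*x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

# In the lifted space, the plane z1 + z2 - 2*z3 = 0.5 separates the two classes.
for x, label in points:
    z1, z2, z3 = lift(x)
    predicted = 1 if (z1 + z2 - 2 * z3) > 0.5 else 0
    print(x, "label:", label, "predicted:", predicted)
```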

Interestingly, other methods go the opposite way and reduce the input samples from a high-dimensional to a low-dimensional space before classification or regression analysis, for example PCA, SOFM networks, LDA, and spectral clustering; they assume that samples have a clearer representation in a low-dimensional feature space, where regularities are easier to find.

