Summary of machine learning methods


Source: http://biostar.blog.sohu.com/61246458.html

Training: training samples -- feature selection -- training -- classifier

Classification: new sample -- feature selection -- classification with the trained classifier -- decision



Most of the early data mining classification applications were based on these methods and on algorithms that work entirely in main memory. Data mining methods today are required to handle large-scale data sets residing on external storage and to scale well. Below is a brief introduction to several major classification methods:



(1) Decision tree



Decision tree induction is a classical classification algorithm. It constructs a decision tree in a top-down, recursive, divide-and-conquer manner, and each node of the tree uses the information-gain measure to select its test attribute. Classification rules can be extracted from the resulting decision tree.
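To make the node-splitting step concrete, here is a minimal Python sketch of selecting a test attribute by information gain; it is only an illustration, not code from the article, and the toy records, attributes, and labels are invented.

```python
# Minimal sketch: pick the attribute with the highest information gain.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting `rows` on attribute `attr_index`."""
    base = entropy(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in splits.values())
    return base - remainder

# Toy data (invented): attribute 0 = outlook, attribute 1 = windy.
rows   = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["play", "play", "stay", "stay"]
best = max(range(2), key=lambda i: information_gain(rows, labels, i))
print("attribute with highest information gain:", best)  # -> 0 (outlook)
```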



(2) KNN method (K-Nearest Neighbor)

The KNN method, the K-nearest-neighbor method, was originally proposed by Cover and Hart in 1968 and is a theoretically mature method. Its idea is simple and intuitive: if, among the K samples most similar to a given sample in feature space (i.e., its nearest neighbors), the majority belong to a certain category, then the sample also belongs to that category. In other words, the method determines the category of the sample to be classified only from the categories of the nearest one or few samples.
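The voting idea can be sketched in a few lines of Python; this is only an illustration under the assumption of Euclidean distance and a simple majority vote, and the toy points and labels are invented.

```python
# Minimal KNN sketch: majority vote among the k nearest training points.
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Assign `query` the majority label among its k nearest training points."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (4.1, 3.9)]
train_labels = ["A", "A", "B", "B"]
print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))  # -> "A"
```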

Although the KNN method also relies on the limit theorem in principle, the classification decision involves only a very small number of neighboring samples, so the problem of imbalanced samples is better avoided. In addition, because KNN relies mainly on the limited samples in the neighborhood rather than on a method that discriminates entire class domains, it is more suitable than other methods for sample sets whose class domains cross or overlap heavily.

The disadvantage of the method is its computational cost: for every text to be classified, the distance to all known samples must be computed in order to find its K nearest neighbors. A common remedy is to edit the known sample points in advance and remove samples that contribute little to classification. In addition, there is a reverse-KNN method that can reduce the computational complexity of the KNN algorithm and improve classification efficiency.

The algorithm is well suited to the automatic classification of class domains with large sample sizes, whereas class domains with small sample sizes are more prone to misclassification.



(3) SVM method

The SVM (Support Vector Machine) method was proposed by Vapnik et al. in 1995 and shows relatively good performance. It is a machine learning method based on statistical learning theory. Through its learning algorithm, SVM automatically finds the support vectors that best discriminate between classes, and the resulting classifier maximizes the margin between classes, so it has better generalization ability and higher classification accuracy. The method needs only the boundary samples of the various class domains to determine the final classification result.

The goal of the SVM algorithm is to find a hyperplane H(d) that separates the data in the training set and whose distance, measured perpendicular to the plane, from the boundaries of the class domains is maximal; for this reason the SVM method is also called the maximum-margin algorithm. Most samples in the sample set are not support vectors, so removing or reducing them has no effect on the classification result, and the SVM method also gives good results for automatic classification with small sample sets.
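As an illustration of maximum-margin training, the following sketch uses the scikit-learn SVC class with a linear kernel; it assumes scikit-learn is installed, and the toy data set is invented. Note how only the support vectors are reported as determining the decision surface.

```python
# Hedged sketch of maximum-margin classification with an off-the-shelf SVM.
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.5, 0.4], [2.5, 2.6], [3.0, 3.0]]   # invented toy data
y = [0, 0, 1, 1]

clf = SVC(kernel="linear", C=1.0)   # linear kernel: find the separating hyperplane
clf.fit(X, y)

# Only the boundary samples (support vectors) determine the decision surface.
print("support vectors:", clf.support_vectors_)
print("prediction for [0.2, 0.1]:", clf.predict([[0.2, 0.1]]))
```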



(4) VSM method

The VSM method, the vector space model, was proposed by Salton et al. in the late 1960s and is the earliest and best-known mathematical model in information retrieval. Its basic idea is to represent a document as a weighted feature vector D = D(t1, w1; t2, w2; ...; tn, wn) and then to determine the category of the sample to be classified by computing text similarity. When text is represented as a vector in this space, the similarity between texts can be expressed by the inner product of their feature vectors.

In practical applications, VSM generally builds the class vector space in advance from the training corpus and the classification scheme. To classify a sample, one only needs to compute the similarity between the sample and each class vector, i.e., the inner product, and then choose the category with the highest similarity as the category of that sample.
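A minimal sketch of this classification step is shown below; the class names, term weights, and the use of the cosine-normalized inner product are illustrative assumptions, not taken from the article.

```python
# Minimal VSM sketch: pick the class whose vector is most similar to the sample.
import math

def cosine(u, v):
    """Cosine similarity: inner product normalized by the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Class vectors built in advance from a training corpus (weights are invented).
class_vectors = {
    "sports":  [0.9, 0.1, 0.0],
    "finance": [0.1, 0.8, 0.4],
}
sample = [0.2, 0.7, 0.5]   # weighted feature vector of the text to classify

best_class = max(class_vectors, key=lambda c: cosine(class_vectors[c], sample))
print(best_class)   # -> "finance"
```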

Because the class space vectors must be built in advance, their quality depends heavily on the feature terms they contain. Studies show that the more non-zero feature terms a category contains, the weaker the expressive power of each individual feature term for that category. Therefore, the VSM method is more suitable than other classification methods for classifying professional, domain-specific literature.



(5) Bayes method

The Bayes method is a pattern classification method for the case where the prior probabilities and the class-conditional probabilities are known; the classification result of a sample depends on the samples of the various class domains as a whole.

Suppose the training sample set is divided into M classes, denoted C = {c1, ..., ci, ..., cm}, and each class has a prior probability P(ci), i = 1, 2, ..., m. When the sample set is very large, P(ci) can be taken as (number of samples in class ci) / (total number of samples). For a sample x to be classified, the class-conditional probability of class ci is P(x|ci); by Bayes' theorem, the posterior probability P(ci|x) of class ci is:

P(ci|x) = P(x|ci) · P(ci) / P(x)    (1)

If P(ci|x) = max_j P(cj|x), i = 1, 2, ..., m, j = 1, 2, ..., m, then x ∈ ci    (2)

Formula (2) is the maximum a posteriori probability decision rule; substituting formula (1) into (2) gives:

If P(x|ci) · P(ci) = max_j ( P(x|cj) · P(cj) ), i = 1, 2, ..., m, j = 1, 2, ..., m, then x ∈ ci

This is the commonly used Bayes classification decision rule. After long study, the Bayes classification method has been shown to be theoretically well founded and is widely used in practice.
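A toy sketch of applying this decision rule is given below; it additionally assumes that P(x|ci) factorizes over independent terms (the independence assumption discussed in the next paragraph), and all priors and conditional probabilities are invented for illustration.

```python
# Toy maximum a posteriori decision: pick the class maximizing P(x|ci) * P(ci).
priors = {"c1": 0.6, "c2": 0.4}                       # P(ci), invented
cond = {                                              # P(term | ci), invented
    "c1": {"goal": 0.05, "match": 0.04, "bank": 0.001},
    "c2": {"goal": 0.002, "match": 0.003, "bank": 0.06},
}

def posterior_score(ci, terms):
    """Unnormalized posterior P(x|ci) * P(ci); P(x) is the same for every class."""
    score = priors[ci]
    for t in terms:
        score *= cond[ci][t]
    return score

x = ["goal", "match"]
decided = max(priors, key=lambda ci: posterior_score(ci, x))
print(decided)   # -> "c1"
```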

The weak point of the Bayes method is that the probability distribution of the class populations and the probability distribution functions (or density functions) of the samples of each class are often unknown, and estimating them requires a sufficiently large sample set. In addition, the Bayes method requires the index terms representing a text to be mutually independent, a condition that real text rarely satisfies, so in practice the method often fails to reach its theoretical optimum.



(6) Neural network

The core of a neural network classification algorithm is the threshold logic unit. A threshold logic unit is an object that accepts a set of weighted input quantities, sums them, and emits an output if this sum meets or exceeds a certain threshold. Given inputs X1, X2, ..., Xn and their weight coefficients W1, W2, ..., Wn, the sum of Xi*Wi produces the activation a = (X1 * W1) + (X2 * W2) + ... + (Xi * Wi) + ... + (Xn * Wn), where Xi is the frequency or another attribute of each record and Wi is the weight coefficient obtained from the feature-evaluation model. Neural networks are learning algorithms based on the principle of empirical risk minimization and have some inherent defects: the number of layers and of neurons is hard to determine, training easily falls into local minima, and over-fitting can occur; these defects are handled well by the SVM algorithm.
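The threshold logic unit itself can be sketched directly; the weights and threshold below are invented, chosen so that the unit behaves like a logical AND of two binary inputs.

```python
# Minimal threshold logic unit: weighted sum compared with a threshold.
def threshold_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Example: with these invented weights the unit acts as a logical AND.
weights, threshold = [0.6, 0.6], 1.0
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", threshold_unit([x1, x2], weights, threshold))
```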

Source: Http://www.cnblogs.com/zhangchaoyang

A summary of machine learning problem methods (big class -- name: keywords)

Supervised classification
  - Decision tree: information gain
  - Classification and regression tree (CART): Gini index, χ2 statistic, pruning
  - Naive Bayes: non-parametric estimation, Bayesian estimation
  - Linear discriminant analysis: Fisher discriminant, eigenvector solution
  - K nearest neighbor: similarity measures (Euclidean distance, city-block distance, edit distance, vector angle, Pearson correlation coefficient)
  - Logistic regression (binary classification): parameter estimation (maximum likelihood estimation), sigmoid function
  - Radial basis function network: non-parametric estimation, regularization theory, sigmoid function
  - Counter-propagation network: unsupervised competitive learning, supervised Widrow-Hoff learning
  - Learning vector quantization network: each output-layer unit is connected to several competitive-layer units
  - Error back-propagation network: sigmoid function, gradient descent method
  - Support vector machine (binary classification): quadratic programming, Lagrange multiplier method, dual problem, optimization, sequential minimal optimization, kernel trick
  - Single-layer perceptron: can only handle linearly separable problems
  - Double hidden-layer perceptron: sufficient to solve any complex classification problem

Unsupervised classification
  - K-means: centroid
  - Chameleon: graph partitioning, relative interconnectivity, relative closeness
  - BIRCH: B-tree, CF triple
  - DBSCAN: core point, density-reachable
  - EM algorithm (Gaussian mixture model): parameter estimation (maximum likelihood estimation)
  - Spectral clustering: graph partitioning, singular value solution, global convergence
  - Self-organizing map network: unsupervised competitive learning

Regression analysis
  - General linear regression: parameter estimation, least squares; generally used for prediction rather than classification
  - Logistic regression (binary classification): parameter estimation (maximum likelihood estimation), sigmoid function

Association rule mining
  - FP-tree: frequent 1-itemsets, FP-tree, conditional pattern base, suffix pattern

Dimensionality reduction
  - Principal component analysis: covariance matrix, singular value decomposition

Recommendation
  - Collaborative filtering: similarity measures for sparse vectors

Method subdivision and application sites

Parameter estimation
  - Maximum likelihood estimation: linear regression (assuming the error follows a normal distribution with mean 0, it reduces to least squares); logistic regression (the extremum of the likelihood function is found by gradient-descent iteration); Gaussian mixture model
  - Non-parametric estimation: radial basis function network

Independence tests (non-parametric hypothesis tests)
  - χ2 test: feature word selection; stopping condition of the classification and regression tree
  - Rank-sum test

Correlation tests
  - Pearson correlation coefficient (assumes X and Y are drawn from normal distributions): text categorization based on the vector space model, user-preference recommendation systems
  - Spearman rank correlation coefficient (non-parametric hypothesis test)

Optimization methods
  - Unconstrained optimization: gradient descent method (maximum likelihood estimation in regression analysis and GMM, support vector machine, linear discriminant analysis); Newton's iterative method and its variants
  - With constraints: convert to an unconstrained problem by the Lagrange multiplier method

Finding eigenvalues/eigenvectors
  - Power method: linear discriminant analysis, dimensionality reduction
  - Singular value decomposition (for symmetric matrices only): principal component analysis, spectral clustering

Information measures
  - Information gain: feature word selection, decision tree
  - Mutual information: feature word selection
  - Cross entropy: feature word selection, modeling and simulation of rare events, multi-modal optimization problems

Kernel functions
  - Polynomial kernel function: SVM, RBF network
  - Gaussian kernel function (radial basis function)
  - Bipolar sigmoid kernel function
  - Unipolar sigmoid function: logistic regression, BP neural network

Covariance
  - Pearson correlation coefficient, PCA

EM algorithm
  - Gaussian mixture model, forward-backward algorithm

Basis functions
  - Gaussian mixture model, radial basis function network

Smoothing algorithms
  - Laplace smoothing: Bayesian classification, hidden Markov model
  - Good-Turing smoothing: hidden Markov model

Hidden Markov model
  - Evaluation problem: forward algorithm
  - Decoding problem: Viterbi algorithm
  - Learning problem: Baum-Welch algorithm
  - Applications: Chinese word segmentation, POS tagging

The Cover theorem states that a complex pattern-classification problem mapped nonlinearly into a high-dimensional space is more likely to be linearly separable than one projected into a low-dimensional space. Both SVM and the RBF network therefore try to map samples from a low-dimensional space into a high-dimensional space and then classify them there.
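A small illustration of this idea (not from the article): the XOR pattern is not linearly separable in two dimensions, but after appending one invented nonlinear feature, x1*x2, a single plane separates the two classes.

```python
# Lift 2-D XOR points into 3-D so that a plane can separate the classes.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def lift(x):
    """Map (x1, x2) into 3-D by appending the product x1*x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

# In the lifted space, the plane z1 + z2 - 2*z3 = 0.5 separates the two classes.
for x, label in points:
    z1, z2, z3 = lift(x)
    predicted = 1 if (z1 + z2 - 2 * z3) > 0.5 else 0
    print(x, "label:", label, "predicted:", predicted)
```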

Interestingly, other methods go the opposite way and reduce the input samples from a high-dimensional to a low-dimensional space before classification or regression analysis, for example PCA, SOFM networks, LDA, and spectral clustering; they assume that samples have a clearer representation in a low-dimensional feature space, where regularities are easier to find.

