Machine learning is undoubtedly one of the hottest topics in data analysis today, and many people use machine learning algorithms to some extent in their everyday work. This article summarizes common machine learning algorithms as a reference for your work and study.
There are many machine learning algorithms, and they are often confusing: many algorithms belong to the same family, and some are extensions of others. Here we introduce them from two angles: first by learning style, and second by the similarity between algorithms.
Learning Style
Depending on the type of data, a problem can be modeled in different ways. In machine learning and artificial intelligence, the first thing people usually consider is an algorithm's learning style, and there are several main learning styles. Classifying algorithms by learning style is a good idea, because it lets people choose, during modeling and algorithm selection, the algorithm best suited to the input data and so obtain the best possible results.
Supervised Learning:
In supervised learning, the input data is called "training data", and each training example has an explicit label or result, such as "spam" or "not spam" in an anti-spam system, or "1", "2", "3", "4", and so on in handwritten digit recognition. When building a predictive model, supervised learning sets up a learning process that compares the model's predictions with the actual results in the training data and keeps adjusting the model until its predictions reach the expected accuracy. Typical applications of supervised learning are classification and regression problems. Common algorithms include logistic regression and the backpropagation neural network.
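As a minimal sketch of that compare-and-adjust loop, the Python below trains a one-feature logistic regression by batch gradient descent on the log loss. The toy data, learning rate, epoch count, and function names are illustrative assumptions; in practice one would more likely use a library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.1, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # prediction minus the labeled result
            grad_w += error * x
            grad_b += error
        # adjust the model against the averaged error, then repeat
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# hypothetical training data: label 1 for positive x, label 0 for negative x
xs = [-4, -3, -2, -1, 1, 2, 3, 4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
print([predict(w, b, x) for x in xs])  # [0, 0, 0, 0, 1, 1, 1, 1]
```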
Unsupervised Learning:
In unsupervised learning, the data is not labeled, and the learning model aims to infer some intrinsic structure of the data. Common applications include association rule learning and clustering. Common algorithms include the Apriori algorithm and the K-means algorithm.
Semi-supervised Learning:
In this learning style, part of the input data is labeled and part is not. The model can be used for prediction, but it first needs to learn the intrinsic structure of the data in order to organize it sensibly for making predictions. Applications include classification and regression. The algorithms include extensions of common supervised learning algorithms that first try to model the unlabeled data and then, on that basis, make predictions using the labeled data. Examples are graph inference algorithms and the Laplacian support vector machine (Laplacian SVM).
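As a hedged illustration of graph-based inference, the sketch below uses simple label propagation (a basic graph method, not the Laplacian SVM): labeled nodes keep their labels, and each unlabeled node repeatedly takes the average score of its neighbors, so the structure of the unlabeled data decides how labels spread. The graph, the 0.5 threshold, and the function name are assumptions for illustration.

```python
def label_propagation(n_nodes, edges, labels, iterations=100):
    """Spread known 0/1 labels over an undirected graph.

    labels maps node -> known label; every other node is unlabeled.
    """
    neighbors = {node: [] for node in range(n_nodes)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # unlabeled nodes start undecided at 0.5
    scores = [float(labels.get(node, 0.5)) for node in range(n_nodes)]
    for _ in range(iterations):
        new_scores = scores[:]
        for node in range(n_nodes):
            if node in labels:
                continue  # labeled nodes stay clamped to their known label
            new_scores[node] = sum(scores[m] for m in neighbors[node]) / len(neighbors[node])
        scores = new_scores
    return [1 if s >= 0.5 else 0 for s in scores]

# two triangles (0-1-2 and 3-4-5) joined by one edge 2-3; only nodes 0 and 5 are labeled
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(label_propagation(6, edges, {0: 0, 5: 1}))  # [0, 0, 0, 1, 1, 1]
```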
Reinforcement Learning:
In this learning style, the input data serves as feedback to the model. In supervised learning the input data is used only to check whether the model's output is right or wrong; in reinforcement learning the input data is fed back directly to the model, which must adjust to it immediately. Common applications include dynamic systems and robot control. Common algorithms include Q-learning and temporal difference learning.
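The Q-learning update can be sketched on a hypothetical five-state corridor where only reaching the right end pays a reward. To keep the example deterministic, this sketch applies the Q-learning update to every state-action pair in turn (a synchronous variant) instead of sampling episodes with epsilon-greedy exploration; the environment, discount factor, and learning rate are all assumptions.

```python
N_STATES = 5      # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor
ALPHA = 0.5       # learning rate

def step(state, action):
    """Deterministic toy environment: reward 1 only when the goal is reached."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(sweeps=200):
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for state in range(N_STATES - 1):  # skip the terminal state
            for action in ACTIONS:
                next_state, reward = step(state, action)
                # the terminal state has no future value
                future = 0.0 if next_state == N_STATES - 1 else max(q[next_state])
                target = reward + GAMMA * future
                # move the estimate toward the feedback-based target
                q[state][action] += ALPHA * (target - q[state][action])
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: always move right
```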
In enterprise data applications, supervised and unsupervised learning models are the most commonly used. In image recognition, where there is a large amount of unlabeled data and only a small amount of labeled data, semi-supervised learning is currently a hot topic. Reinforcement learning is used mostly in robot control and other areas that require system control.
Algorithmic Similarity
Algorithms can also be grouped by similarity in function and form, for example tree-based algorithms, neural-network-based algorithms, and so on. Of course, machine learning is a very broad field: some algorithms are hard to place in any single category, and some algorithms in one category can be applied to several different types of problems. Here we try to group the commonly used algorithms in the way that is easiest to understand.
Regression Algorithms
Regression algorithms try to explore the relationships between variables by measuring error. Regression is a powerful tool in statistical machine learning. In machine learning, "regression" sometimes refers to a type of problem and sometimes to a type of algorithm, which often confuses beginners. Common regression algorithms include Ordinary Least Squares, Logistic Regression, Stepwise Regression, Multivariate Adaptive Regression Splines (MARS), and Locally Estimated Scatterplot Smoothing (LOESS).
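Ordinary least squares is the simplest of these. For a single input variable, minimizing the sum of squared errors has a closed-form solution: slope = covariance(x, y) / variance(x), and the intercept makes the line pass through the means. The sketch below assumes one feature and made-up data lying exactly on y = 2x + 1.

```python
def fit_ols(xs, ys):
    """Simple linear regression y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x) minimizes the sum of squared errors
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_ols([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```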
Instance-based Algorithms
Instance-based algorithms are often used to model decision problems. Such models typically keep a set of sample data and compare new data with those samples using some similarity measure to find the best match. For this reason, instance-based algorithms are also called "winner-take-all" learning or "memory-based learning". Common algorithms include k-Nearest Neighbor (KNN), Learning Vector Quantization (LVQ), and the Self-Organizing Map (SOM).
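k-nearest neighbor is the textbook instance-based method: it stores the training samples and, at prediction time, lets the k closest stored samples vote. The sketch assumes Euclidean distance (`math.dist`, Python 3.8+), k = 3, and hypothetical two-cluster data.

```python
import math
from collections import Counter

def knn_predict(training_set, query, k=3):
    """Classify query by majority vote among its k nearest stored samples."""
    # sort the stored samples by Euclidean distance to the query
    by_distance = sorted(training_set, key=lambda sample: math.dist(sample[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

training_set = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
                ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(training_set, (2, 2)))  # A
print(knn_predict(training_set, (7, 8)))  # B
```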
Regularization Method
Regularization methods are extensions of other algorithms (usually regression algorithms) that adjust the algorithm according to the complexity of the model. They usually reward simple models and penalize complex ones. Common algorithms include Ridge Regression, the Least Absolute Shrinkage and Selection Operator (LASSO), and Elastic Net.
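To see how a penalty on complexity works, consider ridge regression with a single feature. Minimizing sum((y - w*x)^2) + lam * w^2 on centered data gives w = sum(x*y) / (sum(x^2) + lam), so a larger penalty lam shrinks the coefficient toward zero. The sketch assumes one feature and an unpenalized intercept; the data is illustrative.

```python
def fit_ridge_1d(xs, ys, lam):
    """One-feature ridge regression; lam is the L2 penalty strength."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # center the data so the (unpenalized) intercept drops out of the penalty
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    b = mean_y - w * mean_x
    return w, b

xs, ys = [0, 1, 2, 3, 4], [1, 3, 5, 7, 9]
for lam in (0, 10, 30):
    print(lam, fit_ridge_1d(xs, ys, lam))
# 0 (2.0, 1.0)
# 10 (1.0, 3.0)
# 30 (0.5, 4.0)
```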
Decision Tree Learning
Decision tree algorithms build a tree-structured decision model based on the attributes of the data, and decision tree models are often used to solve classification and regression problems. Common algorithms include Classification and Regression Tree (CART), ID3 (Iterative Dichotomiser 3), C4.5, Chi-squared Automatic Interaction Detection (CHAID), Decision Stump, Random Forest, Multivariate Adaptive Regression Splines (MARS), and Gradient Boosting Machine (GBM).
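A decision stump, one of the algorithms listed above, is a tree with a single split. The sketch below picks the feature and threshold that minimize weighted Gini impurity, the split criterion CART uses for classification; a full tree would repeat the same search recursively on each side of the split. The data and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when all labels agree."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def fit_stump(rows, labels):
    """Return (feature, threshold, left_label, right_label) for the lowest-impurity split."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send samples both ways
            # impurity of the split, weighted by how many samples go each way
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

# hypothetical data: feature 1 separates the classes, feature 0 does not
rows = [(1, 10), (3, 12), (2, 30), (4, 32)]
labels = ["no", "no", "yes", "yes"]
stump = fit_stump(rows, labels)
print(stump)  # (1, 12, 'no', 'yes')
```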
Bayesian Method
Bayesian algorithms are based on Bayes' theorem and are mainly used to solve classification and regression problems. Common algorithms include Naive Bayes, Averaged One-Dependence Estimators (AODE), and the Bayesian Belief Network (BBN).
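Naive Bayes applies Bayes' theorem under the "naive" assumption that features are independent given the class. The sketch below is a word-count (multinomial) classifier with add-one (Laplace) smoothing, applied to the spam example mentioned earlier; the tiny corpus and the function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_words, label). Returns class and word counts."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for w in words:
            # log P(word | class) with add-one smoothing for unseen words
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [(["win", "money", "now"], "spam"), (["win", "prize"], "spam"),
        (["meeting", "tomorrow"], "ham"), (["project", "meeting", "notes"], "ham")]
model = train_nb(docs)
print(predict_nb(model, ["win", "money"]))  # spam
print(predict_nb(model, ["meeting"]))       # ham
```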
Kernel-based Algorithms
The best-known kernel-based algorithm is the support vector machine (SVM). Kernel-based algorithms map the input data into a higher-dimensional vector space, where some classification or regression problems become easier to solve. Common kernel-based algorithms include Support Vector Machines (SVM), Radial Basis Functions (RBF), and Linear Discriminant Analysis (LDA).
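The mapping to a higher-dimensional space usually never has to be computed explicitly: a kernel function returns the inner product in that space directly from the original inputs. The sketch checks this for the degree-2 polynomial kernel K(a, b) = (a . b)^2, whose feature map for 2-D input is phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), and also defines the RBF kernel exp(-gamma * ||a - b||^2). The input vectors and gamma value are assumptions.

```python
import math

def phi(v):
    """Explicit degree-2 feature map for 2-D input (3-D output)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def poly_kernel(a, b):
    """Degree-2 polynomial kernel: the phi-space inner product, computed in input space."""
    return dot(a, b) ** 2

def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function kernel; its implicit feature space is infinite-dimensional."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

a, b = (1, 2), (3, 4)
print(poly_kernel(a, b))    # 121
print(dot(phi(a), phi(b)))  # about 121.0, up to floating-point rounding
print(rbf_kernel(a, a))     # 1.0: identical points are maximally similar
```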
Clustering Algorithm
Like regression, clustering sometimes describes a type of problem and sometimes a class of algorithms. Clustering algorithms usually group input data using either centroid-based or hierarchical approaches. All clustering algorithms try to find the intrinsic structure of the data in order to group data points by what they have most in common. Common clustering algorithms include K-means and Expectation Maximization (EM).
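K-means is the classic centroid-based method: alternately assign each point to its nearest centroid and move each centroid to the mean of its assigned points. The sketch uses a naive initialization (the first k points) and a fixed iteration count, both simplifying assumptions; practical implementations typically use smarter initialization such as k-means++ and a convergence check.

```python
import math

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return centroids, clusters

points = [(1, 1), (9, 9), (1.5, 2), (8, 8), (1, 0.5), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(9, 9), (8, 8), (9, 8.5)]]
```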
Association Rule Learning
Association rule learning finds useful association rules in large multivariate datasets by searching for the rules that best explain the relationships between data variables. Common algorithms include the Apriori algorithm and the Eclat algorithm.
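The core idea of Apriori is that every subset of a frequent itemset must itself be frequent, so candidate itemsets of size k are built only from frequent itemsets of size k - 1 and pruned before their support is counted. The basket data and the 0.6 minimum support below are illustrative; this sketch finds frequent itemsets only, and deriving association rules from them is a further step.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s for s in singles if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must already be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

baskets = [{"bread", "milk"}, {"bread", "diapers", "beer"},
           {"milk", "diapers", "beer"}, {"bread", "milk", "diapers", "beer"},
           {"bread", "milk", "diapers"}]
frequent = apriori(baskets, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```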
Artificial Neural Networks
Artificial neural network algorithms are pattern-matching algorithms modeled on biological neural networks and are typically used for classification and regression problems. Artificial neural networks form a huge branch of machine learning with hundreds of different algorithms (deep learning is one of them and is discussed separately). Important artificial neural network algorithms include the Perceptron, Backpropagation, the Hopfield network, the Self-Organizing Map (SOM), and Learning Vector Quantization (LVQ).
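The perceptron is the simplest of these networks: a single neuron with a step activation whose weights are nudged by the error on each misclassified example. The sketch learns the logical AND function, which is linearly separable, so the perceptron learning rule is guaranteed to converge. The learning rate and epoch count are assumptions.

```python
def train_perceptron(samples, lr=1.0, epochs=20):
    """samples: list of ((x1, x2), target) with targets 0 or 1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0  # step activation
            error = target - output  # 0 when correct, +1 or -1 when wrong
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def perceptron_predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([perceptron_predict(w, b, x1, x2) for (x1, x2), _ in AND])  # [0, 0, 0, 1]
```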
Deep Learning
Deep learning algorithms are a development of artificial neural networks. They have attracted a great deal of attention recently, especially since Baidu began pushing into deep learning, which generated considerable interest in China. As computing power becomes ever cheaper, deep learning attempts to build much larger and more complex neural networks. Many deep learning algorithms are semi-supervised learning algorithms used to handle large datasets in which only a small portion of the data is labeled. Common deep learning algorithms include the Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN), Convolutional Networks, and Stacked Auto-encoders.
Dimensionality Reduction Algorithms
Like clustering algorithms, dimensionality reduction algorithms try to analyze the intrinsic structure of the data, but they do so in an unsupervised way, summarizing or interpreting the data with less information. Such algorithms can be used to visualize high-dimensional data or to simplify data for supervised learning. Common algorithms include Principal Component Analysis (PCA), Partial Least Squares Regression (PLS), Sammon Mapping, Multidimensional Scaling (MDS), and Projection Pursuit.
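As a sketch of dimensionality reduction, the Python below finds the first principal component of 2-D data by power iteration on the covariance matrix and projects the data onto it, summarizing two dimensions with one. The data and iteration count are assumptions; libraries typically compute PCA via a singular value decomposition instead.

```python
import math

def first_principal_component(points, iterations=100):
    """Return the unit direction of greatest variance and the centered data."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    # sample covariance matrix
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    # power iteration: repeatedly multiply by cov and renormalize
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centered

def project(centered, component):
    """Coordinates of each centered point along the component (2-D -> 1-D)."""
    return [sum(c[d] * component[d] for d in range(len(component))) for c in centered]

points = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8)]
component, centered = first_principal_component(points)
print(component)  # two similar positive values: close to the y = x direction
print(project(centered, component))
```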
Ensemble Algorithms
Ensemble algorithms train several relatively weak learning models on the same samples and then combine their results to make an overall prediction. The main difficulty lies in deciding which weak learners to combine and how to combine their results. This is a very powerful and very popular class of algorithms. Common algorithms include Boosting, Bootstrapped Aggregation (Bagging), AdaBoost, Stacked Generalization (Blending), Gradient Boosting Machine (GBM), and Random Forest.
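Bagging (bootstrapped aggregation) can be sketched in a few lines: train each weak learner on a bootstrap sample drawn with replacement, then combine them by majority vote. The weak learner here is a hypothetical nearest-class-mean rule on one feature, and the data, model count, and seed are illustrative. Boosting methods such as AdaBoost work differently, training learners sequentially and reweighting the examples earlier learners got wrong.

```python
import random
from collections import Counter

def fit_weak_learner(xs, ys):
    """Hypothetical weak learner: predict the class whose mean x is closest."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def fit_bagging(xs, ys, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # bootstrap sample: n draws with replacement from the training set
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_weak_learner([xs[i] for i in idx], [ys[i] for i in idx]))
    # aggregate: majority vote across all weak learners
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = fit_bagging(xs, ys)
print(ensemble(0), ensemble(10))
```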
Summary of Common Machine Learning Algorithms