Machine Learning Algorithms Overview


This article is a translation of the original article (linked at the end). It is not a word-for-word translation: I omitted some parts and added a few notes of my own.

What is machine learning (ML)? As an ML practitioner, I often find it hard to explain to everyone what ML is. Over time, I have found that a good way to understand or explain machine learning is from the perspective of the problems it can solve. For practitioners, understanding the types of problems ML solves also helps us better prepare data and select algorithms.

Ten Examples of Machine Learning Problems

Students who want to get started with machine learning often turn to introductory books such as Programming Collective Intelligence, Machine Learning in Action, Data Mining, and Recommendation System Practice. While reading them, you will come across examples like the following:

    • Spam e-mail identification
    • Credit card transaction anomaly detection
    • Handwritten digit recognition
    • Speech recognition
    • Human face detection
    • Product recommendation
    • Disease detection (based on previous case records, determine whether a patient is ill)
    • Stock forecasting
    • User classification (based on user behavior, determine whether a user will convert to a paid user)
    • Shape detection (based on what the user draws on a tablet, determine which shape is being drawn)

So, when someone asks what ML is, you can answer in terms of the problems it can handle: ML can handle this problem, and that problem, and so on.

Machine Learning Problem Types

The advantage of classifying problems is that you can better grasp the nature of a problem and know which type of algorithm to use.

Generally there are four types (the first three are contrasted in a short code sketch after the list):

    • Classification: some data has already been labeled with categories. A model is built on the well-labeled data and used to judge the category of a new sample, e.g. spam identification.
    • Regression: there is also well-labeled data, but unlike classification, where the label is a discrete value, in regression the label is a real number. A model is built on the labeled data and used to predict the label value of a new sample, e.g. stock forecasting.
    • Clustering: the data is not labeled, but there is some similarity metric by which the data can be grouped. For example, in a pile of photos with no names attached, photos of the same person are automatically gathered together.
    • Rule extraction: discover statistical relationships between attributes in the data, rather than just predicting something; the classic example is beer and diapers.
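
To make the contrast concrete, here is a minimal sketch of the first three problem types on synthetic data. It assumes scikit-learn and NumPy are installed; the datasets and model choices are purely illustrative, not the examples mentioned above.

    # Classification: labels are discrete categories (e.g. spam / not spam).
    from sklearn.datasets import make_classification, make_regression, make_blobs
    from sklearn.linear_model import LogisticRegression, LinearRegression
    from sklearn.cluster import KMeans

    X_cls, y_cls = make_classification(n_samples=200, n_features=10, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_cls, y_cls)
    print("classification accuracy:", clf.score(X_cls, y_cls))

    # Regression: labels are real numbers (e.g. a stock price).
    X_reg, y_reg = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
    reg = LinearRegression().fit(X_reg, y_reg)
    print("regression R^2:", reg.score(X_reg, y_reg))

    # Clustering: no labels at all, only a similarity (distance) measure.
    X_clu, _ = make_blobs(n_samples=200, centers=3, random_state=0)
    groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_clu)
    print("cluster sizes:", [int((groups == k).sum()) for k in range(3)])
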
Machine Learning Algorithms

Once you know the problems machine learning can solve, you can think about the data that needs to be collected and the algorithms that can be used. Machine learning has developed to the point where a great many algorithms exist, and in practical applications the difficulty often lies in choosing an algorithm. This article classifies the algorithms using two criteria: the learning style and the similarity between algorithms.

Learning Style

In ML there are only a few mainstream learning styles, and the introduction below uses example algorithms and problem types to explain them. Classifying machine learning algorithms by learning style makes us think about the role the input data plays in the algorithm and the preparation required before using a model, which is very helpful for choosing the most suitable model.

    • Supervised learning: the input data carries a category label or a result value and is called training data, such as spam/non-spam messages or stock prices at given points in time. A model is obtained through a training process, the model can be used to predict new samples, and the accuracy of those predictions can be measured. The training process usually needs to reach a certain accuracy on the training set without underfitting or overfitting. Typical supervised problems are classification and regression; representative algorithms are logistic regression and back-propagation neural networks.

    • Unsupervised learning: the input data has no labels at all, and a model is built by inferring the structure present in the data. Typical problems are association rule learning and clustering; representative algorithms are the Apriori algorithm and K-means.

    • Semi-supervised learning: the input data is a mixture of labeled and unlabeled data. It is also used to solve prediction problems, but the model must account for both the structure of the data and how to make predictions; it is an integration of the supervised and unsupervised styles above. The problems are still classification and regression, and the representative algorithms are usually extensions of supervised learning algorithms that can also model unlabeled data.

    • Reinforcement learning: in this style a model is first constructed and then stimulated with input data, which usually comes from the environment; the model's output triggers feedback, and the feedback is used to adjust the model. It differs from supervised learning in that the feedback comes from the environment rather than being specified by a person. This style is used in systems and robot control; representative algorithms are Q-learning and temporal difference learning. A minimal Q-learning sketch follows this list.
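
As a taste of reinforcement learning, below is a minimal tabular Q-learning sketch on a made-up five-state corridor where the agent is rewarded only for reaching the rightmost state. The environment, rewards, and hyperparameters are illustrative assumptions, not something from the original article.

    import random

    N_STATES = 5                  # states 0..4; reward only for reaching state 4
    ACTIONS = [-1, +1]            # move left or move right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
    Q = [[0.0, 0.0] for _ in range(N_STATES)]        # Q[state][action index]

    def step(state, a_idx):
        """Toy environment: clamp movement to the corridor, reward the right end."""
        nxt = max(0, min(N_STATES - 1, state + ACTIONS[a_idx]))
        return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

    for _ in range(500):                              # episodes
        s = 0
        while s != N_STATES - 1:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            a = random.randrange(2) if random.random() < EPSILON else Q[s].index(max(Q[s]))
            s2, r = step(s, a)
            # Q-learning update: feedback from the environment adjusts the model.
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2

    print(Q)   # moving right (+1) should have the higher value in every non-terminal state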

In business decision-making, the most commonly used approaches are supervised and unsupervised learning. A current hot topic is semi-supervised learning; for example, in image classification there are many datasets with only a small amount of labeled data and a large amount of unlabeled data. Reinforcement learning is used more in robot control and other control systems.

Algorithm Similarity

Algorithms are usually grouped by similarity in the form of the model or the function, such as tree-based methods and neural network methods. Of course, this grouping is not perfect, because many algorithms fall easily into several categories; for example, learning vector quantization is both a neural network algorithm and an instance-based method. So in the groups below, some algorithms appear under more than one heading.

Regression

Regression constructs a model of the relationship between the independent variables and the variable to be predicted, and iteratively reduces the error between the predicted values and the true values. Regression methods are a workhorse of statistical machine learning.
The usual regression algorithms are listed below, followed by a small least-squares sketch:

    • Ordinary Least Squares
    • Logistic Regression
    • Stepwise Regression
    • Multivariate Adaptive Regression Splines (MARS)
    • Locally Estimated Scatterplot Smoothing (LOESS)
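
A minimal ordinary least squares sketch, using NumPy's closed-form solver on synthetic data; the true line and noise level are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=100)    # true line: y = 3x + 2, plus noise

    # Ordinary least squares: solve for [slope, intercept] minimizing squared error.
    A = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    print(slope, intercept)        # should be close to 3 and 2
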
Instance-Based Methods

Instance-based methods require a library of stored samples; when a new sample appears, the method finds the best-matching samples in the library and makes a prediction from them (see the KNN sketch after the list). Instance-based methods are also called winner-take-all methods or memory-based learning. These algorithms focus mainly on how similarity between samples is computed and how the stored data is represented.

    • K-nearest Neighbour (KNN)
    • Learning Vector Quantization (LVQ)
    • Self-organizing Map (SOM)
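
A minimal k-nearest neighbours sketch with scikit-learn on synthetic data; k = 5 is just a default choice, not a recommendation.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The "model" is essentially the stored training samples plus a distance metric;
    # prediction looks up the 5 closest stored samples and takes a majority vote.
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print("test accuracy:", knn.score(X_test, y_test))
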
Regularization Methods

These are extensions of other methods (usually regression methods) that add a penalty on model complexity. The penalty favors simpler models, which helps prevent overfitting and tends to generalize better. I list them here because they are popular and powerful; a short ridge/lasso sketch follows the list.

    • Ridge Regression
    • Least Absolute Shrinkage and Selection Operator (LASSO)
    • Elastic Net
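
A minimal sketch contrasting plain least squares with ridge and lasso on the same synthetic data; the penalty strength (alpha) is an arbitrary illustrative value.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression, Ridge, Lasso

    # Few samples, many features: exactly the setting where a penalty helps.
    X, y = make_regression(n_samples=50, n_features=100, n_informative=5,
                           noise=10.0, random_state=0)

    for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=1.0)):
        model.fit(X, y)
        nonzero = int(np.sum(np.abs(model.coef_) > 1e-6))
        print(type(model).__name__, "non-zero coefficients:", nonzero)
    # Lasso drives most coefficients to exactly zero; ridge only shrinks them.
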
Decision Tree Learning

Decision tree methods build a model of decisions based on the actual values of attributes in the data. Decision trees are used to solve classification and regression problems; a small example follows the list.

    • Classification and Regression Tree (CART)
    • Iterative Dichotomiser 3 (ID3)
    • C4.5
    • Chi-squared Automatic Interaction Detection (CHAID)
    • Decision Stump
    • Random Forest
    • Multivariate Adaptive Regression Splines (MARS)
    • Gradient Boosting Machines (GBM)
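
A minimal CART-style decision tree sketch with scikit-learn, using the classic iris dataset; max_depth=3 is an arbitrary choice to keep the printed tree small.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A CART-style tree: split on attribute values until the leaves are (nearly) pure.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
    print("test accuracy:", tree.score(X_test, y_test))
    print(export_text(tree))       # the learned if/else splits, one per line
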
Bayesian Methods

Bayesian methods explicitly apply Bayes' theorem to solve classification and regression problems; a naive Bayes sketch follows the list.

    • Naive Bayes
    • Averaged One-Dependence Estimators (AODE)
    • Bayesian Belief Network (BBN)
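
A minimal Gaussian naive Bayes sketch with scikit-learn on synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=400, n_features=8, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Naive Bayes applies Bayes' theorem with the "naive" assumption that
    # features are independent given the class.
    nb = GaussianNB().fit(X_train, y_train)
    print("test accuracy:", nb.score(X_test, y_test))
    print("class posteriors for one sample:", nb.predict_proba(X_test[:1]))
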
Kernel Methods

The most famous kernel method is the support vector machine (SVM). Kernel methods map the input data into a higher-dimensional space in which classification and regression problems become easier to model; the sketch after the list shows the effect.

    • Support Vector Machines (SVM)
    • Radial Basis Function (RBF)
    • Linear Discriminant Analysis (LDA)
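
A minimal sketch of the kernel idea with scikit-learn: two concentric circles are not linearly separable, but an RBF kernel handles them easily. The dataset and parameters are illustrative.

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Two concentric circles: not linearly separable in the original 2-D space.
    X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

    # A linear SVM struggles, while the RBF kernel implicitly maps the data
    # into a higher-dimensional space where the classes become separable.
    print("linear kernel:", SVC(kernel="linear").fit(X, y).score(X, y))
    print("RBF kernel:   ", SVC(kernel="rbf").fit(X, y).score(X, y))
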
Clustering Methods

Clustering, like regression, describes both a class of problems and a class of methods. Clustering methods are usually grouped by their modeling approach, such as centroid-based and hierarchical clustering. All clustering methods use the intrinsic structure of the data to organize the samples into groups with maximum commonality; a K-means sketch follows the list.

    • K-means
    • Expectation maximisation (EM)
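
A minimal K-means sketch with scikit-learn on synthetic data with three natural groups; note that no labels are used for fitting.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Unlabeled data with three natural groups.
    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

    # Centroid-based clustering: alternately assign points to the nearest
    # centre and recompute the centres until they stop moving.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print("cluster centres:\n", km.cluster_centers_)
    print("first ten assignments:", km.labels_[:10])
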
Association Rule Learning

Association rule learning extracts rules that describe relationships between variables in the data. These rules can uncover important, and commercially exploitable, associations in large multidimensional datasets; a toy sketch of the idea follows the list.

    • Apriori algorithm
    • Eclat algorithm
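
Apriori and Eclat implementations live outside scikit-learn, so the sketch below only shows the first pass of the idea: count how often item pairs occur together and keep the pairs above a minimum support. The transactions and threshold are made up for illustration.

    from itertools import combinations
    from collections import Counter

    transactions = [
        {"beer", "diapers", "chips"},
        {"beer", "diapers"},
        {"milk", "diapers", "bread"},
        {"beer", "chips"},
        {"milk", "bread"},
    ]
    min_support = 2    # a pair must appear in at least 2 transactions

    pair_counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    print(frequent_pairs)    # e.g. ('beer', 'diapers') shows up as a frequent pair
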
Artificial Neural Networks

Artificial neural networks are inspired by the structure and function of biological neural networks. They are a class of pattern matching commonly used for regression and classification problems, with hundreds of algorithms and variants. Listed here are some of the classic and popular ones (deep learning is treated separately below), followed by a small perceptron sketch:

    • Perceptron
    • Back-propagation
    • Hopfield Network
    • Self-organizing Map (SOM)
    • Learning Vector Quantization (LVQ)
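
A minimal single-neuron perceptron sketch, trained with the classic update rule on a linearly separable toy problem; the learning rate and number of epochs are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # linearly separable labels

    w = np.zeros(2)
    b = 0.0
    lr = 0.1
    for _ in range(20):                          # epochs
        for xi, target in zip(X, y):
            pred = int(w @ xi + b > 0)
            # Perceptron rule: nudge the weights whenever the prediction is wrong.
            w += lr * (target - pred) * xi
            b += lr * (target - pred)

    accuracy = np.mean((X @ w + b > 0).astype(int) == y)
    print("training accuracy:", accuracy)
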
Deep Learning

Deep learning methods are a modern development of artificial neural networks. Compared with traditional neural networks, they focus on building larger and more complex networks; many of them are concerned with semi-supervised learning, where a large dataset contains only a small amount of labeled data. A small network sketch follows the list.

    • Restricted Boltzmann Machine (RBM)
    • Deep Belief Networks (DBN)
    • Convolutional Neural Networks (CNN)
    • Stacked Auto-encoders
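
A minimal convolutional network sketch in PyTorch (assuming PyTorch is available); the layer sizes are placeholders for 28x28 grayscale inputs, not a tested architecture, and no training loop is shown.

    import torch
    from torch import nn

    class TinyConvNet(nn.Module):
        """Two convolution blocks followed by a small classifier head."""
        def __init__(self, n_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(16 * 7 * 7, n_classes)   # for 28x28 inputs

        def forward(self, x):
            x = self.features(x)
            return self.classifier(x.flatten(1))

    # One forward pass on a random batch, just to show the shapes line up.
    logits = TinyConvNet()(torch.randn(4, 1, 28, 28))
    print(logits.shape)   # torch.Size([4, 10])
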
Dimensionality Reduction

Like clustering methods, dimensionality reduction exploits the intrinsic structure of the data, but in an unsupervised way, to summarize or describe the data using less information. This is useful for visualizing data or simplifying it, as well as for removing noise, and it often makes subsequent algorithms more efficient; a PCA sketch follows the list.

    • Principal Component Analysis (PCA)
    • Partial Least Squares Regression (PLS)
    • Sammon Mapping
    • Multidimensional Scaling (MDS)
    • Projection Pursuit
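
A minimal PCA sketch with scikit-learn: 10-dimensional synthetic data whose variance mostly lives in two latent directions is compressed down to two components. The data generation is made up for illustration.

    import numpy as np
    from sklearn.decomposition import PCA

    # 200 samples in 10 dimensions, but most of the variance lives in 2 of them.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 2))
    X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))

    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X)
    print("reduced shape:", X_2d.shape)
    print("variance explained:", pca.explained_variance_ratio_.sum())   # close to 1.0
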
Ensemble Methods

Ensemble methods are composed of several weaker models that are trained independently; their predictions are combined in some way to give the final prediction. Much of the work on ensemble methods focuses on which models to use and how to combine them; a short comparison sketch follows the list.

    • Boosting
    • Bootstrapped Aggregation (Bagging)
    • AdaBoost
    • Stacked Generalization (blending)
    • Gradient Boosting Machines (GBM)
    • Random Forest
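
A minimal sketch comparing a single decision tree with two ensembles (bagging-style random forest and boosting-style AdaBoost) on the same synthetic data; the number of estimators is an arbitrary choice.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = {
        "single tree": DecisionTreeClassifier(random_state=0),
        "random forest (bagging)": RandomForestClassifier(n_estimators=100, random_state=0),
        "AdaBoost (boosting)": AdaBoostClassifier(n_estimators=100, random_state=0),
    }
    for name, model in models.items():
        print(name, model.fit(X_train, y_train).score(X_test, y_test))
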
Original links
    • A Tour of Machine Learning Algorithms
    • Practical Machine Learning Problems
