Machine Learning Algorithm Tour


From: http://blog.jobbole.com/60809/

Once we understand the machine learning problem we need to solve, we can think about what data to collect and which algorithms to use. In this article we tour the most popular machine learning algorithms to get a general idea of which methods are available and useful.

There are many algorithms in the machine learning field, and many extensions of each, so it is hard to determine the right algorithm for a particular problem. In this article I want to give you two ways to think about and categorize the algorithms you will encounter in practice.

Learning Style

Algorithms can be divided into types according to how they handle experience, the environment, or whatever we want to call the input data. Machine learning and AI textbooks usually begin by considering the learning styles that an algorithm can adopt.

Only a few main learning styles, or learning models, are discussed here, each with a few examples of algorithms and problem types. This way of organizing algorithms is useful because it forces you to think about the role of the input data and the model preparation process, and then to choose the algorithm best suited to your problem.

    • Supervised learning: The input data is called training data and has a known label or result, such as whether an email is spam, or a stock price at a point in time. The model makes predictions and is corrected when they are wrong, and training continues until the model reaches a desired level of accuracy on the training data. Example problems are classification and regression; example algorithms include logistic regression and the back-propagation neural network.
    • Unsupervised learning: The input data is not labeled and has no known result. The model is prepared by deducing structure present in the input data. Example problems are association rule learning and clustering; example algorithms include the Apriori algorithm and k-means.
    • Semi-supervised learning: The input data is a mixture of labeled and unlabeled examples. There is a prediction problem to solve, but the model must also learn the structure of the data in order to make predictions. Example problems are classification and regression; example algorithms are mostly extensions of other methods that make assumptions about how to model the unlabeled data.
    • Reinforcement learning: The input data arrives as stimulus from an environment to which the model must respond. Feedback comes not from a teaching process as in supervised learning, but as rewards or punishments from the environment. Example problems include robot control; example algorithms include Q-learning and temporal difference learning.
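
To make the supervised "predict, get corrected, repeat" loop concrete, here is a minimal sketch of logistic regression trained by stochastic gradient descent on a single feature. The data, the learning rate, the epoch count, and the function names (`train_logistic`, `predict`) are assumptions made up for illustration; they are not from the original article.

```python
import math

def train_logistic(samples, labels, lr=0.1, epochs=1000):
    """Fit w and b so that sigmoid(w*x + b) approximates P(label = 1 | x)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # current prediction
            error = p - y                             # how wrong the prediction is
            w -= lr * error * x                       # correct the model a little
            b -= lr * error
    return w, b

def predict(w, b, x):
    """Predict class 1 when the model puts the probability above 0.5."""
    return 1 if w * x + b > 0 else 0

# Hypothetical labeled training data: one feature value per sample.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
print([predict(w, b, x) for x in xs])
```

An unsupervised method would receive only `xs`, without `ys`, and would have to find structure in the data on its own.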

When crunching data to model business decisions, you will most often use supervised and unsupervised learning methods. A hot topic at the moment is semi-supervised learning, in areas such as image classification where datasets are large but only a small number of images are labeled. Reinforcement learning is more likely to turn up in robotic control and the development of other control systems.

Similarity of Algorithms

Algorithms are often grouped by similarity in function or form, for example tree-based methods and neural-network-inspired methods. This is a useful way of grouping, but it is not perfect, because some algorithms fit just as easily into more than one category: Learning Vector Quantization, for example, is both a neural-network-inspired method and an instance-based method. Just as no machine learning model is perfect, neither is any way of classifying the algorithms.

In this section I list algorithms grouped in the way I find most intuitive. Neither the groups nor the algorithms are exhaustive, but I think they are representative and helpful for getting a general picture. If you know of an algorithm or a group that is not listed, please leave a comment and share it. Now let's get started!

Regression

Regression (regression analysis) is concerned with modeling the relationship between variables. It draws on statistical methods; example algorithms include:

    • Ordinary Least Squares
    • Logistic Regression
    • Stepwise Regression
    • Multivariate Adaptive Regression Splines (MARS)
    • Locally Estimated Scatterplot Smoothing (LOESS)
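
As a small illustration of the first item, ordinary least squares for a single input variable has a closed-form solution. The sketch below assumes one feature and invented data; `ordinary_least_squares` is a hypothetical helper name.

```python
def ordinary_least_squares(xs, ys):
    """Return (slope, intercept) of the line minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = ordinary_least_squares(xs, ys)
print(slope, intercept)  # 2.0 1.0
```
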
Instance-based Methods

Instance-based learning (also called case-based learning) models a decision problem using the instances or examples of training data that are deemed important to the model. These methods typically build a database of example data and compare new data to it using a similarity measure, in order to find the best match and make a prediction. For this reason, instance-based methods are also called winner-take-all methods and memory-based learning. Current work focuses on how the stored instances are represented and on the similarity measures used between them.

    • K-nearest Neighbour (KNN)
    • Learning Vector Quantization (LVQ)
    • Self-organizing Map (SOM)
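
The "store the data, then look up the most similar cases" idea can be sketched with k-nearest neighbours. This is a minimal version that assumes Euclidean distance as the similarity measure and a simple majority vote; the training points and the name `knn_predict` are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (point, label); return the majority label of the k nearest points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# The "database" of stored instances (hypothetical data).
train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
         ((6.0, 6.0), "b"), ((6.5, 7.0), "b"), ((7.0, 6.0), "b")]

print(knn_predict(train, (1.2, 1.1)))  # a
print(knn_predict(train, (6.4, 6.2)))  # b
```

Note that there is no training step at all: the work happens at prediction time, which is why these methods are called memory-based.
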
Regularization Methods

This is an extension to other methods (usually regression methods) that penalizes models for their complexity, favoring simpler models that also tend to generalize better. I list it separately here because it is popular and powerful.

    • Ridge Regression
    • Least Absolute Shrinkage and Selection Operator (LASSO)
    • Elastic Net
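
To show what the penalty does, here is a sketch of ridge regression for a single feature. With centered data and an unpenalized intercept, the ridge slope is `sxy / (sxx + lam)`, so a larger penalty `lam` shrinks the slope toward zero. The data and the name `ridge_slope` are assumptions for illustration only.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - a - b*x)^2) + lam * b^2 for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / (sxx + lam)  # lam = 0 gives ordinary least squares

# Invented data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

print(ridge_slope(xs, ys, 0.0))  # 2.0 (no penalty)
print(ridge_slope(xs, ys, 5.0))  # 1.0 (slope shrunk by the penalty)
```
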
Decision Tree Learning

Decision tree methods build a model of decisions based on the actual values of attributes in the data. Decision trees are used for classification and regression problems.

    • Classification and Regression Tree (CART)
    • Iterative Dichotomiser 3 (ID3)
    • C4.5
    • Chi-squared Automatic Interaction Detection (CHAID)
    • Decision Stump
    • Random Forest
    • Multivariate Adaptive Regression Splines (MARS)
    • Gradient Boosting Machines (GBM)
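
The simplest member of this family, the decision stump, is a tree with a single split. The sketch below assumes one numeric feature and binary labels, and picks the threshold that misclassifies the fewest training samples; `fit_stump` and the data are hypothetical.

```python
def fit_stump(xs, labels):
    """Return (threshold, errors) for the rule: predict 1 when x > threshold, else 0."""
    best_threshold, best_errors = None, len(xs) + 1
    points = sorted(set(xs))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        errors = sum((x > threshold) != bool(y) for x, y in zip(xs, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]

threshold, errors = fit_stump(xs, labels)
print(threshold, errors)  # 4.5 0
```

Full tree learners such as CART apply this kind of split search recursively to each resulting subset, usually with an impurity measure rather than a raw error count.
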
Bayesian

Bayesian methods explicitly apply Bayes' theorem to problems such as classification and regression.

    • Naive Bayes
    • Averaged One-Dependence Estimators (AODE)
    • Bayesian Belief Network (BBN)
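
A rough sketch of Naive Bayes for text follows. It assumes a bag-of-words representation and add-one (Laplace) smoothing, and scores each class by the log prior plus the sum of log word likelihoods, which is Bayes' theorem with the "naive" assumption that words are independent given the class. The tiny corpus and the function names are invented.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs is a list of (words, label). Collect the counts the classifier needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(model, words):
    """Return the label with the highest posterior score for the given words."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [(["win", "money", "now"], "spam"),
        (["free", "money"], "spam"),
        (["meeting", "at", "noon"], "ham"),
        (["lunch", "meeting"], "ham")]
model = train_naive_bayes(docs)

print(classify(model, ["free", "money"]))     # spam
print(classify(model, ["lunch", "meeting"]))  # ham
```
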
Kernel Methods

The best known kernel method is the Support Vector Machine (SVM). Kernel methods map the input data into a higher-dimensional vector space in which some classification and regression problems are easier to model.

    • Support Vector Machines (SVM)
    • Radial Basis Function (RBF)
    • Linear Discriminant Analysis (LDA)
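
The effect of mapping to a higher dimension can be seen on a toy problem. In the sketch below, one class sits between two groups of the other class on a line, so no single threshold separates them; after the explicit map `x -> (x, x^2)` a threshold on the second coordinate does. The data and helper names are assumptions, and real kernel methods such as SVMs normally use a kernel function to avoid computing the mapping explicitly.

```python
def feature_map(x):
    """Map a 1-D input into 2-D: phi(x) = (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if a single threshold on `values` splits class 1 from class 0."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    return max(ones) < min(zeros) or max(zeros) < min(ones)

xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 1, 1, 0, 0]  # class 1 sits in the middle

mapped = [feature_map(x) for x in xs]
squared = [point[1] for point in mapped]

print(separable_by_threshold(xs, labels))       # False
print(separable_by_threshold(squared, labels))  # True
```
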
Clustering Methods

Clustering, like regression, describes both a class of problems and a class of methods. Clustering methods are typically organized by their modeling approach. All of them use the inherent structure in the data to organize it into groups whose members have as much in common as possible.

    • K-means
    • Expectation Maximisation (EM)
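
Here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. It assumes the starting centroids are given by the caller and runs a fixed number of iterations instead of testing for convergence; `kmeans_1d` and the data are made up for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])

print(centroids)  # [1.5, 9.5]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```
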
Association Rule Learning

Association rule learning extracts rules that best explain observed relationships between variables in data. These rules can uncover important, commercially useful associations in large multidimensional datasets, which an organization can then exploit.

    • Apriori algorithm
    • Eclat algorithm
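
Rules are usually judged by their support and confidence. The sketch below only computes these two measures for a given rule on invented shopping baskets; it does not implement the itemset search that Apriori or Eclat perform, and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

transactions = [{"bread", "milk"},
                {"bread", "diapers", "beer"},
                {"milk", "diapers", "beer"},
                {"bread", "milk", "diapers", "beer"},
                {"bread", "milk", "diapers"}]

# Rule: {diapers} -> {beer}
print(support(transactions, {"diapers", "beer"}))       # 3 of 5 baskets
print(confidence(transactions, {"diapers"}, {"beer"}))  # 3 of the 4 diaper baskets
```
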
Artificial Neural Networks

Artificial neural networks are models inspired by the structure and function of biological neural networks. They are a class of pattern matching methods commonly used for regression and classification problems, but the field is really an enormous one made up of hundreds of algorithms and variations. Some of the classic, popular methods are listed below (I have separated deep learning out):

    • Perceptron
    • Back-propagation
    • Hopfield Network
    • Self-organizing Map (SOM)
    • Learning Vector Quantization (LVQ)
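
The perceptron, the first item in the list, is small enough to sketch in full. This version assumes a step activation, a learning rate of 1, and the logical AND function as invented training data; the function names are hypothetical.

```python
def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Learn weights and a bias with the perceptron update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            update = lr * (y - prediction)  # zero when the prediction is right
            weights = [w + update * xi for w, xi in zip(weights, x)]
            bias += update
    return weights, bias

def perceptron_predict(weights, bias, x):
    """Fire (return 1) when the weighted sum of inputs exceeds zero."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Logical AND is linearly separable, so the perceptron can learn it.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
print([perceptron_predict(weights, bias, x) for x in samples])  # [0, 0, 0, 1]
```
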
Deep Learning

Deep learning methods are a modern update of artificial neural networks. Compared with traditional neural networks, they build much larger and more complex networks. Many of these methods are concerned with semi-supervised learning problems, where a large dataset contains very little labeled data.

    • Restricted Boltzmann Machine (RBM)
    • Deep Belief Networks (DBN)
    • Convolutional Network
    • Stacked Auto-encoders

Dimensionality Reduction

Dimensionality reduction, like clustering, seeks and exploits the inherent structure in the data, but in this case in order to summarize or describe the data using less information. This is useful for visualizing or simplifying data.

    • Principal Component Analysis (PCA)
    • Partial Least Squares Regression (PLS)
    • Sammon Mapping
    • Multidimensional Scaling (MDS)
    • Projection Pursuit
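
As a sketch of the first item, the code below finds the first principal component of two-dimensional points and projects each point onto it, reducing two coordinates to one. It assumes power iteration on the 2x2 sample covariance matrix as the way to get the leading direction; the data and the name `first_principal_component` are invented.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (unit direction of greatest variance, centered points) for 2-D data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    # Power iteration: repeatedly multiply by the matrix and renormalize.
    vx, vy = 1.0, 0.0
    for _ in range(iterations):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    return (vx, vy), centered

# Points scattered close to the line y = x (hypothetical data).
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]

direction, centered = first_principal_component(points)
# One number per point instead of two.
projected = [x * direction[0] + y * direction[1] for x, y in centered]
print(direction)
print(projected)
```

Because the points lie near the diagonal, the direction comes out close to (0.7, 0.7), and the single projected coordinate keeps almost all of the variation in the data.
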
Ensemble Methods

Ensemble methods are models composed of multiple weaker models that are trained independently and whose predictions are combined in some way to make the overall prediction. Much effort goes into deciding which types of weak learners to combine and how to combine them. This is a very powerful and popular class of techniques.

    • Boosting
    • Bootstrapped Aggregation (Bagging)
    • AdaBoost
    • Stacked Generalization (blending)
    • Gradient Boosting Machines (GBM)
    • Random Forest
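
The combining step can be sketched with a simple majority vote. Note the assumptions: the three "weak models" below are hand-written threshold rules standing in for independently trained models, and voting is only one of several ways to combine predictions; `majority_vote` is a hypothetical helper.

```python
from collections import Counter

def majority_vote(models, x):
    """Ask every model for a prediction and return the most common answer."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Stand-ins for three independently trained weak classifiers,
# each looking at a different feature of the input.
models = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 0.5 else 0,
    lambda x: 1 if x[2] > 0.5 else 0,
]

print(majority_vote(models, (0.9, 0.2, 0.8)))  # 1 (two of three vote 1)
print(majority_vote(models, (0.1, 0.7, 0.3)))  # 0 (two of three vote 0)
```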

Figure (from Wikipedia): an example of fitting with an ensemble method. Each weak member's fit is shown in gray, and the combined prediction is shown in red.

Other Resources

This tour of machine learning algorithms is intended to give you an overview of which algorithms exist and how they relate to one another.

Here are some other resources. Don't feel overwhelmed: knowing about more algorithms is good, but a deep understanding of a few algorithms is also valuable.

    • List of Machine Learning Algorithms: A Wikipedia resource. It is extensive, but I don't find its classification very useful.
    • Machine Learning Algorithms Category: Also on Wikipedia, slightly more useful than the list above, and organized alphabetically.
    • CRAN Task View: Machine Learning & Statistical Learning: A list of the R packages for machine learning algorithms. It is a good way to see what is available and what others are using.
    • Top 10 Algorithms in Data Mining: A published article, now also a book, covering the most popular data mining algorithms. It is another well-grounded list with far fewer algorithms, which makes it easier to study each one in depth.
