Scikit-learn Atlas of Machine Learning

Source: Internet
Author: User

Scikit-learn is a very popular open-source machine learning library written in Python. It is free to use.
Website: http://scikit-learn.org/stable/index.html

The site offers many tutorials and programming examples. It also provides a good summary: the following figure summarizes most of the theory and related algorithms of traditional machine learning.

We can see that machine learning is divided into four blocks: classification, clustering, regression, and dimensionality reduction.

Given a sample feature x, we want to predict its corresponding attribute value y. If y is discrete, this is a classification problem; conversely, if y is a continuous real number, it is a regression problem.

If we are given a set of sample features S = {x ∈ R^D} with no corresponding y, and instead want to explore how the samples are distributed in the D-dimensional space, for example which samples lie close together and which lie far apart, this is a clustering problem.

If we want to represent the original high-dimensional feature space with a lower-dimensional subspace, this is a dimensionality reduction problem.

Classification & Regression

Whether the task is classification or regression, the goal is to build a predictive model H that, given an input x, produces an output y:
y=H(x)

The only difference is that y is discrete in a classification problem and continuous in a regression problem, so the learning algorithms for the two kinds of problem are very similar. This is why the figure shows that learning algorithms used for classification can also be used for regression. The most common learning algorithms for classification include SVM (support vector machine), SGD (stochastic gradient descent), Bayes (naive Bayes), Ensemble, KNN, and so on. Regression can likewise use SVR, SGD, Ensemble, and other algorithms, along with various linear regression algorithms.
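The point that one algorithm family serves both problem types can be sketched with scikit-learn's SVC (classification) and SVR (regression). This is only a minimal illustration, not part of the original figure: the synthetic datasets from make_classification and make_regression, the RBF kernel, and all parameter values are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.svm import SVC, SVR

# Classification: y takes discrete labels (here 0 or 1).
# The data is synthetic and stands in for a real labelled dataset.
X_cls, y_cls = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel="rbf").fit(X_cls, y_cls)
pred_cls = clf.predict(X_cls[:5])

# Regression: y is a continuous real number.
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
reg = SVR(kernel="rbf").fit(X_reg, y_reg)
pred_reg = reg.predict(X_reg[:5])

print("predicted class labels:", pred_cls)
print("predicted real values:", pred_reg)
```

Both models expose the same fit/predict interface, so H(x) is obtained the same way in each case; only the type of y differs.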

Clustering

Clustering also analyzes an attribute of the samples and is somewhat similar to classification. The difference is that in classification the range of y is known before prediction, that is, we know exactly how many categories there are, whereas in clustering the range of the attribute is unknown. For this reason classification is a form of supervised learning, and clustering a form of unsupervised learning.
Because clustering does not know the attribute range of the samples beforehand, it can only analyze their attributes from the distribution of the samples in the feature space. This kind of problem is generally more complex. Commonly used algorithms include K-means, GMM (Gaussian mixture model), and so on.
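As a rough sketch of the two algorithms named above, the following runs scikit-learn's KMeans and GaussianMixture on unlabelled points. The data from make_blobs is synthetic, and the choice of three clusters is an assumption made by hand for the example; in practice that number is usually not known in advance and has to be chosen or estimated.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic 2-D points; the true labels are discarded because
# clustering only sees the features x, never y.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-means: assigns each sample to the nearest of 3 cluster centers.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
kmeans_labels = kmeans.labels_

# GMM: fits a mixture of 3 Gaussians and assigns each sample
# to the component with the highest posterior probability.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
gmm_labels = gmm.predict(X)

print("K-means centers:\n", kmeans.cluster_centers_)
print("GMM means:\n", gmm.means_)
```

The cluster indices returned by the two models are arbitrary, so the same group of points may be labelled 0 by one model and 2 by the other.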

Dimensionality reduction

Dimensionality reduction is another important area of machine learning with many important applications. When the dimension of the features is too high, it increases the training burden and the storage space required. Dimensionality reduction aims to remove redundancy from the features and represent them with fewer dimensions. The most fundamental dimensionality reduction algorithm is PCA, and many other algorithms build on it.
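A minimal sketch of PCA with scikit-learn follows. It assumes the digits dataset bundled with the library (64 features per sample) and an arbitrarily chosen target of 10 dimensions; neither choice comes from the original text.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Each sample is an 8x8 image flattened into 64 features.
X, _ = load_digits(return_X_y=True)

# Project the 64-dimensional features onto the 10 directions
# of largest variance (10 is an arbitrary choice for this sketch).
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print("fraction of variance kept:", pca.explained_variance_ratio_.sum())
```

The explained_variance_ratio_ attribute gives a rough measure of how much information survives the reduction, which helps when deciding how many dimensions to keep.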
