Machine learning and Pattern Recognition Learning Summary (i.)

Last Update:2015-04-14 Source: Internet

Author: User


Over the last two months I used my spare time to work through the book "Statistical Machine Learning" at a rough level. Combining it with knowledge points from "Pattern Recognition" and "Data Mining: Concepts and Techniques", I have sorted out and summarized the knowledge structure of machine learning as follows:

Machine learning comes down to two major questions: 1, what to learn, and 2, how to learn.

First, let us sort out what to learn.

I. What to learn

1. What problems do you want to solve? Machine learning mainly solves the following three types of problems:

a) Supervised learning problems: given a set of inputs with known outputs (that is, a collection of manually labeled samples), the data set is used to train a selected model, and the resulting model can then predict the output for a new input. The specific prediction tasks include classification, labeling, and regression.

b) Semi-supervised learning problems: the model is trained on a sample set in which some samples are manually labeled and others are not, and the resulting model can then predict the output for a new input.

c) Unsupervised learning problems: learning from samples that have not been manually labeled in order to uncover structural knowledge in the data. Cluster analysis and association analysis both belong to this kind of problem.
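To make the supervised case concrete, here is a minimal sketch of "train on labeled samples, then predict the output for a new input". It uses a 1-nearest-neighbor rule chosen only for brevity; the function name, the feature values, and the labels are hypothetical and not from the original article.

```python
import math

def nearest_neighbor_predict(train_inputs, train_labels, new_input):
    """Predict the label of new_input as the label of its closest training sample."""
    best_index = min(
        range(len(train_inputs)),
        key=lambda i: math.dist(train_inputs[i], new_input),
    )
    return train_labels[best_index]

# Hypothetical manually labeled samples: (height_cm, weight_kg) -> label
train_inputs = [(150.0, 45.0), (155.0, 50.0), (180.0, 80.0), (185.0, 90.0)]
train_labels = ["small", "small", "large", "large"]

# Predict the output for an input that was not in the training set.
prediction = nearest_neighbor_predict(train_inputs, train_labels, (178.0, 77.0))
print(prediction)
```

In the unsupervised case the `train_labels` list would simply not exist, and in the semi-supervised case only part of it would be filled in.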

2. What model to learn: select a practical model and solution for the specific problem

The following lists the basic models for the various learning problems. The models used in practical applications are refinements of these basic models, adapted to specific business requirements.

Supervised learning, classification problems: generative models (naive Bayes) and discriminative models (k-nearest neighbor, perceptron, decision tree, logistic regression, SVM, boosting, neural network, etc.).

Supervised learning, labeling problems: hidden Markov model, conditional random field.

Supervised learning, regression problems: neural network, regression tree, logistic regression, and ordinary linear regression.

Unsupervised learning problems: these include clustering models and association analysis models. In association analysis, frequent pattern mining (discovering substructures that appear frequently in a data set) and association rule mining (often used in market basket analysis) are common. Clustering is approached mainly from four directions:

(1) Partition-based clustering models: k-means and k-medoids, which group samples mainly by the similarity of their attributes.

(2) Hierarchical clustering models: mainly agglomerative clustering and its inverse process, divisive clustering, which build clusters by successively merging or splitting groups.

(3) Density-based methods: methods (1) and (2) have difficulty finding clusters of arbitrary shape. Density-based methods overcome this shortcoming by using connected high-density regions to identify the cluster structure (this can be used, for example, to preprocess character images for OCR).

(4) Grid-based methods.
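As an illustration of partition-based clustering, the following is a minimal k-means sketch in plain Python. It is not taken from the original article: the function name, the sample points, and the choice of the first k points as initial centroids are assumptions made to keep the example short and deterministic.

```python
import math

def k_means(points, k, iterations=20):
    """Partition-based clustering: alternate an assignment step and a centroid update."""
    # Assumption: the first k points are the initial centroids
    # (practical implementations usually pick them at random).
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled samples forming two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

Because every point is assigned to its nearest centroid, the resulting clusters tend to be compact and roughly spherical, which is the limitation that density-based methods address.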

That covers the basic understanding of the problem. Once a model has been selected, the next question is how the model learns:

II. How to learn

1. Collect data, preprocess data, extract features: preprocessing usually involves filling in or removing missing values and outliers, applying appropriate transformations to the original data (e.g. PCA, ICA, wavelet transform, FFT), and converting the data format and size (for example, compressing a high-definition image into a fixed-size image of a specified format, as in image processing).
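The missing-value and outlier handling described above can be sketched as follows. This is only one possible set of choices: median imputation, a median-absolute-deviation outlier rule, and z-score standardization are assumptions for illustration, and the function names and sample values are hypothetical.

```python
import statistics

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def remove_outliers(values, max_deviations=3.0):
    """Drop values more than max_deviations median absolute deviations from the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return list(values)  # no spread to measure against, keep everything
    return [v for v in values if abs(v - median) <= max_deviations * mad]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical raw feature column with one missing value and one outlier.
raw = [10.0, 12.0, None, 11.0, 13.0, 500.0]
cleaned = remove_outliers(fill_missing(raw))
scaled = standardize(cleaned)
print(cleaned)
print(scaled)
```

Whether to fill or to remove, and which threshold counts as an outlier, depends on the data and the business problem.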

2. What algorithm is used to solve and optimize the model: the choice of model and algorithm determines the cost and speed of learning. Common optimization algorithms include gradient descent, Newton's method, quasi-Newton methods, the Levenberg-Marquardt (LM) algorithm, and constrained optimization using Lagrange duality. When building a model, the optimization criterion and the corresponding method are chosen according to the model's needs (maximum likelihood for estimating distribution parameters, the EM algorithm for estimating latent variables, information gain for growing decision trees, and so on). Different models have different optimization criteria, and this process is worth studying in depth. At the same time, to avoid overfitting as far as possible, a regularization term is usually added to the model.
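As a small worked example of gradient descent combined with regularization, the sketch below fits a one-variable linear model by minimizing mean squared error plus an L2 penalty on the weight. The function name, the learning rate, the step count, and the data are all assumptions chosen for illustration.

```python
def fit_linear_gd(xs, ys, learning_rate=0.05, l2=0.0, steps=2000):
    """Fit y ~ w * x + b by gradient descent on MSE + l2 * w**2."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error of the current model on every sample.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error, plus the L2 penalty on w only.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs)) + 2.0 * l2 * w
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_plain, b_plain = fit_linear_gd(xs, ys)
w_reg, b_reg = fit_linear_gd(xs, ys, l2=1.0)
print(w_plain, b_plain)
print(w_reg, b_reg)
```

Without the penalty the fit recovers the generating slope and intercept; with `l2=1.0` the slope is pulled toward zero, which is the shrinking effect that regularization uses to limit overfitting.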

3. Model evaluation: after the model is solved, some criterion is needed to measure its quality. Commonly used evaluation tools include accuracy, recall, the confusion-matrix counts TP, FN, FP, and TN, the ROC curve and the area under it (AUC), and cross-validation. Regression problems are also measured with fitting residuals and goodness of fit. Not every metric is informative for every problem; measuring with the right metrics for your business problem is the key.
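The relationship between the TP, FN, FP, TN counts and the derived metrics can be sketched as follows. Precision is included alongside accuracy and recall as a standard companion metric, although the text above does not name it; the function names and the label vectors are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical ground-truth labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts)
print(metrics)
```

On imbalanced data, accuracy alone can look good while recall on the rare class is poor, which is one reason the choice of metric has to follow the business problem.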


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
