Advantages, Disadvantages, and Applications of Common Machine Learning Algorithms: A Summary

Source: Internet
Author: User

First, K-means Clustering Algorithm

Advantages:

(1) Simple principle, easy to implement, and fast convergence

(2) Works well when clusters are roughly spherical

Disadvantages:

(1) The number of clusters k is hard to choose in advance

(2) Performs poorly when clusters are not spherical

(3) Relatively sensitive to noise and outliers
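The "simple principle" is easiest to see in Lloyd's algorithm, the standard iterative procedure for K-means: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until assignments stop changing. The following is a minimal pure-Python sketch; the function name `kmeans`, the iteration cap, and seeding the centroids with the first k points are illustrative assumptions, not part of the original text. Note that the caller must supply k, which is exactly weakness (1).

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Assumption: seed the centroids with the first k points; practical
    # implementations usually use random or k-means++ initialization instead.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest
        # centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 0.5), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)  # final cluster centers
```

Because every centroid is a plain mean, a single far-away outlier assigned to a cluster drags that cluster's center toward it, which is the sensitivity described in (3).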

Application:

(1) Used by search engines to cluster web pages by similarity and to group related search results, helping users find relevant results faster

(2) User profiling

(3) Exploring the internal structure of a dataset

(4) Data discretization and compression

(5) Handling sample (class) imbalance

Similarity measures:

(1) Manhattan distance for discrete variables

(2) Euclidean distance for continuous variables

(3) Cosine similarity or the Jaccard coefficient for text
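The four measures above can be sketched in a few lines of Python. The function names are illustrative; the sketch assumes text is represented either as term-count vectors (for cosine similarity) or as sets of words (for the Jaccard coefficient).

```python
import math

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, e.g. term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat an all-zero vector as dissimilar

def jaccard(a, b):
    """Jaccard coefficient of two sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0  # two empty sets count as identical

print(manhattan((1, 2), (4, 6)))            # 7
print(euclidean((1, 2), (4, 6)))            # 5.0
print(cosine_similarity((1, 0), (1, 1)))    # about 0.707
print(jaccard("the cat sat".split(), "the cat ran".split()))  # 0.5
```

Cosine similarity ignores vector length, so a long document and a short one with the same word proportions score as identical, which is why it suits text.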

Second, Support Vector Machine

Advantages:

(1) Nonlinear problems can be solved with kernel functions

(2) Effective on high-dimensional feature spaces, and remains effective even when the number of features exceeds the number of samples

(3) High classification accuracy and strong generalization ability
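Advantage (1) rests on the kernel trick: a kernel computes an inner product in a higher-dimensional feature space without ever building that space. The sketch below checks this for the degree-2 polynomial kernel K(x, z) = (x . z)^2 on 2-D inputs, whose explicit feature map is phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), and also defines the Gaussian (RBF) kernel for comparison. The function names and the `gamma` default are illustrative assumptions.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: (x . z)^2, computed in 2-D."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (1.0, 2.0), (2.0, 3.0)
via_kernel = poly2_kernel(x, z)                            # no mapping needed
via_features = sum(a * b for a, b in zip(phi(x), phi(z)))  # explicit mapping
print(via_kernel, via_features)  # equal up to floating-point rounding
print(rbf_kernel(x, z))
```

An SVM only ever needs such inner products, so swapping the kernel changes the shape of the decision boundary without changing the training algorithm; choosing which kernel to use is the open question noted in disadvantage (3).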

Disadvantages:

(1) When the number of features is much larger than the number of samples, performance is only mediocre (the curse of dimensionality)

(2) With large sample sizes, the kernel maps data into a very high-dimensional space and the kernel matrix has one entry for every pair of samples, so computation becomes very expensive

(3) There is no universal rule for choosing the kernel function

(4) Not well suited to the very large datasets of the big-data era

(5) SVM is inherently a binary classifier. It can be extended to multi-class problems (for example, one-vs-rest or one-vs-one), but the computational cost is much higher. At the time of writing, Spark implements SVM only for binary classification
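The cost arguments in (2) and (5) come down to simple arithmetic. The sketch below assumes a dense kernel matrix of 8-byte float64 values; practical solvers typically compute and cache kernel entries on demand instead of storing the full matrix, so these figures are an upper bound meant only to show the quadratic growth. The function names are hypothetical.

```python
def kernel_matrix_gib(n_samples, bytes_per_entry=8):
    """Memory (GiB) for a dense n x n kernel matrix of float64 entries."""
    return n_samples * n_samples * bytes_per_entry / 2**30

def binary_svms_needed(n_classes, strategy):
    """Number of binary SVMs trained by the two usual multi-class reductions."""
    if strategy == "one-vs-rest":
        return n_classes  # one "this class vs. all others" model per class
    if strategy == "one-vs-one":
        return n_classes * (n_classes - 1) // 2  # one model per pair of classes
    raise ValueError(f"unknown strategy: {strategy}")

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} samples -> {kernel_matrix_gib(n):8.2f} GiB")
print(binary_svms_needed(10, "one-vs-rest"), binary_svms_needed(10, "one-vs-one"))  # 10 45
```

Multiplying the sample count by 10 multiplies the kernel matrix by 100, and a 10-class problem needs 10 or 45 separate binary trainings, depending on the reduction.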

Application:

(1) Stock market forecasting, commonly used by financial institutions

Third, Decision Tree

Advantages:

(1) Simple and intuitive, and the model can be displayed visually

(2) Requires little data preprocessing: no normalization is needed, and missing values need not be handled

(3) High fault tolerance and robustness to outliers

(4) Highly interpretable

Disadvantages:

(1) Prone to overfitting, with weak generalization; this can be mitigated by setting a minimum number of samples per node or by limiting the tree depth

(2) Small changes in the training samples can change the tree structure substantially; ensemble learning can mitigate this
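The overfitting controls in (1) are stopping rules applied while the tree is grown. The following is a minimal CART-style sketch, assuming numeric features and Gini impurity as the split criterion (the original text does not name a criterion). The parameter names `max_depth` and `min_samples_split` mirror those of scikit-learn's `DecisionTreeClassifier`, but this is not that library's implementation; all function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Find the (feature, threshold) pair with the lowest weighted Gini impurity."""
    best, best_score = None, gini(labels)  # a split must beat the unsplit node
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3, min_samples_split=2):
    """Grow a tree recursively; a leaf is a class label, an inner node a tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rules: these are the overfitting controls from disadvantage (1).
    if depth >= max_depth or len(rows) < min_samples_split or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]

    def grow(part):
        return build_tree([r for r, _ in part], [y for _, y in part],
                          depth + 1, max_depth, min_samples_split)

    return (f, t, grow(left), grow(right))

def predict(tree, row):
    """Walk from the root to a leaf, going left when row[feature] <= threshold."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

rows = [[2.0, 1.0], [3.0, 1.5], [6.0, 5.0], [7.0, 6.0], [1.0, 0.5], [8.0, 7.5]]
labels = ["low", "low", "high", "high", "low", "high"]
tree = build_tree(rows, labels, max_depth=2)
print(tree)                       # (0, 3.0, 'low', 'high')
print(predict(tree, [2.5, 1.0]))  # low
```

The printed tuple is also the "visual display" from advantage (1) in its rawest form: split on feature 0 at 3.0, then read off the leaf.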

Application:

(1) Useful in finance for option pricing

(2) Remote sensing is an application field of decision-tree-based pattern recognition

(3) Banks use decision tree algorithms to classify loan applicants by their probability of defaulting on payments

(4) Gerber Products Inc., a popular baby products company, uses decision tree machine learning algorithms to decide whether to continue using PVC (polyvinyl chloride) plastic in its products.

(5) Rush University Medical Center has developed a tool called Guardian, which uses decision tree machine learning algorithms to identify at-risk patients and disease trends

Fourth, Random Forest

Advantages:

(1) Training can be highly parallelized, which is an advantage when training on the large samples of the big-data era

(2) Insensitive to missing values and outliers

(3) Strong generalization ability; no pruning required

(4) It is hard to build a bad random forest; classification accuracy is high
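The parallelism in (1) follows from how a random forest is built: each tree is trained independently on a bootstrap sample, considering only a random subset of features at each split, and predictions are combined by majority vote. The following is a minimal pure-Python sketch under those standard assumptions; the function names, the sqrt(number of features) subset size, `n_trees=25`, and the depth cap are illustrative choices, not taken from the original text. Here the trees are built one after another for simplicity; since no tree depends on another, the loop in `random_forest` is the part that could run in parallel.

```python
import math
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, rng, n_features, depth=0, max_depth=5):
    """Grow one unpruned tree (max_depth is only a safety cap)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best, best_score = None, gini(labels)
    # Random-forest twist: try only a random subset of features at this split.
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
                if score < best_score:
                    best, best_score = (f, t), score
    if best is None:
        return majority
    f, t = best
    left_idx = [i for i, row in enumerate(rows) if row[f] <= t]
    right_idx = [i for i, row in enumerate(rows) if row[f] > t]

    def child(idx):
        return grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                         rng, n_features, depth + 1, max_depth)

    return (f, t, child(left_idx), child(right_idx))

def random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(math.sqrt(len(rows[0]))))  # common sqrt(d) heuristic
    forest = []
    for _ in range(n_trees):  # independent iterations: parallelizable
        # Bootstrap: draw len(rows) rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                rng, n_features))
    return forest

def tree_predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def forest_predict(forest, row):
    """Majority vote over all trees."""
    return Counter(tree_predict(tree, row) for tree in forest).most_common(1)[0][0]

rows = [[2.0, 1.0], [3.0, 1.5], [6.0, 5.0], [7.0, 6.0], [1.0, 0.5], [8.0, 7.5]]
labels = ["low", "low", "high", "high", "low", "high"]
forest = random_forest(rows, labels, n_trees=25, seed=42)
print(forest_predict(forest, [1.0, 0.5]))
print(forest_predict(forest, [8.0, 7.5]))
```

Averaging many decorrelated trees is what makes a "bad" forest hard to build, as (4) says; it is also why prediction is slower than with a single tree.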

Disadvantages:

(1) Easy to use, but difficult to analyze theoretically

(2) Slower, because it contains many decision trees acting as weak classifiers

(3) Features with many distinct values can have a disproportionate influence on the forest's decisions, which degrades the model

Application:

(1) Used by banks to predict whether a loan applicant is likely to be high-risk

(2) Predicting mechanical component failures in the automotive industry

(3) Predicting whether patients are likely to develop chronic diseases in the health care industry

(4) Regression, such as predicting average social media shares and performance scores

(5) Recognizing patterns in speech recognition software, and classifying images and text

Fifth, Naive Bayes

Advantages:

(1) Performs well on small datasets and can handle multi-class tasks

(2) Insensitive to missing data, simple to implement, and often used for text classification

Disadvantages:

(1) Naive Bayes assumes that attributes are independent of each other given the class; when this assumption does not hold, classification performance suffers.

(2) The class is chosen by its posterior probability, which is computed from the prior and the data, so classification decisions carry a certain error rate.
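Both points are visible in a small text classifier: the posterior score is the log prior plus a sum of per-word log likelihoods, and that sum is exactly the independence assumption from (1). The sketch below uses multinomial naive Bayes with add-one (Laplace) smoothing, which is one common variant; the original text does not specify a variant, and the function names and toy data are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> Counter of word occurrences
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_naive_bayes(model, tokens):
    """Pick the class with the highest log posterior (up to a shared constant)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)  # log prior P(class)
        total = sum(word_counts[label].values())
        for word in tokens:
            if word in vocab:  # words never seen in training are skipped
                # Independence assumption: per-word likelihoods multiply
                # (add in log space). Add-one smoothing avoids log(0).
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [["win", "money", "now"], ["cheap", "money", "offer"],
        ["meeting", "at", "noon"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, ["money", "offer", "now"]))  # spam
print(predict_naive_bayes(model, ["meeting", "notes"]))       # ham
```

Training is a single counting pass, which is why the method stays fast and workable on small datasets.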

Application:

(1) Sentiment analysis

(2) Document classification

(3) Spam filtering

Sixth, Apriori Frequent Itemset Mining

Basic principle:

(1) If an itemset is frequent, all of its subsets are also frequent.

(2) If an itemset is infrequent, all of its supersets are also infrequent.
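Principle (2) is what lets Apriori prune: a k-item candidate is discarded without counting if any of its (k-1)-item subsets is infrequent. The following is a minimal sketch, assuming the support threshold is given as an absolute transaction count; the function name and the toy transactions are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (principle 2): drop candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # One pass over the data per level to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
frequent_itemsets = apriori(transactions, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The repeated full scans of the transactions, one per itemset size, are the main source of the low efficiency listed among the disadvantages.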

Advantages:

(1) Easy to implement and easy to parallelize

(2) A classic frequent itemset mining algorithm; many related algorithms, including FP-Tree, GSP, and CBA, build on Apriori, so understanding it makes them easier to follow.

Disadvantages:

(1) Low efficiency: it generates many candidate itemsets and rescans the dataset for each itemset size

Application:

(1) Market basket analysis

(2) Autocomplete
