Summary of the advantages and disadvantages of common machine learning algorithms

Source: Internet
Author: User

k-Nearest Neighbor (kNN): The algorithm classifies a sample by measuring the distance between the feature values of different samples.
Advantages:
1. Easy to use and easy to understand, high accuracy, mature theory; can be used for both classification and regression;
2. Can be used for both numerical and discrete data;
3. Training time complexity is O(n); makes no assumptions about the input data;
4. Not sensitive to outliers.
Disadvantages:
1. High computational complexity and high space complexity;
2. Suffers from the sample imbalance problem (i.e., some classes have a large number of samples while others have very few);
3. Generally not used when the dataset is very large, because the amount of computation is too great; but the sample set cannot be too small either, or misclassification becomes likely;
4. The biggest drawback is that it cannot reveal the intrinsic meaning of the data.
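
The distance-based voting described above can be sketched in a few lines of Python. This is a minimal illustration rather than a tuned implementation: the function name `knn_classify`, the toy samples, and the choice of Euclidean distance with k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(query, samples, labels, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from the query to every stored sample. This full scan over the
    # training set is the reason prediction is costly on large datasets.
    distances = [
        (math.dist(query, sample), label)
        for sample, label in zip(samples, labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the k closest samples.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two points per class.
samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]
print(knn_classify((0.1, 0.2), samples, labels, k=3))
```

Note that "training" here is only storing the samples; all the work happens at prediction time, which matches the high computational and space cost listed among the disadvantages.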

Naive Bayes
Advantages:
1. A generative model that classifies by calculating probabilities; can handle multi-class problems;
2. Performs very well on small-scale data, is suitable for multi-class tasks and for incremental training, and the algorithm is relatively simple.
Disadvantages:
1. Very sensitive to how the input data is represented;
2. The "naive" assumption (features are treated as conditionally independent) costs some accuracy;
3. The prior probabilities must be computed, and the classification decision has an inherent error rate.
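
To make the probability calculation and the incremental-training property concrete, here is a small sketch of a categorical naive Bayes classifier. The class name `NaiveBayes`, the add-one (Laplace) smoothing, and the toy weather data are assumptions chosen for illustration; they are not taken from the sources summarized above.

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int))
        )
        self.values = defaultdict(set)  # distinct values seen per feature index

    def update(self, features, label):
        """Incremental training: one labelled sample only updates counts."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[label][i][value] += 1
            self.values[i].add(value)

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus a sum of per-feature log likelihoods: summing them
            # independently is exactly the "naive" assumption.
            score = math.log(count / total)
            for i, value in enumerate(features):
                seen = self.feature_counts[label][i].get(value, 0)
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy data (hypothetical): (outlook, temperature) -> play?
model = NaiveBayes()
for features, label in [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
]:
    model.update(features, label)

print(model.predict(("rainy", "cool")))
```

Because the model is nothing more than counts, new samples can be folded in with further `update` calls without retraining from scratch.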

Decision Tree
Advantages:
1. The concept is simple, the computational complexity is low, the model is highly interpretable, and the output is easy to understand;
2. Data preparation is simple; it can handle both numerical and categorical attributes at the same time, whereas other techniques often require a single attribute type;
3. Not sensitive to missing intermediate values, so it is fairly well suited to samples with missing attribute values; can handle irrelevant features;
4. Widely applicable: a decision tree can be built for datasets with many attributes, and it scales well. A decision tree can be applied to an unfamiliar dataset to extract a set of rules from it, which is an advantage over kNN.
Disadvantages:
1. Prone to overfitting;
2. For data in which the number of samples per class is inconsistent, information gain is biased towards features that have more distinct values;
3. Missing information is difficult to handle, and the correlation between attributes in the dataset is ignored.
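
The bias of information gain towards attributes with many distinct values can be seen in a short calculation. The sketch below computes entropy and information gain for two hypothetical attributes: `windy` with two values, and `row_id`, which is unique per sample and carries no real predictive meaning. All names and data are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on attribute `values`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no", "yes", "no"]
windy = ["t", "f", "t", "f", "f", "t"]    # two distinct values
row_id = ["1", "2", "3", "4", "5", "6"]   # one distinct value per sample

print("gain(windy)  =", round(information_gain(windy, labels), 4))
print("gain(row_id) =", round(information_gain(row_id, labels), 4))
```

Splitting on `row_id` puts every sample in its own pure group, so it receives the maximum possible gain even though it would generalize to nothing. This is also one way a tree ends up overfitting.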

SVM
Advantages:
1. Can be used for linear and non-linear classification as well as regression; low generalization error, low computational cost, and results that are easy to interpret;
2. Can solve machine learning problems with small samples and high-dimensional problems, and avoids the problems of choosing a neural network structure and of local minima;
3. SVM is often regarded as the best off-the-shelf classifier: it can be used without modification, achieves a low error rate, and makes good classification decisions for data points outside the training set.
Disadvantages: Sensitive to parameter tuning and to the choice of kernel function; without modification, the original classifier only handles binary classification problems.

Logistic regression: Based on the existing data, a regression formula is fitted to the classification boundary, and this formula is then used to classify.
Advantages: Simple to implement and easy to understand, low computational cost, fast, and low storage requirements;
Disadvantages: Prone to underfitting; classification accuracy may not be high.
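
The following sketch trains a logistic regression model with plain stochastic gradient ascent and shows both sides of the trade-off: it separates a linearly separable toy set easily, but its linear boundary cannot fit XOR-shaped data, which is what underfitting looks like in practice. The function names, learning rate, epoch count, and data are assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.1, epochs=1000):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(samples[0]) + 1)  # last entry is the bias term
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
            error = y - sigmoid(z)
            for i, xi in enumerate(x):
                weights[i] += lr * error * xi
            weights[-1] += lr * error
    return weights

def predict(weights, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return 1 if sigmoid(z) >= 0.5 else 0

# Linearly separable 1-D data: a single threshold is enough.
line_x = [(0.5,), (1.0,), (1.5,), (3.0,), (3.5,), (4.0,)]
line_y = [0, 0, 0, 1, 1, 1]
w_line = train_logistic(line_x, line_y)
print([predict(w_line, x) for x in line_x])

# XOR data: no straight line separates the classes, so the model underfits.
xor_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_y = [0, 1, 1, 0]
w_xor = train_logistic(xor_x, xor_y)
xor_correct = sum(predict(w_xor, x) == y for x, y in zip(xor_x, xor_y))
print("XOR samples classified correctly:", xor_correct, "of 4")
```

The model stores only one weight per feature plus a bias, which is where the low storage and computation costs come from.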

EM (Expectation Maximization) algorithm, the "God's algorithm": as long as there is some training data and a function to maximize has been defined, the computer can obtain the desired model by running the EM algorithm for a number of iterations. EM is a self-converging classification algorithm: it requires neither classes to be set up in advance nor pairwise comparison and merging of the data. The disadvantage is that when the function to be optimized is not convex, EM tends to return a local optimum rather than the global optimum.
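
As a concrete instance of the iterate-until-converged idea, the sketch below runs EM on a one-dimensional mixture of two Gaussians. The function names, the toy data, the fixed iteration count, and the initial means are assumptions for illustration; a different initialization can lead to a different (local) solution.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, mean_a, mean_b, iterations=50):
    """EM for a 1-D mixture of two Gaussians; returns the two fitted means."""
    var_a = var_b = 1.0
    weight_a = 0.5
    for _ in range(iterations):
        # E-step: responsibility of component A for each data point.
        resp = []
        for x in data:
            pa = weight_a * gaussian_pdf(x, mean_a, var_a)
            pb = (1 - weight_a) * gaussian_pdf(x, mean_b, var_b)
            total = pa + pb
            resp.append(pa / total if total > 0 else 0.5)
        # M-step: re-estimate parameters from the soft assignments.
        n_a = max(sum(resp), 1e-12)
        n_b = max(len(data) - sum(resp), 1e-12)
        mean_a = sum(r * x for r, x in zip(resp, data)) / n_a
        mean_b = sum((1 - r) * x for r, x in zip(resp, data)) / n_b
        var_a = max(sum(r * (x - mean_a) ** 2 for r, x in zip(resp, data)) / n_a, 1e-6)
        var_b = max(sum((1 - r) * (x - mean_b) ** 2 for r, x in zip(resp, data)) / n_b, 1e-6)
        weight_a = n_a / len(data)
    return mean_a, mean_b

# Toy data (hypothetical): two clusters around 1.0 and 5.0, no labels given.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
mean_a, mean_b = em_two_gaussians(data, mean_a=0.0, mean_b=6.0)
print(round(mean_a, 3), round(mean_b, 3))
```

No class labels are supplied and no pairwise merging is done; the soft assignments in the E-step and the parameter updates in the M-step alternate until the estimates settle.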


Learning Representations from EEG with Deep RCNN: Reading Notes

2016-11-22 a-li, EEG and Machine Learning

Paper: "Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks"

Article source: arXiv 2016

The original link is attached.

I. Introduction to the Paper

The paper processes EEG signals through a series of steps to turn them into images, then uses ConvNet, LSTM, and related methods to extract features, and finally obtains classification results.

II. Innovations of the Paper

The paper presents a new way of representing EEG features. Conventional EEG processing methods include only time-domain and frequency-domain features and no spatial characteristics; this paper adds spatial characteristics to EEG analysis.

III. Introduction to the Dataset

This is an EEG experiment that measures memory capacity. The procedure is as follows: a set of English letters is displayed for 0.5 s for the subject to memorize; after an interval of three seconds the test begins, in which a single letter appears and the subject decides whether it was in the set just shown. The conditions containing 2, 4, 6, and 8 characters are labeled load 1, 2, 3, and 4, respectively. A total of 13 subjects were tested, with 240 trials each. The EEG data for the first 3.5 seconds of each trial were recorded. The classification task is to identify, from the EEG recording, the load level corresponding to the set size (the number of characters presented to the subject).

IV. Main Work of the Paper

1) Fast Fourier Transform (FFT) is performed on the time series for each trial to estimate the power spectrum of the signal.

2) Sum of squared absolute values within each of the three frequency bands was computed and used as separate measurement for each electrode.

3) We propose to transform the measurements into a 2-D image to preserve the spatial structure and use multiple color channels to represent the spectral dimension.

4) Finally, we use the sequence of images derived from consecutive time windows to account for temporal evolutions in brain activity.

First, a fast Fourier transform is applied to the EEG time series of each trial to estimate its power spectrum. Then, for each electrode, the sum of squared absolute values within each of three frequency bands is computed and used as a measurement. The three bands are theta (4-7 Hz), alpha (8-13 Hz), and beta (13-30 Hz).
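
The per-electrode band measurement can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: it uses a plain discrete Fourier transform from the standard library in place of an FFT routine, a synthetic single-electrode signal, and a hypothetical sampling rate `fs = 128`; the name `band_powers` is also made up for the example.

```python
import cmath
import math

# Band edges in Hz, as listed in the notes above.
BANDS = {"theta": (4, 7), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(signal, fs):
    """Sum of squared absolute DFT values within each band, for one electrode."""
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(n // 2 + 1):  # one-sided spectrum
        freq = k * fs / n
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        for name, (low, high) in BANDS.items():
            if low <= freq <= high:
                powers[name] += abs(coeff) ** 2
                break  # 13 Hz belongs to both alpha and beta as listed; count it once
    return powers

# Synthetic one-second signal: a strong 10 Hz and a weak 20 Hz component.
fs = 128  # assumed sampling rate, not taken from the paper
signal = [
    math.sin(2 * math.pi * 10 * t / fs) + 0.2 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(fs)
]
powers = band_powers(signal, fs)
print({name: round(value, 2) for name, value in powers.items()})
```

With the 10 Hz component dominating, the alpha measurement is the largest of the three, which is the kind of per-electrode value that later becomes one color channel of the image.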

Then the computed measurements are converted into a 2-D image using the method presented in the paper.

Finally, these image sequences are used to represent brain activity, i.e., they are fed into the classifier.

V. Classifier

The paper converts the data to images in two ways: a single-image method and a multi-image method. The single-image method generates only one image per trial and feeds it into a ConvNet. This is used to select the best convolutional structure for feature extraction. The multi-image method splits the 3.5 s of data into 0.5 s windows, producing seven images per channel, and then combines the three channels into a sequence of seven three-channel images that is fed to the classifier. The best convolutional structure forms part of this classifier, which is built by combining it with max-pooling, LSTM, and 1-D convolutional networks.
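
The windowing step of the multi-image method can be sketched like this. The sampling rate `fs = 128` and the helper name `split_windows` are assumptions for illustration; only the 3.5 s trial length and the 0.5 s window length come from the notes above.

```python
def split_windows(signal, fs, window_seconds=0.5):
    """Split one trial into consecutive, non-overlapping time windows."""
    size = int(round(fs * window_seconds))
    return [
        signal[start:start + size]
        for start in range(0, len(signal) - size + 1, size)
    ]

fs = 128                           # assumed sampling rate
trial = list(range(int(3.5 * fs)))  # stand-in for 3.5 s of one electrode's samples
windows = split_windows(trial, fs)
print(len(windows), "windows of", len(windows[0]), "samples each")
```

Each of the seven windows would then go through the band-power and image-conversion steps, so a trial becomes a sequence of seven three-channel images rather than a single image.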

The structure used:

[Figure: network structure. The blue bar C is the best convolutional structure selected earlier, here the seven-layer structure D.]

VI. Results and Discussion

The results are good: the error rate improves considerably, from 15.34% to 8.9%. The paper also reports single-image and multi-image classification results that are not reproduced here; only the final structure and an analysis of the classification results are given. The method is clearly effective, and its innovation of adding spatial characteristics to EEG processing is worth borrowing.
