Machine Learning Algorithms: A Quick Selection Guide

Source: Internet
Author: User

There are so many machine learning algorithms that, as a beginner, I kept getting them tangled up. I put together these notes and diagrams to help me get a quick overview.

Machine learning algorithm breakdown

1. Many "algorithms" are really families of algorithms, and some algorithms are extensions of others.

2. Algorithms can be classified from two perspectives.

2.1 By learning style

Supervised learning
- Common application scenarios: classification and regression problems.
- Common algorithms: logistic regression and the back-propagation neural network.

Unsupervised learning
- Common application scenarios: association rule learning and clustering.
- Common algorithms: the Apriori algorithm and the k-means algorithm.

Semi-supervised learning
- Common application scenarios: classification and regression.
- Common algorithms: extensions of commonly used supervised learning algorithms, which first try to model the unlabeled data and then build on that model to make predictions from the labeled data. Examples are graph inference and the Laplacian support vector machine (Laplacian SVM).

Reinforcement learning
- Common application scenarios: dynamic systems and robot control.
- Common algorithms: Q-learning and temporal difference learning.

2.2 By algorithm similarity

Regression algorithms
Regression algorithms try to explore the relationship between variables by using a measure of error.
- Ordinary least squares
- Logistic regression
- Stepwise regression
- Multivariate adaptive regression splines (MARS)
- Locally estimated scatterplot smoothing (LOESS)

Instance-based algorithms
These are often used to model decision problems. Such models typically select a batch of sample data and then compare new data with the samples using some similarity measure, looking for the best match.
- k-nearest neighbor (KNN)
- Learning vector quantization (LVQ)
- Self-organizing map (SOM)

Regularization methods
These are extensions of other algorithms (usually regression algorithms) that adjust the model according to its complexity. Regularization usually rewards simple models and penalizes complex ones.
- Ridge regression
- Least absolute shrinkage and selection operator (LASSO)
- Elastic net

Decision tree learning
Decision tree models are often used to solve classification and regression problems.
- Classification and regression tree (CART)
- ID3 (Iterative Dichotomiser 3)
- C4.5
- Chi-squared automatic interaction detection (CHAID)
- Decision stump
- Random forest
- Multivariate adaptive regression splines (MARS)
- Gradient boosting machine (GBM)

Bayesian methods
Mainly used to solve classification and regression problems.
- Naive Bayes
- Averaged one-dependence estimators (AODE)
- Bayesian belief network (BBN)

Kernel-based algorithms
Kernel-based algorithms map the input data into a higher-dimensional vector space in which some classification or regression problems become easier to solve.
- Support vector machine (SVM)
- Radial basis function (RBF)
- Linear discriminant analysis (LDA)

Clustering algorithms
Clustering algorithms all try to find the intrinsic structure of the data so that data points can be grouped by what they have most in common.
- k-means
- Expectation maximization (EM)

Association rule learning
Association rule learning looks for rules that best explain the relationships between variables in the data, in order to find useful association rules in large multivariate datasets.
- Apriori
- Eclat

Artificial neural networks
These algorithms simulate biological neural networks. They are a kind of pattern matching algorithm, often used to solve classification and regression problems. Artificial neural networks are a huge branch of machine learning with hundreds of different algorithms.
- Perceptron neural network
- Back propagation
- Hopfield network
- Self-organizing map (SOM)
- Learning vector quantization (LVQ)

Deep learning
Deep learning algorithms are a development of artificial neural networks. Many deep learning algorithms are semi-supervised learning algorithms.
- Restricted Boltzmann machine (RBM)
- Deep belief network (DBN)
- Convolutional network
- Stacked auto-encoders

Dimensionality reduction
Like clustering algorithms, dimensionality reduction algorithms try to analyze the intrinsic structure of the data, but they do so in an unsupervised way, trying to summarize or explain the data using less information.
- Principal component analysis (PCA)
- Partial least squares regression (PLS)
- Sammon mapping
- Multi-dimensional scaling (MDS)
- Projection pursuit

Ensemble algorithms
Ensemble algorithms train several relatively weak learning models independently on the same samples and then combine their results into an overall prediction.
- Boosting
- Bootstrapped aggregation (bagging)
- AdaBoost
- Stacked generalization (blending)
- Gradient boosting machine (GBM)
- Random forest

3. How to choose an algorithm quickly

3.1 If you really care about accuracy
The best approach is to test a whole range of different algorithms (making sure you also try different parameters for each algorithm) and then pick the one that performs best in cross-validation.

3.2 Size of the training set
If the training set is small, high-bias/low-variance classifiers (such as naive Bayes) have an advantage over low-bias/high-variance classifiers (such as k-nearest neighbor), because the latter overfit easily. As the training set grows, low-bias/high-variance classifiers start to win (they have lower asymptotic error), because high-bias classifiers are not powerful enough to provide an accurate model.

3.3 Advantages and disadvantages of some common algorithms

Naive Bayes
Extremely simple: you only need to do some arithmetic. If the conditional independence assumption holds, a naive Bayes classifier converges faster than discriminative models such as logistic regression, so you need less training data. Even if the assumption does not hold, naive Bayes still performs decently in practice. If you need something fast and simple that performs well, it is a good choice. Its main disadvantage is that it cannot learn interactions between features. For example, it cannot learn that you like Donnie Yen's films and Jiang Wen's films but hate the film they made together, "The Lost Bladesman".

Logistic regression
There are many ways to regularize the model, and unlike naive Bayes you do not have to worry about your features being correlated. Unlike decision trees and support vector machines, you also get a good probabilistic interpretation, and you can easily update the model with new data (using an online gradient descent method). Use it if you need a probabilistic framework (for example, to adjust classification thresholds, express uncertainty, or obtain confidence intervals), or if you expect to fold more training data into the model quickly later on.

Decision trees
Easy to interpret and explain (for some people; I am not sure I am one of them). They handle feature interactions without difficulty and are non-parametric, so you do not have to worry about outliers or about whether the data is linearly separable. For example, a decision tree easily handles the case where class A sits at the low end of a feature dimension x, class B in the middle, and class A again at the high end. One drawback is that decision trees do not support online learning, so the tree has to be rebuilt when new samples arrive. Another drawback is that they overfit easily, but that is exactly where ensemble methods such as random forests (or boosted trees) come in. In addition, random forests often win on classification problems (usually slightly ahead of support vector machines, I think), they are fast and scalable, and you do not have to tune a pile of parameters as you do with support vector machines, so they seem to be quite popular lately.

Support vector machines
High accuracy and good theoretical guarantees against overfitting. Even if the data is not linearly separable in the original feature space, they work well given an appropriate kernel function. They are particularly popular for text classification, where very high-dimensional spaces are the norm. Unfortunately, they consume a lot of memory, are hard to interpret, and are somewhat annoying to run and tune, so I think random forests are starting to replace them.

3.4 Nevertheless
Remember that good data beats a good algorithm, and designing good features goes a long way. If you have a very large dataset, the algorithm you use may not matter much for classification performance (in that case, choose based on speed and ease of use).

Common problem categories in data analysis

Classification
Clustering
Regression
Association
Dimensionality reduction
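The advice in 3.1 (try several algorithms and parameter settings, then keep whichever does best in cross-validation) can be sketched as follows. This is a minimal illustration in plain Python rather than any particular library's API: the helper names (`k_fold_indices`, `cross_val_accuracy`, and so on), the synthetic two-cluster dataset, and the choice of candidates are all assumptions made for the example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x."""
    by_distance = sorted(
        train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x))
    )
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)

def majority_predict(train, x):
    """Baseline: ignore x and predict the most common training label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

def cross_val_accuracy(predict, data, k=5):
    """Mean held-out accuracy of `predict` over k folds."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        correct = sum(predict(train, data[i][0]) == data[i][1] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k

# Hypothetical toy data: two well-separated square clusters in 2-D.
rng = random.Random(42)
data = []
for label, center in ((0, 0.0), (1, 5.0)):
    for _ in range(20):
        point = (center + rng.uniform(-1, 1), center + rng.uniform(-1, 1))
        data.append((point, label))

# Candidate algorithms, including two parameter settings for kNN.
candidates = {
    "majority baseline": majority_predict,
    "kNN (k=1)": lambda train, x: knn_predict(train, x, k=1),
    "kNN (k=5)": lambda train, x: knn_predict(train, x, k=5),
}

scores = {name: cross_val_accuracy(fn, data) for name, fn in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

On real data the candidates would be actual learners and the scores would differ far less dramatically. The point is only the workflow: every candidate is scored on the same held-out folds, and the choice is made from those scores.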
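The decision tree example in 3.3 (class A at the low end of feature x, class B in the middle, class A again at the high end) can be made concrete with a tiny one-dimensional tree. The sketch below is an assumption-laden toy, not CART or ID3: it splits greedily on the threshold with the fewest misclassifications, where real implementations typically use Gini impurity or entropy. All function names are hypothetical.

```python
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def errors(labels):
    """Number of points a majority-vote leaf would misclassify."""
    return len(labels) - Counter(labels).most_common(1)[0][1]

def build_tree(points, depth):
    """Return a leaf label, or a (threshold, left_subtree, right_subtree) tuple."""
    labels = [y for _, y in points]
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    xs = sorted({x for x, _ in points})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between two neighboring values
        left = [y for x, y in points if x < t]
        right = [y for x, y in points if x >= t]
        cost = errors(left) + errors(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    if best is None:  # only one distinct x value, nothing to split on
        return majority(labels)
    t = best[1]
    return (
        t,
        build_tree([p for p in points if p[0] < t], depth - 1),
        build_tree([p for p in points if p[0] >= t], depth - 1),
    )

def predict(tree, x):
    """Walk down the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x < t else right
    return tree

def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)

# Class A for x in 0..2, class B for x in 3..6, class A again for x in 7..9.
data = [(x, "A" if x < 3 or x >= 7 else "B") for x in range(10)]

stump = build_tree(data, depth=1)  # a single threshold, like a linear cut in 1-D
tree = build_tree(data, depth=2)   # two levels of splits

print("one split :", accuracy(stump, data))
print("two splits:", accuracy(tree, data))
```

A single threshold cannot separate an A-B-A layout, so the one-split stump misclassifies part of the data, while one extra level of splitting is enough to fit it. This also hints at the overfitting risk mentioned in 3.3: allowing deeper trees lets the model fit ever finer details of the training set.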

Implementation options

A quick algorithm selection diagram from the Alibaba Cloud (Aliyun) Yunqi community

A quick algorithm selection diagram for Python scikit-learn
