Machine learning is a core skill at the advanced stage of a data analyst's development. This article offers a quick, approachable introduction to machine learning.
------------------------------------------------------------
After understanding the type of machine learning problem you need to solve, you can start to consider the kinds of data you have collected and the machine learning algorithms you can try. The most popular machine learning algorithms are introduced here; exploring the main algorithms is a helpful way to get a general sense of the methods available.
There are many algorithms available, and the difficulty is that each method comes in many variants and extensions, which makes it hard to pin down exactly what the canonical form of an algorithm is. There are two useful ways to think about and categorize the algorithms you will encounter in this field.
The first way of grouping algorithms is by learning style; the second is by similarity of form and function. Both groupings are useful.
Learning Style
An algorithm can model a problem in different ways depending on its interaction with experience, the environment, or whatever we call the input data. In machine learning and AI textbooks, the popular approach is to first consider the learning styles an algorithm can adopt. There are only a few main learning styles; each is introduced below, with example algorithms and the types of problems they are suited to solve.
Supervised learning: the input data, called training data, has known labels or outcomes, such as spam/not-spam or a stock price over a certain period. Model parameters are determined through a training process in which the model is asked to make predictions and is corrected when those predictions are wrong.
Unsupervised learning: the input data has no labels or known outcomes; a model is built by inferring the structure present in the input data. Examples of such problems include association rule learning and clustering. Example algorithms include the Apriori algorithm and k-means.
Semi-supervised learning: the input data is a mixture of labeled and unlabeled examples. There is a desired prediction problem, but the model must also discover the underlying structure in order to organize the data as well as make predictions. Such problems include classification and regression. Typical algorithms are extensions of other flexible methods that make assumptions about how to model the unlabeled data.
Reinforcement learning: the input data is provided as stimulus from an environment to which the model must respond. Feedback comes not from a training process, as in supervised learning, but as rewards or punishments from the environment. Typical problems are systems control and robot control. Example algorithms include Q-learning and temporal difference learning.
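The reward-driven loop of reinforcement learning can be sketched in a few lines. The environment below is a made-up five-state corridor (states, rewards, and constants are all invented for illustration), and the update rule is standard tabular Q-learning:

```python
import random

# Toy environment: states 0..4 in a corridor, actions 0 = left, 1 = right;
# the agent starts at state 0 and is rewarded only for reaching state 4.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3

def step(state, action):
    """Environment feedback: next state plus a reward signal, not a label."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0)

q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table, one row per state
random.seed(0)
for _ in range(300):                        # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly act greedily, sometimes explore at random
        a = random.randrange(2) if random.random() < EPSILON else q[s].index(max(q[s]))
        s2, r = step(s, a)
        # Q-learning update toward reward + discounted best future value
        q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
        s = s2

policy = [row.index(max(row)) for row in q[:GOAL]]
print(policy)   # the learned policy prefers "right" (action 1) in every state
```

Note how no state is ever told the "correct" action; the preference for moving right emerges purely from the discounted reward signal.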
When you work with large amounts of data to model business decisions, you typically use supervised and unsupervised learning. A hot topic at the moment is semi-supervised learning, for example in image classification, where the datasets involved are large but contain only a handful of labeled examples.
------------------------------------------------------------
Similarity of algorithms
Algorithms can also be grouped by similarity of form and function, for example tree-based methods and neural network methods. This is a useful way to classify them, but it is not perfect: some algorithms could easily fall into several categories. For example, Learning Vector Quantization is both a neural-network-inspired method and an instance-based method. Some names describe both a class of problems and a class of algorithms, such as regression and clustering. Because of this, you will see different groupings of algorithms from different sources. Just as with learning models themselves, there is no perfect taxonomy, only one that is good enough.
In this section, many popular machine learning algorithms are listed in the grouping I find most intuitive. Neither the categories nor the algorithms are exhaustive, but I think they are representative and help give a general picture of the whole field.
Regression analysis
Regression is a modeling approach that measures the prediction error of a model and then iteratively refines the relationship between the variables using that measure. Regression methods are a mainstay of statistics and are classed as statistical machine learning. This can be confusing because "regression" can refer to a class of problems as well as to a class of algorithms; in fact, regression is a process. Here are some examples:
Ordinary least squares
Logistic regression
Stepwise regression
Multivariate Adaptive Regression Splines MARS
Locally Estimated Scatterplot Smoothing LOESS
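The "measure the error, then refine" loop that these methods share can be illustrated with plain gradient descent fitting a line to made-up data (the learning rate and iteration count here are arbitrary choices, not recommendations):

```python
# Fit y = w*x + b by repeatedly measuring the prediction errors and
# nudging the parameters downhill on the mean squared error.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]          # toy data lying exactly on y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    # prediction error on every point
    errs = [(w * x + b) - y for x, y in zip(xs, ys)]
    # gradients of mean squared error with respect to w and b
    gw = 2 * sum(e * x for e, x in zip(errs, xs)) / len(xs)
    gb = 2 * sum(errs) / len(xs)
    w -= lr * gw
    b -= lr * gb

print(round(w, 2), round(b, 2))   # → 2.0 1.0
```

Ordinary least squares solves this same problem in closed form; the iterative version is shown because it mirrors the error-driven refinement described above.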
Instance-based learning
Instance-based learning models a decision problem using instances of the training data that are deemed important or required by the model. Such methods typically build a database of examples and compare new data against it using a similarity measure, finding the best match in order to make a prediction. For this reason, instance-based methods are also called winner-take-all methods or memory-based learning. The focus of this approach is the representation of the stored instances and the similarity measure used between instances.
k-Nearest Neighbors kNN
Learning Vector Quantization LVQ
Self-Organizing Map SOM
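A minimal k-nearest-neighbours sketch makes the "compare against stored examples" idea concrete. The 1-D training points and labels below are invented, and `knn_predict` is a hypothetical helper, not a library function:

```python
from collections import Counter

# Instance-based learning: simply memorize the training examples...
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (8.0, "b"), (8.5, "b"), (9.0, "b")]

def knn_predict(x, k=3):
    # ...then label a new point by majority vote among its k closest
    # stored points, using absolute distance as the similarity measure.
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_predict(1.2))   # → a
print(knn_predict(8.7))   # → b
```

There is no training step at all: all the work happens at prediction time, which is the defining trait of this family.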
Regularization method
These are extensions of another method (usually regression) that penalize models for high complexity and favor simpler models that generalize better. A few regularization methods are listed here because they are popular, powerful, and usually just simple modifications of other methods.
Ridge Regression
Lasso Algorithm Lasso
Elastic Net
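To see how a penalty "punishes" complexity, consider the special case of ridge regression with a single feature and no intercept, which has the simple closed form w = Σxy / (Σx² + λ). The data below is invented for illustration:

```python
# One-feature ridge regression: the penalty lam is added to the
# denominator, so a larger penalty shrinks the coefficient toward zero.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x, toy data

def ridge_coef(lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

print(ridge_coef(0.0))    # no penalty: ordinary least squares
print(ridge_coef(10.0))   # same data, heavier penalty, smaller coefficient
```

Lasso and Elastic Net penalize in different ways (absolute value, or a mix), but the effect is the same in spirit: coefficients are pulled toward simpler models.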
Decision Tree Learning
Decision tree methods model the decision process as splits on the actual values of attributes in the data. Decisions fork in a tree structure until a prediction can be made for a given record. Decision trees are trained on data for classification and regression problems.
Classification and Regression Trees CART
Iterative Dichotomiser 3 ID3
C4.5 algorithm
Chi-squared Automatic Interaction Detection CHAID
Decision Stump (single-level decision tree)
Random Forest
Multivariate Adaptive Regression Splines MARS
Gradient Boosting Machines GBM
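The smallest possible decision tree, a single-split decision stump, shows what "forking on an attribute value" means. The 1-D data and the `best_stump` helper below are invented for illustration; a full tree would recurse on each side of the chosen split:

```python
# Toy data: (attribute value, class label) pairs.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]

def best_stump(points):
    """Try every midpoint between consecutive values as a split threshold
    and keep the one with the fewest misclassifications."""
    best = None
    for i in range(len(points) - 1):
        thr = (points[i][0] + points[i + 1][0]) / 2   # candidate split
        left = [lbl for x, lbl in points if x <= thr]
        right = [lbl for x, lbl in points if x > thr]
        # errors if each side predicts its majority label
        errors = min(left.count(0), left.count(1)) + min(right.count(0), right.count(1))
        if best is None or errors < best[1]:
            best = (thr, errors)
    return best

thr, errors = best_stump(sorted(data))
print(thr, errors)   # → 4.5 0
```

CART, ID3, C4.5 and CHAID differ mainly in how they score candidate splits (Gini impurity, information gain, chi-squared tests) and in how they handle recursion and pruning.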
Bayesian algorithm
Bayesian methods are those that explicitly apply Bayes' theorem to classification and regression problems.
Naive Bayes
Averaged One-Dependence Estimators AODE
Bayesian Belief Network BBN
Kernel function method
The best known kernel method is the popular Support Vector Machine, which is really a family of methods. Kernel methods are concerned with mapping input data into a high-dimensional vector space, in which some classification or regression problems can be solved more easily.
Support Vector Machine SVM
Radial basis function RBF
Linear Discriminant Analysis LDA
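The key trick is that a kernel scores similarity as if the data lived in that high-dimensional space, without ever computing the mapping. A short sketch of the widely used RBF (Gaussian) kernel, with invented points and an arbitrary `gamma`:

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: exp(-gamma * squared distance between a and b)."""
    dist_sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * dist_sq)

x1, x2, x3 = (0.0, 0.0), (0.1, 0.1), (3.0, 3.0)
print(rbf_kernel(x1, x2))   # close points: similarity near 1
print(rbf_kernel(x1, x3))   # distant points: similarity near 0
```

An SVM trained with this kernel only ever sees these pairwise similarity values, which is what lets it draw nonlinear boundaries in the original space.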
Clustering method
Just like regression, clustering describes both a class of problems and a class of methods. Clustering methods are typically organized by their modeling approach: centroid-based or hierarchical. All of them use the intrinsic structure of the data to organize it into groups of maximum commonality.
k-Means
Expectation-Maximization EM
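The centroid-based idea fits in a dozen lines. This is a minimal k-means sketch on invented 1-D data, with k = 2 and the centroids seeded by hand (a real implementation would seed randomly and check for convergence):

```python
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids = [points[0], points[-1]]          # simple seeding assumption

for _ in range(10):                          # a few refinement passes
    # assignment step: each point joins its nearest centroid's cluster
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # update step: each centroid moves to the mean of its cluster
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)   # → [1.5, 8.5]
```

No labels are used anywhere: the two groups emerge purely from the structure of the data, which is what makes this unsupervised.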
Association Rule Learning
Association rule learning covers algorithms that extract the rules which best explain the observed relationships between variables in the data. These rules can uncover important and commercially useful associations in large multidimensional datasets, which can then be exploited.
Apriori algorithm
Eclat algorithm
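The core idea behind Apriori can be sketched with market-basket data (the transactions and support threshold below are invented): count itemsets across transactions, keep only those meeting a minimum support, and build candidate pairs only from items that are themselves frequent.

```python
from itertools import combinations
from collections import Counter

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk", "butter"}]
MIN_SUPPORT = 2   # an itemset must appear in at least 2 baskets

# Pass 1: frequent single items.
item_counts = Counter(i for t in transactions for i in t)
frequent_items = {i for i, c in item_counts.items() if c >= MIN_SUPPORT}

# Pass 2: count candidate pairs, pruned to individually frequent items
# (the Apriori principle: a superset of an infrequent set is infrequent).
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t & frequent_items), 2):
        pair_counts[pair] += 1
frequent_pairs = {p for p, c in pair_counts.items() if c >= MIN_SUPPORT}

print(sorted(frequent_pairs))
```

Rules such as "bread ⇒ milk" are then derived from the frequent itemsets by comparing their relative supports.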
Artificial neural network
Artificial neural networks are algorithms inspired by the structure and function of biological neural networks. They are a class of pattern-matching methods commonly used for regression and classification problems, but this enormous subfield really comprises hundreds of algorithms and variants for all kinds of problem types. Some classic and popular methods (deep learning is treated separately below):
Perceptron
Back-Propagation
Hopfield Network
Self-Organizing Map SOM
Learning Vector Quantization LVQ
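The simplest member of this family, the perceptron, is a single neuron whose weights are nudged whenever it misclassifies a training point. The 2-D toy data below is invented and linearly separable, with labels +1 / -1:

```python
data = [((2.0, 1.0), 1), ((3.0, 2.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
w, b = [0.0, 0.0], 0.0

for _ in range(20):                          # a few passes over the data
    for (x1, x2), y in data:
        if y * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified point
            w[0] += y * x1                   # classic perceptron update:
            w[1] += y * x2                   # pull the boundary toward
            b += y                           # the mistaken example

predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
print(predict(4.0, 3.0), predict(-3.0, -3.0))   # → 1 -1
```

Back-propagation generalizes this error-driven weight adjustment to networks of many such units stacked in layers.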
Deep learning
Deep learning methods are a modern update of artificial neural networks that exploit abundant cheap computation. They aim to build much larger and more complex neural networks and, as mentioned earlier, many of these methods address semi-supervised learning problems, where large datasets contain very little labeled data.
Restricted Boltzmann Machine RBM
Deep Belief Networks DBN
Convolutional Neural Networks CNN
Stacked Autoencoders SAE
Dimensionality reduction method
As with clustering methods, dimensionality reduction seeks to exploit the intrinsic structure of the data to summarize or describe it, the difference being that it does so in an unsupervised way using less information. This is useful for visualizing high-dimensional data or for simplifying data before subsequent supervised learning.
Principal component Analysis PCA
Partial Least Squares Regression PLS
Sammon Mapping
Multidimensional Scale analysis MDS
Projection Pursuit
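The idea behind PCA, summarizing data along the directions that capture the most variance, can be illustrated by brute force: try many candidate directions and keep the one with the highest projected variance. The 2-D points below are invented, and the angle search is only practical for a toy like this (real PCA uses an eigendecomposition):

```python
import math

points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0), (4.0, 8.1)]   # roughly y = 2x
# center the data so variance is measured around the mean
mx = sum(p[0] for p in points) / len(points)
my = sum(p[1] for p in points) / len(points)
centered = [(x - mx, y - my) for x, y in points]

def variance_along(angle):
    """Variance of the 1-D projections onto the unit vector at this angle."""
    ux, uy = math.cos(angle), math.sin(angle)
    proj = [x * ux + y * uy for x, y in centered]
    return sum(p * p for p in proj) / len(proj)

best = max((i * math.pi / 180 for i in range(180)), key=variance_along)
print(round(math.degrees(best)))   # near atan(2) ≈ 63 degrees for this data
```

Each point can then be described by a single number, its projection onto that direction, with little loss of information.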
Ensemble methods
Ensemble methods are composed of several weaker models that are trained independently and whose predictions are combined in some way to produce an overall prediction. Much of the effort goes into choosing which types of weak learners to combine and how to combine their results. This is a very powerful class of techniques and, as such, is very popular.
Boosting
Bootstrap Aggregation (Bagging)
AdaBoost
Stacked Generalization (Blending)
Gradient Boosting Machines GBM
Random Forest
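Bagging is the easiest ensemble to sketch: train several weak models on bootstrap resamples of the data, then combine them by majority vote. Everything below is invented for illustration, and the weak learner is a deliberately crude threshold rule placed halfway between the two class means:

```python
import random
from collections import Counter

data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
random.seed(1)

def train_stump(sample):
    """Weak learner: threshold halfway between the two class means."""
    m0 = [x for x, y in sample if y == 0]
    m1 = [x for x, y in sample if y == 1]
    if not m0 or not m1:                     # degenerate one-class resample
        return lambda x: sample[0][1]
    thr = (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2
    return lambda x, t=thr: 0 if x <= t else 1

models = []
for _ in range(15):                          # one model per bootstrap sample
    boot = [random.choice(data) for _ in data]
    models.append(train_stump(boot))

def ensemble_predict(x):
    votes = Counter(m(x) for m in models)    # majority vote of the ensemble
    return votes.most_common(1)[0][0]

print(ensemble_predict(2.5), ensemble_predict(7.5))   # → 0 1
```

Random Forest is this recipe with decision trees as the weak learners plus random feature selection at each split; boosting differs in that later models are trained to focus on the earlier models' mistakes rather than independently.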
Common algorithms for machine learning, 2016/7/19