This article introduces several of the most popular machine learning algorithms. There are many such algorithms, and the difficulty lies in categorizing them. Here we introduce two ways of thinking about and grouping them: the first is by learning style, and the second is by similarity in form or function.
Learning Style
An algorithm can model a problem in different ways, depending on how it interacts with experience, with the environment, or with whatever we choose to call the input data. The learning style is therefore the first question to consider in machine learning.
Next, let's look at the main learning styles, or learning models, that an algorithm can have.
- Supervised learning: the input data is called training data and carries a known label or result. A model is built through a training process in which it makes predictions and is corrected whenever those predictions are wrong. Training continues until the model reaches the desired accuracy on the training data. Typical problems are classification and regression; example algorithms include logistic regression and back-propagation (BP) neural networks.
- Unsupervised learning: the input data is unlabeled and has no known result. A model is derived by deducing the structure present in the input data. Typical problems are association rule learning and clustering; example algorithms include the Apriori algorithm and k-means.
- Semi-supervised learning: the input data is a mixture of labeled and unlabeled examples. The model must learn the structure that organizes the data as well as make the expected predictions. Typical problems are classification and regression.
- Reinforcement learning: the model must respond and react to stimuli from an environment. Feedback does not come from a teaching process as in supervised learning, but as rewards and punishments from the environment. Typical applications are systems and robot control; example algorithms include Q-learning and temporal difference learning.
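To make the reward-and-punishment feedback of reinforcement learning more concrete, here is a minimal sketch of a single tabular Q-learning update. The state names, the action names, the learning rate alpha, and the discount factor gamma are illustrative assumptions and do not come from this article.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Best value the agent currently expects from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the old estimate toward reward + discounted future value.
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical two-action problem; every Q-value starts at 0.
actions = ["left", "right"]
q = defaultdict(float)
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
```

A positive reward raises the value of the action that earned it, and a negative reward would lower it, so the environment steers the model without any labeled examples.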
When crunching data to model business decisions, you will most often use supervised and unsupervised learning methods. A hot topic at the moment is semi-supervised learning in areas such as image classification, where datasets are large but contain very few labeled examples. Reinforcement learning is more commonly applied in robot control and other control systems.
Algorithm Similarity
Algorithms are often grouped by similarity in function or form, for example tree-based methods and neural-network-inspired methods. This is a useful way of grouping, but it is not perfect. Some algorithms fit just as easily into more than one category, such as learning vector quantization, which is both a neural-network-inspired method and an instance-based method.
There are also categories whose name describes both a problem domain and a class of algorithms, such as regression and clustering. So, as with machine learning models themselves, there is no perfect grouping, only one that is good enough.
Below we list some popular machine learning algorithms.
Regression
Regression is concerned with modeling the relationship between variables, and the model is refined iteratively using a measure of the error in its predictions. Regression methods come from statistics and have been adopted into statistical machine learning. This can be confusing, because the word regression may refer both to a class of problems and to a class of algorithms. Regression is really a process. Some example algorithms are:
- Ordinary Least Squares
- Logistic Regression
- Stepwise Regression
- Multivariate adaptive regression splines (MARS)
- Locally estimated scatterplot smoothing (loess)
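As an illustration of the first method in the list, the following is a minimal sketch of ordinary least squares for a single input variable, written with the standard closed-form formulas. The function name and the sample data are assumptions made for this example.

```python
def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_ols(xs, ys)
```

The fitted line minimizes the sum of squared prediction errors, which is the error measure that gives the method its name.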
Instance-Based Method
Instance-based learning models a decision problem with instances, or examples, of training data that are deemed important to the model. Such methods typically build up a database of example data and compare new data against it using a similarity measure in order to find the best match and make a prediction. For this reason, instance-based methods are also called winner-take-all methods and memory-based learning. The focus is on the stored instances and on the similarity measures used between them.
- K-nearest neighbour (KNN)
- Learning vector quantization (LVQ)
- Self-Organizing Map (SOM)
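The idea of comparing new data against stored instances can be sketched with k-nearest neighbours. This is a minimal illustration that assumes Euclidean distance as the similarity measure and a simple majority vote; the training points and labels are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest stored instances."""
    # Sort stored (point, label) pairs by Euclidean distance to the query.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical "database" of labeled instances.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
label_near_a = knn_predict(train, (1.1, 1.0))
label_near_b = knn_predict(train, (5.0, 4.8))
```

Note that there is no separate training step: the stored data is the model, which is why the approach is also described as memory-based.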
Regularization Method
Regularization is an extension made to another method (typically a regression method) that penalizes models for their complexity, favoring simpler models that also tend to generalize better. The regularization methods listed below are popular, powerful, and simple.
- Ridge Regression
- Least absolute shrinkage and selection operator (LASSO)
- Elastic net
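To show how a penalty changes a fitted model, here is a minimal sketch of ridge regression for the simplest possible case: one feature and no intercept. In that case, minimizing the squared error plus lam * w^2 has the closed form w = sum(x*y) / (sum(x*x) + lam). The function name, the data, and the penalty strength are assumptions chosen for illustration.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate for y ~ w * x (single feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # The penalty term lam enlarges the denominator and shrinks w toward 0.
    return sxy / (sxx + lam)

# Illustrative data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_unpenalized = fit_ridge_1d(xs, ys, lam=0.0)
w_ridge = fit_ridge_1d(xs, ys, lam=14.0)
```

With lam set to 0 the result is the ordinary least squares coefficient; increasing lam pulls the coefficient toward zero, which is the sense in which regularization favors simpler models.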
Decision Tree Learning
Decision tree methods build a model of decisions based on the actual values of attributes in the data. Decisions fork in a tree structure until a prediction is made for a given record. Decision trees are trained on data for classification and regression problems.
- Classification and regression tree (CART)
- Iterative dichotomiser 3 (ID3)
- C4.5
- Chi-squared automatic interaction detection (CHAID)
- Decision stump
- Random Forest
- Multivariate adaptive regression splines (MARS)
- Gradient boosting machines (GBM)
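The smallest possible tree, a decision stump, makes a single split on one attribute. The sketch below searches exhaustively for the threshold with the fewest training errors; this brute-force search and the sample data are assumptions for illustration, and full tree learners such as CART or C4.5 use impurity measures and recursive splitting instead.

```python
def fit_stump(xs, labels):
    """Find the threshold on one numeric feature that misclassifies the fewest points.

    The stump predicts `right` when x > threshold and `left` otherwise.
    Returns (errors, threshold, left, right).
    """
    classes = sorted(set(labels))
    best = None
    for threshold in sorted(set(xs)):
        for left in classes:
            for right in classes:
                errors = sum(
                    (right if x > threshold else left) != y
                    for x, y in zip(xs, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, left, right)
    return best

# Hypothetical one-attribute dataset.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
errors, threshold, left, right = fit_stump(xs, labels)
```

A full decision tree repeats this kind of split on each resulting subset of the data until a stopping rule is met.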
Bayesian Methods
Bayesian methods explicitly apply Bayes' theorem to problems such as classification and regression:
- Naive Bayes
- Averaged one-dependence estimators (AODE)
- Bayesian Belief Network (BBN)
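As a small illustration of applying Bayes' theorem to classification, here is a minimal sketch of naive Bayes for categorical features. It assumes add-one (Laplace) smoothing and that every feature takes one of two values (the n_values parameter); the weather-style data and all names are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> count
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict_naive_bayes(model, row, n_values=2):
    """Return the label maximizing P(label) * product of P(value_i | label)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(row):
            # Add-one smoothing; n_values is the assumed number of
            # distinct values each feature can take.
            score *= (value_counts[(i, label)][value] + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get)

rows = [("sunny", "calm"), ("sunny", "windy"), ("rainy", "windy"), ("rainy", "calm")]
labels = ["out", "out", "in", "in"]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("sunny", "calm"))
```

The "naive" part is the assumption that features are independent given the class, which is what lets the per-feature probabilities simply be multiplied.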
Kernel Methods
Kernel methods are best known through the popular support vector machine. They are concerned with mapping input data into a higher-dimensional vector space, where some classification or regression problems are easier to model.
- Support Vector Machines (SVM)
- Radial Basis Function (RBF)
- Linear discriminant analysis (LDA)
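A kernel function measures similarity as if the inputs had been mapped into a higher-dimensional space, without constructing that space explicitly. The sketch below shows the Gaussian radial basis function kernel; the gamma value and the sample points are assumptions for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

same = rbf_kernel((1.0, 2.0), (1.0, 2.0))   # identical points
near = rbf_kernel((1.0, 2.0), (1.0, 3.0))   # squared distance 1
far = rbf_kernel((1.0, 2.0), (4.0, 6.0))    # squared distance 25
```

The value is 1 for identical points and decays toward 0 as points move apart. A support vector machine can use such a function in place of a plain dot product to obtain non-linear decision boundaries.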
Clustering Methods
Clustering, like regression, describes both a class of problems and a class of methods. Clustering methods are typically organized by modeling approach, such as centroid-based and hierarchical. All of them use the structure inherent in the data to organize it into groups of maximum commonality.
- K-means
- Expectation maximisation (EM)
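The centroid-based approach can be sketched with k-means in its usual form (Lloyd's algorithm), which alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The fixed starting centroids, the iteration count, and the sample points are assumptions for illustration; practical implementations usually choose starting centroids at random and stop on convergence.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

No labels are used at any point; the grouping emerges only from the distances between the points.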
Association rule learning
Association rule learning methods extract rules that best explain the relationships observed between variables in the data. These rules can uncover important and commercially useful associations in large multidimensional datasets that an organization or company can exploit.
- Apriori algorithm
- Eclat Algorithm
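The first stage of association rule learning is finding itemsets that occur together often enough. Below is a minimal Apriori-style sketch of that stage only; it does not derive the rules themselves. The basket data, the function name, and the support threshold are assumptions for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets meeting min_support (Apriori-style)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Keep the candidates that occur in enough transactions.
        level = {}
        for c in candidates:
            support = sum(c <= t for t in transactions) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Build larger candidates only from itemsets that were frequent,
        # because any superset of an infrequent itemset is also infrequent.
        size += 1
        survivors = sorted({item for c in level for item in c})
        candidates = [
            frozenset(combo) for combo in combinations(survivors, size)
            if all(frozenset(sub) in level for sub in combinations(combo, size - 1))
        ]
    return frequent

# Hypothetical shopping baskets.
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
result = frequent_itemsets(transactions, min_support=0.5)
```

Rules such as "bread implies milk" would then be scored from these supports, for example by dividing the support of the pair by the support of "bread".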
Artificial Neural Network
Artificial neural network models are inspired by the structure and function of biological neural networks. They are a class of pattern-matching methods commonly used for regression and classification problems, and they form a very large subfield comprising hundreds of algorithms and variations for all kinds of problem types. Some of the classic, popular methods are:
- Perceptron
- Back-Propagation
- Hopfield network
- Self-Organizing Map (SOM)
- Learning vector quantization (LVQ)
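The perceptron, the first method in the list above, is small enough to sketch in full. It also shows the supervised pattern of correcting the model whenever a prediction is wrong. The learning rate, the epoch count, and the choice of logical AND as training data are assumptions for illustration.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train a single perceptron with the classic error-correction rule."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction  # 0 when correct, +1 or -1 when wrong
            # The weights change only when the prediction was wrong.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, inputs):
    """Threshold the weighted sum of the inputs."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# Logical AND is linearly separable, so a single perceptron can learn it.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
outputs = [predict(weights, bias, x) for x, _ in and_samples]
```

A single perceptron can only separate classes with a straight line or plane. Multi-layer networks trained with back-propagation remove that restriction.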
Deep Learning
Deep learning methods are a modern update to artificial neural networks that exploit abundant, inexpensive computation to build much larger and more complex neural networks. Many methods are concerned with semi-supervised learning problems, where large datasets contain very little labeled data.
- Restricted Boltzmann Machine (RBM)
- Deep belief networks (DBN)
- Convolutional Network
- Stacked auto-encoders
Dimensionality Reduction Method
Like clustering methods, dimensionality reduction seeks and exploits the inherent structure in the data, but in this case it does so in an unsupervised manner in order to summarize or describe the data using less information. This is useful for visualizing high-dimensional data or for simplifying data that is then used in a supervised learning method.
- Principal Component Analysis (PCA)
- Partial Least Squares Regression (PLS)
- Sammon Mapping
- Multidimensional scaling (MDS)
- Projection Pursuit
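To illustrate describing data with less information, the sketch below estimates the first principal component of two-dimensional data and projects each point onto it, which turns two coordinates into one. It uses power iteration on the covariance matrix as a simple stand-in for a full eigendecomposition; the function name, iteration count, and sample points are assumptions for this example.

```python
import math

def first_principal_component(points, iterations=100):
    """Estimate the top principal direction of 2-D data by power iteration."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centred) / (n - 1)
    cyy = sum(y * y for _, y in centred) / (n - 1)
    cxy = sum(x * y for x, y in centred) / (n - 1)
    # Repeatedly multiplying by the covariance matrix converges to the
    # eigenvector with the largest eigenvalue.
    v = (1.0, 0.0)
    for _ in range(iterations):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v, centred

# Illustrative points scattered close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, centred = first_principal_component(points)
# Project each centred 2-D point onto the component to get a 1-D value.
projected = [x * direction[0] + y * direction[1] for x, y in centred]
```

Because the points lie close to a line, the single projected value keeps almost all of the variation in the original two coordinates.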
Ensemble Methods
Ensemble methods are models composed of multiple weaker models that are trained independently and whose predictions are combined in some way to make the overall prediction. Much effort goes into deciding which types of weak learners to combine and how to combine them. This is a very powerful and popular class of techniques:
- Boosting
- Bootstrapped aggregation (bagging)
- AdaBoost
- Stacked generalization (blending)
- Gradient boosting machines (GBM)
- Random Forest
Figure: the weak learners are shown in gray and the combined prediction in red. The data shown are temperature/ozone measurements.
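The combination step of an ensemble can be sketched with a simple majority vote. This example covers only how predictions are combined, not how the weak models are trained; in bagging they would each be fitted to a different bootstrap sample. The three threshold rules below are hypothetical weak models chosen for illustration.

```python
from collections import Counter

def majority_vote(models, x):
    """Combine weak classifiers by returning the most common prediction."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Three hypothetical weak classifiers, each a single threshold rule
# that places the decision boundary in a slightly different spot.
weak_models = [
    lambda x: x > 4.0,
    lambda x: x > 5.0,
    lambda x: x > 6.0,
]

vote_low = majority_vote(weak_models, 4.5)    # only the first model says True
vote_high = majority_vote(weak_models, 5.5)   # two of the three say True
```

With three voters the ensemble behaves like the middle threshold, so an individual model that sets its boundary too low or too high is outvoted by the other two.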