TensorFlow integrates and implements a variety of machine learning algorithms that can be called directly.
Supervised learning
1) Decision Trees
A decision tree is a tree structure that provides a basis for decision making. It can be used to answer yes-or-no questions: the tree lays out the possible situations along its branches, each branch representing a choice (yes or no), until all choices have been made and a final answer is reached.
A decision tree can be a binary or a non-binary tree. When a decision tree is actually constructed, pruning is usually performed to deal with overfitting caused by noise and outliers in the data. There are two types of pruning:
Pre-pruning: during construction, when a node satisfies the pruning condition, building that branch is stopped immediately.
Post-pruning: the complete decision tree is built first and then pruned by traversing the tree according to certain conditions.
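The text gives no code for this section; the following is a minimal sketch of the two pruning styles using scikit-learn and its built-in iris dataset (both are illustrative assumptions, not the book's TensorFlow implementation).

```python
# Minimal decision-tree sketch with scikit-learn (assumed library).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Pre-pruning": stop growing branches early by limiting depth / leaf size.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X_train, y_train)

# "Post-pruning": grow the full tree, then prune via cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X_train, y_train)

print("pre-pruned accuracy :", pre_pruned.score(X_test, y_test))
print("post-pruned accuracy:", post_pruned.score(X_test, y_test))
```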
2) Naive Bayes Classifier (Naive Bayesian Model, NBM)
The naive Bayes classifier is based on Bayes' theorem and its "naive" assumption (that the features are conditionally independent of one another). It is mainly used to solve classification problems.
By Bayes' theorem, P(A|B) = P(B|A) × P(A) / P(B), where P(A|B) is the posterior probability (the value we want to predict), P(B|A) is the likelihood, P(A) is the prior probability, and P(B) is the probability of the observed data.
Typical applications include: marking an e-mail as spam or not spam; classifying news articles as technology, politics, or sports; checking whether a piece of text expresses positive or negative sentiment; and face recognition software.
Anyone who has studied probability will know Bayes' theorem, an algorithm published more than 250 years ago that still holds an unparalleled position in the field of information science. Bayesian classification is a general term for a family of classification algorithms, all of which are based on Bayes' theorem, hence the collective name. The naive Bayes algorithm is one of the most widely used of these classifiers. It rests on a simple hypothesis: given the target value, the attributes are conditionally independent of one another.
From the theorem above and the "naive" assumption, we have:
P(Category | Document) = P(Document | Category) × P(Category) / P(Document)
For example, given a piece of text, return its sentiment classification: is the attitude of the text positive or negative?
To solve this problem, we only need to look at some of the words it contains.
The text is then represented simply by those words and their counts.
The original question is: given a document, which category does it belong to?
Applying Bayes' rule turns it into a question that is much easier to answer.
The question becomes: what is the probability of this document appearing in the given category? (And, of course, do not forget the other two probabilities in the formula.)
For example, the probability that the word "love" appears in positive documents might be 0.1, while the probability that it appears in negative documents might be 0.001.
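To make the word-count idea concrete, here is a minimal sketch using scikit-learn and a tiny made-up corpus (both the library choice and the data are illustrative assumptions, not the book's own code or dataset):

```python
# Tiny naive Bayes sentiment sketch with scikit-learn (assumed library);
# the documents and labels below are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["I love this movie", "what a great film", "I hate this movie", "terrible and boring"]
labels = ["positive", "positive", "negative", "negative"]

vectorizer = CountVectorizer()           # represent each text by its word counts
X = vectorizer.fit_transform(docs)

model = MultinomialNB().fit(X, labels)   # estimates P(word | class) and P(class)

test = vectorizer.transform(["I love this film"])
print(model.predict(test))               # expected: ['positive']
print(model.predict_proba(test))         # P(class | document) for each class
```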
A detailed explanation of the naive Bayes classification algorithm is given later in the book.
3) Least Squares
If you know something about statistics, you have surely heard of linear regression. Least squares is one method for fitting a linear regression. As shown in the figure, there is a set of points in the plane, and we want to draw a line that fits their distribution as closely as possible; this is linear regression. There are many ways to find such a line, and the least squares method is one of them. The principle of least squares is to find the line that minimizes the sum of the squared errors (the distances from all of the points in the plane to the line); that line is the one we are looking for.
Least squares (also known as the method of least squares) is a mathematical optimization technique. It finds the best-fitting function for the data by minimizing the sum of squared errors. With least squares, unknown parameters can be obtained easily, and the sum of squared errors between the fitted values and the actual data is minimized. Least squares can also be used for curve fitting, and some other optimization problems can likewise be expressed through least squares by minimizing an energy or maximizing an entropy.
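As a minimal sketch of fitting a line by least squares, the following uses NumPy on synthetic data (the data and the use of NumPy's solver are illustrative assumptions):

```python
# Least-squares line fit with NumPy on synthetic data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)   # noisy points around y = 2x + 1

# Solve min ||A w - y||^2 for w = (slope, intercept).
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), residuals, *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"fitted line: y = {slope:.2f} x + {intercept:.2f}")
```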
4) Logistic Regression
The logistic regression model is a binary classification model. It classifies samples by choosing features and weights, and uses the logistic (sigmoid) function to compute the probability that a sample belongs to a certain class. That is, a sample belongs to one class with a certain probability and to the other class with the remaining probability, and it is assigned to the class with the higher probability. In other words, logistic regression estimates how likely something is.
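The following minimal sketch shows the idea of the sigmoid turning a weighted sum of features into a probability; the use of scikit-learn and the synthetic data are assumptions, not the book's code:

```python
# Logistic-regression sketch: the sigmoid maps a weighted sum of features
# to a probability in (0, 1). scikit-learn and synthetic data are assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # a simple linearly separable rule

clf = LogisticRegression().fit(X, y)
sample = np.array([[1.0, 0.5]])
print("P(class=1):", clf.predict_proba(sample)[0, 1])
# Same probability computed by hand from the learned weights:
print("by hand    :", sigmoid(sample @ clf.coef_.T + clf.intercept_)[0, 0])
```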
5) Support Vector Machine (SVM)
The support vector machine (SVM) is a binary classification algorithm. It finds an (N-1)-dimensional hyperplane in N-dimensional space that separates the points into two categories. In other words, if two classes of points in the plane are linearly separable, SVM can find the optimal line to separate them. SVM has a wide range of applications.
To separate the two categories, we want to find a hyperplane. The best hyperplane is the one that maximizes the margin to the two classes, where the margin is the distance from the hyperplane to its nearest point. In the figure, Z2 > Z1, so the green hyperplane is the better one.
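As a minimal sketch of a maximum-margin linear classifier, the following uses scikit-learn's SVC on synthetic 2-D data (both are assumptions, not the book's TensorFlow code):

```python
# Linear SVM sketch with scikit-learn (assumed library) on synthetic 2-D data:
# the classifier looks for the separating line with the largest margin.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2, size=(50, 2)), rng.normal(loc=2, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("hyperplane normal w:", clf.coef_[0])
print("bias b             :", clf.intercept_[0])
print("support vectors    :", len(clf.support_vectors_))   # points that define the margin
```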
6) K-Nearest Neighbors Algorithm (KNN, k-Nearest Neighbor)
The k-nearest neighbor (KNN) classification algorithm is one of the simplest methods in data-mining classification. The core idea of KNN is that if the majority of the k nearest samples of a point in feature space belong to a certain category, then that point also belongs to this category and shares the characteristics of the samples in it. In making a classification decision, the method determines the category of the sample to be classified based only on the category of the one or more nearest samples; KNN therefore relies on only a very small number of neighboring samples. Because KNN depends mainly on this limited set of surrounding samples, rather than on a method of discriminating class domains, to determine the category, it is more suitable than other methods for sample sets whose class domains overlap considerably.
The main application area is the recognition of unknown objects, that is, judging which class an unknown object belongs to. The idea is to judge, based on Euclidean distance, which class of known objects the features of the unknown object are closest to. For example, in the figure, which class should the green circle be assigned to, the red triangles or the blue squares? If k = 3, the red triangles make up 2/3 of the neighbors, so the green circle is assigned to the red-triangle class; if k = 5, the blue squares make up 3/5, so the green circle is assigned to the blue-square class. This also shows that the result of the KNN algorithm depends largely on the choice of k.
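The following minimal sketch mirrors the k = 3 vs. k = 5 example above with scikit-learn on synthetic data (library and data are illustrative assumptions):

```python
# k-NN sketch with scikit-learn (assumed library): the predicted class can
# change with k, as in the k=3 vs. k=5 example above. Data is synthetic.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=0.0, size=(30, 2)), rng.normal(loc=2.5, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)           # 0 = "red triangles", 1 = "blue squares"

query = np.array([[1.2, 1.2]])              # the "green circle" to classify
for k in (3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"k={k}: predicted class {knn.predict(query)[0]}")
```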
7) Ensemble Learning
Ensemble learning combines many classifiers, each with its own weight, and merges their classification results into the final classification result. The earliest ensemble method was Bayesian averaging.
An ensemble algorithm independently trains a number of relatively weak learning models on the same samples and then integrates their results to make an overall prediction. The main difficulty is deciding which independent weak models to use and how to integrate their results. This is a very powerful family of algorithms and also a very popular one; a minimal sketch follows the list of advantages below. Common algorithms include Boosting, Bootstrapped Aggregation (Bagging), AdaBoost, Stacked Generalization (Stacking/Blending), Gradient Boosting Machine (GBM), and Random Forest.
So how do ensemble methods work, and why are they better than a single model?
- They average out biases: if you average a Democratic-leaning poll with a Republican-leaning poll, you get a neutral result with no lean in either direction.
- They reduce variance: the aggregate result of a group of models is less noisy than the result of a single model. In finance this is called diversification: a portfolio of many stocks fluctuates less than a single stock. This is also why a model improves with more data points rather than fewer.
- They are less likely to overfit: if the individual models have not overfit, and you combine their predictions in a simple way (averaging, weighted averaging, logistic regression), then there is no room for overfitting.
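As the sketch promised above, here is a minimal comparison of a single tree against a bagged ensemble (a random forest), using scikit-learn and synthetic data; both choices are illustrative assumptions:

```python
# Ensemble sketch with scikit-learn (assumed library): a bagged forest of
# trees usually has lower variance than a single deep tree. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("single tree  :", single_tree.score(X_test, y_test))
print("random forest:", forest.score(X_test, y_test))
```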
Unsupervised learning
8) Clustering algorithm
A clustering algorithm processes a set of data and groups the data into clusters according to their similarity.
Clustering, like regression, sometimes describes a kind of problem and sometimes describes a class of algorithms. Clustering algorithms typically group the input data either around central points or in a hierarchical manner; they try to find the intrinsic structure of the data in order to group it by its greatest commonalities. Common clustering algorithms include the k-means algorithm and the expectation-maximization (EM) algorithm.
There are many kinds of clustering algorithms, for example: centroid-based clustering, connectivity-based clustering, density-based clustering, probabilistic clustering, dimensionality reduction, and neural networks / deep learning.
9) K-Means Algorithm (k-means)
The k-means algorithm is a hard clustering algorithm and a typical representative of prototype-based objective-function clustering methods. It takes a certain distance from each data point to the prototypes (cluster centers) as the objective function to be optimized, and derives its iterative adjustment rules from the conditions for this function's extremum. The k-means algorithm uses Euclidean distance as the similarity measure: it seeks the optimal classification corresponding to an initial cluster-center vector V such that the evaluation index J is minimized, using the sum-of-squared-errors criterion as the clustering criterion function. K-means is a typical distance-based clustering algorithm: distance serves as the measure of similarity, so the closer two objects are, the more similar they are. The algorithm regards clusters as being composed of objects that are close together, and takes obtaining compact, well-separated clusters as its ultimate goal.
In general, clustering is defined in terms of some distance or similarity between samples: similar (or close) samples are grouped into the same class, and dissimilar (or distant) samples are placed into other classes.
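As a minimal sketch of k-means, the following uses scikit-learn on synthetic 2-D blobs (library and data are assumptions, not the book's code):

```python
# K-means sketch with scikit-learn (assumed library) on synthetic 2-D blobs;
# the algorithm minimizes the within-cluster sum of squared distances (J).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centers:\n", kmeans.cluster_centers_)
print("J (inertia, sum of squared distances):", kmeans.inertia_)
```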
10) Principal Component Analysis (PCA)
Principal component analysis finds the principal components by applying an orthogonal transformation that converts possibly correlated variables into linearly uncorrelated ones. The best-known applications of PCA are feature extraction and dimensionality reduction in face recognition.
PCA is mainly used for data compression and simplification, making the data easier to learn from and to visualize. But PCA has limitations: it requires domain knowledge, and it is not suitable for very noisy data.
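The following minimal sketch reduces a 4-dimensional dataset to 2 dimensions with scikit-learn's PCA (the library and the iris dataset are illustrative assumptions):

```python
# PCA sketch with scikit-learn (assumed library): project correlated data
# onto a few orthogonal, linearly uncorrelated principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)          # 4-D data compressed to 2-D
print("explained variance ratio:", pca.explained_variance_ratio_)
print("reduced shape:", X_reduced.shape)
```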
11) SVD Matrix Decomposition (Singular Value Decomposition)
Singular value decomposition (SVD) is an important matrix decomposition in linear algebra and is, in matrix analysis, the generalization of the unitary diagonalization of normal matrices. It has important applications in signal processing, statistics, and other fields. SVD applies to any real or complex matrix: given an m-row, n-column matrix M, it can be decomposed as M = UΣV*, where U and V are unitary matrices and Σ is a diagonal matrix.
PCA can in fact be viewed as a simplified version of SVD. In computer vision, the earliest face recognition algorithms were based on PCA and SVD: facial features were extracted, the dimensionality was reduced, and faces were then matched. Although face recognition methods are much more complex today, the basic principle is similar.
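As a minimal sketch of the decomposition M = UΣV* described above, the following uses NumPy on an arbitrary small matrix (the matrix is purely illustrative):

```python
# SVD sketch with NumPy: decompose M into U * diag(s) * Vh and verify the
# reconstruction. The matrix below is arbitrary, for illustration only.
import numpy as np

M = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])          # a 2 x 3 matrix

U, s, Vh = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)

print("U:\n", U)
print("singular values:", s)
print("reconstruction error:", np.linalg.norm(M - U @ Sigma @ Vh))
```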
12) Independent Component Analysis (ICA)
Independent component analysis (ICA) is a statistical technique used to uncover hidden factors underlying random variables. ICA defines a generative model for the observed data: the data variables are assumed to be linear mixtures of latent variables, produced by an unknown mixing system. The latent factors are assumed to be non-Gaussian and mutually independent, and they are called the independent components of the observed data.
ICA is related to PCA, but it is more effective at discovering the underlying factors. It can be applied to digital images, document databases, economic indicators, psychometric measurements, and so on.
The figure shows an ICA-based face recognition model. In fact, these machine learning algorithms are not all as complicated as you might imagine; some of them are closely related to high-school mathematics.
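As a minimal sketch of the ICA idea described above (recovering independent, non-Gaussian sources from an unknown linear mixture), the following uses scikit-learn's FastICA on synthetic signals; both the library and the signals are illustrative assumptions:

```python
# ICA sketch with scikit-learn's FastICA (assumed library): recover two
# independent source signals from their unknown linear mixture.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # source 1: sine wave
s2 = np.sign(np.sin(3 * t))                # source 2: square wave (non-Gaussian)
S = np.column_stack([s1, s2])

A = np.array([[1.0, 0.5], [0.5, 2.0]])     # unknown mixing matrix
X = S @ A.T                                # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)         # estimated independent components
print("estimated sources shape:", S_estimated.shape)
```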
We will explain these common algorithms in detail in the following sections.
Reinforcement learning
13) Q-Learning Algorithm
Q-learning solves the following problem: how can an autonomous agent that perceives its environment learn to choose the optimal actions to achieve its goals?
The purpose of reinforcement learning is to construct a control strategy that maximizes the performance of the agent's behavior. The agent perceives information from a complex environment, processes it, improves its own performance through learning, and chooses behaviors accordingly; these choices lead the agent to decide on a certain action, which in turn affects the environment. Reinforcement learning grew out of research on animal learning, stochastic approximation, and optimal control. It is a tutor-free, online learning technique that learns a mapping from environment states to actions so that the agent adopts the policy with the largest reward value. The agent perceives state information in the environment, searches over policies (to find which policy yields the most effective learning), and chooses the optimal action; this causes the state to change and yields a delayed reward value. The agent then updates its evaluation function, completes one round of learning, enters the next round of training, and repeats this loop until the overall learning condition is satisfied and learning terminates.
Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action-selection policy for any given (finite) Markov decision process (MDP). It learns an action-value function that gives the expected utility of taking a given action in a given state and then following the optimal policy thereafter. A policy is the rule the agent follows when selecting actions. Once this action-value function has been learned, the optimal policy can be constructed simply by selecting the action with the highest value in each state. One of the advantages of Q-learning is that it can compare the expected utility of the available actions without requiring a model of the environment. In addition, Q-learning can handle problems with stochastic transitions and rewards without any special adaptation. It has been shown that for any finite MDP, Q-learning eventually finds an optimal policy, in the sense that the expected value of the total reward over all successive steps, starting from the current state, is the maximum achievable.
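The following is a minimal sketch of tabular Q-learning using only NumPy. The tiny corridor environment, the reward scheme, and the hyperparameters are all illustrative assumptions, not part of the source text:

```python
# Tabular Q-learning sketch (NumPy only) on a tiny 1-D corridor: states
# 0..4, actions left/right, reward 1 only for reaching state 4.
import numpy as np

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # learning rate, discount, exploration

rng = np.random.default_rng(0)

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print("learned Q-table:\n", Q)
print("greedy policy (0=left, 1=right):", np.argmax(Q, axis=1))
```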