What mathematical foundations are used in machine learning
In the first part, let's take a look at the mathematical foundations of machine learning. We can start with an expert's definition. The expert is Pedro Domingos of the University of Washington, a veteran in the field of artificial intelligence. He defined machine learning as being composed of three parts: representation, evaluation, and optimization. These three steps correspond to the mathematics needed in machine learning.
The machine learning trilogy
Representation
In the representation step, we need to build an abstract model of the data and the actual problem, so it includes two aspects. On the one hand, we have to abstract the actual problem to be solved. For example, suppose we want to design an algorithm to determine whether an email is spam or not. The result can only be one of two values, yes or no, so abstracted this way it is a binary classification problem. We can define "yes, it is spam" as 1 and "no, it is not" as 0, and the ultimate solution to the problem is then to output a result of 0 or 1. Of course, the meanings of 0 and 1 can also be reversed; letting 0 mean spam and 1 mean not spam works just as well. So in the process of representation, the first task is to abstract a physical problem in the real world into a mathematical problem. After abstracting the problem, we still have to solve it further, which means we also have to represent the data.
After abstracting the problem, we have to abstract the data. How do we determine whether an email is spam? We judge it based on its characteristics: for example, whether the email contains keywords about promotions or about some product. We have to express these features, these keywords, as a feature vector or in some other form. Whether expressed as a vector or in another form, this is an abstraction of the data.
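The keyword-based representation described here can be sketched as a tiny bag-of-words encoder. The vocabulary and the simple substring check below are illustrative assumptions, not a real spam featurizer:

```python
# Illustrative sketch: represent an email as a 0/1 keyword-presence vector.
# VOCAB is a made-up list of "spam-like" keywords, chosen for the example.
VOCAB = ["promotion", "discount", "product", "free", "winner"]

def email_to_vector(text):
    """Map raw email text to a binary feature vector over VOCAB."""
    lowered = text.lower()
    return [1 if keyword in lowered else 0 for keyword in VOCAB]

print(email_to_vector("Big promotion: free product for every winner"))
# [1, 0, 1, 1, 1]
```

A classifier then works on these vectors instead of the raw text, which is exactly the abstraction of data the representation step calls for.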
So in the representation stage, what we need to build is an abstract model of the data and the problem. Once this model is built, we then look for a reasonable algorithm.
K-nearest neighbor algorithm. A common algorithm in machine learning is the K-nearest neighbor algorithm. It is not covered in our column because it is too simple: for a given sample point, find its K nearest neighbors, and classify the sample according to the principle that the minority obeys the majority. That is the K-nearest neighbor algorithm.
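The majority-vote idea fits in a few lines. The 2-D toy data and k = 3 below are arbitrary choices for illustration, not a tuned implementation:

```python
# Minimal k-nearest-neighbor sketch: find the k closest training points
# (Euclidean distance) and take a majority vote among their labels.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs; query: an (x, y) point."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # "a": all 3 nearest neighbors are "a"
```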
Regression model. There are also statistical learning methods such as linear regression: we can build a linear regression model, or, for binary classification, a logistic regression model.
Decision tree. There are also methods like decision trees. A decision tree does not depend on the data; it is a top-down design. Linear regression and logistic regression derive the model from the data in reverse, while a decision tree directly uses the model to partition the data. The two directions are different.
SVM. Finally, there are purely mathematical methods such as the support vector machine (SVM). So in the representation part, we need to abstract both the problem and the data, and for that we need abstraction tools.
Evaluation
Given a model, how do we evaluate its quality? At this point, an objective function needs to be set to evaluate the model's performance.
Setting an objective function
The objective function can take many forms. For the spam problem we discussed, we can define an error rate. For example, if an email was not spam but our algorithm misjudged it as spam, that is an error. The error rate is therefore a commonly used indicator, or a commonly used objective function, in classification problems.
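Concretely, the error rate is just the fraction of predictions that disagree with the true labels. The labels below are made up for illustration:

```python
# Error rate as an objective function: fraction of mislabeled examples.
def error_rate(y_true, y_pred):
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

y_true = [0, 0, 1, 1, 0]   # 1 = spam, 0 = not spam (toy labels)
y_pred = [0, 1, 1, 1, 0]   # one non-spam email misjudged as spam
print(error_rate(y_true, y_pred))  # 0.2
```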
Minimum mean square error and maximum posterior probability
In regression, we use a common objective function such as the minimum mean square error, especially in linear regression. In addition, there are the maximum posterior probability and some other indicators.
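The mean square error objective mentioned here can be computed directly; the numbers are toy values for illustration:

```python
# Mean squared error: average of the squared differences between the
# true values and the model's predictions.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.5]))  # (0 + 0.25 + 0.25) / 3
```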
Optimization
With the objective function set, we need to solve for the optimal value of the objective function under the model. What is the minimum error rate, or the minimum mean square error, that this model can achieve? We need a specific value; without it, how could we compare different models? Therefore, the function of the optimization step is to solve for the optimum of the objective function under the model, and to see how well the model can do on this problem.
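A minimal sketch of this optimization step, assuming a one-parameter linear model y = w*x fit by gradient descent on the mean squared error. The data, learning rate, and iteration count are all illustrative:

```python
# Gradient descent sketch: minimize mean((w*x - y)^2) over the single
# parameter w. Toy data roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w, lr = 0.0, 0.01
for _ in range(500):
    # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

# The least-squares optimum is sum(x*y) / sum(x*x) = 59.7 / 30 = 1.99.
print(round(w, 2))  # 1.99
```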
In summary, the machine learning trilogy summarized by Professor Domingos consists of the three steps of representation, evaluation, and optimization. In these three steps, we will use different mathematical tools to solve different problems.
Three mathematical tools
Linear algebra
In these three steps, three different tools are applied. In the representation step, the main tool we use is linear algebra. As we mentioned in this column, one of the main functions of linear algebra is to transform concrete things into abstract mathematical models: no matter how complicated the world is, we can transform it into a vector or a matrix. This is the main function of linear algebra.
Therefore, the linear algebra used in solving the representation problem mainly includes two parts. One is linear space theory, that is, what we call vectors, matrices, and transformations. The other is matrix analysis: given a matrix, we can perform singular value decomposition (SVD) or other kinds of analysis. These two parts together constitute the linear algebra needed in machine learning, and each has its own emphasis: linear space theory is mainly applied in solving theoretical problems, while matrix analysis is used both in theory and in practice.
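The singular value decomposition mentioned here can be demonstrated with NumPy; the matrix below is an arbitrary example:

```python
# SVD sketch: any matrix A factors as U @ diag(s) @ Vt, where s holds the
# singular values in descending order. Reconstructing recovers A exactly.
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

A_rec = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rec))  # True
print(s[0] >= s[1] >= 0)      # True: singular values sorted, nonnegative
```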
Probability statistics
We said that linear algebra works in the process of representation. In the evaluation process, we need to use probability and statistics, which includes two aspects: one is mathematical statistics, the other is probability theory.
Mathematical statistics is easy to understand: many models used in machine learning are derived from it. For example, the simplest linear regression and logistic regression actually come from statistics. After the objective function is given, when we actually evaluate it, we will use some probability theory. For example, given a distribution, we may want to find the expected value of the objective function: in an average sense, how well can this objective do? Probability theory is needed at this point. So in the evaluation process, we mainly apply knowledge of probability and statistics.
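For instance, the expected value of an objective under a given distribution can be estimated by sampling. This sketch assumes a standard Gaussian input and a made-up squared-error objective, for which the exact expectation is E[(X-1)^2] = Var(X) + 1 = 2:

```python
# Monte Carlo sketch: estimate the expected value of an objective
# function when the input follows a given distribution (here N(0, 1)).
import random

random.seed(0)

def objective(x):
    return (x - 1.0) ** 2   # hypothetical loss around a target of 1.0

samples = [objective(random.gauss(0.0, 1.0)) for _ in range(100_000)]
estimate = sum(samples) / len(samples)
print(round(estimate, 1))  # 2.0, matching the exact expectation
```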
In fact, when we evaluate a model with mathematical statistics, we not only pay attention to the objective function; we may also pay attention to some of its statistical characteristics, such as its confidence or other indicators. Once the model is established, how reliable is it? These were also considerations in early machine learning algorithms. Of course, with the rise of neural networks and deep learning, this part of the content has gradually declined, or been gradually ignored. In a neural network, you may only need to achieve a good value of the objective function and good indicators; as for confidence, it is usually not considered.
This is also one of the reasons why deep learning is not popular with people trained in mathematics or statistics. What does statistics emphasize? Interpretability. Whatever indicators your model achieves, we can clearly explain why it achieves them: what is the principle, and what is the basis behind it? Given a distribution, say a Gaussian distribution, and given a model, I can use rigorous and concise mathematical derivation to present the result in the form of a formula. This looks elegant and clear. But neural networks and deep learning have not yet reached such a level of interpretability. That is the main reason some people criticize deep learning as alchemy: I can obtain a better result by tuning the parameters, but why does this result appear, and what factors affect it? It may not be so clear. So probability and statistics are mainly applied in the evaluation process.
Optimization theory
Regarding optimization, needless to say, we must use optimization theory. Within optimization theory, the main research direction is convex optimization.
Convex optimization certainly has its limitations, but what are its benefits? It simplifies the solution of the problem. In optimization, what we seek is a maximum or a minimum, but in practice we may encounter local maxima, local minima, and saddle points. Convex optimization avoids this problem: in a convex problem, a local optimum is guaranteed to be the global optimum.
But in practice, especially after the introduction of neural networks and deep learning, the scope of application of convex optimization has become narrower and narrower, and in many cases it is no longer applicable. So we mainly use unconstrained optimization: we solve over the entire input range, without setting additional constraints on the parameters or inputs. At the same time, one of the most widely used optimization methods in neural networks is back propagation.
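A minimal back-propagation sketch, assuming a two-weight network (one hidden sigmoid unit feeding one sigmoid output) trained on a single made-up data point; real networks differ mainly in scale:

```python
# Tiny backpropagation sketch: forward pass, then the chain rule back to
# the two weights. Data, initial weights, and learning rate are illustrative.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 0.0
w1, w2 = 0.5, 0.5
y0 = sigmoid(w2 * sigmoid(w1 * x))   # output before training

for _ in range(200):
    # forward pass
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    # backward pass: d(loss)/d(output pre-activation) for loss (y - target)^2
    dy = 2 * (y - target) * y * (1 - y)
    g2 = dy * h                        # gradient for w2
    g1 = dy * w2 * h * (1 - h) * x     # gradient for w1, via the chain rule
    w2 -= 0.5 * g2
    w1 -= 0.5 * g1

y_final = sigmoid(w2 * sigmoid(w1 * x))
print(y_final < y0)  # True: the output moved toward the target of 0
```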
There is no one-to-one correspondence between the three mathematical tools and the three steps
So what basic mathematics does machine learning use? These three: linear algebra, probability and statistics, and optimization theory. These are the most basic mathematical tools we use in machine learning. Roughly classified, they correspond to the three steps of machine learning: representation, evaluation, and optimization.
Of course, this correspondence is not one-to-one. It is not the case that representation uses only linear algebra with no probability and statistics at all, or that evaluation involves no linear algebra. There is crossover between the steps, but the main tool at each step is different.
Advanced mathematics is the foundation of mathematical tools
Of course, among these mathematical tools we did not include advanced mathematics; we regard it as a foundation, a foundation within the foundation. Not only in artificial intelligence or machine learning: wherever mathematics is involved, we need the foundation of advanced mathematics. Specifically in machine learning, the parts of advanced mathematics we use most may include derivatives and differentiation. There are also integrals, which we may encounter when solving for the expected value of an objective function.
So at this point, we have introduced what mathematics is used in machine learning: mainly these three parts, linear algebra, probability and statistics, and optimization.