Tom Mitchell defines machine learning as follows: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Quite neat, almost poetic. Simply put, a program learns from experience E to improve its performance on task T, where the performance is measured by P.
Machine learning is divided into supervised learning, unsupervised learning, reinforcement learning, and recommender systems.

Supervised learning means the correct answers are known during training. For example, to teach a child to classify fruit, you first give him an apple and tell him it is an apple; then you give him another apple and tell him again, this is an apple. After such training, if you give him an apple and ask, "What is this?", he should tell you it is an apple. If you give him a pear, he should tell you it is not an apple. Supervised learning is further divided into two types: regression and classification. If the output of the learning algorithm is a continuous value, it is a regression problem; if the output is a discrete value, it is a classification problem.

Unlike supervised learning, in unsupervised learning the correct answers are not known during training. Continuing the example above: give the child a pile of fruit, such as apples, oranges, and pears. At first the child does not know what the fruits are, so let him group them on his own. After he has grouped them, hand him an apple; he should put it in the pile of apples he just formed. The most common technique in unsupervised learning is clustering.

In reinforcement learning, as the program runs we evaluate its behavior, and the evaluations come in two kinds, positive and negative. By learning from these evaluations, the program should come to take actions that are more likely to be evaluated positively.
This time, we will focus on univariate (single-variable) linear regression. First, several definitions:
m: the number of samples in the training set.
x: the input variable, that is, the feature. In general there can be multiple input features; in univariate linear regression there is only one.
y: the output variable, that is, the target value.
(x, y): a training sample.
(x^(i), y^(i)): the i-th training sample.
The objective of linear regression is to learn, from a given set of training samples, a linear hypothesis function h that can then be used to predict correct outputs for new, unseen inputs. The hypothesis function is written as:

h(x) = θ₀ + θ₁x
Our goal is to find appropriate values of θ₀ and θ₁.
How do we know whether the hypothesis function is any good?
We need a cost function to measure the gap between the predicted values and the target values: the smaller the gap, the better the hypothesis fits the data. The cost function is:

J(θ₀, θ₁) = (1/2m) Σ_{i=1..m} (h(x^(i)) − y^(i))²
Here h(x^(i)) is the predicted value and y^(i) is the target value of the i-th sample. When θ₀ is held fixed, J is an upward-opening parabola in the single variable θ₁; when both θ₀ and θ₁ vary, J is a three-dimensional surface over (θ₀, θ₁). Our goal is to minimize the cost function:

minimize over (θ₀, θ₁) of J(θ₀, θ₁)
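As a sanity check, the cost function above can be written out directly. This is a minimal sketch; the function name `cost` and the example data are my own, not from the course:

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)
    where h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# A hypothesis that fits the data exactly has zero cost:
perfect = cost(0.0, 1.0, [1, 2, 3], [1, 2, 3])   # h(x) = x on points of y = x
```

A hypothesis that misses the targets yields a positive cost, and the further the predictions are from the targets, the larger J becomes.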
How do we find the parameters of the hypothesis function?
We use gradient descent. It is like walking down a hill: take a step downhill at a certain pace, and repeat until you reach the bottom.
The algorithm is as follows:

Repeat until convergence {
    θⱼ := θⱼ − α · ∂J(θ₀, θ₁)/∂θⱼ    (for j = 0 and j = 1)    (1)
}
The positive number α is the learning rate, that is, the size of the step we take on each update. The larger α is, the faster the learning; the smaller it is, the slower. θ₀ and θ₁ must be updated simultaneously, that is:

temp0 := θ₀ − α · ∂J(θ₀, θ₁)/∂θ₀
temp1 := θ₁ − α · ∂J(θ₀, θ₁)/∂θ₁
θ₀ := temp0
θ₁ := temp1
Let's look at the geometric meaning of gradient descent. To simplify the problem, fix θ₀ so that the cost function J is an upward-opening parabola in the single variable θ₁; the derivative of J at a point is then the slope of the tangent to the parabola at that point. When the slope is positive, the subtracted term in formula (1) is positive, so θ₁ becomes smaller and smaller, and the cost function moves toward the axis of the parabola, that is, toward the minimum. Similarly, when the slope is negative, the subtracted term is negative; subtracting a negative number means adding a positive one, so θ₁ becomes larger and larger, and again the cost function moves toward the axis of the parabola, that is, toward the minimum.
Now consider the effect of α. When α is too small, each update is tiny and gradient descent runs slowly. When α is too large, gradient descent may overshoot the target (the minimum), fail to converge, or even diverge.
Even with a fixed α, gradient descent will still converge to a local minimum, because as it approaches the minimum, the slope (the derivative of J) becomes smaller and smaller, so the second term in formula (1) also becomes smaller and smaller. There is therefore no need to shrink α over time.
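Both effects can be seen numerically on a toy one-dimensional case of my own choosing, J(θ) = θ² with derivative dJ/dθ = 2θ (the names `descend` and `path` are illustrative):

```python
def descend(theta, alpha, steps):
    """Gradient descent on J(theta) = theta^2 with a FIXED learning rate.
    Update (1) becomes: theta := theta - alpha * 2 * theta."""
    history = [theta]
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
        history.append(theta)
    return history

# Moderate alpha: theta shrinks toward the minimum at 0, and because the
# derivative 2*theta shrinks with it, each step is smaller than the last.
path = descend(theta=4.0, alpha=0.1, steps=20)

# Too-large alpha: each update overshoots past 0 and |theta| grows (divergence).
diverging = descend(theta=4.0, alpha=1.2, steps=10)
```

With α = 0.1 every update multiplies θ by 0.8, so the iterates approach the minimum with shrinking steps; with α = 1.2 every update multiplies θ by −1.4, so the iterates oscillate with growing magnitude.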
Substituting the hypothesis function into formula (1) and taking the derivatives, the final form of the gradient descent algorithm for linear regression is:
Repeat until convergence {
    θ₀ := θ₀ − α · (1/m) Σ_{i=1..m} (h(x^(i)) − y^(i))
    θ₁ := θ₁ − α · (1/m) Σ_{i=1..m} (h(x^(i)) − y^(i)) · x^(i)
}
The above are my notes from studying the online Machine Learning course taught by Professor Andrew Ng of Stanford University.