Public course address: https://class.coursera.org/ml-003/class/index
Instructor: Andrew Ng
1. Problem motivation
Let's start with an example. Suppose we want to inspect aircraft engines. An engine's performance depends on many factors; here I pick only two, heat and vibration intensity. After collecting a large amount of data from normal engines, we can plot the two factors against each other in a two-dimensional coordinate system, as shown in red:
Now a new engine is sent to us for inspection. We measure its heat and vibration intensity, and to decide whether the engine is normal we check whether the corresponding point falls inside the normal region. If it does, we can consider the engine normal; otherwise the point is marked as an anomaly.
Equivalently, we can judge anomalies by point density: the denser the region a point falls in, the more normal the point is, and a point in a sparse region is an anomaly. To make this measurable we set a threshold; when the estimated density p(x) is below the threshold, the density is too low and the point is an anomaly:
Anomaly detection is very useful in manufacturing and in the information industry. Besides the aircraft engine inspection above, adding anomaly detection to a data center makes it possible to determine whether each computer is running normally.
2. Gaussian distribution
Anyone who has studied probability will know the Gaussian distribution, also called the normal distribution. Its shape and density function are as follows:
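For reference, the standard univariate Gaussian density with mean \mu and variance \sigma^2 can be written as below. The symbols are the usual textbook notation and are assumed here, since the original figure is not reproduced in the text:

```latex
p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```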
Changing the mean and the variance changes the shape of the curve; readers who have already studied probability can skip this part:
If we know that the sample points follow a Gaussian distribution, we need to estimate its mean and variance, which is not difficult using the formulas:
Note that the variance is divided by m rather than the more common m-1. Andrew explains that this is for convenience of calculation; although the resulting estimate is slightly biased, the error is negligible.
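A minimal Python sketch of these two estimates, assuming NumPy is available. The function name `estimate_gaussian_1d` and the sample values are made up for illustration; the last lines compare the 1/m estimate with the 1/(m-1) one:

```python
import numpy as np

def estimate_gaussian_1d(x):
    """Return (mu, sigma2) using the 1/m estimate of the variance."""
    x = np.asarray(x, dtype=float)
    m = x.size
    mu = x.sum() / m
    sigma2 = ((x - mu) ** 2).sum() / m
    return mu, sigma2

samples = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up values
mu, sigma2 = estimate_gaussian_1d(samples)
unbiased = np.var(samples, ddof=1)   # divides by m - 1 instead of m
print(mu, sigma2, unbiased)
```

The two variance estimates differ only by the factor (m-1)/m, which approaches 1 as the number of samples grows.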
3. Algorithm
Now that we know the Gaussian distribution, how do we compute the density of a sample point? We first assume that each feature of the samples follows a Gaussian distribution. If the samples have n dimensions, there are n Gaussian distributions:
The density of a sample is the product of its densities under the n Gaussian distributions. The whole algorithm is as follows:
Given a new sample, we compute its density and compare it with the threshold; if the density is below the threshold, the sample is considered an anomaly. The following is a concrete example:
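As a minimal sketch of these steps in Python with NumPy: fit one Gaussian per feature, multiply the per-feature densities, and compare with a threshold. The data are synthetic, and the function names and the threshold value `epsilon` are hypothetical; none of them come from the course material:

```python
import numpy as np

def fit(X):
    """Per-feature mean and variance (1/m estimate) from normal training data."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)      # NumPy's default ddof=0 divides by m
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x): product over features of the univariate Gaussian densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return np.prod(coef * expo, axis=1)

rng = np.random.default_rng(0)
# Synthetic "normal engines": columns are heat and vibration (arbitrary units).
X_train = rng.normal(loc=[50.0, 10.0], scale=[5.0, 2.0], size=(500, 2))
mu, sigma2 = fit(X_train)

X_new = np.array([[51.0, 10.5],    # close to the bulk of the training data
                  [80.0, 25.0]])   # far away from it
p = density(X_new, mu, sigma2)
epsilon = 1e-4                     # hypothetical threshold
is_anomaly = p < epsilon
print(p, is_anomaly)
```

With this synthetic data the first point receives a much higher density than the second, so only the second is flagged.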
4. Developing and evaluating an anomaly detection system
Before developing a system, we must first have data:
For example, to solve the engine problem mentioned above, we can obtain the following data:
Note that the training set contains no anomalies; anomalies appear only in the cross-validation set and the test set. We can then train, validate, and test as usual. Model quality can be evaluated with the F1-score, and the threshold can also be chosen using the cross-validation set:
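One possible way to choose the threshold on the cross-validation set is to scan candidate values and keep the one with the best F1-score. The sketch below assumes NumPy; the function names, the number of candidate steps, and the density values are all made up for illustration:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 with anomalies as the positive class (label 1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_threshold(p_cv, y_cv, n_steps=1000):
    """Scan candidate thresholds; keep the one with the best F1 on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), n_steps):
        y_pred = (p_cv < eps).astype(int)
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Made-up densities for a CV set; the two anomalies (label 1) have the lowest p(x).
p_cv = np.array([0.020, 0.015, 0.018, 0.0001, 0.022, 0.0003, 0.017])
y_cv = np.array([0, 0, 0, 1, 0, 1, 0])
eps, f1 = select_threshold(p_cv, y_cv)
print(eps, f1)
```

F1 is used rather than accuracy because anomalies are rare: a model that labels everything as normal would be highly accurate and still useless.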
5. Anomaly detection vs. supervised learning
Anomaly detection distinguishes anomalies from normal examples, which makes it look like supervised learning, but the two differ considerably. For example, anomaly detection is used when there are very few anomalous (positive) examples, which is not the case in supervised learning. Likewise, if there are many different types of anomalies, or future anomalies are hard to predict, we should use anomaly detection rather than supervised learning:
The following are typical applications of each; the differences described above are also visible in them:
6. Choosing what features to use
We have assumed that the features of the samples follow a Gaussian distribution, but this is not always the case, so the features may need to be transformed:
As you can see, taking the log turns the distribution at the lower left, which does not look Gaussian, into one that does. At the same time, we want p(x) to be as large as possible for normal examples and as small as possible for anomalies, but we often find that p(x) differs very little between the two cases. Considering the data center monitoring problem mentioned above, new features can be derived from the existing ones to meet this requirement:
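The effect of the log transform can be checked numerically. The sketch below assumes NumPy, generates a synthetic right-skewed feature, and compares the sample skewness before and after taking the log (a skewness near 0 indicates a roughly symmetric distribution). The helper `skewness` and the data are illustrative only:

```python
import numpy as np

def skewness(x):
    """Sample skewness: third standardized moment (0 for a symmetric distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)   # synthetic right-skewed feature
x_log = np.log(x)       # valid because x > 0; log(x + c) is a common variant

print(skewness(x), skewness(x_log))
```

For features that can be zero, a transform such as log(x + c) with a small positive constant c, or a fractional power of x, can be tried instead.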
7. Multivariate Gaussian distribution
Consider again the data center monitoring problem mentioned above. This time we do not model each feature with its own Gaussian distribution and multiply the results, as we did before, but model all features jointly with a single Gaussian distribution. This is the multivariate Gaussian distribution, which is also covered in probability theory:
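For reference, the standard density of an n-dimensional Gaussian with mean vector \mu and covariance matrix \Sigma is given below. The notation is the usual textbook one and is assumed here, since the original figure is not reproduced in the text:

```latex
p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)\right)
```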
Here the variance becomes a matrix, the covariance matrix we often hear about, and the mean becomes a vector. "Multivariate" means that the density is no longer a curve drawn in a two-dimensional plane but a surface in a higher-dimensional space. The following shows the case of two features; we can see how the distribution changes as the mean and the covariance change:
8. Anomaly detection using the multivariate Gaussian distribution
First, we need to calculate the mean and covariance of the multivariate Gaussian distribution based on the sample points:
Then compute the density of each new sample point:
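A minimal sketch of both steps in Python with NumPy: estimate the mean vector and covariance matrix from the training samples, then evaluate the multivariate Gaussian density at new points. The function names and the synthetic, correlated training data are assumptions made for illustration:

```python
import numpy as np

def fit_multivariate(X):
    """Mean vector and covariance matrix (1/m estimate) of the training data."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = centered.T @ centered / m
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Multivariate Gaussian density evaluated at each row of X."""
    n = mu.size
    diff = X - mu
    # Solve Sigma * z = diff^T instead of forming the inverse explicitly.
    z = np.linalg.solve(Sigma, diff.T).T
    mahalanobis_sq = np.sum(diff * z, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * mahalanobis_sq) / norm

rng = np.random.default_rng(1)
# Synthetic, positively correlated two-feature training data.
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X_train = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)
mu, Sigma = fit_multivariate(X_train)

X_new = np.array([[1.5, 1.5],     # follows the correlation
                  [1.5, -1.5]])   # same distance from the mean, against it
p = multivariate_density(X_new, mu, Sigma)
print(p)
```

Both query points are equally far from the mean, but the second one goes against the correlation in the data and so receives a far lower density. A model that treats the features independently cannot make this distinction.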
In fact, the multivariate Gaussian model is closely related to the original model. A short calculation shows that the main-diagonal elements of the covariance matrix are exactly the variances of the individual features; the original model is the special case in which all off-diagonal elements are zero:
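This relationship can be verified numerically. The sketch below, assuming NumPy and using arbitrary example parameters, computes the density once as a product of univariate Gaussians and once as a multivariate Gaussian whose covariance matrix is diagonal:

```python
import numpy as np

mu = np.array([1.0, -2.0, 0.5])          # arbitrary example means
sigma2 = np.array([0.5, 2.0, 1.5])       # arbitrary per-feature variances
x = np.array([1.2, -1.0, 0.0])           # arbitrary query point

# Original model: product of univariate Gaussian densities.
p_product = np.prod(
    np.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
)

# Multivariate model with the variances on the diagonal and zeros elsewhere.
Sigma = np.diag(sigma2)
diff = x - mu
n = mu.size
p_multi = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / np.sqrt(
    (2.0 * np.pi) ** n * np.linalg.det(Sigma)
)

print(p_product, p_multi)
```

The two values agree up to floating-point rounding.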
Finally, compare the characteristics of the two models:
We can see that the original model requires features to be created by hand to capture relationships between features, while the multivariate model captures those relationships automatically. However, the original model involves no matrices, so it is cheap to compute and works correctly even when there are few samples. The multivariate model is more expensive to compute because of the matrix, and since the inverse of the covariance matrix must be computed, it can only be used when the number of samples is greater than the number of features.
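The sample-count requirement can be illustrated with a small experiment. In the sketch below (NumPy assumed, synthetic data, variable names chosen for illustration) there are fewer samples than features, so the estimated covariance matrix does not have full rank and cannot be inverted:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5                              # fewer samples than features
X = rng.normal(size=(m, n))              # synthetic data
centered = X - X.mean(axis=0)
Sigma = centered.T @ centered / m        # n x n covariance estimate

rank = np.linalg.matrix_rank(Sigma)
print(rank, n)                           # the rank is below n, so Sigma is singular
```

Redundant features, such as exact duplicates or linear combinations of other features, make the covariance matrix singular in the same way.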
------------------------------------------Weak split line----------------------------------------------
Although this article is about anomaly detection, its real purpose is to introduce the Gaussian distribution. The Gaussian distribution is used very widely in machine learning; the commonly used EM algorithm, for instance, is often built around Gaussian distributions, which unfortunately is not covered here. As for anomaly detection, I was not sure whether it counts as supervised, unsupervised, or semi-supervised learning. I checked Wikipedia and found that the question is not clear-cut there either:
http://en.wikipedia.org/wiki/Anomaly_detection