Stanford Ng Machine Learning course: Anomaly Detection

Source: Internet
Author: User

Anomaly Detection Problem Motivation:

First example of anomaly detection: aircraft engine anomaly detection


Intuitively, if a new engine falls in the middle of the known normal engines, we can consider it OK. If it deviates a lot from them, we need more testing to determine whether it is a normal engine.

Stated mathematically, this is a probability density estimation problem:


We model the normal data with a density p(x) and evaluate p(x_test) for a new example. Near the center of the data the density is relatively large; if it is at or above the chosen threshold ε, we judge the example to be OK. Far from the center the density is relatively small; if it is below the threshold, we flag the example as an anomaly.

Anomaly Detection Common applications:


Ng's lecture mentions three application areas. The first is the aircraft engine example from the opening. The second is fraud detection, which is widely used for credit cards and shopping sites. The last is an industrial application, monitoring a computer system: we model features of normally operating machines, such as memory usage and CPU load, and when a machine's values fall outside the normal range, it may be in an abnormal state.

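Exercise: An example is flagged as anomalous when p(x) < ε. If the system flags too many normal examples as anomalous (false positives), we need to reduce the threshold ε. Conversely, if anomalous examples are being judged as normal, the threshold needs to be raised.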
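To make the effect of the threshold concrete, here is a small sketch. The density values and the two thresholds are made-up numbers for illustration only, and `flag_anomalies` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical density values p(x) for six examples (illustrative only).
p = np.array([0.30, 0.12, 0.04, 0.02, 0.004, 0.0005])

def flag_anomalies(p, epsilon):
    """Return a boolean mask: True where p(x) < epsilon (flagged as anomaly)."""
    return p < epsilon

flagged_high = flag_anomalies(p, epsilon=0.05)   # larger threshold, more flags
flagged_low = flag_anomalies(p, epsilon=0.001)   # smaller threshold, fewer flags

print(int(flagged_high.sum()), int(flagged_low.sum()))
```

Lowering ε can only shrink the set of flagged examples, which is why it reduces false alarms at the cost of possibly missing real anomalies.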


Gaussian Distribution:

This section reviews the Gaussian distribution; readers already familiar with it can skip ahead.



The shape of the Gaussian distribution and its probability density function.



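The mean and variance determine the shape of the Gaussian distribution: the mean μ sets the center of the curve and the variance σ² sets its width. Because the area under the curve is always 1, a narrower curve is also taller.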
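The following sketch evaluates the standard univariate Gaussian density at its center for two variances, to show the narrower curve having the higher peak. The function name `gaussian_pdf` and the chosen parameter values are assumptions made for this illustration.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of a Gaussian with mean mu and variance sigma2, evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Same mean, different variances: compare the height of the peak at x = mu.
peak_narrow = gaussian_pdf(0.0, mu=0.0, sigma2=0.25)  # sigma = 0.5
peak_wide = gaussian_pdf(0.0, mu=0.0, sigma2=4.0)     # sigma = 2

print(peak_narrow > peak_wide)
```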

Parameter Estimation:

Put simply, parameter estimation means estimating the mean and variance from the data. The formulas can be derived by maximum likelihood; the derivation is not given here (it can be found in any mathematical statistics textbook). The variance formula can divide by either m or m-1. Machine learning usually uses m, while statistics often uses m-1. The two estimators have different theoretical properties, but because datasets are usually very large, the results differ very little in practice.
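A minimal sketch of the estimation with NumPy is shown below. The helper name `estimate_gaussian`, its `unbiased` flag, and the toy data are assumptions for illustration; `ddof=0` divides by m and `ddof=1` divides by m-1.

```python
import numpy as np

def estimate_gaussian(X, unbiased=False):
    """Estimate the per-feature mean and variance of X (rows are examples).

    unbiased=False divides by m (the usual machine learning choice);
    unbiased=True divides by m - 1 (the usual statistics choice).
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0, ddof=1 if unbiased else 0)
    return mu, sigma2

# Toy data: 4 examples, 2 features (made-up values).
X = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]])
mu, sigma2_m = estimate_gaussian(X)
_, sigma2_m1 = estimate_gaussian(X, unbiased=True)
print(mu, sigma2_m, sigma2_m1)
```

With only four examples the two variance estimates differ noticeably; the gap shrinks as m grows.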


Exercise: determining the Gaussian density function for a given dataset


Algorithm

Density Estimation Algorithm:


Computing p(x) is the density estimation problem. Writing p(x) as a product of per-feature densities formally assumes that the features are independent, but in practice the algorithm gives good results even when they are not.
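Written out, the product form looks as follows. Here n (the number of features) and the per-feature parameters μ_j and σ_j² are the usual notation for this model, assumed rather than quoted from the lost slide.

```latex
p(x) = \prod_{j=1}^{n} p(x_j;\, \mu_j, \sigma_j^2)
     = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}
       \exp\!\left( -\frac{(x_j - \mu_j)^2}{2\sigma_j^2} \right)
```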

Exercise: the formulas for estimating the mean and variance. The subscript j denotes the j-th feature.


Anomaly Detection Algorithm


1. Choose features that you think can indicate whether an example is anomalous.

2. Fit the parameters, that is, the mean and variance of each feature.

3. Given a new example, compute the joint probability density p(x); if it is less than the chosen threshold ε, the example is judged to be an anomaly.
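The three steps can be sketched in a few lines of NumPy. This is an illustrative sketch, not the course's reference code: the function names, the synthetic training data, and the threshold `epsilon=1e-4` are all assumptions.

```python
import numpy as np

def fit(X):
    """Step 2: per-feature mean and variance (dividing by m)."""
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, sigma2):
    """Step 3a: p(x) as the product of per-feature Gaussian densities."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    per_feature = np.exp(-(X - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return per_feature.prod(axis=1)

def is_anomaly(X, mu, sigma2, epsilon):
    """Step 3b: flag examples whose density falls below epsilon."""
    return density(X, mu, sigma2) < epsilon

# Step 1 (feature choice) is represented here by two synthetic features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, sigma2 = fit(X_train)

X_new = np.array([[5.1, 9.5],     # close to the center of the training data
                  [12.0, 30.0]])  # far from it
flags = is_anomaly(X_new, mu, sigma2, epsilon=1e-4)
print(flags)
```

A feature with zero variance would cause a division by zero in `density`; a real system would need to handle that case.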

An example illustrating the algorithm:


Following the three steps above, look at the plot in the lower left corner: where the computed joint density is large, the surface is high and the example is judged normal; where it is low, the example is judged anomalous. So far we have only described how the algorithm is executed; we have not yet examined each step in depth.

Developing and Evaluating an Anomaly Detection System

Having a single numerical metric for evaluating a learning algorithm is very important. We can evaluate the system with a feature included, then evaluate it again with that feature removed, and in this way learn what effect the feature has on the algorithm.

So far we have treated anomaly detection as unsupervised learning, using only the data and no class labels. If we already have some labeled data, we can use it to evaluate the anomaly detection algorithm. This is a very important shift in thinking.

We continue with the aircraft engine example mentioned above.



The split marked in blue is the recommended one, but the split marked in red is also used in practice.

Evaluating the algorithm:


Exercise: Accuracy on the test set is clearly not a good evaluation metric here, because the classes are skewed. Precision, recall, and the F-score should be used instead. The threshold ε can be chosen as the value that maximizes the evaluation metric on the cross-validation set. When designing an anomaly detection system, the key decisions are which features to use and what threshold to set.
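One way to pick the threshold is to scan candidate values of ε and keep the one with the best F1 score on labeled validation data. The sketch below assumes hypothetical density values `p_cv` and labels `y_cv` (1 = anomaly); the function names and the number of scan steps are also assumptions.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 score for binary labels, where 1 marks an anomaly."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def select_threshold(p_cv, y_cv, n_steps=1000):
    """Scan thresholds between min and max density; keep the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), n_steps):
        y_pred = (p_cv < eps).astype(int)
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Hypothetical validation densities and labels (illustrative only).
p_cv = np.array([0.20, 0.18, 0.15, 0.12, 0.001, 0.10, 0.002, 0.16])
y_cv = np.array([0, 0, 0, 0, 1, 0, 1, 0])
eps, f1 = select_threshold(p_cv, y_cv)
print(eps, f1)
```

Because ties keep the first threshold found, the scan returns the smallest ε that reaches the best score.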


Anomaly Detection vs. Supervised Learning

Once we have labeled data, it is natural to ask why we use anomaly detection rather than supervised learning directly. The comparison below addresses this.


First, a dataset suited to anomaly detection has very few positive (anomalous) examples and a large number of negative (normal) examples, so the negative examples alone are enough to fit the joint Gaussian probability density well. In supervised learning, both positive and negative examples are plentiful.

Second, there may be many different types of anomalies but very few anomalous examples, so it is hard for any algorithm to learn from such a small set what anomalies look like. These two points are the main criteria for deciding whether to apply anomaly detection or supervised learning.

Spam filtering is a frequently cited learning system. Although there are many types of spam, we have plenty of data for each type, so we treat spam as a supervised learning problem.

In fact, the two settings are not sharply separated. For example, if our transaction data contains a lot of fraud, the problem shifts from anomaly detection to supervised learning.


Exercise: judge intuitively which of the two approaches fits each situation


Choosing What Features to Use

The approach so far assumes that the data follows a Gaussian distribution. As mentioned, the method can still be used when the distribution is not Gaussian, but transforming a feature so that it is approximately Gaussian gives better results. For example, we can apply log or similar functions; trying such transformations in Octave can produce a very good approximation to a Gaussian.
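As a rough check of this idea in Python rather than Octave, the sketch below applies a log transform to a right-skewed synthetic feature and compares sample skewness before and after (a value near 0 suggests a symmetric, more Gaussian-like shape). The synthetic lognormal data and the `skewness` helper are assumptions for illustration.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment over variance to the power 1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # strictly positive, right-skewed
x_log = np.log(x)                                    # log transform of the feature

print(skewness(x), skewness(x_log))
```

In practice one would also look at a histogram of the feature before and after the transform, and try alternatives such as a square root when the log does not help.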

How to Choose Features:

The idea is similar to error analysis in supervised learning. When the model gets an example wrong, as in the left plot, we add a new feature x2 that separates that point from the normal data, as shown going from the left plot to the right one.

Returning to the earlier example of monitoring computers in a data center, we analyze which behavior is causing the anomaly and construct a new feature that exposes it.
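As one hypothetical illustration of constructing such a feature, the sketch below adds the ratio of CPU load to network traffic as a third column. The readings are made up, and the choice of this particular ratio is an assumption for the example, not something stated in the text above.

```python
import numpy as np

# Hypothetical readings; columns are CPU load and network traffic (made-up values).
X = np.array([
    [0.05, 0.06],
    [0.30, 0.28],
    [0.60, 0.65],
    [0.90, 0.85],
    [0.92, 0.07],  # high CPU with almost no traffic, e.g. a stuck job
])

cpu, traffic = X[:, 0], X[:, 1]
ratio = cpu / traffic                # new feature: large when CPU is high but traffic is low
X_aug = np.column_stack([X, ratio])  # original features plus the constructed one

print(X_aug.shape, int(np.argmax(ratio)))
```

Each original value of the last machine also occurs among the normal machines, so neither feature alone singles it out, while the ratio does.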


Exercise: When the anomaly detection algorithm fails to distinguish normal examples from anomalous ones, we often need to add features that make them distinguishable.

