Anomaly Detection: The Anomaly Detection Algorithm (Coursera Ng ML course)

Source: Internet
Author: User

Contents

    • Joint probability distribution
    • Normal distribution
    • Anomaly Detection Model Training
    • Model evaluation

In real life, many problems need to be caught before they happen. For example, before a plane takes off, its engine and other parts are inspected to confirm that they perform normally. If a potential problem is found (an abnormal condition may occur), the part must be repaired or replaced in time.

So how do we assess whether an anomaly exists?

By using joint probability distributions.

Joint probability distribution

$X$ represents a set of random variables $\{x_1, x_2, x_3, \ldots, x_n\}$, each of which follows its own distribution. Assuming that the variables are independent of one another, their joint probability distribution is:

$$P(X) = p(x_1)\,p(x_2)\cdots p(x_n) = \prod_{i=1}^{n} p(x_i)$$

That is, the joint probability is the product of the probabilities that each random variable takes its corresponding value.
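The product rule can be sketched in a few lines of Python. This is only a minimal illustration: the function name `joint_probability` and the three example probabilities are made up for the demonstration and do not come from the course.

```python
import math

def joint_probability(probabilities):
    """Joint probability of independent events: the product of the individual probabilities."""
    return math.prod(probabilities)

# Hypothetical example: three independent events with probabilities 0.5, 0.2 and 0.1
p_joint = joint_probability([0.5, 0.2, 0.1])
print(p_joint)  # approximately 0.01, up to floating-point rounding
```

Note that the product shrinks quickly as more variables are added, which is why joint probabilities of many features are usually very small numbers.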

Normal distribution

The normal distribution is a very common distribution. Its graph is a bell-shaped curve: the horizontal axis represents the value of the random variable $x$, and the vertical axis represents the probability density at that value.

The functional form of the curve (the probability density function) is:

$$y = p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Here $\mu$ is the mean of the distribution of $x$, and $\sigma$ is its standard deviation. The curve is symmetric about $x=\mu$, and its integral over the entire horizontal axis is 1. (Every probability density function has this property: the value of $x$ must fall somewhere in the domain of the density function.)

From the curve we can clearly see that values of $x$ near the mean are very likely, and the farther a value lies from the mean, the less likely it is to occur.
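These properties can be checked numerically. The sketch below assumes the standard normal density formula; the function name `normal_pdf` and the values `mu = 14`, `sigma = 4` are illustrative choices only.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu, sigma = 14.0, 4.0  # illustrative parameters

# Symmetry about the mean: equal distances left and right give equal densities
left = normal_pdf(mu - 3.0, mu, sigma)
right = normal_pdf(mu + 3.0, mu, sigma)

# The density falls off as x moves away from the mean
near = normal_pdf(mu + 1.0, mu, sigma)
far = normal_pdf(mu + 10.0, mu, sigma)

# Midpoint-rule approximation of the integral over [mu - 8*sigma, mu + 8*sigma]
n_steps = 10_000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
step = (hi - lo) / n_steps
area = sum(normal_pdf(lo + (i + 0.5) * step, mu, sigma) for i in range(n_steps)) * step

print(math.isclose(left, right), near > far, round(area, 6))
```

The integration range of eight standard deviations on each side is an arbitrary cutoff; the tails beyond it contribute a negligible amount.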

The normal distribution also has a well-known property, the 3-$\sigma$ rule:

| Interval | Area under the density curve over the interval |
|--|--|
| [μ-σ,μ+σ]|0.683|
| [μ-2σ,μ+2σ]|0.954|
| [μ-3σ,μ+3σ]|0.997|
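The values in the table can be reproduced with the error function, since for a normal variable $P(|x-\mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$. A small sketch (the helper name `prob_within_k_sigma` is just for illustration):

```python
import math

def prob_within_k_sigma(k):
    """P(|x - mu| <= k * sigma) for a normally distributed x."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within_k_sigma(k):.3f}")
# within 1 sigma: 0.683
# within 2 sigma: 0.954
# within 3 sigma: 0.997
```

The result does not depend on $\mu$ or $\sigma$, which is why the same three numbers apply to every normal distribution.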

This leads to the key idea: for a normal distribution, values of $x$ close to the mean can be treated as normal, because they are highly probable, while values far from the mean can be treated as "anomalies", because they are very improbable. Our anomaly detection algorithm is built on this idea.

Anomaly Detection Model Training

So how do we detect anomalies?

The first question to ask is: when is an anomaly most likely to occur? Presumably when some rare situation shows up. (In other contexts a "rare case" may itself be understood as an "anomaly", but here it means an unusual observation, as distinct from the anomaly of the object we are concerned about.)

For example, to find out whether a computer is broken, you can check how it differs from a typical computer. You may find that it often runs sluggishly (rare case 1) and that it boots very slowly (rare case 2). In most cases you can then conclude that the computer has a problem.

In the anomaly detection algorithm, these rare cases correspond to feature values that lie far from the mean of that feature's distribution.

First, we choose features that might be useful for judging whether a computer is working properly, such as the time it takes to open a large program and the boot time. For the computer under test, these are $x_1 = 20$ s and $x_2 = 50$ s. Next, we collect data from computers known to be normal, to obtain the distribution of values for both the "open program time" and "boot time" features, as shown in the following table:

| Sample number | Open program time | Boot time |
|--|--|--|
|1|8s|20s|
|2|11s|15s|
|3|19s|37s|
|4|15s|26s|
|5|17s|12s|

From these samples we obtain the mean of "open program time", $\mu_1 = (8+11+19+15+17)/5 = 14$, and the mean of "boot time", $\mu_2 = (20+15+37+26+12)/5 = 22$, as well as the standard deviations of the two features, $\sigma_1 = 4$ and $\sigma_2 \approx 8.87$. The formulas are:

$$\mu = \frac{\sum_{i=1}^{n} t_i}{n}, \qquad \sigma^2 = \frac{\sum_{i=1}^{n} (t_i-\mu)^2}{n}$$
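As a quick check, the parameters can be computed from the sample table with Python's standard `statistics` module. This is a minimal sketch: the variable names are illustrative, and `pstdev` is used because it divides by $n$, as the variance formula in the text does.

```python
import statistics

# Samples from the table (seconds)
open_time = [8, 11, 19, 15, 17]
boot_time = [20, 15, 37, 26, 12]

mu_1 = statistics.fmean(open_time)       # mean of "open program time"
sigma_1 = statistics.pstdev(open_time)   # population standard deviation (divides by n)
mu_2 = statistics.fmean(boot_time)       # mean of "boot time"
sigma_2 = statistics.pstdev(boot_time)

print(mu_1, sigma_1)            # 14 and 4 for the first feature
print(mu_2, round(sigma_2, 3))  # 22 and roughly 8.877 for the second
```

Using `statistics.stdev` instead would divide by $n-1$ and give slightly larger values; with only five samples the difference is noticeable.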

Assuming that both features are normally distributed, we use the values $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$ computed from the samples as the parameters of the two normal density functions. Then, by the joint probability formula, the joint probability density for a computer whose "open program time" is 20 s and whose "boot time" is 50 s is:

$$p(x_1=20, x_2=50) = p(x_1=20)\,p(x_2=50) = \left[\frac{1}{\sqrt{2\pi}\cdot 4}\exp\left(-\frac{(20-14)^2}{2\cdot 4^2}\right)\right]\left[\frac{1}{\sqrt{2\pi}\cdot 8.87}\exp\left(-\frac{(50-22)^2}{2\cdot 8.87^2}\right)\right] \approx 1.0\times 10^{-5}$$

The result is about $1.0\times 10^{-5}$, a very small value. It means that a normal computer is very unlikely to show an "open program time" of 20 s together with a "boot time" of 50 s.

We can then reasonably conclude that the computer very likely has a problem (an abnormal condition).
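The whole procedure can be sketched as follows. With the standard normal density and the parameters from the example, the joint density for the test computer comes out on the order of $10^{-5}$. The cutoff `EPSILON` and the helper names are assumptions made for this sketch; the text does not specify a threshold.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def joint_density(values, params):
    """Product of per-feature normal densities (features assumed independent)."""
    return math.prod(normal_pdf(x, mu, sigma) for x, (mu, sigma) in zip(values, params))

# (mean, standard deviation) per feature, taken from the worked example
params = [(14.0, 4.0), (22.0, 8.87)]

EPSILON = 1e-4  # hypothetical cutoff, not given in the text

def is_anomaly(values):
    """Flag a sample whose joint density falls below the cutoff."""
    return joint_density(values, params) < EPSILON

p_test = joint_density([20.0, 50.0], params)
print(f"p = {p_test:.2e}, anomaly: {is_anomaly([20.0, 50.0])}")
print(f"computer at the mean flagged: {is_anomaly([14.0, 22.0])}")
```

In practice the cutoff would have to be chosen from data, for example by checking which value best separates known normal and known abnormal machines.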

Two points are worth keeping in mind:

1. Why can we assume that the values of a feature are normally distributed?

    • This depends on the kind of values the feature takes. Feature values are either discrete or continuous. Discrete values follow distributions such as the Poisson or Bernoulli distribution; continuous values follow distributions such as the uniform, normal, or chi-square distribution. We assume that the two features in the example above are normally distributed because many continuous variables in the real world are normally or approximately normally distributed, and experiments show that the resulting model generally works well.

2. Why can we assume that the two features are independent of each other?

    • This is really an empirical choice. In reality, some features are likely to be related. For example, if a computer boots very slowly, it probably has too many startup programs running in the background, so opening a program afterwards will naturally take longer. The two features are then related rather than independent, and the product formula for the joint probability is, strictly speaking, not valid. In practice, however, we still make the simple and crude assumption that the features are independent, and anomaly detection usually still works well, so this is not a big problem.
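One rough way to see how far the independence assumption is from the data is to compute the Pearson correlation between the two features. The sketch below does this for the five samples in the table; the helper `pearson_r` is written out by hand for illustration, and with so few samples the result is only indicative.

```python
import math

# Samples from the table (seconds)
open_time = [8, 11, 19, 15, 17]
boot_time = [20, 15, 37, 26, 12]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

r = pearson_r(open_time, boot_time)
print(round(r, 2))  # about 0.46: a moderate positive correlation
```

A correlation near zero would be consistent with the independence assumption (though it does not prove independence), while a value like this one suggests the two features are somewhat related, as the text anticipates.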
Model evaluation
