Anomaly Detection: The Anomaly Detection Algorithm (Coursera Ng ML Course)

Last Update:2018-07-07 Source: Internet

Author: User

Directory

Joint probability distribution
Normal distribution
Anomaly Detection Model Training
Model evaluation

In real life, many situations call for catching problems in advance. For example, before a plane takes off, its parts are inspected to check whether the engine and other components are performing normally. If a potential problem is found (an abnormal condition may occur), the part needs to be repaired or replaced in time.

So how do we assess whether an anomaly exists?

By using a joint probability distribution.

Joint probability distribution

Let $X$ denote a collection of random variables $\{x_1, x_2, x_3, \dots, x_n\}$, each of which follows its own distribution. Assuming the variables are mutually independent, their joint probability distribution is:
$$P(X) = p(x_1)\,p(x_2)\cdots p(x_n) = \prod_{i=1}^{n} p(x_i)$$

That is, the joint probability is the product of the probabilities that each random variable takes its corresponding value.
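As a small illustration of this product rule, here is a minimal Python sketch. The helper name `joint_probability` and the three example probabilities are made up for illustration, and the rule only holds when the variables really are independent.

```python
import math

def joint_probability(marginals):
    """Joint probability of independent events: the product of their marginals."""
    return math.prod(marginals)

# Hypothetical example: three independent events with probabilities 0.5, 0.2 and 0.1
p = joint_probability([0.5, 0.2, 0.1])
print(p)
```

Because every factor is at most 1, the product shrinks quickly as soon as one of the variables takes an unlikely value.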

Normal distribution

The normal distribution is a very common distribution whose density curve is bell-shaped.

The horizontal axis of the curve represents the value of the random variable x, and the vertical axis represents the probability density at that value.

The functional form of the curve (the probability density function) is:

$$y = p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Here $\mu$ is the mean of the distribution of x and $\sigma$ is its standard deviation. The curve is symmetric about $x=\mu$, and its integral over the entire horizontal axis is 1. (Every probability density function has this property: x must take some value within the domain of the density function.)

The curve shows clearly that values of x near the mean are very likely, and that the farther a value lies from the mean, the less likely it is to occur.
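The density function can be written in a few lines of Python. This is a minimal sketch using the standard normal density $\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$; the function name `normal_pdf` and the parameters $\mu = 0$, $\sigma = 1$ are chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Illustrative parameters (not from the article): mu = 0, sigma = 1
at_mean = normal_pdf(0.0, 0.0, 1.0)    # density at the mean
far_out = normal_pdf(3.0, 0.0, 1.0)    # density three standard deviations away
mirrored = normal_pdf(-3.0, 0.0, 1.0)  # same distance on the other side
print(at_mean, far_out, mirrored)
```

The density is highest at the mean, drops sharply far from it, and is the same at equal distances on either side of the mean.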

The normal distribution also has a well-known property, the 3-$\sigma$ rule:

| Interval | Area under the density curve over the interval |
|--|--|
| [μ-σ, μ+σ] | 0.683 |
| [μ-2σ, μ+2σ] | 0.954 |
| [μ-3σ, μ+3σ] | 0.997 |
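These areas can be checked numerically. For a normal variable, the probability of falling within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$, whatever the values of $\mu$ and $\sigma$. The sketch below uses `math.erf` from the Python standard library; the helper name `prob_within` is made up for illustration.

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2.0))

areas = {k: round(prob_within(k), 3) for k in (1, 2, 3)}
print(areas)  # {1: 0.683, 2: 0.954, 3: 0.997}
```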

To make the idea concrete: for a normally distributed feature, we can treat values close to the mean as normal, because they are very likely, and treat values far from the mean as "anomalies", because they are very unlikely. The anomaly detection algorithm is built on this idea.

Anomaly Detection Model Training

So how do we detect anomalies?

The first question to ask is: when is an anomaly most likely to be present? Presumably when certain rare conditions are observed. (In other contexts a "rare condition" may itself be called an "anomaly", but here it means a rare observation other than the anomaly of the object we actually care about.)

For example, to find out whether a computer is broken, you can check how it differs from a typical computer. You may find that it often runs sluggishly (rare case 1) and that it is very slow to boot (rare case 2). In most cases you can then conclude that the computer has a problem.

In the anomaly detection algorithm, each such rare case corresponds to a feature taking a value far from the mean of its distribution.

First, suppose we choose features that may help judge whether a computer is working properly, such as the time it takes to open a large program and the boot time. For the computer under test these are $x_1 = 20$ s and $x_2 = 50$ s. We then collect data from computers known to be normal, to obtain the distribution of values of the two features "open program time" and "boot time", as shown in the following table:

| Sample number | Open program time | Boot time |
|--|--|--|
|1|8s|20s|
|2|11s|15s|
|3|19s|37s|
|4|15s|26s|
|5|17s|12s|

From these samples we obtain the mean of "open program time", $\mu_1 = (8+11+19+15+17)/5 = 14$, and the mean of "boot time", $\mu_2 = (20+15+37+26+12)/5 = 22$, as well as the standard deviations of the two features, $\sigma_1 = 4$ and $\sigma_2 \approx 8.87$. The formulas are:
$$\mu=\frac{\sum_{i=1}^{n}t_i}{n}$$
$$\sigma^2=\frac{\sum_{i=1}^{n}(t_i-\mu)^2}{n}$$
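The parameter estimates can be reproduced with a short script. This is a minimal sketch of the two formulas applied to the five samples in the table; the function name `fit_gaussian` is made up for illustration. It divides by $n$ (the population form), as in the formulas, not by $n-1$.

```python
import math

# Samples from the table, in seconds
open_time = [8, 11, 19, 15, 17]
boot_time = [20, 15, 37, 26, 12]

def fit_gaussian(values):
    """Return (mean, standard deviation), dividing by n as in the formulas above."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((v - mu) ** 2 for v in values) / n
    return mu, math.sqrt(variance)

mu1, sigma1 = fit_gaussian(open_time)
mu2, sigma2 = fit_gaussian(boot_time)
print(mu1, sigma1)
print(mu2, sigma2)
```

This gives $\mu_1 = 14$, $\sigma_1 = 4$, $\mu_2 = 22$, and $\sigma_2 \approx 8.877$, which the article truncates to 8.87.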

At this point, assuming both features are normally distributed, we use $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$, computed from the collected samples, as the parameters of the two features' normal density functions. Then, by the joint probability rule, the joint probability density for a computer whose "open program time" is 20 s and whose "boot time" is 50 s is:

$$P(x_1=20, x_2=50) = p(x_1=20)\,p(x_2=50) = \left[\frac{1}{\sqrt{2\pi}\cdot 4}\exp\left(-\frac{(20-14)^2}{2\cdot 4^2}\right)\right]\left[\frac{1}{\sqrt{2\pi}\cdot 8.87}\exp\left(-\frac{(50-22)^2}{2\cdot 8.87^2}\right)\right] \approx 1.0\times10^{-5}$$

The result is about $1.0\times10^{-5}$, a very small value. For comparison, the same joint density evaluated at the feature means (14 s and 22 s) is about 0.0045, roughly 450 times larger. In other words, it is very unlikely for a normal computer to have an "open program time" of 20 s and a "boot time" of 50 s at the same time.

We can therefore conclude that the computer very likely has a problem (an abnormal condition).
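The calculation for the test computer can be sketched in Python as follows. The sketch assumes the standard normal density $\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ and the parameters estimated from the five normal computers; the names `normal_pdf` and `joint_density` are made up for illustration. It prints the joint density at the test point and, for comparison, at the mean of both features.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# (mu, sigma) per feature, as estimated in the text from the normal computers
params = [(14.0, 4.0), (22.0, 8.87)]

def joint_density(point):
    """Product of per-feature densities, assuming the features are independent."""
    return math.prod(normal_pdf(x, mu, s) for x, (mu, s) in zip(point, params))

p_test = joint_density([20.0, 50.0])     # the computer under test
p_typical = joint_density([14.0, 22.0])  # a computer exactly at both means
print(p_test, p_typical, p_typical / p_test)
```

Comparing the test value against the density at the means gives a sense of scale: a small number on its own says little, but a value hundreds of times below the typical density marks the observation as unusual.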

Here are two things to keep in mind:

1. Why can we assume that the values of a feature are normally distributed?

This depends on the kind of values the feature takes. Feature values are either discrete or continuous. Common distributions for discrete values include the Poisson and Bernoulli distributions; common distributions for continuous values include the uniform, normal, and chi-square distributions. We assume the two features in the example above are normally distributed because many continuous-valued variables in the real world are normally distributed or close to it, and experiments show that the resulting model generally works well.

2. Why can we assume that the two features are independent of each other?

This is really a rule of thumb. In reality, some features are likely to be related. For example, if a computer is slow to boot, it probably launches too many programs at startup that keep running in the background, so opening a program afterwards will naturally take longer as well. The two features are then related rather than independent, and the joint probability formula does not strictly apply. In practice, however, we still make the simple, crude assumption that the features are independent, and anomaly detection is found to work well anyway, so this is not a big problem.
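One rough way to see how related two features are is their sample correlation. The sketch below computes the Pearson correlation coefficient for the five samples from the table; this check is an illustration added here, not part of the algorithm described in the article, and the function name `pearson` is made up. Note that five samples are far too few to draw conclusions, and that zero correlation would not by itself prove independence.

```python
import math

# Samples from the table, in seconds
open_time = [8, 11, 19, 15, 17]
boot_time = [20, 15, 37, 26, 12]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

r = pearson(open_time, boot_time)
print(round(r, 2))
```

A value clearly different from 0 suggests that the two features move together, which is the situation described above where the independence assumption is only an approximation.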

Model evaluation

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
