The article on the anomaly detection algorithm based on the Gaussian distribution described the principle and formulas of the algorithm in detail; this article presents an Octave simulation of it. The example flags anomalous servers in a set of network servers, using the throughput and latency of each server as the training data.
The data set is visualized as follows:
We estimate the mean mu and the variance sigma2 of the two-dimensional Gaussian distribution from the data set X:
function [mu sigma2] = estimateGaussian(X)
%ESTIMATEGAUSSIAN This function estimates the parameters of a
%Gaussian distribution using the data in X
%   [mu sigma2] = estimateGaussian(X),
%   The input X is the dataset with each n-dimensional data point in one row
%   The output is an n-dimensional vector mu, the mean of the data set
%   and the variances sigma^2, an n x 1 vector
%

% Useful variables
[m, n] = size(X);

mu = zeros(n, 1);
sigma2 = zeros(n, 1);

% Mean of each feature (column), returned as an n x 1 vector
mu = sum(X, 1)' / m;

% Note: mu and sigma2 are both n-dimensional.
% Accumulate the squared deviations of every example from the mean.
for (i = 1:m)
  e = (X(i, :)' - mu);
  sigma2 += e .^ 2;
endfor
sigma2 = sigma2 / m;   % divide by m (not m - 1)

end
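For reference, the loop above corresponds to the usual per-feature estimates written below. The notation is an assumption made here for illustration: m is the number of examples, and x_j^(i) is feature j of example i.

```latex
\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)},
\qquad
\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2,
\qquad j = 1, \dots, n
```

The variance is normalised by m rather than m - 1, which matches the final division in the listing.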
Calculate probability density:
function p = multivariateGaussian(X, mu, Sigma2)
%MULTIVARIATEGAUSSIAN Computes the probability density function of the
%multivariate Gaussian distribution.
%   p = multivariateGaussian(X, mu, Sigma2) computes the probability
%   density function of the examples X under the multivariate Gaussian
%   distribution with parameters mu and Sigma2. If Sigma2 is a matrix, it is
%   treated as the covariance matrix. If Sigma2 is a vector, it is treated
%   as the \sigma^2 values of the variances in each dimension (a diagonal
%   covariance matrix)
%

k = length(mu);

if (size(Sigma2, 2) == 1) || (size(Sigma2, 1) == 1)
    Sigma2 = diag(Sigma2);
end

X = bsxfun(@minus, X, mu(:)');
p = (2 * pi) ^ (-k / 2) * det(Sigma2) ^ (-0.5) * ...
    exp(-0.5 * sum(bsxfun(@times, X * pinv(Sigma2), X), 2));

end
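Written out, the expression evaluated by the last statement of this function is the multivariate Gaussian density below, where k is the number of features and Σ stands for the matrix form of Sigma2 (the symbol names are chosen here for illustration):

```latex
p(x;\mu,\Sigma) = (2\pi)^{-k/2}\,\lvert\Sigma\rvert^{-1/2}
\exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)
```

When Sigma2 is passed as a vector of variances, Σ is diagonal and the density reduces to a product of independent one-dimensional Gaussians:

```latex
p(x) = \prod_{j=1}^{k} \frac{1}{\sqrt{2\pi\sigma_j^2}}
\exp\!\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)
```

The listing uses pinv instead of a plain inverse; for a non-singular covariance matrix the two give the same result.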
The fitted distribution is visualized as follows:
Select the threshold using a held-out cross-validation (CV) set in which the anomalous examples are already labeled:
function [bestEpsilon bestF1] = selectThreshold(yval, pval)
%SELECTTHRESHOLD Find the best threshold (epsilon) to use for selecting
%outliers
%   [bestEpsilon bestF1] = selectThreshold(yval, pval) finds the best
%   threshold to use for selecting outliers based on the results from a
%   validation set (pval) and the ground truth (yval).
%

bestEpsilon = 0;
bestF1 = 0;
F1 = 0;

stepsize = (max(pval) - min(pval)) / 1000;
for epsilon = min(pval):stepsize:max(pval)
    pred = (pval < epsilon);      % predicted anomalies at this threshold
    p_e_1 = (pred == 1);
    y_e_1 = (yval == 1);

    % Count the true positives: predicted anomalous and actually anomalous
    p1 = 0;
    m = size(p_e_1, 1);
    for (i = 1:m)
        if ((p_e_1(i) == 1) && (p_e_1(i) == y_e_1(i)))
            p1++;
        endif
    endfor

    p_12 = sum(pred);             % number of predicted anomalies
    p_13 = sum(y_e_1);            % number of actual anomalies
    p = p1 / p_12;                % precision
    r = p1 / p_13;                % recall
    F1 = 2 * p * r / (p + r);

    if F1 > bestF1
        bestF1 = F1;
        bestEpsilon = epsilon;
    end
end

end
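To make the score inside the loop concrete, here is a minimal sketch that evaluates a single candidate threshold without the inner loop. The values of pval, yval and epsilon are made up for illustration, and the names tp, fp, fn, prec and rec are not from the listing above. Precision is tp / (tp + fp), recall is tp / (tp + fn), and F1 is their harmonic mean.

```octave
% Toy validation data, made up for illustration only
pval = [0.90; 0.80; 0.05; 0.70; 0.02; 0.60];  % densities p(x) on the CV set
yval = [0;    0;    1;    0;    1;    1];     % 1 = known anomaly
epsilon = 0.10;                               % one candidate threshold

pred = (pval < epsilon);                      % flagged as anomalous

tp = sum((pred == 1) & (yval == 1));          % true positives
fp = sum((pred == 1) & (yval == 0));          % false positives
fn = sum((pred == 0) & (yval == 1));          % false negatives

prec = tp / (tp + fp);                        % precision
rec  = tp / (tp + fn);                        % recall
F1   = 2 * prec * rec / (prec + rec);
```

With these toy numbers two of the three anomalies are caught and nothing is flagged wrongly, so precision is 1, recall is 2/3 and F1 is 0.8. Note that tp + fp equals sum(pred) and tp + fn equals sum(yval == 1), which are the denominators used in the listing.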
The final flagged result:
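The flagging step itself is a comparison of each density against the chosen threshold. The sketch below runs the whole pipeline on a tiny data set, assuming independent features (a diagonal covariance). The data values and the threshold are hypothetical and are not the server data used in this article; the built-in mean and var are used in place of the functions listed earlier so that the snippet runs on its own.

```octave
% Toy data: one row per server, two features per row.
% These numbers are made up for illustration only.
X = [10 20; 11 21; 9 19; 10 21; 11 19; 30 5];

mu     = mean(X)';       % n x 1 mean of each feature
sigma2 = var(X, 1)';     % n x 1 variance, normalised by m

% Density under independent per-feature Gaussians (diagonal covariance)
D = bsxfun(@minus, X, mu');               % centre each column
Z = bsxfun(@rdivide, D .^ 2, sigma2');    % squared z-scores
p = exp(-0.5 * sum(Z, 2)) / sqrt(prod(2 * pi * sigma2));

epsilon  = 1e-3;                 % hypothetical threshold, not a tuned value
outliers = find(p < epsilon);    % row indices flagged as anomalous
```

On this toy data only the last row, which lies far from the other five, falls below the threshold. In practice epsilon would come from the threshold selection on the CV set rather than being fixed by hand.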
Octave simulation of anomaly detection algorithm