Isolation Forest Algorithm: Detailed Implementation

The principles of the Isolation Forest algorithm are covered in my earlier blog post, "Isolation Forest anomaly detection algorithm principle". This article covers only the detailed code implementation.

1. Design and implementation of Itree

First, here is the pseudocode for constructing an Itree from the original paper:

[Figure: pseudocode for Itree construction (Algorithm 2 in the original paper)]

1.1 Designing the data structures of the Itree class

According to the original paper and the pseudocode above, an Itree is a binary tree that is built recursively. Construction of a branch stops when any of the following conditions holds:

The height of the current node reaches the height limit l set by the algorithm;
The current subtree contains only one sample;
All samples in the current subtree have exactly the same value for every attribute.

During recursion, we randomly select an attribute from the attribute set Q and a split value between the minimum and maximum of that attribute in the given input data, and use them to divide the samples held by the current node into the left and right subtrees. To make the later steps easier, each node records the index of the selected attribute, attrindex, and the computed split value, attrValue. The algorithm also has to estimate the full height of a node from the number of samples that fall under it, so we define a member leafnodes to record the total number of leaf nodes in the subtree rooted at that node, and a member curheight to record the actual height of the node in the tree. Finally, as in any binary tree, we need references to the left and right subtrees, ltree and rtree.

Therefore, I have designed the following data structure Itree:

public class Itree {

    // Index of the selected attribute
    public int attrindex;

    // Split value chosen for the selected attribute
    public double attrValue;

    // Total number of leaf nodes (samples) under this node
    public int leafnodes;

    // Height of this node in the tree
    public int curheight;

    // Left and right subtrees
    public Itree ltree, rtree;

    // Constructor: initializes the values in the Itree
    public Itree(int attrindex, double attrValue) {
        // Default height; tree height is counted from 0
        this.curheight = 0;

        this.ltree = null;
        this.rtree = null;
        this.leafnodes = 1;
        this.attrindex = attrindex;
        this.attrValue = attrValue;
    }
}
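
The leafnodes field matters because a leaf that still holds several samples was cut short by the height limit. The original paper compensates for this by adding c(n), the average path length of an unsuccessful search in a binary search tree built from n samples. Below is a minimal sketch of that adjustment. The class and method names are hypothetical and not part of the author's code, and the harmonic number is approximated as ln(i) + 0.5772156649 (Euler's constant), as the paper suggests.

```java
// Hypothetical helper, not part of the original article's code.
public class AveragePathLength {

    // Euler's constant, used to approximate the harmonic number H(i) ~ ln(i) + gamma
    private static final double EULER_GAMMA = 0.5772156649;

    /**
     * Average path length c(n) of an unsuccessful search in a binary search
     * tree built from n samples: c(n) = 2H(n-1) - 2(n-1)/n for n > 2,
     * c(2) = 1, and 0 for n <= 1.
     */
    public static double c(int n) {
        if (n <= 1) {
            return 0.0;
        }
        if (n == 2) {
            return 1.0;
        }
        double harmonic = Math.log(n - 1) + EULER_GAMMA;
        return 2.0 * harmonic - 2.0 * (n - 1) / (double) n;
    }

    /**
     * Estimated path length for a sample that stops at a leaf with the given
     * curheight and leafnodes values.
     */
    public static double estimatedPathLength(int curheight, int leafnodes) {
        return curheight + c(leafnodes);
    }

    public static void main(String[] args) {
        System.out.println(c(256));                    // roughly 10.24
        System.out.println(estimatedPathLength(3, 1)); // 3.0
    }
}
```

A leaf holding a single sample adds nothing to its height, while a leaf holding many samples is treated as if the tree had continued to grow beneath it.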

1.2 Recursive construction of the binary tree Itree

From the pseudocode of Algorithm 2 in the original paper, the recursive construction of the binary tree Itree has two parts:

First, check whether any of the three termination conditions listed in section 1.1 is satisfied.

Second, randomly select an attribute from the attribute set and a split value within that attribute's range, divide the sample data held by the parent node into the left and right subtrees according to that attribute and value, and recursively build both subtrees.

Along the way, record the number of leaf nodes under each node and the actual height of each node in the whole tree.
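
The second part can be illustrated in isolation. The sketch below is a hypothetical single-pass variant that collects rows into lists instead of counting them first. The class name SplitSketch, its split method, and the fixed seed in main are assumptions made for illustration and repeatability; they are not the author's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of one split step; not the author's implementation.
public class SplitSketch {

    // Holds the outcome of a single split
    public static class Split {
        public final int attrindex;
        public final double attrValue;
        public final List<double[]> left = new ArrayList<>();
        public final List<double[]> right = new ArrayList<>();

        Split(int attrindex, double attrValue) {
            this.attrindex = attrindex;
            this.attrValue = attrValue;
        }
    }

    // Picks a random attribute and a random value within its [min, max) range,
    // then sends rows with a smaller value left and all other rows right.
    public static Split split(double[][] samples, Random random) {
        int cols = samples[0].length;
        int attrindex = random.nextInt(cols);

        double min = samples[0][attrindex];
        double max = min;
        for (double[] row : samples) {
            min = Math.min(min, row[attrindex]);
            max = Math.max(max, row[attrindex]);
        }

        double attrValue = random.nextDouble() * (max - min) + min;
        Split result = new Split(attrindex, attrValue);
        for (double[] row : samples) {
            if (row[attrindex] < attrValue) {
                result.left.add(row);
            } else {
                result.right.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] samples = { {1.0, 10.0}, {2.0, 11.0}, {3.0, 12.0}, {50.0, 13.0} };
        Split s = split(samples, new Random(42L)); // fixed seed only for repeatability
        System.out.println("attribute " + s.attrindex + ", value " + s.attrValue);
        System.out.println(s.left.size() + " left, " + s.right.size() + " right");
    }
}
```

Because the split value is always below the maximum, the row holding the maximum goes right. If the chosen attribute is constant across the samples, every row lands in the right subtree, and the recursion then relies on the height limit to stop.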

See the detailed code implementation below:

/**
 * Recursively builds an Itree from the sample data.
 * Uses java.util.Random, so the enclosing file needs "import java.util.Random;".
 */
public static Itree createItree(double[][] samples, int curheight,
        int limitheight)
{

    Itree itree = null;

    /*************** Step 1: check whether the recursion meets a termination condition **************/
    if (samples.length == 0) {
        return itree;
    } else if (curheight >= limitheight || samples.length == 1) {
        itree = new Itree(0, samples[0][0]);
        // Record how many samples end up in this leaf; this can exceed 1
        // when the height limit stops the recursion early
        itree.leafnodes = samples.length;
        itree.curheight = curheight;
        return itree;
    }

    int rows = samples.length;
    int cols = samples[0].length;

    // Check whether all samples are identical; if so, construction terminates
    boolean isallsame = true;
    break_label:
    for (int i = 0; i < rows - 1; i++) {
        for (int j = 0; j < cols; j++) {
            if (samples[i][j] != samples[i + 1][j]) {
                isallsame = false;
                break break_label;
            }
        }
    }

    // All samples are identical: construction terminates and a leaf node is returned
    if (isallsame) {
        itree = new Itree(0, samples[0][0]);
        itree.leafnodes = samples.length;
        itree.curheight = curheight;
        return itree;
    }


    /*********** Step 2: no termination condition is met, so keep building subtrees recursively *********/
    // Unseeded constructor: seeding with System.currentTimeMillis() would hand
    // the same seed to every call made within the same millisecond
    Random random = new Random();
    int attrindex = random.nextInt(cols);

    // Find the minimum and maximum values of the selected dimension
    double min, max;
    min = samples[0][attrindex];
    max = min;
    for (int i = 1; i < rows; i++) {
        if (samples[i][attrindex] < min) {
            min = samples[i][attrindex];
        }
        if (samples[i][attrindex] > max) {
            max = samples[i][attrindex];
        }
    }

    // Compute the split value for the attribute
    double attrValue = random.nextDouble() * (max - min) + min;

    // Use the attrindex-th attribute of each sample and the split value
    // attrValue to assign samples to the left and right subtrees
    int lnodes = 0, rnodes = 0;
    double curvalue;
    for (int i = 0; i < rows; i++) {
        curvalue = samples[i][attrindex];
        if (curvalue < attrValue) {
            lnodes++;
        } else {
            rnodes++;
        }
    }

    double[][] lsamples = new double[lnodes][cols];
    double[][] rsamples = new double[rnodes][cols];

    lnodes = 0;
    rnodes = 0;
    for (int i = 0; i < rows; i++) {
        curvalue = samples[i][attrindex];
        if (curvalue < attrValue) {
            lsamples[lnodes++] = samples[i];
        } else {
            rsamples[rnodes++] = samples[i];
        }
    }

    // Create the parent node
    Itree parent = new Itree(attrindex, attrValue);
    parent.leafnodes = rows;
    parent.curheight = curheight;
    parent.ltree = createItree(lsamples, curheight + 1, limitheight);
    parent.rtree = createItree(rsamples, curheight + 1, limitheight);

    return parent;
}
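
The construction function above expects the caller to supply limitheight and the samples. The original paper sets the height limit to ceil(log2(ψ)), where ψ is the subsampling size (256 by default in the paper), and builds each tree from a random subsample drawn without replacement. The following is a sketch under those assumptions; TreeInputs, heightLimit, and subsample are hypothetical names that do not appear in the author's code.

```java
import java.util.Random;

// Hypothetical helpers for preparing the arguments of a tree-building call.
public class TreeInputs {

    // Height limit l = ceil(log2(psi)) for a subsample of psi rows
    public static int heightLimit(int psi) {
        if (psi <= 1) {
            return 0;
        }
        // 32 - numberOfLeadingZeros(psi - 1) equals ceil(log2(psi)) for psi >= 2
        return 32 - Integer.numberOfLeadingZeros(psi - 1);
    }

    // Draws up to psi rows without replacement using a partial Fisher-Yates shuffle
    public static double[][] subsample(double[][] data, int psi, Random random) {
        int n = data.length;
        int size = Math.min(psi, n);
        double[][] pool = data.clone(); // shallow copy: rows are shared, order is not
        for (int i = 0; i < size; i++) {
            int j = i + random.nextInt(n - i);
            double[] tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        double[][] result = new double[size][];
        System.arraycopy(pool, 0, result, 0, size);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(heightLimit(256)); // 8
        System.out.println(heightLimit(100)); // 7
    }
}
```

With these helpers, a call to the construction function would pass subsample(data, 256, random) as samples, 0 as curheight, and heightLimit(256) as limitheight.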

It is getting late and I have work tomorrow, so I will stop here. To be continued...

References:

http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf
http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/tkdd11.pdf
