Machine learning in detail: implementing the Isolation Forest algorithm

Source: Internet
Author: User

The complete source code for this article is open source on my GitHub (if you find it helpful, please give it a star). See the Iforest and Itree classes in the Iforest package: https://github.com/JeemyJohn/Anomalydetection

Preface

The principle behind the Isolation Forest algorithm is covered in my blog post "Isolation Forest anomaly detection algorithm principle"; this article only walks through the code implementation in detail.

1. Design and implementation of Itree

First, we refer to the construction pseudocode of the Itree in the original paper:

1.1 Design of Itree class data structure

From the original papers [1,2] and the pseudocode above, we know that an Itree is a binary tree and that it is constructed recursively. Construction terminates when any one of the following holds: the height of the current node reaches the height limit l set by the algorithm; the current subtree contains only one sample, so it is a single leaf node; or all samples in the current subtree have identical values for every attribute.

During recursion, we randomly select one attribute q_i from the attribute set Q, together with a value q between the minimum and maximum of that attribute in the given input data, and use them to divide the samples in the current node into the left and right subtrees. To make the later algorithm design easier, we record the index of the selected attribute q_i as attrindex and the computed value q as attrValue. Because later steps need to estimate the total height of a node from the total number of leaf nodes in the subtree rooted at that node, we also define a variable leafnodes to record that total, and a member variable curheight to record the actual height of the node. Finally, a binary tree needs pointers to its left and right subtrees, ltree and rtree.

Therefore, I have designed the following data structure Itree:

public class Itree {

    // Index of the selected attribute
    public int attrindex;

    // Specific split value of the selected attribute
    public double attrValue;

    // Total number of leaf nodes in the tree rooted at this node
    public int leafnodes;

    // Height of this node within the tree
    public int curheight;

    // Left and right child subtrees
    public Itree ltree, rtree;

    // Constructor: initializes the values in the Itree
    public Itree(int attrindex, double attrValue) {
        // Default height; the height of the tree is counted from 0
        this.curheight = 0;

        this.ltree = null;
        this.rtree = null;
        this.leafnodes = 1;
        this.attrindex = attrindex;
        this.attrValue = attrValue;
    }
    ...
}
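The leafnodes field matters because a leaf that still holds several samples stands in for a subtree that was never built. The original paper estimates the extra depth of such a subtree with the average path length of an unsuccessful binary search tree lookup, c(n) = 2H(n-1) - 2(n-1)/n, where H(i) is approximated by ln(i) + 0.5772156649. The sketch below shows that calculation. The names PathLengthAdjust and averagePathLength are hypothetical and not taken from the repository, and the special cases for n <= 1 and n = 2 are a common convention assumed here.

```java
class PathLengthAdjust {

    // Euler-Mascheroni constant, used to approximate the harmonic number H(i)
    private static final double EULER_GAMMA = 0.5772156649;

    /**
     * Average path length c(n) of an unsuccessful search in a binary
     * search tree built from n samples.
     */
    static double averagePathLength(int n) {
        if (n <= 1) {
            return 0.0; // a single sample needs no further split
        }
        if (n == 2) {
            return 1.0; // assumed convention: two samples need one split
        }
        double harmonic = Math.log(n - 1) + EULER_GAMMA; // approximates H(n - 1)
        return 2.0 * harmonic - 2.0 * (n - 1) / (double) n;
    }

    public static void main(String[] args) {
        // c(256) corresponds to the default subsample size of 256
        System.out.println(averagePathLength(256));
    }
}
```

In use, a value such as averagePathLength(node.leafnodes) would be added to the curheight of the leaf that a sample reaches.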
1.2 Recursive construction of the binary tree Itree

According to the pseudocode of Algorithm 2 in the original paper, the recursive construction of the binary tree Itree has two parts:

First, check whether any of the three recursion termination conditions listed in Section 1.1 is met.

Second, randomly select an attribute from the attribute set and a specific value for it, divide the samples in the parent node into left and right subsets based on that attribute and value, and then recursively create the left and right subtrees.

Along the way, we also record the number of leaf nodes each node contains and the actual height of the current node within the whole tree.

See the following detailed code implementation:

    /**
     * Recursively creates an Itree from the samples data.
     * Requires java.util.Random to be imported.
     */
    public static Itree createItree(double[][] samples, int curheight,
            int limitheight) {

        Itree itree = null;

        /*************** Step 1: check whether the recursion end condition is met **************/
        if (samples.length == 0) {
            return itree;
        } else if (curheight >= limitheight || samples.length == 1) {
            itree = new Itree(0, samples[0][0]);
            itree.leafnodes = 1;
            itree.curheight = curheight;
            return itree;
        }

        int rows = samples.length;
        int cols = samples[0].length;

        // Check whether all samples are identical; if so, construction also terminates
        boolean isallsame = true;
        break_label:
        for (int i = 0; i < rows - 1; i++) {
            for (int j = 0; j < cols; j++) {
                if (samples[i][j] != samples[i + 1][j]) {
                    isallsame = false;
                    break break_label;
                }
            }
        }

        // All samples are identical: construction terminates and a leaf node is returned
        if (isallsame) {
            itree = new Itree(0, samples[0][0]);
            itree.leafnodes = samples.length;
            itree.curheight = curheight;
            return itree;
        }

        /*********** Step 2: end condition not met, keep recursing to build the subtrees *********/
        Random random = new Random(System.currentTimeMillis());
        int attrindex = random.nextInt(cols);

        // Find the minimum and maximum values of the selected dimension
        double min, max;
        min = samples[0][attrindex];
        max = min;
        for (int i = 1; i < rows; i++) {
            if (samples[i][attrindex] < min) {
                min = samples[i][attrindex];
            }
            if (samples[i][attrindex] > max) {
                max = samples[i][attrindex];
            }
        }

        // Compute the split value of the attribute
        double attrValue = random.nextDouble() * (max - min) + min;

        // Compare the attrindex attribute of every sample with attrValue
        // to count how many samples go to each side
        int lnodes = 0, rnodes = 0;
        double curvalue;
        for (int i = 0; i < rows; i++) {
            curvalue = samples[i][attrindex];
            if (curvalue < attrValue) {
                lnodes++;
            } else {
                rnodes++;
            }
        }

        double[][] lsamples = new double[lnodes][cols];
        double[][] rsamples = new double[rnodes][cols];
        lnodes = 0;
        rnodes = 0;
        for (int i = 0; i < rows; i++) {
            curvalue = samples[i][attrindex];
            if (curvalue < attrValue) {
                lsamples[lnodes++] = samples[i];
            } else {
                rsamples[rnodes++] = samples[i];
            }
        }

        // Create the parent node
        Itree parent = new Itree(attrindex, attrValue);
        parent.leafnodes = rows;
        parent.curheight = curheight;
        parent.ltree = createItree(lsamples, curheight + 1, limitheight);
        parent.rtree = createItree(rsamples, curheight + 1, limitheight);

        return parent;
    }
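Once a tree is built, a sample is scored by walking it from the root to a leaf, using the same attrindex and attrValue comparison as the split above. The sketch below shows that traversal on a small hand-built tree. To stay self-contained it uses a stripped-down stand-in class, Node, with the same field names as Itree. The names PathWalk, Node, pathLength, and sampleTree are hypothetical, and the adjustment for leaves that hold more than one sample is left out.

```java
class PathWalk {

    // Stand-in for Itree with only the fields the traversal needs (hypothetical)
    static class Node {
        int attrindex;      // index of the attribute used for the split
        double attrValue;   // split value; samples below it go left
        int curheight;      // depth of this node, counted from 0
        Node ltree, rtree;  // children; both null at a leaf

        Node(int attrindex, double attrValue, int curheight) {
            this.attrindex = attrindex;
            this.attrValue = attrValue;
            this.curheight = curheight;
        }
    }

    /** Returns the depth of the node where the walk for sample x ends. */
    static int pathLength(Node node, double[] x) {
        while (node.ltree != null || node.rtree != null) {
            Node next = (x[node.attrindex] < node.attrValue) ? node.ltree : node.rtree;
            if (next == null) {
                break; // that side received zero samples during construction
            }
            node = next;
        }
        return node.curheight;
    }

    /** Builds a tiny tree by hand: the root splits on attribute 0, its right child on attribute 1. */
    static Node sampleTree() {
        Node root = new Node(0, 5.0, 0);
        root.ltree = new Node(0, 0.0, 1);   // leaf at depth 1
        Node right = new Node(1, 2.0, 1);
        right.ltree = new Node(0, 0.0, 2);  // leaf at depth 2
        right.rtree = new Node(0, 0.0, 2);  // leaf at depth 2
        root.rtree = right;
        return root;
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        // Goes left at the root and stops at the depth-1 leaf
        System.out.println(pathLength(root, new double[] {1.0, 9.0}));
        // Goes right at the root, then left at the second split
        System.out.println(pathLength(root, new double[] {7.0, 1.0}));
    }
}
```

Samples that end in shallow leaves are isolated with few splits, which is what marks them as likely anomalies.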
2. Design and implementation of Iforest

The pseudo code for the original paper's algorithm 1 is shown below:

From the pseudocode above, we know that the Iforest class mainly does two things:

Build an Itree from each subsample of the input data;

Merge all of the constructed Itrees into a detection forest.

2.1 Design of Iforest class data structure

Therefore, we have designed the following basic data structure for the Iforest class. center0 and center1 record the final cluster centers of the anomaly scores for the anomaly class and the normal class respectively (why they are called centers will become clear later); both variables are needed for classification prediction. subsamplesize is the subsample size used throughout the algorithm (default value 256). itreelist is a list that holds all of the Itrees.

import java.util.ArrayList;
import java.util.List;

public class Iforest {

    // center0 is the center of the anomaly class; center1 is the center of the normal class
    private Double center0;
    private Double center1;

    // Number of samples in each subsample
    private int subsamplesize;

    // List of the Itrees contained in the Iforest
    private List<Itree> itreelist;

    /**
     * Parameterless constructor; subsamplesize defaults to 256
     */
    public Iforest() {
        this.center0 = null;
        this.center1 = null;
        this.subsamplesize = 256;
        this.itreelist = new ArrayList<>();
    }
    ...
}
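Each Itree is built from a random subsample of at most subsamplesize rows rather than from the full data set. One way to draw such a subsample without replacement is a partial Fisher-Yates shuffle over the row indices, sketched below. The class name SubSampler, the method name subSample, and the explicit Random parameter are assumptions for illustration; the repository may sample differently.

```java
import java.util.Random;

class SubSampler {

    /**
     * Draws min(size, samples.length) distinct rows from samples.
     * The returned rows are shared with the input, not copied.
     */
    static double[][] subSample(double[][] samples, int size, Random random) {
        int n = samples.length;
        int k = Math.min(size, n);

        // Start from the identity permutation of row indices
        int[] index = new int[n];
        for (int i = 0; i < n; i++) {
            index[i] = i;
        }

        double[][] picked = new double[k][];
        for (int i = 0; i < k; i++) {
            // Swap a random not-yet-used index into position i (partial Fisher-Yates)
            int j = i + random.nextInt(n - i);
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
            picked[i] = samples[index[i]];
        }
        return picked;
    }

    public static void main(String[] args) {
        double[][] data = new double[1000][2];
        for (int i = 0; i < data.length; i++) {
            data[i][0] = i;
            data[i][1] = i % 7;
        }
        double[][] sub = subSample(data, 256, new Random(42L));
        System.out.println(sub.length);
    }
}
```

Passing the Random instance in, instead of creating one per call, keeps a run reproducible when a fixed seed is used.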
2.2 Building forests

After instantiating Iforest, the first thing to do is of course to build the individual Itrees and add them to itreelist to form the detection forest. Before that, we have to set the tree height limit, limitheight.
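In the original paper the height limit is the ceiling of log2 of the subsample size, which is roughly the average height of a tree over that many samples. The sketch below shows that calculation, assuming limitheight is derived the same way here. HeightLimit and heightLimit are hypothetical names.

```java
class HeightLimit {

    /** Returns ceil(log2(subsamplesize)) using integer arithmetic only. */
    static int heightLimit(int subsamplesize) {
        if (subsamplesize < 1) {
            throw new IllegalArgumentException("subsamplesize must be at least 1");
        }
        // For n >= 1, ceil(log2(n)) equals the bit length of n - 1
        return 32 - Integer.numberOfLeadingZeros(subsamplesize - 1);
    }

    public static void main(String[] args) {
        // Height limit for the default subsample size of 256
        System.out.println(heightLimit(256));
    }
}
```

Integer bit counting is used instead of Math.log so that exact powers of two are not affected by floating-point rounding.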

    /**
     * Create the Iforest
     */
    private void createIforest(double[][] samples, int t) throws Exception {

        // Validate the method parameters
        if (samples == null || samples.length == 0
