Core Idea of the KD-tree

Source: Internet
Author: User

KD-tree is short for K-dimensional tree. It is a data structure that partitions the data points of a K-dimensional space. Structurally, a KD-tree is a binary tree, and it is balanced when every split is made at the median point.

For example:

Suppose there are six two-dimensional data points: {(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)}. All of them lie in a two-dimensional space. To find nearest neighbors efficiently, the KD-tree uses a divide-and-conquer approach: it recursively divides the whole space into smaller parts.

[Figure: the KD-tree generated from the six two-dimensional data points]

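For a KD-tree holding n points in K dimensions, the complexities are as follows:

    1. Build: O(n log^2 n) when the points are re-sorted at every level; O(n log n) with linear-time median selection
    2. Insert: O(log n) on a balanced tree
    3. Delete: O(log n) on a balanced tree
    4. Query: O(n^(1-1/K) + m), where m is the number of nearest points to be found in each search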

Construction of a KD-tree

The KD-tree is a binary tree, and each node represents a spatial range. The following table lists the fields stored in each node of the KD-tree. The range field is the space range covered by the node. The node-data field is one K-dimensional data point from the dataset. The splitting hyperplane passes through the data point node-data and is perpendicular to the axis named by the split field; it divides the node's space into two subspaces. Let the value of the split field be i. If the i-th coordinate of a data point in the node's range is less than or equal to node-data[i], the point belongs to the left subspace of the node; otherwise, it belongs to the right subspace. The left and right fields hold the KD-trees built from the data points in the left and right subspaces; either of them may be empty.

| Field name | Data type      | Description                                                                  |
|------------|----------------|------------------------------------------------------------------------------|
| node-data  | Data vector    | A data point from the dataset; a K-dimensional vector                        |
| range      | Spatial vector | The space range represented by the node                                      |
| split      | Integer        | Index of the axis perpendicular to the splitting hyperplane                  |
| left       | KD-tree        | KD-tree built from all data points in the left subspace of this node's split |
| right      | KD-tree        | KD-tree built from all data points in the right subspace of this node's split|
| parent     | KD-tree        | Parent node                                                                  |
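As a rough illustration, the node fields listed above could be written as a small Python class. This is only a sketch: the class name KDNode, the field name space_range, and the choice to represent the range as one (low, high) pair per dimension are assumptions made for the example, not something the text specifies.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Point = Tuple[float, ...]
# Assumption: the space range is an axis-aligned box, one (low, high) pair per axis.
Box = Tuple[Tuple[float, float], ...]


@dataclass
class KDNode:
    node_data: Point                   # the data point stored at this node
    space_range: Box                   # the space range represented by this node
    split: int                         # index of the axis perpendicular to the splitting hyperplane
    left: Optional["KDNode"] = None    # subtree for the left subspace
    right: Optional["KDNode"] = None   # subtree for the right subspace
    # Excluded from repr/eq so the parent <-> child cycle is never traversed.
    parent: Optional["KDNode"] = field(default=None, repr=False, compare=False)


# A root that splits on x (axis 0) and one child that splits on y (axis 1).
root = KDNode(node_data=(7, 2), space_range=((2, 9), (1, 7)), split=0)
child = KDNode(node_data=(5, 4), space_range=((2, 7), (1, 7)), split=1, parent=root)
root.left = child
```

Storing parent is optional for construction itself; it is kept here only because the table lists it.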

The pseudocode for building the KD-tree is:

Algorithm: Build KD-tree

Input: the data point set data-set and the space range that contains it.

Output: Kd, of type KD-tree.

1. If data-set is empty, return an empty KD-tree.

2. Call the node generation procedure:

(1) Determine the split field. For all the data points (feature vectors), compute the variance along each dimension. The dimension with the largest variance becomes the value of the split field. A large variance means the points are spread widely along that axis, so splitting in that direction separates them best.

(2) Determine the node-data field. Sort data-set by its values in the split dimension and select the point in the middle as node-data. Then data-set' = data-set \ node-data.

3. dataleft = {d in data-set' : d[split] <= node-data[split]}

left-range = the part of range that contains dataleft

dataright = {d in data-set' : d[split] > node-data[split]}

right-range = the part of range that contains dataright

4. left = KD-tree built from (dataleft, left-range)

Set the parent field of left to Kd.

right = KD-tree built from (dataright, right-range)

Set the parent field of right to Kd.
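The steps above can be sketched in Python as follows. This is a minimal illustration under some assumptions: the names KDNode and build_kdtree are made up for the example, the range field is left out to keep it short, the population variance is used in step 2(1), and for an even number of points the upper of the two middle points is taken as node-data.

```python
from dataclasses import dataclass, field
from statistics import pvariance
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    node_data: Point
    split: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None
    parent: Optional["KDNode"] = field(default=None, repr=False, compare=False)


def build_kdtree(data_set: Sequence[Point],
                 parent: Optional[KDNode] = None) -> Optional[KDNode]:
    # Step 1: an empty data set gives an empty tree.
    if not data_set:
        return None
    k = len(data_set[0])
    # Step 2(1): split on the dimension with the largest variance.
    split = max(range(k), key=lambda i: pvariance([p[i] for p in data_set]))
    # Step 2(2): sort along that dimension and take the middle point.
    ordered = sorted(data_set, key=lambda p: p[split])
    mid = len(ordered) // 2
    node = KDNode(node_data=ordered[mid], split=split, parent=parent)
    # Step 3: partition the remaining points around the splitting hyperplane.
    rest = ordered[:mid] + ordered[mid + 1:]
    data_left = [p for p in rest if p[split] <= node.node_data[split]]
    data_right = [p for p in rest if p[split] > node.node_data[split]]
    # Step 4: build both subtrees recursively, recording this node as their parent.
    node.left = build_kdtree(data_left, node)
    node.right = build_kdtree(data_right, node)
    return node


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kdtree(points)
print(root.node_data, root.left.node_data, root.right.node_data)
```

Because the list is re-sorted at every level, this version does more work than a build based on linear-time median selection; it trades speed for staying close to the pseudocode.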

In the preceding example,

(1) Determine the split field. The variances of the six data points along the x and y dimensions are approximately 5.81 and 4.47 respectively (population variance). The variance along the x axis is larger, so the split field takes the value x.
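The variance comparison in step (1) can be checked directly. The sketch below assumes the population variance (statistics.pvariance); the text does not say which variance definition it uses, but the sample variance gives the same ordering of the two axes.

```python
from statistics import pvariance

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]

# Variance of the coordinates along each axis: index 0 is x, index 1 is y.
variances = [pvariance([p[i] for p in points]) for i in range(2)]

# The split axis is the one with the largest variance.
split = max(range(2), key=lambda i: variances[i])

print(variances)  # approximately [5.81, 4.47]
print(split)      # 0, i.e. the x axis
```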

(2) Determine node-data = (7, 2). Sort the data by their x values: 2, 4, 5, 7, 8, 9. The middle value chosen is 7 (the upper of the two central values, 5 and 7), so the node-data field is the data point (7, 2). The splitting hyperplane of this node therefore passes through (7, 2) and is perpendicular to the split axis x: it is the line x = 7.

(3) Determine the left and right subspaces. The splitting hyperplane x = 7 divides the whole space into two parts. The part with x <= 7 is the left subspace, which contains the three points (2, 3), (5, 4), (4, 7). The other part is the right subspace, which contains the two points (9, 6), (8, 1).
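Steps (2) and (3) of the example amount to one sort and one partition. The following sketch reproduces them for the root node only; the variable names are illustrative, and taking the element at index len // 2 as the middle point is an assumption that matches the choice of (7, 2) in the example.

```python
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
split = 0  # the x axis, chosen in step (1)

# Step (2): sort by the split coordinate and take the middle point.
ordered = sorted(points, key=lambda p: p[split])
mid = len(ordered) // 2
node_data = ordered[mid]

# Step (3): the remaining points fall on one side of the line x = node_data[split].
rest = ordered[:mid] + ordered[mid + 1:]
left = [p for p in rest if p[split] <= node_data[split]]
right = [p for p in rest if p[split] > node_data[split]]

print(node_data)  # (7, 2)
print(left)       # [(2, 3), (4, 7), (5, 4)]
print(right)      # [(8, 1), (9, 6)]
```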

Building the tree is a recursive process. Repeat the steps above on the left and right subspaces until a subspace contains only one data point.
