KD-tree is short for K-dimensional tree: a data structure that partitions the data points of a K-dimensional space. A KD-tree is a binary tree, and when it is built by repeatedly splitting at the median, as described below, it is balanced.
For example:
Suppose there are six two-dimensional data points {(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)} lying in a two-dimensional space. To find nearest neighbors efficiently, the KD-tree takes a divide-and-conquer approach and partitions the whole space into several smaller regions.
[Figure: the KD-tree generated from these six two-dimensional data points]
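For a KD-tree over n known points, the complexities are as follows:
- Build: O(n log^2 n) when the points are re-sorted at every level, as in the algorithm below
- Insert: O(log n) on a balanced tree
- Delete: O(log n) on a balanced tree
- Query: O(n^(1-1/K) + m), where m is the number of nearest points to be found by each query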
Construction of a KD-tree
The KD-tree is a binary tree in which each node represents a spatial range. The following table lists the fields stored in each node. The range field gives the space range covered by the node. The node-data field is a K-dimensional data point from the dataset. The splitting hyperplane passes through the data point node-data, is perpendicular to the axis named by the split field, and divides the node's space into two subspaces. Let the value of the split field be i. If the i-th coordinate of a data point in the node's range is less than or equal to node-data[i], the point belongs to the left subspace of the node; otherwise it belongs to the right subspace. The left and right fields hold the KD-trees built from the data points in the left and right subspaces, either of which may be empty.
Field name | Data type | Description |
node-data | Data vector | A data point from the dataset; a K-dimensional vector |
range | Space vector | The space range represented by the node |
split | Integer | Index of the axis perpendicular to the splitting hyperplane |
left | KD-tree | KD-tree composed of all data points in the left subspace of the region split by this node |
right | KD-tree | KD-tree composed of all data points in the right subspace of the region split by this node |
parent | KD-tree | Parent node |
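The node layout in the table can be sketched as a small Python class. This is only an illustration: the names KDNode, space_range and side_of are hypothetical, and storing the range as a pair of box corners is an assumption, since the text does not say how a range is represented.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Point = Tuple[float, ...]
# Assumption: a range is an axis-aligned box given as (lower corner, upper corner).
SpaceRange = Tuple[Point, Point]


@dataclass
class KDNode:
    node_data: Point                  # the data point stored at this node
    space_range: SpaceRange           # the "range" field: space covered by the node
    split: int                        # index of the axis perpendicular to the splitting hyperplane
    left: Optional["KDNode"] = None   # subtree for the left subspace
    right: Optional["KDNode"] = None  # subtree for the right subspace
    # Excluded from repr and comparison to avoid parent <-> child recursion.
    parent: Optional["KDNode"] = field(default=None, repr=False, compare=False)

    def side_of(self, point: Point) -> str:
        """Return the subspace of this node that a point falls into."""
        if point[self.split] <= self.node_data[self.split]:
            return "left"
        return "right"
```

For a root node holding (7, 2) with split = 0, side_of((5, 4)) gives "left" and side_of((9, 6)) gives "right".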
The pseudo code for building the KD-tree is:
Algorithm: Build KD-tree
Input: the data point set data-set and the space range it occupies.
Output: Kd, of type KD-tree
1. If data-set is empty, return an empty KD-tree.
2. Call the node generation procedure:
(1) Determine the split field: for all the descriptor data (feature vectors), compute the variance along each dimension and take the dimension with the largest variance as the value of the split field. A large variance means the data points are widely scattered along that axis, so splitting in that direction gives the best resolution.
(2) Determine the node-data field: sort data-set by its values in the split dimension and select the data point in the middle as node-data. Then data-set' = data-set \ node-data.
3. dataleft = {d in data-set' and d[split] <= node-data[split]}
left-range = {range & dataleft}
dataright = {d in data-set' and d[split] > node-data[split]}
right-range = {range & dataright}
4. left = KD-tree built from (dataleft, left-range)
Set the parent field of left to Kd.
right = KD-tree built from (dataright, right-range)
Set the parent field of right to Kd.
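The pseudocode can be turned into a short recursive Python function. The following is a minimal sketch rather than a definitive implementation: the names Node and build_kdtree are hypothetical, the range field is left out for brevity, population variance is assumed (the text does not say which variance is meant), and the middle point of an even-sized set is taken to be the upper median.

```python
from statistics import pvariance


class Node:
    """One KD-tree node; the range field from the table is omitted here."""

    def __init__(self, node_data, split, parent=None):
        self.node_data = node_data
        self.split = split
        self.parent = parent
        self.left = None
        self.right = None


def build_kdtree(data_set, parent=None):
    # Step 1: an empty point set yields an empty tree.
    if not data_set:
        return None
    k = len(data_set[0])
    # Step 2(1): split on the dimension with the largest variance.
    split = max(range(k), key=lambda i: pvariance([p[i] for p in data_set]))
    # Step 2(2): sort on that dimension and take the middle point as node-data.
    ordered = sorted(data_set, key=lambda p: p[split])
    mid = len(ordered) // 2
    node = Node(ordered[mid], split, parent)
    rest = ordered[:mid] + ordered[mid + 1:]
    # Step 3: partition the remaining points around node-data.
    pivot = node.node_data[split]
    data_left = [p for p in rest if p[split] <= pivot]
    data_right = [p for p in rest if p[split] > pivot]
    # Step 4: build both subtrees; each call records this node as the parent.
    node.left = build_kdtree(data_left, node)
    node.right = build_kdtree(data_right, node)
    return node
```

Called on the six example points, build_kdtree returns a root holding (7, 2) with split = 0, whose left and right children hold (5, 4) and (9, 6).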
In the preceding example,
(1) Determine the split field: the sample variances of the six data points along the x and y dimensions are about 6.97 and 5.37, respectively. The variance along the x axis is larger, so the split field takes the value x.
(2) Determine node-data = (7, 2): sort the data by their values in the x dimension, giving 2, 4, 5, 7, 8, 9. The middle value of the six is taken to be 7, so the node-data field is the data point (7, 2). The splitting hyperplane of this node is therefore the line x = 7, which passes through (7, 2) and is perpendicular to the split = x axis.
(3) Determine the left and right subspaces: the splitting hyperplane x = 7 divides the whole space into two parts. The part with x <= 7 is the left subspace, which contains the three points (2, 3), (5, 4), (4, 7); the other part is the right subspace, which contains the two points (9, 6), (8, 1).
Building the tree is a recursive process: repeat the steps above on the left and right subspaces until each subspace contains only one point.
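The first split of the worked example can be checked with a few lines of Python. This is a sketch under two assumptions: the variance is the sample variance from the standard statistics module (only the comparison between the two axes matters for choosing the split), and the middle of six sorted points is the one at index 3. The variable names are illustrative only.

```python
from statistics import variance

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]

# Step (1): compare the spread of the points along each axis.
var_x = variance([p[0] for p in points])
var_y = variance([p[1] for p in points])
split = 0 if var_x >= var_y else 1  # 0 means the x axis, 1 the y axis

# Step (2): sort along the split axis and take the middle point.
ordered = sorted(points, key=lambda p: p[split])
mid = len(ordered) // 2
node_data = ordered[mid]

# Step (3): the remaining points fall on either side of the hyperplane.
rest = ordered[:mid] + ordered[mid + 1:]
left = [p for p in rest if p[split] <= node_data[split]]
right = [p for p in rest if p[split] > node_data[split]]

print(split, node_data)  # 0 (7, 2)
print(left)              # [(2, 3), (4, 7), (5, 4)]
print(right)             # [(8, 1), (9, 6)]
```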