R-Trees: A Dynamic Index Structure for Spatial Searching

Antonin Guttman

Abstract
To handle spatial data efficiently, as required in computer-aided design and geographic data applications, a database system needs an index mechanism that can retrieve data items quickly according to their spatial locations.
However, traditional indexing methods are not well suited to data objects of non-zero size located in multidimensional spaces.
In this paper we describe a dynamic index structure called an R-tree that meets this need, and give algorithms for searching and updating it.
We present the results of a series of tests indicating that the structure performs well, and conclude that it is useful for current database systems in spatial applications.

1. Introduction

Spatial data objects often cover areas in multidimensional space and are not well represented by point locations.
For example, a map object such as a country or census area occupies a region of non-zero size in two dimensions.
A common operation on spatial data is to search for all objects within a region, for example, to find all the villages within 20 miles of a given point.
Such spatial searches occur frequently in computer-aided design (CAD) and geographic data applications, so it is important to be able to retrieve objects efficiently according to their spatial location.

An index based on the spatial locations of objects is desirable, but traditional one-dimensional database index structures are not appropriate for multidimensional spatial searching.
Structures based on exact matching of values, such as hash tables, are not useful because a range search is required.
Structures that use a one-dimensional ordering of key values, such as B-trees and ISAM indexes, do not work because the search space is multidimensional.

A number of structures have been proposed for handling multidimensional point data, and a survey of these methods can be found in [5].
Cell methods [4,8,16] are not well suited to dynamic structures, because the cell boundaries must be decided in advance.
Quad trees [7] and k-d trees do not take paging of secondary memory into account.
K-D-B trees are designed for paged memory, but are useful only for point data.
The use of index intervals is proposed in [15], but this method cannot be used in multiple dimensions.
Corner stitching [12] is a structure for two-dimensional spatial searching that is suitable for data objects of non-zero size, but it assumes homogeneous primary memory and is not efficient for random searches in very large collections of data.
Grid files [10] handle non-point data by mapping each object to a point in a higher-dimensional space.
In this paper we describe an alternative structure called an R-tree, which represents data objects by intervals in several dimensions.

Section 2 outlines the structure of an R-tree, and Section 3 gives algorithms for searching, inserting, deleting, and updating.
Results of R-tree index performance tests are presented in Section 4.
Section 5 contains a summary of our conclusions.

2. R-Tree Index Structure

An R-tree is a height-balanced tree similar to a B-tree [2,6], with index records in its leaf nodes containing pointers to data objects.
If the index is disk-resident, nodes correspond to disk pages, and the structure is designed so that a spatial search requires visiting only a small number of nodes.
The index is completely dynamic: inserts and deletes can be intermixed with searches, and no periodic reorganization is required.


A spatial database consists of a collection of tuples representing spatial objects, and each tuple has a unique identifier that can be used to retrieve it.
Leaf nodes in an R-tree contain index record entries of the form:
(I, tuple-identifier)
where tuple-identifier refers to a tuple in the database and I is an n-dimensional rectangle that is the bounding box of the indexed spatial object:
I = (I_0, I_1, ..., I_{n-1})
Here n is the number of dimensions and I_i is a closed bounded interval [a, b] describing the extent of the object along dimension i.
Alternatively, I_i may have one or both endpoints equal to infinity, indicating that the object extends outward indefinitely.
Non-leaf nodes contain entries of the form:
(I, child-pointer)
where child-pointer is the address of a lower node in the R-tree and I covers all the rectangles in that lower node's entries.
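
The two entry forms above can be sketched in Python as follows. This is only an illustrative layout, not the paper's implementation: the names Rect, LeafEntry, BranchEntry, Node and cover are assumptions made for this example, and tuple identifiers are assumed to be integers.

```python
from dataclasses import dataclass, field
from typing import Tuple

# A closed interval [a, b]; an endpoint may be +/-infinity.
Interval = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """n-dimensional rectangle I = (I_0, ..., I_{n-1})."""
    intervals: Tuple[Interval, ...]

    def overlaps(self, other: "Rect") -> bool:
        # Closed intervals overlap when neither lies entirely past the other.
        return all(a1 <= b2 and a2 <= b1
                   for (a1, b1), (a2, b2) in zip(self.intervals, other.intervals))

    def union(self, other: "Rect") -> "Rect":
        # Smallest rectangle that covers both operands.
        return Rect(tuple((min(a1, a2), max(b1, b2))
                          for (a1, b1), (a2, b2) in zip(self.intervals, other.intervals)))


@dataclass
class LeafEntry:
    rect: Rect        # bounding box of the indexed object
    tuple_id: int     # hypothetical integer tuple identifier


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)


@dataclass
class BranchEntry:
    rect: Rect        # covers every rectangle in the child node
    child: Node       # stands in for the child-pointer (a page address on disk)


def cover(entries) -> Rect:
    """Bounding rectangle of all entries in one node."""
    rect = entries[0].rect
    for e in entries[1:]:
        rect = rect.union(e.rect)
    return rect


# Two objects in two dimensions; the second is unbounded upward in dimension 1.
a = LeafEntry(Rect(((0.0, 2.0), (0.0, 1.0))), tuple_id=1)
b = LeafEntry(Rect(((1.0, 3.0), (2.0, float("inf")))), tuple_id=2)
leaf = Node(is_leaf=True, entries=[a, b])
parent_entry = BranchEntry(cover(leaf.entries), leaf)
```

In this sketch the parent entry's rectangle is computed from the child's entries, so it is by construction the tightest rectangle that contains them.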

Let M be the maximum number of entries that fit in one node, and let m <= M/2 be a parameter specifying the minimum number of entries in a node.
An R-tree satisfies the following properties:
(1) Every leaf node contains between m and M index records unless it is the root.
(2) For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.
(3) Every non-leaf node has between m and M children unless it is the root.
(4) For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
(5) The root node has at least two children unless it is a leaf.
(6) All leaves appear on the same level.
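
As a rough illustration, most of these properties can be checked mechanically. The sketch below assumes a hypothetical in-memory layout (the Entry and Node classes and the mbr and check functions are names invented for this example). Property (2) is not checked, because that would require access to the data objects themselves.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A rectangle as one (low, high) pair per dimension.
Rect = Tuple[Tuple[float, float], ...]


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None    # set in non-leaf entries
    tuple_id: Optional[int] = None    # set in leaf entries


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def mbr(rects):
    """Smallest rectangle covering all rectangles in a non-empty list."""
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(len(rects[0])))


def check(node, m, M, is_root=True):
    """Return the number of levels below `node`; raise AssertionError on a violation."""
    count = len(node.entries)
    if is_root:
        # Property (5): a root that is not a leaf has at least two children.
        assert node.is_leaf or count >= 2, "root needs at least two children"
        assert count <= M, "root overflow"
    else:
        # Properties (1) and (3): between m and M entries.
        assert m <= count <= M, "node fill out of range"
    if node.is_leaf:
        return 0
    depths = set()
    for e in node.entries:
        # Property (4): the entry rectangle tightly covers the child's rectangles.
        assert e.rect == mbr([c.rect for c in e.child.entries]), "rectangle not tight"
        depths.add(check(e.child, m, M, is_root=False))
    # Property (6): all leaves appear on the same level.
    assert len(depths) == 1, "leaves on different levels"
    return depths.pop() + 1


leaf1 = Node(True, [Entry(((0, 1), (0, 1)), tuple_id=1),
                    Entry(((2, 3), (1, 2)), tuple_id=2)])
leaf2 = Node(True, [Entry(((5, 6), (5, 6)), tuple_id=3),
                    Entry(((6, 8), (4, 5)), tuple_id=4)])
root = Node(False, [Entry(mbr([e.rect for e in leaf1.entries]), child=leaf1),
                    Entry(mbr([e.rect for e in leaf2.entries]), child=leaf2)])
height = check(root, m=2, M=4)   # passes; the leaves are one level below the root
```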

Figures 2.1a and 2.1b show the structure of an R-tree and illustrate the containment and overlapping relationships that can exist between its rectangles.

Because the branching factor of each node is at least m, the height of an R-tree containing N index records is at most ceiling(log_m N) - 1.
The maximum number of nodes is ceiling(N/m) + ceiling(N/m^2) + ... + 1.
Worst-case space utilization for all nodes except the root is m/M.
Nodes will tend to have more than m entries, which decreases the tree height and improves space utilization.
If nodes have more than 3 or 4 entries, the tree is very wide, and almost all of the space is used for leaf nodes containing index records.
The parameter m can be varied as part of performance tuning, and different values are tested experimentally in Section 4.
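
To make these bounds concrete, the following sketch evaluates them for a given number of index records N and minimum fill m. The function names are assumptions made for this example, and integer arithmetic is used in place of a floating-point logarithm to avoid rounding errors.

```python
def ceil_log(N, m):
    """Smallest integer k with m**k >= N, i.e. ceiling(log_m N) for N >= 1, m >= 2."""
    k, power = 0, 1
    while power < N:
        power *= m
        k += 1
    return k


def max_height(N, m):
    """Upper bound ceiling(log_m N) - 1 on the height, meaningful for N > 1."""
    return ceil_log(N, m) - 1


def max_nodes(N, m):
    """Upper bound ceiling(N/m) + ceiling(N/m^2) + ... + 1 on the node count."""
    total, level = 0, N
    while True:
        level = -(-level // m)   # ceiling division: nodes needed on the next level up
        total += level
        if level == 1:           # the final term is the root
            return total


def worst_case_utilization(m, M):
    """Worst-case fill fraction m/M for every node except the root."""
    return m / M


# 1000 index records with a small and a large minimum fill.
print(max_height(1000, 2), max_nodes(1000, 2))     # 9 1001
print(max_height(1000, 50), max_nodes(1000, 50))   # 1 21
print(worst_case_utilization(2, 4))                # 0.5
```

The comparison shows why nodes with many entries are attractive: raising m from 2 to 50 cuts the worst-case height for 1000 records from 9 to 1.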

3. Searching and Updating

3.1 Searching
The search algorithm descends the tree from the root in a manner similar to a B-tree.
However, more than one subtree under a visited node may need to be searched, so good worst-case performance cannot be guaranteed.
Nevertheless, with most kinds of data the update algorithms maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space and examine only data near the search area.
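
A minimal sketch of such a descent is given below, assuming a hypothetical in-memory node layout. Entry, Node, overlaps and search are names chosen for this example, and the code is not the paper's Search algorithm verbatim. Every entry whose rectangle overlaps the query is followed, which is why several subtrees of one node may be visited.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A rectangle as one (low, high) pair per dimension.
Rect = Tuple[Tuple[float, float], ...]


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None    # set in non-leaf entries
    tuple_id: Optional[int] = None    # set in leaf entries


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def overlaps(a: Rect, b: Rect) -> bool:
    # Two rectangles overlap when their closed intervals overlap in every dimension.
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def search(node: Node, query: Rect) -> List[int]:
    """Return the identifiers of all leaf entries whose rectangles overlap `query`."""
    results = []
    for e in node.entries:
        if not overlaps(e.rect, query):
            continue                  # prune: nothing below this entry can match
        if node.is_leaf:
            results.append(e.tuple_id)
        else:
            results.extend(search(e.child, query))
    return results


leaf1 = Node(True, [Entry(((0, 1), (0, 1)), tuple_id=1),
                    Entry(((2, 3), (1, 2)), tuple_id=2)])
leaf2 = Node(True, [Entry(((5, 6), (5, 6)), tuple_id=3),
                    Entry(((6, 8), (4, 5)), tuple_id=4)])
root = Node(False, [Entry(((0, 3), (0, 2)), child=leaf1),
                    Entry(((5, 8), (4, 6)), child=leaf2)])

near_origin = search(root, ((0, 2.5), (0, 1.5)))    # only leaf1 is visited
spanning = search(root, ((2.5, 5.5), (1.5, 5.5)))   # both subtrees are visited
```

The first query is answered from a single leaf, while the second overlaps both root entries and so descends into both subtrees, illustrating the worst-case concern raised above.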
