Reading Notes: HBase in Action, Part 3 Applications (2): GIS

Source: Internet
Author: User

This chapter describes how to use HBase to store and efficiently query geographic location data.

Geohash spatial index

Consider two common problems in LBS (location-based service) applications: 1) find the K nearest locations; 2) find the locations inside a region. To search efficiently in HBase, you must first consider spatial locality, that is, points close to each other should be physically stored together. The simplest geographic location data has two dimensions: longitude X and latitude Y, so the simplest rowkey can be composed of X followed by Y. The rowkey order then sorts data first by longitude and then by latitude. The biggest problem with this approach is that longitude dominates the ordering: location A and location B with equal longitude values sort together even though they may be very far apart in latitude.
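The ordering problem can be seen in a small sketch. The point names and coordinates below are made up for illustration, and plain Python tuples stand in for a rowkey built from longitude followed by latitude (a real HBase rowkey would be an order-preserving byte encoding of the same values).

```python
# Illustrative points as (longitude, latitude); names and values are made up.
points = {
    "A": (-73.99, 40.75),    # New York area
    "B": (-73.99, -40.75),   # same longitude as A, southern hemisphere
    "C": (-73.98, 40.75),    # less than 1 km east of A
    "D": (-73.985, -80.0),   # between A and C in longitude, near Antarctica
}

# Sorting tuples compares longitude first, then latitude, which mimics a
# rowkey built as longitude followed by latitude.
ordered = sorted(points, key=lambda name: points[name])
print(ordered)
```

In this ordering A sits directly next to B, and the far-away point D lands between the two genuinely close points A and C, so a range scan around A reads rows that are not nearby at all.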

The geohash solution is to build a spatial index that gives longitude and latitude equal weight. The algorithm works as follows: repeatedly bisect the longitude range [-180, 180] and the latitude range [-90, 90]. At each step, if the value falls in the upper half, the bit is 1; if it falls in the lower half, the bit is 0. The final result is formed by interleaving the longitude bits and the latitude bits. (Note: in HBase you can store the Base32-encoded string of these bits, where each character encodes five bits.)
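The bisection and interleaving described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the function name geohash_encode and the sample coordinate are assumptions, while the 32-character alphabet is the standard geohash Base32 alphabet.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lon, lat, length=7):
    """Encode a point as a geohash of `length` Base32 characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    bits = []
    use_lon = True  # bits alternate, starting with longitude
    while len(bits) < length * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)   # upper half: keep [mid, high]
            rng[0] = mid
        else:
            bits.append(0)   # lower half: keep [low, mid]
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:   # pack five bits into one character
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)

# Sample coordinate chosen for illustration (longitude, latitude).
print(geohash_encode(10.40744, 57.64911, 6))  # u4pruy
```

Because every additional character only refines the cell found so far, a shorter geohash of the same point is always a prefix of a longer one.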

 

Sample data shows that geohash reflects spatial locality better: rows sort roughly in order of distance, and the geohash values of nearby points share longer common prefixes.

Find the nearest K neighbors

Scanning by geohash prefix solves problem 1 (find the K nearest locations) efficiently. You still need to choose an appropriate prefix length for the scan. A short prefix reduces the number of scans but may return too much data. A long prefix limits the results returned by each scan, so several scans may be needed.
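The trade-off can be sketched with a sorted list standing in for an HBase table. The rowkeys below are made-up sample values and prefix_scan is a hypothetical helper. It models a scan bounded by a start row (the prefix) and a stop row (the prefix with its last character incremented), which is one common way to express a prefix scan over sorted keys.

```python
import bisect

def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with `prefix`, using start/stop bounds."""
    start = prefix
    # Smallest string that sorts after every key carrying the prefix.
    stop = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# Made-up geohash rowkeys, kept sorted the way HBase keeps rows sorted.
rows = sorted(["dr5ruzb", "dr5ruz9", "dr5ruxq", "dr5rvp0", "dr72h8p", "9q8yyk8"])

print(prefix_scan(rows, "dr5ruz"))  # long prefix: few rows per scan
print(prefix_scan(rows, "dr5r"))    # short prefix: more rows in one scan
```

With these sample keys the six-character prefix returns two rows and the four-character prefix returns four, which is the balance between scan count and result size described above.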

Geohash also has a limitation: a simple prefix scan cannot always find the neighbors. A geohash of a given length corresponds to a rectangular cell on the map. Take the cell dr5ruzb at the center: the neighboring cells below it share a five-character prefix with it, while the three cells above it, although adjacent, share only a two-character prefix.


Therefore, to find the K nearest neighbors of a point in dr5ruzb, search that cell together with its eight adjacent cells, then sort all the points found by distance to get the final result. The approach uses two functions. takeN finds the nearest N points within one cell.

queryKNN uses takeN to find the nearest N points in each of the neighboring cells, then sorts the combined results by distance.
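The two functions can be sketched as follows. This is an illustration under stated assumptions, not the book's code: a Python dict stands in for the HBase table, a startswith filter stands in for a prefix scan, distance is a planar approximation, the names take_n, query_knn, encode, bbox and cell_and_neighbors are hypothetical, and cells at the poles or the antimeridian are not handled.

```python
import heapq
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode(lon, lat, length):
    ranges, coords = [[-180.0, 180.0], [-90.0, 90.0]], (lon, lat)
    index, out = 0, []
    for bit in range(length * 5):
        axis = bit % 2                      # even bits: longitude, odd: latitude
        mid = (ranges[axis][0] + ranges[axis][1]) / 2
        upper = coords[axis] >= mid
        ranges[axis][0 if upper else 1] = mid
        index = (index << 1) | upper
        if bit % 5 == 4:                    # five bits make one character
            out.append(BASE32[index])
            index = 0
    return "".join(out)

def bbox(geohash):
    """Return [[lon_min, lon_max], [lat_min, lat_max]] of a geohash cell."""
    ranges, bit = [[-180.0, 180.0], [-90.0, 90.0]], 0
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            axis = bit % 2
            mid = (ranges[axis][0] + ranges[axis][1]) / 2
            ranges[axis][0 if (value >> shift) & 1 else 1] = mid
            bit += 1
    return ranges

def cell_and_neighbors(geohash):
    """The cell itself plus its eight adjacent cells (no pole/wrap handling)."""
    (lon_min, lon_max), (lat_min, lat_max) = bbox(geohash)
    w, h = lon_max - lon_min, lat_max - lat_min
    cx, cy = (lon_min + lon_max) / 2, (lat_min + lat_max) / 2
    return {encode(cx + dx * w, cy + dy * h, len(geohash))
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

def take_n(table, prefix, target, n):
    """Nearest n rows to target among rows whose key starts with prefix."""
    hits = [(math.hypot(lon - target[0], lat - target[1]), key)
            for key, (lon, lat) in table.items() if key.startswith(prefix)]
    return heapq.nsmallest(n, hits)

def query_knn(table, target, n, precision):
    """Collect candidates from the 3x3 block of cells, then keep the n nearest."""
    center = encode(target[0], target[1], precision)
    candidates = []
    for prefix in cell_and_neighbors(center):
        candidates.extend(take_n(table, prefix, target, n))
    return [key for _, key in heapq.nsmallest(n, candidates)]

# Made-up sample points; rowkey = 9-character geohash plus a name suffix.
points = {"p1": (-73.9701, 40.7801), "p2": (-73.9650, 40.7820),
          "p3": (-73.9000, 40.7000), "p4": (-74.5000, 41.0000)}
table = {f"{encode(lon, lat, 9)}-{name}": (lon, lat)
         for name, (lon, lat) in points.items()}
nearest = query_knn(table, (-73.97, 40.78), 2, precision=6)
print(nearest)
```

If the nine cells hold fewer than K points, the result is shorter than K. In that case a caller would retry with a smaller precision, that is, with larger cells.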

 

Search within a region

Example: How many WiFi hotspots are there in a certain square? The solution is divided into two steps:

Step 1: Convert the region query into scans over a set of geohash prefixes.

Step 2: Check whether each scanned point lies inside the polygon of the area being searched.
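Step 2 is a point-in-polygon test. A geometry library would normally provide it; the pure-Python ray-casting version below is only a sketch of the idea, with a hypothetical function name and made-up coordinates, and it treats coordinates as planar.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: cast a ray to the right of (x, y) and count edge crossings.

    `polygon` is a list of (x, y) vertices in order; an odd number of
    crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Made-up square search area and two candidate points returned by the scans.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, square))  # True
print(point_in_polygon(5.0, 2.0, square))  # False
```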

For tooling, you can use the JTS Topology Suite (http://tsusiatsoftware.net/jts/main.html), which implements common geometric objects, spatial topology data structures, and the algorithms that operate on them. Use JTS to find the geohash cells to scan as follows:

  1. Initialize the geometry of the polygon object from the vertices of the area to be searched, and obtain the centroid of the polygon.
  2. Encode the centroid coordinates as a geohash at a chosen precision (number of characters). If the bounding box of that geohash already covers the polygon being searched, return the geohash as the single cell to scan. If not, continue with step 3.
  3. As in the previous section, find the eight cells adjacent to the centroid cell. Take the bounding box that contains all nine cells and check again whether it covers the search area. If it does, return the nine geohashes as the cells to scan. If it does not, go back to step 2 with a shorter geohash (a larger cell), and repeat until the search area is covered.
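The three steps above can be sketched without JTS under a few simplifying assumptions, all of them mine rather than the source's: the centroid is approximated by the average of the vertices, "covers" is checked by testing that every vertex lies inside the rectangle (sufficient because a rectangle is convex), the function names are hypothetical, and cells at the poles or the antimeridian are not handled.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode(lon, lat, length):
    ranges, coords = [[-180.0, 180.0], [-90.0, 90.0]], (lon, lat)
    index, out = 0, []
    for bit in range(length * 5):
        axis = bit % 2                      # even bits: longitude, odd: latitude
        mid = (ranges[axis][0] + ranges[axis][1]) / 2
        upper = coords[axis] >= mid
        ranges[axis][0 if upper else 1] = mid
        index = (index << 1) | upper
        if bit % 5 == 4:
            out.append(BASE32[index])
            index = 0
    return "".join(out)

def bbox(geohash):
    """Return [[lon_min, lon_max], [lat_min, lat_max]] of a geohash cell."""
    ranges, bit = [[-180.0, 180.0], [-90.0, 90.0]], 0
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            axis = bit % 2
            mid = (ranges[axis][0] + ranges[axis][1]) / 2
            ranges[axis][0 if (value >> shift) & 1 else 1] = mid
            bit += 1
    return ranges

def covers(box, polygon):
    """True if the rectangle contains every vertex of the polygon."""
    (lon_min, lon_max), (lat_min, lat_max) = box
    return all(lon_min <= x <= lon_max and lat_min <= y <= lat_max
               for x, y in polygon)

def covering_prefixes(polygon, max_precision=7):
    # Step 1: approximate the centroid by the vertex average (assumption).
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    for precision in range(max_precision, 0, -1):
        # Step 2: does the single centroid cell cover the polygon?
        center = encode(cx, cy, precision)
        (lon_min, lon_max), (lat_min, lat_max) = bbox(center)
        if covers([[lon_min, lon_max], [lat_min, lat_max]], polygon):
            return [center]
        # Step 3: does the 3x3 block around the centroid cell cover it?
        w, h = lon_max - lon_min, lat_max - lat_min
        if covers([[lon_min - w, lon_max + w], [lat_min - h, lat_max + h]], polygon):
            mx, my = (lon_min + lon_max) / 2, (lat_min + lat_max) / 2
            return sorted({encode(mx + dx * w, my + dy * h, precision)
                           for dx in (-1, 0, 1) for dy in (-1, 0, 1)})
        # Otherwise fall through and retry with a shorter geohash.
    return []  # not covered even at precision 1

# Made-up search area: a small rectangle given as (longitude, latitude) vertices.
area = [(-73.980, 40.760), (-73.970, 40.760), (-73.970, 40.765), (-73.980, 40.765)]
prefixes = covering_prefixes(area)
print(prefixes)
```

Starting from the longest precision and shortening it keeps the scanned cells as small as possible, which limits the number of rows that later have to be filtered out.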

After obtaining the geohash cells to scan, use the scanning approach from the K-nearest-neighbor section to read the candidate points from the HBase table. Finally, discard the points that do not lie inside the area being searched. This filtering step can be implemented as an HBase filter, which runs on the region servers, takes advantage of HBase's distributed parallel processing, and reduces the amount of data transferred to the client.
