Finding nearby places with the Geohash algorithm

Source: Internet
Author: User

Original source: Zhanlijun

Introduction

Machine is an active, studious kid who likes to tap around the map on a mobile phone on weekdays, looking up fun things. One day Machine went to Beihai Park, got hungry, opened the phone's map, searched for restaurants near Beihai Park, and picked one for a meal.

After eating, Machine started to wonder: how does the map backend find nearby restaurants from a location? After a long think, Machine came up with a method: compute the distance between location p and every restaurant in Beijing, then return the restaurants no more than 1000 meters away. The pride was short-lived. Beijing has a great many restaurants, so that is a lot of computing. Then another thought: since the latitude and longitude are known, so is the fact that p lies in Xicheng District, so only the distances between p and the restaurants in Xicheng District need computing. Applying the idea recursively, Xicheng District also has many restaurants, so only the distances between p and the restaurants on p's own street should be computed. That keeps the computation small and makes the query faster.

Machine's idea is simple: filter first to reduce the number of restaurants that take part in the distance calculation. Seen from a certain angle, Machine is using indexing.

At the mention of indexes, the B-tree index comes to mind, because many databases (such as MySQL, Oracle, and PostgreSQL) use B-trees. A B-tree index essentially sorts the indexed field and then finds values quickly with a lookup similar to binary search. This requires the indexed field to be sortable, which generally means a one-dimensional field such as time, age, or salary. But how do you sort a point in space (two dimensions, longitude and latitude)? And how do you index it? There are several ways to solve this problem.

One idea: if two-dimensional point data could somehow be converted into one-dimensional data, couldn't we keep using the B-tree index? Does such a method really exist? The answer is yes. The currently very popular Geohash algorithm uses exactly this idea. Let's begin the Geohash journey.

I. Getting a feel for Geohash

First, some intuition. http://openlocation.org/geohash/geohash-js/ displays Geohash codes on a map.

1) Geohash converts a two-dimensional latitude/longitude pair into a string. For example, the 9 regions of Beijing shown have the Geohash strings wx4er, wx4g2, wx4g3, and so on, and each string represents a rectangular region. In other words, all points (latitude/longitude coordinates) inside a rectangle share the same Geohash string. This protects privacy (the string reveals only the approximate area, not the exact point) and makes caching easier. For example, suppose users in the upper-left region keep sending their location to request restaurant data. Because all of these users have the Geohash string wx4er, wx4er can serve as the cache key and the restaurant information for that region as the value. Without Geohash, every user in the region has a different latitude and longitude, so caching is hard.

2) The longer the string, the more precise the region it represents. For example, a 5-character code represents a rectangle of roughly 10 square kilometers, while a 6-character code represents a much finer area (about 0.34 km²); the exact cell size varies with latitude.

3) Similar strings represent nearby locations (with exceptions, discussed later), so string prefix matching can be used to query nearby POI information. As the following two figures show, the Geohash strings within the urban area resemble one another, and the strings within the suburbs resemble one another, while urban and suburban strings are much less similar.

[Figures: City; Suburbs]

From the introduction above, we know that Geohash is a way to convert latitude and longitude into a string, and that in most cases the longer the shared prefix of two strings, the closer the two locations. Back to our case: to find restaurants near a location, convert the location's latitude and longitude into a Geohash string and match it against the Geohash strings of the restaurants. The longer the match, the closer the restaurant.
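As a minimal sketch of prefix matching, the snippet below keeps restaurant Geohash strings in a sorted list and uses binary search to pull out everything under a given prefix, which is roughly what a range scan on a B-tree index would do. The restaurant names, the 6-character hashes, and the helper `query_prefix` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical data: (geohash, name), kept sorted by geohash like an index.
restaurants = sorted([
    ("wx4erb", "Restaurant A"),
    ("wx4g2k", "Restaurant B"),
    ("wx4g2p", "Restaurant C"),
    ("wx4g3h", "Restaurant D"),
    ("wx5j8n", "Restaurant E"),
])
keys = [geohash for geohash, _ in restaurants]

def query_prefix(prefix):
    """Return the names of all restaurants whose geohash starts with `prefix`."""
    lo = bisect_left(keys, prefix)
    # "~" sorts after every character of the Geohash alphabet, so
    # prefix + "~" is an upper bound for all strings with this prefix.
    hi = bisect_left(keys, prefix + "~")
    return [name for _, name in restaurants[lo:hi]]

print(query_prefix("wx4g2"))  # same 5-character cell
print(query_prefix("wx4g"))   # wider 4-character cell
```

A shorter prefix widens the search area, so a query can start with a long prefix and shorten it until enough candidates are found.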

II. Steps of the Geohash algorithm

The calculation steps of the Geohash algorithm are described below, using Beihai Park as an example.

2.1. Calculate Geohash binary code based on latitude and longitude

The latitude range of the earth is [-90,90]. The latitude of Beihai Park is 39.928167, and it can be encoded by successive approximation as follows:

1) Divide the interval [-90,90] into [-90,0] and [0,90], called the left and right intervals. 39.928167 belongs to the right interval [0,90], so record 1;

2) Then divide the interval [0,90] into [0,45] and [45,90]. 39.928167 belongs to the left interval [0,45], so record 0;

3) Repeat the process recursively. 39.928167 always belongs to some interval [a, b], and with each iteration [a, b] shrinks and gets closer to 39.928167;

4) If the given latitude x (39.928167) falls in the left interval, record 0; if it falls in the right interval, record 1. As the algorithm proceeds it produces a sequence such as 1011100, whose length depends on the number of times the interval is divided.

Encoding the latitude:

Bit Min Mid Max
1 -90.000 0.000 90.000
0 0.000 45.000 90.000
1 0.000 22.500 45.000
1 22.500 33.750 45.000
1 33.750 39.375 45.000
0 39.375 42.1875 45.000
0 39.375 40.78125 42.1875
0 39.375 40.078125 40.78125
1 39.375 39.7265625 40.078125
1 39.7265625 39.90234375 40.078125

Similarly, the longitude range of the earth is [-180,180], and Beihai Park's longitude 116.389550 can be encoded in the same way.

Encoding the longitude:

Bit Min Mid Max
1 -180 0.000 180
1 0.000 90 180
0 90 135 180
1 90 112.5 135
0 112.5 123.75 135
0 112.5 118.125 123.75
1 112.5 115.3125 118.125
0 115.3125 116.71875 118.125
1 115.3125 116.015625 116.71875
1 116.015625 116.3671875 116.71875
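The bisection procedure behind the two tables can be sketched in a few lines. The function name `bisect_bits` is hypothetical, and sending a value that equals the midpoint to the right interval is an assumption; the source does not say how ties are handled.

```python
def bisect_bits(value, lo, hi, nbits):
    """Encode `value` in [lo, hi] as `nbits` binary digits by repeated halving."""
    bits = []
    for _ in range(nbits):
        mid = (lo + hi) / 2
        if value >= mid:      # right interval: record 1 and raise the lower bound
            bits.append("1")
            lo = mid
        else:                 # left interval: record 0 and lower the upper bound
            bits.append("0")
            hi = mid
    return "".join(bits)

lat_bits = bisect_bits(39.928167, -90.0, 90.0, 10)
lon_bits = bisect_bits(116.389550, -180.0, 180.0, 10)
print(lat_bits)  # 1011100011
print(lon_bits)  # 1101001011
```

Each printed digit corresponds to one row of the tables above, in order.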

2.2. Combining the codes

From the calculation above, the latitude produces the code 10111 00011 and the longitude produces the code 11010 01011. Counting positions from 0, the even positions hold longitude bits and the odd positions hold latitude bits. Interleaving the two codes this way gives a new string: 11100 11101 00100 01111.
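The interleaving step can be checked with a short sketch. The helper name `interleave` is made up for illustration; it assumes both bit strings have the same length and that the longitude bit comes first, as described above.

```python
def interleave(lon_bits, lat_bits):
    """Alternate bits: longitude at even positions, latitude at odd positions."""
    return "".join(lon + lat for lon, lat in zip(lon_bits, lat_bits))

combined = interleave("1101001011", "1011100011")
groups = [combined[i:i + 5] for i in range(0, len(combined), 5)]
print(" ".join(groups))  # 11100 11101 00100 01111
```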

Finally, the result is Base32-encoded using the 32 characters 0-9 and b-z (with a, i, l, and o removed). First convert 11100 11101 00100 01111 into decimal, which gives 28, 29, 4, 15; the corresponding Base32 characters are wx4g. Decoding, which turns the code back into latitude and longitude, simply reverses these steps and is not repeated here.
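A minimal sketch of the Base32 step and its reverse follows. The function names are hypothetical. Decoding yields an interval for each coordinate rather than a single point, because every point in the cell shares the same code.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # 0-9 and b-z without a, i, l, o

def bits_to_geohash(bits):
    """Map each group of 5 bits to one Base32 character."""
    return "".join(BASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))

def geohash_to_bits(geohash):
    """Reverse of bits_to_geohash: one character becomes 5 bits."""
    return "".join(format(BASE32.index(ch), "05b") for ch in geohash)

def bits_to_interval(bits, lo, hi):
    """Replay the bisection: 1 keeps the right half, 0 keeps the left half."""
    for bit in bits:
        mid = (lo + hi) / 2
        if bit == "1":
            lo = mid
        else:
            hi = mid
    return lo, hi

code = bits_to_geohash("11100111010010001111")
print(code)  # wx4g

bits = geohash_to_bits(code)
lon_interval = bits_to_interval(bits[0::2], -180.0, 180.0)  # even positions
lat_interval = bits_to_interval(bits[1::2], -90.0, 90.0)    # odd positions
print(lat_interval, lon_interval)
```

The decoded intervals contain Beihai Park's coordinates (39.928167, 116.389550), as expected.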

III. Geohash Base32 encoding length and precision

The following table is excerpted from Wikipedia: http://en.wikipedia.org/wiki/Geohash

As can be seen, when the Geohash Base32 code is 8 characters long the precision is about 19 meters, and when it is 9 characters long the precision is about 2 meters. Choose the code length according to your data.
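The relationship between length and precision follows from the bit count: each character adds 5 bits, split between longitude and latitude. The sketch below estimates the worst-case error as half a cell. It assumes about 111,320 meters per degree, which holds for latitude and for longitude only at the equator, so the figures are rough; `cell_size_degrees` is a hypothetical helper.

```python
METERS_PER_DEGREE = 111_320  # rough figure; longitude degrees shrink away from the equator

def cell_size_degrees(length):
    """Return (height, width) in degrees of a Geohash cell of `length` characters."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2   # longitude gets the first bit, so the extra one
    lat_bits = total_bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits

for length in range(1, 10):
    height, width = cell_size_degrees(length)
    lat_err = height / 2 * METERS_PER_DEGREE
    lon_err = width / 2 * METERS_PER_DEGREE
    print(f"{length}: lat ±{lat_err:.1f} m, lon ±{lon_err:.1f} m")
```

Under these assumptions the larger of the two errors is about 19 m at length 8 and about 2.4 m at length 9, in line with the figures quoted above.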

IV. The Geohash algorithm

The sections above describe the calculation steps of Geohash, but they only state what it does, not why. Why encode longitude and latitude separately? Why combine the two codes into a single string? This section attempts to answer these questions.

When we fill the binary codes into the space and divide the space into four blocks, the codes run in the order lower left 00, upper left 01, lower right 10, upper right 11, which traces something like a Z curve. When we recursively split each block into smaller sub-blocks, the order of the codes is self-similar (fractal): each sub-block also forms a Z curve. This type of curve is called the Peano space-filling curve (it is also widely known as the Z-order or Morton curve).

The advantage of this kind of space-filling curve is that it converts two-dimensional space into a one-dimensional curve (in fact, one of fractal dimension), and for the most part similar codes mean nearby locations. Its biggest drawback is abrupt jumps: some codes are adjacent yet the locations are far apart. For example, 0111 and 1000 are adjacent codes, but the cells they name are far from each other.
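The jump between 0111 and 1000 can be made concrete on a 4×4 grid. The sketch below de-interleaves a 4-bit code into grid coordinates, taking the first bit of each pair as x (left/right) and the second as y (bottom/top), which matches the 00, 01, 10, 11 layout described above. The helper `z_decode` is hypothetical.

```python
import math

def z_decode(code, bits_per_axis=2):
    """Split an interleaved code into (x, y) grid coordinates."""
    x = y = 0
    for i in range(bits_per_axis):
        shift = 2 * (bits_per_axis - 1 - i)
        x = (x << 1) | ((code >> (shift + 1)) & 1)  # first bit of the pair
        y = (y << 1) | ((code >> shift) & 1)        # second bit of the pair
    return x, y

# Distance between the cells of each pair of consecutive codes.
jumps = [math.dist(z_decode(k), z_decode(k + 1)) for k in range(15)]
worst = max(range(15), key=lambda k: jumps[k])

print(z_decode(0b0111), z_decode(0b1000))  # (1, 3) (2, 0)
print(format(worst, "04b"), "->", format(worst + 1, "04b"), round(jumps[worst], 2))
```

Most consecutive codes are 1 or about 1.41 cells apart, but the step from 0111 to 1000 spans about 3.16 cells, crossing from the top of the left half to the bottom of the right half.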

Besides the Peano curve there are many other space-filling curves. The one generally recognized as performing best is the Hilbert space-filling curve, which, unlike the Peano curve, has no large jumps. Why doesn't Geohash use the Hilbert curve? Probably because the Peano curve is simpler to reason about and to compute. In fact, the Peano curve is a linear encoding of a quadtree.

V. Points to note in use

1) Because Geohash divides space into regular rectangles and encodes each rectangle, the following problem arises when querying nearby POIs. Suppose the red dot is our location and the two green dots are two nearby restaurants. When we query, we find that the farther restaurant has the same Geohash code as ours (because it lies in the same Geohash cell), while the nearer restaurant has a different one. This problem typically occurs near cell borders.

The solution is simple: when querying, match not only the Geohash code of our own location but also the codes of the 8 surrounding cells. This avoids the problem.

2) We already know that the existing Geohash algorithm uses the Peano space-filling curve, and that this curve has abrupt jumps, so similar codes can still correspond to points that are far apart. Therefore, when querying nearby restaurants, first filter the POIs whose Geohash codes are similar, and then compute the actual distances.
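Both points can be combined into one query, sketched below under several assumptions: the 8 neighbors are found by shifting the cell center by one cell width or height and re-encoding, the exact distance uses the haversine formula with a mean Earth radius of 6,371 km, and the restaurant coordinates are made up. All function names are hypothetical, and a real system would look up precomputed hashes in an index instead of encoding every POI per query.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode(lat, lon, length):
    """Geohash-encode a point by interleaved bisection (longitude bit first)."""
    rngs = {0: [-180.0, 180.0], 1: [-90.0, 90.0]}
    coords = {0: lon, 1: lat}
    chars, value = [], 0
    for i in range(length * 5):
        rng = rngs[i % 2]
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if coords[i % 2] >= mid else 0
        rng[1 - bit] = mid  # bit 1 raises the lower bound, bit 0 lowers the upper
        value = value * 2 + bit
        if i % 5 == 4:
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)

def decode_bounds(geohash):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the cell named by `geohash`."""
    rngs = {0: [-180.0, 180.0], 1: [-90.0, 90.0]}
    i = 0
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            bit = (value >> shift) & 1
            rng = rngs[i % 2]
            rng[1 - bit] = (rng[0] + rng[1]) / 2
            i += 1
    return rngs[1][0], rngs[1][1], rngs[0][0], rngs[0][1]

def neighbors(geohash):
    """The cell itself plus its surrounding cells, listed north to south, west to east."""
    lat_lo, lat_hi, lon_lo, lon_hi = decode_bounds(geohash)
    height, width = lat_hi - lat_lo, lon_hi - lon_lo
    lat_c, lon_c = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    cells = []
    for dy in (1, 0, -1):
        for dx in (-1, 0, 1):
            lat = lat_c + dy * height
            if not -90.0 < lat < 90.0:
                continue  # no cell beyond the poles
            lon = (lon_c + dx * width + 180.0) % 360.0 - 180.0  # wrap at ±180
            cells.append(encode(lat, lon, len(geohash)))
    return cells

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))

def nearby(lat, lon, pois, radius_m=1000, length=5):
    """Coarse filter by the 9 Geohash cells, then exact filter by distance."""
    cells = set(neighbors(encode(lat, lon, length)))
    hits = []
    for name, plat, plon in pois:
        if encode(plat, plon, length) in cells:
            d = haversine_m(lat, lon, plat, plon)
            if d <= radius_m:
                hits.append((round(d), name))
    return sorted(hits)

# Hypothetical restaurants around Beihai Park (39.928167, 116.389550).
pois = [("Restaurant A", 39.9300, 116.3900),   # same cell, a few hundred meters away
        ("Restaurant B", 39.9500, 116.4200),   # neighboring cell, but beyond 1000 m
        ("Restaurant C", 40.5000, 117.0000)]   # far away, rejected by the cell filter
print(nearby(39.928167, 116.389550, pois))
```

Length 5 is used here because its cells are several kilometers across, so the 9 cells comfortably cover a 1000-meter radius; a smaller radius could use a longer code.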

References:

http://en.wikipedia.org/wiki/Geohash

http://openlocation.org/geohash/geohash-js/
