Geohash is an address code that encodes two-dimensional latitude and longitude into a one-dimensional string. For example, Beihai Park's code is WX4G0EC1.
The principle and algorithm of Geohash
The following is an example of (39.92324, 116.3906) to introduce the Geohash coding algorithm.
First, the latitude range (-90, 90) is divided into two intervals (-90, 0), (0, 90), and if the target latitude is in the previous interval, the encoding is 0, otherwise the encoding is 1. Because 39.92324 belongs to (0, 90), the encoding is 1. The (0, 90) is then divided into (0, 45), (45, 90) Two, and 39.92324 is (0, 45), so the encoding is 0. And so on, until the precision meets the requirements, the latitude code is 1011 1000 1100 0111 1001.
Latitude Range |
Division interval 0 |
Division interval 1 |
39.92324 Owning Range |
(-90, 90) |
(-90, 0.0) |
(0.0, 90) |
1 |
(0.0, 90) |
(0.0, 45.0) |
(45.0, 90) |
0 |
(0.0, 45.0) |
(0.0, 22.5) |
(22.5, 45.0) |
1 |
(22.5, 45.0) |
(22.5, 33.75) |
(33.75, 45.0) |
1 |
(33.75, 45.0) |
(33.75, 39.375) |
(39.375, 45.0) |
1 |
(39.375, 45.0) |
(39.375, 42.1875) |
(42.1875, 45.0) |
0 |
(39.375, 42.1875) |
(39.375, 40.7812) |
(40.7812, 42.1875) |
0 |
(39.375, 40.7812) |
(39.375, 40.0781) |
(40.0781, 40.7812) |
0 |
(39.375, 40.0781) |
(39.375, 39.7265) |
(39.7265, 40.0781) |
1 |
(39.7265, 40.0781) |
(39.7265, 39.9023) |
(39.9023, 40.0781) |
1 |
(39.9023, 40.0781) |
(39.9023, 39.9902) |
(39.9902, 40.0781) |
0 |
(39.9023, 39.9902) |
(39.9023, 39.9462) |
(39.9462, 39.9902) |
0 |
(39.9023, 39.9462) |
(39.9023, 39.9243) |
(39.9243, 39.9462) |
0 |
(39.9023, 39.9243) |
(39.9023, 39.9133) |
(39.9133, 39.9243) |
1 |
(39.9133, 39.9243) |
(39.9133, 39.9188) |
(39.9188, 39.9243) |
1 |
(39.9188, 39.9243) |
(39.9188, 39.9215) |
(39.9215, 39.9243) |
1 |
Longitude also uses the same algorithm, which is subdivided into (-180, 180) sequentially, and 116.3906 is encoded as 1101 0010 1100 0100 0100.
Longitude Range |
Division interval 0 |
Division interval 1 |
116.3906 Owning Range |
(-180, 180) |
(-180, 0.0) |
(0.0, 180) |
1 |
(0.0, 180) |
(0.0, 90.0) |
(90.0, 180) |
1 |
(90.0, 180) |
(90.0, 135.0) |
(135.0, 180) |
0 |
(90.0, 135.0) |
(90.0, 112.5) |
(112.5, 135.0) |
1 |
(112.5, 135.0) |
(112.5, 123.75) |
(123.75, 135.0) |
0 |
(112.5, 123.75) |
(112.5, 118.125) |
(118.125, 123.75) |
0 |
(112.5, 118.125) |
(112.5, 115.312) |
(115.312, 118.125) |
1 |
(115.312, 118.125) |
(115.312, 116.718) |
(116.718, 118.125) |
0 |
(115.312, 116.718) |
(115.312, 116.015) |
(116.015, 116.718) |
1 |
(116.015, 116.718) |
(116.015, 116.367) |
(116.367, 116.718) |
1 |
(116.367, 116.718) |
(116.367, 116.542) |
(116.542, 116.718) |
0 |
(116.367, 116.542) |
(116.367, 116.455) |
(116.455, 116.542) |
0 |
(116.367, 116.455) |
(116.367, 116.411) |
(116.411, 116.455) |
0 |
(116.367, 116.411) |
(116.367, 116.389) |
(116.389, 116.411) |
1 |
(116.389, 116.411) |
(116.389, 116.400) |
(116.400, 116.411) |
0 |
(116.389, 116.400) |
(116.389, 116.394) |
(116.394, 116.400) |
0 |
The next step is to merge the longitude and latitude codes, the odd digits are latitude, and even digits are longitude, which is encoded 11100 11101 00100 01111 00000 01101 01011 00001.
Finally, the 32 letters of 0-9, b-z (remove A, I, L, O) are encoded BASE32, and the encoding of (39.92324, 116.3906) is WX4G0EC1.
Decimal |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
Base32 |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
B |
C |
D |
E |
F |
G |
Decimal |
16 |
17 |
18 |
19 |
20 |
21st |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
Base32 |
H |
J |
K |
M |
N |
P |
Q |
R |
S |
T |
U |
V |
W |
X |
Y |
Z |
decoding algorithm and coding algorithm, in contrast to the first base32 decoding, and then the separation of latitude and longitude, and finally according to the binary code to the latitude range of subdivision can be, here no longer repeat. However, since Geohash represents an interval, the longer the encoding, the more accurate it is, but it is not possible to decode the exact same address.
Geohash application: Near Address search
The Geohash's best use is the nearby address search. However, it can be seen from the Geohash encoding algorithm: two points on both sides of the lattice boundary, although very close, the coding will be completely different. In practical application, we can search the 8 squares around the current lattice to solve this problem.
Finally, let's take a look at the two issues raised at the beginning of this article: slow speed, low cache hit rate. Use the Geohash query for a nearby location with a string prefix match:
Copy Code code as follows:
SELECT * from place WHERE geohash like ' wx4g0% ';
The
and prefix matching can take advantage of the index on the Geohash column, so the query speed is not too slow. In addition, even small changes in user coordinates can be encoded into the same geohash, which ensures that the cache hit rate is greatly improved by executing the same SQL statement every time.