[Repost] How to implement sort-by-distance and range lookup

Last Update:2016-12-29 Source: Internet

Author: User


Introduction

Almost every app now offers location-based features such as "search within a range", "closest to me", and "show distance". How are these features implemented? This article describes an approach that can be applied to any database.

Implementation

To make the following explanation easier, here is an initial table structure. I am using MySQL:

CREATE TABLE `customer` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto-increment primary key',
  `name` VARCHAR(5) NOT NULL COMMENT 'name',
  `lon` DOUBLE(9,6) NOT NULL COMMENT 'longitude',
  `lat` DOUBLE(8,6) NOT NULL COMMENT 'latitude',
  PRIMARY KEY (`id`)
) COMMENT='merchant table' CHARSET=utf8mb4 ENGINE=InnoDB;

The implementation has four main steps:
1. Search
Search the database for merchants near the specified range, for example within 1 km.
2. Filtering
The search results may include merchants farther than 1 km away, so they need to be filtered again. If you do not need strict precision, you can skip this step.
3. Sorting
Sort the results by distance, from nearest to farthest. You can skip this step if you do not need it.
4. Paging
If you need step 2 or 3, pagination requires special handling. If not, you can page directly in SQL in step 1.

Step 1 is completed by the database; the remaining three steps are completed by the application.

Step 1: Search

The search can be implemented in either of the following two ways.

Range lookup

The customer table stores longitude and latitude in two fields. If you calculate the longitude and latitude range in advance and index the two fields, search performance is good.
So how do you calculate the range? The known inputs are the longitude and latitude of the mobile device and the radius required by the business. This is much like a junior high school plane geometry problem: given the center and radius of a circle, find the four vertices of its circumscribed square. Because we are dealing with a sphere rather than a plane, the calculation can be done with Spatial4j.

<dependency>
    <groupId>com.spatial4j</groupId>
    <artifactId>spatial4j</artifactId>
    <version>0.5</version>
</dependency>

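import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.distance.DistanceUtils;
import com.spatial4j.core.shape.Rectangle;

// Longitude and latitude of the mobile device
double lon = 116.312528, lat = 39.983733;
// Radius in km
int radius = 1;
SpatialContext geo = SpatialContext.GEO;
// Bounding box of the circle around the device
Rectangle rectangle = geo.getDistCalc().calcBoxByDistFromPt(
        geo.makePoint(lon, lat), radius * DistanceUtils.KM_TO_DEG, geo, null);
System.out.println(rectangle.getMinX() + "-" + rectangle.getMaxX()); // longitude range
System.out.println(rectangle.getMinY() + "-" + rectangle.getMaxY()); // latitude range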
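To give a rough idea of what such a bounding-box calculation involves, here is a minimal plain-Java sketch. It is not Spatial4j's implementation: the class name BoundingBox, the method around, and the Earth radius constant are assumptions made for illustration, and the sketch assumes the point is not near a pole or the 180° meridian.

```java
final class BoundingBox {
    // Assumed mean Earth radius in km; other values are in common use.
    static final double EARTH_RADIUS_KM = 6371.0087714;

    /** Returns {minLon, maxLon, minLat, maxLat} in degrees. */
    static double[] around(double lon, double lat, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // angular radius in radians
        double dLat = Math.toDegrees(angular);
        // Meridians converge toward the poles, so the same distance spans
        // more degrees of longitude at higher latitudes.
        double dLon = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] {lon - dLon, lon + dLon, lat - dLat, lat + dLat};
    }

    public static void main(String[] args) {
        double[] box = around(116.312528, 39.983733, 1.0);
        System.out.println(box[0] + " - " + box[1]); // longitude range
        System.out.println(box[2] + " - " + box[3]); // latitude range
    }
}
```

The four values map directly onto the four placeholders of a BETWEEN query on lon and lat.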

After calculating the longitude and latitude range, the SQL looks like this:

SELECT id, name
FROM customer
WHERE (lon BETWEEN ? AND ?) AND (lat BETWEEN ? AND ?);

You need to create a composite index on the lon and lat fields:

INDEX `idx_lon_lat` (`lon`, `lat`)

Geohash

The principle of Geohash is not covered here; the original post links to an article that explains it in detail. The Geohash algorithm encodes a two-dimensional longitude/latitude pair into a one-dimensional string. Its key property is that locations close to each other usually share a longer code prefix, so nearby merchants can be matched with a prefix LIKE query.
Add a field to the customer table to store each merchant's Geohash code, and index it.

CREATE TABLE `customer` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto-increment primary key',
  `name` VARCHAR(5) NOT NULL COMMENT 'name' COLLATE 'latin1_swedish_ci',
  `lon` DOUBLE(9,6) NOT NULL COMMENT 'longitude',
  `lat` DOUBLE(8,6) NOT NULL COMMENT 'latitude',
  `geo_code` CHAR(12) NOT NULL COMMENT 'Geohash code',
  PRIMARY KEY (`id`),
  INDEX `idx_geo_code` (`geo_code`)
) COMMENT='merchant table' CHARSET=utf8mb4 ENGINE=InnoDB;
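For readers who want to see how a longitude/latitude pair turns into such a string, the following is a minimal sketch of the standard Geohash encoding in plain Java. The class name GeohashSketch is an assumption for illustration; in practice a library would normally be used instead.

```java
final class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point into a geohash of the given number of characters. */
    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90, 90}, lonRange = {-180, 180};
        StringBuilder sb = new StringBuilder(length);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0, value = 0;
        while (sb.length() < length) {
            double[] range = lonBit ? lonRange : latRange;
            double coord = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            value <<= 1;
            if (coord >= mid) {
                value |= 1;       // upper half: emit 1 and keep the upper half
                range[0] = mid;
            } else {
                range[1] = mid;   // lower half: emit 0 and keep the lower half
            }
            lonBit = !lonBit;
            if (++bits == 5) {    // every 5 bits become one base-32 character
                sb.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The device location used throughout this article
        System.out.println(encode(39.983733, 116.312528, 12));
        System.out.println(encode(39.983733, 116.312528, 5));
    }
}
```

Because each extra character only refines the same cell, a shorter code for a point is always a prefix of a longer code for that point, which is what makes prefix matching possible.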

When adding or modifying a merchant, keep geo_code up to date. How is geo_code calculated? Spatial4j also provides a utility class for this: GeohashUtils.encodeLatLon(lat, lon), with a default precision of 12 characters. Once the codes are stored, you can search by geo_code: take the longitude and latitude of the mobile device and compute its geo_code, optionally with a specified precision. What precision should you specify? That requires a table mapping geo_code length to cell size:


geohash length	width	height
1	5,009.4km	4,992.6km
2	1,252.3km	624.1km
3	156.5km	156km
4	39.1km	19.5km
5	4.9km	4.9km
6	1.2km	609.4m
7	152.9m	152.4m
8	38.2m	19m
9	4.8m	4.8m
10	1.2m	59.5cm
11	14.9cm	14.9cm
12	3.7cm	1.9cm

Source: https://en.wikipedia.org/wiki/Geohash#Cell_Dimensions

Suppose the requirement is to find merchants within 1 km. Setting the geo_code length to 5 is enough: GeohashUtils.encodeLatLon(lat, lon, 5). After calculating the geo_code for the mobile device's longitude and latitude, the SQL looks like this:

SELECT id, name
FROM customer
WHERE geo_code LIKE CONCAT(?, '%');
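The choice of length from the cell-size table can also be made in code. The sketch below assumes one possible rule, namely to pick the longest geohash length whose cell is at least as wide and as high as the search radius; the class name GeohashPrecision and the method lengthFor are hypothetical, and the rule itself is an interpretation of the text rather than something the article states.

```java
final class GeohashPrecision {
    // Cell width and height in km for geohash lengths 1..12,
    // taken from the cell-size table above.
    private static final double[][] CELL_KM = {
        {5009.4, 4992.6}, {1252.3, 624.1}, {156.5, 156.0}, {39.1, 19.5},
        {4.9, 4.9}, {1.2, 0.6094}, {0.1529, 0.1524}, {0.0382, 0.019},
        {0.0048, 0.0048}, {0.0012, 0.000595}, {0.000149, 0.000149},
        {0.000037, 0.000019}
    };

    /** Longest geohash length whose cell is at least radiusKm in both dimensions. */
    static int lengthFor(double radiusKm) {
        for (int len = CELL_KM.length; len >= 1; len--) {
            double[] cell = CELL_KM[len - 1];
            if (cell[0] >= radiusKm && cell[1] >= radiusKm) {
                return len;
            }
        }
        return 1; // radius larger than every cell: fall back to the coarsest length
    }

    public static void main(String[] args) {
        System.out.println(lengthFor(1.0)); // length for a 1 km search radius
    }
}
```

Under this rule a 1 km radius maps to length 5, matching the choice made in the text.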

This is faster than the range lookup, and because nearby locations share geo_code values, hotspot areas can be cached. However, Geohash has a problem. Geohash effectively lays a grid over the map, with each cell representing one Geohash value. When the incoming coordinates are close to the boundary of the current cell, the search method above misses nearby data that lies just across the boundary. For example, in the original illustration a search from the green dot for the nearby "White House courtyard" misses it, because the green dot and the courtyard fall into two different cells.

The solution is fairly simple: when querying, match not only the Geohash code of the green dot's cell but also the Geohash codes of the 8 surrounding cells. To calculate the Geohash codes of the 8 surrounding cells, you can use geohash-java.

<dependency>
    <groupId>ch.hsr</groupId>
    <artifactId>geohash</artifactId>
    <version>1.3.0</version>
</dependency>

import ch.hsr.geohash.GeoHash;

// Longitude and latitude of the mobile device
double lon = 116.312528, lat = 39.983733;
GeoHash geoHash = GeoHash.withCharacterPrecision(lat, lon, 10);
// Current cell
System.out.println(geoHash.toBase32());
// N, NE, E, SE, S, SW, W, NW
GeoHash[] adjacent = geoHash.getAdjacent();
for (GeoHash hash : adjacent) {
    System.out.println(hash.toBase32());
}
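To illustrate how the 8 surrounding cells can be derived, here is a plain-Java sketch that decodes a geohash into its cell, steps one cell width or height away from the cell center, and re-encodes each shifted point. It is not how geohash-java is implemented; the class name GeohashNeighbors and its methods are assumptions for illustration, and cells in the top or bottom row next to a pole are not handled.

```java
final class GeohashNeighbors {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90, 90}, lonRange = {-180, 180};
        StringBuilder sb = new StringBuilder(length);
        boolean lonBit = true;
        int bits = 0, value = 0;
        while (sb.length() < length) {
            double[] range = lonBit ? lonRange : latRange;
            double coord = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            value <<= 1;
            if (coord >= mid) { value |= 1; range[0] = mid; } else { range[1] = mid; }
            lonBit = !lonBit;
            if (++bits == 5) { sb.append(BASE32.charAt(value)); bits = 0; value = 0; }
        }
        return sb.toString();
    }

    /** Returns the cell bounds as {minLat, maxLat, minLon, maxLon}. */
    static double[] decode(String hash) {
        double[] latRange = {-90, 90}, lonRange = {-180, 180};
        boolean lonBit = true;
        for (char c : hash.toCharArray()) {
            int value = BASE32.indexOf(c);
            for (int mask = 16; mask > 0; mask >>= 1) {
                double[] range = lonBit ? lonRange : latRange;
                double mid = (range[0] + range[1]) / 2;
                if ((value & mask) != 0) { range[0] = mid; } else { range[1] = mid; }
                lonBit = !lonBit;
            }
        }
        return new double[] {latRange[0], latRange[1], lonRange[0], lonRange[1]};
    }

    /** Neighbors in the order N, NE, E, SE, S, SW, W, NW. */
    static String[] neighbors(String hash) {
        double[] b = decode(hash);
        double height = b[1] - b[0], width = b[3] - b[2];
        double lat = (b[0] + b[1]) / 2, lon = (b[2] + b[3]) / 2;
        int[][] steps = {{1, 0}, {1, 1}, {0, 1}, {-1, 1},
                         {-1, 0}, {-1, -1}, {0, -1}, {1, -1}}; // {latStep, lonStep}
        String[] result = new String[steps.length];
        for (int i = 0; i < steps.length; i++) {
            double nLat = lat + steps[i][0] * height;
            double nLon = lon + steps[i][1] * width;
            if (nLon >= 180) nLon -= 360;  // wrap around the 180° meridian
            if (nLon < -180) nLon += 360;
            result[i] = encode(nLat, nLon, hash.length());
        }
        return result;
    }

    public static void main(String[] args) {
        String center = encode(39.983733, 116.312528, 6);
        System.out.println(center);
        for (String n : neighbors(center)) {
            System.out.println(n);
        }
    }
}
```

Stepping from the cell center rather than from the original point keeps each shifted point well inside the neighboring cell, which avoids rounding problems at cell boundaries.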

In the end, our SQL becomes this:

SELECT id, name
FROM customer
WHERE geo_code LIKE CONCAT(?, '%')
   OR geo_code LIKE CONCAT(?, '%')
   OR geo_code LIKE CONCAT(?, '%')
   OR geo_code LIKE CONCAT(?, '%')
   OR geo_code LIKE CONCAT(?, '%')
   OR geo_code LIKE CONCAT(?, '%')
   OR geo_code LIKE CONCAT(?, '%')
   OR geo_code LIKE CONCAT(?, '%')
   OR geo_code LIKE CONCAT(?, '%');

The original single condition has become nine, so performance will certainly drop, but this can be optimized. Take the same scenario of searching for merchants within 1 km. The table above shows that when the geo_code length is 5 the cell width is 4.9 km, so querying with nine geo_codes covers far too large an area. You can set the geo_code length to 6 instead, which narrows the query range and still largely meets the requirement. Note that a 6-character cell is only about 609 m high, so the nine cells cover roughly 3.6 km by 1.8 km, and a device near the top or bottom edge of its cell can miss merchants that are less than 1 km away to the north or south. You can optimize further: when storing geo_code, compute only 6 characters, so that the SQL becomes:

SELECT id, name
FROM customer
WHERE geo_code IN (?, ?, ?, ?, ?, ?, ?, ?, ?);

This changes the prefix match into an exact match, which is considerably faster.

Step 2: Filtering

Neither of the two search methods above is exact; they only narrow the search range as much as possible to improve response time. The application therefore needs to filter out merchants that are more than 1 km away. Spatial4j is also used to calculate the distances.

// Longitude and latitude of the mobile device
double lon1 = 116.3125333347639, lat1 = 39.98355521792821;
// Longitude and latitude of the merchant
double lon2 = 116.312528, lat2 = 39.983733;
SpatialContext geo = SpatialContext.GEO;
double distance = geo.calcDistance(geo.makePoint(lon1, lat1), geo.makePoint(lon2, lat2))
        * DistanceUtils.DEG_TO_KM;
System.out.println(distance); // km

The filtering code is omitted here; it is simply another pass over the search results.
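As a rough idea of what that pass could look like without any library, here is a plain-Java sketch that uses the haversine formula in place of Spatial4j. The Merchant class, its fields, and the method names are hypothetical and stand in for whatever row type the application uses.

```java
import java.util.ArrayList;
import java.util.List;

final class DistanceFilter {
    // Assumed mean Earth radius in km.
    static final double EARTH_RADIUS_KM = 6371.0087714;

    /** Hypothetical row type holding a search result. */
    static final class Merchant {
        final long id;
        final double lon, lat;
        double distanceKm; // filled in by within()

        Merchant(long id, double lon, double lat) {
            this.id = id;
            this.lon = lon;
            this.lat = lat;
        }
    }

    /** Great-circle distance in km using the haversine formula. */
    static double haversineKm(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Keeps only the candidates within radiusKm and records their distance. */
    static List<Merchant> within(List<Merchant> candidates,
                                 double lon, double lat, double radiusKm) {
        List<Merchant> kept = new ArrayList<Merchant>();
        for (Merchant m : candidates) {
            m.distanceKm = haversineKm(lon, lat, m.lon, m.lat);
            if (m.distanceKm <= radiusKm) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Merchant> candidates = new ArrayList<Merchant>();
        candidates.add(new Merchant(1, 116.312528, 39.988733));
        candidates.add(new Merchant(2, 116.312528, 40.003733));
        for (Merchant m : within(candidates, 116.312528, 39.983733, 1.0)) {
            System.out.println(m.id + " " + m.distanceKm);
        }
    }
}
```

Storing the computed distance on each row means the later sorting and display steps do not have to recompute it.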

Step 3: Sorting

Similarly, sorting has to be handled in the application. Sort the filtered results from the previous step with Collections.sort(list, comparator).
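A minimal sketch of such a sort, assuming each result already carries its computed distance; the Merchant class and its distanceKm field are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class DistanceSort {
    /** Hypothetical result type with a precomputed distance. */
    static final class Merchant {
        final String name;
        final double distanceKm;

        Merchant(String name, double distanceKm) {
            this.name = name;
            this.distanceKm = distanceKm;
        }
    }

    /** Sorts the list in place, nearest merchant first. */
    static void sortNearestFirst(List<Merchant> merchants) {
        Comparator<Merchant> byDistance = Comparator.comparingDouble(m -> m.distanceKm);
        Collections.sort(merchants, byDistance);
    }

    public static void main(String[] args) {
        List<Merchant> merchants = new ArrayList<Merchant>(Arrays.asList(
                new Merchant("B", 0.8), new Merchant("A", 0.2), new Merchant("C", 0.5)));
        sortNearestFirst(merchants);
        for (Merchant m : merchants) {
            System.out.println(m.name + " " + m.distanceKm);
        }
    }
}
```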

Step 4: Paging

If you need steps 2 and 3, paging can only be done in memory. This is straightforward; the original post links to an article on how to do it.
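In-memory paging over the filtered and sorted list can be sketched as follows. The class name MemoryPager and the 1-based page numbering are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MemoryPager {
    /** Returns the given 1-based page, or an empty list past the last page. */
    static <T> List<T> page(List<T> sorted, int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        long from = (long) (page - 1) * pageSize; // long avoids int overflow
        if (from >= sorted.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, sorted.size());
        // Copy so the page does not keep a view onto the full result list.
        return new ArrayList<T>(sorted.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(page(ids, 1, 2));
        System.out.println(page(ids, 3, 2));
    }
}
```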

Summary

The focus of this article is how to make the search use the database index well. The dividing line between the two search methods is roughly one million rows: the first suits data sets below a million rows, the second suits data sets above a million (this has not been rigorously verified). Some readers may wonder whether doing the filtering and sorting at the application level will consume a lot of memory. It is a potential problem, but in most cases it will not be one. Most application scenarios search a single kind of POI (point of interest), such as hotels, restaurants, KTVs, or cinemas, and the density of such data is low: how many hotels are there within 1 km? Fifty would already be a lot. So in the end it depends on the density of the specific business data. This article does not analyze the underlying principles and only describes the concrete implementation; for the analysis, see the reference links.

References

http://www.infoq.com/cn/articles/depth-study-of-Symfony2
http://tech.meituan.com/lucene-distance.html
http://blog.csdn.net/liminlu0314/article/details/8553926
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
http://www.cnblogs.com/LBSer/p/3310455.html
http://cevin.net/geohash/

This article is from Gao | coder. Original address: http://blog.csdn.net/ghsau/article/details/50591932. Please credit the source when reprinting.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
