Brief introduction
Almost every application now offers location-based features such as "search within a range", "nearest to me", and "show distance". How are these features implemented? This article describes an approach that works with any database.
Implementation
To make the explanation easier to follow, here is the initial table structure. I am using MySQL:
CREATE TABLE `customer` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
  `name` VARCHAR(5) NOT NULL COMMENT 'Name',
  `lon` DOUBLE(9,6) NOT NULL COMMENT 'Longitude',
  `lat` DOUBLE(8,6) NOT NULL COMMENT 'Latitude',
  PRIMARY KEY (`id`)
) COMMENT='Merchant table' CHARSET=utf8mb4 ENGINE=InnoDB;
The implementation is divided into four main steps:
1. Search
Search the database for merchants near the specified range, for example within 1 km.
2. Filter
The search results may include merchants farther than 1 km away, so they need to be filtered again. If precision is not critical, you can skip this step.
3. Sort
Sort by distance from near to far. You can skip this step if you don't need it.
4. Paginate
If you need step 2 or 3, pagination requires special handling. If not, you can paginate directly in the SQL of step 1.
Step 1 is done by the database; the remaining three steps are done by the application.
Step 1: Search
Search can be implemented in either of the following two ways.
Interval lookup
The customer table stores longitude and latitude in two columns. If you calculate the longitude and latitude range in advance and index both columns, search performance is good.
So how do you calculate the range? The known conditions are the longitude and latitude of the mobile device and the radius required by the business. This resembles a junior high school plane geometry problem: given the center and radius of a circle, find the four vertices of its circumscribed square. Because we are dealing with a sphere rather than a plane, the calculation can be done with Spatial4j.
<dependency> <groupId>com.spatial4j</groupId> <artifactId>spatial4j</artifactId> <version>0.5</version></dependency>
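import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.distance.DistanceUtils;
import com.spatial4j.core.shape.Rectangle;

// Longitude and latitude of the mobile device
double lon = 116.312528, lat = 39.983733;
// Search radius in kilometers
int radius = 1;
SpatialContext geo = SpatialContext.GEO;
// Bounding rectangle of the circle around the device
Rectangle rectangle = geo.getDistCalc().calcBoxByDistFromPt(geo.makePoint(lon, lat), radius * DistanceUtils.KM_TO_DEG, geo, null);
System.out.println(rectangle.getMinX() + "-" + rectangle.getMaxX()); // longitude range
System.out.println(rectangle.getMinY() + "-" + rectangle.getMaxY()); // latitude range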
After calculating the longitude and latitude range, the SQL looks like this:
SELECT id, name FROM customer WHERE (lon BETWEEN ? AND ?) AND (lat BETWEEN ? AND ?);
You need to create a composite index on the lon and lat columns:
`idx_lon_lat` (`lon`, `lat`)
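The range calculation can also be sketched without a library. The following is a minimal sketch, not Spatial4j's implementation: it assumes a spherical Earth with a mean radius of 6371.0088 km and ignores the poles and the 180° meridian. The class name BoundingBoxSketch and the method around are hypothetical, and the results may differ slightly from what Spatial4j returns.

```java
/** Hypothetical helper: bounding box around a point on a spherical Earth model. */
class BoundingBoxSketch {
    // Mean Earth radius in kilometers (an assumed value for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /**
     * Returns {minLat, maxLat, minLon, maxLon} in degrees.
     * Simplification: no handling of the poles or the 180° meridian.
     */
    static double[] around(double latDeg, double lonDeg, double radiusKm) {
        // The radius expressed as an angle at the Earth's center, in radians.
        double angular = radiusKm / EARTH_RADIUS_KM;
        double dLat = Math.toDegrees(angular);
        // A degree of longitude gets shorter away from the equator,
        // so the box must span more degrees of longitude than of latitude.
        double dLon = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(latDeg))));
        return new double[] {latDeg - dLat, latDeg + dLat, lonDeg - dLon, lonDeg + dLon};
    }

    public static void main(String[] args) {
        double[] box = around(39.983733, 116.312528, 1.0);
        System.out.println("lat: " + box[0] + " - " + box[1]);
        System.out.println("lon: " + box[2] + " - " + box[3]);
    }
}
```

The four values map directly onto the four placeholders of the BETWEEN query above.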
Geohash
I will not explain how Geohash works here; the reference links cover it in detail. The Geohash algorithm encodes a two-dimensional latitude and longitude into a one-dimensional string. Its key property is that the closer two locations are, the longer the prefix their codes tend to share, so nearby merchants can be matched with a prefix LIKE query.
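To make the prefix property concrete, here is a minimal sketch of the standard Geohash encoding in plain Java. It is an illustration only and is not the Spatial4j implementation; the class name GeohashEncodeSketch is hypothetical. Each character holds 5 bits, and the bits alternate between halving the longitude range and halving the latitude range, so a longer code is always an extension of the shorter one.

```java
/** Hypothetical illustration of the standard Geohash encoding. */
class GeohashEncodeSketch {
    // The Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0, bitCount = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { bits = (bits << 1) | 1; lonMin = mid; }
                else { bits <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { bits = (bits << 1) | 1; latMin = mid; }
                else { bits <<= 1; latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bitCount == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(bits));
                bits = 0;
                bitCount = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(42.6, -5.6, 5)); // ezs42
        System.out.println(encode(42.6, -5.6, 8)); // starts with ezs42
    }
}
```

Because a shorter code is a prefix of every longer code for the same point, truncating a stored code is equivalent to encoding with a lower precision.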
Add a column to the customer table to store each merchant's Geohash code, and index it.
CREATE TABLE `customer` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
  `name` VARCHAR(5) NOT NULL COMMENT 'Name' COLLATE 'latin1_swedish_ci',
  `lon` DOUBLE(9,6) NOT NULL COMMENT 'Longitude',
  `lat` DOUBLE(8,6) NOT NULL COMMENT 'Latitude',
  `geo_code` CHAR(12) NOT NULL COMMENT 'Geohash code',
  PRIMARY KEY (`id`),
  INDEX `idx_geo_code` (`geo_code`)
) COMMENT='Merchant table' CHARSET=utf8mb4 ENGINE=InnoDB;
When adding or modifying a merchant, keep geo_code up to date. How is geo_code calculated? Spatial4j provides a utility method, GeohashUtils.encodeLatLon(lat, lon), whose default precision is 12 characters. Once the codes are stored, you can search by geo_code: take the longitude and latitude of the mobile device, calculate its geo_code with a chosen precision, and match on it. Which precision should you choose? That requires a table relating geo_code length to cell size:
| geohash length | width | height |
| 1 | 5,009.4km | 4,992.6km |
| 2 | 1,252.3km | 624.1km |
| 3 | 156.5km | 156km |
| 4 | 39.1km | 19.5km |
| 5 | 4.9km | 4.9km |
| 6 | 1.2km | 609.4m |
| 7 | 152.9m | 152.4m |
| 8 | 38.2m | 19m |
| 9 | 4.8m | 4.8m |
| 10 | 1.2m | 59.5cm |
| 11 | 14.9cm | 14.9cm |
| 12 | 3.7cm | 1.9cm |
https://en.wikipedia.org/wiki/Geohash#Cell_Dimensions
Suppose we need merchants within 1 km. Setting the geo_code length to 5 is enough: GeohashUtils.encodeLatLon(lat, lon, 5). After calculating the geo_code for the mobile device's longitude and latitude, the SQL looks like this:
SELECT id, name FROM customer WHERE geo_code LIKE CONCAT(?, '%');
This is much faster than interval lookup, and because nearby locations share the same geo_code prefix, results for hotspot areas can be cached. However, Geohash has a drawback. Geohash divides the map into a grid in which every cell corresponds to one Geohash value. When the query coordinates lie close to the edge of their cell, the search above misses nearby merchants that fall in an adjacent cell. For example, when searching for the White House from the position of the green dot, the green dot and the White House lie in two different cells.
The fix is fairly simple: when querying, match not only the Geohash of the green dot's cell but also the Geohashes of the 8 surrounding cells. The Geohashes of the 8 surrounding cells can be calculated with geohash-java.
<dependency> <groupId>ch.hsr</groupId> <artifactId>geohash</artifactId> <version>1.3.0</version></dependency>
import ch.hsr.geohash.GeoHash;

// Longitude and latitude of the mobile device
double lon = 116.312528, lat = 39.983733;
GeoHash geoHash = GeoHash.withCharacterPrecision(lat, lon, 10);
// Current cell
System.out.println(geoHash.toBase32());
// Adjacent cells in the order N, NE, E, SE, S, SW, W, NW
GeoHash[] adjacent = geoHash.getAdjacent();
for (GeoHash hash : adjacent) {
    System.out.println(hash.toBase32());
}
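For readers who want to see what the neighbor calculation involves, the following is a rough library-free sketch. It is not how geohash-java implements getAdjacent; it assumes one simple technique: decode the cell to its bounds, step one cell width or height from the cell center in each direction, and encode the new point. The class name GeohashNeighborSketch and its methods are hypothetical, and cells beyond the poles are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the 8 neighbors of a Geohash cell via decode, shift, encode. */
class GeohashNeighborSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard encoding: 5 bits per character, alternating longitude and latitude. */
    static String encode(double lat, double lon, int length) {
        double[] b = {-90, 90, -180, 180}; // latMin, latMax, lonMin, lonMax
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true;
        while (hash.length() < length) {
            int value = 0;
            for (int i = 0; i < 5; i++) {
                int lo = lonTurn ? 2 : 0; // index of the lower bound being refined
                double mid = (b[lo] + b[lo + 1]) / 2;
                value <<= 1;
                if ((lonTurn ? lon : lat) >= mid) { value |= 1; b[lo] = mid; }
                else { b[lo + 1] = mid; }
                lonTurn = !lonTurn;
            }
            hash.append(BASE32.charAt(value));
        }
        return hash.toString();
    }

    /** Decodes a Geohash into its cell bounds: {latMin, latMax, lonMin, lonMax}. */
    static double[] bounds(String hash) {
        double[] b = {-90, 90, -180, 180};
        boolean lonTurn = true;
        for (char c : hash.toCharArray()) {
            int value = BASE32.indexOf(c);
            if (value < 0) throw new IllegalArgumentException("Invalid geohash: " + hash);
            for (int mask = 16; mask > 0; mask >>= 1) {
                int lo = lonTurn ? 2 : 0;
                double mid = (b[lo] + b[lo + 1]) / 2;
                if ((value & mask) != 0) b[lo] = mid; else b[lo + 1] = mid;
                lonTurn = !lonTurn;
            }
        }
        return b;
    }

    /** Returns the neighboring cells (8 of them except next to a pole). */
    static List<String> neighbors(String hash) {
        double[] b = bounds(hash);
        double height = b[1] - b[0], width = b[3] - b[2];
        double centerLat = (b[0] + b[1]) / 2, centerLon = (b[2] + b[3]) / 2;
        List<String> result = new ArrayList<>();
        for (int dLat = -1; dLat <= 1; dLat++) {
            for (int dLon = -1; dLon <= 1; dLon++) {
                if (dLat == 0 && dLon == 0) continue;   // skip the center cell itself
                double lat = centerLat + dLat * height;
                if (lat < -90 || lat > 90) continue;    // no cell beyond the poles
                double lon = centerLon + dLon * width;
                if (lon >= 180) lon -= 360;             // wrap across the 180° meridian
                if (lon < -180) lon += 360;
                result.add(encode(lat, lon, hash.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(neighbors("ezs42"));
    }
}
```

The center cell's own code plus these neighbors give the up to nine values to bind in the query.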
In the end, our SQL becomes this:
SELECT id, name FROM customer
WHERE geo_code LIKE CONCAT(?, '%')
OR geo_code LIKE CONCAT(?, '%')
OR geo_code LIKE CONCAT(?, '%')
OR geo_code LIKE CONCAT(?, '%')
OR geo_code LIKE CONCAT(?, '%')
OR geo_code LIKE CONCAT(?, '%')
OR geo_code LIKE CONCAT(?, '%')
OR geo_code LIKE CONCAT(?, '%')
OR geo_code LIKE CONCAT(?, '%');
One prefix condition has become nine, so performance will certainly drop, but this can be optimized. Take the same scenario of searching for merchants within 1 km. The table above shows that a cell is about 4.9 km wide when the geo_code length is 5, so nine such cells cover far more area than needed. You can therefore set the geo_code length to 6, which narrows the query range. Note that a length-6 cell is about 1.2 km wide but only about 609.4 m tall, so the nine cells are guaranteed to cover only about 609 m to the north and south; this is adequate when the 1 km limit is approximate. You can optimize further: if only 6 characters are computed when storing geo_code, the SQL can become:
SELECT id, name FROM customer WHERE geo_code IN (?, ?, ?, ?, ?, ?, ?, ?, ?);
This turns the prefix match into an exact match, which is considerably faster.
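Choosing the geo_code length can be written down as a small lookup. The sketch below is an assumption-laden illustration: it hard-codes the cell sizes from the table above (which are approximate and shrink in width away from the equator) and applies a strict rule for the nine-cell search, namely that a cell must be at least as large as the radius in both directions. The class name GeohashLengthSketch and the method lengthFor are hypothetical.

```java
/** Hypothetical helper: pick a Geohash length for a nine-cell search. */
class GeohashLengthSketch {
    // {width, height} in kilometers for geohash lengths 1..12, from the table above.
    static final double[][] CELL_KM = {
        {5009.4, 4992.6}, {1252.3, 624.1}, {156.5, 156.0}, {39.1, 19.5},
        {4.9, 4.9}, {1.2, 0.6094}, {0.1529, 0.1524}, {0.0382, 0.019},
        {0.0048, 0.0048}, {0.0012, 0.000595}, {0.000149, 0.000149}, {0.000037, 0.000019}
    };

    /**
     * Returns the longest length whose cell is at least radiusKm wide and tall,
     * or 0 if even a length-1 cell is too small.
     */
    static int lengthFor(double radiusKm) {
        for (int length = CELL_KM.length; length >= 1; length--) {
            double[] cell = CELL_KM[length - 1];
            if (Math.min(cell[0], cell[1]) >= radiusKm) {
                return length;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(lengthFor(1.0)); // 5
        System.out.println(lengthFor(0.5)); // 6
    }
}
```

Under this strict rule a 1 km radius maps to length 5. Length 6 is the faster choice when coverage of roughly 609 m to the north and south is acceptable.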
Step 2: Filter
Neither of the two search methods above is exact; they only narrow the search scope to improve response time. The application therefore has to filter out merchants that are more than 1 km away. Distances can also be calculated with Spatial4j.
// Longitude and latitude of the mobile device
double lon1 = 116.3125333347639, lat1 = 39.98355521792821;
// Longitude and latitude of the merchant
double lon2 = 116.312528, lat2 = 39.983733;
SpatialContext geo = SpatialContext.GEO;
double distance = geo.calcDistance(geo.makePoint(lon1, lat1), geo.makePoint(lon2, lat2)) * DistanceUtils.DEG_TO_KM;
System.out.println(distance); // km
The filter code is not shown here; it simply iterates over the search results once.
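As a rough idea of what that pass could look like, here is a minimal sketch that does not depend on Spatial4j. It assumes the haversine formula on a sphere with a mean radius of 6371.0088 km, so distances may differ slightly from Spatial4j's. The Merchant class, the class name DistanceFilterSketch, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: drop search results that lie outside the radius. */
class DistanceFilterSketch {
    // Mean Earth radius in kilometers (an assumed value for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Hypothetical row type; a real project would map it from the customer table. */
    static class Merchant {
        final long id;
        final double lon;
        final double lat;

        Merchant(long id, double lon, double lat) {
            this.id = id;
            this.lon = lon;
            this.lat = lat;
        }
    }

    /** Great-circle distance in kilometers using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Keeps only the candidates within radiusKm of the given point. */
    static List<Merchant> withinRadius(List<Merchant> candidates,
                                       double lat, double lon, double radiusKm) {
        List<Merchant> kept = new ArrayList<>();
        for (Merchant m : candidates) {
            if (haversineKm(lat, lon, m.lat, m.lon) <= radiusKm) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Merchant> candidates = new ArrayList<>();
        candidates.add(new Merchant(1, 116.312528, 39.983733));
        candidates.add(new Merchant(2, 116.40, 39.98));
        List<Merchant> near = withinRadius(candidates, 39.98355521792821, 116.3125333347639, 1.0);
        System.out.println(near.size());
    }
}
```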
Step 3: Sort
Similarly, sorting has to be done in the application. Sort the filtered results with Collections.sort(list, comparator).
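A minimal sketch of the sort, assuming each filtered result already carries the distance computed in step 2. The Hit class and the class name DistanceSortSketch are hypothetical. List.sort with a Comparator is used here; it is equivalent to Collections.sort(list, comparator).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: order filtered results from nearest to farthest. */
class DistanceSortSketch {
    /** Hypothetical result type: a merchant id plus its precomputed distance. */
    static class Hit {
        final long id;
        final double distanceKm;

        Hit(long id, double distanceKm) {
            this.id = id;
            this.distanceKm = distanceKm;
        }
    }

    /** Returns a new list sorted by ascending distance; the input is left untouched. */
    static List<Hit> sortByDistance(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.distanceKm));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.8));
        hits.add(new Hit(2, 0.1));
        hits.add(new Hit(3, 0.4));
        for (Hit h : sortByDistance(hits)) {
            System.out.println(h.id + " " + h.distanceKm);
        }
    }
}
```

Storing the distance on the result object avoids recomputing it inside the comparator and lets the same value be shown to the user.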
Step 4: Paginate
If you need steps 2 and 3, pagination can only be done in memory. This is straightforward to implement.
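In-memory paging amounts to taking a slice of the filtered and sorted list. A minimal sketch, assuming 1-based page numbers; the class name InMemoryPagingSketch and the method page are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: slice an already filtered and sorted list into pages. */
class InMemoryPagingSketch {
    /**
     * Returns the items of the given page (pageNumber starts at 1).
     * A page past the end of the list yields an empty list.
     */
    static <T> List<T> page(List<T> items, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        long from = (long) (pageNumber - 1) * pageSize; // long avoids int overflow
        if (from >= items.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, items.size());
        // Copy the slice so the result does not depend on the source list.
        return new ArrayList<>(items.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(page(ids, 2, 2)); // [3, 4]
        System.out.println(page(ids, 3, 2)); // [5]
    }
}
```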
Summary
The focus of this article is how to search in a way that makes good use of database indexes. One million rows is the dividing line between the two search methods: the first suits tables below one million rows, the second suits tables above one million (this has not been rigorously verified). Some readers may wonder whether filtering and sorting in the application will consume a lot of memory. This is a potential problem, but in most cases it will not occur. Most application scenarios search a single kind of POI (point of interest), such as hotels, restaurants, KTV venues, or cinemas. The density of such data is low: how many hotels are there within 1 km? Fifty would already be a lot. In the end it depends on the density of the specific business data. This article does not analyze the underlying principles and covers only the concrete implementation; for the analysis, see the reference links.
References
http://www.infoq.com/cn/articles/depth-study-of-Symfony2
http://tech.meituan.com/lucene-distance.html
http://blog.csdn.net/liminlu0314/article/details/8553926
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
http://www.cnblogs.com/LBSer/p/3310455.html
http://cevin.net/geohash/
This article is from Gao | Coder. Original address: http://blog.csdn.net/ghsau/article/details/50591932. Please credit the source when reprinting.