The current job is to analyze some user data. each user has several records, each of which has a user location, expressed by longitude and latitude. there is also a given database that stores some known locations and their longitude and latitude, with more than rows of data... the current job is to analyze some user data. each user has several records, each of which has a user location, expressed by longitude and latitude.
There is also a given database that stores more than pieces of data from known locations and their longitude and latitude. now we need to match the distance between the user's latitude and longitude with the known locations, if the distance between them is less than a certain amount of data, for example, 500, the user is considered to be in this location.
MYSQL supports spatial indexes,. in version x, the support for Distance () and Related () is canceled, and the spatial Distance function cannot be used to directly query vertices within a certain range. therefore, the first thing I think of is to traverse each record and calculate the distance from each vertex in the database. when the distance is less than 500, this can indeed produce results, but the efficiency is extremely low, because every record needs to be matched with 40 million pieces of data cyclically, and the time consumed can be imagined. after record, it is found that the processing time of each record is 1700 ms. for the amount of data stored in hundreds of millions of records per day, how is this processing speed so human-like.
I also had the idea that I could find an approximate range around the latitude and longitude of each record point, for example, four points of a square, and then use mysql space computing, use MBR to obtain the known records in the rectangle and then perform matching. Unfortunately, I didn't come up with a method to calculate the longitude and latitude of four points.
Unexpectedly, I found a preliminary research on the location near the computing. I used python to implement this idea.
Therefore, after referring to the algorithm in the original article and implementing it using PHP, the implementation principle is very similar. First, calculate the four points of the rectangle around the vertex, then, use the longitude and latitude to directly match the records in the database.
The red part is the required search range, and the green part is the result range that we can indirectly obtain.
The red part is the required search range, and the green part is the result range that we can indirectly obtain.
Refer to some spherical computation formulas on Wikipedia:
Great-circle distance
Haversine formula
Assume that the longitude and latitude of known points are $ lng, $ lat
First, implement the query of the longitude range. in the haversin formula, set the number of records equal to? 1 =? 2. use PHP for calculation. the code is as follows:
// $ Lat latitude of a known point
$ Dlng = 2 * asin (sin ($ distance/(2 * EARTH_RADIUS)/cos (deg 2rad ($ lat )));
$ Dlng = rad2deg ($ dlng); // converts radians.
Then there is a query of the latitude range. in the haversin formula, make △λ = 0. you can get it. in PHP, the code is as follows:
$ Dlat = $ distance/EARTH_RADIUS; // EARTH_RADIUS Earth radius
$ Dlat = rad2deg ($ dlat); // converts radians.
Finally, we can get the coordinates of the four points:
left-top : (lat + dlat, lng – dlng) right-top : (lat + dlat, lng + dlng) left-bottom : (lat – dlat, lng – dlng) right-bottom: (lat – dlat, lng + dlng)
I have written the above method into a function. The combination is:
Define (EARTH_RADIUS, 6371); // Earth radius, the average radius is 6371 km/***, and the four points of the square with a certain distance around a certain latitude and longitude are calculated ** @ param lng float longitude * @ param lat float latitude * @ param distance float radius of the circle where the point is located, this circle is tangent to this square. the default value is 0.5 km * @ return array coordinate of the longitude and latitude of the four points of the square */function returnSquarePoint ($ lng, $ lat, $ distance = 0.5) {$ dlng = 2 * asin (sin ($ distance/(2 * EARTH_RADIUS)/cos (deg 2rad ($ lat); $ dlng = rad2deg ($ dlng ); $ dlat = $ distance/EARTH_RADIUS; $ dlat = rad 2deg ($ dlat); return array ('left-top' => array ('lat' => $ lat + $ dlat, 'lng '=> $ lng-$ dlng), 'right-top' => array ('lat' => $ lat + $ dlat, 'lng '=> $ lng + $ dlng), 'left-bottom' => array ('lat' => $ lat-$ dlat, 'lng '=> $ lng-$ dlng), 'right-bottom' => array ('lat' => $ lat-$ dlat, 'lng '=> $ lng + $ dlng);} // use this function to calculate the result and bring it to the SQL query. $ Squares = returnSquarePoint ($ lng, $ lat); $ info_ SQL = "select id, locateinfo, lat, lng from 'lbs _ info' where lat <> 0 and lat >{$ squares ['right-bottom '] ['lat']} and lat <{$ squares ['left -top '] ['lat']} and lng> {$ squares ['left-top'] ['lng']} and lng <{$ squares ['right-bottom '] ['lng']} ";
After creating a joint index on lat and lng, the query consumes an average of 0.8 milliseconds for each record. compared with the previous 1700 ms, the query time is really different, the efficiency is really 2125 times that of the past ~~
Conclusion: This should not be the best way to improve efficiency, but it is indeed more efficient than before. Remember, there is always a better way.
Article URL:
Reprint ^ at will, but please attach the tutorial address.