Suppose you need to retrieve data by latitude and longitude, for example to find hotels within 1000 meters of the current location. A simple approach is to load every hotel record from the database, compute each hotel's distance from the current location using its latitude and longitude, and return the records that are less than 1000 meters away.
This works when the data set is small, but retrieval becomes inefficient as the volume of data grows. This article describes how to use Solr's spatial query support for spatial search.
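The brute-force approach described above can be sketched as follows. This is only an illustration, not code from the original setup: the class name BruteForceNearby, the {latitude, longitude} array layout, and the haversine formula with a mean earth radius of 6371008.8 m are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BruteForceNearby {
    // Assumed mean earth radius in meters.
    static final double EARTH_RADIUS_M = 6371008.8;

    // Great-circle distance between two points, using the haversine formula.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Scans every record; each entry is {latitude, longitude}.
    static List<double[]> within(List<double[]> hotels, double lat, double lon, double radiusM) {
        List<double[]> result = new ArrayList<>();
        for (double[] h : hotels) {
            if (haversineMeters(lat, lon, h[0], h[1]) <= radiusM) {
                result.add(h);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> hotels = Arrays.asList(
                new double[]{39.92324, 116.3906},   // the current location itself
                new double[]{39.9277, 116.3906},    // roughly 500 m to the north
                new double[]{31.23, 121.47});       // far away
        System.out.println(within(hotels, 39.92324, 116.3906, 1000.0).size());
    }
}
```

Every query touches every record, which is why the cost grows linearly with the size of the data set.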
Principles of spatial search
Spatial search, also known as spatial query, makes it possible to:
1) index points (latitude and longitude) and other geometric shapes
2) sort results by distance
3) filter search results by rectangles, circles, or other geometric shapes
In Solr, spatial search is based primarily on two concepts, Geohash and Cartesian Tiers:
Geohash algorithm
The Geohash algorithm transforms a two-dimensional longitude/latitude coordinate into a string code that can be sorted and compared.
Each character in the code represents an area, and each character's area is the parent of the area represented by the character that follows it. The algorithm proceeds as follows:
Computing the Geohash binary code from latitude and longitude
The latitude interval of the earth is [-90,90]. For a latitude of 39.92324, the following algorithm approximates its code:
1) Divide the interval [-90,90] into [-90,0) and [0,90], called the left and right intervals. 39.92324 belongs to the right interval [0,90], so mark 1;
2) Divide [0,90] into [0,45) and [45,90]. 39.92324 belongs to the left interval [0,45), so mark 0;
3) Repeat the process recursively. 39.92324 always belongs to some interval [a,b]; with each iteration [a,b] shrinks and approaches 39.92324 more and more closely;
4) Record 0 whenever the given latitude (39.92324) falls in the left interval and 1 whenever it falls in the right interval. The algorithm thus produces the sequence 1011 1000 1100 0111 1001, whose length equals the number of times the interval was divided.
Similarly, the longitude interval of the earth is [-180,180], and longitude 116.3906 is encoded by the same process.
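The bisection steps above can be sketched in a few lines of Java. The class and method names are made up for this illustration; the rule that a value equal to the midpoint goes to the right interval is an assumption consistent with the intervals listed above.

```java
final class BisectionBits {
    // Repeatedly halves [lo, hi] and records 1 for the right half, 0 for the left half.
    static String bits(double value, double lo, double hi, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            double mid = (lo + hi) / 2;
            if (value >= mid) {
                sb.append('1');
                lo = mid;   // keep the right interval
            } else {
                sb.append('0');
                hi = mid;   // keep the left interval
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latitude 39.92324 over [-90, 90], 20 divisions
        System.out.println(bits(39.92324, -90, 90, 20));    // 10111000110001111001
        // Longitude 116.3906 over [-180, 180], 20 divisions
        System.out.println(bits(116.3906, -180, 180, 20));  // 11010010110001000100
    }
}
```

The two outputs are the latitude and longitude sequences used in the rest of this example.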
Combining the codes
By this calculation, the code generated for the latitude is 1011 1000 1100 0111 1001 and the code generated for the longitude is 1101 0010 1100 0100 0100. Placing the longitude bits in the even positions and the latitude bits in the odd positions (counting from 0) combines the two codes into a new string: 11100 11101 00100 01111 00000 01101 01011 00001.
Finally, the string is Base32-encoded using the 32 characters 0-9 and b-z (with a, i, l, o removed). First convert 11100 11101 00100 01111 00000 01101 01011 00001 to the decimal values 28, 29, 4, 15, 0, 13, 11, 1; the corresponding encoding is wx4g0ec1. Decoding, which converts the code back into latitude and longitude, is the same process in reverse.
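The interleaving and Base32 steps can be checked with a short sketch. The class and method names are hypothetical; the input bit strings are the latitude and longitude sequences given in the text.

```java
final class GeohashFromBits {
    // Geohash Base32 alphabet: 0-9 and b-z without a, i, l, o.
    static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Longitude bits go to even positions, latitude bits to odd positions (0-based).
    static String interleave(String lonBits, String latBits) {
        if (lonBits.length() != latBits.length()) {
            throw new IllegalArgumentException("bit strings must have equal length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lonBits.length(); i++) {
            sb.append(lonBits.charAt(i)).append(latBits.charAt(i));
        }
        return sb.toString();
    }

    // Converts each group of 5 bits to one Base32 character.
    static String toBase32(String bits) {
        if (bits.length() % 5 != 0) {
            throw new IllegalArgumentException("bit count must be a multiple of 5");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bits.length(); i += 5) {
            int value = Integer.parseInt(bits.substring(i, i + 5), 2);
            sb.append(BASE32.charAt(value));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lon = "11010010110001000100";
        String lat = "10111000110001111001";
        System.out.println(toBase32(interleave(lon, lat)));  // wx4g0ec1
    }
}
```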
From the above, the longer the string, the more precise the area it represents. When the Geohash Base32 code is 8 characters long the precision is about 19 meters, and when it is 9 characters long the precision is about 2 meters; choose the code length according to the data. However, the encoding algorithm also shows that two points on opposite sides of a cell boundary get completely different codes even though they are very close together. In practice, this problem is solved by also searching the eight cells that surround the cell containing the point.
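One way to obtain the cell of a point together with its eight surrounding cells is to step one cell width or height in each direction and re-encode. The sketch below takes that shortcut rather than using a table-driven neighbor algorithm; the names are hypothetical, and it deliberately ignores the poles and the 180-degree meridian.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class GeohashNeighbors {
    static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Standard Geohash encoding: alternate longitude and latitude bisection, longitude first.
    static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonTurn = true;
        int bitCount = 0, value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = value * 2 + 1; lonLo = mid; } else { value = value * 2; lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = value * 2 + 1; latLo = mid; } else { value = value * 2; latHi = mid; }
            }
            lonTurn = !lonTurn;
            if (++bitCount == 5) {          // every 5 bits become one character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    // The cell containing the point plus the cells one step away in each direction.
    static Set<String> cellAndNeighbors(double lat, double lon, int length) {
        int bits = 5 * length;
        int lonBits = (bits + 1) / 2;       // longitude gets the extra bit when the count is odd
        int latBits = bits / 2;
        double cellWidth = 360.0 / (1L << lonBits);
        double cellHeight = 180.0 / (1L << latBits);
        Set<String> cells = new LinkedHashSet<>();
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                cells.add(encode(lat + dy * cellHeight, lon + dx * cellWidth, length));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(cellAndNeighbors(39.92324, 116.3906, 8));
    }
}
```

A proximity search would then look up records whose Geohash starts with any of the nine returned prefixes and apply an exact distance check afterwards.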
Cartesian Tiers
The Cartesian tier model converts latitude and longitude into layered grids of progressively finer granularity. It creates many geographic layers, each refining the subdivision of the layer before it, and every grid cell in a layer is assigned an ID that represents a geographic location.
Each layer has four times as many grid cells as the previous one (2 squared), so the first layer has 4 cells, the second has 16, and so on. Every latitude/longitude position on the map therefore falls into one grid cell in each layer.
Building such an index structure is simple: create one field per Cartesian tier, so that one coordinate corresponds to multiple tier levels, that is, tiers0->field_0, tiers1->field_1, tiers2->field_2, ..., tiers19->field_19 (20 layers are generally enough). For each of these fields, the Cartesian algorithm computes the grid cell of that layer that contains the record's longitude and latitude, and the gridId (the cell's unique identifier) is added to the index as a term. Each record therefore has a gridId in each of the Cartesian fields 0-19. A query usually has to look at the area surrounding a location, and that area often spans more than one grid cell, so in practice the latitude, longitude, and search distance together determine which layers, from 19 down to 0 (fine to coarse), and which grid cells in them the query must cover. A query around a given location then only needs the data in the grid cells that the search circle touches.
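The per-tier fields can be illustrated with a small sketch. It is not the grid numbering Lucene uses; the row-major gridId scheme, the equal-angle grid, the class name, and the choice that tier 0 has 2 x 2 = 4 cells (matching "the first layer has 4 cells") are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CartesianTierFields {
    // Assumed scheme: tier t splits the world into 2^(t+1) columns and 2^(t+1) rows,
    // and a cell's id is row * cellsPerSide + column.
    static long gridId(double lat, double lon, int tier) {
        long cellsPerSide = 1L << (tier + 1);
        long x = Math.min(cellsPerSide - 1, (long) ((lon + 180.0) / 360.0 * cellsPerSide));
        long y = Math.min(cellsPerSide - 1, (long) ((lat + 90.0) / 180.0 * cellsPerSide));
        return y * cellsPerSide + x;
    }

    // One field per tier: field_0 ... field_maxTier, each holding that tier's gridId.
    static Map<String, Long> tierFields(double lat, double lon, int maxTier) {
        Map<String, Long> fields = new LinkedHashMap<>();
        for (int tier = 0; tier <= maxTier; tier++) {
            fields.put("field_" + tier, gridId(lat, lon, tier));
        }
        return fields;
    }

    public static void main(String[] args) {
        // The 20 index terms one record would get under this scheme
        System.out.println(tierFields(39.92324, 116.3906, 19));
    }
}
```

Two records that share a gridId in a fine tier are guaranteed to be close together, while sharing only a coarse-tier gridId says much less; that is what lets a query pick the tier that matches its search distance.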
From the above, a search based on Cartesian tiers proceeds as follows:
1) Use the Cartesian tiers to obtain the gridId of the cell containing the coordinate point
2) Match that gridId against the gridIds stored in the index
3) Compute the distance between each result and the target coordinate point, and return the results within the specified range
Cartesian tiers effectively narrow the range to be filtered and locate coordinate points quickly.
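The three steps can be sketched against a small in-memory index for a single tier. Everything here is illustrative: the grid numbering, the HashMap standing in for the index, the haversine distance with a 6371 km radius, and the names are assumptions, and the sketch does not handle searches that cross the 180-degree meridian.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TierSearchSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Row or column of the cell containing a coordinate, clamped to the grid.
    static int cell(double value, double min, double span, int cellsPerSide) {
        int c = (int) Math.floor((value - min) / span * cellsPerSide);
        return Math.max(0, Math.min(cellsPerSide - 1, c));
    }

    static long gridId(double lat, double lon, int cellsPerSide) {
        return (long) cell(lat, -90, 180, cellsPerSide) * cellsPerSide
                + cell(lon, -180, 360, cellsPerSide);
    }

    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // points are {latitude, longitude}; cellsPerSide selects the tier.
    static List<double[]> search(List<double[]> points, double lat, double lon,
                                 double radiusKm, int cellsPerSide) {
        // Index side: group the points by their gridId in this tier.
        Map<Long, List<double[]>> index = new HashMap<>();
        for (double[] p : points) {
            index.computeIfAbsent(gridId(p[0], p[1], cellsPerSide), k -> new ArrayList<>()).add(p);
        }
        // Step 1: find the grid cells touched by the bounding box of the search circle.
        double dLat = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        double dLon = dLat / Math.cos(Math.toRadians(lat));
        int y0 = cell(lat - dLat, -90, 180, cellsPerSide), y1 = cell(lat + dLat, -90, 180, cellsPerSide);
        int x0 = cell(lon - dLon, -180, 360, cellsPerSide), x1 = cell(lon + dLon, -180, 360, cellsPerSide);
        List<double[]> hits = new ArrayList<>();
        for (int y = y0; y <= y1; y++) {
            for (int x = x0; x <= x1; x++) {
                // Step 2: match each gridId against the index.
                List<double[]> candidates = index.get((long) y * cellsPerSide + x);
                if (candidates == null) continue;
                // Step 3: exact distance check on the few remaining candidates.
                for (double[] p : candidates) {
                    if (distanceKm(lat, lon, p[0], p[1]) <= radiusKm) hits.add(p);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[]{39.92324, 116.3906});
        points.add(new double[]{39.9277, 116.3906});
        points.add(new double[]{31.23, 121.47});
        System.out.println(search(points, 39.92324, 116.3906, 1.0, 4096).size());
    }
}
```

Only the points in the matched cells reach the distance calculation, which is where the saving over the brute-force scan comes from.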
Spatial search in practice with Solr
Solr provides 3 field types for spatial search:
1) LatLonType (for latitude/longitude points; PointType is its counterpart for plane coordinates)
2) SpatialRecursivePrefixTreeFieldType (abbreviated as RPT)
3) BBoxField (for indexing and querying bounding boxes)
This article focuses on SpatialRecursivePrefixTreeFieldType, which can be used not only for points but also for polygon queries.
1. Configure Solr
The test data consists of base stations, each with an ID, an address, and a position.
Solr's schema.xml configuration:
<field name="station_id" type="long" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="station_address" type="text_general" indexed="true" stored="true"/>
<field name="station_position" type="location_rpt" indexed="true" stored="true"/>
<uniqueKey>station_id</uniqueKey>
The important field here is station_position, whose type is location_rpt. Solr defines that type as follows:
<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<!-- An alternative geospatial field type new to Solr 4. It supports multiValued and polygon shapes.
     For more information about this and other Spatial fields new to Solr 4, see:
     http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
<!-- Spatial rectangle (bounding box) field. It supports most spatial predicates, and has
     special relevancy modes: score=overlapRatio|area|area2D (local-param to the query).
     DocValues is required for relevancy. -->
<fieldType name="bbox" class="solr.BBoxField" geo="true" units="degrees" numberType="_bbox_coord"/>
<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/>
Configuration notes for solr.SpatialRecursivePrefixTreeFieldType:
SpatialRecursivePrefixTreeFieldType
A field type that traverses a prefix tree depth-first; it is built mainly on Lucene's RecursivePrefixTreeStrategy.
geo
Defaults to true, in which case coordinates are in a spherical coordinate system and the Geohash method is used. When false, coordinates are in a 2D plane coordinate system and the Euclidean/Cartesian method is used.
distErrPct
Defines the precision for non-point shapes, in the range 0-0.5. The value determines the level at which a non-point shape is indexed or queried (for example, the length of the Geohash code in Geohash mode). A value of 0 uses maxLevels, the highest precision; the higher the precision, the more space and time it takes to build the index.
maxDistErr / maxLevels
maxDistErr defines the deepest level (maxLevels) of the indexed data. It is set to 0.000009 above; GeohashUtils.lookupHashLenForWidthHeight(0.000009, 0.000009) gives a code length of 11 characters for that value, a precision of about 1 meter, and this directly determines the number of index terms per point. maxLevels takes priority over maxDistErr: when maxLevels is set, maxDistErr is ignored. See SpatialPrefixTreeFactory.init() for details. In general, though, maxDistErr is the one used.
units
The unit is degrees.
worldBounds
The world coordinate bounds, in the form "minX minY maxX maxY". When geo=true (Geohash mode), the value defaults to "-180 -90 180 90". When geo=false (quad mode), the default is the negative and positive limits of the Java double type, so the value should be specified explicitly, for example "-180 -90 180 90".
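The "about 1 meter" figure quoted for maxDistErr="0.000009" can be checked with simple arithmetic, since maxDistErr is given in degrees. The sketch below converts degrees of arc to meters along a great circle; the class name and the mean earth radius of 6371.0087714 km are assumptions for the illustration, not values read from the Solr configuration.

```java
final class MaxDistErrCheck {
    // Assumed mean earth radius in kilometers.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;

    // Length in meters of an arc of the given angle along a great circle.
    static double degreesToMeters(double degrees) {
        return Math.toRadians(degrees) * EARTH_MEAN_RADIUS_KM * 1000.0;
    }

    public static void main(String[] args) {
        // maxDistErr from the location_rpt definition
        System.out.println(degreesToMeters(0.000009));
        // One full degree, for comparison (roughly 111 km)
        System.out.println(degreesToMeters(1.0));
    }
}
```

The same conversion is useful when choosing a different maxDistErr: divide the precision you want in meters by roughly 111,000 to get the value in degrees.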
2. Build the index
Here SolrJ is used to build the index:
// Index some base station data for testing.
// `server` is the SolrJ client object, assumed to be a field initialized elsewhere.
public void indexBaseStation() {
    BaseStationDb baseStationDb = new BaseStationDb();
    List<BaseStation> stations = baseStationDb.getAllBaseStations();
    Collection<SolrInputDocument> docList = new ArrayList<SolrInputDocument>();
    for (BaseStation baseStation : stations) {
        // Add base station data to the Solr index
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("station_id", baseStation.getBaseStationId());
        doc.addField("station_address", baseStation.getAddress());
        // The position is indexed as "longitude latitude", separated by a space
        String posString = baseStation.getLongitude() + " " + baseStation.getLatitude();
        doc.addField("station_position", posString);
        docList.add(doc);
    }
    try {
        server.add(docList);
        server.commit();
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("Index base station data done!");
}
Here the position is indexed into the station_position field as a string in the format "longitude latitude".
3. Query
Query syntax examples:
q={!geofilt pt=45.15,-93.85 sfield=poi_location_p d=5 score=distance}
q={!bbox pt=45.15,-93.85 sfield=poi_location_p d=5 score=distance}
q=poi_location_p:"Intersects(-74.093 41.042 -69.347 44.558)" // a bounding box (not in WKT)
q=poi_location_p:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 0 0, -10 30)))" // a WKT example
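A geofilt filter like the first example can be assembled in plain Java before it is handed to a Solr client or sent over HTTP. The sketch below only builds strings and does not call Solr; the class name, the method names, and the core URL http://localhost:8983/solr/collection1 are hypothetical, while station_position is the field defined in the schema earlier. Note that pt is written as "latitude,longitude", the reverse of the "longitude latitude" order used when indexing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class GeofiltQuerySketch {
    // Builds the local-params filter: all documents within radiusKm of (lat, lon).
    static String geofilt(String field, double lat, double lon, double radiusKm) {
        return "{!geofilt sfield=" + field + " pt=" + lat + "," + lon + " d=" + radiusKm + "}";
    }

    // Builds a select URL that matches everything (q=*:*) and applies the filter as fq.
    static String selectUrl(String coreUrl, String filter) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode("*:*", "UTF-8")
                    + "&fq=" + URLEncoder.encode(filter, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String filter = geofilt("station_position", 45.15, -93.85, 5);
        System.out.println(filter);
        // Hypothetical core URL, for illustration only
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", filter));
    }
}
```

Putting the spatial restriction in fq rather than q keeps it out of relevance scoring and lets Solr cache the filter independently of the main query.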
Description of the fields involved:
Field | Meaning |
q | Query criteria, such as q=poi_id:134567 |
fq | Filter conditions, such as fq=store_name:Agriculture |
fl | Fields to return, such as fl=poi_id,store_name |
pt | Coordinate point, such as pt=54.729696,-98.525391 |
d | Search radius, such as d=10 for a 10 km range |
sfield | The indexed coordinate field, such as sfield=geo |
defType | The query parser type, either dismax or edismax. edismax supports multiplicative boost functions, while dismax computes the final score by addition. |
QF |