Spatial search based on Solr

Source: Internet
Author: User
Tags solr string format

Suppose you need to retrieve data by latitude and longitude, for example to find hotels within 1000 meters of the current location. A simple approach is to load all hotel records from the database, compute each hotel's distance from its latitude and longitude, and return the records that are less than 1000 meters away.

This works when the data set is small, but it becomes inefficient as the volume of data grows. This article describes how to use Solr's spatial query support for spatial search.
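The brute-force approach described above can be sketched as follows. This is only an illustration: the class name, the sample coordinates, and the use of the haversine formula with a mean Earth radius are assumptions of this sketch, not something the article prescribes.

```java
// Brute-force radius search: compute the distance to every record and keep the near ones.
// All names and sample points here are hypothetical and used only for illustration.
class BruteForceNearby {
    static final double EARTH_RADIUS_M = 6371000.0; // mean Earth radius, an approximation

    // Great-circle distance in meters between two (lat, lon) points (haversine formula).
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Counts the points (rows of {lat, lon}) within radiusMeters of the center.
    // Every record is visited, so the cost grows linearly with the data volume.
    static int countWithin(double[][] points, double lat, double lon, double radiusMeters) {
        int hits = 0;
        for (double[] p : points) {
            if (haversineMeters(lat, lon, p[0], p[1]) <= radiusMeters) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[][] hotels = { {39.9240, 116.3910}, {39.9500, 116.4500} };
        System.out.println(countWithin(hotels, 39.92324, 116.3906, 1000.0));
    }
}
```

Because each query scans every record, this only scales to small data sets, which is the motivation for an index-based approach.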

The principle of spatial search

Spatial search, also known as spatial query, makes it possible to:

1) index points (latitude and longitude) and other geometries

2) sort results by distance

3) filter search results by rectangles, circles, or other geometric shapes

In Solr, spatial search is based primarily on two concepts, Geohash and Cartesian Tiers:

Geohash algorithm

The Geohash algorithm transforms a two-dimensional longitude/latitude coordinate into a string encoding that can be sorted and compared.
Each character in the encoding represents an area, and each character's area is the parent of the area represented by the character that follows it. The algorithm works as follows:

Computing the Geohash binary code from latitude and longitude

The latitude interval of the earth is [-90,90]. For a latitude of 39.92324, the following procedure approximates its encoding:

1) Divide the interval [-90,90] into a left interval [-90,0) and a right interval [0,90]. 39.92324 belongs to the right interval [0,90], so record 1;

2) Then divide [0,90] into [0,45) and [45,90]. 39.92324 belongs to the left interval [0,45), so record 0;

3) Repeat the process recursively. 39.92324 always belongs to some interval [a,b]; with each iteration [a,b] shrinks and gets closer and closer to 39.92324;

4) Record 0 whenever the given latitude (39.92324) falls in the left interval and 1 whenever it falls in the right interval. As the algorithm proceeds it produces the sequence 1011 1000 1100 0111 1001; the length of the sequence equals the number of times the interval is divided.

Similarly, the longitude interval of the earth is [-180,180], and longitude 116.3906 is encoded in the same way.
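The interval bisection described above can be sketched in a few lines. The class and method names are hypothetical; the sketch simply halves the interval repeatedly and records 1 for the right half and 0 for the left half.

```java
// Minimal sketch of the Geohash bisection step (names are illustrative only).
class GeohashBits {
    // Bisects [lo, hi] `count` times, emitting '1' when the value lies in the
    // right half and '0' when it lies in the left half.
    static String bits(double value, double lo, double hi, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            double mid = (lo + hi) / 2;
            if (value >= mid) {
                sb.append('1');
                lo = mid; // keep the right half
            } else {
                sb.append('0');
                hi = mid; // keep the left half
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bits(39.92324, -90, 90, 20));   // latitude bits
        System.out.println(bits(116.3906, -180, 180, 20)); // longitude bits
    }
}
```

With 20 iterations each, the two calls reproduce the latitude and longitude bit sequences used in this example.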

Group code

By this calculation, the code generated for the latitude is 1011 1000 1100 0111 1001 and the code generated for the longitude is 1101 0010 1100 0100 0100. The two codes are merged into a new string in which, counting from 0, the even positions hold the longitude bits and the odd positions hold the latitude bits: 11100 11101 00100 01111 00000 01101 01011 00001.

Finally, the string is Base32 encoded using the 32 characters 0-9 and b-z (with a, i, l, and o removed). First convert 11100 11101 00100 01111 00000 01101 01011 00001 to the decimal values 28, 29, 4, 15, 0, 13, 11, 1; the Base32 encoding of these values is wx4g0ec1. Decoding, which converts the code back into latitude and longitude, is the reverse process.
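The merge and Base32 steps can be sketched as below, taking the two bit sequences from this example as input. The class and method names are hypothetical; the alphabet is the standard Geohash Base32 alphabet (digits plus lowercase letters without a, i, l, o).

```java
// Minimal sketch of Geohash bit interleaving and Base32 encoding (illustrative names).
class GeohashBase32 {
    static final String ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Interleaves two equal-length bit strings: longitude bits land on even
    // positions (counting from 0), latitude bits on odd positions.
    static String interleave(String lonBits, String latBits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lonBits.length(); i++) {
            sb.append(lonBits.charAt(i)).append(latBits.charAt(i));
        }
        return sb.toString();
    }

    // Maps each complete 5-bit group to one Base32 character; leftover bits are ignored.
    static String toBase32(String bits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 5 <= bits.length(); i += 5) {
            int value = Integer.parseInt(bits.substring(i, i + 5), 2);
            sb.append(ALPHABET.charAt(value));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lon = "11010010110001000100";
        String lat = "10111000110001111001";
        System.out.println(toBase32(interleave(lon, lat)));
    }
}
```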

From the above, the longer the string, the more precise the area it represents. When the Geohash Base32 encoding length is 8, the precision is about 19 meters, and when the length is 9, the precision is about 2 meters, so the encoding length should be chosen according to the data. However, the encoding algorithm also means that two points on opposite sides of a cell boundary can have completely different codes even though they are very close. In practice, this problem is solved by also searching the eight cells that surround the cell containing the point.
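The relationship between encoding length and precision can be estimated with a little arithmetic. The sketch below is an approximation under stated assumptions: one degree is taken as roughly 111,195 meters, the cell width is measured at the equator, and "precision" is interpreted as half of the larger cell dimension. The class and method names are hypothetical.

```java
// Rough estimate of Geohash cell size per encoding length (illustrative names,
// approximate constants).
class GeohashPrecision {
    static final double METERS_PER_DEGREE = 111195.0; // approximate, assumes a spherical Earth

    // Each character carries 5 bits; longitude receives the extra bit when the total is odd.
    static int lonBits(int length) { return (5 * length + 1) / 2; }
    static int latBits(int length) { return (5 * length) / 2; }

    // North-south extent of one cell in meters.
    static double cellHeightMeters(int length) {
        return 180.0 / Math.pow(2, latBits(length)) * METERS_PER_DEGREE;
    }

    // East-west extent of one cell in meters, measured at the equator.
    static double cellWidthMetersAtEquator(int length) {
        return 360.0 / Math.pow(2, lonBits(length)) * METERS_PER_DEGREE;
    }

    // Worst-case distance from the cell center along one axis.
    static double maxErrorMeters(int length) {
        return Math.max(cellHeightMeters(length), cellWidthMetersAtEquator(length)) / 2;
    }

    public static void main(String[] args) {
        System.out.println(maxErrorMeters(8)); // on the order of 19 meters
        System.out.println(maxErrorMeters(9)); // on the order of 2 meters
    }
}
```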

Cartesian Tiers

The Cartesian tier model converts latitude and longitude into coarser-grained, layered grids. It creates many geographic layers, each of which refines the grid of the previous layer, and every grid cell is assigned an ID that represents a geographic location.

Each layer has 4 (2 squared) times as many cells as the previous one, so the first layer has 4 cells, the second layer has 16, and so on. The latitude and longitude of any point on the map therefore fall into one cell of every layer.

Building such an index structure is simple: create one field per Cartesian tier, so that a single coordinate corresponds to multiple tier levels, that is tiers0->field_0, tiers1->field_1, tiers2->field_2, ..., tiers19->field_19 (20 layers are generally enough). For each of these fields, the Cartesian algorithm computes the grid cell that the record's longitude and latitude belong to at that layer, and the grid ID (a unique identifier of the cell) is indexed as a term. Every record therefore has a grid ID in each of the fields for tiers 0-19.

A query, however, usually has to cover the area surrounding a location, and that area spans more than one cell. In practice, the latitude and longitude plus a distance determine which layers, from 19 down to 0 (high to low), and which cells in those layers the query has to involve. The query for a location then needs only the data in the cells covered by the search circle.

From the above, the search steps based on Cartesian tiers are:
1. Use the Cartesian tier to obtain the grid ID of the cell containing the coordinate point
2. Match that grid ID against the grid IDs in the index
3. Compute the distance between each result and the target coordinate point, and return the results within the specified range

Cartesian tiers effectively narrow the filtering range and make it possible to locate coordinate points quickly.
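The idea of a per-tier grid ID can be illustrated with a simplified sketch. Note the assumptions: this lays a 2^tier by 2^tier grid directly over raw latitude/longitude and numbers the cells in row-major order, which is a hypothetical scheme chosen for clarity and not the projection or ID format used by Lucene's Cartesian tier implementation.

```java
// Simplified, hypothetical tier grid: not Lucene's actual Cartesian tier algorithm.
class TierGrid {
    // Each tier splits every cell of the previous tier into 4, giving 4^tier cells.
    static long cellsAtTier(int tier) {
        return 1L << (2 * tier);
    }

    // Row-major index of the cell containing (lat, lon) on a 2^tier x 2^tier grid.
    static long gridId(double lat, double lon, int tier) {
        long n = 1L << tier;
        // Clamp so that lat = 90 or lon = 180 falls into the last cell.
        long col = Math.min(n - 1, (long) ((lon + 180.0) / 360.0 * n));
        long row = Math.min(n - 1, (long) ((lat + 90.0) / 180.0 * n));
        return row * n + col;
    }

    public static void main(String[] args) {
        // Two nearby points share a cell at a coarse tier, so one term lookup finds both.
        System.out.println(gridId(39.92324, 116.3906, 10));
        System.out.println(gridId(39.9240, 116.3910, 10));
    }
}
```

Matching on the grid ID first narrows the candidates to one cell (or a few neighboring cells); the exact distance only needs to be computed for those candidates.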

Spatial search in practice with Solr

Solr provides 3 field types for spatial search:

1) LatLonType (a latitude/longitude point type; its planar, non-geodetic counterpart is PointType)

2) SpatialRecursivePrefixTreeFieldType (abbreviated as RPT)

3) BBoxField (for indexing and querying bounding boxes)

This article focuses on SpatialRecursivePrefixTreeFieldType, which can be used not only for points but also for polygon queries.

1. Configure Solr

First look at the data:

The schema.xml configuration in Solr:

<field name="station_id" type="long" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="station_address" type="text_general" indexed="true" stored="true"/>
<field name="station_position" type="location_rpt" indexed="true" stored="true"/>

<uniqueKey>station_id</uniqueKey>

The key field here is station_position, whose type is location_rpt. Solr defines that type as follows:

<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

<!-- An alternative geospatial field type new to Solr 4. It supports multiValued and polygon shapes.
     For more information about this and other Spatial fields new to Solr 4, see:
     http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>

<!-- Spatial rectangle (bounding box) field. It supports most spatial predicates, and has
     special relevancy modes: score=overlapRatio|area|area2D (local-param to the query).
     DocValues is required for relevancy. -->
<fieldType name="bbox" class="solr.BBoxField" geo="true" units="degrees" numberType="_bbox_coord"/>
<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/>

Configuration notes for solr.SpatialRecursivePrefixTreeFieldType:

SpatialRecursivePrefixTreeFieldType

A field type that performs a depth-first traversal of a prefix tree. It is built mainly on Lucene's RecursivePrefixTreeStrategy.

geo

Defaults to true, in which case coordinates are treated as spherical and the Geohash method is used. When set to false, coordinates are treated as a 2D plane and the Euclidean/Cartesian method is used.

distErrPct

Defines the precision of non-point shapes, in the range 0 to 0.5. This value determines the level at which a non-point shape is indexed or queried (for example, the length of the Geohash encoding in Geohash mode). A value of 0 uses maxLevels, the highest precision; higher precision costs more space and time when building the index.

maxDistErr / maxLevels

maxDistErr defines the highest level (maxLevels) of the indexed data. The configuration above sets it to 0.000009; GeohashUtils.lookupHashLenForWidthHeight(0.000009, 0.000009) yields an encoding length of 11 characters, a precision of about 1 meter, which directly determines the number of index terms per point. maxLevels takes precedence over maxDistErr, so if maxLevels is set, maxDistErr is ignored. See SpatialPrefixTreeFactory.init() for details. In general, however, maxDistErr is used.

units

The unit is degrees.

worldBounds

The world bounds, in the format "minX minY maxX maxY". When geo=true (Geohash mode), the value defaults to "-180 -90 180 90". When geo=false (quad mode), the default is the positive and negative limits of the Java double type, so the value should be specified explicitly, for example as "-180 -90 180 90".
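As a quick sanity check on the maxDistErr value of 0.000009 degrees, the conversion to meters can be worked out as follows. This is a rough estimate that assumes a spherical Earth with a mean radius of 6,371,000 meters; the class and method names are hypothetical.

```java
// Rough degrees-to-meters conversion along a great circle (illustrative names).
class DistErrEstimate {
    // Length of one degree of arc, assuming a spherical Earth with mean radius 6,371,000 m.
    static final double METERS_PER_DEGREE = 2 * Math.PI * 6371000.0 / 360.0;

    static double degreesToMeters(double degrees) {
        return degrees * METERS_PER_DEGREE;
    }

    public static void main(String[] args) {
        // maxDistErr="0.000009" corresponds to roughly one meter.
        System.out.println(degreesToMeters(0.000009));
    }
}
```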

2. Build the index

Here SolrJ is used to build the index:

    // Index some base station data for testing
    public void indexBaseStation() {
        BaseStationDb baseStationDb = new BaseStationDb();
        List<BaseStation> stations = baseStationDb.getAllBaseStations();
        Collection<SolrInputDocument> docList = new ArrayList<SolrInputDocument>();
        for (BaseStation baseStation : stations) {
            // Add base station data to the Solr index
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("station_id", baseStation.getBaseStationId());
            doc.addField("station_address", baseStation.getAddress());
            // The position is written as "longitude latitude", separated by a space
            String posString = baseStation.getLongitude() + " " + baseStation.getLatitude();
            doc.addField("station_position", posString);
            docList.add(doc);
        }
        try {
            server.add(docList);
            server.commit();
        } catch (SolrServerException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("Index base station data done!");
    }

Here the latitude and longitude are indexed into the station_position field as a string in "longitude latitude" format.

3. Query

Query syntax examples:

q={!geofilt pt=45.15,-93.85 sfield=poi_location_p d=5 score=distance}

q={!bbox pt=45.15,-93.85 sfield=poi_location_p d=5 score=distance}

q=poi_location_p:"Intersects(-74.093 41.042 -69.347 44.558)" // a bounding box (not in WKT)

q=poi_location_p:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30)))" // a WKT example

Description of the fields involved:

Field      Meaning
q          Query criteria, such as q=poi_id:134567
fq         Filter conditions, such as fq=store_name:Agriculture
fl         Fields to return, such as fl=poi_id,store_name
pt         Coordinate point, such as pt=54.729696,-98.525391
d          Search radius, such as d=10 for a 10 km range
sfield     The indexed coordinate field, such as sfield=geo
defType    The query parser type, such as dismax or edismax. edismax supports boost functions that are multiplied into the score; dismax computes the final score by addition.
qf
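To tie the indexing format and the query parameters together, the sketch below assembles both strings with plain Java. The class and method names are hypothetical and no SolrJ API is used; the point to notice is that the indexed value is written as "longitude latitude" while pt is written as "latitude,longitude", and that d is expressed in kilometers.

```java
// Plain-string helpers for the spatial field value and the geofilt filter
// (hypothetical names; this does not call SolrJ).
class SpatialQueryStrings {
    // Value stored in the RPT field at index time: "longitude latitude", space separated.
    static String indexValue(double lon, double lat) {
        return lon + " " + lat;
    }

    // geofilt filter: pt is "latitude,longitude" and d is the radius in kilometers.
    static String geofilt(String field, double lat, double lon, double km) {
        return "{!geofilt sfield=" + field + " pt=" + lat + "," + lon + " d=" + km + "}";
    }

    public static void main(String[] args) {
        System.out.println(indexValue(116.3906, 39.92324));
        // A 1000 meter search around the same point, e.g. as an fq parameter value.
        System.out.println(geofilt("station_position", 39.92324, 116.3906, 1.0));
    }
}
```

The resulting filter string would typically be passed as the q or fq parameter of a request, in the same way as the query examples in this section.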
