Index and retrieve POI data using Lucene

1. Introduction

For spatial data search, the previous article, "Using Solr for Spatial Search", covered indexing and retrieving GIS data with Solr.

Both Solr and Elasticsearch are built on Lucene, and both support spatial search. In some scenarios, however, we need to embed Lucene directly into an existing system to provide indexing and retrieval. This article describes how to index and retrieve POI records that carry a latitude and longitude in Lucene.

2. Environment and data

Lucene version: 5.3.1

POI database: base_station test data. Each record consists mainly of an ID, a longitude and latitude, and an address.

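3. Implementation

Basic variable definitions. The address field is tokenized (word-segmented) with SmartChineseAnalyzer from Lucene's smartcn module.

    import java.io.IOException;
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.spatial.SpatialStrategy;
    import org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy;
    import org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree;
    import org.apache.lucene.spatial.prefix.tree.SpatialPrefixTree;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.SimpleFSDirectory;

    import com.spatial4j.core.context.SpatialContext;

    // Members of the PoiIndexService class:
    private String indexPath = "D:/indexpoidata";
    private IndexWriter indexWriter = null;
    private SmartChineseAnalyzer analyzer = new SmartChineseAnalyzer(true);
    private IndexSearcher indexSearcher = null;

    // Field names
    private static final String idFieldName = "id";
    private static final String addressFieldName = "address";
    private static final String latFieldName = "lat";
    private static final String lngFieldName = "lng";
    private static final String geoFieldName = "geoField";

    // Spatial index and search
    private SpatialContext ctx;
    private SpatialStrategy strategy;

    public PoiIndexService() throws IOException {
        init();
    }

    public PoiIndexService(String indexPath) throws IOException {
        this.indexPath = indexPath;
        init();
    }

    protected void init() throws IOException {
        Directory directory = new SimpleFSDirectory(Paths.get(indexPath));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        indexWriter = new IndexWriter(directory, config);

        // Commit once so that a brand-new, empty index can be opened for reading.
        indexWriter.commit();

        // Note: the reader is a point-in-time snapshot of the index.
        DirectoryReader ireader = DirectoryReader.open(directory);
        indexSearcher = new IndexSearcher(ireader);

        // Typical geospatial context.
        // These can also be constructed from SpatialContextFactory.
        ctx = SpatialContext.GEO;

        int maxLevels = 11; // results in sub-meter precision for geohash

        // This can also be constructed from SpatialPrefixTreeFactory.
        SpatialPrefixTree grid = new GeohashPrefixTree(ctx, maxLevels);

        strategy = new RecursivePrefixTreeStrategy(grid, geoFieldName);
    }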
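To make the role of maxLevels more concrete, here is a minimal sketch of standard geohash encoding in plain Java. It is not Lucene's GeohashPrefixTree implementation; the class name GeohashSketch and its encode method are hypothetical and only illustrate the idea that each additional level (character) narrows the cell containing the point.

```java
final class GeohashSketch {
    // Standard geohash base-32 alphabet (no a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point as a geohash with one character per level. */
    static String encode(double lat, double lng, int length) {
        double minLat = -90, maxLat = 90;
        double minLng = -180, maxLng = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lngTurn = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lngTurn) {
                double mid = (minLng + maxLng) / 2;
                if (lng >= mid) { value = (value << 1) | 1; minLng = mid; }
                else            { value = value << 1;       maxLng = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else            { value = value << 1;       maxLat = mid; }
            }
            lngTurn = !lngTurn;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(42.605, -5.603, 5));  // ezs42
        System.out.println(encode(42.605, -5.603, 11)); // same prefix, much smaller cell
    }
}
```

Because a longer hash always starts with the shorter one, points that are close together tend to share a prefix, which is what lets a prefix-tree strategy answer shape queries by matching indexed prefixes.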

Index data

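    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.DoubleField;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.LongField;
    import org.apache.lucene.document.TextField;

    import com.spatial4j.core.shape.Point;

    public boolean indexPoiDataList(List<PoiData> dataList) {
        try {
            if (dataList != null && dataList.size() > 0) {
                List<Document> docs = new ArrayList<>();
                for (PoiData data : dataList) {
                    Document doc = new Document();
                    // Stored fields, so the POI can be rebuilt from a search hit
                    doc.add(new LongField(idFieldName, data.getId(), Field.Store.YES));
                    doc.add(new DoubleField(latFieldName, data.getLat(), Field.Store.YES));
                    doc.add(new DoubleField(lngFieldName, data.getLng(), Field.Store.YES));
                    doc.add(new TextField(addressFieldName, data.getAddress(), Field.Store.YES));
                    // Spatial fields; note the (x, y) = (lng, lat) argument order
                    Point point = ctx.makePoint(data.getLng(), data.getLat());
                    for (Field f : strategy.createIndexableFields(point)) {
                        doc.add(f);
                    }
                    docs.add(doc);
                }
                indexWriter.addDocuments(docs);
                indexWriter.commit();
                return true;
            }
            return false;
        } catch (Exception e) {
            log.error(e.toString());
            return false;
        }
    }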

PoiData here is an ordinary POJO.

Retrieve the data within a circular range, sorted by distance from nearest to farthest:
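The article does not show the class itself. A minimal sketch, assuming only the accessors that the indexing and search code calls (getId, getLat, getLng, getAddress and the matching setters), could look like this; the field types are inferred from how the values are indexed and parsed, not taken from the original source.

```java
final class PoiData {
    private long id;        // indexed as a long field
    private double lat;     // latitude in degrees
    private double lng;     // longitude in degrees
    private String address; // free text, tokenized at index time

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public double getLat() { return lat; }
    public void setLat(double lat) { this.lat = lat; }

    public double getLng() { return lng; }
    public void setLng(double lng) { this.lng = lng; }

    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }

    public static void main(String[] args) {
        PoiData poi = new PoiData();
        poi.setId(1L);
        poi.setLat(31.95);
        poi.setLng(118.84);
        poi.setAddress("example address");
        System.out.println(poi.getId() + " " + poi.getLng() + "," + poi.getLat());
    }
}
```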

    import com.spatial4j.core.distance.DistanceUtils;
    import com.spatial4j.core.shape.Shape;

    import org.apache.lucene.queries.function.ValueSource;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.spatial.query.SpatialArgs;
    import org.apache.lucene.spatial.query.SpatialOperation;

    // radius is in kilometers
    public List<PoiData> searchPoiInCircle(double lng, double lat, double radius) {
        List<PoiData> results = new ArrayList<>();

        // The circle radius must be given in degrees, so convert from km
        Shape circle = ctx.makeCircle(lng, lat,
                DistanceUtils.dist2Degrees(radius, DistanceUtils.EARTH_MEAN_RADIUS_KM));
        SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects, circle);
        Query query = strategy.makeQuery(args);

        Point pt = ctx.makePoint(lng, lat);
        // the distance (in km)
        ValueSource valueSource = strategy.makeDistanceValueSource(pt, DistanceUtils.DEG_TO_KM);

        Sort distSort = null;
        TopDocs docs = null;
        try {
            // false = ascending distance
            distSort = new Sort(valueSource.getSortField(false)).rewrite(indexSearcher);
            docs = indexSearcher.search(query, 10, distSort);
        } catch (IOException e) {
            log.error(e.toString());
        }

        if (docs != null) {
            ScoreDoc[] scoreDocs = docs.scoreDocs;
            printDocs(scoreDocs); // debugging helper, not shown in this article
            results = getPoiDatasFromDoc(scoreDocs);
        }

        return results;
    }

    // Rebuilds PoiData objects from the stored fields of each hit
    private List<PoiData> getPoiDatasFromDoc(ScoreDoc[] scoreDocs) {
        List<PoiData> datas = new ArrayList<>();
        if (scoreDocs != null) {
            // System.out.println("total: " + scoreDocs.length);
            for (int i = 0; i < scoreDocs.length; i++) {
                try {
                    Document hitDoc = indexSearcher.doc(scoreDocs[i].doc);
                    PoiData data = new PoiData();
                    data.setId(Long.parseLong(hitDoc.get(idFieldName)));
                    data.setLng(Double.parseDouble(hitDoc.get(lngFieldName)));
                    data.setLat(Double.parseDouble(hitDoc.get(latFieldName)));
                    data.setAddress(hitDoc.get(addressFieldName));
                    datas.add(data);
                } catch (IOException e) {
                    log.error(e.toString());
                }
            }
        }
        return datas;
    }
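The circle search converts the radius from kilometers to degrees before building the shape, and multiplies by a degrees-to-kilometers factor when sorting by distance. The following plain-Java sketch shows that arithmetic under the assumption that the conversion is simply arc length divided by the mean Earth radius; the class and method names are hypothetical, and the radius constant is the value I believe DistanceUtils.EARTH_MEAN_RADIUS_KM uses.

```java
final class DistanceSketch {
    // Assumed mean Earth radius in kilometers.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;

    /** Kilometers along a great circle, expressed as degrees of arc. */
    static double kmToDegrees(double km) {
        return Math.toDegrees(km / EARTH_MEAN_RADIUS_KM);
    }

    /** Degrees of arc, expressed as kilometers along a great circle. */
    static double degreesToKm(double degrees) {
        return Math.toRadians(degrees) * EARTH_MEAN_RADIUS_KM;
    }

    /** Haversine great-circle distance in kilometers between two lat/lng points. */
    static double haversineKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_MEAN_RADIUS_KM * c;
    }

    public static void main(String[] args) {
        System.out.println("5 km in degrees: " + kmToDegrees(5));
        System.out.println("1 degree in km:  " + degreesToKm(1));
        System.out.println("(0,0) to (0,1):  " + haversineKm(0, 0, 0, 1) + " km");
    }
}
```

A helper like haversineKm can be used on the client side to double-check that the returned POIs really fall within the requested radius.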

To search for data in a rectangular range:

    public List<PoiData> searchPoiInRectangle(double minLng, double minLat, double maxLng, double maxLat) {
        List<PoiData> results = new ArrayList<>();

        // Build the rectangle from its lower-left and upper-right corners
        Point lowerLeftPoint = ctx.makePoint(minLng, minLat);
        Point upperRightPoint = ctx.makePoint(maxLng, maxLat);
        Shape rect = ctx.makeRectangle(lowerLeftPoint, upperRightPoint);
        SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects, rect);
        Query query = strategy.makeQuery(args);

        TopDocs docs = null;
        try {
            docs = indexSearcher.search(query, 10);
        } catch (IOException e) {
            log.error(e.toString());
        }

        if (docs != null) {
            ScoreDoc[] scoreDocs = docs.scoreDocs;
            printDocs(scoreDocs);
            results = getPoiDatasFromDoc(scoreDocs);
        }

        return results;
    }

Search within a range and also match POIs on an address keyword. In the query below the spatial clause is required (MUST), while the address clause is optional (SHOULD), so an address match improves a POI's ranking but does not exclude POIs that fail to match:

    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.util.QueryBuilder;

    // range is in kilometers
    public List<PoiData> searchPoByRangeAndAddress(double lng, double lat, double range, String address) {
        List<PoiData> results = new ArrayList<>();

        SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects,
                ctx.makeCircle(lng, lat,
                        DistanceUtils.dist2Degrees(range, DistanceUtils.EARTH_MEAN_RADIUS_KM)));
        Query geoQuery = strategy.makeQuery(args);

        // Analyze the address text with the same analyzer used at index time
        QueryBuilder builder = new QueryBuilder(analyzer);
        Query addQuery = builder.createPhraseQuery(addressFieldName, address);

        BooleanQuery.Builder boolBuilder = new BooleanQuery.Builder();
        boolBuilder.add(addQuery, Occur.SHOULD);
        boolBuilder.add(geoQuery, Occur.MUST);

        Query query = boolBuilder.build();

        TopDocs docs = null;
        try {
            docs = indexSearcher.search(query, 10);
        } catch (IOException e) {
            log.error(e.toString());
        }

        if (docs != null) {
            ScoreDoc[] scoreDocs = docs.scoreDocs;
            printDocs(scoreDocs);
            results = getPoiDatasFromDoc(scoreDocs);
        }

        return results;
    }
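The effect of combining a required spatial clause with an optional address clause can be illustrated without Lucene. The sketch below is a deliberately simplified model using plain collections: the Poi class, the inRange flag, and the substring match are all hypothetical stand-ins, and real Lucene scoring is more nuanced than a two-level ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MustShouldSketch {
    /** Hypothetical stand-in for an indexed POI. */
    static final class Poi {
        final long id;
        final boolean inRange; // result of the spatial (MUST) clause
        final String address;

        Poi(long id, boolean inRange, String address) {
            this.id = id;
            this.inRange = inRange;
            this.address = address;
        }
    }

    /** Returns ids of in-range POIs, with address matches ranked first. */
    static List<Long> search(List<Poi> pois, String keyword) {
        List<Poi> hits = new ArrayList<>();
        for (Poi p : pois) {
            if (p.inRange) { // MUST: failing this clause excludes the POI
                hits.add(p);
            }
        }
        // SHOULD: a match only moves the POI up; List.sort is stable,
        // so the original order is kept within each group.
        hits.sort(Comparator.comparing((Poi p) -> !p.address.contains(keyword)));
        List<Long> ids = new ArrayList<>();
        for (Poi p : hits) {
            ids.add(p.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Poi> pois = new ArrayList<>();
        pois.add(new Poi(1, true, "Nanjing Road"));
        pois.add(new Poi(2, false, "Nanjing Road"));
        pois.add(new Poi(3, true, "Beijing Road"));
        pois.add(new Poi(4, true, "Nanjing Avenue"));
        System.out.println(search(pois, "Nanjing")); // [1, 4, 3]
    }
}
```

If the address keyword should act as a filter rather than a ranking hint, the address clause would need to be required as well.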

4. About word segmentation

Both the address and the description attributes of a POI need to be segmented into words so that they can be searched effectively.

A simple comparison of the output of several word segmenters:

Original text:

    This is an example of lucene Chinese word segmentation, you can run it directly! Chinese Analyer can analysis english text too. ABC (ABC) and Construction Bank (CCB), Jiangsu Nanjing Jiangning 12th Yuan Street. Southeast University is a 985 university.

Segmentation results:

smartcn SmartChineseAnalyzer:

    \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ It \chines\analy\can\analysi\english\text\too\ china \ Agriculture \ Bank \ ABC \ and \ Construction \ Bank \ CCB \ jiangsu \ nanjing \ Jiang \ ning \ on \ yuan \ avenue \12\ \ Southeast \ University \ is \ a \ \985\ University

MMSegAnalyzer ComplexAnalyzer:

    This is a \lucene\ chinese \ participle \ Example \ you can \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \chinese\analyer\can\analysis\english\text\too\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ r \ n is a \985\ university

IKAnalyzer:

    this is \ A \lucene\ chinese \ participle \ The \ example \ you \ can \ direct \ Run \ It \chinese\analyer\can\analysis\english\text\too.\ Agricultural Bank of China \ Agricultural Bank \ and \ CCB \ CCB \ jiangsu \ nanjing \ Jiangning \ on yuan \ Main Street \ no. 12th \ Southeast University \ is a \985\ University

Comparison of segmentation quality:

1) smartcn does not handle some English words correctly (its output contains truncated forms such as "chines" and "analysi"), and some Chinese words are split into single characters.

2) MMSegAnalyzer distinguishes English and Chinese correctly, but it is not very accurate for place names such as "Jiangning" or for names such as "CCB". MMSegAnalyzer supports a custom dictionary, which can greatly improve segmentation accuracy.

3) IKAnalyzer distinguishes English and Chinese correctly and its Chinese segmentation is quite good, but it has some small problems; for example, the word "too" is kept together with the trailing period ("too."). IKAnalyzer also supports a custom dictionary, but using it requires extending some of the source code.

Summary: Lucene's indexing and retrieval capabilities can provide search for data that carries a latitude and longitude and also needs keyword search over segmented text.
