Getting Started with Lucene Full-Text Retrieval

Source: Internet
Author: User

Lucene is Apache's open-source full-text retrieval framework: a simple, easy-to-use search toolkit. The current release at the time of writing is 5.2.1, and you can use it simply by importing a few jar packages into your project (for the code in this article, typically lucene-core, lucene-analyzers-common, and lucene-queryparser). Using it comes down to two steps:

1) Building the Index

2) Searching the Index and Getting the Results


Let's first go through the core classes we will use:


Directory

This class describes where Lucene stores the index. For example:

Directory dir = FSDirectory.open(Paths.get("C:\\lucene\\index"));

where "C:\\lucene\\index" is the folder in which the index is stored.

Analyzer

Analyzer is Lucene's tokenizer (word breaker). Word segmentation is one of the core technologies of any search engine: each sentence is split into terms before it is indexed, which makes search results smarter and more accurate. For Chinese text you can use a Chinese word-segmentation toolkit such as IKAnalyzer.
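
To make the idea of word segmentation concrete, here is a minimal sketch in plain Java. It is not Lucene's StandardAnalyzer and makes no claim to match its output; the class name ToyAnalyzer, the stop-word list, and the splitting rule are all assumptions made up for illustration. It only shows the general shape of the work: split the text, normalize each piece, and discard words that carry little meaning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; this is not Lucene's StandardAnalyzer.
final class ToyAnalyzer {
    // A tiny, made-up stop-word list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "to"));

    // Splits text on anything that is not a letter or digit, lowercases each
    // piece, and drops empty pieces and stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            String token = piece.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [lucene, good, search, tool]
        System.out.println(tokenize("Lucene is a GOOD search tool."));
    }
}
```

Because both the indexed text and the query go through the same kind of processing, a search for "good" can match "GOOD" in the original sentence.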


The role of the Analyzer class is best understood together with two other classes, IndexWriterConfig and IndexWriter.

IndexWriterConfig, as the class name suggests, holds the configuration parameters used to create an IndexWriter. For example:

IndexWriterConfig iwc = new IndexWriterConfig(luceneAnalyzer);
iwc.setOpenMode(OpenMode.CREATE);

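setOpenMode(...) sets how the IndexWriter opens the index: OpenMode.CREATE creates a new index or overwrites an existing one, OpenMode.APPEND opens an existing index, and OpenMode.CREATE_OR_APPEND creates the index only if none exists yet.

There are many more settings, of course; see the article "IndexWriterConfig configuration parameter description" for details.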


With the configuration in place, creating the IndexWriter takes one more line:

IndexWriter indexWriter = new IndexWriter(dir, iwc);

Here dir is the Directory, the first class introduced above.

IndexWriter is the core class for indexing. If you know Android's SharedPreferences, it has an Editor class, and IndexWriter plays a similar role: it can add documents to the index (creating a new index and writing documents to it), delete documents from the index, and update documents that are already in the index. Note that Lucene writes its index out as a set of index files, so it is a good idea to set up a folder just for them.


Document

Document, as its name implies, represents a "document": it is really a container for a collection of Fields. You can think of it as the stored form of a file, usually one whose content can be converted to text, such as a .doc or .txt file. Once the information to be searched has been added to the Document, call

indexWriter.addDocument(document);
This adds the document to the index. At this point you may have guessed that a later query simply goes to this index first.


IndexReader

Corresponding to IndexWriter there is IndexReader. It is created like this:

IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));

It only reads the index files; the actual search is then handed to the search tool, IndexSearcher. Given a Query describing the search criteria, the searcher returns a result set of ScoreDoc entries. Each entry identifies a matching Document, which is then read to obtain the details of the search result, such as the content that contains the keyword and the path of the file the content came from. That completes the whole retrieval process.
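
The index-then-search flow can be sketched without Lucene at all. The toy inverted index below is only an illustration of the idea: the class ToyIndex, its methods, and its ranking by raw term count are assumptions invented here, and Lucene's real index format and scoring are far more elaborate. Indexing records which documents each term occurs in; searching looks the term up and returns the best-matching documents first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index for illustration; Lucene's index format and scoring differ.
final class ToyIndex {
    // term -> (document path -> number of times the term occurs in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // "Indexing": split the body into lowercase terms and record where each occurs.
    void addDocument(String path, String body) {
        for (String term : body.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(path, 1, Integer::sum);
        }
    }

    // "Searching": look the term up and return at most n paths, highest count first
    // (ties are broken alphabetically by path).
    List<String> search(String term, int n) {
        Map<String, Integer> hits =
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), new HashMap<>());
        List<String> paths = new ArrayList<>(hits.keySet());
        paths.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return paths.subList(0, Math.min(n, paths.size()));
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("C:\\source\\a.txt", "good things take effort");
        index.addDocument("C:\\source\\b.txt", "good, good, very good");
        index.addDocument("C:\\source\\c.txt", "nothing relevant here");
        // b.txt contains "good" three times, a.txt once, c.txt not at all.
        System.out.println(index.search("good", 10));
    }
}
```

The second argument of search plays the same role as the 10 passed to IndexSearcher.search(query, 10): it caps how many of the top results are returned.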


OK, let's try a simple example. If you are interested, the official documentation covers much more.

Note: the example code comes from the Internet. It is simple and easy to follow, so I have not rewritten it; since this article is only a learning exercise, we learn directly from someone else's code. I don't know who the original author is, but thank you.

With the key classes introduced above, the example code below should be much easier to follow, so let's go straight to the code.

1) Create the index.

Create a folder locally to hold the index, for example an index folder in the root of the C drive, so that its path is C:\index. Put the documents to be searched in a source folder in the root of the C drive, with the path C:\source.

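import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CreateIndex {
    public static void main(String[] args) throws Exception {
        // Folder whose files will be indexed: the "source" folder on the C drive.
        File fileDir = new File("C:\\source");
        // Folder where the index files are stored.
        String indexPath = "C:\\index";

        // Lucene 3.6.0: Directory dir = FSDirectory.open(new File("C:\\index"));
        Directory dir = FSDirectory.open(Paths.get(indexPath));
        // Lucene 3.6.0: Analyzer luceneAnalyzer = new StandardAnalyzer(Version.LUCENE_3_6_0);
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(luceneAnalyzer);
        iwc.setOpenMode(OpenMode.CREATE);
        IndexWriter indexWriter = new IndexWriter(dir, iwc);

        File[] textFiles = fileDir.listFiles();
        if (textFiles == null) {
            throw new IOException("Cannot list files in " + fileDir.getPath());
        }
        long startTime = System.currentTimeMillis();

        // Add each file to the index as one document.
        for (int i = 0; i < textFiles.length; i++) {
            if (textFiles[i].isFile()) {
                System.out.println("File " + textFiles[i].getCanonicalPath() + " is being indexed...");
                String temp = fileReaderAll(textFiles[i].getCanonicalPath(), "GBK");
                System.out.println(temp);
                Document document = new Document();
                // "path" is stored as a single untokenized value; "body" is analyzed.
                Field fieldPath = new StringField("path", textFiles[i].getPath(), Field.Store.YES);
                Field fieldBody = new TextField("body", temp, Field.Store.YES);
                document.add(fieldPath);
                document.add(fieldBody);
                indexWriter.addDocument(document);
            }
        }
        indexWriter.close();

        // Measure how long indexing took.
        long endTime = System.currentTimeMillis();
        System.out.println("Adding the documents in " + fileDir.getPath()
                + " to the index took " + (endTime - startTime) + " milliseconds.");
    }

    // Reads the whole file as text, decoding it with the given charset.
    public static String fileReaderAll(String fileName, String charset) throws IOException {
        StringBuilder temp = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Keep a separator so words on adjacent lines are not fused together.
                temp.append(line).append('\n');
            }
        }
        return temp.toString();
    }
}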
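
Reading a file line by line and gluing the lines together by hand is easy to get wrong: if the line breaks are dropped, the last word of one line fuses with the first word of the next, and the analyzer sees them as a single term. One possible alternative is to read the file in a single call, as sketched below. The class name FileText and the method readAll are hypothetical and not part of the original example; for the GBK-encoded files used in this article you would pass Charset.forName("GBK"), assuming your JDK ships that charset.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original example.
final class FileText {
    // Reads the whole file and decodes its bytes with the given charset.
    // Line breaks are preserved exactly as they appear in the file.
    static String readAll(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        // Write a small two-line temporary file, read it back, then clean up.
        Path tmp = Files.createTempFile("lucene-demo", ".txt");
        Files.write(tmp, "good\nmorning".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(tmp, StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```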

2) Run the search:

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ExecuteQuery {
    public static void main(String[] args) throws IOException, ParseException {
        String index = "C:\\index"; // path of the index to search
        // Lucene 3.6.0: IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
        IndexSearcher searcher = new IndexSearcher(reader); // the search tool
        String queryString = "good"; // keyword to search for
        Analyzer analyzer = new StandardAnalyzer();

        // QueryParser turns the user's input into a Query against the "body" field.
        // Lucene 3.6.0: new QueryParser(Version.LUCENE_3_6_0, "body", analyzer)
        QueryParser qp = new QueryParser("body", analyzer);
        // A malformed query propagates as a ParseException rather than being swallowed.
        Query query = qp.parse(queryString);

        TopDocs results = searcher.search(query, 10); // keep only the top ten results
        ScoreDoc[] hits = results.scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            Document document = searcher.doc(hits[i].doc);
            String body = document.get("body");
            String path = document.get("path");
            System.out.println("body---" + body);
            System.out.println("path--" + path);
        }
        if (hits.length > 0) {
            System.out.println("Input keyword: " + queryString + ", found " + hits.length + " result(s)!");
        }
        // IndexSearcher has no close() in Lucene 5.x; closing the reader is enough.
        reader.close();
    }
}

The example searches for the word "good". Result:


Then modify the content documents:


Search for "effort":


Chinese search works too.


This is, of course, a very simple example. This article is only a hands-on learning exercise that gives us one more direction to explore when we need to implement full-text retrieval in the future. Also, the tokenizer is critical to a good search tool, and English and Chinese need different dictionaries and segmentation rules. So when you implement a real search feature, consider all the factors that affect your search results.
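
To see why Chinese needs its own segmentation, note that Chinese text has no spaces between words, so splitting on whitespace and punctuation finds no word boundaries. Two simple fallback strategies are to emit one token per character, or to emit overlapping pairs of adjacent characters (bigrams). The sketch below illustrates both. The class ToySegmenter is hypothetical, it assumes every character fits in a single Java char, and dictionary-based tools such as IKAnalyzer work quite differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; real Chinese analyzers are considerably more sophisticated.
// Assumes each character is a single Java char (no surrogate pairs).
final class ToySegmenter {
    // One token per character.
    static List<String> unigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            out.add(text.substring(i, i + 1));
        }
        return out;
    }

    // Overlapping pairs of adjacent characters.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // The four characters of "quan wen jian suo" (full-text retrieval),
        // written as Unicode escapes so the source file stays ASCII.
        String text = "\u5168\u6587\u68C0\u7D22";
        System.out.println(unigrams(text)); // four single-character tokens
        System.out.println(bigrams(text));  // three overlapping two-character tokens
    }
}
```

Bigrams keep some word-level context that single characters lose, at the cost of a larger index. Which strategy gives better results depends on the text and the queries, which is one of the factors to weigh when building a real search feature.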




Copyright notice: This article is the blogger's original work and may not be reproduced without the blogger's permission.
