Full-text retrieval: getting started with Lucene (HelloWorld)

Source: Internet
Author: User
Tags: createindex


First, check the directory structure.


Step 1: Create a Java project in Eclipse and add the required jar packages. Only three are needed: Lucene's core package, the analyzer (word divider) package, and the highlighter package. Create a lib folder, copy the jar packages into it, select them, right-click, and choose Build Path > Add to Build Path to add them to the project.


Create a datasource folder and add a few txt files. (It is recommended to use both Chinese and English text so that word segmentation can be tested for both languages.) Also create a luceneIndex folder, which is used to store the created index.
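To see why testing with both languages matters, here is a rough sketch of two naive segmentation strategies. It is not Lucene's analyzer and makes no claim about how StandardAnalyzer behaves; the class and method names are hypothetical, and the Chinese sample (a five-character phrase meaning "the President's room", written as Unicode escapes) is an assumption chosen for illustration. Splitting on non-word characters works for English but leaves an unspaced Chinese phrase as one long token, while a per-character split yields one token per ideograph.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT Lucene's analyzer, just two naive segmentation strategies.
class SegmentationSketch {

    // Lowercase the text and split it on anything that is not a letter or digit.
    static List<String> splitOnNonWord(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Emit every Han ideograph as its own token; other runs are split as above.
    static List<String> splitCjkPerCharacter(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                // Flush the pending non-Han run, then add the ideograph by itself.
                tokens.addAll(splitOnNonWord(run.toString()));
                run.setLength(0);
                tokens.add(new String(Character.toChars(cp)));
            } else {
                run.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        tokens.addAll(splitOnNonWord(run.toString()));
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical sample phrase, written with Unicode escapes.
        String chinese = "\u603b\u7edf\u7684\u623f\u95f4";
        System.out.println(splitOnNonWord("The President's room")); // [the, president, s, room]
        System.out.println(splitOnNonWord(chinese).size());          // 1
        System.out.println(splitCjkPerCharacter(chinese + " room").size()); // 6
    }
}
```

A keyword search can only match tokens that the word divider produced at indexing time, so a file that mixes both languages quickly shows whether the chosen word divider suits the data.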

The content of "IndexWriter addDocument's a javadoc.txt" is:

Adds room a document to this room index. If the room document contains room more than setMaxFieldLength(int) terms for a given field, the remainder are discarded.

The content of "Joke_President's Room.txt" is:

President's room

A gentleman wants to book a room at a hotel in a tourist resort.

The waiter refuses, saying, "The hotel is full and no room can be arranged."

"Listen!" the gentleman says. "If I told you that the President was coming here, would you provide him with a room right away?"

"Of course, he is ..."

"Well, I am honored to inform you that the President will not be here tonight. Give me his room!"

Create a package, com.lucene.helloworld, and in it create a HelloWorld class. It has two main methods: one creates the index and the other searches it for a keyword.

package com.lucene.helloworld;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.junit.Test;

import com.lucene.units.File2DocumentUtils;

public class HelloWorld {

    // Path of the file to be indexed
    String filePath = "F:\\Users\\liuyanling\\workspace\\LuceneDemo\\datasource\\peoplewhocannot.txt";

    // Path of the folder that stores the index
    String indexPath = "F:\\Users\\liuyanling\\workspace\\LuceneDemo\\luceneIndex";

    // Use the standard analyzer as the word divider
    Analyzer analyzer = new StandardAnalyzer();

    /**
     * Create the index. First convert the File to a Document, then use
     * IndexWriter to index the Document with the configured word divider
     * and store the index under indexPath.
     */
    @Test
    public void createIndex() throws Exception {
        Document doc = File2DocumentUtils.file2Document(filePath);
        IndexWriter indexWriter = new IndexWriter(indexPath, analyzer, true, MaxFieldLength.LIMITED);
        indexWriter.addDocument(doc);
        indexWriter.close();
    }

    /**
     * Search for the keyword queryString. First set the query range (the
     * "name" and "content" fields), then run the query with IndexSearcher.
     * The result is returned as TopDocs and must be processed to obtain
     * the matching documents.
     */
    @Test
    public void search() throws Exception {
        String queryString = "Internet";
        String[] fields = {"name", "content"};

        QueryParser queryParser = new MultiFieldQueryParser(fields, analyzer);
        Query query = queryParser.parse(queryString);

        IndexSearcher indexSearcher = new IndexSearcher(indexPath);
        Filter filter = null;

        // Query the first 10000 records
        TopDocs topDocs = indexSearcher.search(query, filter, 10000);
        System.out.println("total [" + topDocs.totalHits + "] matching results");

        // Traverse the scoreDocs of the query result
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            // Internal number of the matching document
            int docSn = scoreDoc.doc;
            // Use indexSearcher to fetch the document by its number
            Document doc = indexSearcher.doc(docSn);
            // Print the content of the Document
            File2DocumentUtils.printDocumentInfo(doc);
        }

        // Release the index files held by the searcher
        indexSearcher.close();
    }
}
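Conceptually, creating an index and querying a keyword correspond to building and looking up an inverted index, which maps each term to the documents that contain it. The following is a toy sketch of that idea in plain Java. It is not Lucene's index format or API; the class name, the splitting rule, and the sample sentences are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch only: a toy inverted index, not Lucene's index format or API.
class TinyInvertedIndex {

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    // "Create index": split the text into lowercase terms and record the document id for each.
    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    // "Query keyword": look the term up directly instead of scanning every document.
    Set<Integer> search(String keyword) {
        Set<Integer> ids = postings.get(keyword.toLowerCase());
        return ids == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(0, "Adds a document to this index.");
        index.addDocument(1, "Give me his room!");
        System.out.println(index.search("room"));   // [1]
        System.out.println(index.search("lucene")); // []
    }
}
```

The work of splitting text happens once, at indexing time, so a later search is a lookup rather than a scan of every file. That is also why the same word divider should be used for indexing and for parsing the query, as HelloWorld does by sharing one analyzer.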

HelloWorld relies on two methods of the File2DocumentUtils class: file2Document(), which converts a File to a Document, and printDocumentInfo(), which prints the content of a Document. The File2DocumentUtils class is as follows.

package com.lucene.units;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;

/**
 * Tool class that converts a File to a Document.
 *
 * @author liuyanling
 */
public class File2DocumentUtils {

    /**
     * Convert a File to a Document. Read the file at the given path and add
     * its name, content, size, and path as fields, each configured with
     * whether it is stored and how it is indexed.
     *
     * @param path path of the file
     * @return Document, the converted result
     */
    public static Document file2Document(String path) {
        File file = new File(path);
        Document doc = new Document();

        // Name and content (read with readFileContent) are word-segmented, indexed, and stored
        doc.add(new Field("name", file.getName(), Store.YES, Index.ANALYZED));
        doc.add(new Field("content", readFileContent(file), Store.YES, Index.ANALYZED));

        // The size is stored and indexed without word segmentation; the path is stored but not indexed
        doc.add(new Field("size", String.valueOf(file.length()), Store.YES, Index.NOT_ANALYZED));
        doc.add(new Field("path", file.getPath(), Store.YES, Index.NO));
        return doc;
    }

    /**
     * Read the file content. A FileInputStream is read through an
     * InputStreamReader wrapped in a BufferedReader, so the data can be
     * read line by line with readLine().
     *
     * @param file the File object
     * @return String, the file content
     */
    public static String readFileContent(File file) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            try {
                // Stores the file content
                StringBuilder content = new StringBuilder();
                for (String line = null; (line = reader.readLine()) != null;) {
                    content.append(line).append("\n");
                }
                return content.toString();
            } finally {
                // Close the reader even if reading fails
                reader.close();
            }
        } catch (Exception e) {
            // Keep the original exception as the cause
            throw new RuntimeException(e);
        }
    }

    /**
     * Print the content of a Document. Each value is fetched by the field
     * name that was used when the index was created.
     *
     * @param doc the Document to print
     */
    public static void printDocumentInfo(Document doc) {
        // First way to read a value: getField, then stringValue
        // Field r = doc.getField("name");
        // r.stringValue();

        // Second way: get the value directly
        System.out.println("name = " + doc.get("name"));
        System.out.println("content = " + doc.get("content"));
        System.out.println("size = " + doc.get("size"));
        System.out.println("path = " + doc.get("path"));
    }
}
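One detail worth noting in readFileContent: InputStreamReader is created without a charset, so the file is decoded with the platform default encoding, and Chinese test files saved in a different encoding may come out garbled. A minimal sketch of a charset-aware variant follows. It uses only the JDK; the class name and the choice of UTF-8 are assumptions, so pass whatever encoding the txt files were actually saved in.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of the original File2DocumentUtils class.
class CharsetAwareReader {

    // Reads the whole file, decoding it with the given charset; each line is followed by "\n".
    static String readFileContent(File file, Charset charset) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), charset));
            try {
                StringBuilder content = new StringBuilder();
                for (String line; (line = reader.readLine()) != null; ) {
                    content.append(line).append("\n");
                }
                return content.toString();
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new RuntimeException("Cannot read " + file, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Round trip through a temporary file that mixes English and Chinese text.
        File tmp = File.createTempFile("lucene-demo", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "room \u623f\u95f4\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readFileContent(tmp, StandardCharsets.UTF_8));
    }
}
```

Making the charset a parameter keeps the decision visible at the call site instead of depending on the machine the test happens to run on.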

After the code is written, create the index first. Right-click the createIndex method and select Run As > JUnit Test to run it as a unit test.


After the test passes, index files appear in the luceneIndex folder, indicating that the index has been created.


Then run the search method as a unit test in the same way. The query returns only one result, and its name, content, size, and path are printed.


The above is a simple full-text search example with many shortcomings. For example, the folder that stores the index has to be created manually in advance, the query results are not highlighted, and only the standard analyzer is used as the word divider. For improvements, see the next article, titled "optimize lucene for full-text search-create a directory for storing indexes".
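As an illustration of the first shortcoming, the index folder could be created by code before the index is written. This is only a sketch using java.io.File and is not necessarily the approach taken in the follow-up article; the class and method names are hypothetical.

```java
import java.io.File;

// Hypothetical helper for illustration; not taken from the original article.
class IndexDirectorySketch {

    // Returns the index directory, creating it (and any missing parents) if needed.
    static File ensureDirectory(String indexPath) {
        File dir = new File(indexPath);
        // mkdirs() returns false when the directory already exists, so check isDirectory() first.
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IllegalStateException("Cannot create index directory: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) {
        // Use a throwaway location under the system temp directory for the demo.
        File base = new File(System.getProperty("java.io.tmpdir"), "LuceneDemo-" + System.nanoTime());
        File dir = ensureDirectory(new File(base, "luceneIndex").getPath());
        System.out.println(dir.isDirectory()); // true
    }
}
```

In HelloWorld, such a helper could be called with indexPath at the start of createIndex(), so the test no longer depends on a manually prepared folder.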

