Lucene HelloWorld: creating and querying an index library (with detailed notes)

Last Update:2014-02-09 Source: Internet

Author: User


This example uses Lucene 3.6.2. Lucene official website: http://lucene.apache.org/.

Case study:

This example simulates a post search feature. An Article class stands in for a post. When the user enters a search term, Lucene finds the matching Article objects and returns them to the user.

1. Create a project

First, create a Java project in MyEclipse and add a lib folder to it for the jar files used during development.

2. Import the jar packages

This example needs four basic Lucene jar files:

lucene-core-3.6.2.jar

contrib\analyzers\common\lucene-analyzers-3.6.2.jar (analyzers)

contrib\highlighter\lucene-highlighter-3.6.2.jar (highlighting)

contrib\memory\lucene-memory-3.6.2.jar (highlighting)

Then add the four jars in lib to the Build Path.

3. Create a HelloWorld class

Create a package of your own under src and create the file HelloWorld.java in it.

This file does not need a main method, because we will run the program through JUnit.

For that reason, each method we want to run must carry the @Test annotation.

import org.junit.Test;

public class HelloWorld {

    // create the index library
    @Test
    public void createIndex() {
    }

    // search the index library
    @Test
    public void seacherIndex() {
    }
}

4. Create a search PO

Create the Article class to be searched (it simulates a post). It has three fields: id, title, and content, that is, the post's number, title, and body.

public class Article {

    private Integer id;      // id
    private String title;    // title
    private String content;  // content

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    @Override
    public String toString() {
        return "Article [id=" + id + ", title=" + title + ", content=" + content + "]";
    }
}

5. Write HelloWorld to create and query the index library

Remember the two core APIs for creating and querying:

Use an IndexWriter object to add data to the index library or delete data from it.

Its main methods are addDocument(), updateDocument(), and deleteDocuments().

Use an IndexSearcher object to search the index library.

Its main method is search().
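As a rough illustration of the IndexWriter methods named above, the following sketch adds two documents, replaces one of them with updateDocument(), and removes the other with deleteDocuments(). It assumes Lucene 3.6.2 is on the classpath. The class name WriterOperationsSketch, the helper newDoc, and the in-memory RAMDirectory are choices made for this sketch only. The id field is indexed as NOT_ANALYZED here so that a Term can match it exactly, which differs from the main program in this article.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of the original article.
class WriterOperationsSketch {

    // Builds a two-field Document (helper invented for this sketch).
    static Document newDoc(String id, String title) {
        Document doc = new Document();
        // NOT_ANALYZED keeps the id as a single exact term
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }

    // Adds two documents, replaces one, deletes the other,
    // and returns the number of live documents left in the index.
    static int run() throws Exception {
        Directory directory = new RAMDirectory(); // in-memory index, assumption of this sketch
        IndexWriter writer = new IndexWriter(directory,
                new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));
        try {
            writer.addDocument(newDoc("1", "What Is Lucene"));
            writer.addDocument(newDoc("2", "Another post"));
            // updateDocument = delete every document matching the term, then add the new one
            writer.updateDocument(new Term("id", "1"), newDoc("1", "What Is Lucene (edited)"));
            // deleteDocuments removes every document matching the term
            writer.deleteDocuments(new Term("id", "2"));
        } finally {
            writer.close();
        }
        IndexReader reader = IndexReader.open(directory);
        try {
            return reader.numDocs();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("live documents: " + run());
    }
}
```

Identifying documents by an exact term is what makes update and delete predictable, which is why the sketch does not run the id through the analyzer.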

Inheritance diagrams for some of these APIs are listed in the attachment at the end.

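The comments in the program are numbered in the order in which the steps were worked out. Because each call needs its parameters prepared beforehand, the numbers do not appear in sequence in the source and the code may look a bit messy. Just follow the step numbers as you read.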

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

import lucene.a_domain.Article;

public class HelloWorld {

    /**
     * Creates the index library.
     *
     * The index library can only store Document objects, not objects of our own
     * classes. This method therefore converts the id, title, and content attributes
     * of an Article into Document fields and adds the Document to the index library
     * with the addDocument method of IndexWriter. After the method has run, a set of
     * binary index files exists in the index library directory.
     * Because I/O is involved, the IndexWriter must be closed at the end.
     *
     * @throws Exception
     */
    @Test
    public void createIndex() throws Exception {
        /*
         * 2. Create the index library Directory, the first parameter of the
         * IndexWriter constructor. Directory is an abstract class. Press Ctrl+T to
         * view its type hierarchy: it has an abstract subclass, FSDirectory, which
         * cannot be created with new. FSDirectory has a static open method that
         * takes a File and returns an FSDirectory instance. The File is the folder
         * in which the index is stored; here it is an indexDir folder in the
         * current project.
         */
        Directory directory = FSDirectory.open(new File("./indexDir/"));

        /*
         * 4. Create the Analyzer, the second parameter of the IndexWriterConfig
         * constructor. Analyzer is also an abstract class with many subclasses
         * (Ctrl+T). We start with StandardAnalyzer. Its constructor takes a
         * Version, supplied through a static constant of the Version class.
         */
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);

        /*
         * 3. Create the IndexWriterConfig, the second parameter of the IndexWriter
         * constructor. It has no no-argument constructor; it requires a Version
         * and an Analyzer. Version.LUCENE_36 is a static constant of the Version
         * class. The Analyzer is a tokenizer that ships with Lucene; it is not
         * suited to Chinese text.
         */
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_36, analyzer);

        // 1. IndexWriter needs two parameters: the index library directory and
        // the configuration, both prepared above.
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

        /*
         * 6. Create the Article to be saved to the index library. Its attributes
         * are then converted into Document fields.
         */
        Article article = new Article();
        article.setId(1);
        article.setTitle("What Is Lucene");
        article.setContent("Lucene, quick way to get rid of losing weight, the actual score card design is generous");

        /*
         * 7. Convert the Article into a Document. Create a Document with new and
         * add fields through its add method, which takes a Fieldable parameter.
         * Fieldable is an interface.
         */
        Document doc = new Document();

        /*
         * 8. Field is an implementation of Fieldable (Ctrl+T). Creating a Field
         * takes four parameters: String name, String value, Store store, Index index.
         *   name:  the name of the field in the index library
         *   value: the value held by the field
         *   store: whether the value is stored
         *   index: whether the value is tokenized (ANALYZED) when indexed
         * Article has three attributes, so add is called three times.
         */
        doc.add(new Field("id", article.getId().toString(), Store.YES, Index.ANALYZED));
        doc.add(new Field("title", article.getTitle(), Store.YES, Index.ANALYZED));
        doc.add(new Field("content", article.getContent(), Store.YES, Index.ANALYZED));

        /*
         * 5. Add the data to the index library with addDocument. The method takes
         * a Document, which is why the Article was converted to a Document above.
         */
        indexWriter.addDocument(doc);

        // 9. Close the writer.
        indexWriter.close();
    }

    /**
     * Searches the index library.
     *
     * A List is created first to hold the final query results. The query string is
     * hard-coded in this program; in real development it would come from the user.
     * The index library stores Document objects, so the result set is a set of
     * Document objects as well. Each one is converted back into an Article with
     * the get(name) method of Document, where name is the field name.
     * Finally, showResults(List) prints the results to the console.
     *
     * @throws Exception
     */
    @Test
    public void seacherIndex() throws Exception {
        // List that holds the articles found.
        List<Article> list = new ArrayList<Article>();

        // 2. Open the index library Directory; it is handed to the IndexReader below.
        Directory directory = FSDirectory.open(new File("./indexDir/"));

        /*
         * 1. Create the IndexSearcher. Its constructor takes an IndexReader.
         * IndexReader is an abstract class with a static open method whose
         * parameter is the Directory to read from (created in step 2).
         */
        IndexReader indexReader = IndexReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);

        /*
         * 4. Create the query condition. QueryParser takes a single default field:
         * if "title" is specified, the query string is searched in the title field.
         * QueryParser has three parameters:
         *   parameter 1: the version
         *   parameter 2: the name of the field to search
         *   parameter 3: the analyzer
         * The query condition here is "Lucene", that is, find "Lucene" in that field.
         */
        String queryString = "Lucene";

        // 6. Prepare the analyzer.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);

        // 5. Create the parser.
        QueryParser queryParser = new QueryParser(Version.LUCENE_36, "title", analyzer);

        // 7. Parse the query string. The returned Query is the first parameter of
        // the search method of IndexSearcher.
        Query query = queryParser.parse(queryString);

        /*
         * 3. Run the query with the search method of IndexSearcher.
         *   parameter 1: the query
         *   parameter 2: the maximum number of top hits to return
         * The return value is a TopDocs, the result set of the query.
         */
        TopDocs topDocs = indexSearcher.search(query, 100);
        // totalHits is the total number of matching records.
        System.out.println("Total number of records: " + topDocs.totalHits);

        // 8. Get the hits in the result set.
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        if (scoreDocs != null && scoreDocs.length > 0) {
            for (int i = 0; i < scoreDocs.length; i++) {
                ScoreDoc scoreDoc = scoreDocs[i];
                System.out.println("score for this record: " + scoreDoc.score);
                // Internal number of the hit in the index library.
                int doc = scoreDoc.doc;
                // Fetch the Document from the index library by that number.
                Document document = indexSearcher.doc(doc);
                // Convert the Document back into an Article.
                Article article = new Article();
                /*
                 * get(name) returns the value of a field by name. The names are
                 * the ones set in createIndex, for example
                 * new Field("id", article.getId().toString(), Store.YES, Index.ANALYZED).
                 */
                article.setId(Integer.parseInt(document.get("id")));
                article.setTitle(document.get("title"));
                article.setContent(document.get("content"));
                // Add the article to the result list.
                list.add(article);
            }
        }

        // 9. Close the searcher and the reader it was built on.
        indexSearcher.close();
        indexReader.close();

        // 10. Print the final result set.
        if (list != null && list.size() > 0) {
            showResults(list);
        }
    }

    private void showResults(List<Article> list) {
        for (Article article : list) {
            System.out.println("article No.: " + article.getId());
            System.out.println("article title: " + article.getTitle());
            System.out.println("article content: " + article.getContent());
            System.out.println("------------------------------------------------");
        }
    }
}
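The search above looks in the title field only. Lucene 3.6 also provides MultiFieldQueryParser, which applies the query string to several fields at once. The following is a minimal sketch of that variant, assuming Lucene 3.6.2 on the classpath. The class name MultiFieldSearchSketch, the method countHits, the sample content text, and the in-memory RAMDirectory are all invented for this illustration and are not part of the program above.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class name; not part of the original article.
class MultiFieldSearchSketch {

    // Indexes one sample document in memory and returns how many documents
    // match queryString in either the title or the content field.
    static int countHits(String queryString) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        Directory directory = new RAMDirectory(); // in-memory index, assumption of this sketch

        IndexWriter writer = new IndexWriter(directory,
                new IndexWriterConfig(Version.LUCENE_36, analyzer));
        Document doc = new Document();
        doc.add(new Field("title", "What Is Lucene", Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("content", "A full-text search library", Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // The parser expands the query string over both fields.
        QueryParser parser = new MultiFieldQueryParser(
                Version.LUCENE_36, new String[] {"title", "content"}, analyzer);
        Query query = parser.parse(queryString);

        IndexReader reader = IndexReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            return searcher.search(query, 100).totalHits;
        } finally {
            searcher.close();
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("hits for 'lucene':  " + countHits("lucene"));
        System.out.println("hits for 'library': " + countHits("library"));
    }
}
```

Because MultiFieldQueryParser is a subclass of QueryParser, the rest of the search code (parse, search, reading ScoreDoc entries) stays the same as in the single-field version.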

Run the createIndex method first so that the index library contains data, then run the seacherIndex method to print the data retrieved from the index library to the console.

6. View the files generated in the index library

We placed the index library in the indexDir folder under the root directory of the project.

Open this directory and you will see the generated index files, which are binary files.
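To look at the generated files without leaving the IDE, a small helper along the following lines can list them. This is only a sketch using the standard java.io.File API. The class name IndexDirLister and the method describeFiles are made up for illustration, and the path assumes the ./indexDir/ folder used in createIndex.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the original article.
class IndexDirLister {

    // Returns one "name (N bytes)" entry per regular file in dir, sorted by name.
    // Returns an empty list if dir does not exist or is not a directory.
    static List<String> describeFiles(File dir) {
        List<String> lines = new ArrayList<String>();
        File[] files = dir.listFiles(); // null when dir is missing or not a directory
        if (files == null) {
            return lines;
        }
        Arrays.sort(files); // File is Comparable, so this sorts by path name
        for (File f : files) {
            if (f.isFile()) {
                lines.add(f.getName() + " (" + f.length() + " bytes)");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same relative path as in createIndex()
        for (String line : describeFiles(new File("./indexDir/"))) {
            System.out.println(line);
        }
    }
}
```

Run it after createIndex. If it prints nothing, the index directory is empty or the working directory is not the project root.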

7. Attachment

The inheritance hierarchies of some of the classes used above were shown here as diagrams:

Directory class: (diagram not available)

Analyzer class: (diagram not available)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
