Lucene.Net introductory learning and use: indexes

Source: Internet
Author: User
Article directory
    • 1. Saving indexes to files
    • 2. Saving indexes to memory
    • 3. Use Lucene.Net for database queries

Lucene.Net may be needed again in an upcoming project, so I used some free time to write a demo that mainly covers creating, deleting, and updating indexes, plus a simple query. The example uses Lucene.Net 2.4.0; some classes and methods differ from those in newer or older versions.

I. A simple understanding of indexes

My use of Lucene.Net has been fairly simple: for some time I have only written small pieces of project code against its class library. I am not clear about many of its terms and may even misunderstand some of them. As my earlier blog posts show, writing has never been my strong point, and even when I do explain something I risk being suspected of copying large passages from books, so I skip most conceptual introductions here (unless otherwise stated). For beginners, getting the concepts and technical terms straight is very important; please refer to the relevant documentation.

A Lucene index consists of one or more segments. A segment consists of multiple documents, a document consists of one or more fields, and a field consists of one or more terms.

[Figure: index > segment > document > field > term]

It is easy to see that a Lucene index is built up from points into lines and from lines into planes: small units combine into successively larger ones. You can confirm this by inspecting the index files that Lucene generates.
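As a rough illustration of the hierarchy described above, here is a toy C# model. The class names (ToyIndex, ToySegment, ToyDocument, ToyField) are invented for this sketch and are not Lucene.Net types, and splitting on spaces merely stands in for a real analyzer.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: these types are hypothetical and are NOT Lucene.Net classes.
public class ToyField
{
    public string Name;
    public List<string> Terms = new List<string>();

    public ToyField(string name, string text)
    {
        Name = name;
        // Splitting on spaces stands in for a real analyzer.
        Terms.AddRange(text.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }
}

public class ToyDocument
{
    public List<ToyField> Fields = new List<ToyField>();
}

public class ToySegment
{
    public List<ToyDocument> Documents = new List<ToyDocument>();
}

public class ToyIndex
{
    public List<ToySegment> Segments = new List<ToySegment>();

    // Counts every term in every field of every document of every segment.
    public int CountTerms()
    {
        int total = 0;
        foreach (ToySegment segment in Segments)
            foreach (ToyDocument document in segment.Documents)
                foreach (ToyField field in document.Fields)
                    total += field.Terms.Count;
        return total;
    }
}
```

Walking the index from the top down visits segments, then documents, then fields, then terms, which is the same nesting the paragraph describes.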

Reference source: http://alone2004.spaces.live.com/blog/cns!C2525069080d7bb!675.entry

II. Create, optimize, delete, and update indexes

Note: the solution folder contains a test folder, resource, with four .txt files; I used these four text files for local testing.

1. Saving indexes to files

(1) Create an index

Initialize an IndexModifier, and then execute the core method that creates the index:

    /// <summary>
    /// Creates an index entry for a .txt file.
    /// </summary>
    /// <param name="file">The file to index.</param>
    /// <param name="modifier">The open IndexModifier that receives the document.</param>
    private void IndexFile(FileInfo file, IndexModifier modifier)
    {
        try
        {
            // Create a document, add fields to it, then add the document to the index.
            Document doc = new Document();
            SetOutput("An index is being created, file name: " + file.FullName);

            doc.Add(new Field("ID", id.ToString(), Field.Store.YES, Field.Index.TOKENIZED)); // stored and indexed
            id++;

            /* filename begin */
            doc.Add(new Field("FILENAME", file.FullName, Field.Store.YES, Field.Index.TOKENIZED)); // stored and indexed
            // doc.Add(new Field("FILENAME", file.FullName, Field.Store.YES, Field.Index.UN_TOKENIZED));
            // doc.Add(new Field("FILENAME", file.FullName, Field.Store.NO, Field.Index.TOKENIZED));
            // doc.Add(new Field("FILENAME", file.FullName, Field.Store.NO, Field.Index.UN_TOKENIZED));
            /* filename end */

            /* contents begin */
            // doc.Add(new Field("contents", new StreamReader(file.FullName, System.Text.Encoding.Default)));
            string contents = string.Empty;
            using (TextReader rdr = new StreamReader(file.FullName, System.Text.Encoding.Default))
            {
                contents = rdr.ReadToEnd(); // read the whole file
                doc.Add(new Field("contents", contents, Field.Store.YES, Field.Index.TOKENIZED)); // stored and indexed
                // doc.Add(new Field("contents", contents, Field.Store.NO, Field.Index.TOKENIZED)); // indexed but not stored
            }
            /* contents end */

            modifier.AddDocument(doc);
        }
        catch (FileNotFoundException fnfe)
        {
            // Report the missing file instead of silently ignoring it.
            SetOutput(fnfe.Message);
        }
    }

Finally, call the Close method of the IndexModifier object.

Notes:

A. The IndexModifier class wraps the commonly used IndexWriter and IndexReader, so we do not have to deal with multithreading issues ourselves;

B. StandardAnalyzer is a commonly used analyzer and provides basic support for Chinese text (for the well-known Pangu word segmenter, see eaglet);

C. Calling the Optimize method of IndexModifier optimizes the index files, but it takes considerable time. In my tests, the optimization time grew roughly linearly with the size of the index files, so in real development this method should be run according to some policy rather than after every change;

D. The Close method of IndexModifier must be called; otherwise everything you did is lost.
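Note C leaves the optimization policy open. One possible policy, sketched below, is to optimize only after a fixed number of changes. OptimizePolicy is a hypothetical helper written for this illustration, not a Lucene.Net class, and the threshold is an arbitrary value you would tune yourself.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net: runs an expensive action
// (such as IndexModifier.Optimize) only once per `threshold` recorded changes.
public class OptimizePolicy
{
    private readonly int threshold;
    private readonly Action optimize;
    private int pendingChanges;

    public OptimizePolicy(int threshold, Action optimize)
    {
        if (threshold < 1) throw new ArgumentOutOfRangeException("threshold");
        if (optimize == null) throw new ArgumentNullException("optimize");
        this.threshold = threshold;
        this.optimize = optimize;
    }

    // Call after each add, delete, or update. Returns true if the action ran.
    public bool RecordChange()
    {
        pendingChanges++;
        if (pendingChanges < threshold) return false;
        optimize();
        pendingChanges = 0;
        return true;
    }
}
```

With the demo's objects this could be wired up as new OptimizePolicy(1000, modifier.Optimize), calling RecordChange() after each AddDocument or DeleteDocuments; the value 1000 is only an example.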

(2) Delete an index by ID

The code is relatively simple; it calls the DeleteDocuments method:

    Directory directory = FSDirectory.GetDirectory(INDEX_STORE_PATH, false);
    IndexModifier modifier = new IndexModifier(directory, new StandardAnalyzer(), false);
    Term term = new Term("ID", id);
    modifier.DeleteDocuments(term); // delete every document whose ID field matches the term
    modifier.Close();
    directory.Close();

IndexModifier also has a DeleteDocument method whose parameter is an integer docNum. We usually do not know the internal document number inside the index files, so it is seldom used.

(3) Update an index by ID

The code is as follows:

    bool enableCreate = IsEnableCreated(); // whether the index files have already been created
    Term term = new Term("ID", id);

    // Create a document and add fields to it.
    Document doc = new Document();
    doc.Add(new Field("ID", id, Field.Store.YES, Field.Index.TOKENIZED)); // stored and indexed
    doc.Add(new Field("FILENAME", fileName, Field.Store.YES, Field.Index.TOKENIZED));
    doc.Add(new Field("contents", fileName, Field.Store.YES, Field.Index.TOKENIZED));

    // LuceneIO appears to be the demo's alias for the Lucene.Net.Store namespace.
    LuceneIO.Directory directory = LuceneIO.FSDirectory.GetDirectory(INDEX_STORE_PATH, enableCreate);
    IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), IndexWriter.MaxFieldLength.LIMITED);
    writer.UpdateDocument(term, doc); // replace the documents that match the term
    writer.Optimize();
    // writer.Commit();
    writer.Close();
    directory.Close();

Note that this time we use the UpdateDocument method of IndexWriter, because I could not find a ready-made UpdateDocument method on IndexModifier. UpdateDocument deletes the documents that match the term and then adds the new document. In my experience Optimize usually needs to be called afterwards; otherwise the index files can end up holding two entries with the same ID.
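To make the delete-then-add behaviour concrete, here is a toy model in plain C#. It is not Lucene.Net code: ToyStore and its members are invented for this sketch, and it only mimics the general idea that a delete marks an entry while a later optimize step physically removes it.

```csharp
using System.Collections.Generic;

// Toy model, not Lucene.Net code: documents are (id, text) entries in a list.
public class ToyStore
{
    private class Entry
    {
        public string Id;
        public string Text;
        public bool Deleted;
    }

    private readonly List<Entry> entries = new List<Entry>();

    public void Add(string id, string text)
    {
        entries.Add(new Entry { Id = id, Text = text });
    }

    // Marks matching live entries as deleted; returns how many were marked.
    public int Delete(string id)
    {
        int marked = 0;
        foreach (Entry entry in entries)
        {
            if (!entry.Deleted && entry.Id == id)
            {
                entry.Deleted = true;
                marked++;
            }
        }
        return marked;
    }

    // Update = delete by ID, then add the new version.
    public void Update(string id, string text)
    {
        Delete(id);
        Add(id, text);
    }

    // Physically drops the entries that were marked as deleted.
    public void Optimize()
    {
        entries.RemoveAll(entry => entry.Deleted);
    }

    public int PhysicalCount { get { return entries.Count; } }

    public int LiveCount
    {
        get { return entries.FindAll(entry => !entry.Deleted).Count; }
    }

    // Returns the text of the live entry with this ID, or null if there is none.
    public string Get(string id)
    {
        Entry found = entries.Find(entry => !entry.Deleted && entry.Id == id);
        return found == null ? null : found.Text;
    }
}
```

After Add("1", "old") followed by Update("1", "new"), the toy store holds two physical entries but only one live entry, and Optimize() brings the physical count back to one.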

2. Saving indexes to memory

If you understood part 1, part 2 needs little explanation. The IndexModifier constructor has this overload:

 
    public IndexModifier(Directory directory, Analyzer analyzer, bool create);

In the sample code below, the first argument, ramDir, is a RAMDirectory, which is a kind of Directory. We can declare it as static so that the index stays in memory once it has been created:

 
    // Static, so the in-memory index lives as long as the application does.
    private static RAMDirectory ramDir = new RAMDirectory();

    IndexModifier modifier = new IndexModifier(ramDir, new StandardAnalyzer(), true);

In my tests, adding, deleting, updating, and querying work the same way as in part 1.

3. Use Lucene.Net for database queries

In everyday development, frequently reading large volumes of data straight from the database may not be efficient or fast enough. We can combine Lucene.Net with the database to return query results quickly. Creating the index and adding, deleting, updating, and querying work exactly as described in part 1. For example, the index creation code in this article's demo takes the first 1000 people and indexes their IDs and names. Before coding, I first inserted some data into the person table:

    INSERT INTO person (firstname, lastname, weight, height) VALUES ('ming', 'Yao', 200, 223)
    INSERT INTO person (firstname, lastname, weight, height) VALUES ('establish Alliance', 'yi', 180, 213)
    INSERT INTO person (firstname, lastname, weight, height) VALUES ('docker', 'nowitz', 180, 211)
    INSERT INTO person (firstname, lastname, weight, height) VALUES ('White', 'Howard', 190, 218)
    INSERT INTO person (firstname, lastname, weight, height) VALUES ('ash', 'Howard', 178, 197)
    INSERT INTO person (firstname, lastname, weight, height) VALUES ('time', 'Duncan', 183, 211)
    INSERT INTO person (firstname, lastname, weight, height) VALUES ('ven', 'garnett', 182, 215)
    INSERT INTO person (firstname, lastname, weight, height) VALUES ('delon', 'williams', 166, 197)

Next, we retrieve the first 1000 people:

    string sql = "select top 1000 id, firstname, lastname from person (nolock)";
    IList<Person> listPersons = EntityConvertor.QueryForList<Person>(sql, strSqlConn, null);

Then create an index:

    private void IndexDb(IndexModifier modifier, IList<Person> listModels)
    {
        SetOutput(string.Format("Creating database index, total {0} persons", listModels.Count));
        foreach (Person item in listModels)
        {
            // Create a document, add fields to it, then add the document to the index.
            Document doc = new Document();
            doc.Add(new Field("ID", item.Id.ToString(), Field.Store.YES, Field.Index.TOKENIZED)); // stored and indexed
            doc.Add(new Field("fullname", string.Format("{0} {1}", item.FirstName, item.LastName), Field.Store.YES, Field.Index.TOKENIZED)); // stored and indexed
            modifier.AddDocument(doc);
        }
    }

In the same way, we finish by calling these two methods (calling Optimize is optional):

    modifier.Optimize(); // optimize the index
    modifier.Close();    // close the index modifier

III. Search

The search in this article's sample code uses a simple method of Lucene.Net's IndexSearcher, Search(Query, Filter, int n). I have not used most of its other overloads:

    /// <summary>
    /// Searches the index.
    /// </summary>
    /// <param name="keyword">The keyword to search for.</param>
    /// <param name="field">The name of the field to search in.</param>
    /// <returns>The top matching documents, or null if the search failed.</returns>
    private TopDocs Search(string keyword, string field)
    {
        TopDocs docs = null;
        int n = 10; // maximum number of results to return
        SetOutput(string.Format("Retrieving keyword: {0}", keyword));
        try
        {
            QueryParser parser = new QueryParser(field, new StandardAnalyzer()); // parser for the given field
            Query query = parser.Parse(keyword); // use QueryParser.Parse to build the Query
            Stopwatch watch = new Stopwatch();
            watch.Start();
            docs = searcher.Search(query, (Filter)null, n); // get the search results
            watch.Stop();
            string message = "Search completed, total time: " + watch.Elapsed.Hours + " hours "
                + watch.Elapsed.Minutes + " minutes " + watch.Elapsed.Seconds + " seconds "
                + watch.Elapsed.Milliseconds + " milliseconds";
            SetOutput(message);
        }
        catch (Exception ex)
        {
            SetOutput(ex.Message);
            docs = null;
        }
        return docs;
    }

    /// <summary>
    /// Shows the search results.
    /// </summary>
    /// <param name="queryResult">The result returned by Search.</param>
    private void ShowFileSearchResult(TopDocs queryResult)
    {
        if (queryResult == null || queryResult.totalHits == 0)
        {
            SetOutput("Sorry, no results were found.");
            return;
        }
        int counter = 1;
        foreach (ScoreDoc sd in queryResult.scoreDocs)
        {
            try
            {
                Document doc = searcher.Doc(sd.doc);
                string id = doc.Get("ID");             // get the ID
                string fileName = doc.Get("FILENAME"); // get the file name
                string contents = doc.Get("contents"); // get the file content
                string result = string.Format("This is search result {0}, ID: {1}, file name: {2}, file content: {3}{4}",
                    counter, id, fileName, Environment.NewLine, contents);
                SetOutput(result);
            }
            catch (Exception ex)
            {
                SetOutput(ex.Message);
            }
            counter++;
        }
    }
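For readers wondering what a keyword lookup with a result limit does at its simplest, the following toy inverted index may help. It is a deliberately simplified sketch and not how Lucene.Net is implemented: ToyInvertedIndex is an invented name, terms are produced by splitting on spaces, and results come back in indexing order, whereas IndexSearcher ranks hits by relevance score.

```csharp
using System;
using System.Collections.Generic;

// Toy inverted index, not Lucene.Net code: maps each term to the IDs
// of the documents that contain it.
public class ToyInvertedIndex
{
    private readonly Dictionary<string, List<int>> postings =
        new Dictionary<string, List<int>>();

    public void Add(int docId, string text)
    {
        foreach (string raw in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string term = raw.ToLowerInvariant();
            List<int> ids;
            if (!postings.TryGetValue(term, out ids))
            {
                ids = new List<int>();
                postings[term] = ids;
            }
            if (!ids.Contains(docId)) ids.Add(docId);
        }
    }

    // Returns at most n matching document IDs, in the order they were indexed.
    public List<int> Search(string keyword, int n)
    {
        List<int> ids;
        if (!postings.TryGetValue(keyword.ToLowerInvariant(), out ids))
            return new List<int>();
        return ids.GetRange(0, Math.Min(n, ids.Count));
    }
}
```

The parameter n plays the same role as the n = 10 limit in the Search method of the demo: it caps how many hits are returned, no matter how many documents match.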

In the next article I will cover commonly used searching, sorting, and paging in Lucene.Net. Today I am being lazy and stopping here.

Finally, the code in this demo is not elegant, and its readability is passable at best. I hope you will download it and take a look. I still hope that it will help newcomers, or draw in passing experts who can offer advice; if you come away knowing more than I do, so much the better.

Download Demo: lucenenetapp

References:

http://www.cnblogs.com/birdshover/category/152283.html

http://lucene.apache.org/lucene.net/

http://lucene.apache.org/lucene.net/docs/
