Simple implementation of Lucene

Source: Internet
Author: User
Tags createindex

Full-text search inside a database runs at a turtle's pace, so we have to find another way. This is where Lucene can show its talents.

First, for the demo, we insert 10 million records into the database, 778 MB in total.

 

Next, we search the news content for records that contain the word "popular".

 

Ouch: the database search takes 78 s. At that speed, it is tempting to give up on the database altogether.

Let's take a look at how Lucene does. Download address for Lucene.Net: http://incubator.apache.org/lucene.net/download.html

The code is as follows:
// Note: this listing uses the older Lucene.Net 2.x style API (Hits, FSDirectory.GetDirectory).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using System.Data;
using System.Diagnostics;
using Lucene.Net.Search;
using Lucene.Net.QueryParsers;

namespace First
{
    class Program
    {
        static string path = @"D:\Sample";

        static void Main(string[] args)
        {
            // Create the index
            CreateIndex();

            var watch = Stopwatch.StartNew();

            // Search
            IndexSearcher search = new IndexSearcher(path);

            // Query expression
            QueryParser query = new QueryParser(string.Empty, new StandardAnalyzer());

            // query.Parse: inject the query conditions
            var hits = search.Search(query.Parse("Content:Popular"));

            for (int i = 0; i < hits.Length(); i++)
            {
                // Show at most the first 20 characters of each matching record
                string content = hits.Doc(i).Get("Content");
                Console.WriteLine("current content: {0}", content.Substring(0, Math.Min(20, content.Length)) + "...");
            }

            watch.Stop();

            Console.WriteLine("search time consumed: {0}", watch.ElapsedMilliseconds);
        }

        static void CreateIndex()
        {
            // Create the index library directory
            var directory = FSDirectory.GetDirectory(path, true);

            // Create the index writer and use StandardAnalyzer to split sentences
            IndexWriter indexWriter = new IndexWriter(directory, new StandardAnalyzer());

            // DbHelperSQL is the project's own data-access helper (not shown here)
            var reader = DbHelperSQL.ExecuteReader("select * from News");

            while (reader.Read())
            {
                // A set of fields: a document, similar to a table row
                Document doc = new Document();

                // The fields to be indexed
                doc.Add(new Field("ID", reader["ID"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.Add(new Field("Title", reader["Title"].ToString(), Field.Store.NO, Field.Index.ANALYZED));
                doc.Add(new Field("Content", reader["Content"].ToString(), Field.Store.YES, Field.Index.ANALYZED));

                indexWriter.AddDocument(doc);
            }

            reader.Close();

            // Optimize the index file
            indexWriter.Optimize();

            indexWriter.Close();
        }
    }
}

 

Wow, 448 ms: the search time has dropped from seconds to milliseconds. Of course, this figure does not include the time spent creating the index. Because the index is built once in advance, that cost can be treated as a constant in terms of time complexity.

 

As a beginner, I will briefly introduce how Lucene is used. The process is divided into two steps: "index" and "search".

 

I. Index:

I believe everyone is familiar with indexing. Lucene splits our content into many words, uses each word as a key to build an inverted index, and stores the result in the index library.

In this example, the indexing process uses the IndexWriter, FSDirectory, StandardAnalyzer, Document, and Field classes. A brief analysis of each follows.
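To make the idea of an inverted index concrete, here is a minimal sketch in plain C# with no Lucene dependency. The class name InvertedIndexSketch and its naive tokenization are assumptions made for illustration only; Lucene's real index format and analysis are far more elaborate.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a toy inverted index, not Lucene's actual data structure.
static class InvertedIndexSketch
{
    // Maps each lowercased word to the sorted set of document IDs that contain it.
    public static Dictionary<string, SortedSet<int>> Build(IList<string> documents)
    {
        var index = new Dictionary<string, SortedSet<int>>();
        for (int docId = 0; docId < documents.Count; docId++)
        {
            // Naive tokenization: split on spaces and common punctuation.
            var words = documents[docId].ToLowerInvariant().Split(
                new[] { ' ', ',', '.', ';', ':', '!', '?' },
                StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
            {
                if (!index.TryGetValue(word, out var postings))
                {
                    postings = new SortedSet<int>();
                    index[word] = postings;
                }
                postings.Add(docId);
            }
        }
        return index;
    }

    // A lookup is a single dictionary access instead of a scan over every document.
    public static IEnumerable<int> Search(Dictionary<string, SortedSet<int>> index, string word)
    {
        return index.TryGetValue(word.ToLowerInvariant(), out var postings)
            ? postings
            : new SortedSet<int>();
    }
}
```

Building the index touches every document once, but each later search only looks up one key, which is why the search itself is so much faster than a database scan over the whole table.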

 

1: IndexWriter

This class has an AddDocument method, so we can infer that it implements the index write operations.

 

2: FSDirectory

This one is simpler. It provides the storage location of the index library, which is D:\Sample in this example. Someone may ask whether the index library can be kept in memory instead. With Lucene, of course it can: RAMDirectory does exactly that. If our memory is large enough, we can hold the index library in memory and improve search efficiency.

 

3: StandardAnalyzer

This is the most critical step in the indexing process, and it is also what we need to think about most carefully when using Lucene. The key to near-instant search is how the input content is split into terms. Different splitting strategies give rise to different analyzers; StandardAnalyzer is an analyzer based on word segmentation.
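As a rough illustration of what an analyzer does, the sketch below splits text into lowercase terms and drops a few stop words. It is a simplified stand-in written in plain C#: the class name AnalyzerSketch and its stop-word list are assumptions for illustration, and the real StandardAnalyzer uses a more sophisticated tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative only: approximates the idea of an analyzer, not StandardAnalyzer itself.
static class AnalyzerSketch
{
    // Hypothetical stop-word list for this example; not Lucene's actual list.
    static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "the", "is", "of" };

    public static List<string> Analyze(string text)
    {
        var terms = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text + " ")   // the trailing space flushes the last token
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                string term = current.ToString();
                if (!StopWords.Contains(term)) terms.Add(term);
                current.Clear();
            }
        }
        return terms;
    }
}
```

With this sketch, AnalyzerSketch.Analyze("The Popular news of 2012!") yields the terms popular, news, and 2012. The same analysis has to be applied at index time and at query time, otherwise the terms in the query would not match the terms in the index.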

 

4: Document

In the example above, a Document is a collection of Fields that is then added to the IndexWriter. It is similar to the concept of a row in a table.

 

5: Field

A Field carries the data to be processed, in key-value form.

①: Field.Store.YES, Field.Index.NOT_ANALYZED means the field value is stored as is and is indexed without being split by StandardAnalyzer.

②: Field.Store.NO, Field.Index.ANALYZED means the field value is not stored, but it is split by StandardAnalyzer and indexed.
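The difference between the two settings can be modelled with a small stand-in written in plain C#. FieldSketch and FieldOptionsSketch are hypothetical types invented for this illustration; they only mimic the two flags and are not part of Lucene.Net.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Lucene field; it only models the two flags discussed above.
class FieldSketch
{
    public string Name;
    public string Value;
    public bool Stored;    // plays the role of Field.Store.YES
    public bool Analyzed;  // plays the role of Field.Index.ANALYZED (false = NOT_ANALYZED)
}

static class FieldOptionsSketch
{
    // Terms that would go into the index for this field.
    public static List<string> IndexedTerms(FieldSketch field)
    {
        if (!field.Analyzed)
            return new List<string> { field.Value };   // the whole value is one term

        // Simplified analysis: lowercase and split on spaces.
        return new List<string>(field.Value.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }

    // Value that could be read back from a search result, or null if it was not stored.
    public static string StoredValue(FieldSketch field)
    {
        return field.Stored ? field.Value : null;
    }
}
```

In the listing above, Title is indexed with Field.Store.NO, so it can be searched but its text cannot be read back from a result, while ID is kept as a single unsplit term that matches only on the exact value.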

 

II. Search

This is relatively easy. Given the words we enter, Lucene quickly locates the matching content in the index library. Here we meet IndexSearcher, QueryParser, and Hits.

 

1: IndexSearcher

This can be understood as opening the index library created by IndexWriter in read-only mode. Its Search method is the bridge through which a query produced by QueryParser is executed.

 

2: QueryParser

This provides a Parse method that converts the words we want to search for into a query expression that Lucene can understand.
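The query string "Content:Popular" used in the example follows the field:term form. The sketch below shows, in plain C#, how such a string can be split into a field name and a search term. QuerySketch is a hypothetical helper for illustration; the real QueryParser supports a much richer syntax and runs the term through the analyzer.

```csharp
using System;

// Illustrative only: handles a single "field:term" pair, nothing more.
static class QuerySketch
{
    // Splits "field:term" into its parts; falls back to defaultField when no field is given.
    public static Tuple<string, string> ParseTerm(string query, string defaultField)
    {
        int colon = query.IndexOf(':');
        if (colon < 0)
            return Tuple.Create(defaultField, query.Trim().ToLowerInvariant());

        string field = query.Substring(0, colon).Trim();
        // Lowercasing stands in for the analyzer step applied to the term.
        string term = query.Substring(colon + 1).Trim().ToLowerInvariant();
        return Tuple.Create(field, term);
    }
}
```

Here QuerySketch.ParseTerm("Content:Popular", "") returns the pair (Content, popular). The second argument mirrors the default field passed to the QueryParser constructor, which the example leaves as string.Empty because the field is named in the query itself.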

 

3: Hits

This is a cursor over the matching results. It works much like lazy loading in C#: a document is loaded only when it is actually accessed, which improves performance.
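The lazy-loading idea can be sketched with a C# iterator. LazyHitsSketch and its LoadDocument method are hypothetical names standing in for reading a stored document from the index; this is an analogy for the behaviour described above, not how Hits is implemented.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: shows deferred loading with yield return.
static class LazyHitsSketch
{
    public static int LoadCount;   // counts how many documents were actually loaded

    // Hypothetical loader standing in for reading a stored document from the index.
    static string LoadDocument(int docId)
    {
        LoadCount++;
        return "document " + docId;
    }

    // Deferred execution: a document is loaded only when the caller iterates to it.
    public static IEnumerable<string> Results(IEnumerable<int> matchingIds)
    {
        foreach (int id in matchingIds)
            yield return LoadDocument(id);
    }
}
```

Calling LazyHitsSketch.Results does no work by itself; if the caller only reads the first result, only one document is loaded, no matter how many records matched.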
