DotLucene (Lucene.Net) Research


1. Create a project

A) Reference Lucene.Net.dll and import the namespaces:

using System;
using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Documents;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Search;
using Lucene.Net.QueryParsers;

2. Create an index
Before you start searching, you need to create an index and add the data you want to search.

A) To create an index, use the IndexWriter class:
IndexWriter writer = new IndexWriter(@"C:\your\index\directory", new StandardAnalyzer(), true);
This creates a new index in a directory on disk (you need write permission for that directory). You can also store the index in memory or in a database.

B) Once the index exists, open it whenever you need to add documents by passing false as the last argument:
IndexWriter writer = new IndexWriter(@"C:\your\index\directory", new StandardAnalyzer(), false);

3. Add documents to the index

Once the index is open, you can add documents to it. Every entity you insert into the index is a document, and each document consists of fields that hold the related information. For each field you must specify whether it is:

Stored in the index (you can retrieve any stored value from the index; this is useful for short fields such as author or title)

Indexed (required for any field you want to search on)

Tokenized (split into words before indexing)

Fields can be created with the constructor:

public Field(string fieldName, string fieldValue, bool stored, bool indexed, bool tokenized);

For convenience, you can also use static methods to create a new field instance:

Field.Keyword() - the value is stored and indexed, but not tokenized.

Field.Text(string name, StreamReader val) - the value is not stored, but it is indexed and tokenized.
Field.Text(string name, string val) - the value is stored, indexed, and tokenized.

Field.UnIndexed() - the value is stored only.
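The combinations above can be summarized in a small lookup table. This is only a sketch: FieldFlags and FieldKinds are hypothetical names, not part of Lucene.Net, and the table simply restates the stored / indexed / tokenized settings described in the text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper types, not part of Lucene.Net.
public struct FieldFlags
{
    public readonly bool Stored;
    public readonly bool Indexed;
    public readonly bool Tokenized;

    public FieldFlags(bool stored, bool indexed, bool tokenized)
    {
        Stored = stored;
        Indexed = indexed;
        Tokenized = tokenized;
    }
}

public static class FieldKinds
{
    // One entry per factory method described above.
    public static readonly Dictionary<string, FieldFlags> Table =
        new Dictionary<string, FieldFlags>
        {
            { "Keyword",      new FieldFlags(true,  true,  false) },
            { "Text(reader)", new FieldFlags(false, true,  true)  },
            { "Text(string)", new FieldFlags(true,  true,  true)  },
            { "UnIndexed",    new FieldFlags(true,  false, false) },
        };

    // A field can be searched only if it is indexed.
    public static bool CanSearch(string kind) { return Table[kind].Indexed; }

    // A field value can be read back from a hit only if it is stored.
    public static bool CanRetrieve(string kind) { return Table[kind].Stored; }
}
```

For example, FieldKinds.CanSearch("UnIndexed") is false, while FieldKinds.CanRetrieve("UnIndexed") is true.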

 

Tokenizing and indexing work on plain text only. To index text in other formats you need a parser for that format; an HTML document, for example, has to be parsed before it is indexed.

In our example, we index a text file using the following fields:

"filename" (stored, not indexed, not tokenized)

"text" (not stored, indexed, and tokenized)

With these fields, only the content (the "text" field) can be searched.

 

void AddDocument(IndexWriter writer, string path)
{
    Document doc = new Document();
    StreamReader sr = new StreamReader(path, System.Text.Encoding.Default);
    // "text": indexed and tokenized, not stored
    doc.Add(Field.Text("text", sr));
    // "filename": stored only, so it can be shown in the results
    doc.Add(Field.UnIndexed("filename", path));
    writer.AddDocument(doc);
    sr.Close();
}

4. Save the index
Do not forget to close the writer so that the index is saved:

writer.Close();

You can also optimize the index before closing the IndexWriter:

writer.Optimize();
writer.Close();

Optimization may take some time, but it improves search performance. Run it once, after all documents have been added and before the writer is closed.

5. Start searching
Before searching, open the index with IndexSearcher. You can open and search an index even while another process is adding documents to it or optimizing it.

IndexSearcher searcher = new IndexSearcher(@"C:\your\index\directory");

 

Then you can create a query:

string q = "dotlucene";
Query query = QueryParser.Parse(q, "text", new StandardAnalyzer());

 

Now we can get the search results and print them out. Our code prints only the file name, because it is the only field stored in the index.

Hits hits = searcher.Search(query);
Console.WriteLine("Found " + hits.Length() + " document(s) that matched query '" + q + "':\r\n");
for (int i = 0; i < hits.Length(); i++)
{
    Document doc = hits.Doc(i);
    Console.WriteLine(doc.Get("filename") + "\r\n");
}

Finally, do not forget to close the searcher:

searcher.Close();

 

6. Query syntax

| Query | Example | Notes |
|---|---|---|
| Single term | document | Searches for documents that contain the term "document" in the default field. |
| Phrase | "important document" | Searches for documents that contain the phrase "important document" in the default field. |
| Searching fields | title:document | Searches for documents that contain the term "document" in the "title" field. |
| Wildcard search | doc?ment | Single-character wildcard search. It matches "document" and "dociment" but not "docooment". |
| | document* | Multi-character wildcard search. It matches "document" and "documentation". |
| Fuzzy search | document~ | Search based on similar spelling. |
| | document~0.9 | Search based on similar spelling. 0.9 is the required similarity (default: 0.5). |
| Proximity search | "important document"~5 | Finds the words of a phrase even when they are not next to each other. The maximum distance in this example is 5 words. |
| Range search | author:{Einstein TO Newton} | Searches for documents whose "author" field value lies between the specified values. |
| | date:{20050101 TO 20050201} | Searches for documents whose "date" field (DateTime type) value lies between the specified dates. |
| Relevance | important^4 document | Sets the boost factor of the term "important" to 4. The default boost factor is 1. |
| | "important document"^4 "search engine" | You can set a boost factor for phrases too. |
| OR operator | important document | OR is the default operator. |
| | important OR document | The default field must contain either "important" or "document". |
| AND operator | important AND document | The default field must contain both words. |
| + operator | important +document | The default field must contain "document" and may contain "important". |
| NOT/- operator | -important document | The default field must contain "document" but not "important". |
| Grouping | (important OR office) AND document | Use parentheses to group expressions. |
| | author:(Einstein OR Newton) | Parentheses work with fields as well. |
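The wildcard behavior described above can be tried out without Lucene.Net by translating a pattern into a regular expression. This is a minimal sketch for illustration only: WildcardSketch is a hypothetical helper, and it does not reproduce how Lucene.Net evaluates wildcard queries internally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.Net.
public static class WildcardSketch
{
    // Returns true when the whole term is covered by the pattern, where
    // '?' stands for exactly one character and '*' for zero or more.
    public static bool Matches(string pattern, string term)
    {
        // Escape every regex metacharacter first, then turn the escaped
        // wildcards back into their regex equivalents.
        string body = Regex.Escape(pattern)
            .Replace("\\?", ".")
            .Replace("\\*", ".*");
        return Regex.IsMatch(term, "^" + body + "$");
    }
}
```

With this helper, Matches("doc?ment", "dociment") is true and Matches("doc?ment", "docooment") is false, mirroring the examples in the wildcard notes.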

 

7. Prohibited queries

| Query | Example | Notes |
|---|---|---|
| Wildcard at the beginning of a term | ?ocument, *ocument | Throws Lucene.Net.QueryParsers.ParseException. |
| Stop words | a, the, and | Stop words are not indexed. |
| Special characters | \+, \: | Use a backslash to escape the special characters. |

The special characters are: + - & | ! ( ) { } [ ] ^ " ~ * ? : \
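When query text comes from user input, the backslash escaping described above can be applied programmatically. The following is a minimal sketch: QueryEscaper is a hypothetical helper written for this example (not a Lucene.Net class), and it escapes every character from the special-character list, which may be more than strictly necessary in some positions.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Lucene.Net.
public static class QueryEscaper
{
    // The special characters of the query syntax, as listed in the text.
    private const string Special = "+-&|!(){}[]^\"~*?:\\";

    // Puts a backslash in front of every special character.
    public static string Escape(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (Special.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, Escape("a+b:c") returns the text a\+b\:c, and input without special characters is returned unchanged.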

 

Study notes:

1. TermQuery: queries a specific term.

Term t = new Term("isbn", "1930110995");
Query query = new TermQuery(t);

2. RangeQuery: queries a range of values, typically a time range.

For example:
Query query = new RangeQuery(new Term("time", "20070516"), new Term("time", "20070517"), false);

The third parameter of RangeQuery indicates whether the start and end values themselves are included in the range.
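The effect of the third parameter can be sketched with plain string comparison, assuming that terms are compared as strings in ordinal order (which is why dates are written as yyyyMMdd). TermRange and its methods are hypothetical names used only for this illustration; they are not Lucene.Net APIs.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Lucene.Net.
public static class TermRange
{
    // Formats a date so that string order equals chronological order.
    public static string ToTerm(DateTime date)
    {
        return date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }

    // Checks whether value lies between lower and upper.
    // inclusive = true  keeps the two boundary values in the range,
    // inclusive = false leaves them out.
    public static bool InRange(string value, string lower, string upper, bool inclusive)
    {
        int fromLower = string.CompareOrdinal(value, lower);
        int fromUpper = string.CompareOrdinal(value, upper);
        return inclusive
            ? fromLower >= 0 && fromUpper <= 0
            : fromLower > 0 && fromUpper < 0;
    }
}
```

Under this model, the boundary value "20070516" is inside the range only when the flag is true. With the flag set to false, a range between two adjacent days contains no whole-day value at all.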

3. PrefixQuery: matches terms that start with a given prefix. It is often used for category retrieval.
PrefixQuery query = new PrefixQuery(new Term("category", "/computers"));

4. BooleanQuery: combines several conditions into one query.

TermQuery searchingBooks =
    new TermQuery(new Term("subject", "junit"));

RangeQuery currentBooks =
    new RangeQuery(new Term("pubmonth", "200301"),
                   new Term("pubmonth", "200312"),
                   true);
BooleanQuery currentSearchingBooks = new BooleanQuery();
// Both clauses are required (AND)
currentSearchingBooks.Add(searchingBooks, true, false);
currentSearchingBooks.Add(currentBooks, true, false);
IndexSearcher searcher = new IndexSearcher(directory);
Hits hits = searcher.Search(currentSearchingBooks);

When is the combination an AND and when is it an OR? That depends on the parameters of the Add method of the BooleanQuery object.

Parameter 1 is the query clause to add.

Parameter 2, required, indicates whether the clause must be satisfied. true means a document must match the clause; false means the clause is optional.

Parameter 3, prohibited, indicates whether the clause must not be satisfied. true means documents that match the clause are excluded from the results; false means matching documents are allowed.

Note that AND and OR must be written in uppercase in a query string. To express "a and not b", write a AND -b, or +a -b.
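The relationship between the two boolean flags and the + / - query syntax can be sketched as follows. ClauseSyntax is a hypothetical helper written for this example, and rejecting a clause that is both required and prohibited is an assumption of the sketch.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class ClauseSyntax
{
    // Maps the (required, prohibited) flags to the query-syntax prefix.
    public static string Prefix(bool required, bool prohibited)
    {
        if (required && prohibited)
        {
            // Assumption of this sketch: the combination is treated as an error.
            throw new ArgumentException("A clause cannot be both required and prohibited.");
        }
        if (required) return "+";   // must match (AND)
        if (prohibited) return "-"; // must not match (NOT)
        return "";                  // optional (OR)
    }

    // Renders a single term together with its prefix.
    public static string Render(string term, bool required, bool prohibited)
    {
        return Prefix(required, prohibited) + term;
    }
}
```

For example, Render("a", true, false) + " " + Render("b", false, true) produces +a -b, the "a and not b" form mentioned above.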

By default, QueryParser treats the spaces between terms as OR. You can change this default through the QueryParser object.

5. PhraseQuery
Queries a phrase. The main concept is slop, the allowed positional distance between the terms of the phrase.

This value affects the score of a result: a slop of 0 means an exact phrase match. The example below makes this easier to understand. Users do not need to know how slop is computed, but a large slop hurts query performance, so in practice set it as small as possible.
PhraseQuery does not enforce the order of the terms in the phrase: given enough slop, terms can match out of order. This increases the number of hits but also has a significant impact on performance.
SpanNearQuery can be used to enforce the term order and improve performance.

private bool Matched(string[] phrase, int slop)
{
    PhraseQuery query = new PhraseQuery();
    query.SetSlop(slop);

    // Add the phrase terms in order
    for (int i = 0; i < phrase.Length; i++)
    {
        query.Add(new Term("field", phrase[i]));
    }

    Hits hits = searcher.Search(query);
    return hits.Length() > 0;
}
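To get a feel for slop values, the following sketch estimates the smallest slop a two-term phrase needs. It is a simplified model under stated assumptions (exactly two terms, each occurring once in the text, positions counted in tokens), not Lucene.Net's actual scoring code, and SlopSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class SlopSketch
{
    // Smallest slop for the phrase "first second" to match, given the token
    // positions at which the two terms occur in the text.
    public static int RequiredSlop(int firstPos, int secondPos)
    {
        // In the query, "second" sits one position after "first". Shift it
        // back by that offset and measure how far apart the terms still are.
        return Math.Abs(firstPos - (secondPos - 1));
    }
}
```

Take the text "the quick brown fox", where "quick" is at position 1 and "fox" at position 3. The phrase "quick fox" gives RequiredSlop(1, 3) = 1, while the reversed phrase "fox quick" gives RequiredSlop(3, 1) = 3. Under this model, matching terms out of order needs a noticeably larger slop.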

When you query phrases through QueryParser, there are two ways to set the slop:

Query q2 = QueryParser.Parse("\"quick fox\"~1", "field", new SimpleAnalyzer()); // Method 1

QueryParser qp = new QueryParser("field", new SimpleAnalyzer());
qp.SetPhraseSlop(1); // Method 2

 

 

Original article: http://www.shenjk.com/detail/277
