1. Create a project
A) Reference Lucene.Net.dll and import the namespaces:
using Lucene.Net.Index;
using Lucene.Net.Documents;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Search;
using Lucene.Net.QueryParsers;
2. Create an index
Before you start searching, you need to create an index and add the data you want to search.
A) Create the index with the IndexWriter class. Passing true as the last argument creates a new index.
IndexWriter writer = new IndexWriter(@"C:\your\Index\directory", new StandardAnalyzer(), true);
This creates a new index in a directory on disk (you need write permission for that directory). You can also store the index in memory or in a database.
B) Once the index exists, open it whenever you need to add documents to it. Passing false opens the existing index instead of creating a new one.
IndexWriter writer = new IndexWriter(@"C:\your\Index\directory", new StandardAnalyzer(), false);
3. Add documents to the index
Once the index is open, you can add documents to it. Every entity you insert into the index is a document, and each field of a document holds one piece of related information. For every field you must specify whether it is:
Stored in the index (you can retrieve any stored value from the index; this is useful for short fields such as author and title)
Indexed (required for any field you want to search on)
Tokenized (split into words before indexing)
Fields can be created with the following constructor:
public Field(string fieldName, string fieldValue, bool stored, bool indexed, bool tokenized);
For convenience, you can also use static methods to create a new field instance:
Field.Keyword() - the value is stored and indexed, but not tokenized.
Field.Text(string name, StreamReader val) - the value is not stored, but it is indexed and tokenized.
Field.Text(string name, string val) - the value is stored, indexed, and tokenized.
Field.UnIndexed() - the value is only stored.
A value can only be tokenized and indexed if it is plain text. To index other formats, you need a parser that extracts the text first; HTML documents are a typical example.
In our example, we will index a text file using the following fields:
"FILENAME" (stored, not tokenized; Field.Keyword also indexes the value as a single term)
"text" (not stored, indexed, and tokenized)
With these fields, full-text search works only on the content (the "text" field); the file name is stored so that it can be shown in the results.
// Requires: using System.IO;
void AddDocument(IndexWriter writer, string path)
{
    Document doc = new Document();
    StreamReader sr = new StreamReader(path, System.Text.Encoding.Default);
    // File content: indexed and tokenized, but not stored.
    doc.Add(Field.Text("text", sr));
    // File path: stored as a single untokenized term.
    doc.Add(Field.Keyword("FILENAME", path));
    writer.AddDocument(doc);
    sr.Close();
}
4. Save the index
Do not forget to close the IndexWriter so that the index is saved:
writer.Close();
You can also optimize the index before closing the IndexWriter:
writer.Optimize();
writer.Close();
Optimization may take some time, but it improves search performance. Optimize once, after all write operations on the index are finished.
5. Start searching
Before you search, open the index with IndexSearcher. You can open it and search even while another process is adding documents to the index or optimizing it.
IndexSearcher searcher = new IndexSearcher(@"C:\your\Index\directory");
Then you can create a query:
string q = "dotlucene";
Query query = QueryParser.Parse(q, "text", new StandardAnalyzer());
Now we can get the search results and print them. Our code prints only the file name, because that is the only field whose value we stored in the index.
Hits hits = searcher.Search(query);
Console.WriteLine("Found " + hits.Length() + " document(s) that matched query '" + q + "':\r\n");
for (int i = 0; i < hits.Length(); i++)
{
    Document doc = hits.Doc(i);
    Console.WriteLine(doc.Get("FILENAME") + "\r\n");
}
Finally, do not forget to close the searcher:
searcher.Close();
6. Query syntax
Query | Example | Notes
Single term | document | Searches for documents that contain the term "document" in the default field.
Phrase | "important document" | Searches for documents that contain the phrase "important document" in the default field.
Searching fields | title:document | Searches for documents that contain the term "document" in the "title" field.
Wildcard search | doc?ment | Single-character wildcard search. Matches "document" and "dociment" but not "docooment".
Wildcard search | document* | Multi-character wildcard search. Matches "document" and "documentation".
Fuzzy search | document~ | Search based on similar spelling.
Fuzzy search | document~0.9 | Search based on similar spelling. 0.9 is the required similarity (default: 0.5).
Proximity search | "important document"~5 | Finds the words of a phrase even when they are not next to each other. The maximum distance in this example is 5 words.
Range search | author:{Einstein TO Newton} | Searches for documents whose "author" field value lies between the specified values.
Range search | date:{20050101 TO 20050201} | Searches for documents whose "date" field (DateTime type) value lies between the specified dates.
Relevance | important^4 document | Sets the boost factor of the term "important" to 4. The default boost factor is 1.
Relevance | "important document"^4 "search engine" | You can set a boost factor for phrases too.
OR operator | important document | OR is the default operator.
OR operator | important OR document | The default field must contain either "important" or "document".
AND operator | important AND document | The default field must contain both words.
+ operator | important +document | The default field must contain "document" and may contain "important".
NOT/- operator | -important document | The default field must contain "document" but not "important".
Grouping | (important OR office) AND document | Use parentheses for expression grouping.
Grouping | author:(Einstein OR Newton) | Parentheses work with fields as well.
7. Prohibited queries
Query | Examples | Notes
Wildcard at the beginning of a term | ?ocument, *ocument | Throws Lucene.Net.QueryParsers.ParseException.
Stop words | a, the, and | Stop words are not indexed.
Special characters | \+, \: | Use a backslash to escape the special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
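Escaping by hand is easy to get wrong when the query text comes from user input. The sketch below shows one way to do it in plain C#. QueryText and Escape are hypothetical names, not part of Lucene.Net, and the character list is simply the one given in the table above, with "&&" and "||" handled by escaping each '&' and '|' individually.

```csharp
using System;
using System.Text;

public static class QueryText
{
    // Special characters from the table above.
    private const string SpecialChars = "+-&|!(){}[]^\"~*?:\\";

    // Hypothetical helper (not a Lucene.Net API): puts a backslash in
    // front of every special character found in the input.
    public static string Escape(string input)
    {
        if (input == null) throw new ArgumentNullException("input");
        StringBuilder sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (SpecialChars.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

The escaped string can then be passed to QueryParser.Parse in place of the raw input, so that characters such as + and : are treated as text rather than as operators.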
Study notes:
1. TermQuery: matches one specific term
Term t = new Term("ISBN", "1930110995");
Query query = new TermQuery(t);
2. RangeQuery: matches a range of terms; commonly used for date ranges
For example:
Query query = new RangeQuery(new Term("time", "20070516"), new Term("time", "20070517"), false);
The third parameter of RangeQuery indicates whether the start and end values themselves are included in the range.
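Range queries on dates work because terms are compared as strings, so the dates must be written in a format whose string order matches chronological order. Here is a minimal sketch of that idea in plain C#. DateTerms, ToTerm, and InRange are hypothetical names used for illustration only; InRange merely imitates the inclusive and exclusive comparison and is not how Lucene.Net evaluates a range.

```csharp
using System;
using System.Globalization;

public static class DateTerms
{
    // Hypothetical helper: formats a date as yyyyMMdd so that ordinal
    // string order equals chronological order.
    public static string ToTerm(DateTime date)
    {
        return date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }

    // Imitates a term range check: compares strings ordinally;
    // 'inclusive' plays the role of RangeQuery's third argument.
    public static bool InRange(string value, string lower, string upper, bool inclusive)
    {
        int lo = string.CompareOrdinal(value, lower);
        int hi = string.CompareOrdinal(value, upper);
        return inclusive ? (lo >= 0 && hi <= 0) : (lo > 0 && hi < 0);
    }
}
```

Under this model, and assuming every indexed value is exactly an eight-digit day, an exclusive range from 20070516 to 20070517 excludes both end dates and therefore matches nothing, so an inclusive range is usually what you want for day-level queries.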
3. PrefixQuery: matches terms that begin with a given prefix; often used for category lookups
PrefixQuery query = new PrefixQuery(new Term("category", "/computers"));
4. BooleanQuery: combines several queries to test multiple conditions
TermQuery searchingBooks =
    new TermQuery(new Term("subject", "JUnit"));
RangeQuery currentBooks =
    new RangeQuery(new Term("pubmonth", "200301"),
                   new Term("pubmonth", "200312"),
                   true);
BooleanQuery currentSearchingBooks = new BooleanQuery();
currentSearchingBooks.Add(searchingBooks, true, false);
currentSearchingBooks.Add(currentBooks, true, false);
IndexSearcher searcher = new IndexSearcher(directory);
Hits hits = searcher.Search(currentSearchingBooks);
When is the combination an AND, and when is it an OR? The answer lies in the parameters of the Add method of the BooleanQuery object.
Parameter 1 is the query clause to add.
Parameter 2, required, indicates whether the clause must match: true means a document must satisfy it; false means the clause is optional.
Parameter 3, prohibited, indicates whether matches must be rejected: true means documents that satisfy the clause are excluded from the results; false means they are allowed.
Note that in query syntax the operators AND and OR must be written in uppercase. To express "A and not B", write A AND -B, or +A -B.
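The relationship between the two boolean flags and the query-syntax prefixes can be summarized in a few lines of plain C#. This is only an illustrative mapping; ClauseFlags and Prefix are hypothetical names and are not part of Lucene.Net.

```csharp
using System;

public static class ClauseFlags
{
    // Maps the (required, prohibited) pair passed to BooleanQuery.Add
    // to the equivalent prefix in query syntax.
    public static string Prefix(bool required, bool prohibited)
    {
        if (required && prohibited)
            throw new ArgumentException("A clause cannot be both required and prohibited.");
        if (required) return "+";    // must match (AND-like)
        if (prohibited) return "-";  // must not match (NOT)
        return "";                   // optional (OR-like)
    }
}
```

Read this way, Add(a, true, false) followed by Add(b, false, true) corresponds to the query text +A -B.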
By default, QueryParser treats a space between terms as OR, unlike Google, which treats it as AND. You can change this default through the QueryParser object.
5. PhraseQuery
Matches a phrase. The main concept is slop, the positional distance allowed between the terms of the phrase. This value affects the score of each result; a slop of 0 requires an exact match. The example below makes this easier to understand. You do not need to know exactly how slop is computed, but a large slop hurts query performance, so keep the value small in practice.
With enough slop, PhraseQuery also matches the terms out of order. This increases the hit rate but also has a considerable impact on performance. SpanNearQuery can be used to control the order of the terms and improve performance.
// Returns true if the phrase matches at least one document with the given slop.
private bool Matched(string[] phrase, int slop)
{
    PhraseQuery query = new PhraseQuery();
    query.SetSlop(slop);
    for (int i = 0; i < phrase.Length; i++)
    {
        query.Add(new Term("field", phrase[i]));
    }
    Hits hits = searcher.Search(query);
    return hits.Length() > 0;
}
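To build intuition for slop, here is a deliberately simplified model for a phrase of exactly two terms, each occurring once in the document. It is not Lucene.Net's implementation, and SlopSketch and RequiredSlop are hypothetical names. In an exact match the second term sits one position after the first, so the slop needed is how far the actual offset deviates from 1.

```csharp
using System;

public static class SlopSketch
{
    // Simplified two-term model: smallest slop at which the phrase
    // "first second" would match, given each term's position in the document.
    public static int RequiredSlop(int firstPos, int secondPos)
    {
        // Exact phrase: secondPos - firstPos == 1, so the deviation is 0.
        return Math.Abs((secondPos - firstPos) - 1);
    }
}
```

For the text "the quick brown fox" (positions 0 to 3), the phrase "quick fox" gives RequiredSlop(1, 3), which is 1, while the reversed phrase "fox quick" gives RequiredSlop(3, 1), which is 3. This is why matching terms out of order needs a larger slop.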
When you use QueryParser for phrase queries, you need to set the slop value explicitly. There are two ways to do this:
Query q2 = QueryParser.Parse("\"quick fox\"~1",  // Method 1
    "field", new SimpleAnalyzer());
QueryParser qp = new QueryParser("field", new SimpleAnalyzer());
qp.SetPhraseSlop(1);  // Method 2
Original article: http://www.shenjk.com/detail/277