A summary of Lucene search methods


Translated from: summerbell, http://www.iteye.com/topic/569358

1. Multi-field Search

You can use MultiFieldQueryParser to specify multiple search fields.

Query query = MultiFieldQueryParser.Parse("name*", new string[] { FieldName, FieldValue }, analyzer);
IndexReader reader = IndexReader.Open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);

2. Multi-Criteria Search

In addition to using QueryParser.Parse to decompose complex search syntax, you can combine multiple Query objects to achieve the same goal.

Query query1 = new TermQuery(new Term(FieldValue, "name1"));       // term (word) search
Query query2 = new WildcardQuery(new Term(FieldName, "name*"));    // wildcard search
Query query3 = new PrefixQuery(new Term(FieldName, "name1"));      // prefix search: field:keyword, with * appended automatically
Query query4 = new RangeQuery(new Term(FieldNumber, NumberTools.LongToString(11L)), new Term(FieldNumber, NumberTools.LongToString(13L)), true);  // range search
Query query5 = new FilteredQuery(query, filter);                   // search with a filter condition

BooleanQuery query = new BooleanQuery();
query.Add(query1, BooleanClause.Occur.MUST);
query.Add(query2, BooleanClause.Occur.MUST);

IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);

3. Filtering

Filter the search results with a Filter to get more accurate results within a smaller range.

For example, suppose we search for goods whose shelf date falls between 2005-10-1 and 2005-10-30.

DateTime values need to be converted before being added to the index, and the field must be indexed.

Index:

document.Add(new Field(FieldDate, DateField.DateToString(date), Field.Store.YES, Field.Index.UN_TOKENIZED));

...

Search:

Filter filter = new DateFilter(FieldDate, DateTime.Parse("2005-10-1"), DateTime.Parse("2005-10-30"));
Hits hits = searcher.Search(query, filter);

In addition to DateTime, you can also use integers, for example to search for items priced between 100 and 200.

Lucene.Net provides NumberTools for padding numbers; if you need floating-point numbers, refer to its source code.

Index:

document.Add(new Field(FieldNumber, NumberTools.LongToString((long)price), Field.Store.YES, Field.Index.UN_TOKENIZED));

...

Search:

Filter filter = new RangeFilter(FieldNumber, NumberTools.LongToString(100L), NumberTools.LongToString(200L), true, true);
Hits hits = searcher.Search(query, filter);
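The reason numbers go through NumberTools at all: index terms are compared lexicographically as strings, so numeric values must be encoded to a fixed width or range comparisons break ("100" sorts before "20"). A minimal sketch of the idea, using decimal zero-padding rather than NumberTools' actual encoding:

```java
public class PadDemo {
    // Pad a non-negative long to a fixed width so that lexicographic
    // (string) order matches numeric order, e.g. pad(20) sorts before pad(100).
    public static String pad(long n) {
        return String.format("%019d", n);
    }
}
```

NumberTools itself uses a more compact encoding, but the fixed-width principle is the same.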

You can also use a query as the filter condition:

QueryFilter filter = new QueryFilter(QueryParser.Parse("name2", FieldValue, analyzer));

We can also use FilteredQuery for multi-condition filtering.

Filter filter = new DateFilter(FieldDate, DateTime.Parse("2005-10-10"), DateTime.Parse("2005-10-15"));
Filter filter2 = new RangeFilter(FieldNumber, NumberTools.LongToString(11L), NumberTools.LongToString(13L), true, true);
Query query = QueryParser.Parse("name*", FieldName, analyzer);
query = new FilteredQuery(query, filter);
query = new FilteredQuery(query, filter2);

IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);

4. Distributed Search

We can use MultiReader or MultiSearcher to search multiple index libraries.

MultiReader reader = new MultiReader(new IndexReader[] { IndexReader.Open(@"C:\index"), IndexReader.Open(@"\\server\index") });
IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);

Or

IndexSearcher searcher1 = new IndexSearcher(reader1);
IndexSearcher searcher2 = new IndexSearcher(reader2);
MultiSearcher searcher = new MultiSearcher(new Searchable[] { searcher1, searcher2 });
Hits hits = searcher.Search(query);

You can also use ParallelMultiSearcher for multi-threaded parallel searches.

5. Display the search syntax string

We've combined a number of search conditions, and we might want to see what the resulting search syntax string looks like.

BooleanQuery query = new BooleanQuery();
query.Add(query1, BooleanClause.Occur.MUST);
query.Add(query2, BooleanClause.Occur.MUST);
...
Console.WriteLine("Syntax: {0}", query.ToString());

Output:

Syntax: +(name:name* value:name*) +number:[0000000000000000b TO 0000000000000000d]

Oh, it's so simple.

6. How to delete an index

Lucene provides two methods for removing a document from an index. One is

void deleteDocument(int docNum)

This method deletes by the document's number in the index. Every document added to the index gets a unique number, so deleting by number is an exact deletion. However, this number is part of the index's internal structure; we generally don't know what number a given document has, so the method is of limited use. The other is

void deleteDocuments(Term term)

This method actually performs a search based on the term argument, then deletes the search results in bulk. We can use it with a strict query condition to delete the specified documents.

An example is given below:

Directory dir = FSDirectory.getDirectory(PATH, false);
IndexReader reader = IndexReader.open(dir);
Term term = new Term(field, key);
reader.deleteDocuments(term);
reader.close();

There are also these overloads:

deleteDocuments(Term); deleteDocuments(Term[]); deleteDocuments(Query); deleteDocuments(Query[]);

7. How to update the index

Note: According to several replies, newer versions of Lucene provide a method for updating the index:

writer.updateDocument(term, doc);

———————————————————— JavaEye split line ————————————————————

Older versions of Lucene do not provide a dedicated index update method; we need to delete the corresponding document first, then add the new document to the index. For example:

Directory dir = FSDirectory.getDirectory(PATH, false);
IndexReader reader = IndexReader.open(dir);
Term term = new Term("title", "Lucene introduction");
reader.deleteDocuments(term);
reader.close();

IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
Document doc = new Document();
doc.add(new Field("title", "Lucene Introduction", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("content", "Lucene is funny", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();

8. Various searches

/**** Query one field with one keyword ****/

QueryParser qp = new QueryParser("content", analyzer);
query = qp.parse(keyword);
Hits hits = searcher.search(query);

/**** Fuzzy query ****/

Term term = new Term("content", keyword);
FuzzyQuery fq = new FuzzyQuery(term);
Hits hits = searcher.search(fq);

/**** One keyword, queried in two fields ****/

/*
 * 1. BooleanClause.Occur[] has three values: MUST (+, and), MUST_NOT (-, not), SHOULD (or)
 * 2. The query below means: "content" must contain the keyword; "title" doesn't matter
 * 3. The length of the occur[] array must equal the length of the fields[] array; each constraint corresponds to one field
 */

BooleanClause.Occur[] flags = new BooleanClause.Occur[] { BooleanClause.Occur.SHOULD, BooleanClause.Occur.MUST };
query = MultiFieldQueryParser.parse(keyword, new String[] { "title", "content" }, flags, analyzer);

/**** Two (or more) keywords across two (or more) fields, default matching rule ****/

/*
 * 1. The number of keywords must equal the number of fields
 * 2. Since no matching rules are given, the default is SHOULD, so the query below means:
 *    "title" contains keyword1 OR "content" contains keyword2
 *    (in this example keyword1 and keyword2 happen to be the same)
 */

query = MultiFieldQueryParser.parse(new String[] { keyword, keyword }, new String[] { "title", "content" }, analyzer);

/**** Two (or more) keywords across two (or more) fields, manually specified matching rules ****/

/*
 * 1. Number of keywords == number of field names == number of matching rules
 * 2. The query below means: "title" must NOT contain keyword1, and "content" must contain keyword2
 */

BooleanClause.Occur[] flags = new BooleanClause.Occur[] { BooleanClause.Occur.MUST_NOT, BooleanClause.Occur.MUST };
query = MultiFieldQueryParser.parse(new String[] { keyword, keyword }, new String[] { "title", "content" }, flags, analyzer);

/**** Query a date-type field ****/

/**** Query a range of numbers ****/

/*
 * 1. The two conditions must be on the same field
 * 2. The lower bound must be smaller than the upper bound, or no data will be found
 * 3. The third parameter of new RangeQuery indicates whether the bounds are inclusive; true: >= and <=, false: > and <
 * 4. Find 53 <= id <= 55 or 57 <= id <= 60:
 */

Term lowerTerm1 = new Term("id", "53");
Term upperTerm1 = new Term("id", "55");
RangeQuery rq1 = new RangeQuery(lowerTerm1, upperTerm1, true);

Term lowerTerm2 = new Term("id", "57");
Term upperTerm2 = new Term("id", "60");
RangeQuery rq2 = new RangeQuery(lowerTerm2, upperTerm2, true);

BooleanQuery bq = new BooleanQuery();
bq.add(rq1, BooleanClause.Occur.SHOULD);
bq.add(rq2, BooleanClause.Occur.SHOULD);

Hits hits = searcher.search(bq);
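One caveat the example above glosses over: a range query on a text field compares terms as strings, not numbers. The sketch below (my own helper, not a Lucene API) shows plain lexicographic range membership, and why unpadded ids can surprise you: "6" lands inside ["57", "60"].

```java
public class LexRangeDemo {
    // Lexicographic (string) range membership, the way term ranges
    // compare raw text terms: lo <= term <= hi as strings.
    public static boolean inRange(String term, String lo, String hi) {
        return term.compareTo(lo) >= 0 && term.compareTo(hi) <= 0;
    }
}
```

So if ids of mixed widths are indexed, pad them to a fixed width first, as NumberTools does for longs.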

9. Sorting results

There are two key points for sorting:

1. The field you want to sort on must be indexed and UN_TOKENIZED. For example:

doc.add(new Field("click", dv.get("click").toString(), Field.Store.NO, Field.Index.UN_TOKENIZED));

2. At search time, for example:

/***** Sorting *****/

/*
 * 1. The sort field must be indexed and must not be Field.Index.TOKENIZED at index time
 *    (UN_TOKENIZED works correctly; with NO the query works normally, but ascending/descending sorting does not behave correctly)
 * 2. SortField types: SCORE, DOC, AUTO, STRING, INT, FLOAT, CUSTOM; choose mainly by the field's type
 * 3. The third parameter of SortField indicates whether to sort in descending order; true: descending, false: ascending
 */

Sort sort = new Sort(new SortField[] { new SortField("click", SortField.INT, true) });
Hits hits = searcher.search(query, sort);

/*
 * Sort by date
 */

Sort sort = new Sort(new SortField[] { new SortField("createTime", SortField.INT, false) });

/***** Filtering *****/

QueryParser qp1 = new QueryParser("content", analyzer);
Query fquery = qp1.parse("i");
BooleanQuery bqf = new BooleanQuery();
bqf.add(fquery, BooleanClause.Occur.SHOULD);
QueryFilter qf = new QueryFilter(bqf);
Hits hits = searcher.search(query, qf);

10. Merging small index files into a large index file (this method performs poorly)

/**
 * Merge a small index into a large index file.
 * @param from the index directory to be merged into `to`
 * @param to the index directory that receives `from`
 * @param analyzer the analyzer to use
 */
private void mergeIndex(File from, File to, Analyzer analyzer)
{
    IndexWriter indexWriter = null;
    try {
        System.out.println("Merging index files...");
        indexWriter = new IndexWriter(to, analyzer, false);
        indexWriter.setMergeFactor(100000);
        indexWriter.setMaxFieldLength(Integer.MAX_VALUE);
        indexWriter.setMaxBufferedDocs(Integer.MAX_VALUE);
        indexWriter.setMaxMergeDocs(Integer.MAX_VALUE);
        FSDirectory[] fs = { FSDirectory.getDirectory(from, false) };
        indexWriter.addIndexes(fs);
        indexWriter.optimize();
        indexWriter.close();
        System.out.println("Merge completed.");
    }
    catch (Exception e)
    {
        Utility.writeLog("Error merging index files! mergeIndex(): " + e.getMessage(), "");
    }
    finally
    {
        try {
            if (indexWriter != null)
                indexWriter.close();
        }
        catch (Exception ignored) {
        }
    }
}

The merge starts at 3 o'clock every morning and does not finish until about 9, a run of five hours or more; the large index file is about 4 GB and the small index about 10 MB.

11. Question 2: the principle of locally counting word co-occurrence frequency

Answer:

The theoretical basis of high-frequency string statistics is the n-gram model.

Let W = w1 w2 ... wn be a string of length n. Its likelihood is

    P(W) = ∏ P(wi | w1 w2 ... w(i-1))    (1)

This formula reflects the degree of association between successive words. If several different histories w1 w2 ... w(i-1) share the same last n-1 words, they are all treated as one class. Under this assumption, the probability of each word no longer depends on the whole history, only on the nearest n-1 words, and the prior probability of the string becomes

    P(W) = ∏ P(wi | w(i-(n-1)) w(i-(n-2)) ... w(i-1))    (2)

When P(W) exceeds a threshold, the binding between these n characters is strong, and we can regard the string as a "word".

Following this principle, preprocessing counts each word's occurrences in the segmented text and records its positions (as shown in the attachment legend). After preprocessing, we traverse the single-word frequency list: for every word occurring more than twice, at each of its positions i in the text we check whether the word at position i+1 also occurs more than twice; if so, we check position i+2, and so on, until the word at some position i+n+1 occurs fewer than 2 times. This yields a candidate phrase W(i, i+1, ..., i+n), which is added to the candidate vocabulary set. Finally, the prefixes and suffixes of the candidate set are processed to obtain the final set of high-frequency phrases.
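The traversal described above can be sketched in a few lines. This is my own simplified illustration (the PhraseFinder class is hypothetical, and it omits the per-word position lists and the prefix/suffix post-processing): from each position whose word is frequent enough, it extends rightward while the following word is also frequent.

```java
import java.util.*;

public class PhraseFinder {
    // Simplified sketch of the candidate-phrase extraction described above:
    // from each position whose word occurs at least minFreq times, extend
    // rightward while the following word also occurs at least minFreq times.
    public static List<String> candidates(String[] words, int minFreq) {
        // Pass 1: single-word frequency counts.
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) freq.merge(w, 1, Integer::sum);

        // Pass 2: extend runs of frequent words into candidate phrases.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length - 1; i++) {
            if (freq.get(words[i]) < minFreq) continue;
            int j = i;
            while (j + 1 < words.length && freq.get(words[j + 1]) >= minFreq) j++;
            if (j > i) {
                String phrase = String.join(" ", Arrays.copyOfRange(words, i, j + 1));
                if (!result.contains(phrase)) result.add(phrase);
            }
        }
        return result;
    }
}
```

A real implementation would also trim candidates that are merely prefixes or suffixes of longer frequent phrases, as the text notes.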

12. Index Merging

writer.addIndexes(indexDirs);
