Lucene sorting, setting weights, optimizing, distributed search (repost)

Source: Internet
Author: User

1. Basic application

using System;
using System.Collections.Generic;
using System.Text;
using Lucene.Net;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

namespace ConsoleApplication1.Lucene
{
    public class LuceneTest
    {
        private const string FieldName = "name";
        private const string FieldValue = "value";

        // In-memory index and the analyzer shared by indexing and searching.
        private Directory directory = new RAMDirectory();
        private Analyzer analyzer = new StandardAnalyzer();

        public LuceneTest()
        {
        }

        private void Index()
        {
            // create = true: build a new index, replacing any existing one.
            IndexWriter writer = new IndexWriter(directory, analyzer, true);
            writer.maxFieldLength = 1000;

            // The upper bound is missing in the source; 10 is an assumed placeholder.
            for (int i = 1; i <= 10; i++)
            {
                Document document = new Document();

                document.Add(new Field(FieldName, "name" + i, Field.Store.YES, Field.Index.UN_TOKENIZED));
                document.Add(new Field(FieldValue, "Hello, world!", Field.Store.YES, Field.Index.TOKENIZED));

                writer.AddDocument(document);
            }

            writer.Optimize();
            writer.Close();
        }

        private void Search()
        {
            Query query = QueryParser.Parse("name*", FieldName, analyzer);

            IndexSearcher searcher = new IndexSearcher(directory);

            Hits hits = searcher.Search(query);

            Console.WriteLine("Matching records: {0}; total records in the index: {1}", hits.Length(), searcher.Reader.NumDocs());
            for (int i = 0; i < hits.Length(); i++)
            {
                int docId = hits.Id(i);
                string name = hits.Doc(i).Get(FieldName);
                string value = hits.Doc(i).Get(FieldValue);
                float score = hits.Score(i);

                Console.WriteLine("{0}: DocId:{1}; Name:{2}; Value:{3}; Score:{4}",
                    i + 1, docId, name, value, score);
            }

            searcher.Close();
        }
    }
}


In addition to RAMDirectory, you can also use FSDirectory. (Note the create parameter of FSDirectory.GetDirectory: when it is true, the existing index files are deleted. You can check whether an index already exists with the IndexReader.IndexExists() method.)

Open an existing index from the specified directory.

private Directory directory = FSDirectory.GetDirectory(@"C:\index", false);


Load the index into memory to increase search speed.

private Directory directory = new RAMDirectory(FSDirectory.GetDirectory(@"C:\index", false));
Or
private Directory directory = new RAMDirectory(@"C:\index");


2. Multi-field Search

You can use MultiFieldQueryParser to specify multiple search fields.

Query query = MultiFieldQueryParser.Parse("name*", new string[] { FieldName, FieldValue }, analyzer);

IndexReader reader = IndexReader.Open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);


3. Multi-Criteria Search

In addition to using QueryParser.Parse to parse complex search syntax, you can combine multiple Query objects to achieve the same goal.

Query query1 = new TermQuery(new Term(FieldValue, "name1")); // Term search
Query query2 = new WildcardQuery(new Term(FieldName, "name*")); // Wildcard search
Query query3 = new PrefixQuery(new Term(FieldName, "name1")); // Prefix search (field:keyword); a trailing * is added automatically
Query query4 = new RangeQuery(new Term(FieldNumber, NumberTools.LongToString(11L)), new Term(FieldNumber, NumberTools.LongToString(13L)), true); // Range search
Query query5 = new FilteredQuery(query, filter); // Search with a filter condition

BooleanQuery query = new BooleanQuery();
query.Add(query1, BooleanClause.Occur.MUST);
query.Add(query2, BooleanClause.Occur.MUST);

IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);
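
To make the effect of combining clauses concrete, here is a minimal sketch that works on plain sets of document ids and ignores scoring. It does not use Lucene.Net: the Occur enum below is a local stand-in for BooleanClause.Occur, and BooleanMatchSketch is a hypothetical name used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for BooleanClause.Occur (an assumption for this sketch).
public enum Occur { Must, Should, MustNot }

public static class BooleanMatchSketch
{
    // Computes which document ids match a list of clauses, ignoring scores.
    // Each clause pairs an Occur value with the ids its sub-query matched.
    public static SortedSet<int> Match(IList<KeyValuePair<Occur, int[]>> clauses)
    {
        SortedSet<int> required = null;                 // intersection of all Must sets
        SortedSet<int> optional = new SortedSet<int>(); // union of all Should sets
        SortedSet<int> excluded = new SortedSet<int>(); // union of all MustNot sets

        foreach (KeyValuePair<Occur, int[]> clause in clauses)
        {
            switch (clause.Key)
            {
                case Occur.Must:
                    if (required == null) required = new SortedSet<int>(clause.Value);
                    else required.IntersectWith(clause.Value);
                    break;
                case Occur.Should:
                    optional.UnionWith(clause.Value);
                    break;
                case Occur.MustNot:
                    excluded.UnionWith(clause.Value);
                    break;
            }
        }

        // With at least one Must clause, Should clauses no longer decide
        // membership; without any, a document needs at least one Should match.
        SortedSet<int> result = required ?? optional;
        result.ExceptWith(excluded);
        return result;
    }
}
```

With two Must clauses, only the documents matched by both sub-queries remain, which is what the BooleanQuery above asks for.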


4. Set weights

You can add a weight (boost) to a Document or a Field so that it ranks higher in the search results. By default, search results are sorted by document score, and the higher the score, the higher the rank. The default boost value is 1.

Score = score * Boost


With the above formula, we can set different weights to influence the rankings.

The following example sets different weights according to the VIP level.

Document document = new Document();
switch (vip)
{
    case VIP.Gold: document.SetBoost(2F); break;
    case VIP.Silver: document.SetBoost(1.5F); break;
}


As long as the boost is large enough, a given hit can always be ranked first. This is the "paid ranking" business of Baidu and other sites. It is obviously unfair, and I look down on it.
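
The effect of the formula on the ranking can be tried out without an index. The following is only a sketch under the simplified formula given above (final score = raw score * boost); ScoredDoc and BoostSketch are hypothetical names, and the real scores Lucene computes involve more factors than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a hit: a raw relevance score and a boost.
public class ScoredDoc
{
    public string Name;
    public float RawScore;
    public float Boost = 1f;   // same default as the boost described above

    public float FinalScore { get { return RawScore * Boost; } }
}

public static class BoostSketch
{
    // Returns document names ordered by boosted score, best first.
    public static List<string> Rank(IEnumerable<ScoredDoc> docs)
    {
        return docs.OrderByDescending(d => d.FinalScore)
                   .Select(d => d.Name)
                   .ToList();
    }
}
```

For example, a document with raw score 0.5 and boost 2 ends up ahead of an unboosted document with raw score 0.8, because 0.5 * 2 = 1.0 is the larger final score.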

5. Sorting

With the constructor parameters of SortField, we can set the sort field, the sort type, and whether to sort in reverse order.

Sort sort = new Sort(new SortField(FieldName, SortField.DOC, false));

IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query, sort);


Sorting has a considerable impact on search speed, so try not to use more than one sort condition.

6. Filtering

Use a Filter to restrict the search results and get more accurate results within a smaller range.

For example, suppose we search for products listed between 2005-10-1 and 2005-10-30.
A DateTime value must be converted before it is added to the index, and the field must be an indexed field.

// Index
document.Add(new Field(FieldDate, DateField.DateToString(date), Field.Store.YES, Field.Index.UN_TOKENIZED));

//...

// Search
Filter filter = new DateFilter(FieldDate, DateTime.Parse("2005-10-1"), DateTime.Parse("2005-10-30"));
Hits hits = searcher.Search(query, filter);


In addition to DateTime, you can also use integers. For example, search for products priced between 100 and 200.
Lucene.Net's NumberTools pads numbers to a fixed width so that they compare correctly as strings. If you need floating-point numbers, you can refer to its source code.

// Index
document.Add(new Field(FieldNumber, NumberTools.LongToString((long)price), Field.Store.YES, Field.Index.UN_TOKENIZED));

//...

// Search
Filter filter = new RangeFilter(FieldNumber, NumberTools.LongToString(100L), NumberTools.LongToString(200L), true, true);
Hits hits = searcher.Search(query, filter);
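
Why the padding matters: index terms are compared as strings, so unpadded numbers sort in the wrong order. The sketch below illustrates the idea with plain decimal zero-padding. This is an assumption made for illustration only; it does not reproduce the actual output of NumberTools, and PaddedNumberSketch is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class PaddedNumberSketch
{
    // Fixed width chosen for this sketch (an assumption, not a Lucene constant).
    public const int Width = 14;

    // Zero-pads a non-negative number so that ordinal string order
    // is the same as numeric order.
    public static string Encode(long value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException("value", "negative values are not handled in this sketch");
        return value.ToString("D" + Width, CultureInfo.InvariantCulture);
    }

    // Inclusive range test on strings, the way a range over terms compares them.
    public static bool InRange(string term, string lower, string upper)
    {
        return string.CompareOrdinal(term, lower) >= 0
            && string.CompareOrdinal(term, upper) <= 0;
    }
}
```

Without padding, the string "1000" falls between "100" and "200", so a price of 1000 would wrongly match the range 100 to 200. With Encode applied to all three values, it no longer does.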


Use a Query as the filter condition.

QueryFilter filter = new QueryFilter(QueryParser.Parse("name2", FieldValue, analyzer));


We can also use FilteredQuery for multi-condition filtering.

Filter filter = new DateFilter(FieldDate, DateTime.Parse("2005-10-10"), DateTime.Parse("2005-10-15"));
Filter filter2 = new RangeFilter(FieldNumber, NumberTools.LongToString(11L), NumberTools.LongToString(13L), true, true);

Query query = QueryParser.Parse("name*", FieldName, analyzer);
query = new FilteredQuery(query, filter);
query = new FilteredQuery(query, filter2);

IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);
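
Conceptually, chaining filters keeps only the hits that every filter allows, while the scores still come from the query alone. Here is a rough sketch of that idea, assuming each filter can be represented as one bit per document. It uses the .NET BitArray class rather than any Lucene.Net type, and FilterChainSketch is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class FilterChainSketch
{
    // Keeps only the hits whose document id is allowed by every filter.
    // hits maps docId to score; each filter holds one bit per document.
    public static Dictionary<int, float> Apply(IDictionary<int, float> hits, params BitArray[] filters)
    {
        var kept = new Dictionary<int, float>();
        foreach (KeyValuePair<int, float> hit in hits)
        {
            bool allowed = true;
            foreach (BitArray filter in filters)
            {
                if (!filter[hit.Key]) { allowed = false; break; }
            }
            // The score is passed through unchanged: filters only remove hits.
            if (allowed) kept[hit.Key] = hit.Value;
        }
        return kept;
    }
}
```

The order in which the filters are applied does not change the result, only how early a rejected hit is dropped.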


7. Distributed search

We can search multiple indexes using MultiReader or MultiSearcher.

MultiReader reader = new MultiReader(new IndexReader[] { IndexReader.Open(@"C:\index"), IndexReader.Open(@"\\server\index") });
IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);


Or

IndexSearcher searcher1 = new IndexSearcher(reader1);
IndexSearcher searcher2 = new IndexSearcher(reader2);
MultiSearcher searcher = new MultiSearcher(new Searchable[] { searcher1, searcher2 });
Hits hits = searcher.Search(query);


You can also use ParallelMultiSearcher for multi-threaded parallel searches.
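
The general idea behind searching several indexes at once can be sketched as follows: the hits of each index are collected, local document ids are shifted by an offset so that they stay unique, and the combined list is ordered by score. This is a simplified model under those assumptions, not the Lucene.Net implementation; SubIndexHits and MultiSearchSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for the hits of one index.
public class SubIndexHits
{
    public int MaxDoc;                           // number of document slots in this index
    public List<KeyValuePair<int, float>> Hits;  // local doc id and score
}

public static class MultiSearchSketch
{
    // Merges the hits of several indexes into one list, best score first.
    public static List<KeyValuePair<int, float>> Merge(IList<SubIndexHits> indexes)
    {
        var merged = new List<KeyValuePair<int, float>>();
        int offset = 0;
        foreach (SubIndexHits index in indexes)
        {
            // Shift local ids so that ids from different indexes never collide.
            foreach (KeyValuePair<int, float> hit in index.Hits)
                merged.Add(new KeyValuePair<int, float>(offset + hit.Key, hit.Value));
            offset += index.MaxDoc;
        }
        // OrderByDescending is a stable sort, so equal scores keep index order.
        return merged.OrderByDescending(h => h.Value).ToList();
    }
}
```

A parallel variant would run the per-index searches on separate threads and merge afterwards; the merge step itself stays the same.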

8. Merging index libraries

Merge directory1 into directory2.

Directory directory1 = FSDirectory.GetDirectory("Index1", false);
Directory directory2 = FSDirectory.GetDirectory("Index2", false);

IndexWriter writer = new IndexWriter(directory2, analyzer, false);
writer.AddIndexes(new Directory[] { directory1 });
Console.WriteLine(writer.DocCount());
writer.Close();


9. Display the search syntax string

After combining a number of search conditions, we may want to see what the resulting search syntax string looks like.

BooleanQuery query = new BooleanQuery();
query.Add(query1, true, false);
query.Add(query2, true, false);
//...

Console.WriteLine("Syntax: {0}", query.ToString());


Output:
Syntax: +(name:name* value:name*) +number:[0000000000000000b TO 0000000000000000d]

It's that simple.

10. Manipulating the Index Library

Delete (this is a soft delete: documents are only marked as deleted. They are physically removed once IndexWriter.Optimize() is called.)

IndexReader reader = IndexReader.Open(directory);

// Delete the Document with the specified number (DocId).
reader.Delete(123);

// Delete the Documents containing the specified term.
reader.Delete(new Term(FieldValue, "Hello"));

// Undo the soft deletes.
reader.UndeleteAll();

reader.Close();
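
The difference between marking a document as deleted and physically removing it can be modeled in a few lines. The following is a toy model of the behavior described above, not Lucene.Net code; SoftDeleteSketch and its Compact method (standing in for the effect of optimizing) are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class SoftDeleteSketch
{
    private readonly List<string> docs = new List<string>();
    private readonly List<bool> deleted = new List<bool>();

    public void Add(string doc) { docs.Add(doc); deleted.Add(false); }

    // Marks a document as deleted; the slot and its id stay in place.
    public void Delete(int docId) { deleted[docId] = true; }

    // Clears every delete mark. Documents already removed by Compact are gone.
    public void UndeleteAll()
    {
        for (int i = 0; i < deleted.Count; i++) deleted[i] = false;
    }

    // Physically drops the marked documents; the survivors are renumbered.
    public void Compact()
    {
        var keptDocs = new List<string>();
        for (int i = 0; i < docs.Count; i++)
            if (!deleted[i]) keptDocs.Add(docs[i]);

        docs.Clear();
        docs.AddRange(keptDocs);
        deleted.Clear();
        for (int i = 0; i < docs.Count; i++) deleted.Add(false);
    }

    // Number of slots, including documents that are only marked as deleted.
    public int MaxDoc { get { return docs.Count; } }

    // Number of documents that are not marked as deleted.
    public int NumDocs
    {
        get
        {
            int live = 0;
            foreach (bool isDeleted in deleted) if (!isDeleted) live++;
            return live;
        }
    }
}
```

In this model, UndeleteAll can restore a document only while it is still marked; after Compact the document no longer exists and the remaining ids have shifted.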


Incremental update (just set the create parameter to false to add new data to an existing index.)

Directory directory = FSDirectory.GetDirectory("index", false);
IndexWriter writer = new IndexWriter(directory, analyzer, false);
writer.AddDocument(doc1);
writer.AddDocument(doc2);
writer.Optimize();
writer.Close();


11. Optimization

When you add documents in bulk to an index in an FSDirectory, increasing the merge factor (mergeFactor) and the minimum number of merged documents (minMergeDocs) helps improve performance and reduce indexing time.

IndexWriter writer = new IndexWriter(directory, analyzer, true);

writer.maxFieldLength = 1000; // Maximum field length
writer.mergeFactor = 1000;
writer.minMergeDocs = 1000;

for (int i = 0; i < 10000; i++)
{
    // Add documents ...
}

writer.Optimize();
writer.Close();


Related parameter description


Reposted from "An in-depth look at the Lucene indexing mechanism"

With Lucene, you can make full use of the machine's hardware resources to improve indexing efficiency when building an index. When you need to index a large number of files, you will notice that the bottleneck of the indexing process is writing the index files to disk. To address this, Lucene keeps a buffer in memory. But how do we control this buffer? Fortunately, Lucene's IndexWriter class provides three parameters for adjusting the size of the buffer and how often index files are written to disk.

1. Merge factor (mergeFactor)

This parameter determines how many documents can be stored in one index block (segment) and how often the index blocks on disk are merged into a larger one. For example, if the merge factor is 10, the documents in memory are written to a new index block on disk once their number reaches 10. Likewise, once the number of index blocks on disk reaches 10, these 10 blocks are merged into a new index block. The default value of this parameter is 10, which is far too small when a very large number of documents must be indexed. For batch indexing, assigning a larger value to this parameter gives better indexing performance.

2. Minimum number of merged documents (minMergeDocs)

This parameter also affects indexing performance. It determines the minimum number of documents that must accumulate in memory before they are written to disk. The default value is 10. If you have enough memory, setting this value as large as possible will significantly improve indexing performance.

3. Maximum number of merged documents (maxMergeDocs)

This parameter determines the maximum number of documents in one index block. Its default value is Integer.MAX_VALUE. Setting this parameter to a larger value can improve indexing efficiency and retrieval speed, but because the default is already the maximum integer value, we generally do not need to change it.
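
To get a feeling for how these three parameters interact, the description above can be turned into a small simulation that only counts documents. This is a simplified model built on assumptions (segments are merged only when the last mergeFactor segments have the same size), not the actual merge logic of IndexWriter, and MergeSimulation is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public class MergeSimulation
{
    private readonly int mergeFactor;
    private readonly int minMergeDocs;
    private readonly int maxMergeDocs;
    private int buffered;                                  // documents still in memory

    public readonly List<int> Segments = new List<int>();  // sizes of the on-disk index blocks
    public int DiskWrites;                                 // flushes plus merges

    public MergeSimulation(int mergeFactor, int minMergeDocs, int maxMergeDocs)
    {
        if (mergeFactor < 2) throw new ArgumentOutOfRangeException("mergeFactor");
        if (minMergeDocs < 1) throw new ArgumentOutOfRangeException("minMergeDocs");
        this.mergeFactor = mergeFactor;
        this.minMergeDocs = minMergeDocs;
        this.maxMergeDocs = maxMergeDocs;
    }

    public void AddDocument()
    {
        buffered++;
        if (buffered >= minMergeDocs) Flush();
    }

    // Writes the buffered documents to disk as a new index block.
    public void Flush()
    {
        if (buffered == 0) return;
        Segments.Add(buffered);
        buffered = 0;
        DiskWrites++;
        MergeTail();
    }

    // Merges the last mergeFactor blocks while they all have the same size
    // and the merged block would not exceed maxMergeDocs.
    private void MergeTail()
    {
        while (Segments.Count >= mergeFactor)
        {
            int start = Segments.Count - mergeFactor;
            long total = 0;
            bool sameSize = true;
            for (int i = start; i < Segments.Count; i++)
            {
                if (Segments[i] != Segments[start]) sameSize = false;
                total += Segments[i];
            }
            if (!sameSize || total > maxMergeDocs) return;

            Segments.RemoveRange(start, mergeFactor);
            Segments.Add((int)total);
            DiskWrites++;
        }
    }
}
```

In this model, adding 1000 documents with mergeFactor = 10 and minMergeDocs = 10 causes 111 disk writes (100 flushes, 10 merges into blocks of 100, and 1 final merge), whereas with both values set to 1000 a single flush is enough. This is the reason larger values help when indexing in bulk, at the cost of more memory.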
