Lucene / Lucene.NET detailed usage and optimization [repost]


1 About Lucene
1.1 What is Lucene
Lucene is a full-text search framework, not a finished application. That means it is not an out-of-the-box product like www.baidu.com or Google Desktop; it only provides the building blocks that let you implement such products yourself.

1.2 What Lucene can do
To answer this, you first have to understand what Lucene really is. At its core, Lucene does one very simple thing: you hand it a number of strings, and it gives you a full-text search service that tells you where the keywords you search for appear. Once you understand that, you can imagine doing anything that fits this model. You can index the news articles on your site and build a news portal; you can index several fields of a database table so you no longer have to worry about locking the table because of "%like%" queries; you can even write your own search engine...

1.3 Should you choose Lucene
Below are some test results; if you find them acceptable, then you can choose Lucene.
Test one: 2.5 million records, about 300 MB of text, producing an index of roughly 380 MB; average processing time 300 ms under 800 threads.
Test two: 37,000 records, indexing two varchar fields of a database table, index file 2.6 MB; average processing time 1.5 ms under 800 threads.

2 How Lucene works
The service Lucene provides really has two parts: putting data in and getting data out. Writing in means taking the source you supply (essentially strings) and adding it to, or deleting it from, the index; reading out means providing users with a full-text search service so they can locate a source by keyword.

2.1 Write Process
The source string is first processed by the analyzer, which tokenizes it into words and removes stop words (optional).
The needed information from the source is added to the fields of a document; fields that need to be searchable are indexed, and fields that need to be retrievable are stored.
The index is written to storage, which can be either memory or disk.

2.2 Read out Process
The user supplies search keywords, which are processed by the analyzer.
The processed keywords are used to search the index and find the matching documents.
The user then extracts the needed fields from the documents that were found.

3 Some concepts to be aware of
Lucene uses a number of concepts; understanding what they mean is helpful for the explanations that follow.

3.1 Analyzer
The analyzer is responsible for splitting a string into words according to some rule and removing the words that carry no information, such as "of" and "the" in English or "的" and "地" in Chinese. These words appear everywhere in an article but do not carry any key information; removing them helps shrink the index file, improves efficiency, and increases the hit ratio.
Tokenization rules vary endlessly, but the goal is always the same: split by meaning. This is easy for English, where words are already separated by spaces, whereas a Chinese sentence must be cut into words by some other means. Specific segmentation methods are discussed in detail later; here you only need to understand the concept of the analyzer.
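As a rough illustration, the sketch below prints the tokens an analyzer produces; it assumes the older TokenStream API from the Lucene line this article targets (newer versions expose tokens through attributes instead):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

Analyzer analyzer = new StandardAnalyzer();
TokenStream stream = analyzer.tokenStream("content", new StringReader("The quick brown fox"));
Token token;
while ((token = stream.next()) != null) {
    // prints each token with its offsets, e.g. "quick [4,9]" ("the" is dropped as a stop word)
    System.out.println(token.termText() + " [" + token.startOffset() + "," + token.endOffset() + "]");
}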

3.2 Document
A user-supplied source is a record: it can be a text file, a string, a row of a database table, and so on. Once a record has been indexed, it is stored in the index file as a document. When the user searches, the results are likewise returned as a list of documents.

3.3 Field
A document can contain several fields of information. For example, an article can contain fields such as title, body, and last modified time, and these are stored in the document as separate fields.
A field has two properties you can choose: stored and indexed. The stored property controls whether the field's value is kept in the index; the indexed property controls whether the field is searchable. This may sound obvious, but the right combination of the two properties actually matters, as the following example shows:
Continuing with the article example: we want to search on the title and body, so both should have the indexed property set to true. We also want to show the title directly in the search results, so the title field's stored property is set to true. The body field, however, is too large; to keep the index file small we set its stored property to false and read the original file directly when the content is needed. The last modified time only needs to be shown in the results, not searched, so its stored property is true and its indexed property is false. These three fields cover three of the four combinations of the two properties; the fourth is missing, and in fact Lucene does not let you use it, because a field that is neither stored nor indexed is meaningless.
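As a minimal sketch of the combination just described, using the Field.Store / Field.Index constants from the Lucene version this article is written against (the variables titleText, bodyText, and modifiedTime are placeholders):

Document doc = new Document();
// title: stored and indexed, so it can be searched and shown in the result list
doc.add(new Field("title", titleText, Field.Store.YES, Field.Index.TOKENIZED));
// body: indexed but not stored, to keep the index small; read the original file when needed
doc.add(new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED));
// last modified time: stored but not indexed, only shown in results, never searched
doc.add(new Field("lastModified", modifiedTime, Field.Store.YES, Field.Index.NO));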

3.4 Term
A term is the smallest unit of search. It represents a word in a document and consists of two parts: the text of the word and the field in which it appears.

3.5 Token
A token is one occurrence of a term. It contains the term's text, its start and end offsets, and a type string. The same word can appear many times in a text; all occurrences share the same term, but each one is a different token marking the place where the word appears.

3.6 Segment
When documents are added to the index, they are not all appended to the same index file at once; they are first written to different small files and later merged into one large index file. Each of these small files is a segment.

4 Structure of Lucene
Lucene consists of core and sandbox. Core is the stable heart of Lucene; the sandbox contains extra features such as the highlighter and various analyzers.
Lucene core has seven packages: analysis, document, index, queryParser, search, store, and util.
4.1 Analysis
analysis contains the built-in analyzers, such as WhitespaceAnalyzer, which splits on whitespace, StopAnalyzer, which additionally filters stop words, and the most commonly used StandardAnalyzer.
4.2 Document
document contains the data structures for documents; for example, the Document class defines the structure in which a document is stored, and the Field class defines one field of a document.
4.3 index
index contains the classes that read and write indexes, such as IndexWriter, which writes, merges, and optimizes the segments of the index file, and IndexReader, which reads from and deletes from the index. Do not be misled by the name IndexReader into thinking it is only the reading side: deleting from the index is also done by IndexReader. IndexWriter only cares about how to write segments and how to merge and optimize them; IndexReader is concerned with how each document is organized in the index file.
4.4 Queryparser
queryParser contains the classes that parse query strings. Lucene's query strings are somewhat like SQL statements: they have reserved words and a grammar from which all kinds of queries can be composed. Lucene has many query classes, all inheriting from Query, each executing a particular kind of query; QueryParser's job is to parse a query string and call the appropriate query classes to find the results.
4.5 Search
search contains the classes that fetch results from the index, including the query classes just mentioned, such as TermQuery and BooleanQuery.
4.6 Store
store contains the index storage classes, such as Directory, which defines the storage structure of the index, FSDirectory for indexes stored on the file system, RAMDirectory for indexes stored in memory, and MMapDirectory for indexes accessed via memory mapping.
4.7 Util
util contains common utility classes, such as converters between time values and strings.
5 How to build an index
5.1 Simplest piece of code to complete the index

IndexWriter writer = new IndexWriter("/data/index/", new StandardAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("title", "lucene introduction", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("content", "lucene works well", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();

Let's examine this code.
First we create a writer, specifying "/data/index" as the index directory and StandardAnalyzer as the analyzer; the third parameter means that if index files already exist in that directory, we overwrite them.
Then we create a new document.
We add a field named "title" to the document with the content "lucene introduction"; it is stored and indexed.
We add another field named "content" with the content "lucene works well", also stored and indexed.
Then we add the document to the index. If there are more documents, repeat the steps above: create a document and add it.
After adding all the documents we optimize the index. Optimization mainly merges multiple segments into one, which helps speed up subsequent searches.
Finally, it is important to close the writer.

Yes, it's so easy to create an index!
Of course, you can modify the code above to tailor it to your own needs.

5.2 Write index directly in memory
You first create a RAMDirectory and pass it to the writer, as in the following code:

Directory dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("title", "lucene introduction", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("content", "lucene works well", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();

5.3 Index text file
If you want to index plain text files without first reading them into a string yourself, you can create the field with the following code:

Field field = new Field("content", new FileReader(file));

Here file is the text file. This constructor actually reads the file's contents and indexes them, but does not store them.
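Putting this together, here is a minimal sketch of indexing every .txt file in a directory; the paths and the extra "path" field are purely illustrative:

import java.io.File;
import java.io.FileReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter("/data/index/", new StandardAnalyzer(), true);
File[] files = new File("/data/docs").listFiles();
for (int i = 0; i < files.length; i++) {
    if (!files[i].getName().endsWith(".txt")) continue;
    Document doc = new Document();
    // store the file path so results can point back to the original file
    doc.add(new Field("path", files[i].getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED));
    // index the content without storing it, as described above
    doc.add(new Field("content", new FileReader(files[i])));
    writer.addDocument(doc);
}
writer.optimize();
writer.close();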

6 How to maintain indexes
Index maintenance operations are provided by the IndexReader class.

6.1 How to delete an index
Lucene provides two ways to remove documents from the index. The first is

void deleteDocument(int docNum)

This method deletes by the document's number in the index. Every document added to the index gets a unique number, so deleting by number is an exact deletion; but the number belongs to the index's internal structure, and we usually do not know which number a given document has, so this method is not very useful. The other is

void deleteDocuments(Term term)

This method actually performs a search using the given term and then deletes all of the matching documents in one batch. We can supply a sufficiently strict query condition to delete exactly the documents we intend.
An example is given below:

Directory dir = FSDirectory.getDirectory(PATH, false);
IndexReader reader = IndexReader.open(dir);
Term term = new Term(field, key);
reader.deleteDocuments(term);
reader.close();

6.2 How to update an index
Lucene does not provide a dedicated update method; we have to delete the old document from the index first and then add the new one. For example:

Directory dir = FSDirectory.getDirectory(PATH, false);
IndexReader reader = IndexReader.open(dir);
Term term = new Term("title", "lucene introduction");
reader.deleteDocuments(term);
reader.close();

IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false); // false: append to the existing index instead of recreating it
Document doc = new Document();
doc.add(new Field("title", "lucene introduction", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("content", "lucene is funny", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();

7 How to search
Lucene's search is quite powerful. It provides many helper query classes, each inheriting from Query and each performing one particular kind of query; you can combine them like building blocks to carry out complex operations. Lucene also provides the Sort class to sort results and the Filter class to restrict the query conditions. You may find yourself comparing it to SQL: "Can Lucene do and, or, order by, where, like '%xx%'?" The answer is: "Of course, no problem!"

7.1 All kinds of query
Let's look at what kinds of query Lucene allows us to perform:

7.1.1 Termquery
First, the most basic query. If you want to execute a query such as "documents that contain 'lucene' in the content field", you can use TermQuery:

Term t = new Term("content", "lucene");
Query query = new TermQuery(t);

7.1.2 Booleanquery
If you want to query "documents that contain 'java' or 'perl' in the content field", you can create two TermQuerys and combine them with a BooleanQuery:

TermQuery termQuery1 = new TermQuery(new Term("content", "java"));
TermQuery termQuery2 = new TermQuery(new Term("content", "perl"));
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(termQuery1, BooleanClause.Occur.SHOULD);
booleanQuery.add(termQuery2, BooleanClause.Occur.SHOULD);

7.1.3 Wildcardquery
If you want to run a wildcard query on a word, you can use WildcardQuery. The wildcard '?' matches exactly one arbitrary character and '*' matches zero or more arbitrary characters. For example, searching 'use*' may find 'useful' or 'useless':

Query query = new WildcardQuery(new Term("content", "use*"));

7.1.4 Phrasequery
Suppose you are interested in Sino-Japanese relations and want to find articles in which '中' (China) and '日' (Japan) appear close together (within 5 terms of each other), ignoring anything farther apart. You can use PhraseQuery:

PhraseQuery query = new PhraseQuery();
query.setSlop(5);
query.add(new Term("content", "中"));
query.add(new Term("content", "日"));

This will match text such as "中日合作……" ("Sino-Japanese cooperation...") or "中方和日方……" ("the Chinese side and the Japanese side..."), but not a sentence in which '中' and '日' are more than five terms apart.

7.1.5 Prefixquery
If you want to search for terms beginning with '中', you can use PrefixQuery:

PrefixQuery query = new PrefixQuery(new Term("content", "中"));

7.1.6 Fuzzyquery
FuzzyQuery searches for similar terms using the Levenshtein (edit distance) algorithm. Suppose you want terms similar to 'wuzza'; you can write:

Query query = new FuzzyQuery(new Term("content", "wuzza"));

You may get back 'fuzzy' and 'wuzzy'.

7.1.7 Rangequery
Another commonly used query is RangeQuery. For instance, you might want documents whose time field lies between 20060101 and 20060130; you can use RangeQuery:

RangeQuery query = new RangeQuery(new Term("time", "20060101"), new Term("time", "20060130"), true);

The final true means the range is a closed interval (both endpoints included).

7.2 Queryparser
Having seen so many query classes, you may ask: "Do I really have to combine all these queries by hand myself? That's too much trouble!" Of course not. Lucene provides a query language similar in spirit to SQL; let's call it the Lucene query syntax. QueryParser parses such a statement, breaks it into pieces, and hands each piece to the corresponding query class. Each query type is written as follows:
A TermQuery is written as "field:key", for example "content:lucene".
In a BooleanQuery, 'and' is written '+' and 'or' is written as a space, for example "content:java content:perl".
A WildcardQuery still uses '?' and '*', for example "content:use*".
A PhraseQuery uses '~', for example "content:\"中日\"~5".
A PrefixQuery uses '*', for example "中*".
A FuzzyQuery uses '~', for example "content:wuzza~".
A RangeQuery uses '[]' or '{}'; '[]' denotes a closed interval and '{}' an open one, for example "time:[20060101 TO 20060130]". Note that TO is case sensitive and must be uppercase.
You can combine query strings freely to perform complex operations. For example, "articles whose title or body contains lucene, with time between 20060101 and 20060130" can be expressed as "+(title:lucene content:lucene) +time:[20060101 TO 20060130]". The code is as follows:

Directory dir = FSDirectory.getDirectory(PATH, false);
IndexSearcher is = new IndexSearcher(dir);
QueryParser parser = new QueryParser("content", new StandardAnalyzer());
Query query = parser.parse("+(title:lucene content:lucene) +time:[20060101 TO 20060130]");
Hits hits = is.search(query);
for (int i = 0; i < hits.length(); i++)
{
    Document doc = hits.doc(i);
    System.out.println(doc.get("title"));
}
is.close();

First we create an IndexSearcher on the given index directory.
Then we create a QueryParser that uses StandardAnalyzer as its analyzer and searches the content field by default.
We then use the QueryParser to parse the query string and produce a Query.
The Query is used to search, and the results come back as a Hits object.
The Hits object behaves like a list, which we iterate over and print.

7.3 Filter
The purpose of a filter is to restrict the query to a subset of the index. It is a bit like WHERE in SQL, but with a difference: it is not part of the query proper; rather, it preprocesses the data source and then hands the result to the query. Note that it preprocesses the source instead of filtering the query results, so the cost of using a filter is significant: it can make a query take up to a hundred times longer.
The most commonly used filters are RangeFilter and QueryFilter. RangeFilter restricts the search to the part of the index within a given range; QueryFilter restricts it to the results of a previous query.
Using a filter is very simple: you create a filter instance and pass it to the searcher. Continuing the example above, the query "articles between 20060101 and 20060130" can also be expressed with a RangeFilter instead of writing the restriction into the query string:

Directory dir = FSDirectory.getDirectory(PATH, false);
IndexSearcher is = new IndexSearcher(dir);
QueryParser parser = new QueryParser("content", new StandardAnalyzer());
Query query = parser.parse("title:lucene content:lucene");
RangeFilter filter = new RangeFilter("time", "20060101", "20060130", true, true);
Hits hits = is.search(query, filter);
for (int i = 0; i < hits.length(); i++)
{
    Document doc = hits.doc(i);
    System.out.println(doc.get("title"));
}
is.close();
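
For completeness, here is a minimal sketch of the QueryFilter variant mentioned above, which restricts a search to the documents matched by another query. The "category" field is purely illustrative, and this assumes the QueryFilter class from the same older Lucene line used throughout this article:

// only search inside documents that already match category:news
Query categoryQuery = new TermQuery(new Term("category", "news"));
Filter categoryFilter = new QueryFilter(categoryQuery);
Hits newsHits = is.search(parser.parse("title:lucene content:lucene"), categoryFilter);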

7.4 Sort
Sometimes you want the result set ordered, like SQL's "order by". Lucene can do that through the Sort class.
Sort sort = new Sort("time"); // equivalent to SQL's "order by time"
Sort sort = new Sort("time", true); // equivalent to SQL's "order by time desc"
Here is a complete example:

Directory dir = FSDirectory.getDirectory(PATH, false);
IndexSearcher is = new IndexSearcher(dir);
QueryParser parser = new QueryParser("content", new StandardAnalyzer());
Query query = parser.parse("title:lucene content:lucene");
RangeFilter filter = new RangeFilter("time", "20060101", "20060130", true, true);
Sort sort = new Sort("time");
Hits hits = is.search(query, filter, sort);
for (int i = 0; i < hits.length(); i++)
{
    Document doc = hits.doc(i);
    System.out.println(doc.get("title"));
}
is.close();

8 Analyzer
From the concepts above we already know that the analyzer's job is to split sentences into words according to their meaning. For English segmentation there is already a very mature analyzer, StandardAnalyzer, and in many cases it is a good choice. You will even find that StandardAnalyzer can tokenize Chinese as well.
But our focus here is Chinese word segmentation. Can StandardAnalyzer handle it? In practice it can, but the results are poor: a search for "如果" ("if") will also match "牛奶不如果汁好喝" ("milk is not as tasty as juice"), and the index file becomes very large. So what else is at hand? There is nothing more in core, but in the sandbox we can find two: ChineseAnalyzer and CJKAnalyzer. They also suffer from inaccurate segmentation. Compared with StandardAnalyzer, ChineseAnalyzer's indexing time and index size are similar; CJKAnalyzer performs worse, with a larger index file and longer indexing time.
To address the problem, let's first look at how the three analyzers tokenize. StandardAnalyzer and ChineseAnalyzer cut a sentence into single characters, so "牛奶不如果汁好喝" becomes "牛 奶 不 如 果 汁 好 喝", whereas CJKAnalyzer cuts it into overlapping two-character pairs: "牛奶 奶不 不如 如果 果汁 汁好 好喝". This also explains why searching for "果汁" ("juice") matches that sentence.
Such segmentation has at least two drawbacks: false matches and large index files. Our goal is to split the sentence above into "牛奶 不如 果汁 好喝". The key is semantic recognition: how do we know that "牛奶" ("milk") is a word while "奶不" is not? The natural idea is dictionary-based segmentation: we obtain a dictionary listing most words, split the sentence in some way, and when the resulting pieces match entries in the dictionary we consider the split correct. Splitting thus becomes a matching process, and the two simplest matching strategies are forward maximum matching and reverse (backward) maximum matching; in plain terms, matching a sentence from the front or from the back. The dictionary is crucial here, since its coverage directly affects search quality; with the same dictionary, reverse maximum matching is said to give somewhat better results than forward maximum matching.
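To make the idea concrete, here is a minimal sketch of forward maximum matching in Java. The tiny hard-coded dictionary and the single-character fallback are illustrative only; this is not how a production segmenter such as JE-Analysis or ICTCLAS is implemented:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ForwardMaxMatch {
    // toy dictionary; a real segmenter would load a large word list
    private static final Set<String> DICT = new HashSet<String>(
            Arrays.asList("牛奶", "不如", "果汁", "好喝"));
    private static final int MAX_WORD_LEN = 4;

    public static List<String> segment(String text) {
        List<String> words = new ArrayList<String>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(pos + MAX_WORD_LEN, text.length());
            String word = null;
            // try the longest candidate first, then shrink until a dictionary word is found
            for (int len = end - pos; len >= 2; len--) {
                String candidate = text.substring(pos, pos + len);
                if (DICT.contains(candidate)) { word = candidate; break; }
            }
            if (word == null) word = text.substring(pos, pos + 1); // fall back to a single character
            words.add(word);
            pos += word.length();
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [牛奶, 不如, 果汁, 好喝]
        System.out.println(segment("牛奶不如果汁好喝"));
    }
}
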
Of course there are other segmentation methods; this is a research field of its own, which I have not studied in depth. Back to the practical question: our goal is to find mature, ready-made segmentation tools and avoid reinventing the wheel. Searching online, the most commonly used ones are ICTCLAS from the Chinese Academy of Sciences and JE-Analysis, which is not open source but is free. The problem with ICTCLAS is that it is a native dynamic-link library, so calling it from Java requires native method calls, which is inconvenient and carries risks, and its reputation is frankly not great. JE-Analysis works reasonably well; it is not perfect either, but it is more convenient and more reassuring to use.

9 Performance optimization
Up to this point we have only discussed how to get Lucene running and doing its job, and what has been described so far is enough for most work. However, testing shows that Lucene's performance is not always good: with a large data set and high concurrency it can even take half a minute to return. Building the initial index for a large data set is also time-consuming. So how do we improve Lucene's performance? Below we look at optimization from two sides: optimizing index creation and optimizing search.

9.1 Optimizing the creation of index performance
The room for optimization here is relatively limited. IndexWriter provides some parameters to control indexing behavior; in addition, we can write the index to a RAMDirectory first and then flush it to an FSDirectory in batches. Either way, the goal is to minimize disk I/O, because disk I/O is the biggest bottleneck when creating an index. Choosing a better analyzer can also improve performance.

9.1.1 Optimizing index creation by setting parameters of IndexWriter
setMaxBufferedDocs(int maxBufferedDocs)
Controls how many documents are buffered in memory before a new segment is written. A larger value speeds up indexing; the default is 10.
setMaxMergeDocs(int maxMergeDocs)
Controls the maximum number of documents a segment may contain. A small value helps the speed of appending to the index; the default is Integer.MAX_VALUE and there is normally no need to change it.
setMergeFactor(int mergeFactor)
Controls how often segments are merged. A larger value makes indexing faster; the default is 10, and it can be set to 100 when building an index from scratch.
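
As a minimal sketch of tuning these parameters before a bulk load (the exact values are illustrative, not recommendations):

IndexWriter writer = new IndexWriter("/data/index/", new StandardAnalyzer(), true);
writer.setMaxBufferedDocs(1000);           // buffer more documents in memory before flushing a segment
writer.setMergeFactor(100);                // merge segments less often while bulk indexing
writer.setMaxMergeDocs(Integer.MAX_VALUE); // default: no limit on documents per segment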

9.1.2 improves performance with ramdirectory slow write
We can write the index to a RAMDirectory first and then write it to the FSDirectory in batches, reducing the number of disk I/O operations.

FSDirectory fsDir = FSDirectory.getDirectory("/data/index", true);
RAMDirectory ramDir = new RAMDirectory();
IndexWriter fsWriter = new IndexWriter(fsDir, new StandardAnalyzer(), true);
IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
while (there are documents to index)
{
    ... create Document ...
    ramWriter.addDocument(doc);
    if (the condition for flushing memory to disk has been met)
    {
        ramWriter.close(); // flush the in-memory index before merging it into the disk index
        fsWriter.addIndexes(new Directory[] { ramDir });
        ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
    }
}

9.1.3 Choosing a better parser
This optimization mainly saves disk space: the index file can shrink by nearly half; with the same test data it went from 600 MB to 380 MB. It does not help with time, though; it actually takes longer, because a better analyzer has to match against a dictionary and consumes more CPU. On the test data, StandardAnalyzer took 133 minutes and MMAnalyzer took 150 minutes.

9.2 Optimizing Search Performance
Although building the index is time-consuming, it is only needed when the index is first created; afterwards there is only a small amount of maintenance, which can moreover run in a background process without affecting user searches. We build indexes so that users can search, so search performance is what we care about most. Let's look at how to improve it.

9.2.1 putting an index into memory
This is the most intuitive idea, since memory is much faster than disk. Lucene provides RAMDirectory to hold the index in memory:

Directory fsDir = FSDirectory.getDirectory("/data/index/", false);
Directory ramDir = new RAMDirectory(fsDir);
Searcher searcher = new IndexSearcher(ramDir);

In practice, however, RAMDirectory and FSDirectory turn out to be about equally fast: when the data set is small both are fast, and when the data set is large (an index file around 400 MB) RAMDirectory is actually slower than FSDirectory, which is rather surprising.
Moreover, Lucene's search is very memory-hungry: even with the 400 MB index file loaded into memory, it ran out of memory after running for a while. So in my opinion loading the index into RAM does not help much.

9.2.2 Optimized time range limits
Since loading the index into memory does not improve efficiency, there must be another bottleneck. Testing showed that the biggest bottleneck is the time range restriction, so how can we make it cost as little as possible?
When you need to restrict search results to a given time range, there are several options:
1. Use RangeQuery and set the range. But RangeQuery is actually implemented by expanding the time range into individual terms, which are added as BooleanClauses to a BooleanQuery, so the range cannot be too large. In testing, a range of more than about a month throws BooleanQuery.TooManyClauses. The limit can be raised with BooleanQuery.setMaxClauseCount(int maxClauseCount), but only so far, and memory use grows as maxClauseCount grows.
2. Use RangeFilter instead of RangeQuery. Testing shows this is no slower than RangeQuery, but it still has a performance bottleneck: more than 90% of the query time is spent in the RangeFilter. Studying its source reveals that RangeFilter iterates over the entire index and builds a BitSet, marking each document true if it falls inside the time range and false otherwise, and then passes the result to the searcher.
3. To improve performance further, there are two ideas:
a. Cache the filter result. Since the RangeFilter runs before the search, its input is fixed, namely the IndexReader, and the IndexReader is determined by the Directory; so the filter result is determined by the bounds of the range, i.e. by the particular RangeFilter object. In other words, we only need to cache the filter's result BitSet with the RangeFilter object as the key (see the cache sketch at the end of this subsection). The Lucene API already provides a CachingWrapperFilter class that wraps a filter together with its results, but do not be misled by its name and description: CachingWrapperFilter does cache, but it caches per filter, i.e. it helps when you use the same filter against different IndexReaders by caching the result for each reader. Our need is the opposite: we filter the same IndexReader with different filters, so on its own it only serves as a wrapper class.
b. Lower the time precision. From how the filter works we can see that it traverses the whole index each time, so the coarser the time granularity, the faster the comparison and the shorter the search time. When it does not affect functionality, the lower the precision the better; sometimes sacrificing a little precision is worth it, and of course the best case is having no time restriction at all.
The following shows the results of the two optimizations above (all measured with 800 threads and random keywords and time ranges); the figures are the average time per thread:

First group, time precision in seconds:
RangeFilter used directly: 10 s    cached filter results: 1 s    no filter: 300 ms

Second group, time precision in days:
RangeFilter used directly: 900 ms    cached filter results: 360 ms    no filter: 300 ms

From the data above we can conclude:
1. Lower the time precision as much as possible. Going from second precision to day precision brings a performance improvement even larger than using the cache; best of all is not using a filter at all.
2. When the time precision cannot be lowered, caching the filter results brings roughly a tenfold performance improvement.
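
As promised in point (a) above, here is a minimal sketch of such a cache: CachingWrapperFilter objects are kept in a map keyed by the range bounds, so the expensive BitSet is computed only once per range (and per IndexReader) and reused on later searches. The class and key format are my own illustration, not code from the article; a real implementation would also evict entries when the index is reopened:

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.RangeFilter;

public class RangeFilterCache {
    // key: field plus range bounds; value: a wrapped filter that computes its BitSet once
    // per IndexReader and reuses it on every later search against that same reader
    private static final Map<String, Filter> cache = new HashMap<String, Filter>();

    public static synchronized Filter get(String field, String lower, String upper) {
        String key = field + ":" + lower + ":" + upper;
        Filter filter = cache.get(key);
        if (filter == null) {
            filter = new CachingWrapperFilter(new RangeFilter(field, lower, upper, true, true));
            cache.put(key, filter);
        }
        return filter;
    }
}

A search would then use it as: is.search(query, RangeFilterCache.get("time", "20060101", "20060130")).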

9.2.3 using a better parser
This is similar to the index-creation optimization: a smaller index file naturally searches faster. But the gain is limited; a better analyzer improves search performance by less than 20% compared with the worst one.

10 Some experience

10.1 Keywords are case sensitive
Keywords such as OR and TO must be uppercase; Lucene treats lowercase versions as ordinary words.

10.2 Read-write mutual exclusion
Only one write to the index may happen at a time, but searching is allowed while writing is in progress.

10.3 File lock
If the process is forcibly terminated while the index is being written, a lock file is left behind in the tmp directory so that later writes fail; it can be deleted manually.

10.4 Time format
Lucene supports only one time format, yyMMddHHmmss, so if you pass a time such as yy-MM-dd HH:mm:ss to Lucene it will not be treated as a time.
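
One common workaround, sketched here as my own illustration rather than something prescribed by the article, is to format the timestamp yourself into a fixed-width numeric string whose lexicographic order matches chronological order and index it as an untokenized field; this is also what makes range queries such as time:[20060101 TO 20060130] behave as expected:

import java.text.SimpleDateFormat;
import java.util.Date;

SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
String time = fmt.format(new Date()); // e.g. "20060115093000": string order equals time order
doc.add(new Field("time", time, Field.Store.YES, Field.Index.UN_TOKENIZED));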

10.5 Setting boost
Sometimes one field should carry more weight in a search. For example, you may consider a keyword appearing in the title more valuable than the same keyword appearing in the body, so you can raise the title field's boost; then, when no explicit sort is applied, documents matching in the title will rank first. Usage:
Field.setBoost(float boost); the default value is 1.0, so to increase a field's weight set it to a value greater than 1.
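
As a quick illustration (the value 2.0f is arbitrary, chosen only to show the call):

Field titleField = new Field("title", "lucene introduction", Field.Store.YES, Field.Index.TOKENIZED);
titleField.setBoost(2.0f); // matches in the title now weigh twice as much as the default
doc.add(titleField);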
