Lucene problems (5): The TooManyClauses exception in Lucene

Source: Internet
Author: User

Why does this exception occur?

If you use RangeQuery, PrefixQuery, WildcardQuery, or FuzzyQuery in a Lucene search, a TooManyClauses exception may be thrown. Why? Consider an example:

Take RangeQuery as an example. If the date range is 19990101 to 20091231 and the index contains dates such as 19990102 and 19990103, the RangeQuery is expanded into "19990102 OR 19990103", that is, two clauses. If the index contains many dates in this range, many clauses are generated.
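The expansion described above can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene's own code: the class RangeExpansionSketch and the method expandRange are hypothetical names, and the index is modeled as a sorted set of terms.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of range-query expansion; not Lucene's actual implementation.
class RangeExpansionSketch {

    // Returns every indexed term t with lower <= t <= upper.
    // Each returned term stands for one clause of the rewritten query.
    static SortedSet<String> expandRange(TreeSet<String> indexedTerms, String lower, String upper) {
        return indexedTerms.subSet(lower, true, upper, true);
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "19981231", "19990102", "19990103", "20100101"));
        SortedSet<String> clauses = expandRange(terms, "19990101", "20091231");
        System.out.println(String.join(" OR ", clauses)); // 19990102 OR 19990103
        System.out.println(clauses.size() + " clauses");  // 2 clauses
    }
}
```

The clause count depends on how many distinct terms the index holds inside the range, not on how wide the range looks in the query.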

The same applies to PrefixQuery. For example, if the query is "legal*" and the index contains terms such as "legal", "legal field", and "legal code", the query is expanded into "legal OR legal field OR legal code", and perhaps many more clauses.

To save memory, Lucene limits the number of clauses to 1024 by default. If the limit is exceeded, a TooManyClauses exception is thrown.
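To make the limit concrete, here is a minimal sketch of a prefix expansion that stops once the clause count would exceed a maximum. The names ClauseLimitSketch and expandPrefix are hypothetical, and IllegalStateException stands in for Lucene's TooManyClauses exception; only the default of 1024 comes from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of the clause limit; not Lucene's actual implementation.
class ClauseLimitSketch {
    static final int MAX_CLAUSES = 1024; // the default limit described above

    // Collects one clause per indexed term that starts with the prefix and
    // fails as soon as the count would exceed maxClauses.
    static List<String> expandPrefix(TreeSet<String> indexedTerms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // In sorted order, all terms sharing the prefix are contiguous,
        // starting at the first term >= prefix.
        for (String term : indexedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() >= maxClauses) {
                throw new IllegalStateException("too many clauses: limit is " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add(String.format("legal%04d", i)); // 2000 terms share the prefix
        }
        terms.add("other");
        try {
            expandPrefix(terms, "legal", MAX_CLAUSES);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // too many clauses: limit is 1024
        }
    }
}
```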

How can we solve this problem? Lucene provides three methods:

(1) Use a filter instead of a query. This comes at the expense of query speed, but caching can mitigate the cost. For example, you can replace RangeQuery with RangeFilter as follows:
Before:

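// startYear and endYear are strings such as "1999" and "2009", defined elsewhere.
BooleanQuery simpleQuery = new BooleanQuery();
Term dateLower = new Term("publishdate", startYear + "0101");
Term dateUpper = new Term("publishdate", endYear + "1231");
// The third argument makes the range inclusive of both end points.
RangeQuery dateQuery = new RangeQuery(dateLower, dateUpper, true);
simpleQuery.add(dateQuery, BooleanClause.Occur.MUST);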

After:

BooleanQuery simpleQuery = new BooleanQuery();
// The date range is now a filter, so it is not expanded into query clauses.
RangeFilter dateFilter = new RangeFilter("publishdate", startYear + "0101", endYear + "1231", true, true);
FilteredQuery filteredQuery = new FilteredQuery(simpleQuery, dateFilter);
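One way to see why a filter sidesteps the clause limit is to model it as a bit set of matching document ids instead of a list of clauses. The sketch below rests on that assumption and is not Lucene's code: RangeFilterSketch, docsInRange, and the postings map (term to document ids) are hypothetical names chosen for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a range filter: one bit per document, no query clauses.
class RangeFilterSketch {

    // postings maps each indexed term to the ids of the documents containing it.
    // The result has bit d set when document d contains a term within [lower, upper].
    static BitSet docsInRange(TreeMap<String, int[]> postings, String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> entry : postings.subMap(lower, true, upper, true).entrySet()) {
            for (int docId : entry.getValue()) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("19981231", new int[] {0});
        postings.put("19990102", new int[] {1, 3});
        postings.put("19990103", new int[] {2});
        postings.put("20100101", new int[] {4});
        BitSet allowed = docsInRange(postings, "19990101", "20091231", 5);
        System.out.println(allowed); // {1, 2, 3}
    }
}
```

The size of the bit set is fixed by the number of documents, so it does not grow with the number of terms in the range. A bit set like this can also be cached and reused across searches, which matches the caching remark in (1).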

(2) Use BooleanQuery.setMaxClauseCount(10240) to raise the limit. This increases memory consumption. Use BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE) to remove the restriction completely.

(3) Reduce the precision of range queries as much as possible. For example, if the query only needs to be accurate to the year rather than to the month or day, index and search the date at year precision. The DateTools class is said to make this kind of time conversion easy; I have not tried it.
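The effect of coarser precision can be illustrated by counting distinct terms. This is a sketch that assumes dates are stored as yyyyMMdd strings and simply truncates them; it does not use DateTools, and DatePrecisionSketch and distinctTerms are hypothetical names.

```java
import java.util.TreeSet;

// Toy illustration: fewer distinct terms means fewer clauses after expansion.
class DatePrecisionSketch {

    // Counts the distinct index terms left when each yyyyMMdd date keeps only
    // its first `length` characters (8 = day, 6 = month, 4 = year).
    static int distinctTerms(String[] dates, int length) {
        TreeSet<String> terms = new TreeSet<>();
        for (String date : dates) {
            terms.add(date.substring(0, length));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        String[] dates = {"19990102", "19990103", "19990215", "20000101"};
        System.out.println(distinctTerms(dates, 8)); // 4 terms at day precision
        System.out.println(distinctTerms(dates, 6)); // 3 terms at month precision
        System.out.println(distinctTerms(dates, 4)); // 2 terms at year precision
    }
}
```

At year precision, the range from 1999 to 2009 can expand to at most 11 clauses, however many documents the index holds.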
