Lucene Query Parser syntax

Source: Internet
Author: User

Lucene provides a rich set of APIs for assembling and customizing queries. It also provides a query parser (QueryParser) whose query syntax lets you build a query from a string: the Java parser turns the query string into a Lucene Query. This article describes that query syntax in detail. Before you choose to use the query parser, consider the following:

If your program assembles a query syntax string and then hands it to the query parser, you are strongly advised to build the query directly with the query API instead. In other words, the query parser is designed for advanced queries typed in by people, not for syntax strings stitched together by a program.

Untokenized fields are also best added to the query through the API rather than through the query parser. The analyzer used by the query parser exists to convert text that a user types in into the corresponding terms. If the values of a field are generated by the program (for example date fields or keyword fields), the query clauses for that field should be generated by the program too, in the same format.

In a query form, fields that hold general text should go through the query parser. Everything else, such as date range queries and keyword queries, is best built by calling the appropriate API. If a field has only a limited set of values, offer the user a drop-down list and add the selection to the query as a TermQuery, instead of stitching it into the query string and parsing it with the query parser.

Terms
A query is broken down into terms and operators. There are two kinds of terms: single terms and phrases. A single term is the smallest unit the analyzer produces, that is, a single word such as "test" or "hello". A phrase is a group of words enclosed in double quotation marks, for example "hello dolly". Multiple terms can be combined into a more complex query with Boolean operators.
Note: It is important to choose an analyzer that does not interfere with the query terms. In general it is best to keep the analyzer used for indexing consistent with the one used for querying (there are special cases, of course, such as indexing single characters while segmenting words at query time).

Fields
Lucene supports fielded data. When searching, you can either specify a field or use the default field. To search a specific field, type the field name, a colon ":" and then the term. For example, assume a Lucene index contains two fields, title and text, and text is the default field. To find documents whose title contains "The Right Way" and whose text contains "go", you can enter:
title:"The Right Way" AND text:go
or:
title:"The Right Way" AND go
Because text is the default field, it does not need to be named explicitly in the query. Note that a field name applies only to the term that directly follows it, so relying on the default field can give unexpected results:
title:Do it right
This query finds documents whose title contains "Do" and whose text field contains "it" and "right", because text is the default field. To search for the whole phrase in the title, enclose it in quotation marks:
title:"Do it right"
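The way a field name binds only to the term attached to it can be modeled in a few lines. The Python sketch below is an illustration under simplifying assumptions (whitespace-separated terms, no phrases, operators or escaping). The function name assign_fields is made up for this example and is not part of Lucene.

```python
def assign_fields(query, default_field="text"):
    """Pair every whitespace-separated term with the field it is searched in.

    Simplified model: phrases, operators and escaping are not handled.
    """
    pairs = []
    for token in query.split():
        if ":" in token:
            # A "field:" prefix applies only to the term it is attached to.
            field, term = token.split(":", 1)
        else:
            # Terms without a prefix fall back to the default field.
            field, term = default_field, token
        pairs.append((field, term))
    return pairs


print(assign_fields("title:do it right"))
```

Only the first term ends up in the title field; the other two are looked up in the default field, which is why the phrase has to be quoted.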

II. Fuzzy queries
Term Modifiers
Lucene supports modifying query terms, for example with wildcard characters, to provide fuzzy matching.

Wildcard searches [class: org.apache.lucene.search.WildcardQuery]
Lucene supports single-character and multiple-character wildcard searches. The symbol "?" matches a single character, and the symbol "*" matches multiple characters.
The "?" wildcard finds all terms that match once the "?" is replaced by exactly one character. For example, to search for "test" or "text" you can use:
te?t
The "*" wildcard finds terms that match once the "*" is replaced by zero or more characters. For example, to search for test, tests or tester, you can use:
test*
You can also put "*" in the middle of a term:
te*t
Note: You cannot use "*" or "?" as the first character of a search. (Presumably Lucene does not support this for performance reasons.)
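The meaning of the two wildcards can be checked with a small Python sketch that translates a wildcard term into a regular expression. This only models the matching rule described above and is not how Lucene evaluates a WildcardQuery internally. The helper name wildcard_to_regex is an assumption made for this example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard term into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "*":
            parts.append(".*")      # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


terms = ["test", "text", "tests", "tester", "toast"]
# fullmatch requires the whole term to match, not just a prefix of it.
print([t for t in terms if wildcard_to_regex("te?t").fullmatch(t)])
print([t for t in terms if wildcard_to_regex("test*").fullmatch(t)])
```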

Fuzzy searches [org.apache.lucene.search.FuzzyQuery]
Lucene supports fuzzy searches based on the edit distance (Levenshtein distance) algorithm. To run one, put the tilde "~" at the end of a single-word term. For example, to search for words similar to "roam" use:
roam~
This query finds words such as "foam" and "roams". It can also be called a similarity query.
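To see why "foam" and "roams" count as similar to "roam", it helps to compute the edit distance by hand. The Python sketch below is a textbook Levenshtein implementation, not Lucene's own code, and it does not show how Lucene turns the distance into a similarity threshold.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


for word in ["foam", "roams", "lucene"]:
    print(word, edit_distance("roam", word))
```

Both "foam" (one substitution) and "roams" (one insertion) are a single edit away from "roam".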

Proximity searches [org.apache.lucene.search.PhraseQuery]
Lucene supports finding words that are within a specified distance of each other. To run a proximity search, put the tilde "~" followed by a number at the end of a phrase. For example, to search for "apache" and "jakarta" within 10 words of each other, use:
"jakarta apache"~10
This syntax makes it possible to index single characters and segment words at query time: after the query is segmented, the characters of each word must occur within a distance of 1 of each other. This guarantees 100% recall, but the index becomes bloated and queries slow down to some extent. In general, performance drops noticeably once the index holds 1.5 to 2 million documents.
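The distance in a proximity search is counted in word positions. The Python sketch below shows the idea with a hypothetical helper, within_distance, that compares the positions of two words in a document. It is a simplification: Lucene's actual slop calculation counts position moves and treats reversed word order differently, which this sketch ignores.

```python
def within_distance(text, first, second, max_distance):
    """Return True if first and second occur at most max_distance
    word positions apart somewhere in text."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_distance
               for i in first_positions
               for j in second_positions)


doc = "the apache software foundation hosts the jakarta project"
print(within_distance(doc, "jakarta", "apache", 10))
print(within_distance(doc, "jakarta", "apache", 2))
```

In the sample sentence the two words are five positions apart, so a distance of 10 matches and a distance of 2 does not.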

Range searches [org.apache.lucene.search.RangeQuery]
A range query lets you specify a lower and an upper bound for a field and matches all documents whose values fall between them. The bounds can be inclusive or exclusive, and values are compared in lexicographic (dictionary) order.
mod_date:[20020101 TO 20030101]
This finds all documents whose mod_date field is greater than or equal to 20020101 and less than or equal to 20030101. Note: Range queries are not reserved for date fields; you can run them on non-date fields as well.
title:{Aida TO Carmen}
This finds all documents whose titles lie between Aida and Carmen, excluding Aida and Carmen themselves. Inclusive bounds are written with square brackets, exclusive bounds with curly braces.
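Because the comparison is done in dictionary order, a range check is a plain string comparison. The Python sketch below models inclusive and exclusive bounds. The function in_range is a hypothetical helper for illustration, not a Lucene API.

```python
def in_range(value, lower, upper, inclusive=True):
    """Check a term against a range using lexicographic string comparison."""
    if inclusive:
        return lower <= value <= upper   # square brackets: [lower TO upper]
    return lower < value < upper         # curly braces:    {lower TO upper}


dates = ["20011231", "20020101", "20020615", "20030101", "20030102"]
print([d for d in dates if in_range(d, "20020101", "20030101")])
print([d for d in dates if in_range(d, "20020101", "20030101", inclusive=False)])

# Dictionary order is not numeric order: "9" sorts after "10".
print(in_range("9", "1", "10"))
```

The last line shows why values such as dates are usually stored zero-padded in a fixed-width format like yyyymmdd: without padding, string order and numeric order disagree.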

III. Priority
Boosting a term
Lucene supports giving different query terms different weights. To boost a term, put the caret "^" at the end of the term, followed by the boost factor; the higher the boost factor, the more important the term. Boosting lets you influence the relevance of documents. For example, suppose you are searching for:
jakarta apache
If you consider "jakarta" more important in this query, you can use the following syntax:
jakarta^4 apache
This makes documents containing "jakarta" more relevant. You can also boost phrases, as follows:
"jakarta apache"^4 "jakarta lucene"
By default, the boost factor is 1. The boost factor must be positive, but it can be less than 1.
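The effect of a boost on ranking can be illustrated with a deliberately naive scoring function. The Python sketch below multiplies each term's frequency by its boost and sums the results. This is an assumption made for illustration only: Lucene's real scoring formula contains more factors, and toy_score is not a Lucene function.

```python
def toy_score(document, boosts):
    """Sum (term frequency x boost) over all query terms.

    A toy model only; it is not Lucene's scoring formula.
    """
    words = document.lower().split()
    return sum(boost * words.count(term) for term, boost in boosts.items())


doc_a = "jakarta jakarta project"
doc_b = "apache apache apache server"

plain = {"jakarta": 1, "apache": 1}      # jakarta apache
boosted = {"jakarta": 4, "apache": 1}    # jakarta^4 apache

print(toy_score(doc_a, plain), toy_score(doc_b, plain))
print(toy_score(doc_a, boosted), toy_score(doc_b, boosted))
```

Without the boost the document about apache ranks higher; with jakarta^4 the document about jakarta overtakes it.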

IV. Term operators
Boolean operators
Boolean operators combine multiple terms into a complex logical query. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators. Note that the operator keywords must be written in uppercase.

OR
The OR operator is the default conjunction operator. This means that when no operator is written between two terms, OR is used: a document matches as soon as it contains either of the terms. The symbol "||" can be used in place of the word OR. To search for documents that contain either "jakarta apache" or just "jakarta", use:
"jakarta apache" jakarta
or
"jakarta apache" OR jakarta

AND
The AND operator requires all terms to be present for a document to match. The symbol "&&" can be used in place of the word AND. To search for documents that contain both "jakarta apache" and "jakarta lucene", use:
"jakarta apache" AND "jakarta lucene"

+
The "+" operator requires that the term after it appears in the document; it marks the term as required (MUST). For example, to search for documents that must contain "jakarta" and may or may not contain "lucene", use:
+jakarta lucene

NOT
The NOT operator excludes documents that contain the term after NOT. The symbol "!" can be used in place of the word NOT. To search for documents that contain "jakarta apache" but not "jakarta lucene", use:
"jakarta apache" NOT "jakarta lucene"
Note: The NOT operator cannot be used with just one term. For example, the following query returns no results:
NOT "jakarta apache"

-
The "-" operator excludes documents that contain the term after it, much like NOT. To search for documents that contain "jakarta apache" but not "jakarta lucene", use:
"jakarta apache" -"jakarta lucene"
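The Boolean operators boil down to three kinds of clauses: required ("+" or AND), optional (OR) and prohibited ("-" or NOT). The Python sketch below models how they decide whether a document matches. It is a simplified model with single-word terms instead of phrases, and the function name matches and its parameters are assumptions made for this example, not Lucene's API.

```python
def matches(doc_terms, must=(), should=(), must_not=()):
    """Decide whether a document satisfies required, optional and
    prohibited terms."""
    terms = set(doc_terms)
    if any(t in terms for t in must_not):
        return False                            # a prohibited term is present
    if must:
        return all(t in terms for t in must)    # optional terms do not filter here
    return any(t in terms for t in should)      # otherwise one optional term must match


docs = {
    1: ["jakarta", "apache"],
    2: ["jakarta", "lucene"],
    3: ["apache", "website"],
}

# +jakarta lucene
print([d for d, t in docs.items() if matches(t, must=["jakarta"], should=["lucene"])])
# jakarta NOT lucene
print([d for d, t in docs.items() if matches(t, should=["jakarta"], must_not=["lucene"])])
# NOT jakarta (a prohibited term on its own matches nothing)
print([d for d, t in docs.items() if matches(t, must_not=["jakarta"])])
```

The last query mirrors the note about NOT: with nothing to select documents in the first place, excluding a term yields an empty result.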

Grouping
Lucene supports using parentheses to group clauses into subqueries, which is useful for controlling the Boolean logic of a query. For example, to search for documents that contain either "jakarta" or "apache" and also contain "website", use:
(jakarta OR apache) AND website
Grouping removes ambiguity and makes sure the query expression means what you intend: here "website" must be present, together with either "jakarta" or "apache".

Field Grouping
Lucene supports using parentheses to group several clauses under a single field. To search for titles that contain both the word "return" and the phrase "pink panther", use:
title:(+return +"pink panther")

Escaping special characters
Lucene supports escaping special characters that are part of the query syntax. The special characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
To escape a special character, put a backslash "\" in front of it. For example, to search for (1+1):2 use:
\(1\+1\)\:2

Tip: QueryParser.escape(q) escapes the special query characters contained in q, such as * and ?.
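The escaping rule itself is simple enough to sketch in Python. The function below puts a backslash in front of every special character. It is an illustration of the rule, not a port of QueryParser.escape, and it escapes "&" and "|" individually, which is a simplification of the two-character operators "&&" and "||". The names SPECIAL_CHARACTERS and escape_query are made up for this example.

```python
# Characters the query syntax treats as special (& and | handled one by one).
SPECIAL_CHARACTERS = set('+-&|!(){}[]^"~*?:\\')


def escape_query(text):
    """Put a backslash in front of every special query character."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)


print(escape_query("(1+1):2"))
```

Running it on (1+1):2 prints \(1\+1\)\:2, the same escaped form as in the example above.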

English Original address: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

Modified from: http://hi.baidu.com/expertsearch/blog/item/8d4f7d355a2e413c5ab5f547.html

Transferred from: https://www.oschina.net/question/1092_560
