Lucene query syntax in detail (for Kibana search statements)

Lucene provides a rich API for combining and customizing queries, and it also provides a powerful query syntax through the Query Parser, which parses a query string into a Lucene Query. This article describes that query syntax in detail. Before you choose to use the Query Parser, consider the following:

If your program assembles a query string and then hands it to the Query Parser, you should seriously consider building the query directly with the query API instead. In other words, the Query Parser is designed for text typed by a person, not for strings concatenated by a program. Untokenized fields are also best added to the query directly through the API rather than through the Query Parser. The analyzer used by the Query Parser is designed to convert text entered by a user into terms. If the values of a field are generated by a program (a date field or a keyword field, for example), the query clauses for that field should be generated by the program as well, in the same format.

In a query form, fields that contain general text should go through the Query Parser. All other fields, such as date ranges and keywords, are better added by calling the query API directly, so that program-generated values (a padded date field, for example) keep a consistent format. If a field has only a limited set of values, it is best to offer the user a drop-down list and add the selection to the query as a TermQuery clause, instead of splicing it into the query string and parsing it with the Query Parser.

Terms
A query is broken up into terms and operators. There are two types of terms: single terms and phrases. A single term is the smallest unit produced by the analyzer; it is a single word such as "test" or "hello". A phrase is a group of words enclosed in double quotes, such as "hello dolly". Multiple terms can be combined with Boolean operators to form a more complex query.
Note: In general, the analyzer used to create the index and the analyzer used for the query should be the same (there are special cases, such as indexing single characters while querying with segmented words), so it is important to choose an analyzer that does not interfere with the terms used in the query.

Fields
Lucene supports fielded data. When searching you can either specify a field or use the default field. To search a specific field, type the field name, then a colon ":", then the term you are looking for. For example, assume a Lucene index contains two fields, title and text, and text is the default field. To find documents whose title contains "The Right Way" and whose text contains "go", you can enter:
title:"The Right Way" AND text:go
Or:
title:"The Right Way" AND go
Because text is the default field, it does not need to be named explicitly in the query. Note that a field name applies only to the term directly after it, so the following query may not do what you expect:
title:do it right
This query looks for "do" in the title field, and looks for "it" and "right" in the text field, because text is the default field. To search for the whole phrase in title, enclose it in quotes:
title:"do it right"
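
To make the default-field rule concrete, here is a minimal Python sketch. It is not Lucene's parser: the function name assign_fields and the regular expression are made up for illustration, and Boolean operators, boosts and escaping are ignored. It only shows which field each term or phrase ends up in.

```python
import re

# Hypothetical helper, not part of Lucene: an optional "field:" prefix
# followed by either a quoted phrase or a single unquoted word.
_CLAUSE = re.compile(r'(?:(\w+):)?("[^"]*"|\S+)')

def assign_fields(query, default_field="text"):
    """Return (field, text) pairs, using default_field when no field is named."""
    clauses = []
    for field, text in _CLAUSE.findall(query):
        clauses.append((field or default_field, text.strip('"')))
    return clauses

unquoted = assign_fields('title:do it right')
quoted = assign_fields('title:"do it right"')
```

In the unquoted query only "do" is tied to title and the remaining words fall back to the default field, while the quoted query keeps the whole phrase in title.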

2. Fuzzy queries
Term modifiers
Lucene supports modifying query terms, for example with wildcard characters, to provide a range of fuzzy search options.

Wildcard searches [class: org.apache.lucene.search.WildcardQuery]
Lucene supports single-character and multiple-character wildcard searches within a single term: use the symbol "?" to match a single character and the symbol "*" to match multiple characters.
The "?" wildcard finds terms that match after it is replaced by exactly one character. For example, to search for "text" or "test" you can use:
te?t
The "*" wildcard finds terms that match after it is replaced by 0 or more characters. For example, to search for test, tests or tester, you can use:
test*
You can also put "*" in the middle of a term:
te*t
Note: You cannot use "*" or "?" as the first character of a search. (Lucene presumably does not support this for performance reasons.)
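
The meaning of the two wildcards can be sketched in a few lines of Python by translating a pattern into a regular expression. This is only an illustration of the matching rules described above, not how WildcardQuery is implemented; the function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate '?' to exactly one character and '*' to zero or more characters."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts))

def wildcard_match(pattern, term):
    """True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

terms = ["test", "text", "tests", "tester", "tet"]
single = [t for t in terms if wildcard_match("te?t", t)]
prefix = [t for t in terms if wildcard_match("test*", t)]
```

Here single keeps only the four-letter terms "test" and "text", and prefix keeps every term that starts with "test".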

Fuzzy searches [class: org.apache.lucene.search.FuzzyQuery]
Lucene supports fuzzy searches based on the edit distance (Levenshtein distance) algorithm. To do a fuzzy search, put the tilde symbol "~" at the end of a single-word term. For example, to search for words similar in spelling to "roam":
roam~
This query finds words such as "foam" and "roams". You can also specify the required similarity of the match by appending a value after the tilde.
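
For readers unfamiliar with edit distance, the following Python sketch computes the classic Levenshtein distance between two words. It is a textbook dynamic-programming version written for illustration, not the code FuzzyQuery runs, and the function name edit_distance is an assumption.

```python
def edit_distance(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]

to_foam = edit_distance("roam", "foam")    # one substitution
to_roams = edit_distance("roam", "roams")  # one insertion
```

Both "foam" and "roams" are a single edit away from "roam", which is why a fuzzy search for roam~ can find them.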

Proximity searches [class: org.apache.lucene.search.PhraseQuery]
Lucene supports finding words that are within a specified distance of each other. To do a proximity search, put the tilde symbol "~" and a number at the end of a phrase. For example, to search for "apache" and "jakarta" within 10 words of each other in a document, use:
"jakarta apache"~10
This syntax also makes it possible to index text character by character and still query by segmented words: after the query is segmented, the characters of each word are required to be adjacent (a distance of 1). This guarantees 100% recall, but the character-level index becomes bloated and queries slow down to some extent. In general, performance drops noticeably once the index reaches about 1.5 to 2 million documents.
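
The idea of "within N words of each other" can be sketched in Python as follows. This is a deliberate simplification with a hypothetical function name: it splits on whitespace instead of using an analyzer, and it measures the plain distance between word positions in either order, whereas Lucene's real slop calculation is more subtle (term order affects the cost, for instance).

```python
def within_distance(text, first, second, max_distance):
    """True when some occurrence of `first` and some occurrence of `second`
    are at most max_distance word positions apart, in either order."""
    tokens = text.lower().split()
    first_positions = [i for i, token in enumerate(tokens) if token == first]
    second_positions = [i for i, token in enumerate(tokens) if token == second]
    return any(abs(i - j) <= max_distance
               for i in first_positions
               for j in second_positions)

doc = "the apache software foundation hosts the jakarta project"
near = within_distance(doc, "jakarta", "apache", 10)
far = within_distance(doc, "jakarta", "apache", 3)
```

In this sample sentence the two words are five positions apart, so the check passes for a distance of 10 and fails for a distance of 3.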

Range searches [class: org.apache.lucene.search.RangeQuery]
A range query lets you specify a lower and an upper bound for a field and matches all documents whose values fall between them. A range query can include or exclude the bounds, and values are compared in lexicographic (dictionary) order.
mod_date:[20020101 TO 20030101]
This finds all documents whose mod_date field is greater than or equal to 20020101 and less than or equal to 20030101. Note: range queries are not reserved for date fields; you can also use them on non-date fields.
title:{Aida TO Carmen}
This finds all documents whose title is between Aida and Carmen, but not including Aida and Carmen. Inclusive range queries use square brackets; exclusive range queries use curly braces.
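
Because the comparison is lexicographic, the values behave like strings rather than numbers. The short Python sketch below illustrates this with plain string comparison; the function in_range is hypothetical and only mimics the inclusive and exclusive bounds described above.

```python
def in_range(value, lower, upper, inclusive=True):
    """Lexicographic range check: square brackets correspond to
    inclusive=True, curly braces to inclusive=False."""
    if inclusive:
        return lower <= value <= upper
    return lower < value < upper

# mod_date:[20020101 TO 20030101] keeps both end points
on_lower_bound = in_range("20020101", "20020101", "20030101")

# title:{aida TO carmen} drops both end points
excluded_bound = in_range("aida", "aida", "carmen", inclusive=False)

# Dictionary order is why generated values are usually zero-padded:
# the string "9" sorts after "10", but "09" sorts before it.
unpadded = in_range("9", "1", "10")
padded = in_range("09", "01", "10")
```

The last two lines show the reason for padded date and number fields: without padding, "9" falls outside the range 1 to 10 when compared as text.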

3. Priority
Boosting a Term
Lucene supports setting different weights (boosts) for different query terms. To boost a term, put the caret symbol "^" at the end of the term, followed by the boost factor. The higher the boost factor, the more important the term. Boosting lets you influence the relevance of documents by weighting query terms differently. For example, if you are searching for:
jakarta apache
and you want "jakarta" to be more important in the query, you can use the following syntax:
jakarta^4 apache
This gives documents that contain "jakarta" a higher relevance. You can also boost phrases, as follows:
"jakarta apache"^4 "jakarta lucene"
By default, the boost factor is 1. The boost factor must be positive, but it can be less than 1.
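
The effect of a boost on ranking can be illustrated with a toy score in Python. This is only a sketch under a strong assumption: the score is simply the sum of the boosts of the matching terms. Lucene's real scoring also takes term frequency, inverse document frequency and normalization into account, all of which are ignored here, and the function toy_score is hypothetical.

```python
def toy_score(doc_terms, boosts):
    """Sum the boost of every query term that occurs in the document."""
    terms = set(doc_terms)
    return sum(boost for term, boost in boosts.items() if term in terms)

# jakarta^4 apache: "jakarta" counts four times as much as "apache"
query = {"jakarta": 4.0, "apache": 1.0}

only_jakarta = toy_score(["jakarta", "project"], query)
only_apache = toy_score(["apache", "server"], query)
both = toy_score(["jakarta", "apache"], query)
```

With these boosts, a document that contains only "jakarta" outranks one that contains only "apache", and a document with both terms ranks highest.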

4. Term operators
Boolean operators
Boolean operators combine multiple terms into a more complex logical query. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators. Note that the word operators (AND, OR, NOT) must be written in uppercase.

OR
OR is the default conjunction operator. If you do not put an explicit operator between terms, OR is used: a document matches as long as it contains any one of the terms. The symbol "||" can be used in place of the word OR. To search for documents that contain either "jakarta apache" or just "jakarta", use the following syntax:
"jakarta apache" jakarta
or:
"jakarta apache" OR jakarta

AND
The AND operator requires that all terms appear in a document for it to match. The symbol "&&" can be used in place of the word AND. To search for documents that contain both "jakarta apache" and "jakarta lucene", use the following syntax:
"jakarta apache" AND "jakarta lucene"

+
The "+" (required) operator means that the term after it must appear in the document; it corresponds to a MUST clause in the query. For example, to search for documents that must contain "jakarta" and may or may not contain "lucene", use the following syntax:
+jakarta lucene

NOT
The NOT operator excludes documents that contain the term after NOT. The symbol "!" can be used in place of the word NOT. To search for documents that contain "jakarta apache" but not "jakarta lucene", use the following query:
"jakarta apache" NOT "jakarta lucene"
Note: The NOT operator cannot be used with just one term. For example, the following query returns no results:
NOT "jakarta apache"

-
The "-" (prohibit) operator excludes documents that contain the term after it, similar to NOT. To search for documents that contain "jakarta apache" but not "jakarta lucene", use the following syntax:
"jakarta apache" -"jakarta lucene"

Grouping
Lucene supports using parentheses to group clauses into subqueries, which is useful for controlling the Boolean logic of a query. For example, to search for documents that must contain "website" and must also contain either "jakarta" or "apache", use the following syntax:
(jakarta OR apache) AND website
Grouping removes ambiguity and makes sure the query expression is interpreted as intended.
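
The Boolean operators and grouping described above can be mimicked with required, optional and prohibited clauses. The Python sketch below is a toy model with hypothetical names: it treats a document as a set of single words, ignores phrases and scoring, and is not how Lucene evaluates a query.

```python
def matches(doc_terms, must=(), should=(), must_not=()):
    """Toy Boolean match: must ~ '+' / AND, should ~ plain OR terms,
    must_not ~ '-' / NOT."""
    terms = set(doc_terms)
    if not must and not should:
        return False  # a purely negative query matches nothing
    if any(term in terms for term in must_not):
        return False
    if not all(term in terms for term in must):
        return False
    if not must:
        # without a required clause, at least one optional term has to match
        return any(term in terms for term in should)
    return True

def grouped_example(doc_terms):
    """(jakarta OR apache) AND website, written as two nested checks."""
    return (matches(doc_terms, should=["jakarta", "apache"])
            and matches(doc_terms, must=["website"]))

required_only = matches({"jakarta"}, must=["jakarta"], should=["lucene"])  # +jakarta lucene
negative_only = matches({"jakarta"}, must_not=["apache"])                  # NOT apache
```

The first check in matches mirrors the note on NOT: a query made only of prohibited terms finds no documents.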

Field Grouping
Lucene supports using parentheses to group multiple clauses under a single field. For example, to search for a title that contains both the word "return" and the phrase "pink panther", use the following syntax:
title:(+return +"pink panther")

Escaping Special Characters
Lucene supports escaping special characters that are part of the query syntax. The list of special characters is:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
To escape a special character, put the symbol "\" before it. For example, to search for (1+1):2, use the following syntax:
\(1\+1\)\:2

Tip: QueryParser.escape(q) escapes the characters in q that are query syntax keywords, such as "*" and "?".
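
When the query string is built outside Java, the same escaping can be sketched by hand. The Python function below is a hypothetical stand-in, not the Lucene method: it assumes the special-character list given above and puts a backslash before each such character (each "&" and "|" is escaped individually).

```python
# Characters taken from the special-character list in this article.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')

def escape_query(text):
    """Put a backslash in front of every query-syntax character in text."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

escaped = escape_query("(1+1):2")
```

Applied to (1+1):2, this produces the escaped form shown in the example above.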

Original English documentation: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
