Lucene query syntax

http://lucene.apache.org/java/2_0_0/queryparsersyntax.html
From: http://liyu2000.nease.net/article/Lucene/queryparsersyntax.htm

Introduction 

Lucene provides an API for building queries programmatically. It also provides a rich query language through its QueryParser.

This article describes the syntax supported by Lucene's query parser, a lexer generated with the JavaCC tool that interprets a query string and turns it into a Lucene Query object.

Terms

A query is broken up into terms and operators. There are two types of terms: single terms and phrases.

A single term is a single word, such as "test" or "hello".

A phrase is a group of words surrounded by double quotes, such as "hello dolly".

Multiple terms can be combined with Boolean operators to form a more complex query (see below).

Note: The analyzer used to create the index is also applied to the terms and phrases in the query string. It is therefore important to choose an analyzer that will not interfere with the terms used in the query string.

Fields

Lucene supports fielded data. When searching, you can either specify a field or use the default field. The field names and the default field are determined by the specific implementation.

To search a field, type the field name, followed by a colon ":", followed by the term you are looking for.

For example, assume that a Lucene index contains two fields, title and text, and that text is the default field. To find the document titled "The Right Way" that contains the text "don't go this way", you can enter:

title:"The Right Way" AND text:go

or

title:"The Right Way" AND go

Because text is the default field, the field name can be omitted.

Note: A field name applies only to the term that directly follows it, so in the query

title:Do it right

only "Do" is searched in the title field. "it" and "right" are still searched in the default field (here, the text field).
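The rule in the note above can be sketched in a few lines of Python. This is only an illustration, not Lucene's parser: the function name assign_fields is hypothetical, and the sketch splits on whitespace and ignores phrases, operators, and escaping.

```python
def assign_fields(query, default_field="text"):
    """Pair each whitespace-separated term with the field it would be searched in.

    Toy model only: phrases, Boolean operators, and escapes are not handled.
    """
    pairs = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and field and term:
            # An explicit "field:term" prefix applies to this one term only.
            pairs.append((field, term))
        else:
            # Everything else falls back to the default field.
            pairs.append((default_field, token))
    return pairs


print(assign_fields("title:Do it right"))
# [('title', 'Do'), ('text', 'it'), ('text', 'right')]
```

To keep all three words in the title field, the query would use a phrase instead, as in title:"Do it right".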

Term Modifiers

Lucene supports modifying query terms to provide a wide range of search options.

Wildcard searches

Lucene supports single-character and multiple-character wildcard searches.

Use the "?" symbol for a single-character wildcard.

Use the "*" symbol for a multiple-character wildcard.

A single-character wildcard matches terms in which exactly one character is replaced. For example, to search for "text" or "test" you can use:

te?t

A multiple-character wildcard matches 0 or more characters. For example, to search for test, tests, or tester you can use:

test*

You can also use wildcards in the middle of a term:

te*t

Note: You cannot use a * or ? symbol as the first character of a search term.
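One way to picture the wildcard rules above is to translate a wildcard term into a regular expression. The following Python sketch does that under stated assumptions: the helper names wildcard_to_regex and matches are hypothetical, and this is not how Lucene evaluates wildcard queries internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard term into a compiled regular expression."""
    # Mirror the rule that a term may not begin with a wildcard.
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard term may not start with * or ?")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(matches("te?t", "text"))     # True
print(matches("test*", "tester"))  # True
print(matches("te?t", "tet"))      # False: "?" needs exactly one character
```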

Fuzzy search 

Lucene supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm. To perform a fuzzy search, add the tilde symbol "~" to the end of a single-word term. For example, to search for terms similar in spelling to "roam", write:

roam~

This search will find terms like foam and roams.

Note: Terms found by a fuzzy search automatically receive a boost factor of 0.2.
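For readers unfamiliar with edit distance, here is a minimal Python sketch of the classic Levenshtein computation. It only shows why foam and roams count as close to roam. How Lucene turns a distance into a match decision is not covered here, and the function name levenshtein is just an illustrative choice.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


print(levenshtein("roam", "foam"))   # 1 (substitute r with f)
print(levenshtein("roam", "roams"))  # 1 (insert s)
```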

Proximity searches

Lucene also supports finding words that are within a specific distance of each other. To perform a proximity search, add the tilde symbol "~" and a number to the end of a phrase. For example, to search for "apache" and "jakarta" within 10 words of each other in a document, write:

"jakarta apache"~10

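The idea of "within N words of each other" can be sketched in Python as a check over token positions. This is a simplification made for illustration: the function name within_distance is hypothetical, the text is assumed to be already split into lowercase tokens, and Lucene's own matching of proximity phrases is more involved than a plain position difference.

```python
def within_distance(tokens, first, second, max_gap):
    """Return True if some occurrence of first and some occurrence of second
    are at most max_gap token positions apart."""
    positions_first = [i for i, t in enumerate(tokens) if t == first]
    positions_second = [i for i, t in enumerate(tokens) if t == second]
    return any(
        abs(i - j) <= max_gap
        for i in positions_first
        for j in positions_second
    )


tokens = "the apache software foundation hosts the jakarta project".split()
print(within_distance(tokens, "jakarta", "apache", 10))  # True (5 positions apart)
print(within_distance(tokens, "jakarta", "apache", 3))   # False
```
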
Boosting a term

Lucene ranks matching documents by relevance, based on the terms found. To boost a term, add the caret symbol "^" followed by a boost factor (a number) to the end of the term you are searching for. The higher the boost factor, the more relevant the term will be.

Boosting lets you control the relevance of a document by boosting its terms. For example, if you are searching for jakarta apache and want the term "jakarta" to be more relevant, add the "^" symbol and a boost factor directly after the term. You would type:

jakarta^4 apache

This makes documents containing the term jakarta appear more relevant. You can also boost phrases, as in the following example:

"jakarta apache"^4 "jakarta lucene"

By default, the boost factor is 1. The boost factor must be positive, but it can be less than 1 (for example, 0.2).
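The boost notation can be read mechanically, as the following Python sketch shows. It is an illustration of the syntax only: parse_boosts is a hypothetical helper, it handles single-word terms separated by whitespace (not phrases), and it says nothing about how Lucene uses the factor when scoring.

```python
def parse_boosts(query):
    """Split a query into (term, boost) pairs, defaulting the boost to 1."""
    result = []
    for token in query.split():
        term, sep, factor = token.partition("^")
        # Without a caret, the default boost factor of 1 applies.
        boost = float(factor) if sep else 1.0
        if boost <= 0:
            raise ValueError("boost factor must be positive")
        result.append((term, boost))
    return result


print(parse_boosts("jakarta^4 apache"))
# [('jakarta', 4.0), ('apache', 1.0)]
```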

Boolean operators

Boolean operators allow terms to be combined through logical operations. Lucene supports AND, "+", OR, NOT, and "-" as Boolean operators. (Note: Boolean operators must be written in ALL CAPS.)

OR

The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and matches a document if either of the terms exists in it. This is equivalent to a set union. The symbol || can be used in place of the word OR.

To search for documents that contain either "jakarta apache" or just "jakarta", use the query:

"jakarta apache" jakarta

or

"jakarta apache" OR jakarta

AND

The AND operator matches documents in which both terms appear. This is equivalent to a set intersection. The symbol && can be used in place of the word AND.

To search for documents that contain both "jakarta apache" and "jakarta lucene", use the query:

"jakarta apache" AND "jakarta lucene"

+

The "+" operator, also called the required operator, requires that the term after the "+" symbol exist somewhere in a field of the document.

To search for documents that must contain "jakarta" and may contain "lucene", use the query:

+jakarta lucene

NOT

The NOT operator excludes documents that contain the term after NOT. This is equivalent to a set difference. The symbol ! can be used in place of the word NOT.

To search for documents that contain "jakarta apache" but not "jakarta lucene", use the query:

"jakarta apache" NOT "jakarta lucene"

Note: The NOT operator cannot be used with just one term. For example, the following query returns no results:

NOT "jakarta apache"
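The set analogies used for OR, AND, and NOT can be made concrete with Python sets. The tiny index below is invented for illustration (the document ids and the variable names are assumptions), and single words stand in for terms; it is a sketch of the set semantics, not of Lucene's query execution.

```python
# Toy inverted index: term -> ids of the documents that contain it.
index = {
    "jakarta": {1, 2, 3},
    "apache": {1, 3, 4},
    "lucene": {2, 3},
}

either = index["jakarta"] | index["apache"]   # jakarta OR apache: union
both = index["jakarta"] & index["apache"]     # jakarta AND apache: intersection
without = index["jakarta"] - index["lucene"]  # jakarta NOT lucene: difference

print(sorted(either))   # [1, 2, 3, 4]
print(sorted(both))     # [1, 3]
print(sorted(without))  # [1]
```

Seen this way, a query consisting only of a NOT clause has no positive set to subtract from, which is consistent with the note above.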

-

The "-" operator, also called the prohibit operator, excludes documents that contain the term after the "-" symbol.

To search for documents that contain "jakarta apache" but not "jakarta lucene", use the query:

"jakarta apache" -"jakarta lucene"

Grouping

Lucene supports using parentheses to group clauses into subqueries. This is useful when you want to control the Boolean logic of a query.

To search for documents that contain either "jakarta" or "apache", and also "website", use the query:

(jakarta OR apache) AND website

This removes any ambiguity and ensures that website must be present, together with at least one of jakarta or apache.

Escaping special characters

Lucene supports escaping the special characters that are part of the query syntax. The current list of special characters is:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape one of these characters, put a backslash (\) before it. For example, to search for (1+1):2, use the query:

\(1\+1\)\:2
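When query text comes from user input, the escaping rule can be applied programmatically. The Python sketch below is one possible way to do it, under assumptions: escape_query is a hypothetical helper, and placing a single backslash in front of && and || (while leaving a lone & or | alone) is my reading of the list above rather than documented behavior.

```python
# Single-character specials from the list above.
SINGLE_SPECIALS = set('+-!(){}[]^"~*?:\\')
# Two-character operators from the list above.
DOUBLE_SPECIALS = ("&&", "||")


def escape_query(text):
    """Return text with a backslash placed before each special sequence."""
    escaped = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DOUBLE_SPECIALS:
            # Assumption: one backslash in front of the two-character operator.
            escaped.append("\\" + pair)
            i += 2
        elif text[i] in SINGLE_SPECIALS:
            escaped.append("\\" + text[i])
            i += 1
        else:
            escaped.append(text[i])
            i += 1
    return "".join(escaped)


print(escape_query("(1+1):2"))  # \(1\+1\)\:2
```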
