The simple example above introduced conditional search over structured data. Now let's look at full-text search: how to find the most relevant documents by matching text in full-text fields. The two most important aspects of full-text search are:
- Similarity calculation
  The ability to sort the results of a given query by how relevant they are, whether relevance is calculated with TF/IDF (see [Relevance-Intro]), geographical proximity, fuzzy similarity, or some other algorithm.
- Text analysis
  The process of splitting a block of text into distinct, normalized tokens, which are used (a) to build the inverted index and (b) to query the inverted index.
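As a rough illustration of the analysis step, the following Python sketch splits and lowercases text, builds a tiny inverted index, and then runs the same analysis on a query string before looking it up. The analyze and build_inverted_index helpers and the sample documents are hypothetical, and real analyzers do far more (stemming, stop words, and so on).

```python
import re
from collections import defaultdict

def analyze(text):
    """Split text into alphanumeric tokens and lowercase each one."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def build_inverted_index(docs):
    """(a) Map each normalized term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "The quick brown fox",
    2: "Quick foxes jump",
    3: "A lazy brown dog",
}
index = build_inverted_index(docs)

# (b) The query text goes through the same analysis before the lookup.
query_terms = analyze("QUICK Brown")
matches = {term: sorted(index.get(term, set())) for term in query_terms}
print(matches)
```

Because both the indexed text and the query text pass through the same analyze function, "QUICK" in the query finds documents that contain "quick" or "Quick".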
Whenever we talk about similarity calculation or text analysis, we are talking about queries, not filters.
In full-text search, although every query calculates some kind of similarity score, not every query has a text analysis phase. Apart from specialized queries such as bool (a Boolean combination of other queries) and function_score (which adjusts scores numerically), which do not operate on text at all, text queries fall into two families:
- Term-based queries
  Low-level queries such as term and fuzzy have no text analysis phase; they operate on a single term. For example, a term query for the term "Foo" looks up that exact term in the inverted index and calculates a TF/IDF similarity score for each document that contains it.
  Remember: a term query for "Foo" looks only for the exact term in the inverted index, so it does not match "foo" or "FOO". How the term got into the index makes no difference. Whether you index ["Foo","Bar"] into a not_analyzed field or index "Foo Bar" into a field that uses the whitespace analyzer, both produce the same two terms in the inverted index: "Foo" and "Bar".
- Full-text queries
  High-level queries such as match and query_string understand the mapping of the field they are querying:
  * On a date or integer field, the query text is treated as a date or an integer, respectively.
  * On an exact-value (not_analyzed) string field, the whole query text is treated as a single term.
  * On a full-text (analyzed) field, the query text is first passed through the appropriate analyzer to produce the list of terms to query.
  Once the query has its list of terms, it executes the appropriate low-level query for each term and then combines the results to calculate the similarity score of each document. We will cover this process in detail in later chapters.
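The difference between the two families can be sketched in a few lines of Python. This is a toy model, not Elasticsearch's implementation: the index is a plain dict, term_query does an exact lookup with no analysis, and match_query lowercases and splits the query text (standing in for an analyzer) before running one term_query per term and summing raw term frequencies instead of real TF/IDF. All names here are hypothetical.

```python
from collections import Counter

# Toy inverted index for an analyzed field: term -> {doc_id: term frequency}.
# The terms were lowercased at index time.
INDEX = {
    "foo": {1: 2, 2: 1},
    "bar": {1: 1, 3: 1},
}

def term_query(term):
    """Low-level query: exact lookup of a single term, no analysis."""
    return dict(INDEX.get(term, {}))

def match_query(text):
    """High-level query: analyze the text, then run one term query per term."""
    scores = Counter()
    for term in text.lower().split():        # stand-in for a real analyzer
        for doc_id, tf in term_query(term).items():
            scores[doc_id] += tf             # stand-in for TF/IDF scoring
    return dict(scores)

exact_miss = term_query("Foo")    # the exact term "Foo" is not in the index
exact_hit = term_query("foo")
full_text = match_query("Foo Bar")
print(exact_miss, exact_hit, full_text)
```

In this sketch the term query for "Foo" finds nothing because only the lowercased term exists in the index, while the match query analyzes "Foo Bar" into "foo" and "bar" and combines the per-term results into one score per document.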
You will seldom need to use term-based queries directly. More often you will use the higher-level full-text queries, which are more convenient and use term-based queries internally.
When you do want to query a not_analyzed field for an exact value, consider whether you really need a query or whether a filter would do. Single-term queries usually express a binary yes|no question, so a filter expresses them better, and filters also benefit from caching:
GET /_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "gender": "female" }
      }
    }
  }
}
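To see why a yes/no condition suits a filter, consider this toy Python model. It is only a sketch of the idea, not how Elasticsearch implements its filter cache: term_filter (a hypothetical helper) returns the set of matching document IDs with no scoring, and functools.lru_cache stands in for the cache so that repeating the same filter does not rescan the documents.

```python
from functools import lru_cache

# Hypothetical sample documents, keyed by document ID.
DOCS = {
    1: {"gender": "female"},
    2: {"gender": "male"},
    3: {"gender": "female"},
}

@lru_cache(maxsize=None)
def term_filter(field, value):
    """Return the IDs of documents whose field equals value exactly.

    The result is a plain yes/no membership set with no score attached,
    which is what makes it cheap to cache and reuse.
    """
    return frozenset(
        doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value
    )

first = term_filter("gender", "female")    # scans the documents
second = term_filter("gender", "female")   # served from the cache
info = term_filter.cache_info()
print(sorted(first), info.hits, info.misses)
```

The second call returns the cached set without touching the documents, which is the benefit a scoring query cannot get as easily, since its scores depend on more than a simple match or no match.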
Original link: http://www.callmer.com /? P = 43