Full-Text Search: Introduction (Elasticsearch: The Definitive Guide, Translation)

Source: Internet
Author: User
Tags: idf
Through the simple examples above, we have learned how to search structured data by exact conditions. Now let's look at full-text search: how to find the most relevant documents by matching the text in their fields. The two most important aspects of full-text search are:
Similarity Calculation
The ability to rank the results of a given query by how relevant they are, whether that is calculated with TF/IDF (see [Relevance-Intro]), geographical proximity, fuzzy similarity, or some other algorithm.
Text Analysis
The process of splitting a block of text into distinct, normalized tokens, which are used (a) to build the inverted index or (b) to query the inverted index.
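To make the two uses of text analysis concrete, here is a minimal sketch in Python. It is not how Elasticsearch is implemented: the analyze function is a toy stand-in for a real analyzer, and the sample documents and function names are assumptions made up for illustration.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: split on non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def build_inverted_index(docs):
    """(a) Index time: map each term to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents.
docs = {1: "The quick brown fox", 2: "Quick foxes jump", 3: "A brown dog"}
index = build_inverted_index(docs)

# (b) Query time: the query text goes through the same analyzer before lookup.
hits = [index.get(term, []) for term in analyze("QUICK brown")]
print(hits)  # [[1, 2], [1, 3]]
```

Because the same analyzer runs at index time and at query time, the query text "QUICK brown" finds documents that were indexed with "quick" and "Quick".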
As soon as we discuss similarity calculation or text analysis, we are talking about queries rather than filters. In full-text search, every query performs some kind of similarity calculation, but not every query has a text analysis phase. Some specialized queries, such as bool and function_score, do not operate on text at all: bool combines other queries, and function_score modifies their scores. Textual queries can be divided into two types:
Term-Based Queries
Low-level queries such as term and fuzzy have no analysis phase; they operate on a single term. For example, a term query for the term "Foo" looks up that exact term in the inverted index and calculates a TF/IDF similarity score for each document that contains it. Remember: a term query for "Foo" searches the inverted index for the exact term only and will not match variants such as "foo" or "FOO". How the term got into the index does not matter. Whether you index ["Foo","Bar"] into a not_analyzed field or index "Foo Bar" into a field that uses the whitespace analyzer, the inverted index ends up with the same two terms, "Foo" and "Bar".
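The exact-match behavior can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the scoring is a simplified TF/IDF (square root of term frequency times a smoothed inverse document frequency), not the full formula Elasticsearch uses.

```python
import math
from collections import Counter

def index_terms(docs_terms):
    """Build {term: {doc_id: term_frequency}} from already-tokenized documents."""
    index = {}
    for doc_id, terms in docs_terms.items():
        for term, tf in Counter(terms).items():
            index.setdefault(term, {})[doc_id] = tf
    return index

def term_query(index, term, num_docs):
    """Look up the exact term (no analysis) and score each hit with a simplified TF/IDF."""
    postings = index.get(term, {})
    if not postings:
        return {}
    idf = 1.0 + math.log(num_docs / (len(postings) + 1.0))
    return {doc_id: math.sqrt(tf) * idf for doc_id, tf in postings.items()}

# Doc 1: the array ["Foo", "Bar"] in a not_analyzed field, one term per value.
# Doc 2: the string "Foo Bar" split on whitespace, giving the same two terms.
docs_terms = {1: ["Foo", "Bar"], 2: "Foo Bar".split()}
index = index_terms(docs_terms)

print(sorted(index))                        # ['Bar', 'Foo']
print(sorted(term_query(index, "Foo", 2)))  # [1, 2]
print(term_query(index, "foo", 2))          # {}
```

Both documents contribute the same two terms, and the lowercase "foo" finds nothing because no analysis is applied to the query term.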
Full-Text Queries
High-level queries such as match and query_string understand the mapping of a field:
* If you query a date or integer field, the query text is treated as a date or an integer, respectively.
* If you query an exact-value (not_analyzed) string field, the whole query text is treated as a single term.
* If you query a full-text (analyzed) string field, the query text is first passed through the appropriate analyzer to produce the list of terms to query.
Once the query has its list of terms, it executes the appropriate low-level query for each term, then combines the results to calculate the final similarity score of each document. We will describe this process in detail in later chapters.
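The mapping-aware behavior described above can be sketched as a small dispatch function. Everything here is an assumption for illustration: the MAPPING table, the field names, and the whitespace-and-lowercase stand-in for an analyzer are hypothetical and far simpler than what Elasticsearch does.

```python
from datetime import date

# Hypothetical mapping: field name -> (type, analyzer name or None).
MAPPING = {
    "published": ("date", None),
    "views": ("integer", None),
    "tag": ("string", None),          # not_analyzed: exact value
    "title": ("string", "standard"),  # analyzed: full text
}

def standard_analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match_terms(field, query_text):
    """Return the values a match-style query would look up for this field."""
    field_type, analyzer = MAPPING[field]
    if field_type == "date":
        return [date.fromisoformat(query_text)]  # treat the text as a date
    if field_type == "integer":
        return [int(query_text)]                 # treat the text as an integer
    if analyzer is None:
        return [query_text]                      # whole text is a single term
    return standard_analyze(query_text)          # analyzed into several terms

print(match_terms("views", "42"))         # [42]
print(match_terms("tag", "Quick Fox"))    # ['Quick Fox']
print(match_terms("title", "Quick Fox"))  # ['quick', 'fox']
```

Each returned value would then be handed to a low-level term lookup, which is why the same query text can behave so differently from one field to another.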
You will rarely need to use term-based queries directly. Usually you want to query full text rather than individual terms, and that is easier with the high-level full-text queries (which use term-based queries internally). When you do want to match an exact value in a not_analyzed field, consider whether you really need a query or a filter. Single-term queries usually pose a binary yes|no question, so they are almost always better expressed as a filter, which can also benefit from filter caching. The example uses the filtered query from Elasticsearch 1.x (deprecated in 2.0 and removed in 5.0 in favor of a bool query with a filter clause):
GET /_search
{
    "query": {
        "filtered": {
            "filter": {
                "term": { "gender": "female" }
            }
        }
    }
}
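Why a yes|no question suits caching can be shown with a small Python sketch. It only models the idea: a filter returns a set of matching document IDs with no score attached, so the result for a given field and value can be stored and reused. The DOCS data and the term_filter function are hypothetical, and Elasticsearch's actual cache works differently internally.

```python
from functools import lru_cache

# Hypothetical documents for illustration.
DOCS = {
    1: {"gender": "female"},
    2: {"gender": "male"},
    3: {"gender": "female"},
}

@lru_cache(maxsize=128)
def term_filter(field, value):
    """Return the IDs of documents whose field equals value exactly; no score is computed."""
    return frozenset(doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value)

first = term_filter("gender", "female")
second = term_filter("gender", "female")  # served from the cache

print(sorted(first))                  # [1, 3]
print(term_filter.cache_info().hits)  # 1
```

A scored query could not be reused this simply, because its result depends on relevance calculations rather than on a plain membership test.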

Blog migrated

Original link: http://www.callmer.com /? P = 43
