[Elasticsearch] Proximity Matching (2): Multivalue Fields, Closeness, and Relevance

Last Update:2014-12-19 Source: Internet

Author: User


Multivalue Fields

Using phrase matching on multi-value fields produces odd behavior:

PUT /my_index/groups/1
{
    "names": [ "John Abraham", "Lincoln Smith" ]
}

Run a phrase query for Abraham Lincoln:

GET /my_index/groups/_search
{
    "query": {
        "match_phrase": {
            "names": "Abraham Lincoln"
        }
    }
}

Surprisingly, the document above matches the query, even though Abraham and Lincoln belong to two different people in the names array. The reason lies in the way Elasticsearch (ES) indexes arrays.

When John Abraham is analyzed, it produces the following:

- Position 1: john
- Position 2: abraham

When Lincoln Smith is analyzed, it produces:

- Position 3: lincoln
- Position 4: smith

In other words, ES produces exactly the same list of tokens for this array as it would for the single string John Abraham Lincoln Smith. Our query looks for abraham directly followed by lincoln. Those two terms exist in the index and sit at adjacent positions, so the query matches.
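The position bookkeeping described above can be imitated in a few lines of Python. This is only a toy model under stated assumptions: lowercasing and whitespace splitting stand in for the real analyzer, and the helper names `index_positions` and `phrase_matches` are hypothetical, not part of any Elasticsearch API.

```python
def index_positions(values, gap=0):
    """Assign a 1-based position to every token across all array values.

    Toy model: lowercase + whitespace split stands in for the analyzer.
    `gap` is the extra distance inserted between consecutive values.
    """
    positions = []
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # no gap before the first value
        for token in value.lower().split():
            pos += 1
            positions.append((pos, token))
    return positions


def phrase_matches(positions, phrase):
    """Return True if the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    by_pos = dict(positions)
    return any(
        all(by_pos.get(start + k) == term for k, term in enumerate(terms))
        for start in by_pos
    )


names = ["John Abraham", "Lincoln Smith"]
positions = index_positions(names)
print(positions)
print(phrase_matches(positions, "Abraham Lincoln"))
```

With no gap between the array values, abraham and lincoln land on positions 2 and 3, so the toy phrase check succeeds just as the real query does.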
Fortunately, there is a simple way to avoid this situation: the position_offset_gap parameter, which is configured in the field mapping:
```
DELETE /my_index/groups/

PUT /my_index/_mapping/groups
{
    "properties": {
        "names": {
            "type": "string",
            "position_offset_gap": 100
        }
    }
}
```
position_offset_gap tells ES to increase the position counter by the given amount for every new element in the array. (In Elasticsearch 2.0 and later, this parameter is called position_increment_gap.) When we index the array of names again, the terms are emitted with the following positions:

- Position 1: john
- Position 2: abraham
- Position 103: lincoln
- Position 104: smith

Now our phrase query no longer matches this document, because abraham and lincoln are 100 positions apart. You would have to add a slop value of 100 for this document to match.
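The effect of the gap can be checked with a small Python sketch. It is a simplified model, not the actual Lucene logic: tokens are produced by lowercasing and splitting on whitespace, only in-order matches are considered, and `index_positions` and `required_slop` are hypothetical helper names.

```python
def index_positions(values, gap):
    """Assign 1-based token positions, adding `gap` before each new array value."""
    positions = []
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.append((pos, token))
    return positions


def required_slop(positions, first, second):
    """Smallest slop that lets `first second` match in order, or None."""
    firsts = [p for p, t in positions if t == first]
    seconds = [p for p, t in positions if t == second]
    # Adjacent terms (difference of 1) need a slop of 0.
    gaps = [s - f - 1 for f in firsts for s in seconds if s > f]
    return min(gaps) if gaps else None


names = ["John Abraham", "Lincoln Smith"]

without_gap = index_positions(names, gap=0)
with_gap = index_positions(names, gap=100)

print(with_gap)
print(required_slop(without_gap, "abraham", "lincoln"))  # 0: plain phrase match
print(required_slop(with_gap, "abraham", "lincoln"))     # 100: needs slop 100
```

In this model a gap of 100 moves lincoln from position 3 to position 103, which is why a plain phrase query stops matching across array elements while a query with a slop of 100 would still match.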
    
Closer Is Better

Whereas a phrase query simply excludes documents that do not contain the exact query phrase, a proximity query (a phrase query with a slop value greater than 0) factors the closeness of the query terms into the final relevance score. By setting a high slop value such as 50 or 100, you can still exclude documents in which the words are too far apart, while giving a higher score to documents in which the words are closer together.

The following proximity query for quick dog matches both documents that contain quick and dog, but gives a higher score to the document in which the two words are nearer to each other:
```
POST /my_index/my_type/_search
{
   "query": {
      "match_phrase": {
         "title": {
            "query": "quick dog",
            "slop":  50
         }
      }
   }
}
```
```
{
  "hits": [
     {
        "_id":      "3",
        "_score":   0.75,
        "_source": {
           "title": "The quick brown fox jumps over the quick dog"
        }
     },
     {
        "_id":      "2",
        "_score":   0.28347334,
        "_source": {
           "title": "The quick brown fox jumps over the lazy dog"
        }
     }
  ]
}
```
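To see why document 3 ranks above document 2, it helps to measure how far apart quick and dog are in each title. The Python sketch below does this for the two titles from the response. The helper `min_distance` is hypothetical, and the `1 / (distance + 1)` weight is an assumption chosen purely for illustration; the real score depends on further factors and will not equal these numbers.

```python
def min_distance(title, first, second):
    """Fewest intervening tokens between `first` and a later `second`, or None."""
    tokens = title.lower().split()
    firsts = [i for i, t in enumerate(tokens) if t == first]
    seconds = [i for i, t in enumerate(tokens) if t == second]
    gaps = [s - f - 1 for f in firsts for s in seconds if s > f]
    return min(gaps) if gaps else None


def proximity_weight(distance):
    """Illustrative weight only: closer terms get a larger value."""
    return 1.0 / (distance + 1)


docs = {
    "2": "The quick brown fox jumps over the lazy dog",
    "3": "The quick brown fox jumps over the quick dog",
}

for doc_id, title in docs.items():
    distance = min_distance(title, "quick", "dog")
    print(doc_id, distance, round(proximity_weight(distance), 3))
```

Both titles fall within the slop of 50, so both match. Only document 3 contains quick immediately before dog, which is what earns it the higher score.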
Using Proximity to Improve Relevance

Although proximity queries are useful, they require every term to be present in the document, which is often too strict. This is the same issue discussed in the Controlling Precision section of the Full-Text Search chapter: if six out of seven terms match, the document is probably relevant enough to show to the user, but the match_phrase query would exclude it.

Instead of treating proximity matching as an absolute requirement, we can treat it as a signal: one of potentially many queries, each of which contributes to the final score of each document (see Most Fields).

The fact that we want to add together the scores of multiple queries implies that we should combine them with a bool query.

We can use a simple match query as a must clause. This query determines which documents are included in the result set, and its minimum_should_match parameter can be used to trim the long tail of marginally relevant results. Then we add other, more specific queries as should clauses. Every should clause that matches increases the relevance of the matching document.
```
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": {
            "query": "quick brown fox",
            "minimum_should_match": "30%"
          }
        }
      },
      "should": {
        "match_phrase": {
          "title": {
            "query": "quick brown fox",
            "slop":  50
          }
        }
      }
    }
  }
}
```
We could, of course, add other queries to the should clause, with each query boosting a specific aspect of relevance.
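As a minimal sketch of how such a request body might be assembled in code, the Python function below builds the same bool query and accepts additional should clauses. The function name `build_proximity_boosted_query` and its parameters are hypothetical. The extra clause used here, an exact phrase match with no slop, is only one example of an additional signal.

```python
import json


def build_proximity_boosted_query(field, text, min_match="30%", slop=50,
                                  extra_should=None):
    """Build a bool query: a loose `match` decides inclusion, `should` clauses add score."""
    should = [{"match_phrase": {field: {"query": text, "slop": slop}}}]
    should.extend(extra_should or [])
    return {
        "query": {
            "bool": {
                # Decides which documents are in the result set.
                "must": {
                    "match": {
                        field: {"query": text, "minimum_should_match": min_match}
                    }
                },
                # Optional signals: each matching clause raises the score.
                "should": should,
            }
        }
    }


exact_phrase = {"match_phrase": {"title": {"query": "quick brown fox"}}}
body = build_proximity_boosted_query("title", "quick brown fox",
                                     extra_should=[exact_phrase])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request shown above. Documents matching both the sloppy and the exact phrase clause receive a contribution from each.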

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
