[Elasticsearch] Controlling Relevance (6): the filter, functions and random_score parameters in function_score queries


This article is a translation of part of the "Controlling Relevance" chapter of the official Elasticsearch guide.


Boosting filtered subsets

Let's return to the problem we dealt with in Ignoring TF/IDF, where we wanted to score vacation homes by the number of features each one has. There we wished for a way to use cached filters to influence the score; function_score lets us do exactly that.

So far, the examples have applied a single function to all documents. Now we want to use filters to divide the results into subsets (one filter per feature) and apply a different function to each subset.

The function we apply with each filter is weight, which is similar to the boost parameter accepted by any query. The difference is that weight is not normalized by Lucene into some other floating-point value; it is used as is.

The structure of the query needs to be changed to accommodate multiple functions:

GET /_search
{
  "query": {
    "function_score": {
      "filter": {
        "term": { "City": "Barcelona" }
      },
      "functions": [
        { "filter": { "term": { "features": "wifi" }},   "weight": 1 },
        { "filter": { "term": { "features": "garden" }}, "weight": 1 },
        { "filter": { "term": { "features": "pool" }},   "weight": 2 }
      ],
      "score_mode": "sum"
    }
  }
}

The new parameters used in this example are explained in the following subsections:

Filter vs Query

The first thing to note is that we specify a filter instead of a query. In this example we do not need full-text search: we just want all documents that have Barcelona in the City field, and that logic is better expressed as a filter than as a query. Every document returned by the filter has a _score of 1. The function_score query accepts either a query or a filter; if neither is specified, it defaults to a match_all query.

functions

The functions key holds a list of functions to apply. Each function in the list can optionally specify a filter, in which case the function is applied only to documents that match that filter. In this example, every document that matches a feature filter receives a weight of 1 (or 2 in the case of pool).
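To make the role of the per-function filter concrete, here is a minimal Python sketch that models each entry in the functions array as a (filter, weight) pair and returns the weights that apply to a single document. It is a simplified model, not Elasticsearch's implementation: documents are assumed to be plain dicts with a features list, a term filter is modeled as a membership test, and the WeightFunction class and matching_weights helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeightFunction:
    # Feature value the filter requires; None models a function with no
    # filter, which applies to every document.
    feature: Optional[str]
    weight: float


# Mirrors the functions array of the example query.
FUNCTIONS = [
    WeightFunction("wifi", 1),
    WeightFunction("garden", 1),
    WeightFunction("pool", 2),
]


def matching_weights(doc: dict, functions: list[WeightFunction]) -> list[float]:
    """Return the weight of every function whose filter matches the document."""
    features = set(doc.get("features", []))
    return [
        f.weight
        for f in functions
        if f.feature is None or f.feature in features
    ]


if __name__ == "__main__":
    doc = {"City": "Barcelona", "features": ["wifi", "pool"]}
    print(matching_weights(doc, FUNCTIONS))  # [1, 2]
```

A home with wifi and a pool matches two of the three filters, so only those two weights take part in the final score; the garden function is simply skipped for that document.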

score_mode

Each function returns a result, and we need a way of reducing these multiple results to a single value that can then be combined with the original _score. This is the job of the score_mode parameter, which accepts the following values:

    • multiply: the function results are multiplied together (the default)
    • sum: the function results are added together
    • avg: the average of all function results is used
    • max: the highest function result is used
    • min: the lowest function result is used
    • first: only the result of the first function that either has no filter or has a filter matching the document is used
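The score_mode values above can be read as different ways of reducing a list of numbers. The following Python sketch models them under a simplifying assumption: results holds only the outputs of the functions whose filter matched, in the order they appear in the functions array. The combine helper is a hypothetical name, and this is an illustration of the list above rather than Elasticsearch's implementation; edge cases such as how Elasticsearch's avg mode treats weights may differ.

```python
import math
from typing import Sequence


def combine(results: Sequence[float], score_mode: str = "multiply") -> float:
    """Reduce the results of the matching functions to a single value."""
    if not results:
        # Assumption: with no matching function the combined factor is 1,
        # so the document keeps its original _score.
        return 1.0
    if score_mode == "multiply":
        return math.prod(results)
    if score_mode == "sum":
        return sum(results)
    if score_mode == "avg":
        return sum(results) / len(results)
    if score_mode == "max":
        return max(results)
    if score_mode == "min":
        return min(results)
    if score_mode == "first":
        # results are already limited to matching functions, so the first
        # entry is the first function with no filter or a matching filter.
        return results[0]
    raise ValueError(f"unknown score_mode: {score_mode}")


if __name__ == "__main__":
    # Weights for a home that has wifi, a garden and a pool.
    results = [1, 1, 2]
    for mode in ("multiply", "sum", "avg", "max", "min", "first"):
        print(mode, combine(results, mode))
```

With the example weights, multiply would give 1 × 1 × 2 = 2 while sum gives 1 + 1 + 2 = 4, which is why the query sets score_mode to sum: each additional feature should add to the score rather than scale it.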

In this example, we want to add up the weights of all matching filters to produce the final score, so we set score_mode to sum.

Documents that do not match any of the filters retain their original _score, which is 1.
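Putting the pieces together, here is a minimal Python sketch that scores a Barcelona document under the example query by enumerating every combination of the three features. It assumes, as described above, that the filter gives each document an original _score of 1, that score_mode sum adds the weights of the matching functions, and that a document matching no function keeps its original _score. It also assumes the combined value is multiplied into the original _score (Elasticsearch's default boost_mode is multiply, which is not discussed in this article). WEIGHTS and final_score are hypothetical names.

```python
from itertools import combinations

# Feature weights from the example query.
WEIGHTS = {"wifi": 1, "garden": 1, "pool": 2}


def final_score(features: set[str], original_score: float = 1.0) -> float:
    """Score one document under the example query (simplified model)."""
    matched = [w for feature, w in WEIGHTS.items() if feature in features]
    if not matched:
        # No filter matched: the document keeps its original _score.
        return original_score
    # score_mode "sum", then multiplied into the original _score.
    return original_score * sum(matched)


if __name__ == "__main__":
    names = list(WEIGHTS)
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            print(sorted(combo), final_score(set(combo)))
```

Because the original _score is 1, the final score is just the sum of the weights, so the highest possible score is 1 + 1 + 2 = 4 for a home that has all three features.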


Random scoring

You may have been wondering what consistently random scoring is, or why you would ever want to use it. The previous example provides a good use case. Every result in that example ends up with a final _score of 1, 2, 3 or 4. There may be only a few vacation homes that score 4, but presumably there will be many that score 2 or 3.

As the owner of the website, you want to give your advertisers as much exposure as possible. With the current query, results with the same _score are returned in the same order every time. It would be better to introduce some randomness, so that documents with the same score get an equal amount of exposure.

We want every user to see a different random order, but we want the same user to see the same order when they click through to page two, page three, and so on. This is what is meant by consistently random.

The random_score function, which outputs a number between 0 and 1, produces consistently random results when it is given the same seed value, such as the user's session ID:
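The idea behind a seeded random score can be sketched without Elasticsearch at all: derive a deterministic pseudo-random value from the seed and the document ID, so that the same seed always produces the same ordering. The Python sketch below is a hypothetical stand-in for random_score, not how Elasticsearch computes it; it assumes SHA-256 of "seed:doc_id" as the source of randomness, and random_score and ordered are illustrative names.

```python
import hashlib


def random_score(seed: str, doc_id: str) -> float:
    """Deterministic pseudo-random value in [0, 1) for a (seed, doc_id) pair."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def ordered(doc_ids: list[str], seed: str) -> list[str]:
    """Sort documents by their seeded random score, highest first."""
    return sorted(doc_ids, key=lambda d: random_score(seed, d), reverse=True)


if __name__ == "__main__":
    docs = [f"home-{i}" for i in range(8)]
    first_page_view = ordered(docs, seed="session-abc")
    next_page_view = ordered(docs, seed="session-abc")
    other_user = ordered(docs, seed="session-xyz")
    print(first_page_view == next_page_view)  # True: same seed, same order
    print(other_user)
```

Using the session ID as the seed ties the order to the user: requests within one session reproduce the same order, while a different session almost certainly gets a different one.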

GET /_search
{
  "query": {
    "function_score": {
      "filter": {
        "term": { "City": "Barcelona" }
      },
      "functions": [
        { "filter": { "term": { "features": "wifi" }},   "weight": 1 },
        { "filter": { "term": { "features": "garden" }}, "weight": 1 },
        { "filter": { "term": { "features": "pool" }},   "weight": 2 },
        { "random_score": { "seed": "the user's session ID" }}
      ],
      "score_mode": "sum"
    }
  }
}

The random_score clause has no filter, so it is applied to all documents.
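Because random_score has no filter, its value is added to every document's sum. The Python sketch below models the query above to show the effect: assuming the random value lies in [0, 1), it cannot lift a document past the next whole-number score, so it only shuffles documents that share the same feature weight total. As before, the hash-based random_score, the names score and WEIGHTS, and the default multiply boost_mode with an original _score of 1 are assumptions of this model, not Elasticsearch internals.

```python
import hashlib

# Feature weights from the example query.
WEIGHTS = {"wifi": 1, "garden": 1, "pool": 2}


def random_score(seed: str, doc_id: str) -> float:
    # Hypothetical seeded value in [0, 1): top 53 bits of a SHA-256 digest.
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def score(doc_id: str, features: set[str], seed: str) -> float:
    # score_mode "sum": weights of the matching feature filters plus the
    # unfiltered random_score, which matches every document. The result is
    # multiplied into an original _score of 1.
    weight_sum = sum(w for f, w in WEIGHTS.items() if f in features)
    return 1.0 * (weight_sum + random_score(seed, doc_id))


if __name__ == "__main__":
    homes = {
        "a": {"wifi"},
        "b": {"garden"},
        "c": {"pool"},
        "d": {"wifi", "garden"},
        "e": {"wifi", "garden", "pool"},
        "f": set(),
    }
    seed = "user-42"
    ranked = sorted(homes, key=lambda h: score(h, homes[h], seed), reverse=True)
    for h in ranked:
        print(h, sorted(homes[h]), round(score(h, homes[h], seed), 3))
```

In this model, homes a and b (one point each) or c and d (two points each) can swap places between users, but home e always ranks first. Note also that a home with no features now scores below 1 instead of keeping its original _score, because the unfiltered random_score function always matches.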

Of course, if you index new documents that match the query, the order of the results will change regardless of whether you use consistent random scoring.

