This chapter is translated from the "Controlling Relevance" chapter of the official Elasticsearch guide.
Boosting Filtered Subsets
Let's return to the problem we dealt with in Ignoring TF/IDF, where we wanted to score vacation homes by the number of features each one has. We would like to use cached filters to influence the score, and the function_score query lets us do exactly that.
The examples so far have applied a single function to all documents. Now we want to use filters to divide the results into subsets (one filter per feature) and apply a different function to each subset.
The function we will use in this example is weight, which is similar to the boost parameter accepted by any query. The difference is that Lucene does not normalize weight into some obscure floating-point number; it is used as is.
The structure of the query needs to be changed to accommodate multiple functions:
GET /_search
{
  "query": {
    "function_score": {
      "filter": { "term": { "city": "barcelona" }},
      "functions": [
        { "filter": { "term": { "features": "wifi" }},   "weight": 1 },
        { "filter": { "term": { "features": "garden" }}, "weight": 1 },
        { "filter": { "term": { "features": "pool" }},   "weight": 2 }
      ],
      "score_mode": "sum"
    }
  }
}
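When this request is assembled in application code, a small helper can generate one filter/weight pair per feature. The sketch below is a hypothetical Python helper (`build_feature_query` is not part of any Elasticsearch client); it only builds the JSON body shown above as a dictionary and does not send it anywhere.

```python
import json


def build_feature_query(city, feature_weights, score_mode="sum"):
    """Build a function_score request body with one weight function per feature.

    Hypothetical helper for illustration: `feature_weights` maps a feature
    name to the weight applied to documents whose `features` field contains it.
    """
    functions = [
        {"filter": {"term": {"features": feature}}, "weight": weight}
        for feature, weight in feature_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                # Restrict results to one city; every match starts with _score 1.
                "filter": {"term": {"city": city}},
                "functions": functions,
                "score_mode": score_mode,
            }
        }
    }


body = build_feature_query("barcelona", {"wifi": 1, "garden": 1, "pool": 2})
print(json.dumps(body, indent=2))
```

Generating the functions list from a mapping keeps the weights in one place, so adding a new feature means adding one dictionary entry rather than editing the JSON by hand.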
The new features that appear in the example above are explained in the following subsections:
Filter vs Query
First, note that function_score is given a filter instead of a query. This example does not need full-text search: we just want every document that has barcelona in the city field, and that logic is better expressed as a filter than as a query. Every document returned by the filter has a _score of 1. The function_score query accepts either a query or a filter; if neither is specified, it defaults to the match_all query.
functions
The functions array holds the list of functions to apply. Each function in the array can optionally have its own filter, in which case it is applied only to documents that match that filter. In this example, we apply a weight of 1 (or 2 in the case of pool) to every document that matches the corresponding filter.
score_mode
Each function returns a result, and we need a way to reduce these multiple results to a single value that can be combined with the original _score. The score_mode parameter specifies this reduction and accepts the following values:
- multiply: function results are multiplied together (the default)
- sum: function results are added together
- avg: the average of all function results is used
- max: the highest function result is used
- min: the lowest function result is used
- first: only the result of the first function that either has no filter or has a filter matching the document is used
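The score_mode values above can be modeled as simple reductions over the results of the functions that apply to a document. The sketch below is a simplified illustration, not Elasticsearch's implementation: each function is represented as an `(applies, value)` pair, where `applies` is True when the function has no filter or its filter matches. The names `REDUCERS` and `combine` are hypothetical.

```python
import math
from statistics import mean

# Simplified model of each score_mode value (illustration only).
REDUCERS = {
    "multiply": math.prod,              # product of all results (the default)
    "sum": sum,                         # results added together
    "avg": mean,                        # arithmetic mean of the results
    "max": max,                         # highest result
    "min": min,                         # lowest result
    "first": lambda values: values[0],  # first function that applies
}


def combine(results, score_mode="multiply"):
    """Reduce per-function results to one value.

    `results` lists (applies, value) pairs in the order the functions are
    declared; functions whose filter rejects the document are skipped.
    """
    values = [value for applies, value in results if applies]
    if not values:
        return None  # no function applied: the original _score is kept
    return REDUCERS[score_mode](values)


# A home with wifi and a pool but no garden, using the weights from the query.
results = [(True, 1), (False, 1), (True, 2)]
for mode in REDUCERS:
    print(mode, combine(results, mode))
# multiply 2, sum 3, avg 1.5, max 2, min 1, first 1
```

Filtering out non-matching functions before reducing is what makes first mean "first function that applies" rather than "first function in the list".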
In this example, we want to add up the weights of every matching filter to produce the final score, so we set score_mode to sum.
Documents that do not match any of the filters retain their original _score, which is 1.
Random Scoring
You may be wondering what consistently random scoring is, or why you would ever want to use it. The previous example provides a good use case. Every result of that query receives a final _score of 1, 2, 3, or 4: the weights add up to at most 1 + 1 + 2 = 4, and homes that match no filter keep their _score of 1. There may be only a few homes with a score of 4, but presumably many will score 2 or 3.
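The set of possible scores can be checked by enumerating every combination of the three features. The sketch below models the previous query in plain Python, assuming score_mode sum, that each matching filter contributes its weight, and that, because the filter gives every document a _score of 1, the summed weights become the final score. A home that matches no filter keeps its _score of 1. The function names are hypothetical.

```python
from itertools import combinations

# Weights from the query above: wifi=1, garden=1, pool=2.
WEIGHTS = {"wifi": 1, "garden": 1, "pool": 2}


def final_score(features):
    """Model the final _score of a home with the given set of features."""
    matched = [weight for feature, weight in WEIGHTS.items() if feature in features]
    # No matching filter: the home keeps its original _score of 1.
    return sum(matched) if matched else 1


def possible_scores():
    """Collect the final score of every combination of features."""
    names = list(WEIGHTS)
    return sorted({
        final_score(set(subset))
        for r in range(len(names) + 1)
        for subset in combinations(names, r)
    })


print(possible_scores())  # [1, 2, 3, 4]
```

With only four distinct score levels, many homes inevitably share the same score, which is why the order within each level matters.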
As the owner of the website, you want to give your advertisers as much exposure as possible. With the current query, results with the same _score are returned in the same order every time. It would be better to introduce some randomness, so that all documents with the same score get a similar amount of exposure.
We want every user to see a different random order, but we want the same user to see the same order when clicking through to page 2, page 3, and so on. This is what is meant by consistently random.
The random_score function, which outputs a number between 0 and 1, produces consistently random results when it is given the same seed value, such as the user's session ID:
GET /_search
{
  "query": {
    "function_score": {
      "filter": { "term": { "city": "barcelona" }},
      "functions": [
        { "filter": { "term": { "features": "wifi" }},   "weight": 1 },
        { "filter": { "term": { "features": "garden" }}, "weight": 1 },
        { "filter": { "term": { "features": "pool" }},   "weight": 2 },
        { "random_score": { "seed": "the user's session ID" }}
      ],
      "score_mode": "sum"
    }
  }
}
The random_score clause has no filter, so it is applied to all documents.
Of course, if you index a new document that matches the query, the order of the results will change whether or not you use consistent randomization.
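The idea of consistent randomness can be modeled in plain Python. The sketch below is an illustration only: it derives a per-document value in [0, 1) by hashing the seed together with a document ID, which is an assumed stand-in and not how Elasticsearch computes random_score internally. It adds that value to the summed feature weights (score_mode sum) and pages through the sorted results. The documents, IDs, and function names are hypothetical.

```python
import hashlib

WEIGHTS = {"wifi": 1, "garden": 1, "pool": 2}  # weights from the query above


def consistent_random(seed, doc_id):
    """Float in [0, 1) that depends only on (seed, doc_id).

    Stand-in for random_score to illustrate consistency; Elasticsearch
    uses its own internal algorithm.
    """
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the result stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def score(doc, seed):
    """Summed weights of matching filters plus the unfiltered random value."""
    weights = sum(w for feature, w in WEIGHTS.items() if feature in doc["features"])
    return weights + consistent_random(seed, doc["id"])


def ranked_page(docs, seed, page, size):
    """IDs on a zero-based page of results sorted by descending score."""
    ordered = sorted(docs, key=lambda doc: score(doc, seed), reverse=True)
    start = page * size
    return [doc["id"] for doc in ordered[start:start + size]]


# Hypothetical homes: a and b score 1, c and d score 3, e scores 4.
docs = [
    {"id": "a", "features": {"wifi"}},
    {"id": "b", "features": {"garden"}},
    {"id": "c", "features": {"wifi", "pool"}},
    {"id": "d", "features": {"garden", "pool"}},
    {"id": "e", "features": {"wifi", "garden", "pool"}},
]

# The same session ID yields the same first page on every request.
print(ranked_page(docs, "session-1", 0, 2))
# A different session ID may order homes with equal weights differently.
print(ranked_page(docs, "session-2", 0, 2))
```

In this model the random value is always below 1 and the weights are whole numbers, so it only shuffles homes within the same score level. Reusing the seed reproduces the same order on every page, while appending a new matching document to `docs` can shift which IDs land on each page, just as indexing a new document does.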
[Elasticsearch] Controlling Relevance (6): the filter, functions, and random_score parameters of the function_score query