Aggregation scope (scoping aggregations)
In the aggregation examples so far, you may have noticed that we omitted the query clause from the search request. The entire request was just a simple aggregation.
Aggregations can be run together with search requests, but you need to understand a new concept first: scope. By default, aggregations operate in the same scope as the query. In other words, aggregations are calculated on the set of documents that match the query.
Let's take a look at a previous aggregation example:
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : { "field" : "color" }
        }
    }
}
You can see that the aggregation stands alone, with no query. In Elasticsearch, not specifying a query is equivalent to querying all documents. Internally, the request above is converted to the following:
GET /cars/transactions/_search?search_type=count
{
    "query" : {
        "match_all" : {}
    },
    "aggs" : {
        "colors" : {
            "terms" : { "field" : "color" }
        }
    }
}
Aggregations always operate in the scope of the query, so an isolated aggregation really operates in the scope of a match_all query, that is, on all documents.
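The equivalence can be sketched in a few lines of Python. The helper name `with_default_query` is hypothetical and is not part of any Elasticsearch client; it only mimics the rule described above by filling in `match_all` when a request body has no query clause.

```python
import copy

def with_default_query(body):
    """Return a copy of a search body with match_all filled in when no query is given.

    Hypothetical helper for illustration only; Elasticsearch applies this
    default internally, so real requests never need it.
    """
    normalized = copy.deepcopy(body)
    # setdefault leaves an explicit query untouched and only fills the gap.
    normalized.setdefault("query", {"match_all": {}})
    return normalized

# The isolated aggregation from the example above, as a Python dict.
orphaned = {"aggs": {"colors": {"terms": {"field": "color"}}}}
explicit = with_default_query(orphaned)
```

Both `orphaned` and `explicit` describe the same search: the aggregation definition is unchanged, and only the implicit scope has been made visible.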
Once we understand scope, we can start customizing aggregations further. All of the previous examples calculated statistics over all of the data: the top-selling cars, the average price of all cars, the most sales per month, and so on.
With scope, we can ask questions such as "How many colors are Ford cars available in?" We do this by simply adding a query to the request (in this case, a match query):
GET /cars/transactions/_search
{
    "query" : {
        "match" : { "make" : "Ford" }
    },
    "aggs" : {
        "colors" : {
            "terms" : { "field" : "color" }
        }
    }
}
Because we omitted search_type=count, the response contains both the search hits and the aggregation results:
{
    ...
    "hits" : {
        "total" : 2,
        "max_score" : 1.6931472,
        "hits" : [
            {
                "_source" : { "price" : 25000, "color" : "Blue", "make" : "Ford", "sold" : "2014-02-12" }
            },
            {
                "_source" : { "price" : 30000, "color" : "green", "make" : "Ford", "sold" : "2014-05-18" }
            }
        ]
    },
    "aggregations" : {
        "colors" : {
            "buckets" : [
                { "key" : "Blue", "doc_count" : 1 },
                { "key" : "green", "doc_count" : 1 }
            ]
        }
    }
}
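To make the effect of scope concrete, here is a minimal in-memory sketch of the same idea in Python. It does not talk to Elasticsearch. The two Ford documents are taken from the response above; the Honda and Toyota documents, and the function names `terms_agg` and `search`, are assumptions made up for this illustration. The match query is simplified to an exact comparison on the make field.

```python
from collections import Counter

# Sample data: the two Ford documents come from the response above;
# the Honda and Toyota documents are invented for illustration only.
docs = [
    {"price": 25000, "color": "Blue", "make": "Ford"},
    {"price": 30000, "color": "green", "make": "Ford"},
    {"price": 10000, "color": "red", "make": "Honda"},
    {"price": 15000, "color": "Blue", "make": "Toyota"},
]

def terms_agg(documents, field):
    """Count documents per distinct value of `field`, shaped like terms buckets."""
    counts = Counter(doc[field] for doc in documents)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]

def search(documents, make=None):
    """Apply the query first, then aggregate over the matching documents only."""
    # No `make` means no query, which behaves like match_all.
    hits = [d for d in documents if make is None or d["make"] == make]
    return {"hits": hits, "colors": terms_agg(hits, "color")}

ford = search(docs, make="Ford")   # aggregation scoped to Ford documents
everything = search(docs)          # aggregation scoped to all documents
```

The aggregation code is identical in both calls. Only the set of documents handed to it changes, which is exactly what the query scope controls.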
This may not seem like much, but it is the key to building powerful, advanced dashboards. You can turn any static dashboard into a real-time data exploration tool by adding a search bar. Users can then search for terms and watch all of the charts (which are powered by aggregations and therefore scoped to the query) update in real time. Take that, Hadoop!
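A search-bar dashboard of this kind could be wired up roughly as follows. This is only a sketch: the function `dashboard_request`, the constant `CHART_AGGS`, and the choice to search the make field are assumptions for illustration, and sending the body to the cluster is left out.

```python
# The chart definitions stay fixed; only the query changes with the search bar.
CHART_AGGS = {"colors": {"terms": {"field": "color"}}}

def dashboard_request(search_text, field="make"):
    """Build a search body whose aggregations are scoped by the user's search text.

    Hypothetical helper: searching the `make` field is an assumption of this sketch.
    """
    body = {"aggs": CHART_AGGS}
    text = search_text.strip()
    if text:
        # A non-empty search narrows the scope of every chart at once.
        body["query"] = {"match": {field: text}}
    # With an empty search bar no query is added, so the charts cover all documents.
    return body

ford_view = dashboard_request("Ford")
full_view = dashboard_request("")
```

Because the aggregations never change, every chart on the page follows the search bar without any chart-specific logic.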
Global Buckets
You often need your aggregations and queries to have the same scope. But sometimes you also need to search a subset of the data and aggregate on all the data.
For example, say you want to know the average price of Ford cars compared to the average price of all cars. We can use a regular aggregation (scoped to the query) to get the average price of Ford cars. The average price of all cars can be obtained through a global bucket.
The global bucket contains all of your documents, regardless of the query scope; it bypasses the scope completely. Because it is a bucket, you can nest aggregations inside it as usual:
GET /cars/transactions/_search?search_type=count
{
    "query" : {
        "match" : { "make" : "Ford" }
    },
    "aggs" : {
        "single_avg_price" : {
            "avg" : { "field" : "price" }
        },
        "all" : {
            "global" : {},
            "aggs" : {
                "avg_price" : {
                    "avg" : { "field" : "price" }
                }
            }
        }
    }
}
The single_avg_price aggregation operates in the query scope (for example, all documents matching Ford). The global bucket takes no parameters. The avg_price aggregation nested inside it operates on all documents, regardless of the make.
The single_avg_price metric is calculated based on the query scope (that is, all Ford cars). The avg_price metric is nested under a global bucket, which means that it ignores scoping entirely and is calculated over all documents. The average returned for that aggregation therefore represents the average price of all cars.
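The difference between the two averages can be illustrated with a small in-memory Python sketch. It does not call Elasticsearch; the sample documents and the function names `avg` and `search_with_global` are invented for this illustration, and the match query is simplified to an exact comparison on the make field.

```python
# Hypothetical sample data, invented for illustration only.
docs = [
    {"make": "Ford", "price": 25000},
    {"make": "Ford", "price": 30000},
    {"make": "Honda", "price": 10000},
    {"make": "Toyota", "price": 15000},
]

def avg(documents, field):
    """Average of `field` over the given documents, like an avg metric."""
    values = [d[field] for d in documents]
    return sum(values) / len(values) if values else None

def search_with_global(documents, make):
    """Compute one average inside the query scope and one inside a global bucket."""
    in_scope = [d for d in documents if d["make"] == make]  # the query scope
    return {
        # Regular aggregation: sees only documents that match the query.
        "single_avg_price": avg(in_scope, "price"),
        # Global bucket: ignores the query and sees every document.
        "all": {
            "doc_count": len(documents),
            "avg_price": avg(documents, "price"),
        },
    }

result = search_with_global(docs, "Ford")
```

With this sample data, `single_avg_price` is 27500.0 (the two Fords) while `avg_price` under `all` is 20000.0 (all four cars), so a single request yields both sides of the comparison.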
If you have made it this far in the book, you will recognize the mantra: use a filter wherever you can. The same applies to aggregations, and in the next chapter we will look at how to filter an aggregation instead of just limiting the scope of the query.