Real-time search engine Elasticsearch (4): using the aggregations API


The previous post introduced the simple query API in ES; this post describes the aggregation API that ES provides. The aggregation features of ES can be used for simple data analysis. The examples again use the data from the previous post, shown below:

studentNo  name            male    age  birthday    classNo  address                             isLeader
1          Liu Bei         Male    24   1985-02-03  1        Changsha City, Hunan Province       true
2          Guan Yu         Male    22   1987-08-23  2        Chengdu City, Sichuan Province      false
3          Mrs. Mi         Female  19   1990-06-12  1        Shanghai City                       false
4          Zhang Fei       Male    20   1989-07-30  3        Beijing                             false
5          Zhuge Liang     Male    18   1992-04-27  2        Nanjing City, Jiangsu Province      true
6          Sun Shangxiang  Female  16   1994-05-21  3        (none)                              false
7          Chao            Male    19   1991-10-20  1        Harbin City, Heilongjiang Province  false
8          Zhao            Male    23   1986-10-26  2        Hangzhou City, Zhejiang Province    false

The main contents of this article are:

- using the metric API
- using the bucketing API
- nesting the two types of API

1. Aggregation API

The aggregations API in ES grew out of the facets feature. The official site is in the process of replacing facets and recommends that users use the aggregations API instead of the facets API. Aggregations in ES fall into two categories:

- Metric aggregations: mainly applied to numeric data; they require ES to do more computation.
- Bucketing aggregations: divide the data into different "buckets" and assign each record to a bucket, much like GROUP BY in SQL.

A metric aggregation can be applied to the whole data set, or used as a sub-aggregation of a bucketing aggregation, in which case it is applied to the data in each bucket. Of course, the whole data set can be seen as one big bucket to which all the data is assigned.

The call format for the aggregation API in ES is as follows:

"aggregations": {                  // denotes an aggregation operation; "aggs" can be used instead
    "<aggregation_name>": {        // aggregation name, any string; it is the key in the response, which makes the result easy to find
        "<aggregation_type>": {    // aggregation type, such as min
            <aggregation_body>     // aggregation body; different aggregations have different bodies
        }
        [, "aggregations": { [<sub_aggregation>]+ } ]?   // nested sub-aggregations, 0 or more
    }
    [, "<aggregation_name_2>": { ... } ]*   // additional aggregations, 0 or more
}
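
As a rough illustration of the call format above, the following Python sketch assembles a request body as a plain dictionary. The helper build_agg is hypothetical (it is not part of any ES client), and the field names classNo and age are assumed from the sample data.

```python
import json


def build_agg(name, agg_type, body, sub_aggs=None):
    """Return {name: {agg_type: body[, "aggs": sub_aggs]}} per the call format."""
    node = {agg_type: body}
    if sub_aggs:
        node["aggs"] = sub_aggs  # nested sub-aggregations are optional
    return {name: node}


# One terms bucket per class, with a nested max metric on age.
request_body = {
    "aggs": build_agg(
        "per_class", "terms", {"field": "classNo"},
        sub_aggs=build_agg("max_age", "max", {"field": "age"}),
    )
}

print(json.dumps(request_body, indent=2))
```

The printed JSON is what would be sent as the body of a _search request.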

1.1 Metric aggregations

(1) Min Aggregation

Finds the minimum value of a numeric field. The following looks up the minimum age in class 2.

curl -XPOST "192.168.1.101:9200/student/student/_search" -d '
{
  "query": {         // a query can be used first to select the data set you need
    "term": {
      "classNo": "2"
    }
  },
  "aggs": {
    "min_age": {
      "min": {
        "field": "age"
      }
    }
  }
}
'
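
To make the two steps of this request concrete, here is a small pure-Python sketch that mimics it on the sample table: first the query narrows the data set, then the min metric runs over what is left. It only imitates the logic and does not talk to ES.

```python
# (studentNo, age, classNo) taken from the sample table
STUDENTS = [
    (1, 24, "1"), (2, 22, "2"), (3, 19, "1"), (4, 20, "3"),
    (5, 18, "2"), (6, 16, "3"), (7, 19, "1"), (8, 23, "2"),
]

# "query" step: keep only the records whose classNo is "2"
hits = [s for s in STUDENTS if s[2] == "2"]

# "aggs" step: the min metric runs over the filtered records only
min_age = min(age for _, age, _ in hits)

print(len(hits), min_age)  # 3 18
```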

The results of the query are:

{
  "took": 19,                          // the first part is the same as an ordinary query response
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1.4054651,
    "hits": [
      {
        "_index": "student", "_type": "student", "_id": "2", "_score": 1.4054651,
        "_source": { "studentNo": "2", "name": "Guan Yu", "male": "Male", "age": "22", "birthday": "1987-08-23", "classNo": "2", "isLeader": "false" }
      },
      {
        "_index": "student", "_type": "student", "_id": "8", "_score": 1,
        "_source": { "studentNo": "8", "name": "Zhao", "male": "Male", "age": "23", "birthday": "1986-10-26", "classNo": "2", "isLeader": "false" }
      },
      {
        "_index": "student", "_type": "student", "_id": "5", "_score": 0.30685282,
        "_source": { "studentNo": "5", "name": "Zhuge Liang", "male": "Male", "age": "18", "birthday": "1992-04-27", "classNo": "2", "isLeader": "true" }
      }
    ]
  },
  "aggregations": {                    // aggregation results
    "min_age": {                       // the aggregation name given in the request
      "value": 18,                     // the aggregated value
      "value_as_string": "18.0"
    }
  }
}

Two points are worth noting about the aggregate query above: a query can be used to filter the data first, and the response includes the complete set of records that the aggregation operates on.

Sometimes we are not interested in the complete set of records and only need the final aggregation result. This can be achieved with the search_type parameter. The following query returns far less data, and ES skips some time-consuming steps during the search, so the query is more efficient.

# note the search_type=count parameter in the URL
curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "query": {             // a query can be used to select the required data set
    "term": {
      "classNo": "2"
    }
  },
  "aggs": {
    "min_age": {
      "min": {
        "field": "age"
      }
    }
  }
}
'

The results of this query are:

{
  ...

  "aggregations": {                 // aggregation results
    "min_age": {                    // the aggregation name given in the request
      "value": 18,                  // the aggregated value
      "value_as_string": "18.0"
    }
  }
}
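
A note for readers on newer releases: as far as I know, search_type=count was deprecated in ES 2.0 and removed in 5.0, and the usual replacement is to put "size": 0 in the request body. The sketch below only builds such a body in Python and prints it; the field names are assumed from the sample data and nothing is sent to a server.

```python
import json

body = {
    "size": 0,  # return no hits, only the aggregation results
    "query": {"term": {"classNo": "2"}},
    "aggs": {"min_age": {"min": {"field": "age"}}},
}

payload = json.dumps(body)
print(payload)
```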

(2) Max Aggregation

Finds the maximum value. The following finds the maximum age in class 2; the result is 23.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "query": {
    "term": {
      "classNo": "2"
    }
  },
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}
'

(3) Sum Aggregation

Sums numeric values. The following computes the total age of class 2; the result is 63.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "query": {
    "term": {
      "classNo": "2"
    }
  },
  "aggs": {
    "sum_age": {
      "sum": {
        "field": "age"
      }
    }
  }
}
'

(4) AVG Aggregation

Computes the average. The following computes the average age of class 2; the result is 21.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "query": {
    "term": {
      "classNo": "2"
    }
  },
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age"
      }
    }
  }
}
'
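
The max, sum and avg results quoted above for class 2 can be checked by hand. This short Python sketch recomputes them from the ages in the sample table; it is only a sanity check, not how ES computes them internally.

```python
# Ages per class, taken from the sample table
ages_by_class = {"1": [24, 19, 19], "2": [22, 18, 23], "3": [20, 16]}

class_2 = ages_by_class["2"]
max_age = max(class_2)
sum_age = sum(class_2)
avg_age = sum_age / len(class_2)

print(max_age, sum_age, avg_age)  # 23 63 21.0
```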

(5) Stats Aggregation

Computes the commonly used statistics of a field in a single request. The following computes simple statistics on the age of all students in the school.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "stats_age": {
      "stats": {
        "field": "age"
      }
    }
  }
}
'

The results of the query are:

{
  ...                          // other data omitted

  "aggregations": {
    "stats_age": {
      "count": 8,              // number of students with age data
      "min": 16,               // minimum age
      "max": 24,               // maximum age
      "avg": 20.125,           // average age
      "sum": 161,              // sum of ages
      "min_as_string": "16.0",
      "max_as_string": "24.0",
      "avg_as_string": "20.125",
      "sum_as_string": "161.0"
    }
  }
}
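
For comparison, the same five statistics can be recomputed from the sample table with a few lines of Python. This is only a cross-check of the numbers in the response above.

```python
# Ages of all 8 students in the sample table
ages = [24, 22, 19, 20, 18, 16, 19, 23]

stats = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "avg": sum(ages) / len(ages),
    "sum": sum(ages),
}

print(stats)
```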

(6) Top Hits Aggregation

Returns the first n records that match the criteria. The following finds the two oldest students in the school and returns only their names and ages.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "top_age": {
      "top_hits": {
        "sort": [                   // sort
          {
            "age": {                // by age, descending
              "order": "desc"
            }
          }
        ],
        "_source": {
          "include": [              // fields to return
            "name",
            "age"
          ]
        },
        "size": 2                   // take the first 2 records
      }
    }
  }
}
'

The result returned is:

{
  ...

  "aggregations": {
    "top_age": {
      "hits": {
        "total": 9,
        "max_score": null,
        "hits": [
          {
            "_index": "student",
            "_type": "student",
            "_id": "1",
            "_score": null,
            "_source": {
              "name": "Liu Bei",
              "age": "24"
            },
            "sort": [
              24
            ]
          },
          {
            "_index": "student",
            "_type": "student",
            "_id": "8",
            "_score": null,
            "_source": {
              "name": "Zhao",
              "age": "23"
            },
            "sort": [
              23
            ]
          }
        ]
      }
    }
  }
}
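
The three parts of the top_hits body (sort, _source, size) map onto three simple steps. The Python sketch below imitates them on the sample table; it is a plain in-memory imitation, not the ES implementation.

```python
STUDENTS = [
    {"name": "Liu Bei", "age": 24, "classNo": "1"},
    {"name": "Guan Yu", "age": 22, "classNo": "2"},
    {"name": "Mrs. Mi", "age": 19, "classNo": "1"},
    {"name": "Zhang Fei", "age": 20, "classNo": "3"},
    {"name": "Zhuge Liang", "age": 18, "classNo": "2"},
    {"name": "Sun Shangxiang", "age": 16, "classNo": "3"},
    {"name": "Chao", "age": 19, "classNo": "1"},
    {"name": "Zhao", "age": 23, "classNo": "2"},
]

# "sort": order by age, descending; "size": keep the first 2
top = sorted(STUDENTS, key=lambda s: s["age"], reverse=True)[:2]

# "_source": return only the name and age fields
top_source = [{"name": s["name"], "age": s["age"]} for s in top]

print(top_source)
```

The two records it keeps are Liu Bei (24) and Zhao (23), matching the response above.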

1.2 Bucketing aggregations

(1) Terms Aggregation

Groups the data into buckets according to the values of one or more specified fields, counts the records that fall into each bucket, and sorts the buckets in the specified order. The following counts the students in each class, sorts the classes by student count in descending order, and takes the top 2 classes.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "terms_classNo": {
      "terms": {
        "field": "classNo",            // group by class number
        "order": {                     // sort by student count, descending
          "_count": "desc"
        },
        "size": 2                      // take the top two
      }
    }
  }
}
'
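
In SQL terms this request is a GROUP BY with a COUNT, an ORDER BY and a LIMIT. A small Python sketch of the same idea on the sample table, using collections.Counter:

```python
from collections import Counter

# classNo of students 1..8, taken from the sample table
class_nos = ["1", "2", "1", "3", "2", "3", "1", "2"]

counts = Counter(class_nos)      # one bucket per distinct classNo
top_2 = counts.most_common(2)    # order by count descending, size 2

print(top_2)
```

Classes 1 and 2 both have 3 students and class 3 has 2, so the two buckets returned are classes 1 and 2.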

Note that the top-2 result is actually an approximation, which follows from the way ES implements the terms aggregation. If you need exact values, you can leave size unrestricted so that all buckets are returned fully sorted, and then take the first 2 records in your own program. Of course, this makes ES do a lot more sorting work, so it is less efficient.
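
Why the result can be approximate: each shard reports only its own top terms, and the partial lists are merged afterwards. The Python sketch below is a deliberately simplified model of that behaviour with made-up shard contents (the terms A to D and their counts are hypothetical); it ignores details such as the shard_size setting.

```python
from collections import Counter


def approx_top_terms(shards, size):
    """Merge only the top `size` terms reported by each shard."""
    merged = Counter()
    for shard in shards:
        for term, count in shard.most_common(size):
            merged[term] += count
    return merged.most_common(size)


def exact_top_terms(shards, size):
    """Merge the full counts of every shard, then take the top `size`."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(size)


shards = [
    Counter({"A": 10, "B": 8, "C": 7}),
    Counter({"C": 9, "D": 8, "A": 3}),
]

approx = approx_top_terms(shards, 2)
exact = exact_top_terms(shards, 2)

print(approx)  # [('A', 10), ('C', 9)]
print(exact)   # [('C', 16), ('A', 13)]
```

In this toy case both the counts and the order differ, because shard 1 never reported C and shard 2 never reported A.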

(2) Range Aggregation

A range aggregation with custom intervals: we define the intervals manually, and ES assigns each record to the matching interval. The following divides all students in the school into 5 age ranges (under 16, 16~18, 19~21, 22~24, and over 24) and counts the students in each range.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "range_age": {
      "range": {
        "field": "age",
        "ranges": [                 // "from" is inclusive, "to" is exclusive
          {
            "to": 16
          },
          {
            "from": 16,
            "to": 19
          },
          {
            "from": 19,
            "to": 22
          },
          {
            "from": 22,
            "to": 25
          },
          {
            "from": 25
          }
        ]
      }
    }
  }
}
'
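
In a range aggregation the from bound is inclusive and the to bound is exclusive, so the five age ranges from the text (under 16, 16~18, 19~21, 22~24, over 24) are written here as half-open intervals on whole-number ages. The bounds in this Python sketch are my own encoding of those ranges.

```python
AGES = [24, 22, 19, 20, 18, 16, 19, 23]  # from the sample table

# (from, to) pairs; None means the bound is open
RANGES = [(None, 16), (16, 19), (19, 22), (22, 25), (25, None)]


def in_range(value, lower, upper):
    """True if lower <= value < upper, treating None as unbounded."""
    return (lower is None or value >= lower) and (upper is None or value < upper)


counts = [sum(in_range(age, lo, hi) for age in AGES) for lo, hi in RANGES]

print(counts)  # [0, 2, 3, 3, 0]
```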

(3) Date Range Aggregation

The date range aggregation is dedicated to date fields. Its main difference from the range aggregation is that it accepts date math expressions. These mainly consist of + (addition), - (subtraction) and / (rounding) operations, each of which can be applied to different time units. Some examples of date math expressions:

- now+10y: 10 years from now.
- now+10M: 10 months from now.
- 1990-01-10||+20y: 20 years after 1990-01-10, that is, 2010-01-10.
- now/y: rounds to the year. If today is 2015-09-06, this expression evaluates to 2015-01-01. It is called a rounding operation, but the result is actually rounded down (floored). I do not know why; perhaps I have misunderstood something -_-!!
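
To make the date math examples concrete, here is a minimal Python sketch of the three operations on calendar dates. The helper names are hypothetical, and edge cases such as Feb 29 or month-end overflow are not handled; ES itself works on timestamps rather than plain dates.

```python
from datetime import date


def add_years(d, years):
    """d + Ny (raises ValueError for Feb 29 in a non-leap target year)."""
    return d.replace(year=d.year + years)


def add_months(d, months):
    """d + NM (raises ValueError if the day does not exist in the target month)."""
    total = d.year * 12 + (d.month - 1) + months
    return d.replace(year=total // 12, month=total % 12 + 1)


def floor_to_year(d):
    """d/y: round down to the first day of the year."""
    return d.replace(month=1, day=1)


today = date(2015, 9, 6)  # the "today" used in the text

print(add_years(today, 10))             # 2025-09-06
print(add_months(today, 10))            # 2016-07-06
print(add_years(date(1990, 1, 10), 20)) # 2010-01-10
print(floor_to_year(today))             # 2015-01-01
```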

The following counts the students who were born 25 or more years ago.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "range_age": {
      "date_range": {
        "field": "birthday",
        "ranges": [
          {
            "to": "now-25y"
          }
        ]
      }
    }
  }
}
'
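
The result of this request depends on when it is run. As a reproducible illustration, the Python sketch below fixes "now" at 2015-09-06 (the date used in the date math examples, so an assumption on my part) and counts the birthdays that fall before now-25y.

```python
from datetime import date

# Birthdays of students 1..8, taken from the sample table
BIRTHDAYS = [
    date(1985, 2, 3), date(1987, 8, 23), date(1990, 6, 12), date(1989, 7, 30),
    date(1992, 4, 27), date(1994, 5, 21), date(1991, 10, 20), date(1986, 10, 26),
]

now = date(2015, 9, 6)                   # assumed, fixed "now"
upper = now.replace(year=now.year - 25)  # now-25y

# the "to" bound of a range is exclusive
doc_count = sum(birthday < upper for birthday in BIRTHDAYS)

print(upper, doc_count)  # 1990-09-06 5
```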

(4) Histogram Aggregation

Histogram aggregation divides a numeric field into intervals and counts the records that fall into each interval. It is very similar to the range aggregation described above, except that range lets you define arbitrary intervals while histogram uses equal-width intervals. Because the intervals are equal-width, the request must include the width, which is the interval parameter. The following counts the students in each age group, using a width of 2 years.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "histogram_age": {
      "histogram": {
        "field": "age",
        "interval": 2,               // the width is 2
        "min_doc_count": 1           // return only intervals with at least 1 record
      }
    }
  }
}
'
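
Each value is assigned to the bucket whose key is the value rounded down to a multiple of the interval. A short Python sketch of that rule on the sample ages:

```python
from collections import Counter

AGES = [24, 22, 19, 20, 18, 16, 19, 23]  # from the sample table
INTERVAL = 2
MIN_DOC_COUNT = 1

# bucket key = value rounded down to a multiple of the interval
buckets = Counter((age // INTERVAL) * INTERVAL for age in AGES)

result = sorted(
    (key, count) for key, count in buckets.items() if count >= MIN_DOC_COUNT
)

print(result)  # [(16, 1), (18, 3), (20, 1), (22, 2), (24, 1)]
```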

(5) Date histogram Aggregation

Date histogram aggregation is a histogram aggregation dedicated to date fields. This need is very common: statistics are usually computed over a fixed period such as 1 month or 1 year. The following counts the students in the school born in each year.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "date_histogram_birthday": {
      "date_histogram": {
        "field": "birthday",
        "interval": "year",              // one bucket per year
        "format": "yyyy"                 // format of the returned key
      }
    }
  }
}
'

The returned results are as follows. Because of "format": "yyyy" above, the returned key_as_string contains only the year.

{
  "buckets": [
    {
      "key_as_string": "1985",
      "key": 473385600000,
      "doc_count": 1
    },
    {
      "key_as_string": "1986",
      "key": 504921600000,
      "doc_count": 1
    },
    {
      "key_as_string": "1987",
      "key": 536457600000,
      "doc_count": 1
    },
    {
      "key_as_string": "1989",
      "key": 599616000000,
      "doc_count": 1
    },
    {
      "key_as_string": "1990",
      "key": 631152000000,
      "doc_count": 1
    },
    {
      "key_as_string": "1991",
      "key": 662688000000,
      "doc_count": 1
    },
    {
      "key_as_string": "1992",
      "key": 694224000000,
      "doc_count": 1
    },
    {
      "key_as_string": "1994",
      "key": 757382400000,
      "doc_count": 1
    }
  ]
}
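
The key of each bucket is the start of the year as epoch milliseconds in UTC, and key_as_string is that instant rendered with the requested format. The Python sketch below rebuilds the buckets from the sample birthdays as a cross-check; it is an imitation, not the ES code path.

```python
from collections import Counter
from datetime import datetime, timezone

# Birthdays of students 1..8, taken from the sample table
BIRTHDAYS = [
    "1985-02-03", "1987-08-23", "1990-06-12", "1989-07-30",
    "1992-04-27", "1994-05-21", "1991-10-20", "1986-10-26",
]

counts = Counter(int(b[:4]) for b in BIRTHDAYS)  # one bucket per birth year

buckets = [
    {
        "key_as_string": str(year),
        # start of the year in UTC, as epoch milliseconds
        "key": int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp() * 1000),
        "doc_count": counts[year],
    }
    for year in sorted(counts)
]

print(buckets[0])
```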

(6) Missing Aggregation

Missing aggregation counts the records in which a field has no value. It is a single-bucket aggregation, so it always produces exactly one bucket. The following counts the student records whose address field is missing. The result is 1, because only Sun Shangxiang, student number 6, has no address.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "missing_address": {
      "missing": {
        "field": "address"
      }
    }
  }
}
'
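
The equivalent check on the sample table is a one-line count of the records without an address. In this Python sketch a missing value is represented as None, which is my own encoding of the empty cell in the table.

```python
# studentNo -> address, taken from the sample table (None = no address)
ADDRESSES = {
    1: "Changsha City, Hunan Province",
    2: "Chengdu City, Sichuan Province",
    3: "Shanghai City",
    4: "Beijing",
    5: "Nanjing City, Jiangsu Province",
    6: None,
    7: "Harbin City, Heilongjiang Province",
    8: "Hangzhou City, Zhejiang Province",
}

missing = [no for no, address in ADDRESSES.items() if address is None]
doc_count = len(missing)

print(doc_count, missing)  # 1 [6]
```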

1.3 Nested use

As mentioned earlier, aggregations can be nested. Nesting lets a metric aggregation operate on every bucket, so ES nested aggregations can be used for somewhat more complex statistics. The following computes the maximum age in each class.

curl -XPOST "192.168.1.101:9200/student/student/_search?search_type=count" -d '
{
  "aggs": {
    "terms_classNo": {
      "terms": {
        "field": "classNo"
      },
      "aggs": {                 // nest the new sub-aggregation here
        "max_age": {
          "max": {              // aggregate with max
            "field": "age"
          }
        }
      }
    }
  }
}
'
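
To see what the nested request computes, this Python sketch does the same thing in two steps on the sample table: the terms part groups the students by class, and the max part then runs inside each group. The output layout is my own and is not the ES response format.

```python
from collections import defaultdict

# (age, classNo) of students 1..8, taken from the sample table
STUDENTS = [
    (24, "1"), (22, "2"), (19, "1"), (20, "3"),
    (18, "2"), (16, "3"), (19, "1"), (23, "2"),
]

# bucketing step: one bucket per classNo
buckets = defaultdict(list)
for age, class_no in STUDENTS:
    buckets[class_no].append(age)

# metric step: max runs separately inside every bucket
max_age_per_class = {
    class_no: {"doc_count": len(ages), "max_age": max(ages)}
    for class_no, ages in sorted(buckets.items())
}

print(max_age_per_class)
```

Class 1 has a maximum age of 24, class 2 of 23, and class 3 of 20.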

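The response contains one bucket per class; each bucket carries its key (the class number), its doc_count, and a max_age object whose value is the maximum age in that class.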
