Use of the Druid query interface

Source: Internet
Author: User
Tags curl numeric json

Core
use of the Druid query interface

Druid Query interface is the HTTP rest style query method, using the HTTP Rest style query (broker,historical, or Realtime) node data, query parameters are in JSON format, each node type will expose the same REST query interface

Curl-x POST ' <queryable_host>:<port>/druid/v2/?pretty '-h ' content-type:application/json '-D @<query_ Json_file>

Queryable_host:broker node IP port:broker node port default is 8082

Curl-l-H ' content-type:application/json '-xpost--data-binary @quickstart/aa.json http://10.20.23.41:8082/druid/v2/ ? Pretty

the types of query queries are
1, Timeseries
2, TopN
3, GroupBy
4. Time Boundary
5, Segment Metadata
6, Datasource Metadata
7. Search
8. Select

Where Timeseries, TopN, groupby for the aggregate query, time boundary, Segment Metadata, Datasource Metadata for metadata query, search query

1, Timeseries
For summary data that needs to be counted for a period of time, or for a specified time granularity, Druid can be done through timeseries.

The TimeSeries query includes fields such as the following

The field name            describes                             whether the
query type must be QueryType      , where only the timeseries query is filled out
dataSource the dataset       to query                       is
Descending          whether    the time range of the intervals query is descending, the default is ISO-8601 format         is
granularity   The time granularity at which the query results are aggregated is the filter                filter Condition                     No
aggregations         aggregation                         is      
postaggregations    Post-aggregation                            No
context             specifies some query parameter                    No

The timeseries outputs the statistics for the specified condition within each time granularity, filters the conditions by filter, and specifies the aggregation mode through aggregations and postaggregations. TimeSeries cannot output dimension information, granularity supports dimensions such as All,none,second,minute,hour,day,week,month,year

All: Total 1 output none: Deprecated

Other: The corresponding granularity statistics are output

JSON for queries

{"
  aggregations": [
    {
      "type": "Count", 
      "name": "Count"
    }
  ], 
  "intervals": " 1917-08-25t08:35:20+00:00/2017-08-25t08:35:20+00:00 ", 
  " DataSource ":" App_auto_prem_qd_pp3 ", 
  " Granularity ":" All ", 
  " postaggregations ": [], 
  " QueryType ":" TimeSeries "
}

Equivalent to SQL SELECT COUNT (1) from APP_AUTO_PREM_QD_PP3

TopN
Returns an ordered sequence of top-n for a specified dimension and sort field. TOPN supports returning the first N records and supporting the specified metric as sorted by

{
  "metric": "Sum__total_standard_premium", 
  "aggregations": [
    {
      "type": "Doublesum", 
      " FieldName ":" Total_standard_premium ", 
      " name ":" Sum__total_standard_premium "
    }
  ], 
  " Dimension " : "Is_new_car", 
  "intervals": "1917-08-29t20:05:10+00:00/2017-08-29t20:05:10+00:00", 
  "DataSource": "App_ Auto_prem_qd_pp3 ", 
  " granularity ":" All ", 
  " threshold ": 50000, 
  " postaggregations ": [], 
  " QueryType ":" TopN "
}
The field name            description                             must be
querytype     for TOPN queries, this must be TOPN              is
dataSource       the dataset to query                         is
intervals    the time range of the query, by default the ISO-8601 format is the filter                filter Condition                       No
aggregations           aggregation                       is
Postaggregations      Late aggregation                      no
dimension   for TOPN query maintenance, a TOPN query can only have one dimension  is
threshold          The n value in TOPN                        is
metric               to be counted and sorted metric             is the
context             specifies some query parameter                    No

metric: is TOPN exclusive
Way:

"Metric": "<metric_name>" By default is

"metric" in ascending order: {
    "type": "Numeric",//Specify Sort by numeric Descending "
    metric": "& Lt;metric_name> "
}

" metric ": {
    " type ":" Inverted ",//Specify Sort by numeric ascending"
    metric ":" <metric_name > "
}

" metric ": {
    " type ":" lexicographic ",//Specify Sort by dictionary
    " metric ":" <metric_name> "
}

"metric": {
    "type": "Alphanumeric",//Specify Sort by number
    "metric": "<metric_name>"
}

It is important to note that TOPN is an approximate algorithm, each segment returns the first 1000 strips to be merged to get the final result if dimension
The base is within 1000, it is accurate, and more than 1000 is the approximate value

GroupBy
GroupBy is similar to the group by operation in SQL, it can group multiple dimensions specified, also supports sorting the specified dimension, and outputs the number of limit rows, while supporting having operations

{"
  dimensions": [
    "Is_new_car", 
    "status"
  ], 
  "aggregations": [
    {
      "type": "Doublesum", " 
      fieldName": "Total_standard_premium", 
      "name": "Sum__total_standard_premium"
    }
  ], 
  "having ": {
    " type ":" GreaterThan ", 
    " aggregation ":" Sum__total_standard_premium ", 
    " value ":" 484000 "
  }, 
  "intervals": "1917-08-29t20:26:52+00:00/2017-08-29t20:26:52+00:00", 
  "Limitspec": {
    "limit": 2, 
    "type": "Default", 
    "columns": [
      {
        "direction": "Descending", 
        "Dimension": "Sum__total_ Standard_premium "
      }
    ]
  }, 
  " granularity ":" All ", 
  " postaggregations ": [], 
  " QueryType " : "GroupBy", 
  "DataSource": "App_auto_prem_qd_pp3"
}

Equivalent to SQL Select Is_new_car,status,sum (total_standard_premium) from APP_AUTO_PREM_QD_PP3 GROUP by Is_new_car,status limit 50000 having sum (total_standard_premium) >484000

{
  "version": "V1",
  "timestamp": "1917-08-30t04:26:52.000+08:00",
  "event": {
    "Sum__total_standard_ Premium ": 8.726074368E9,
    " Is_new_car ":" Yes ",
    " status ": null
  }
}, {
  " version ":" V1 ",
  " Timestamp ":" 1917-08-30t04:26:52.000+08:00 ",
  " event ": {
    " sum__total_standard_premium ": 615152.0,
    " Is_new_car ":" No ",
    " status ": null
  }
  }
The field name            description                             must be
querytype      for groupby query, the field must be GroupBy     is
dataSource the dataset          to query                     is
Dimensions the      collection of dimensions for a groupby query               is
limitspec           statistical results to sort no having             to filter the statistical results                    no
granularity            the time granularity of the aggregation of the query results         is 
postaggregations        after the aggregator                       no
intervals        The time range of the query, the default is ISO-8601 format        is
context             specifies some query parameter                    No

GroupBy Special fields Limitspec and having

Limitspec
Specify the number of rows for collations and limit

{
    "type": "Default",
    "limit": <integer_value>,
    "columns": [List of orderbycolumnspec]
}

Where columns is an array, you can specify more than one sort field, and the Sort field enables demension or metric to specify how the collation is spelled

{
    "Dimension": "<any Dimension or Metric name>",  
    "direction": < "ascending" | " Descending ">
}

 " Limitspec ": {"
    limit ": 2, 
    " type ":" Default ", 
    " columns ": [
      {
        " Direction ":" Descending ", 
        " Dimension ":" Sum__total_standard_premium "
      },
     {
        " direction ":" Ascending ", 
        " Dimension ":" Is_new_car "
      } 
    ]
  }

Having a having operation similar to that in SQL

Select

Select is similar to the select operation in SQL, where select is used to view the stored data in the Druid and to view the specified dimensions and metric in the specified filter and time period, to specify the sort order through the descending field, and to support page pull, But aggregations and postaggregations are not supported.
The JSON instance is as follows

{"
  dimensions": [
      "Status",
      "Is_new_car"
  ], 
  "Pagingspec": {
  "pagingidentifiers": {},
  "Threshold": 3
  },
  "intervals": "1917-08-25t08:35:20+00:00/2017-08-25t08:35:20+00:00", 
  "DataSource": " App_auto_prem_qd_pp3 ", 
  " granularity ":" All ", 
  " context ": {
   " skipemptybuckets ":" True "
  },
  "QueryType": "Select"
}

Equivalent to SQL statement select Status,is_new_car from APP_AUTO_PREM_QD_PP3 limit 3

[{"Timestamp": "2017-08-22t14:00:00.000z", "result": {"pagingidentifiers": {"app_auto_prem_qd_pp3_201 7-08-22t08:00:00.000+08:00_2017-08-23t08:00:00.000+08:00_2017-08-22t18:11:01.983+08:00 ": 2}," Dimensions ": [" I
      S_new_car "," status "]," metrics ": [" Total_actual_premium "," Count "," Total_standard_premium "]," events ": [{ "Segmentid": "app_auto_prem_qd_pp3_2017-08-22t08:00:00.000+08:00_2017-08-23t08:00:00.000+08:00_2017-08-22t18 : 11:01.983+08:00 "," offset ": 0," event ": {" timestamp ":" 2017-08-22t22:00:00.000+08:00 "," s
        Tatus ": null," Is_new_car ":" Yes "," Total_actual_premium ": 1012.5399780273438," Count ": 1, "Total_standard_premium": 1250.050048828125}}, {"Segmentid": "app_auto_prem_qd_pp3_2017-08-22t08:0
        0:00.000+08:00_2017-08-23t08:00:00.000+08:00_2017-08-22t18:11:01.983+08:00 "," offset ": 1," event ": { "Timestamp": "2017-08-22t22:00:00.000+08:00 "," status ": null," Is_new_car ":" Yes "," Total_actual_premium ": 708.780029296875, "Count": 1, "Total_standard_premium": 1250.050048828125}}, {"Segmentid": "App_auto_pr em_qd_pp3_2017-08-22t08:00:00.000+08:00_2017-08-23t08:00:00.000+08:00_2017-08-22t18:11:01.983+08:00 "," offset ": 2, "event": {"timestamp": "2017-08-22t22:00:00.000+08:00", "status": null, "Is_new_car" : "Yes", "Total_actual_premium": 1165.489990234375, "Count": 1, "Total_standard_premium": 1692.80 0048828125}}]}]

Specifies the offset and the number of entries in the Pagingspec for the page pull, and the next offset is returned in the result.

"Pagingspec": {
  "pagingidentifiers": {},
  "threshold": 3,
  "Fromnext": True
  }

Search

The search query returns the dimensions in the match, similar to the TOPN operation in SQL, but supports more matching operations,
JSON shows for example

{
  "querytype": "Search",
  "DataSource": "App_auto_prem_qd_pp3",
  "granularity": "All",
  "Limit": 2,
  "Searchdimensions": [
    "Data_source",
    "Department_code"
  ],
  "query": {
    "type": " Insensitive_contains ",
    " value ":" 1 "
  },
  " sort ": {
    " type ":" Lexicographic "
  },
  " Intervals ": [
    " 1917-08-25t08:35:20+00:00/2017-08-25t08:35:20+00:00 "
  ]

Dimensions of the searchdimensions search

The field name            description                             must be
querytype for the         search query, and the field must be search
dataSource the dataset            to query                  is
Searchdimensions the     dimension in which search                 is run is               limit to restrict statistical results to             No (default)
granularity        The time granularity at which the query results are aggregated         is the 
time range of the intervals           query, by default the ISO-8601 format is sort to                  Specify                  whether
the search results are sorted by               the query operation                          is the
context that              specifies some query parameters                   no filter                 filters                        No

It is important to note that search simply returns a matching dimension, does not support other aggregation operations, and if you want to topn,groupby or timeseries search as a query condition, you can do so in the filter field
Specifies various filtering methods, and the filter field also supports regular matching.
The query results are as follows:

[{
  "timestamp": "2017-08-22t08:00:00.000+08:00",
  "result": [{
    "Dimension": "Data_source",
    "value" : "226931204023",
    "Count": 2
  }, {
    "dimension": "Data_source",
    "value": "226931204055",
    "Count": 7
  }]
}

Selection of Queries

1, if possible, it is recommended to use TimeSeries and TOPN query instead of GROUPBY,GROUPBY is the most flexible query, but also the worst performance. For aggregations that do not require grouping of dimensions, TimeSeries is faster than groupby queries, grouping and sorting for individual dimensions, TOPN queries more optimized than GroupBy

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.