Core usage of the Druid query interface
The Druid query interface is an HTTP REST-style API: queries are POSTed as JSON to a queryable node (broker, historical, or realtime), and every queryable node type exposes the same REST query endpoint.
curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type: application/json' -d @<query_json_file>
queryable_host is the broker node's IP; port is the broker node's port, 8082 by default. For example:
curl -L -H 'Content-Type: application/json' -X POST --data-binary @quickstart/aa.json http://10.20.23.41:8082/druid/v2/?pretty
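The same request can be issued from code. As a minimal sketch using only the Python standard library (the broker URL below is the one from the curl example; the helper names are mine, not part of Druid):

```python
import json
import urllib.request

# Broker URL from the curl example above; adjust for your deployment.
DRUID_URL = "http://10.20.23.41:8082/druid/v2/?pretty"

def build_druid_request(query: dict, url: str = DRUID_URL) -> urllib.request.Request:
    """Wrap a query dict in a POST request with the JSON content type."""
    return urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_druid_query(query: dict, url: str = DRUID_URL):
    """Send the query to the broker and parse the JSON response."""
    with urllib.request.urlopen(build_druid_request(query, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Any of the query JSON bodies shown in the sections below can be passed as the `query` dict.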
The query types Druid supports are:
1. Timeseries
2. TopN
3. GroupBy
4. Time Boundary
5. Segment Metadata
6. Datasource Metadata
7. Search
8. Select
Timeseries, TopN, and GroupBy are aggregation queries; Time Boundary, Segment Metadata, and Datasource Metadata are metadata queries; Search is a search query.
Timeseries
When you need summary statistics over a period of time, or aggregated at a specified time granularity, Druid's timeseries query does the job.
A timeseries query contains the following fields:

Field | Description | Required
queryType | the query type; must be "timeseries" here | yes
dataSource | the dataset to query | yes
intervals | the time range of the query, in ISO-8601 format | yes
descending | whether to return results in descending time order (default false) | no
granularity | the time granularity at which the results are aggregated | yes
filter | filter condition | no
aggregations | the aggregations to apply | yes
postAggregations | post-aggregations | no
context | additional query parameters | no
A timeseries query outputs, for each time-granularity bucket, the statistics matching the specified conditions: filter restricts the rows, while aggregations and postAggregations define how they are aggregated. Timeseries cannot output dimension information. granularity supports values such as all, none, second, minute, hour, day, week, month, and year:
all: a single aggregated output for the whole interval
none: deprecated
others: one output per bucket of the corresponding granularity
An example query JSON:
{
  "queryType": "timeseries",
  "dataSource": "app_auto_prem_qd_pp3",
  "granularity": "all",
  "intervals": "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00",
  "aggregations": [
    {
      "type": "count",
      "name": "count"
    }
  ],
  "postAggregations": []
}
This is equivalent to the SQL: SELECT COUNT(1) FROM app_auto_prem_qd_pp3
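The query above can be assembled programmatically. A sketch (the helper name is mine; the field names follow Druid's native query API as shown above):

```python
def timeseries_count(datasource, intervals):
    """Build a minimal timeseries query that counts rows --
    the JSON equivalent of SELECT COUNT(1)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": intervals,
        "aggregations": [{"type": "count", "name": "count"}],
        "postAggregations": [],
    }

# Reproduce the example query from this section:
query = timeseries_count(
    "app_auto_prem_qd_pp3",
    "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00",
)
```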
TopN
Returns an ordered top-N sequence for a specified dimension and sort field: topN returns the first N records, sorted by a specified metric.
{
  "queryType": "topN",
  "dataSource": "app_auto_prem_qd_pp3",
  "dimension": "is_new_car",
  "metric": "sum__total_standard_premium",
  "threshold": 50000,
  "granularity": "all",
  "intervals": "1917-08-29T20:05:10+00:00/2017-08-29T20:05:10+00:00",
  "aggregations": [
    {
      "type": "doubleSum",
      "fieldName": "total_standard_premium",
      "name": "sum__total_standard_premium"
    }
  ],
  "postAggregations": []
}
Field | Description | Required
queryType | the query type; must be "topN" here | yes
dataSource | the dataset to query | yes
intervals | the time range of the query, in ISO-8601 format | yes
filter | filter condition | no
aggregations | the aggregations to apply | yes
postAggregations | post-aggregations | no
dimension | the dimension of the topN query; a topN query can have only one dimension | yes
threshold | the N in topN | yes
metric | the metric to aggregate and sort by | yes
context | additional query parameters | no
metric is specific to topN. It can be written in several ways:

"metric": "<metric_name>"
The plain-string form; by default this sorts by the metric value in descending order.

"metric": {
  "type": "numeric",        // sort by metric value, descending
  "metric": "<metric_name>"
}

"metric": {
  "type": "inverted",       // invert the numeric sort, i.e. ascending
  "metric": "<metric_name>"
}

"metric": {
  "type": "lexicographic",  // sort in dictionary order
  "metric": "<metric_name>"
}

"metric": {
  "type": "alphanumeric",   // alphanumeric sort
  "metric": "<metric_name>"
}
It is important to note that topN is an approximate algorithm: each segment returns only its top 1000 entries, and these partial results are merged to produce the final result. If the dimension's cardinality is within 1000 the result is exact; beyond 1000 it is an approximation.
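The approximation can be illustrated with a toy merge (pure Python, not Druid code; the function name and the small limit are mine for demonstration):

```python
from collections import Counter

def topn_merge(segments, n, per_segment_limit=1000):
    """Toy model of how a topN result is assembled: each segment
    contributes only its local top `per_segment_limit` entries
    (Druid uses 1000), and those partial results are merged."""
    merged = Counter()
    for segment in segments:  # segment: {dimension value: metric total}
        local_top = dict(Counter(segment).most_common(per_segment_limit))
        merged.update(local_top)  # sum the per-segment totals
    return merged.most_common(n)
```

If the dimension cardinality exceeds per_segment_limit, values just below each segment's local cutoff are silently dropped before the merge, which is exactly why topN over a high-cardinality dimension is approximate.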
GroupBy
GroupBy is similar to the GROUP BY operation in SQL: it groups by multiple specified dimensions, supports sorting on a specified dimension or metric, limits the number of output rows, and supports having filters.
{
  "queryType": "groupBy",
  "dataSource": "app_auto_prem_qd_pp3",
  "dimensions": [
    "is_new_car",
    "status"
  ],
  "granularity": "all",
  "intervals": "1917-08-29T20:26:52+00:00/2017-08-29T20:26:52+00:00",
  "aggregations": [
    {
      "type": "doubleSum",
      "fieldName": "total_standard_premium",
      "name": "sum__total_standard_premium"
    }
  ],
  "postAggregations": [],
  "having": {
    "type": "greaterThan",
    "aggregation": "sum__total_standard_premium",
    "value": 484000
  },
  "limitSpec": {
    "type": "default",
    "limit": 2,
    "columns": [
      {
        "dimension": "sum__total_standard_premium",
        "direction": "descending"
      }
    ]
  }
}
This is equivalent to the SQL: SELECT is_new_car, status, SUM(total_standard_premium) FROM app_auto_prem_qd_pp3 GROUP BY is_new_car, status HAVING SUM(total_standard_premium) > 484000 ORDER BY SUM(total_standard_premium) DESC LIMIT 2
[ {
  "version": "v1",
  "timestamp": "1917-08-30T04:26:52.000+08:00",
  "event": {
    "sum__total_standard_premium": 8.726074368E9,
    "is_new_car": "yes",
    "status": null
  }
}, {
  "version": "v1",
  "timestamp": "1917-08-30T04:26:52.000+08:00",
  "event": {
    "sum__total_standard_premium": 615152.0,
    "is_new_car": "no",
    "status": null
  }
} ]
Field | Description | Required
queryType | the query type; must be "groupBy" here | yes
dataSource | the dataset to query | yes
dimensions | the set of dimensions to group by | yes
limitSpec | sorting and limiting of the results | no
having | filter on the aggregated results | no
granularity | the time granularity at which the results are aggregated | yes
postAggregations | post-aggregations | no
intervals | the time range of the query, in ISO-8601 format | yes
context | additional query parameters | no
GroupBy's special fields: limitSpec and having

limitSpec
Specifies the sort order and the limit on the number of rows returned:
{
  "type": "default",
  "limit": <integer_value>,
  "columns": [<list of OrderByColumnSpec>]
}
columns is an array, so you can specify multiple sort fields; a sort field can be a dimension or a metric. Each entry is written as:
{
  "dimension": "<any dimension or metric name>",
  "direction": <"ascending" | "descending">
}
"limitSpec": {
  "type": "default",
  "limit": 2,
  "columns": [
    {
      "dimension": "sum__total_standard_premium",
      "direction": "descending"
    },
    {
      "dimension": "is_new_car",
      "direction": "ascending"
    }
  ]
}
having
Analogous to the HAVING operation in SQL: it filters the aggregated results.
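Putting the pieces together, a groupBy query with both a limitSpec and a having clause can be assembled as a sketch (the helper name is mine; the field names and values mirror the example earlier in this section):

```python
def groupby_with_having(datasource, dimensions, interval,
                        metric_field, metric_name, having_gt, limit):
    """Build a groupBy query with a limitSpec and a having clause."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "dimensions": dimensions,
        "granularity": "all",
        "intervals": interval,
        "aggregations": [
            {"type": "doubleSum", "fieldName": metric_field, "name": metric_name}
        ],
        "postAggregations": [],
        # Keep only groups whose aggregated metric exceeds the threshold.
        "having": {
            "type": "greaterThan",
            "aggregation": metric_name,
            "value": having_gt,
        },
        # Sort descending by the aggregated metric and cap the row count.
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [{"dimension": metric_name, "direction": "descending"}],
        },
    }

# Reproduce the example query from this section:
query = groupby_with_having(
    "app_auto_prem_qd_pp3",
    ["is_new_car", "status"],
    "1917-08-29T20:26:52+00:00/2017-08-29T20:26:52+00:00",
    "total_standard_premium",
    "sum__total_standard_premium",
    484000,
    2,
)
```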
Select
Select is similar to the SELECT operation in SQL: it is used to view the raw data stored in Druid, returning the specified dimensions and metrics within a given filter and time range. It supports specifying the sort order via the descending field and supports paged pulls, but it does not support aggregations or postAggregations.
An example query JSON:
{
  "queryType": "select",
  "dataSource": "app_auto_prem_qd_pp3",
  "dimensions": [
    "status",
    "is_new_car"
  ],
  "granularity": "all",
  "intervals": "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00",
  "pagingSpec": {
    "pagingIdentifiers": {},
    "threshold": 3
  },
  "context": {
    "skipEmptyBuckets": "true"
  }
}
This is equivalent to the SQL: SELECT status, is_new_car FROM app_auto_prem_qd_pp3 LIMIT 3
[ {
  "timestamp": "2017-08-22T14:00:00.000Z",
  "result": {
    "pagingIdentifiers": {
      "app_auto_prem_qd_pp3_2017-08-22T08:00:00.000+08:00_2017-08-23T08:00:00.000+08:00_2017-08-22T18:11:01.983+08:00": 2
    },
    "dimensions": ["is_new_car", "status"],
    "metrics": ["total_actual_premium", "count", "total_standard_premium"],
    "events": [
      {
        "segmentId": "app_auto_prem_qd_pp3_2017-08-22T08:00:00.000+08:00_2017-08-23T08:00:00.000+08:00_2017-08-22T18:11:01.983+08:00",
        "offset": 0,
        "event": {
          "timestamp": "2017-08-22T22:00:00.000+08:00",
          "status": null,
          "is_new_car": "yes",
          "total_actual_premium": 1012.5399780273438,
          "count": 1,
          "total_standard_premium": 1250.050048828125
        }
      },
      {
        "segmentId": "app_auto_prem_qd_pp3_2017-08-22T08:00:00.000+08:00_2017-08-23T08:00:00.000+08:00_2017-08-22T18:11:01.983+08:00",
        "offset": 1,
        "event": {
          "timestamp": "2017-08-22T22:00:00.000+08:00",
          "status": null,
          "is_new_car": "yes",
          "total_actual_premium": 708.780029296875,
          "count": 1,
          "total_standard_premium": 1250.050048828125
        }
      },
      {
        "segmentId": "app_auto_prem_qd_pp3_2017-08-22T08:00:00.000+08:00_2017-08-23T08:00:00.000+08:00_2017-08-22T18:11:01.983+08:00",
        "offset": 2,
        "event": {
          "timestamp": "2017-08-22T22:00:00.000+08:00",
          "status": null,
          "is_new_car": "yes",
          "total_actual_premium": 1165.489990234375,
          "count": 1,
          "total_standard_premium": 1692.800048828125
        }
      }
    ]
  }
} ]
For paged pulls, specify the offsets and the page size in pagingSpec; the next set of offsets is returned in the result's pagingIdentifiers.
"pagingSpec": {
  "pagingIdentifiers": {},
  "threshold": 3,
  "fromNext": true
}
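A sketch of the paging loop, assuming the response shape shown above (each element of the response list carries result.pagingIdentifiers; the helper name is mine):

```python
def next_select_page(base_query, prev_page):
    """Derive the next page of a select query from one element of the
    previous response: its result.pagingIdentifiers are fed back into
    pagingSpec, and fromNext asks Druid to advance the offsets."""
    query = dict(base_query)  # shallow copy; pagingSpec is replaced below
    query["pagingSpec"] = {
        "pagingIdentifiers": prev_page["result"]["pagingIdentifiers"],
        "threshold": base_query["pagingSpec"]["threshold"],
        "fromNext": True,
    }
    return query
```

Repeatedly posting the derived query walks through the data one page (threshold rows) at a time.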
Search
A search query returns the dimension values that match the search, similar to a LIKE operation in SQL but supporting more matching operations. An example query JSON:
{
  "queryType": "search",
  "dataSource": "app_auto_prem_qd_pp3",
  "granularity": "all",
  "limit": 2,
  "searchDimensions": [
    "data_source",
    "department_code"
  ],
  "query": {
    "type": "insensitive_contains",
    "value": "1"
  },
  "sort": {
    "type": "lexicographic"
  },
  "intervals": [
    "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00"
  ]
}
Field | Description | Required
queryType | the query type; must be "search" here | yes
dataSource | the dataset to query | yes
searchDimensions | the dimensions to run the search over | yes
limit | limits the number of results | no (has a default)
granularity | the time granularity at which the results are aggregated | yes
intervals | the time range of the query, in ISO-8601 format | yes
query | the search operation to match against | yes
sort | how the matching values are sorted | no
filter | filter condition | no
context | additional query parameters | no
It is important to note that search simply returns the matching dimension values and does not support other aggregation operations. If you want to use a search result as a query condition for a topN, groupBy, or timeseries query, you can do so through the filter field, which supports various filtering methods, including regular-expression matching.
The query results are as follows:
[ {
  "timestamp": "2017-08-22T08:00:00.000+08:00",
  "result": [ {
    "dimension": "data_source",
    "value": "226931204023",
    "count": 2
  }, {
    "dimension": "data_source",
    "value": "226931204055",
    "count": 7
  } ]
} ]
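Following the note above about feeding search matches into a filter, here is a sketch (the helper name is mine) that converts a search response of the shape shown into an OR of selector filters, usable as the filter field of a subsequent topN, groupBy, or timeseries query:

```python
def search_results_to_filter(search_result):
    """Turn the dimension/value pairs of a search response into an OR
    of selector filters for a follow-up aggregation query."""
    fields = [
        {"type": "selector", "dimension": hit["dimension"], "value": hit["value"]}
        for bucket in search_result   # one bucket per granularity interval
        for hit in bucket["result"]   # one hit per matched dimension value
    ]
    return {"type": "or", "fields": fields}
```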
Choosing a query type
1. Where possible, prefer timeseries and topN queries over groupBy. GroupBy is the most flexible query type but also has the worst performance. For aggregations that do not need dimension grouping, timeseries is faster than groupBy; for grouping and sorting on a single dimension, topN is more optimized than groupBy.