GroupBy Query Structure

This section will help you understand how you can build your own GroupBy Druid queries. First, we need to understand the query structure. There are 13 main properties for the GroupBy query, some of which are mandatory.

Property Description Required
queryType This should always be "groupBy" to identify the query type. yes
dataSource Identifies the data source for the query in the Intersight backend. yes
dimensions A list of dimensions that are used for the GroupBy operation. yes
virtualColumns A list of columns that can be referenced in dimensions, aggregations, or postAggregations. no (default none)
limitSpec Allows you to sort and limit the query results. no
having Identifies which rows from the GroupBy query should be returned. no
granularity Defines the granularity of the query. yes
aggregations Determines how data should be aggregated in the query. yes/no
filter Determines which rows of data should be selected for the query. yes/no
postAggregations Secondary level of aggregation on the data coming out of the Intersight backend. no
intervals The time range of the query in ISO-8601 intervals. yes
subtotalsSpec An array of arrays to return additional result sets for groupings of subsets on top level dimensions. no
context Additional object that can be used to specify certain flags. no

While aggregations and filters are not strictly mandatory based on Druid itself, you will still have to use them to get proper results from the Intersight backend.

In this guide we will look into the following components:

  • queryType
  • dataSource
  • intervals
  • granularity
  • aggregations
  • filter
  • postAggregations
  • dimensions
  • limitSpec

If you are interested in the other properties, you can find more information in the official Druid documentation for GroupBy queries.
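
To give you an idea of how these pieces fit together, a minimal query could look like the sketch below. The data source, field names, and time range are taken from the examples in this guide and are purely illustrative; replace them with the values that apply to your metric.

{
   "queryType": "groupBy",
   "dataSource": "PhysicalEntities",
   "intervals": ["2024-02-20T00:00:00.000Z/2024-02-27T00:00:00.000Z"],
   "granularity": "hour",
   "filter": {
      "type": "selector",
      "dimension": "instrument.name",
      "value": "system.cpu"
   },
   "aggregations": [{
      "type": "doubleMax",
      "name": "cpu-utilization-max",
      "fieldName": "system.cpu.utilization_user_max"
   }],
   "dimensions": ["host.name"]
}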

queryType

For a GroupBy query, the queryType should always be "groupBy".

"queryType": "groupBy"

dataSource

The dataSource depends on the metric you select. To get the right one, you can check the Supported Metrics documentation for your specific metric. Once you know what the dataSource is, you can simply specify it like this:

"dataSource": "PhysicalEntities"

There are more advanced parameters that we are not covering in this guide. You can check the official Druid documentation for more details.

intervals

Intervals determine the time range for which data should be fetched. An interval consists of a starting time and an end time. You can specify multiple pairs of start and end times inside an array.

The format for the start and end times should follow the ISO-8601 standard. You can use the following template to create your own start/end time: YYYY-MM-DDThh:mm:ss.fffZ

The following table shows the values for the relevant placeholders:

Component Values
Year (Y) YYYY - four-digit value for the year
Month (M) MM - 01 to 12
Day (D) DD - 01 to 31
Hour (h) hh - 00 to 23
Minute (m) mm - 00 to 59
Second (s) ss - 00 to 59
Fraction of a second (f) fff - 000 to 999

The "T" denotes the border between the date and the time. The "Z" stands for UTC. Both of them are just static values and not placeholders for a different value. An example of how a time range can look like can be found below.

"intervals": [
      "2023-07-14T19:46:00.000Z/2023-07-21T19:46:00.000Z"
      "2024-02-20T19:46:00.000Z/2024-02-27T19:46:00.000Z"
    ]

granularity

Defines the granularity of the retrieved data inside the time ranges. If you, for example, query data for one day, the granularity determines whether you get one data point per hour, per thirty minutes, per fifteen minutes, and so on. Keep in mind that the number of data points available in the source data depends on your license tier. One system could thus have multiple data points for a time interval, while another system might only have a single data point. There are three ways of defining the granularity of these time buckets:

  • Simple string
  • Millisecond durations
  • ISO8601 periods

Below we will describe examples for all three options. Note: You only have to select one of them for your query.

For simple strings, you have the following options:

  • all
  • none
  • second
  • minute
  • five_minute
  • ten_minute
  • fifteen_minute
  • thirty_minute
  • hour
  • six_hour
  • eight_hour
  • day
  • week
  • month
  • quarter
  • year

Most of these strings are self-explanatory. Only "all" and "none" are a bit more complex. The string "all" retrieves everything aggregated into one bucket. The string "none" retrieves everything at ingestion granularity. When put into action with simple strings, this component looks something like this:

"granularity": "hour"

Duration granularities specify the size of the result buckets in milliseconds. In addition to the size of the time bucket, you can also specify the time from which to start counting these buckets. The default is "1970-01-01T00:00:00Z". This granularity setting can look something like below:

"granularity": {
   "type": "duration",
   "duration": 3600000,
   "origin": "2012-01-01T00:30:00Z"
}

The third option for specifying granularities is by specifying periods in ISO8601 format. You can use any combination of years, months, weeks, days, hours, minutes, and seconds. The string for the period setting always starts with a "P" followed by the time bucket you want to set. For example, a time bucket of one day would be "P1D". If you want to include granularities below a day, you need to prefix the time components with a "T", for example "PT1H30M" for one hour and 30 minutes. The following designators are available:

  • Year (Y)
  • Month (M)
  • Week (W)
  • Day (D)
  • Hour (H)
  • Minute (M)
  • Second (S)

There are two optional parameters. The time zone defaults to UTC but can be overridden. This only applies to the result, though, not to any other parts of the query. The starting time from which the time buckets are calculated is another optional parameter and defaults to "1970-01-01T00:00:00".

This results in a granularity setting like this:

"granularity": {
   "type": "period",
   "period": PT12H,
   "timeZone": "Europe/Vienna",
   "origin": "2012-02-01T00:00:00-08:00"
}

aggregations

Aggregations can be used to summarize the data that was queried from Intersight. If you do not specify an aggregation, the response will only contain the timestamp, but no actual data. Thus, to retrieve one or more values, we need to add one or more aggregations. Aggregations are an array like this:

"aggregations": []

Inside of this array, you can add one aggregation for each value you want to retrieve. There are a lot of different aggregators available. See the table below for an overview.

Category Type Description
Count count Computes the number of rows that match the request
Sum longSum Computes the sum of values as a 64-bit signed integer
Sum doubleSum Computes the sum of values as a 64-bit floating point value
Sum floatSum Computes the sum of values as a 32-bit floating point value
Min longMin Computes the minimum of all metric values and Long.MAX_VALUE
Min doubleMin Computes the minimum of all metric values and Double.POSITIVE_INFINITY
Min floatMin Computes the minimum of all metric values and Float.POSITIVE_INFINITY
Max longMax Computes the maximum of all metric values and Long.MIN_VALUE
Max doubleMax Computes the maximum of all metric values and Double.NEGATIVE_INFINITY
Max floatMax Computes the maximum of all metric values and Float.NEGATIVE_INFINITY
Mean doubleMean Computes the arithmetic mean of a column's values as a 64-bit floating point value
First longFirst Retrieves the first value within the interval or 0
First doubleFirst Retrieves the first value within the interval or 0
First floatFirst Retrieves the first value within the interval or 0
First stringFirst Retrieves the first value within the interval or null
Last longLast Retrieves the last value within the interval or 0
Last doubleLast Retrieves the last value within the interval or 0
Last floatLast Retrieves the last value within the interval or 0
Last stringLast Retrieves the last value within the interval or null
Any longAny Returns any long metric value
Any doubleAny Returns any double metric value
Any floatAny Returns any float metric value
Any stringAny Returns any string metric value

You can then use these aggregation types like this:

"aggregations": [{
   "type": "longSum",
   "name": my_output,
   "fieldName": "system.cpu.utilization_user_max"
}]

Most aggregations follow the format above. The "type" field is always present to determine the type of aggregation. The "name" determines what the output will be called once it is retrieved. For "count", these are all the required fields. For all other types, you also need to add the "fieldName", which determines which field should be queried for the operation. For the First/Last aggregators you can also add the optional "timeColumn" field, which points to the column that contains the timestamp (default is "__time"). For stringFirst/stringLast/stringAny, there is also an optional "maxStringBytes" field, which determines the maximum number of bytes stored (default 1024).
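
As an illustration, the sketch below combines a "count" aggregator with a "stringLast" aggregator that uses the optional fields; treat the field name "host.name" as a placeholder for a string field of your metric.

"aggregations": [{
   "type": "count",
   "name": "count"
},{
   "type": "stringLast",
   "name": "last-host-name",
   "fieldName": "host.name",
   "timeColumn": "__time",
   "maxStringBytes": 1024
}]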

The JavaScript aggregator is not enabled in the Intersight API.

filter

When you look at an entry in the Intersight backend, it is essentially just a timestamp with a lot of associated information. Data points are grouped together based on the instrument, for example based on "System CPU" (system.cpu). To make sure that we are only retrieving the rows of data that contain what we are looking for, we always need to add a filter for the instrument. You can use the other attributes to narrow the scope of what you are querying further, essentially telling Intersight which lines of data you want to retrieve.

Druid offers multiple filter options. In this guide we will only look at the simple "selector" filter. All other filters are documented in the official Druid documentation. The "selector" filter matches a specific dimension with a specific value and follows this format:

"filter": {
   "type": "selector",
   "dimension": "instrument.name",
   "value": "system.cpu"
}

The filter above would select only those entries that belong to the "system.cpu" instrument. This filter for the instrument is a basic requirement to get proper results from Intersight. You can change the value for the "dimension" field to any attribute supported by Intersight metrics. The "value" field can then be any value that you want to match against.

If you then want to filter within an instrument, you can add a second filter. To do this, we must also add a logical operation for chaining them together. Druid supports "AND", "OR", or "NOT". You can construct a query with multiple filter conditions like this:

"filter": {
   "type": "and",
   "fields": [{
         "type": "selector",
         "dimension": "model",
         "value": "UCSC-C220-M6S"
      },{
         "type": "selector",
         "dimension": "instrument.name",
         "value": "system.cpu"
   }]
}

This combines two "selector" filters with the "and" parent filter. You could replace the "and" in this case with an "or" to match rows that satisfy either condition. The "not" filter works slightly differently: it negates a single filter, which is passed in a "field" property instead of "fields".
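
As a sketch, a "not" filter that excludes a specific model (the model value is just an example) would look like this:

"filter": {
   "type": "not",
   "field": {
      "type": "selector",
      "dimension": "model",
      "value": "UCSC-C220-M6S"
   }
}

Keep in mind that in practice you would still combine such a filter with the instrument selector through an "and" filter to get proper results from Intersight.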

postAggregations

Once you have run aggregations on your data, you might be left with multiple datapoints that you want to aggregate further. This can be done using postAggregations. There are many different options for this parameter, and you can specify multiple postAggregations in a query. In this documentation we will only look at the expression option for postAggregations. You can find further options in the official Druid documentation for postAggregation.

Expressions allow us to perform operations with the data. The one we are going to focus on in this guide is calculating averages for the individual data points. The basic syntax for an expression postAggregation looks like this:

"postAggregations": [{
   "type": "expression",
   "name": hw-fan-speed-Avg,
   "expression": "(\"hw.fan.speed-Sum\" / \"count\")",
   "ordering": null
}]

As you can see, this example calculates an expression over two data points: the sum of fan speed and the number of entries. By dividing these two values, we get the average fan speed. Let's briefly go through all the fields. The "type" field will always be "expression". The "name" can be chosen freely and determines what the output value will be called. As for the expression itself, there are many expressions that can be used in Druid, documented here: Druid Expressions. Finally, "ordering" can either be empty (null) or "numericFirst".
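
For this expression to work, the values it references have to be produced by aggregations earlier in the same query. The sketch below shows how the two parts could fit together; the field name "hw.fan.speed" is assumed here for illustration.

"aggregations": [{
   "type": "doubleSum",
   "name": "hw.fan.speed-Sum",
   "fieldName": "hw.fan.speed"
},{
   "type": "count",
   "name": "count"
}],
"postAggregations": [{
   "type": "expression",
   "name": "hw-fan-speed-Avg",
   "expression": "(\"hw.fan.speed-Sum\" / \"count\")",
   "ordering": null
}]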

dimensions

The dimensions determine how the grouping is actually done in the query. Without any dimensions, all data is simply retrieved as a single value per timestamp. You can then add dimensions to get multiple output values per timestamp. Let's assume that you want to query the fan speed. Without dimensions, the GroupBy query will just produce a single line. You could now add a dimension like Host Name (host.name), which will result in one line for each host name. You can add any number of dimensions, depending on what kind of grouping you are looking for, like this:

"dimensions": [
      "host.name"
    ]

This is a simple way of specifying dimensions. You can also write more complicated dimension specifications using Druid Dimension Specs.
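
As a sketch of such a dimension spec, the "default" type lets you rename a dimension in the output; the output name below is chosen freely.

"dimensions": [{
   "type": "default",
   "dimension": "host.name",
   "outputName": "host"
}]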

limitSpec

This parameter can be used to sort and limit the result of your query. This can be useful if you want to present the results in a certain order, or if you only want to retrieve a limited set of outputs to focus on. A limitSpec of the "default" type looks like this:

"limitSpec": {
   "type": "default",
   "limit": 5,
   "offset": 5,
   "columns": []
}

The "limit" is an integer that selects the top N results from a query. With the limit of 5 above, we would get at most 5 results back from the query. The "offset" is an extension to the limit - it will be processed before the limit and skip the first N results. In this case, an offset of 5 would mean that the query skips the results 1 to 5 and actually selects the results from 6 to 10. This can be useful for pagination. This means that you can select 5 results at a time while still having the option to go to the next bucket of 5 results. By default the "offset" is set to 0, thus the first five results are retrieved. Keep in mind that this is calculated at query time. If some change in the data changes the order in which results are returned, the top N results will also change.

With "columns" you can filter the search result, which will also affect the other two values. This parameter is currently not supported via the Intersight API.

The "dimension" refers to one of the dimensions of the metric. The "direction" can be either ascending" or "descending". Finally, the "dimensionOrder" determines how the sorting should happen. There are a few options you can choose from:

  • lexicographic (default): sorts strings based on their UTF-8 byte array representation
  • numeric: sorts numbers by their value
  • alphanumeric: sorts values that are a combination of strings and numbers
  • strlen: sorts by string length

Alternatively, you could also just provide a simple string with the dimension name, which will result in an ascending lexicographic order.
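
For reference, a "columns" entry with the full specification could look like the sketch below; as noted above, this parameter is currently not supported via the Intersight API.

"limitSpec": {
   "type": "default",
   "limit": 5,
   "columns": [{
      "dimension": "host.name",
      "direction": "descending",
      "dimensionOrder": "lexicographic"
   }]
}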