Metrics Model
The Cisco Observability Platform ingests metrics in OpenTelemetry™ Line Protocol (OTLP) format from your OpenTelemetry-compatible agents or collectors, converts them into the Cisco Observability Platform metrics model, enriches them with derived data, and stores them in the MELT+ store in the Cisco Observability Platform metrics model format.
The Cisco Observability Platform metrics model is similar to the OpenTelemetry data model for metrics, with important differences. This page describes the Cisco Observability Platform metrics model and explains how it relates to the OTLP format.
Note: This document contains references to third-party documentation. Cisco does not own any rights to, and assumes no responsibility for the accuracy or completeness of, such third-party documentation.
What is a Metric
A metric is a numerical measurement, sampled over a specific timeframe and typically with a fixed frequency, such as TargetConnectionErrorCount. Cisco Observability Platform metrics are registered by a domain or feature in a type system. A metric has properties, a content type, and a category.
Terminology
Term | Description |
Measurement event | The act of recording one metric. For example, the act of recording the request latency of entity <N>. A measurement event is associated with exactly one timestamp. |
Metric category | The default consumption function used when no consumption function is supplied in a query. The metric category defines how the platform computes a metric's value from a given content type. See Cisco Observability Platform Metric Categories. |
Metric data point | A summary or aggregation of multiple numerical measurements, typically taken over a specific time range and at a fixed frequency. For example, a request duration is reported for the last minute. |
Metric timeseries | A series of metric data points having the same entity ID, metric type, source, and a unique set of attributes. Solutions like Cisco Cloud Observability display metric timeseries as graphs in which time ranges are represented as (startTime, granularity), whereas OpenTelemetry represents time ranges as (startTime, endTime). Both representations are interchangeable. On graphs, solutions like Cisco Cloud Observability associate each metric data point with only one timestamp, startTime. |
Metric type | Uniquely identifies what the metric corresponds to. It consists of the name of the metric, its content type, data type, and so on. For example, calls or https.response.size. |
Time aggregation | An aggregation of measurement events or metric data points within the same timeseries. There are two types of time aggregations: • Measurement events are converted into metric data points. Typically, the conversion is done on the client side, but it can also be done on the server side. • Metric data points are aggregated into fewer metric data points. Typically, this aggregation is done on the server side, but it can also be done on the client side. |
Space aggregation | An aggregation of metric data points having the same time ranges from multiple metric timeseries. |
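The interchangeability of the two time-range representations mentioned under Metric timeseries can be illustrated with a minimal Python sketch. The function names `to_end_time` and `to_granularity` are hypothetical helpers chosen for this illustration and are not part of any platform API.

```python
from datetime import datetime, timedelta, timezone


def to_end_time(start_time: datetime, granularity: timedelta) -> datetime:
    """Convert (startTime, granularity) into the matching endTime."""
    return start_time + granularity


def to_granularity(start_time: datetime, end_time: datetime) -> timedelta:
    """Convert (startTime, endTime) into the matching granularity."""
    return end_time - start_time


# A one-minute data point that starts at 12:00 UTC.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
end = to_end_time(start, timedelta(minutes=1))
```

Converting in one direction and then back returns the original value, which is why either form can be stored or displayed without losing information.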
The OpenTelemetry Data Model for Metrics
The OpenTelemetry data model for metrics specifies data formats and protocols for the import, transportation, and export of metrics. It includes the OpenTelemetry Line Protocol (OTLP) and the OpenTelemetry Timeseries Model.
OpenTelemetry Line Protocol
The OpenTelemetry Line Protocol (OTLP) defines how the OpenTelemetry metric stream is encoded and transported over gRPC or HTTP 1.1 to an OpenTelemetry timeseries store. Each metric stream is identified by its name, attributes, originating resource, and OTLP type (point kind). There can be more than one metric stream per instrument in the event model.
The Cisco Observability Platform supports the following OTLP types. Each OTLP type maps to a specific aggregation function or functions.
OTLP Type (Point Kind) | Description | Aggregation Function | Monotonic Supported? | Supported Aggregation Temporalities |
Sum | The sum of all measurement event values. | sum() | Yes | Delta |
Gauge | A sampled value at a given time. Gauges do not provide an aggregation semantic. Instead, they provide a "last sample value". For this reason, the startTime is not meaningful for gauges; instead, it is a point event associated with endTimestamp, unlike the other OTLP types above. See the gauge definition in the metrics protobuf. | latest() | No | - |
Summary | The Cisco Observability Platform supports this OTLP type only when p0 and p100 are provided along with sum and count. | - | - | - |
OpenTelemetry Timeseries Model
The OpenTelemetry Timeseries Model specifies how OpenTelemetry backends store metrics—in other words, the at rest format of metrics at their destination.
Pre-ingest, Ingest, and Post-ingest Granularity
- Pre-ingest metric granularity depends on the data source. The granularity of sampling can vary depending on the collector's configuration or the data source.
- Supported ingest granularities are one minute and five minutes. Granularities that fall within a threshold range (±3s) of defined ingest granularities are also acceptable. At ingest time, the Cisco Observability Platform aggregates metrics collected at sub-minute granularities into one-minute granularities by default. Cisco Observability Platform schemas define metrics and ingest granularities for all registered entity types.
- Post-ingest, the platform may aggregate metrics into higher granularities and compute roll-ups and summaries both at an entity's relationship level and at an entity's attribute level.
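The ingest-granularity rules above can be sketched in Python as follows. This is an illustrative sketch only: the names `snap_to_supported` and `resolve_ingest_granularity` are hypothetical, and treating the ±3s threshold as inclusive is an assumption, since the text does not say whether the boundary itself is accepted.

```python
from typing import Optional

SUPPORTED_GRANULARITIES_S = (60, 300)  # one minute and five minutes
THRESHOLD_S = 3                        # the ±3s acceptance range


def snap_to_supported(observed_s: float) -> Optional[int]:
    """Return the supported granularity that observed_s falls within, if any."""
    for granularity in SUPPORTED_GRANULARITIES_S:
        # Assumption: the threshold range is inclusive at both ends.
        if abs(observed_s - granularity) <= THRESHOLD_S:
            return granularity
    return None


def resolve_ingest_granularity(observed_s: float) -> Optional[int]:
    """Pick the granularity a data point would be stored at, or None."""
    snapped = snap_to_supported(observed_s)
    if snapped is not None:
        return snapped
    if observed_s < min(SUPPORTED_GRANULARITIES_S):
        # Sub-minute data is aggregated into one-minute granularity by default.
        return 60
    return None  # for example 120s: outside every supported range
```

For example, a collector reporting every 58 seconds resolves to 60, and one reporting every 15 seconds is rolled up to 60.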
Metric Retention
The Cisco Observability Platform retains one-minute aggregations for eight days, and one-hour aggregations for 367 days.
Metric Content Types
The Cisco Observability Platform metrics model defines Sum, Distribution, and Gauge metric content types. The following table explains how the platform maps OTLP types to Cisco Observability Platform metric content types.
OTLP Type | Cisco Observability Platform Metric Content Type | Description | Examples | Fields |
Gauge | Gauge | Same as OTLP type. Can be long or double depending on an entity's metricTypes attribute. The startTime attribute is mandatory, unlike in the OTLP Gauge type. | system.cpu.utilization, system.memory.utilization, Room Temperature | current; groupCount |
Sum | Sum | Same as OTLP type. Can be long or double depending on an entity's metricTypes attribute. | The monetary value of transactions, The number of requests, system.paging.faults, system.cpu.time, Net profit value of stocks, Current Queue Size, Active Requests, system.memory.usage, system.paging.usage, Heap size, Memory buffer sizes | sum; groupCount: number of base entities participating in a space aggregation (default value is 1) |
Summary, when p0 and p100 are provided along with sum and count. Additional rules for conversion from OTLP type Summary to Cisco Observability Platform type Distribution: OTLP type Summary supports only double values. If the MetricType.type is defined as long, then any Summary.sum double value is rounded to a Distribution.sum long value. The same applies to the p0 and p100 values. This may result in a loss of precision. It is the domain's or agent's responsibility not to use the fractional part of the OTLP type Summary when declaring the MetricType.type as long. If fractional parts need to be preserved, the distribution should be of type double. If Summary reports quantiles other than p0 and p100, they are ignored during the conversion. | Distribution | Captures sum, min, max, and count. Useful for getting averages. Can be long or double depending on an entity's metricTypes attribute. All fields are mandatory. Distribution is a superset of Sum, so all Sum use cases can be addressed using Distribution. However, Distribution is costlier in processing and storage. Do not use it unless required. | http.server.duration, rpc.client.request.size | sum; groupCount; count; min; max |
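The Summary-to-Distribution rules in the table above can be sketched in Python. This is a hedged illustration, not the platform's implementation: the `Distribution` dataclass and `summary_to_distribution` function are hypothetical names, mapping p0 to min and p100 to max is an assumption inferred from the Distribution fields, and Python's built-in `round` (which rounds halves to the nearest even integer) stands in for whatever rounding mode the platform actually uses.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Union

Number = Union[int, float]


@dataclass
class Distribution:
    sum: Number
    count: int
    min: Number
    max: Number
    group_count: int = 1  # default groupCount is 1


def summary_to_distribution(
    total: float,
    count: int,
    quantiles: Mapping[float, float],
    as_long: bool = False,
) -> Optional[Distribution]:
    """Convert an OTLP-style Summary into a Distribution, or return None.

    quantiles maps a quantile in [0.0, 1.0] to its value, so p0 is key 0.0
    and p100 is key 1.0. as_long mirrors MetricType.type being long.
    """
    if 0.0 not in quantiles or 1.0 not in quantiles:
        return None  # supported only when both p0 and p100 are present
    low, high = quantiles[0.0], quantiles[1.0]  # other quantiles are ignored
    if as_long:
        # Fractional parts are lost here, as the table warns.
        total, low, high = round(total), round(low), round(high)
    return Distribution(sum=total, count=count, min=low, max=high)
```

Calling `summary_to_distribution(320.6, 100, {0.0: 1.2, 0.5: 3.0, 1.0: 9.7}, as_long=True)` drops the p50 entry and rounds the remaining values, which shows the precision loss that declaring the type as long can cause.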
Metric Categories
The Cisco Observability Platform metrics model introduces the concept of metric categories. Metric categories do not exist in the OpenTelemetry data model for metrics. When you look at metric graphs on solutions like Cisco Cloud Observability, you see a single value for each timestamp even though some Cisco Observability Platform metric content types can have multiple values. This single value is a calculation based on the metric category. The metric category is a consumption function -- a mathematical function that defines how that single value is calculated. Therefore, the best practice is to assign a metric category to each metric in your OpenTelemetry collector's configuration.
The Cisco Observability Platform metric model defines the following metric categories:
AVERAGE
CURRENT
CURRENT_PER_INSTRUMENTED_ENTITY
RATE_PER_MIN
RATE_PER_MIN_PER_SEC
SUM
SUM_PER_INSTRUMENTED_ENTITY
The following table lists each Cisco Observability Platform metric category, the mathematical formula that the platform uses to calculate a single value to display, and what metric content types the platform can assign to that metric category. For example, the platform assigns metrics of category CURRENT to content type Gauge.
Cisco Observability Platform Metric Category | Description | Mathematical Formula | Allowed Metric Content Types | Sample Usage |
AVERAGE | Mathematical average | (sum / count) | Distribution | For a metric request-latency using content type Distribution, and sending latencies of 100 requests: sum = 320, count = 100, AVERAGE = 320/100 = 3.2 seconds |
CURRENT | The current value | current | Gauge | |
CURRENT_PER_INSTRUMENTED_ENTITY | Average in spatial dimension | (current / groupCount) | Gauge | For a metric system.cpu.utilization reported from 2 nodes as 10% and 20%: current = 30 (see Space Aggregations for type Gauge), groupCount = 2, CURRENT_PER_INSTRUMENTED_ENTITY = 30/2 = 15% |
RATE_PER_MIN | Rate of change per minute | (sum / granularity), with granularity in minutes, where granularity = endTime - startTime | Sum, Distribution, Gauge | For a metric Number-Of-Requests with content type Sum, one call every second, and an agent reporting every 30 seconds, RATE_PER_MIN and RATE_PER_MIN_PER_SEC are: • for timestamp = 60s, sum = 30, RATE_PER_MIN = 30/0.5 = 60, RATE_PER_MIN_PER_SEC = 30/30 = 1 • for timestamp = 120s, sum = 30, RATE_PER_MIN = 30/0.5 = 60, RATE_PER_MIN_PER_SEC = 30/30 = 1 |
RATE_PER_MIN_PER_SEC | Rate of change per second | Same as RATE_PER_MIN, but granularity, endTime, and startTime are in seconds. | Sum, Distribution, Gauge | |
SUM | Mathematical sum | | Sum, Distribution | |
SUM_PER_INSTRUMENTED_ENTITY | Average in spatial dimension | (sum / groupCount) | Sum, Distribution | |
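The category formulas in the table above can be checked with a short Python sketch. The function `category_value` and its parameter names are hypothetical. The SUM branch returning the raw sum is an assumption, because the table leaves that formula cell empty.

```python
from typing import Optional


def category_value(
    category: str,
    *,
    total: Optional[float] = None,        # the "sum" field
    count: Optional[int] = None,
    current: Optional[float] = None,
    group_count: int = 1,
    granularity_s: Optional[float] = None,  # endTime - startTime, in seconds
) -> float:
    """Compute the single displayed value for one metric data point."""
    if category == "AVERAGE":
        return total / count
    if category == "CURRENT":
        return current
    if category == "CURRENT_PER_INSTRUMENTED_ENTITY":
        return current / group_count
    if category == "RATE_PER_MIN":
        return total / (granularity_s / 60)  # granularity in minutes
    if category == "RATE_PER_MIN_PER_SEC":
        return total / granularity_s         # granularity in seconds
    if category == "SUM":
        return total  # assumption: the table gives no formula for SUM
    if category == "SUM_PER_INSTRUMENTED_ENTITY":
        return total / group_count
    raise ValueError(f"unknown metric category: {category}")


# The worked examples from the Sample Usage column.
average = category_value("AVERAGE", total=320, count=100)
per_entity = category_value(
    "CURRENT_PER_INSTRUMENTED_ENTITY", current=30, group_count=2
)
per_min = category_value("RATE_PER_MIN", total=30, granularity_s=30)
per_sec = category_value("RATE_PER_MIN_PER_SEC", total=30, granularity_s=30)
```

These four calls reproduce the table's sample values of 3.2, 15, 60, and 1.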
Consumption Functions
A consumption function is a mathematical function that can show a different view or aggregation of metric data. For example, a max function gives the maximum value of all metric data points in the given time range. A consumption function is similar to a metric category, except that it can be supplied dynamically at query time. In other words, in addition to a default value based on a category, a consumption function can be used when querying metrics. This is useful when you want to override the metric category and apply a different mathematical function for a query.
Consumption Function | Description or Underlying Formula | Allowed Content Types |
min | min | Distribution |
max | max | Distribution |
p | Percentile. Any percentile value can be queried from the underlying digest summary using this consumption function. Example: p99.98 | Histogram |
count | count | Distribution |
groupCount | Number of base entities participating in a space aggregation. | Sum, Distribution, Gauge |
stdDev | Standard deviation | Sum, Distribution, Gauge |
sumCumulative | Latest sum value of cumulative metrics | Sum, Distribution |
value | A reference to the metric category's underlying function. VALUE = CATEGORY function | Sum, Distribution, Gauge |
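One way to picture how a query-time consumption function overrides the metric category is the following Python sketch. It only encodes the table above. The names `ALLOWED_CONTENT_TYPES`, `base_name`, and `resolve_function` are hypothetical, and rejecting a disallowed combination with an error is an assumption about behavior that the text does not describe.

```python
from typing import Optional

# Allowed content types per consumption function, copied from the table.
ALLOWED_CONTENT_TYPES = {
    "min": {"Distribution"},
    "max": {"Distribution"},
    "p": {"Histogram"},
    "count": {"Distribution"},
    "groupCount": {"Sum", "Distribution", "Gauge"},
    "stdDev": {"Sum", "Distribution", "Gauge"},
    "sumCumulative": {"Sum", "Distribution"},
    "value": {"Sum", "Distribution", "Gauge"},
}


def base_name(function: str) -> str:
    """Map a percentile such as 'p99.98' to its table entry 'p'."""
    if function.startswith("p") and len(function) > 1:
        try:
            if 0 <= float(function[1:]) <= 100:
                return "p"
        except ValueError:
            pass
    return function


def resolve_function(
    requested: Optional[str], content_type: str, category: str
) -> str:
    """Return the function a query would apply to a metric."""
    if requested is None or requested == "value":
        return category  # no override: fall back to the metric category
    allowed = ALLOWED_CONTENT_TYPES.get(base_name(requested))
    if allowed is None:
        raise ValueError(f"unknown consumption function: {requested}")
    if content_type not in allowed:
        # Assumption: a disallowed combination is rejected outright.
        raise ValueError(f"{requested} is not allowed for {content_type}")
    return requested
```

With this sketch, a query on a Distribution metric of category AVERAGE resolves to AVERAGE by default and to max when max is requested explicitly.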
Aggregations
Space Aggregations
The Cisco Observability Platform does not support space aggregations if the timeseries have different aggregation temporalities, such as Delta and Cumulative timeseries.
Cisco Observability Platform Metric Content Type | Aggregation Temporality | Space Aggregations | Space Aggregated Type |
Sum | Delta | sum = sum(sums); groupCount = sum(groupCounts) | Sum |
Distribution | Delta | sum = sum(sums); groupCount = sum(groupCounts); count = sum(counts); min = min(mins); max = max(maxes) | Distribution |
Gauge | Not applicable | current = sum(currents); groupCount = sum(groupCounts) | Gauge |
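The space aggregation rules in the table above can be written out as a small Python sketch. Representing a data point as a plain dictionary keyed by field name, and the function name `space_aggregate`, are choices made for this illustration only.

```python
from typing import Dict, List

Point = Dict[str, float]


def space_aggregate(content_type: str, points: List[Point]) -> Point:
    """Combine data points that share a time range across timeseries."""
    # groupCount defaults to 1 for a point from a single base entity.
    group_count = sum(p.get("groupCount", 1) for p in points)
    if content_type == "Gauge":
        return {
            "current": sum(p["current"] for p in points),
            "groupCount": group_count,
        }
    if content_type not in ("Sum", "Distribution"):
        raise ValueError(f"unknown content type: {content_type}")
    result = {"sum": sum(p["sum"] for p in points), "groupCount": group_count}
    if content_type == "Distribution":
        result["count"] = sum(p["count"] for p in points)
        result["min"] = min(p["min"] for p in points)
        result["max"] = max(p["max"] for p in points)
    return result


# Two nodes reporting system.cpu.utilization as 10% and 20%.
cpu = space_aggregate("Gauge", [{"current": 10}, {"current": 20}])
```

The Gauge example yields current = 30 and groupCount = 2, which is the input to the CURRENT_PER_INSTRUMENTED_ENTITY calculation of 30/2 = 15%.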
Time Aggregations
The Cisco Observability Platform converts all Cumulative metrics into Delta before storing them. We handle resets and gaps in Cumulative metrics as follows:
- We detect resets and gaps based on the StartTimeUnixNano present in the metric packet and a comparison of that value with the same value in a previously received metric packet. If the current value is greater than the previous value, we treat it as a reset. Also, for monotonically increasing metrics, if the current metric value is less than the previous metric value, we treat it as a reset.
- We treat drops in the continuous data flow of metric data as gaps. The first metric we receive after a gap is not stored in our backend but is instead used as the new reference point to calculate future Delta metrics.
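The reset and gap handling described above can be sketched in Python. Several details here are assumptions rather than documented behavior: the `max_gap_ns` threshold that decides when a pause counts as a gap is hypothetical, and the sketch simply re-baselines on a reset without emitting a value, because the text does not say what is stored for the packet that triggers a reset. The `CumulativePoint` class and `cumulative_to_delta` function are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class CumulativePoint:
    start_time_unix_nano: int
    time_unix_nano: int
    value: float


def cumulative_to_delta(
    points: Iterable[CumulativePoint],
    monotonic: bool = True,
    max_gap_ns: int = 120_000_000_000,  # hypothetical gap threshold (120s)
) -> List[Tuple[str, Optional[float]]]:
    """Classify each packet and emit a delta where one can be computed."""
    results: List[Tuple[str, Optional[float]]] = []
    previous: Optional[CumulativePoint] = None
    for point in points:
        if previous is None:
            event = "first"  # nothing to compare against yet
        elif point.time_unix_nano - previous.time_unix_nano > max_gap_ns:
            event = "gap"    # not stored; becomes the new reference point
        elif point.start_time_unix_nano > previous.start_time_unix_nano or (
            monotonic and point.value < previous.value
        ):
            event = "reset"  # assumption: re-baseline without emitting
        else:
            event = "delta"
        delta = point.value - previous.value if event == "delta" else None
        results.append((event, delta))
        previous = point     # every packet becomes the next reference
    return results
```

Each result pairs a classification with the computed delta, so a caller can store only the "delta" entries and use the others purely as reference points.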
Cisco Observability Platform Metric Content Type | Aggregation Temporality | Time Aggregations | Time Aggregated Type |
Sum | Delta | sum = sum(sums); groupCount = max(groupCounts) | Sum |
Distribution | Delta | sum = sum(sums); groupCount = max(groupCounts); count = sum(counts); min = min(mins); max = max(maxes) | Distribution |
Gauge | Not applicable | current = latest(currents); groupCount = latest(groupCounts) | Gauge |
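The time aggregation rules in the table above differ from space aggregation mainly in how groupCount and Gauge values are combined. A minimal Python sketch, assuming data points are plain dictionaries ordered from oldest to newest and using the hypothetical function name `time_aggregate`:

```python
from typing import Dict, List

Point = Dict[str, float]


def time_aggregate(content_type: str, points: List[Point]) -> Point:
    """Combine consecutive data points of one timeseries into one point.

    points must be ordered from oldest to newest.
    """
    if content_type == "Gauge":
        latest = points[-1]  # latest(currents), latest(groupCounts)
        return {
            "current": latest["current"],
            "groupCount": latest.get("groupCount", 1),
        }
    if content_type not in ("Sum", "Distribution"):
        raise ValueError(f"unknown content type: {content_type}")
    result = {
        "sum": sum(p["sum"] for p in points),
        # The same entities report over time, so take the maximum
        # rather than adding the counts together.
        "groupCount": max(p.get("groupCount", 1) for p in points),
    }
    if content_type == "Distribution":
        result["count"] = sum(p["count"] for p in points)
        result["min"] = min(p["min"] for p in points)
        result["max"] = max(p["max"] for p in points)
    return result


# Two 30-second Sum points rolled up into one one-minute point.
one_minute = time_aggregate("Sum", [{"sum": 30}, {"sum": 30}])
```

Rolling up the two 30-second points gives sum = 60 with groupCount = 1, whereas a space aggregation of the same two points would report groupCount = 2.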
OpenTelemetry™ and Kubernetes® (as applicable) are trademarks of The Linux Foundation®.