Cisco Observability Platform

This documentation and the Cisco Observability Platform functionalities it describes are subject to change. Data saved on the platform may disappear and APIs may change without notice.

Metrics Model

The Cisco Observability Platform ingests metrics in OpenTelemetry™ Protocol (OTLP) format from your OpenTelemetry-compatible agents or collectors, converts them into the Cisco Observability Platform metrics model, enriches them with derived data, and stores them in the MELT+ store in that format.

The Cisco Observability Platform metrics model is similar to the OpenTelemetry data model for metrics, with important differences. This page describes the Cisco Observability Platform metrics model and explains how it relates to the OTLP format.

Note: This document contains references to third-party documentation. Cisco does not own any rights to, and assumes no responsibility for the accuracy or completeness of, such third-party documentation.

What is a Metric

A metric is a numerical measurement, such as TargetConnectionErrorCount, that is sampled over a specific timeframe, typically at a fixed frequency. Cisco Observability Platform metrics are registered by a domain or feature in a type system. A metric has properties, a content type, and a category.

Terminology

Term	Definition
Measurement event	The act of recording one metric. For example, the act of recording the request latency of entity `<N>`. A measurement event is associated with exactly one timestamp.
Metric category	The default consumption function used when no consumption function is supplied in a query. The metric category defines how the platform computes a metric's value from a given content type. See Cisco Observability Platform Metric Categories.
Metric data point	A summary or aggregation of multiple numerical measurements, typically taken over a specific time range and at a fixed frequency. For example, a request duration is reported for the last minute.
Metric timeseries	A series of metric data points having the same entity ID, metric type, source, and a unique set of attributes. Solutions like Cisco Cloud Observability display metric timeseries as graphs in which time ranges are represented as (`startTime`, `granularity`), whereas OpenTelemetry represents time ranges as (`startTime`, `endTime`). The two representations are interchangeable. On graphs, solutions like Cisco Cloud Observability associate each metric data point with only one timestamp, the `startTime`.
Metric type	A unique identifier for what the metric corresponds to. It consists of the name of the metric, its content type, data type, and so on. For example, `calls` or `https.response.size`.
Time aggregation	An aggregation of measurement events or metric data points within the same timeseries. There are two types of time aggregations: • Measurement events are converted into metric data points. Typically, this conversion is done on the client side, but it can also be done on the server side. • Metric data points are aggregated into fewer metric data points. Typically, this aggregation is done on the server side, but it can also be done on the client side.
Space aggregation	An aggregation of metric data points having the same time ranges from multiple metric timeseries.
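
The interchangeability of the two time-range representations described in the Metric timeseries entry can be illustrated with a short sketch. The function names below are hypothetical and are not part of any Cisco Observability Platform or OpenTelemetry API.

```python
from datetime import datetime, timedelta, timezone


def to_end_time(start_time: datetime, granularity: timedelta) -> datetime:
    """Convert (startTime, granularity) to the endTime of (startTime, endTime)."""
    return start_time + granularity


def to_granularity(start_time: datetime, end_time: datetime) -> timedelta:
    """Convert (startTime, endTime) back to the granularity."""
    return end_time - start_time


# A one-minute data point that starts at 12:00 UTC.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
end = to_end_time(start, timedelta(minutes=1))
```

Because each conversion is a single addition or subtraction, no information is lost when moving between the two forms.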

The OpenTelemetry Data Model for Metrics

The OpenTelemetry data model for metrics specifies data formats and protocols for the import, transportation, and export of metrics. It includes the OpenTelemetry Protocol (OTLP) and the OpenTelemetry Timeseries Model.

OpenTelemetry Protocol

The OpenTelemetry Protocol (OTLP) defines how the OpenTelemetry metric stream is encoded and transported over gRPC or HTTP/1.1 to an OpenTelemetry timeseries store. Each metric stream is identified by its name, attributes, originating resource, and OTLP type (point kind). There can be more than one metric stream per instrument in the event model.

The Cisco Observability Platform supports the following OTLP types. Each OTLP type maps to a specific aggregation function or functions.

OTLP Type (Point Kind)	Description	Aggregation Function	Monotonic Supported?	Supported Aggregation Temporalities
`Sum`	The sum of all measurement event values.	`sum()`	Yes	Delta
`Gauge`	A sampled value at a given time. Gauges do not provide an aggregation semantic. Instead, they provide a "last sample value". For this reason, the `startTime` is not meaningful for gauges. Unlike the other OTLP types, a gauge is a point event associated with `endTimestamp`. See the gauge definition in the metrics protobuf.	`latest()`	No	-
`Summary`	The Cisco Observability Platform supports this OTLP type only when p0 and p100 are provided along with sum and count.	-	-	-

OpenTelemetry Timeseries Model

The OpenTelemetry Timeseries Model specifies how OpenTelemetry backends store metrics, that is, the at-rest format of metrics at their destination.

The Cisco Observability Platform Metrics Model

Pre-ingest, Ingest, and Post-ingest Granularity

Pre-ingest metric granularity depends on the data source. The granularity of sampling can vary depending on the collector's configuration or the data source.
Supported ingest granularities are one minute and five minutes. Granularities that fall within a threshold range (±3s) of defined ingest granularities are also acceptable. At ingest time, the Cisco Observability Platform aggregates metrics collected at sub-minute granularities into one-minute granularities by default. Cisco Observability Platform schemas define metrics and ingest granularities for all registered entity types.
Post-ingest, the platform may aggregate metrics into higher granularities and compute roll-ups and summaries both at an entity's relationship level and at an entity's attribute level.
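
The ±3s threshold rule above can be sketched as a small lookup. This is a minimal illustration only: the function and constant names are hypothetical, treating the boundary as inclusive is an assumption, and the sketch does not cover the default aggregation of sub-minute data into one-minute granularities.

```python
from typing import Optional

SUPPORTED_GRANULARITIES_S = (60, 300)  # one minute and five minutes
TOLERANCE_S = 3                        # the documented ±3s threshold


def match_ingest_granularity(observed_s: float) -> Optional[int]:
    """Return the supported granularity that observed_s falls within, if any.

    Assumption: a value exactly 3 seconds away is still accepted.
    """
    for granularity in SUPPORTED_GRANULARITIES_S:
        if abs(observed_s - granularity) <= TOLERANCE_S:
            return granularity
    return None
```

For example, a data point spanning 58 seconds would be matched to the one-minute granularity, while a 64-second data point would not match either supported granularity.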

Metric Retention

The Cisco Observability Platform retains one-minute aggregations for eight days, and one-hour aggregations for 367 days.

Metric Content Types

The Cisco Observability Platform metrics model defines Sum, Distribution, and Gauge metric content types. The following table explains how the platform maps OTLP types to Cisco Observability Platform metric content types.

OTLP Type	Cisco Observability Platform Metric Content Type	Description	Examples	Fields
`Gauge`	`Gauge`	Same as OTLP type. Can be `long` or `double` depending on an entity's `metricTypes` attribute. The `startTime` attribute is mandatory, unlike in the OTLP `Gauge` type.	`system.cpu.utilization`; `system.memory.utilization`; room temperature	`current`, `groupCount`
`Sum`	`Sum`	Same as OTLP type. Can be `long` or `double` depending on an entity's `metricTypes` attribute.	The monetary value of transactions; the number of requests; `system.paging.faults`; `system.cpu.time`; net profit value of stocks; current queue size; active requests; `system.memory.usage`; `system.paging.usage`; heap size; memory buffer sizes	`sum`, `groupCount` (the number of base entities participating in space aggregation; the default value is 1)
`Summary`, when `p0` and `p100` are provided along with `sum` and `count`. Additional rules for conversion from OTLP type `Summary` to Cisco Observability Platform type `Distribution`: OTLP type `Summary` supports only `double` values. If the `MetricType.type` is defined as `long`, then any `Summary.sum` `double` value is rounded to a `Distribution.sum` `long` value. The same applies to the `p0` and `p100` values. This may result in a loss of precision. The domain or agent is responsible for not using the fractional part of OTLP `Summary` values when declaring the `MetricType.type` as `long`. If fractional parts need to be preserved, the distribution should be of type `double`. If `Summary` reports quantiles other than `p0` and `p100`, they are ignored during the conversion.	`Distribution`	Captures `sum`, `min`, `max`, and `count`. Useful for getting averages. Can be `long` or `double` depending on an entity's `metricTypes` attribute. All fields are mandatory. `Distribution` is a superset of `Sum`, so all `Sum` use cases can be addressed using `Distribution`. However, `Distribution` is costlier to process and store. Do not use it unless required.	`http.server.duration`; `rpc.client.request.size`	`sum`, `groupCount`, `count`, `min`, `max`
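
The `Summary` to `Distribution` conversion rules in the table can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the class and function names are hypothetical, mapping `p0` to `min` and `p100` to `max` is inferred from the table, and Python's built-in `round()` (which rounds halves to the nearest even integer) stands in for whatever rounding mode the platform actually uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Distribution:
    sum: float
    count: int
    min: float
    max: float
    group_count: int = 1  # documented default for groupCount


def summary_to_distribution(
    total: float,
    count: int,
    quantiles: Dict[float, float],  # OTLP quantile (0.0 to 1.0) -> value
    as_long: bool = False,          # True when MetricType.type is long
) -> Optional[Distribution]:
    # Only summaries that carry both p0 (0.0) and p100 (1.0) are supported.
    if 0.0 not in quantiles or 1.0 not in quantiles:
        return None
    # Assumption: p0 becomes min and p100 becomes max; other quantiles are ignored.
    low, high = quantiles[0.0], quantiles[1.0]
    if as_long:
        # Assumption: rounding mode is unknown; round() is used for illustration.
        total, low, high = round(total), round(low), round(high)
    return Distribution(sum=total, count=count, min=low, max=high)
```

With `as_long=True`, a summary whose `sum` is 320.6 loses its fractional part, which is the loss of precision that the table warns about.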

Metric Categories

The Cisco Observability Platform metrics model introduces the concept of metric categories. Metric categories do not exist in the OpenTelemetry data model for metrics. When you look at metric graphs on solutions like Cisco Cloud Observability, you see a single value for each timestamp even though some Cisco Observability Platform metric content types can have multiple values. This single value is a calculation based on the metric category. The metric category is a consumption function, that is, a mathematical function that defines how that single value is calculated. Therefore, the best practice is to assign a metric category to each metric in your OpenTelemetry collector's configuration.

The Cisco Observability Platform metrics model defines the following metric categories:

AVERAGE
CURRENT
CURRENT_PER_INSTRUMENTED_ENTITY
RATE_PER_MIN
RATE_PER_MIN_PER_SEC
SUM
SUM_PER_INSTRUMENTED_ENTITY

The following table lists each Cisco Observability Platform metric category, the mathematical formula that the platform uses to calculate a single value to display, and what metric content types the platform can assign to that metric category. For example, the platform assigns metrics of category CURRENT to content type Gauge.

Cisco Observability Platform Metric Category	Description	Mathematical Formula	Allowed Metric Content Types	Sample Usage
`AVERAGE`	Mathematical average	`(sum / count)`	`Distribution`	For a metric request-latency using content type `Distribution`, and sending latencies of 100 requests: `sum` = 320 `count` = 100 `AVERAGE` = 320/100 = 3.2 seconds
`CURRENT`	The current value	`current`	`Gauge`
`CURRENT_PER_INSTRUMENTED_ENTITY`	Average in spatial dimension	`(current / groupCount)`	`Gauge`	For a metric `system.cpu.utilization` reported from 2 nodes as 10%, 20%: `current` = 30 (see Space Aggregations for type `Gauge`) `groupCount` = 2 `CURRENT_PER_INSTRUMENTED_ENTITY` = 30/2 = 15%
`RATE_PER_MIN`	Rate of change per minute	`(sum / granularity)` (in minutes) where `granularity = endTime - startTime`	`Sum` `Distribution` `Gauge`	For a metric `Number-Of-Requests` with content type `Sum`, one call every second, and an agent reporting every 30 seconds, `RATE_PER_MIN` and `RATE_PER_MIN_PER_SEC` are: for `timestamp` = 60s, `sum` = 30, `RATE_PER_MIN` = 30/0.5 = 60, `RATE_PER_MIN_PER_SEC` = 30/30 = 1; for `timestamp` = 120s, `sum` = 30, `RATE_PER_MIN` = 30/0.5 = 60, `RATE_PER_MIN_PER_SEC` = 30/30 = 1
`RATE_PER_MIN_PER_SEC`	Rate of change per second	Same as `RATE_PER_MIN`, but `granularity`, `endTime`, and `startTime` are in seconds.	`Sum` `Distribution` `Gauge`
`SUM`	Mathematical sum		`Sum` `Distribution`
`SUM_PER_INSTRUMENTED_ENTITY`	Average in spatial dimension	`(sum / groupCount)`	`Sum` `Distribution`
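
The formulas in the table can be checked against its sample usages with a short sketch. The dictionary below covers only a subset of the categories. The data point keys `granularity_min` and `granularity_s` are hypothetical names for the granularity expressed in minutes and seconds, and nothing here is a platform API.

```python
from typing import Callable, Dict

Point = Dict[str, float]

# Each category maps a metric data point to the single value shown on a graph.
CATEGORY_FORMULAS: Dict[str, Callable[[Point], float]] = {
    "AVERAGE": lambda p: p["sum"] / p["count"],
    "CURRENT": lambda p: p["current"],
    "CURRENT_PER_INSTRUMENTED_ENTITY": lambda p: p["current"] / p["groupCount"],
    "RATE_PER_MIN": lambda p: p["sum"] / p["granularity_min"],
    "RATE_PER_MIN_PER_SEC": lambda p: p["sum"] / p["granularity_s"],
    "SUM_PER_INSTRUMENTED_ENTITY": lambda p: p["sum"] / p["groupCount"],
}

# Sample usages taken from the table above.
latency = {"sum": 320, "count": 100}
cpu = {"current": 30, "groupCount": 2}
requests = {"sum": 30, "granularity_min": 0.5, "granularity_s": 30}

average = CATEGORY_FORMULAS["AVERAGE"](latency)                      # 3.2
per_entity = CATEGORY_FORMULAS["CURRENT_PER_INSTRUMENTED_ENTITY"](cpu)  # 15.0
rate_per_min = CATEGORY_FORMULAS["RATE_PER_MIN"](requests)           # 60.0
rate_per_sec = CATEGORY_FORMULAS["RATE_PER_MIN_PER_SEC"](requests)   # 1.0
```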

Consumption Functions

A consumption function is a mathematical function that shows a different view or aggregation of metric data. For example, a max function gives the maximum value of all metric data points in the given time range. A consumption function is similar to a metric category, except that it can be supplied dynamically at query time. In other words, a query can either rely on the default value computed from the metric category or supply a consumption function explicitly. This is useful when you want to override the metric category and apply a different mathematical function for a query.

Consumption Function	Description or Underlying Formula	Allowed Content Types
`min`	`min`	`Distribution`
`max`	`max`	`Distribution`
`p`	Percentile. Any percentile value can be queried from the underlying digest summary using this consumption function. Example: `p99.98`	`Histogram`
`count`	`count`	`Distribution`
`groupCount`	Number of base entities participating in a space aggregation.	`Sum` `Distribution` `Gauge`
`stdDev`	Standard deviation	`Sum` `Distribution` `Gauge`
`sumCumulative`	Latest sum value of cumulative metrics	`Sum` `Distribution`
`value`	A reference to the metric category's underlying function. VALUE = CATEGORY function	`Sum` `Distribution` `Gauge`
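
How a query-time consumption function overrides the metric category can be sketched as below. This is a simplified illustration: the `consume` function is hypothetical, it handles only a few functions and two categories, and it does not reflect the platform's query language.

```python
from typing import Dict


def consume(point: Dict[str, float], category: str, function: str = "value") -> float:
    """Return the single value for a data point.

    With the default function "value", the metric category decides the result.
    Any other supported function overrides the category for this query.
    """
    if function in ("min", "max", "count", "groupCount"):
        # These functions read the stored field of the same name directly.
        return point[function]
    if function == "value":
        if category == "AVERAGE":
            return point["sum"] / point["count"]
        if category == "CURRENT":
            return point["current"]
        raise ValueError(f"category not covered by this sketch: {category}")
    raise ValueError(f"function not covered by this sketch: {function}")


latency = {"sum": 320, "count": 100, "min": 1, "max": 9, "groupCount": 1}
default_value = consume(latency, "AVERAGE")          # category applies
overridden = consume(latency, "AVERAGE", "max")      # query-time override
```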

Aggregations

Space Aggregations

The Cisco Observability Platform does not support space aggregations if the timeseries have different aggregation temporalities, such as Delta and Cumulative timeseries.

Cisco Observability Platform Metric Content Type	Aggregation Temporality	Space Aggregations	Space Aggregated Type
`Sum`	Delta	`sum = sum(sums)` `groupCount = sum(groupCounts)`	`Sum`
`Distribution`	Delta	`sum = sum(sums)` `groupCount = sum(groupCounts)` `count = sum(counts)` `min = min(mins)` `max = max(maxes)`	`Distribution`
`Gauge`	Not applicable	`current = sum(currents)` `groupCount = sum(groupCounts)`	`Gauge`
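
The space aggregation rules in the table can be sketched as follows. Data points are modeled as plain dictionaries keyed by the field names used in the table, and the function names are hypothetical. The inputs are assumed to cover the same time range and, where applicable, to share the same aggregation temporality.

```python
from typing import Dict, List

Point = Dict[str, float]


def space_aggregate_distribution(points: List[Point]) -> Point:
    """Combine Distribution data points from several timeseries."""
    return {
        "sum": sum(p["sum"] for p in points),
        "groupCount": sum(p["groupCount"] for p in points),
        "count": sum(p["count"] for p in points),
        "min": min(p["min"] for p in points),
        "max": max(p["max"] for p in points),
    }


def space_aggregate_gauge(points: List[Point]) -> Point:
    """Combine Gauge data points from several timeseries."""
    return {
        "current": sum(p["current"] for p in points),
        "groupCount": sum(p["groupCount"] for p in points),
    }


# Two nodes reporting CPU utilization of 10% and 20%.
cpu = space_aggregate_gauge([
    {"current": 10, "groupCount": 1},
    {"current": 20, "groupCount": 1},
])
```

Here `cpu` holds `current` = 30 and `groupCount` = 2, which is the input used in the `CURRENT_PER_INSTRUMENTED_ENTITY` sample usage.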

Time Aggregations

The Cisco Observability Platform converts all Cumulative metrics into Delta before storing them. We handle resets and gaps in Cumulative metrics as follows:

We detect resets and gaps based on the `StartTimeUnixNano` present in the metric packet, which we compare with the `StartTimeUnixNano` of the previously received metric packet. If the current value is greater than the previous value, we treat it as a reset. In addition, for monotonically increasing metrics, if the current metric value is less than the previous metric value, we treat it as a reset.
We treat drops in the continuous data flow of metric data as gaps. The first metric we receive after a gap is not stored in our backend but is instead used as the new reference point to calculate future Delta metrics.
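
The reset and gap handling described above can be sketched as a Cumulative-to-Delta conversion. Several details are assumptions made for illustration and are not stated in this document: the gap threshold `MAX_GAP_NANOS`, the choice to emit the new cumulative value as the Delta after a reset (as if the counter restarted from zero), and the treatment of the very first point as a reference only. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_GAP_NANOS = 2 * 60 * 10**9  # assumed gap threshold: two minutes


@dataclass
class CumulativePoint:
    start_time_unix_nano: int
    time_unix_nano: int
    value: int


def cumulative_to_delta(
    points: List[CumulativePoint], monotonic: bool = True
) -> List[Optional[int]]:
    """Return one Delta per input point, or None when nothing is stored."""
    deltas: List[Optional[int]] = []
    prev: Optional[CumulativePoint] = None
    for p in points:
        if prev is None or p.time_unix_nano - prev.time_unix_nano > MAX_GAP_NANOS:
            # First point, or first point after a gap: reference only, not stored.
            deltas.append(None)
        elif p.start_time_unix_nano > prev.start_time_unix_nano or (
            monotonic and p.value < prev.value
        ):
            # Reset detected. Assumption: the counter restarted from zero.
            deltas.append(p.value)
        else:
            deltas.append(p.value - prev.value)
        prev = p
    return deltas


MINUTE = 60 * 10**9
series = [
    CumulativePoint(0, 1 * MINUTE, 10),   # reference point
    CumulativePoint(0, 2 * MINUTE, 25),   # normal increase
    CumulativePoint(0, 3 * MINUTE, 5),    # value dropped: reset
    CumulativePoint(0, 4 * MINUTE, 8),    # normal increase
    CumulativePoint(0, 10 * MINUTE, 20),  # six-minute gap: new reference
    CumulativePoint(0, 11 * MINUTE, 26),  # normal increase
]
result = cumulative_to_delta(series)
```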

Cisco Observability Platform Metric Content Type	Aggregation Temporality	Time Aggregations	Time Aggregated Type
`Sum`	`Delta`	`sum = sum(sums)` `groupCount = max(groupCounts)`	`Sum`
`Distribution`	`Delta`	`sum = sum(sums)` `groupCount = max(groupCounts)` `count = sum(counts)` `min = min(mins)` `max = max(maxes)`	`Distribution`
`Gauge`	Not applicable	`current = latest(currents)` `groupCount = latest(groupCounts)`	`Gauge`
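
The time aggregation rules in the table can be sketched as follows. Data points are modeled as plain dictionaries within a single timeseries. The function names and the `timestamp` key used to find the latest gauge sample are hypothetical.

```python
from typing import Dict, List

Point = Dict[str, float]


def time_aggregate_distribution(points: List[Point]) -> Point:
    """Merge Delta Distribution data points of one timeseries into one point."""
    return {
        "sum": sum(p["sum"] for p in points),
        # groupCount takes the maximum over time, unlike in space aggregation.
        "groupCount": max(p["groupCount"] for p in points),
        "count": sum(p["count"] for p in points),
        "min": min(p["min"] for p in points),
        "max": max(p["max"] for p in points),
    }


def time_aggregate_gauge(points: List[Point]) -> Point:
    """Keep only the most recent Gauge sample."""
    latest = max(points, key=lambda p: p["timestamp"])
    return {"current": latest["current"], "groupCount": latest["groupCount"]}


# Two 30-second data points merged into a one-minute data point.
minute = time_aggregate_distribution([
    {"sum": 30, "groupCount": 2, "count": 30, "min": 1, "max": 4},
    {"sum": 30, "groupCount": 3, "count": 30, "min": 2, "max": 7},
])
```

Taking the maximum `groupCount` over time keeps it a count of participating entities rather than a count of data points.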

OpenTelemetry™ and Kubernetes® (as applicable) are trademarks of The Linux Foundation®.