aci-monitoring

ACI Moniring Service is an application that, using the REST API, allows to
fetch any information from the Cisco Application Centric Infrastructure (ACI)
controller. The collected information will then be saved in the
Prometheus database and then it will be possible to
visualize it using the Grafana tool. This application
complements the monitoring functionality available through the APIC GUI, as
the data available there has short retention periods.

Why Cisco ACI?

Cisco
ACI

is a modern solution that provides an efficient and scalable communication
infrastructure for Data Center. Features such as efficient transport for
applications, nonblocking architecture, and easy scaling from a performance
perspective are not all features of ACI. ACI is also primarily an SDN
(Software Defined Network) environment. Therefore, a feature as important as
those presented above is the ability to fully manage ACI using programming
tools
and
APIs. This is a possibility that is
currently little used and gives great opportunities for integration,
automation (via tools like
Ansible and
Terraform),
and building on the use of ACI as the foundation of the entire ecosystem of
mutually cooperating components of the Data Center.

In the ACI controller and directly from the devices, we also have classical
methods used for network monitoring, such as
SNMP
and Syslog. However, they only allow access to part of the information in
relation to the API, which allows access to all objects from over 16k+
classes
of the object
model. In addition, through SNMP we do not have access to such information as
the application model or parameters of communication applications, because as
such they are not available from typical network devices, in which MIBs for
SNMP were used from.

Access to data via API greatly facilitates access to information, as there is
no need to use, for example, classical and unreliable methods of parsing the
results of commands. The model of data access via the
REST protocol includes one more important feature,
it allows some data processing to be transferred to the server side. This is
the same approach as we have in a typical database engine, where the SQL query
contains filters and
aggregators
,
which are performed on the server side on local data, significantly improving
the search speed and reducing the amount of information needed to be sent.

Architecture

The architecture of this application had a few things in mind:

  • Poll data from ACI
    • It could subscribe to changes as well, but the poller is more stable
    • Polling should be efficient enough to get all information from ACI controllers
  • Transfer data to Prometheus and Grafana
    • Both tools are perfect to use that kind of data
    • Data can be modified using the extensive aggregation language and
      Prometheus filters
  • Use as small resources as its possible
    • Both in the matter of memory and CPU usage
  • GUI should allow easily add new monitored properties
    • It needs an interface to add complex filters in a simple way
    • Possibility to monitor any object and attribute from the ACI object model

The entire architecture has been designed in a microservice model to
facilitate software maintenance and development. The structure of the
application components and their dependencies are presented in the figure
below:

data flow graph

Components

  • Data Poller - a component written in Python that connects to the APIC
    controller, logs in and uses a REST query to retrieve data about the objects
    specified in the configuration. This script also starts the HTTP server and
    uses it to provide the retrieved value in the form of the Prometheus metric.

  • Configuration API - Component that provides API for configuration
    management of a list of ACI monitoring classes. This configuration is stored
    in the Redis database. This API is used by the GUI for configuration
    management.

  • Configuration GUI - Simple GUI for creation and deletion of monitoring
    metrics.

  • Prometheus - database optimized to store data in the form of
    time-series. Its configuration specifies the access data to the server, to
    which it will periodically (by default every 15s) send a query for the
    current value of all available metrics. The collected data is stored in an
    internal data format and a complex language is available to specify the
    possible operations on it.

  • Grafana - is a popular tool for visualizing various types of data, which
    can be downloaded from various backends, including Prometheus.

Tooling

The tooling has been selected carefully to decide what would match the best
expectations.

Component Solution Description Alternative solution
GUI React with MUI Powerful framework, yet simple to work with. Its simple to add even complex display logic. Any modern framework could work fine.
Database Redis There is only small amount of data that needs to be stored. Redis has very small requirements, yet suits the expectations. Any database could be used, although in the future Redis has one great capability that it has built in pub/sub, so the application may subscribe to the changes easily.
Database API Flask Easy to use and flexible web application development framework. There are none as Flask is one of the most popular micro-framework for Python web application development in recent times.
Dashboard & Monitoring Prometheus and Grafana Common tool for that purpose. If Prometheus would not be needed, InfluxDB with Grafana could be a better choice.

Code quality

To ensure better code quality, there is set up CI/CD to:

  • run unit tests
  • run Python linter
  • ensure common code formatting
  • ensure types safety
  • audit security of 3rd-party libraries

Quickstart

It is possible to run and test the operation of the application using the
Cisco APIC Simulator
Sandbox
.
It is a simulated infrastructure that provides full access to ACI RESTful APIs
over http(s) with XML and JSON encodings. For ACI production infrastructure,
it is strongly recommended to create a dedicated user with RO permissions
only.

To get aci-monitoring up and running on Linux or macOS system run the
following commands.

git clone -b main https://github.com/SoftFlowTech/aci-monitoring.git
cd aci-monitoring
tee local.env <<EOF
ACI_URL=https://apic-ip-address
ACI_USERNAME=aciROuser
ACI_PASSWORD=aciROuserPass
EOF
docker compose pull
docker compose up -d

The whole application will be available after a few seconds.
Open the URL http://localhost:5006 in a web-browser.
You should see the GUI homepage.
In the top-right corner you can add new metrics to monitor.

gui

Grafana is available at http://localhost:5001. The
default credentials are:

  • Username: admin
  • Password: admin

Prometheus is also available at
http://localhost:5002 and can be used to write and
test queries about the collected data.

The container images are built and published in
DockerHub.

Dependencies

This project relies only on Docker and docker-compose meeting these requirements:

  • The Docker version must be at least 20.10.10.
  • The docker-compose version must be at least 1.28.0.

To check the version installed on your system run docker --version and
docker-compose --version.

Configure metrics

To add new metrics, you should visit Configuration
GUI
, and click "Add new" button.

There will appear simple form, where you may set:

  • Metric name: for Prometheus/Grafana
  • Class name: ACI MO class name that should be monitored
  • Attributes (one or many): list of attributes that should be monitored in
    this class
  • Query Filter: list of filters, to select only the MOs that are expected

Example configuration:

add new metric

Alternatively, you may use "Advanced" button, to provide raw ACI query filter:

add new metric adv

Example ACI Classes

Below is a list of useful Cisco ACI classes to select from.

Class Description
eqptEgrBytes5min current Egress Bytes stats in 5 minute
eqptEgrDropPkts5min current Egress Drop Packets stats in 5 minute
eqptEgrPkts5min current Egress Packets stats in 5 minute
eqptFan Equipment Fan
eqptFanStats current fan stats
eqptFanStats5min current fan stats in 5 minute
eqptFruPower5min current FRU power stats in 5 minute
eqptIngrBytes5min current Ingress Bytes stats in 5 minute
eqptIngrDropPkts5min current Ingress Drop Packets stats in 5 minute
eqptIngrErrPkts5min current Ingress Error Packets stats in 5 minute
eqptIngrPkts5min current Ingress Packets stats in 5 minute
eqptPsPower5min current power supply stats in 5 minute
eqptPsu Power Supply Unit
eqptPsuSlot Power Supply Slot
eqptTemp5min current temperature stats in 5 minute
ethpmDOMStats Digital Optical Monitor, Transceiver details
ethpmPhysIf Physical Interface Runtime State (ethpm)
fabricNodeHealth5min current node health stats in 5 minute
fabricOverallHealth5min current overall fabric health stats in 5 minute
fvOverallHealth15min current overall tenant health stats in 15 minute
l1PhysIf Layer 1 Physical Interface Configuration
licenseManager License Manager
procSysCPU5min current System cpu stats in 5 minute

Configure Grafana

You can use any data visualization method available in Grafana. Data obtained
from Prometheus can be aggregated and statistical operations can be performed
on them. Sample chart:

grafana

Development status

aci-monitoring is beta software, but it has already been used in production,
and it has an extensive test suite.

Contributing

Help in testing, development, documentation and other tasks is highly
appreciated and useful to the project.

License

Copyright 2022 (c) SoftFlow.

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License.
You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View code on GitHub

Code Exchange Community

Get help, share code, and collaborate with other developers in the Code Exchange community.View Community
Disclaimer:
Cisco provides Code Exchange for convenience and informational purposes only, with no support of any kind. This page contains information and links from third-party websites that are governed by their own separate terms. Reference to a project or contributor on this page does not imply any affiliation with or endorsement by Cisco.