ACI Moniring Service is an application that, using the REST API, allows to
fetch any information from the Cisco Application Centric Infrastructure (ACI)
controller. The collected information will then be saved in the
Prometheus database and then it will be possible to
visualize it using the Grafana tool. This application
complements the monitoring functionality available through the APIC GUI, as
the data available there has short retention periods.
Cisco
ACI
is a modern solution that provides an efficient and scalable communication
infrastructure for Data Center. Features such as efficient transport for
applications, nonblocking architecture, and easy scaling from a performance
perspective are not all features of ACI. ACI is also primarily an SDN
(Software Defined Network) environment. Therefore, a feature as important as
those presented above is the ability to fully manage ACI using programming
tools and
APIs. This is a possibility that is
currently little used and gives great opportunities for integration,
automation (via tools like
Ansible and
Terraform),
and building on the use of ACI as the foundation of the entire ecosystem of
mutually cooperating components of the Data Center.
In the ACI controller and directly from the devices, we also have classical
methods used for network monitoring, such as
SNMP
and Syslog. However, they only allow access to part of the information in
relation to the API, which allows access to all objects from over 16k+
classes of the object
model. In addition, through SNMP we do not have access to such information as
the application model or parameters of communication applications, because as
such they are not available from typical network devices, in which MIBs for
SNMP were used from.
Access to data via API greatly facilitates access to information, as there is
no need to use, for example, classical and unreliable methods of parsing the
results of commands. The model of data access via the
REST protocol includes one more important feature,
it allows some data processing to be transferred to the server side. This is
the same approach as we have in a typical database engine, where the SQL query
contains filters and
aggregators,
which are performed on the server side on local data, significantly improving
the search speed and reducing the amount of information needed to be sent.
The architecture of this application had a few things in mind:
The entire architecture has been designed in a microservice model to
facilitate software maintenance and development. The structure of the
application components and their dependencies are presented in the figure
below:
Data Poller - a component written in Python that connects to the APIC
controller, logs in and uses a REST query to retrieve data about the objects
specified in the configuration. This script also starts the HTTP server and
uses it to provide the retrieved value in the form of the Prometheus metric.
Configuration API - Component that provides API for configuration
management of a list of ACI monitoring classes. This configuration is stored
in the Redis database. This API is used by the GUI for configuration
management.
Configuration GUI - Simple GUI for creation and deletion of monitoring
metrics.
Prometheus - database optimized to store data in the form of
time-series. Its configuration specifies the access data to the server, to
which it will periodically (by default every 15s) send a query for the
current value of all available metrics. The collected data is stored in an
internal data format and a complex language is available to specify the
possible operations on it.
Grafana - is a popular tool for visualizing various types of data, which
can be downloaded from various backends, including Prometheus.
The tooling has been selected carefully to decide what would match the best
expectations.
Component | Solution | Description | Alternative solution |
---|---|---|---|
GUI | React with MUI | Powerful framework, yet simple to work with. Its simple to add even complex display logic. | Any modern framework could work fine. |
Database | Redis | There is only small amount of data that needs to be stored. Redis has very small requirements, yet suits the expectations. | Any database could be used, although in the future Redis has one great capability that it has built in pub/sub, so the application may subscribe to the changes easily. |
Database API | Flask | Easy to use and flexible web application development framework. | There are none as Flask is one of the most popular micro-framework for Python web application development in recent times. |
Dashboard & Monitoring | Prometheus and Grafana | Common tool for that purpose. | If Prometheus would not be needed, InfluxDB with Grafana could be a better choice. |
To ensure better code quality, there is set up CI/CD to:
It is possible to run and test the operation of the application using the
Cisco APIC Simulator
Sandbox.
It is a simulated infrastructure that provides full access to ACI RESTful APIs
over http(s)
with XML and JSON encodings. For ACI production infrastructure,
it is strongly recommended to create a dedicated user with RO permissions
only.
To get aci-monitoring up and running on Linux
or macOS
system run the
following commands.
git clone -b main https://github.com/SoftFlowTech/aci-monitoring.git cd aci-monitoring tee local.env <<EOF ACI_URL=https://apic-ip-address ACI_USERNAME=aciROuser ACI_PASSWORD=aciROuserPass EOF docker compose pull docker compose up -d
The whole application will be available after a few seconds.
Open the URL http://localhost:5006
in a web-browser.
You should see the GUI homepage.
In the top-right corner you can add new metrics to monitor.
Grafana is available at http://localhost:5001
. The
default credentials are:
Prometheus is also available at
http://localhost:5002
and can be used to write and
test queries about the collected data.
The container images are built and published in
DockerHub.
This project relies only on Docker and docker-compose meeting these requirements:
20.10.10
.1.28.0
.To check the version installed on your system run docker --version
and
docker-compose --version
.
To add new metrics, you should visit Configuration
GUI, and click "Add new" button.
There will appear simple form, where you may set:
Example configuration:
Alternatively, you may use "Advanced" button, to provide raw ACI query filter:
Below is a list of useful Cisco ACI classes to select from.
Class | Description |
---|---|
eqptEgrBytes5min |
current Egress Bytes stats in 5 minute |
eqptEgrDropPkts5min |
current Egress Drop Packets stats in 5 minute |
eqptEgrPkts5min |
current Egress Packets stats in 5 minute |
eqptFan |
Equipment Fan |
eqptFanStats |
current fan stats |
eqptFanStats5min |
current fan stats in 5 minute |
eqptFruPower5min |
current FRU power stats in 5 minute |
eqptIngrBytes5min |
current Ingress Bytes stats in 5 minute |
eqptIngrDropPkts5min |
current Ingress Drop Packets stats in 5 minute |
eqptIngrErrPkts5min |
current Ingress Error Packets stats in 5 minute |
eqptIngrPkts5min |
current Ingress Packets stats in 5 minute |
eqptPsPower5min |
current power supply stats in 5 minute |
eqptPsu |
Power Supply Unit |
eqptPsuSlot |
Power Supply Slot |
eqptTemp5min |
current temperature stats in 5 minute |
ethpmDOMStats |
Digital Optical Monitor, Transceiver details |
ethpmPhysIf |
Physical Interface Runtime State (ethpm) |
fabricNodeHealth5min |
current node health stats in 5 minute |
fabricOverallHealth5min |
current overall fabric health stats in 5 minute |
fvOverallHealth15min |
current overall tenant health stats in 15 minute |
l1PhysIf |
Layer 1 Physical Interface Configuration |
licenseManager |
License Manager |
procSysCPU5min |
current System cpu stats in 5 minute |
You can use any data visualization method available in Grafana. Data obtained
from Prometheus can be aggregated and statistical operations can be performed
on them. Sample chart:
aci-monitoring
is beta software, but it has already been used in production,
and it has an extensive test suite.
Help in testing, development, documentation and other tasks is highly
appreciated and useful to the project.
Copyright 2022 (c) SoftFlow.
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License.
You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
Owner
Contributors
Categories
Products
Application Centric Infrastructure (ACI)Programming Languages
TypeScriptLicense
Code Exchange Community
Get help, share code, and collaborate with other developers in the Code Exchange community.View Community