This demo showcases how AI might assist you in troubleshooting network issues. It was presented at Cisco Developer Days 2024 and API Days Paris 2024. Check out the recording from Developer Days to see how this solution works.
The components used by this demo are:
- Sandbox VM (10.10.20.50, developer/C1sco12345)
- An ncclient wrapper I wrote as a NETCONF client for Telegraf.
- Docker 20.10+
- ncpeek to pull telemetry data from network devices.
- chatgpt-4o-latest is used.

Note
You might need to run the containers locally, because the Docker version on the sandbox needs to be updated.
When an alert is triggered in Grafana, a webhook is sent, prompting the LLM to initiate an analysis of the alert and establish connections with network devices to identify the root cause of the issue following a plan the LLM creates.
Once the initial analysis is complete, the LLM presents a concise summary of its findings to the users, along with actionable items.
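The flow above can be sketched as a small function that turns a Grafana webhook body into the first message handed to the LLM. This is a minimal sketch, not the demo's actual code: the function name `build_prompt` and the prompt wording are hypothetical, and the fields it reads (`alerts`, `status`, `labels`, `annotations`) assume Grafana's default webhook body.

```python
import json


def build_prompt(webhook_body: str) -> str:
    """Turn a Grafana webhook body into a first prompt for the LLM (hypothetical helper)."""
    payload = json.loads(webhook_body)
    # Keep only alerts that are currently firing; resolved ones need no analysis.
    firing = [a for a in payload.get("alerts", []) if a.get("status") == "firing"]
    if not firing:
        return ""
    lines = ["Grafana raised the following alerts. Plan an investigation and find the root cause:"]
    for alert in firing:
        name = alert.get("labels", {}).get("alertname", "unknown alert")
        summary = alert.get("annotations", {}).get("summary", "no summary")
        lines.append(f"- {name}: {summary}")
    return "\n".join(lines)


# Illustrative payload only; the alert name and summary are made up.
sample = json.dumps({
    "status": "firing",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "isis-neighbors"},
         "annotations": {"summary": "ISIS neighbor count dropped"}},
        {"status": "resolved", "labels": {"alertname": "old"}, "annotations": {}},
    ],
})
print(build_prompt(sample))
```

Returning an empty string when nothing is firing lets the caller skip the LLM call for resolved notifications.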
For this demo one alarm was created.
if avgNeighbors(30sec) < avgNeighbors(30min) : send Alarm
When the average number of ISIS neighbors over the last 30 seconds is lower than the average over the last 30 minutes, the alarm triggers a webhook for the LLM.
This signals that a stable ISIS neighbor that was up during the last 30 minutes was lost. Because the rule compares two averages instead of a fixed threshold, it works with any number N of ISIS neighbors.
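The rule can be illustrated in plain Python. This is only a sketch of the comparison, assuming neighbor counts arrive as (timestamp, count) samples; in the demo the evaluation happens inside Grafana, and the function name `should_alarm` and its parameters are hypothetical.

```python
from statistics import mean


def should_alarm(samples, now, short_window=30, long_window=30 * 60):
    """Compare the short-window average against the long-window average.

    samples: list of (timestamp_seconds, neighbor_count) tuples.
    """
    short_counts = [n for t, n in samples if now - t <= short_window]
    long_counts = [n for t, n in samples if now - t <= long_window]
    if not short_counts or not long_counts:
        return False  # not enough data to compare
    return mean(short_counts) < mean(long_counts)


# One sample every 10 seconds: two neighbors, then one is lost near the end.
dropped = [(t, 2) for t in range(0, 1770, 10)] + [(t, 1) for t in range(1770, 1801, 10)]
print(should_alarm(dropped, now=1800))
```

No neighbor count is hard-coded, which is why the same rule fits a device with two neighbors or with twenty.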
Environment variables are injected by the Makefile at the root of the project.
Important
For the demo to work, you must set the following environment variables.
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
WEBEX_TEAMS_ACCESS_TOKEN=<YOUR_TEAM_ACCESS_TOKEN>
WEBEX_APPROVED_USERS_MAIL=<MAILS_OF_USERS_APPROVED_SEPARATED_BY_COMMAS>
WEBEX_USERNAME=<YOUR_WEBEX_USERNAME>
WEBEX_ROOM_ID=<THE_WEBEX_ROOM_ID>
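A quick way to catch a missing variable before starting the containers is a small pre-flight check. The sketch below is not part of the demo; the helper names `missing_vars` and `approved_users` are hypothetical, and it treats all five variables as required, which only holds if you use Webex as the client.

```python
import os

REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "WEBEX_TEAMS_ACCESS_TOKEN",
    "WEBEX_APPROVED_USERS_MAIL",
    "WEBEX_USERNAME",
    "WEBEX_ROOM_ID",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def approved_users(env=None):
    """Split the comma-separated approved mail list into clean addresses."""
    env = os.environ if env is None else env
    raw = env.get("WEBEX_APPROVED_USERS_MAIL", "")
    return [mail.strip() for mail in raw.split(",") if mail.strip()]


missing = missing_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```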
Note
The Webex variables are only needed if you interact with the LLM using Webex.
If you prefer to use another client, you need to modify the Python code accordingly.
To get your Webex token, go to https://developer.webex.com/docs/bots and create a bot.
The easiest way to get the WEBEX_ROOM_ID is to open a room with your bot in the Webex app. Once you have the room, call the List Rooms API with the token you created earlier and read the room ID from the response.
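The lookup can be scripted with the standard library. This is a sketch under a few assumptions: it targets the public Webex List Rooms endpoint (`https://webexapis.com/v1/rooms`) with a bearer token, the helper names are hypothetical, and matching the room by `title` is just one possible way to pick it out of the list.

```python
import json
import urllib.request

WEBEX_ROOMS_URL = "https://webexapis.com/v1/rooms"


def build_rooms_request(token: str) -> urllib.request.Request:
    """Build the authenticated GET request for the List Rooms endpoint."""
    return urllib.request.Request(
        WEBEX_ROOMS_URL, headers={"Authorization": f"Bearer {token}"}
    )


def find_room_id(rooms_response: dict, title: str):
    """Return the id of the first room whose title matches, or None."""
    for room in rooms_response.get("items", []):
        if room.get("title") == title:
            return room.get("id")
    return None


def fetch_room_id(token: str, title: str):
    """Call the API and look up the room id (needs network access)."""
    with urllib.request.urlopen(build_rooms_request(token), timeout=10) as resp:
        return find_room_id(json.load(resp), title)
```

For example, `fetch_room_id(os.environ["WEBEX_TEAMS_ACCESS_TOKEN"], "<ROOM_TITLE>")` would return the value to store in WEBEX_ROOM_ID, where the title is whatever the room is called from the token owner's point of view.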
For testing, you can use the GRAFANA_WEB_HOOK env var to send webhooks to another site, such as https://webhook.site/
If you have access to smith.langchain.com (recommended for viewing LLM operations), add your project ID and API key.
GRAFANA_WEB_HOOK=<WEB_HOOK_URL>
LANGCHAIN_PROJECT=<YOUR_LANGCHAIN_PROJECT_ID>
LANGCHAIN_API_KEY=<YOUR_LANGCHAIN_API_KEY>
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
The .env.local file is used to define all variables used by the containers.
In a production environment, this file should be kept out of version control using the .gitignore file.
This demo uses a CML instance from the Cisco DevNet sandbox. You can also use a dedicated CML instance or an NSO sandbox.
After acquiring your sandbox, stop the default topology and wipe it out. 🧹
Then, import the topology file used for this demo and start the lab.
The TIG stack requires Docker and IP reachability to the CML instance. For this demo, I used the sandbox VM 10.10.20.50.
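Before building the stack, it can help to confirm that the host running the containers can actually reach the lab. The sketch below checks TCP connectivity only, which is a stand-in for full IP reachability; the function name `is_reachable` is hypothetical, and the host and port you pass in are up to your environment.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For instance, `is_reachable("<DEVICE_IP>", 830)` would test the standard NETCONF-over-SSH port on a lab device, assuming NETCONF is enabled there.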
The first time, build the TIG stack.
make build-tig
On subsequent runs, you only need to start the containers.
make run-tig
- Telegraf: run docker exec -it telegraf bash, then tail -F /tmp/telegraf-grpc.log.
- Influxdb (admin/admin123)
- Grafana (admin/admin): go to General > Network Telemetry to see the Grafana dashboard.

The llm_agent directory provides the entry point for the application, the app file.
The LLM container runs on the sandbox VM 10.10.20.50.
make run-llm
The demo involves shutting down one interface, causing an ISIS failure, and allowing the LLM to diagnose the issue and implement a fix.
In the images below, GigabitEthernet5 was shut down on cat8000-v0, which caused it to lose its ISIS adjacency with cat8000-v2.
You can watch the recorded demo here
Note
The recording was made as a backup demo. It doesn't have audio or instructions.
On Grafana, you can observe the ISIS count decreasing and triggering an alarm.
Next, you will receive a Webex notification from Grafana, and the LLM will receive the webhook. The webhook triggers the LLM to investigate the issue and work out how to resolve it.
Run make run-tig to destroy and recreate the TIG containers.