Cisco SASE Site Onboarding Troubleshooting & Monitoring Cookbook
Programmatic Observability across SD-WAN and Secure Access
Companion to the Cisco SASE Site Onboarding API Cookbook
Version 1.0 | April 2026
Target Audience: Ops/Monitoring Teams, MSPs, Observability Engineers
Cisco Confidential
Table of Contents
- Introduction
- Section 1: Proactive Monitoring & Alerting
- Section 2: Troubleshooting Site Connectivity Use Case
- Section 3: Cross-Platform Event Correlation
Introduction
This cookbook is a companion to the Cisco SASE Site Onboarding API Cookbook. While the onboarding cookbook covers the provisioning workflow (creating sites, tunnel profiles, configuration groups, and policy groups), this cookbook focuses on what happens after deployment: monitoring tunnel health, troubleshooting connectivity issues, and correlating events across both the SD-WAN and Secure Access platforms.
The cookbook is structured around three operational phases:
- Proactive Monitoring: Set up push-based alerting so you know about tunnel issues before users report them
- Troubleshooting: When something breaks, systematically investigate using APIs from both platforms
- Event Correlation: Work through an example of building a unified cross-platform timeline for forensic analysis and SIEM integration
When to Use the API vs. the UI
The Secure Access dashboard and SD-WAN Manager UI provide excellent interactive tools for ad-hoc, single-site troubleshooting: visual tunnel health heatmaps, topology diagrams, drill-down alarm views. For one-off investigations, the UI is often the fastest path.
The API is the right tool when you need to:
- Monitor tunnel health across dozens or hundreds of sites (fleet-scale polling)
- Feed SASE telemetry into external observability platforms (Splunk, Datadog, Grafana, ServiceNow)
- Build automated runbooks (e.g., open a ticket when a tunnel goes Inactive)
- Correlate events across SD-WAN and Secure Access to build a unified incident timeline
- Perform post-incident forensics by querying time-bounded events from both platforms
Prerequisites
This cookbook assumes you have completed the authentication steps described in the SASE Site Onboarding API Cookbook. Specifically, you need:
- A valid SD-WAN Manager API session (JWT token + XSRF token, per Release 20.18+ JWT-based authentication)
- A valid Secure Access OAuth 2.0 Bearer token (client credentials flow)
- At least one onboarded site with deployed SSE tunnel profiles
For authentication details, refer to the Cisco SASE Site Onboarding API Workflow.
API Endpoints Used in This Cookbook
SD-WAN Manager API
| Purpose | Endpoint | Notes |
|---|---|---|
| SSE Tunnel Status | GET /dataservice/device/sig/getSigTunnelList?lastNHours={n} | Tunnel state, HA pair, site mapping |
| Alarms | POST /dataservice/alarms | Filterable by severity, time, site |
| Events | POST /dataservice/event | Low-level state change notifications |
| Audit Log | POST /dataservice/auditlog | Config change history |
| Device Status | GET /dataservice/device/system/status?deviceId={ip} | CPU, memory, uptime |
| App-Route Stats | POST /dataservice/statistics/approute/aggregation | Loss, latency, jitter, vQoE score per tunnel (aggregation query) |
Secure Access API
| Purpose | Endpoint | Notes |
|---|---|---|
| List NTGs + Status | GET /deployments/v2/networktunnelgroups?includeStatuses=true | NTG config and tunnel states |
| Activity Search | GET /reports/v2/activity?from=...&to=... | DNS, proxy, firewall events |
| Network Tunnel Logs | GET /reports/v2/networkTunnelLogs?from=...&to=... | Tunnel up/down events |
| Activity Search (by type) | GET /reports/v2/activity?from=...&to=...&type={type} | Filter by type=dns, proxy, firewall, etc. |
Section 1: Proactive Monitoring & Alerting
The goal of proactive monitoring is to know about tunnel issues before your users do. Both SD-WAN Manager and Secure Access support push-based notification mechanisms that alert you the moment a tunnel goes down — and auto-clear when it recovers.
This section covers how to configure proactive alerting on both platforms so that tunnel state changes trigger immediate notifications to your ops team, Slack channel, Webex space, ServiceNow instance, or any HTTP-capable endpoint.
1.1 SD-WAN Manager: Webhook Notifications for Alarms
SD-WAN Manager supports webhook notifications that send an HTTP POST request to an external system in real-time whenever an alarm is raised or cleared. This is the recommended approach for proactive monitoring — it eliminates the need to poll the alarm API.
Configuration Path: Monitor > Logs > Alarms > Alarm Notifications > Add Alarm Notification
Setup Steps
Step 1. Define the notification rule. Give it a descriptive name (e.g., "SSE Tunnel Alerts").
Step 2. Select alarm types. For SASE tunnel monitoring, select at minimum:
- Cisco Secure Access: SSE tunnel credential / provider failure
- Tracker_State_Change: L7 probe failure → traffic reroute to DIA
- BFD_Site_Down / BFD_Site_Up: site-level overlay loss (context, not SSE-specific)
- Interface_State_Change: WAN link down on the edge device
- Control_Node_Down: control-plane loss
- CPU_Usage / Memory_Usage: edge device resource exhaustion
Step 3. Select object scope. Choose "All Devices" for fleet-wide coverage, or select specific devices for targeted monitoring.
Step 4. Select severity. Recommend Critical and Major at minimum.
Step 5. Configure the webhook delivery:
- Channel: Choose Slack, Webex, or Custom (for arbitrary HTTP endpoints)
- Webhook URL: The endpoint that will receive the HTTP POST
- Webhook Threshold: Maximum notifications per minute (e.g., 100 for production, lower for testing)
- For Custom webhooks (Release 20.16.1+): optionally configure authentication headers and credentials
Configuration via API
The Alarm Notification / Webhook configuration exposed at Monitor > Logs > Alarms > Alarm Notifications > Add Alarm Notification can also be provisioned via the SD-WAN Manager API. This is the preferred approach when provisioning notifications across many tenants or when integrating with infrastructure-as-code pipelines.
POST https://{gateway_url}/dataservice/notifications/rule
Headers:
Authorization: Bearer {jwt-token}
X-XSRF-TOKEN: {xsrf-token}
Content-Type: application/json
Example curl request:
curl -X POST \
"https://{gateway_url}/dataservice/notifications/rule" \
-H "Authorization: Bearer {jwt-token}" \
-H "X-XSRF-TOKEN: {xsrf-token}" \
-H "Content-Type: application/json" \
-d '{
"notificationRuleName": "SSE Tunnel Alerts - Webhook to SIEM",
"alarmName": "Cisco Secure Access",
"severity": "Medium",
"webhookUrl": "https://your-listener.example.com/sdwan-webhook",
"webhookUsername": "apiuser",
"webhookPassword": "{webhook-password}",
"devicesAttached": "1712133e-0246-4281-8152-3317b796d2bc",
"emailThreshold": 5,
"accountDetails": "noreply@example.com",
"updatedBy": "admin"
}'
Expected response: 202 Accepted with the created rule echoed back (password is returned in encrypted form).
Webhook Payload Format
When an alarm fires, SD-WAN Manager sends a JSON payload via HTTP POST. The key fields are:
{
"rule_name_display": "Tracker_State_Change",
"severity": "Critical",
"message": "SSE tunnel Tunnel16000001 is down on device LON-01",
"host_name": "LON-01",
"system_ip": "169.254.10.26",
"site_id": "3",
"entry_time": 1774981376892,
"deviceId": "10.255.1.10",
...
}
Note: The fields shown above are representative of SD-WAN webhook payloads. Specific fields and their format may vary across releases and alarm types — verify against your environment before writing automation against specific field names.
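Because field names can vary, a listener is safer if it reads the payload defensively. The following is a minimal Python sketch, not part of the product: the function name normalize_alarm and the output keys are hypothetical, and the only field names used are the ones shown in the sample payload above. It assumes entry_time is epoch milliseconds, as the sample suggests.

```python
from datetime import datetime, timezone


def normalize_alarm(payload: dict) -> dict:
    """Reduce a webhook payload to a few fields, tolerating missing keys."""
    entry_ms = payload.get("entry_time")
    raised_at = None
    if isinstance(entry_ms, (int, float)):
        # Assumption: entry_time is epoch milliseconds, as in the sample payload.
        raised_at = datetime.fromtimestamp(entry_ms / 1000, tz=timezone.utc).isoformat()
    return {
        "rule": payload.get("rule_name_display", "unknown"),
        "severity": payload.get("severity", "unknown"),
        "host": payload.get("host_name"),
        "site_id": payload.get("site_id"),
        "raised_at": raised_at,
        "message": payload.get("message", ""),
    }


# Sample values copied from the payload shown above.
sample = {
    "rule_name_display": "Tracker_State_Change",
    "severity": "Critical",
    "message": "SSE tunnel Tunnel16000001 is down on device LON-01",
    "host_name": "LON-01",
    "site_id": "3",
    "entry_time": 1774981376892,
}
alarm = normalize_alarm(sample)
print(alarm["severity"], alarm["host"], alarm["raised_at"])
```

Using .get() with defaults means an unexpected payload shape produces an "unknown" record instead of an exception inside the listener.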
Integration Targets
| Target | Channel Type | Notes |
|---|---|---|
| Slack | Built-in | Use a Slack Incoming Webhook URL |
| Webex | Built-in | Use a Webex Incoming Webhook from apphub.webex.com |
| ServiceNow | Custom | Point to a Scripted REST API in ServiceNow |
| Splunk / SIEM | Custom | Point to an HEC (HTTP Event Collector) endpoint |
| Custom Runbook | Custom | Point to your own Flask/Express HTTP listener |
Note: The webhook receives notifications, not raw events. SD-WAN Manager correlates the related events into alarms before publishing the notifications to the HTTP target listener. This means you get deduplicated, severity-classified notifications rather than a flood of low-level events.
⚠ Warning: Do not use the real-time monitoring of devices APIs (/dataservice/device/...) for continuous polling. These are designed for interactive, per-device use. For ongoing monitoring, use webhooks (push) or bulk statistics APIs (pull).
1.2 Secure Access: Alert Rules for Network Tunnel Groups
Secure Access provides a built-in Alert Rules feature that monitors the status of Network Tunnel Groups at regular intervals and sends notifications when alert conditions are met. Alert rules support both email and webhook delivery.
Configuration Path: Monitor > Management > Alert Rules > + Add Alert Rule
Available Alert Types for Network Tunnels
| Alert Type | Trigger Condition | Recommended Severity |
|---|---|---|
| Network tunnel group disconnected | A Network Tunnel Group is disconnected (the alertType field of the alert payload carries the value "Network tunnel group disconnected") | Critical |
| Hub down in network tunnel group | A Hub is down in a Network Tunnel Group | Warning |
Setup Steps
Step 1. General Settings. Name the alert rule (e.g., "NTG Disconnected - All Sites") and set the severity.
Step 2. Alert Conditions. Configure which NTGs trigger the alert. Filter by NTG name (e.g., contains "LON") or region (e.g., "Europe"). Use "Require all conditions" or "Require any condition" logic.
Step 3. Notifications. Add email recipients (comma-separated list). When the alert fires, recipients receive an email with the alert name, severity, event time, type, conditions, and a direct link to view the alert in the Secure Access dashboard.
Step 4. Review and Save.
Webhook Delivery via Third-Party Integrations API
To send alert notifications to a webhook endpoint (for SIEM, Slack, or ServiceNow integration), configure a webhook as a Third-Party Integration using the Secure Access API:
Step A: Create the webhook integration
POST https://api.sse.cisco.com/admin/v2/integrations
Headers:
Authorization: Bearer {oauth-token}
Content-Type: application/json
Body:
{
"name": "SASE Monitoring Webhook",
"type": "webhook.v1",
"webhookConfig": {
"url": "https://your-listener.example.com/webhook",
"headers": ["Content-Type: application/json"]
}
}
Example curl request:
curl -X POST \
"https://api.sse.cisco.com/admin/v2/integrations" \
-H "Authorization: Bearer {oauth-token}" \
-H "Content-Type: application/json" \
-d '{
"name": "SASE Monitoring Webhook",
"type": "webhook.v1",
"webhookConfig": {
"url": "https://your-listener.example.com/webhook",
"headers": ["Content-Type: application/json"]
}
}'
Example response (excerpt):
{
"id": "3a9a27ea-d15e-4406-9b41-7b0a9ea1eac9",
"name": "SASE Monitoring Webhook",
"type": "webhook.v1",
"webhookConfig": {
"url": "https://your-listener.example.com/webhook",
"headers": ["Content-Type: application/json"]
},
"createdAt": "2026-04-21T10:00:00Z"
}
💡 Tip: Capture the id value from the response. You will need it as {intId} in Step B to attach credentials to this integration.
Step B: Add Basic Auth credentials to the webhook
POST https://api.sse.cisco.com/admin/v2/integrations/{intId}/credentials
Headers:
Authorization: Bearer {oauth-token}
Content-Type: application/json
Body:
{
"name": "webhook-credential",
"type": "basic-auth",
"value": {
"username": "apiuser",
"password": "{webhook-password}"
}
}
Example curl request:
curl -X POST \
"https://api.sse.cisco.com/admin/v2/integrations/{intId}/credentials" \
-H "Authorization: Bearer {oauth-token}" \
-H "Content-Type: application/json" \
-d '{
"name": "webhook-credential",
"type": "basic-auth",
"value": {
"username": "apiuser",
"password": "{webhook-password}"
}
}'
Example response (excerpt):
{
"id": "cred-5c8d1a2b-...",
"integrationId": "3a9a27ea-d15e-4406-9b41-7b0a9ea1eac9",
"name": "webhook-credential",
"type": "basic-auth"
}
Note: The response does not echo the credential value (password). After creation, credentials cannot be retrieved — they can only be replaced.
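Steps A and B can be chained in a script. The sketch below only builds the two requests with Python's standard urllib and does not send them; the helper names (build_integration_request, build_credential_request) are hypothetical, and the URLs and body fields are taken from the examples above. Treat it as a starting point under those assumptions and verify the payload shape against your tenant.

```python
import json
import urllib.request

BASE_URL = "https://api.sse.cisco.com/admin/v2"


def _post(url: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with a Bearer token."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def build_integration_request(token: str, name: str, listener_url: str) -> urllib.request.Request:
    """Step A: create the webhook integration."""
    body = {
        "name": name,
        "type": "webhook.v1",
        "webhookConfig": {
            "url": listener_url,
            "headers": ["Content-Type: application/json"],
        },
    }
    return _post(f"{BASE_URL}/integrations", token, body)


def build_credential_request(token: str, int_id: str, username: str, password: str) -> urllib.request.Request:
    """Step B: attach Basic Auth credentials to the integration from Step A."""
    body = {
        "name": "webhook-credential",
        "type": "basic-auth",
        "value": {"username": username, "password": password},
    }
    return _post(f"{BASE_URL}/integrations/{int_id}/credentials", token, body)


# Placeholder values; substitute a real token and the id returned by Step A.
step_a = build_integration_request(
    "{oauth-token}", "SASE Monitoring Webhook", "https://your-listener.example.com/webhook"
)
step_b = build_credential_request("{oauth-token}", "{intId}", "apiuser", "{webhook-password}")
print(step_a.get_method(), step_a.full_url)
print(step_b.get_method(), step_b.full_url)
```

To execute a request, pass it to urllib.request.urlopen and read the id field from the JSON response of Step A before building Step B.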
1.3 End-to-End Proactive Architecture
For comprehensive proactive monitoring, configure webhooks on both platforms and route them to a shared listener or observability pipeline.
This architecture provides:
- Real-time detection: No polling delay. SD-WAN alerts arrive within seconds; Secure Access alerts arrive after a dampening window (to suppress transient events).
- Both sides of the story: If a tunnel flaps, you get a notification from SD-WAN (the device saw it go down) and from Secure Access (the cloud saw the peer disconnect), confirming the scope.
- Reduced MTTR: Ops teams are notified before users report the issue, enabling proactive response.
💡 Tip: Combine this proactive alerting layer with the Section 3 correlation script. When a webhook fires, trigger the correlation script for the affected site to immediately generate a cross-platform timeline — giving your ops team both the alert and the forensic context in one shot.
Section 2: Troubleshooting Site Connectivity Use Case
When proactive alerts fire — or when an issue is reported through other channels — the following workflow provides a systematic, API-driven investigation across both platforms.
Scenario: Site Reports Intermittent Internet Access
A branch site (site name: LON, site-id: 3) reports that users experience intermittent loss of internet connectivity. The site was onboarded using the SASE Site Onboarding workflow with SIA (Secure Internet Access) tunnel profiles. The following workflow walks through a systematic API-driven investigation.
Phase 1: Triage — Identify the Tunnel State from Both Sides
The first step is to determine whether the tunnels are healthy from both the SD-WAN and Secure Access perspectives. A mismatch between the two views is itself a critical diagnostic signal.
Step 1.1: Query SD-WAN SSE Tunnel Status
Use the SD-WAN Manager SIG tunnel list API to get the current state of all SSE tunnels. The lastNHours parameter controls the time window.
GET https://{gateway_url}/dataservice/device/sig/getSigTunnelList?lastNHours=24
Headers:
Authorization: Bearer {jwt-token}
X-XSRF-TOKEN: {xsrf-token}
Content-Type: application/json
Example curl request:
curl -X GET \
"https://{gateway_url}/dataservice/device/sig/getSigTunnelList?lastNHours=24" \
-H "Authorization: Bearer {jwt-token}" \
-H "X-XSRF-TOKEN: {xsrf-token}" \
-H "Content-Type: application/json"
Example response (excerpt):
{
"data": [
{
"device-state": "Up",
"tunnel-if-name": "Tunnel16000001",
"vdevice-name": "10.255.1.10",
"vdevice-host-name": "LON-01",
"tracker-state": "Up",
"site-id": "3",
"site-name": "LON",
"ha-pair": "active",
"tunnel-type": "IPSEC",
"tunnelType": "SSE-Public access",
"destination-data-center": "13.53.178.16",
"provider": "Cisco Secure Access",
"vmanage-system-ip": "169.254.10.26",
"tunnel-name": "C8K-12F26FD0-4AF0-61CD-...",
"sig-state": "Up",
"lastupdated": 1774477483191
}
],
"header": { }
}
Key fields to inspect in the response:
| Field | Values | Meaning |
|---|---|---|
| device-state | Up / Down | Device-level tunnel interface status |
| sig-state | Up / Down | SSE tunnel operational state |
| tracker-state | Up / Down | L7 health-check (HTTP probe) result |
| ha-pair | active / backup | HA role for this tunnel |
| destination-data-center | IP address | SSE PoP the tunnel connects to |
| site-id / site-name | 3 / LON | Site identification for filtering |
💡 Tip: The response includes a large header block with UI column metadata. Your code should parse only the data[] array and discard the header object.
Diagnostic logic:
- All three states are Up: Tunnel is healthy from the SD-WAN side. Proceed to check the SSE side.
- sig-state is Up but tracker-state is Down: The IPsec tunnel is established but the L7 health probe is failing. When the tracker fails, the SD-WAN device automatically reroutes traffic to the default route in the service VPN. Depending on the network design, traffic then takes an alternate path, typically DIA (Direct Internet Access) at the local router or backhaul to a DC/Hub, and the security enforcement point shifts accordingly. This is a deliberate design choice rather than an outage: users retain internet access, but the traffic is no longer reaching Secure Access for enforcement. Confirm by checking Secure Access Activity Search (Step 3.1): an affected site with a failing tracker will show zero activity from that site even though connectivity appears normal.
- device-state is Down: The tunnel interface itself is down. Check the device health (Phase 4) and interface configuration.
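The diagnostic logic above can be sketched as a small Python classifier. The function names and the category labels are hypothetical; the field names come from the example response. The "tunnel-down" branch (interface up, sig-state not Up) is not described in the list above and is an assumption added so that every row receives a label.

```python
def classify_sdwan_tunnel(row: dict) -> str:
    """Map one SIG tunnel row to a coarse health category."""
    device = str(row.get("device-state") or "").lower()
    sig = str(row.get("sig-state") or "").lower()
    tracker = str(row.get("tracker-state") or "").lower()
    if device != "up":
        return "interface-down"   # check device health and interface config
    if sig == "up" and tracker == "up":
        return "healthy"          # proceed to the Secure Access side
    if sig == "up":
        return "probe-failing"    # IPsec up, L7 probe failing, traffic rerouted
    return "tunnel-down"          # assumption: interface up but SSE tunnel not Up


def classify_response(response: dict) -> dict:
    """Classify every row in data[]; the header block is ignored."""
    result = {}
    for row in response.get("data", []):
        key = f'{row.get("vdevice-host-name")}/{row.get("ha-pair")}'
        result[key] = classify_sdwan_tunnel(row)
    return result


# Trimmed sample based on the example response, with the tracker forced Down.
response = {
    "data": [
        {"vdevice-host-name": "LON-01", "ha-pair": "active",
         "device-state": "Up", "sig-state": "Up", "tracker-state": "Down"},
    ],
    "header": {},
}
print(classify_response(response))
```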
Step 1.2: Query Secure Access NTG Tunnel Status
Now query the same tunnels from the Secure Access side. Use the includeStatuses=true parameter to get tunnel state information embedded in the NTG response.
GET https://api.sse.cisco.com/deployments/v2/networktunnelgroups?includeStatuses=true
Headers:
Authorization: Bearer {oauth-token}
Accept: application/json
Example curl request:
curl -X GET \
"https://api.sse.cisco.com/deployments/v2/networktunnelgroups?includeStatuses=true" \
-H "Authorization: Bearer {oauth-token}" \
-H "Accept: application/json"
Example response (excerpt):
{
"data": [
{
"id": 671740362,
"organizationId": 1234567,
"name": "C8K",
"region": "eu-north-1",
"deviceType": "Catalyst SDWAN",
"status": "connected",
"hubs": [
{
"id": 671740361,
"isPrimary": true,
"datacenter": { "name": "sse-eun-1-1-0" },
"status": { "status": "UP", "time": "2026-04-23T22:19:38Z" },
"tunnelsCount": 1,
"tunnelsStatus": [
{
"status": "UP",
"dcName": "Stockholm 1",
"dcDesc": "STOCKHOLM-1",
"ikeState": "ESTABLISHED",
"ipsecState": "INSTALLED",
"peerIp": "x.x.x.x",
"localIp": "x.x.x.x",
"data": { "bytesIn": "19860", "bytesOut": "21484",
"packetsIn": "186", "packetsOut": "216" }
}
]
},
{
"id": 671740363,
"isPrimary": false,
"datacenter": { "name": "sse-eun-1-1-1" },
"status": { "status": "UP", "time": "2026-04-23T22:18:58Z" },
"tunnelsCount": 1,
"tunnelsStatus": [
{ "status": "UP", "dcName": "Stockholm 2",
"localIp": "x.x.x.x", "peerIp": "x.x.x.x" }
]
}
]
}
],
"offset": 0, "limit": 10, "total": 5
}
Key fields to inspect:
- name: The NTG identifier; exact-match join key to SD-WAN tunnel-name
- status: NTG-level state, connected or disconnected
- hubs[].isPrimary: Identifies the primary (true) vs backup (false) hub; maps to SD-WAN ha-pair
- hubs[].status.status: Hub-level state (UP / DOWN)
- hubs[].tunnelsStatus[].status: Individual tunnel state (UP / DOWN)
- hubs[].tunnelsStatus[].localIp: Hub IP; matches SD-WAN destination-data-center
- hubs[].tunnelsStatus[].peerIp: SD-WAN device's WAN-side IP (seen from the SSE side)
- hubs[].tunnelsStatus[].dcName / dcDesc: Human-readable datacenter name (e.g., Stockholm 1)
- hubs[].tunnelsStatus[].ikeState / ipsecState: Crypto state indicators (ESTABLISHED / INSTALLED)
Note: The response may also include NTGs with tunnelsCount: 0 and tunnelsStatus: []. These are typically pre-configured regional templates with no SD-WAN device currently connected. Filter these out when correlating unless you're specifically investigating provisioning gaps.
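For correlation it is usually easier to work with one flat record per tunnel than with the nested NTG structure. The sketch below is one way to do that in Python; the function name flatten_ntg_tunnels and the output keys are hypothetical, and the input field names are those shown in the example response. Hubs with an empty tunnelsStatus list are skipped, per the note above.

```python
def flatten_ntg_tunnels(response: dict) -> list:
    """Return one flat record per tunnel, skipping hubs with no tunnels."""
    rows = []
    for ntg in response.get("data", []):
        for hub in ntg.get("hubs", []):
            tunnels = hub.get("tunnelsStatus") or []
            if not tunnels:
                continue  # template NTG or hub with no connected device
            for tunnel in tunnels:
                rows.append({
                    "ntg_name": ntg.get("name"),
                    "ntg_status": ntg.get("status"),
                    "is_primary": hub.get("isPrimary"),
                    "hub_status": (hub.get("status") or {}).get("status"),
                    "tunnel_status": tunnel.get("status"),
                    "local_ip": tunnel.get("localIp"),
                    "peer_ip": tunnel.get("peerIp"),
                    "dc_name": tunnel.get("dcName"),
                })
    return rows


# Trimmed sample: one connected NTG and one empty template NTG.
sample = {
    "data": [
        {"name": "C8K", "status": "connected", "hubs": [
            {"isPrimary": True, "status": {"status": "UP"}, "tunnelsCount": 1,
             "tunnelsStatus": [{"status": "UP", "dcName": "Stockholm 1",
                                "localIp": "x.x.x.x", "peerIp": "x.x.x.x"}]},
        ]},
        {"name": "template-ntg", "status": "disconnected", "hubs": [
            {"isPrimary": True, "tunnelsCount": 0, "tunnelsStatus": []},
        ]},
    ]
}
flat = flatten_ntg_tunnels(sample)
print(len(flat), flat[0]["ntg_name"], flat[0]["dc_name"])
```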
Step 1.3: Cross-Reference Both Views
The critical diagnostic value comes from comparing the two views. Use the cross-platform field mappings in Section 1.4 as the authoritative reference — the primary join key is SD-WAN tunnel-name ↔ Secure Access NTG name (exact string match). At the hub level, SD-WAN destination-data-center matches Secure Access hubs[].tunnelsStatus[].localIp. The table below shows the most common state combinations and their interpretation:
| SD-WAN Says (sig / tracker) | Secure Access Says (tunnelsStatus.status) | Diagnosis |
|---|---|---|
| Up / Up | UP | Healthy. Both sides report tunnel Up and passing L7 probes. |
| Up / Down | UP | L7 probe failing. Traffic rerouted to default route per network design — security enforcement point shifts accordingly. |
| Up / Up | DOWN | IPsec / credential mismatch. Check NTG config and device crypto state. |
| Down / Down | DOWN | Full outage. Check device & WAN link. |
| Up (active) + Down (backup) | UP (one hub) + DOWN (other) | A failover occurred. Investigate why primary failed (Step 4.2 — device health). |
Form a working hypothesis before proceeding. The table above gives you an initial diagnosis category, but it's a hypothesis, not a conclusion. Before moving to Phase 2, state explicitly what you believe the issue is and which evidence supports it. For example:
"SD-WAN reports tunnel Up with tracker Down; Secure Access reports the tunnel status UP on the primary hub. Working hypothesis: L7 probe is failing, device has rerouted traffic to the default route per network design, traffic is no longer reaching SSE. Phase 2 will confirm by (a) checking alarm history for Tracker_State_Change transitions (and, secondarily, Cisco Secure Access alarms on the device) and (b) checking Activity Search for zero site activity during the incident window."
Then proceed to Phase 2 with targeted queries against the specific evidence your hypothesis needs — don't run every Phase 2 step indiscriminately.
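If you automate triage, the single-tunnel rows of the Step 1.3 table can be encoded as a lookup. The following Python sketch uses hypothetical category labels and a hypothetical diagnose function; it deliberately leaves out the failover row, which compares the active and backup tunnels together rather than one tunnel at a time. Any combination not in the table is returned as "unclassified" for manual review.

```python
# Keys are (sig-state, tracker-state, Secure Access tunnelsStatus.status),
# lower-cased because the two platforms use different casing.
DIAGNOSES = {
    ("up", "up", "up"): "healthy",
    ("up", "down", "up"): "l7-probe-failing",
    ("up", "up", "down"): "ipsec-or-credential-mismatch",
    ("down", "down", "down"): "full-outage",
}


def diagnose(sig_state, tracker_state, sse_status) -> str:
    """Return the initial diagnosis category for one tunnel."""
    key = tuple(str(v or "").lower() for v in (sig_state, tracker_state, sse_status))
    return DIAGNOSES.get(key, "unclassified")


print(diagnose("Up", "Down", "UP"))
print(diagnose("Down", "Up", "UP"))
```

The result is a starting hypothesis only; the evidence gathered in Phase 2 is what confirms or rejects it.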
1.4 Cross-Platform Field Reference
The same tunnel is represented differently on each platform. Use this table as the authoritative reference when joining data across SD-WAN Manager and Secure Access. All mappings below have been validated against real API responses from both platforms.
| Concept | SD-WAN Field | Secure Access Field | Notes |
|---|---|---|---|
| Tunnel identity (PRIMARY JOIN KEY) | tunnel-name | name (at NTG root) | Exact string match, e.g., C8K-3B06D343-6B6B-... |
| HA role | ha-pair ("active" / "backup") | hubs[].isPrimary (true / false) | Active maps to isPrimary=true |
| Hub / Datacenter IP (SECONDARY JOIN) | destination-data-center | hubs[].tunnelsStatus[].localIp | Exact IP match; narrows to specific primary/backup hub |
| SD-WAN device WAN IP | (implicit: device WAN IP) | hubs[].tunnelsStatus[].peerIp | Seen from the SSE side |
| Site | site-id / site-name | (no direct equivalent) | SSE has no site concept |
| Device | vdevice-name / vdevice-host-name | (inferable via NTG name) | Device identifier |
| Tunnel state (up) | sig-state = "Up" | hubs[].tunnelsStatus[].status = "UP" | Case differs |
| L7 probe state | tracker-state | (no equivalent) | SD-WAN-only concept |
| Datacenter display name | (only IP is returned) | hubs[].tunnelsStatus[].dcName / dcDesc | e.g., "Stockholm 1" / "STOCKHOLM-1" |
| IKE / IPsec state | (derived from sig-state) | tunnelsStatus[].ikeState / ipsecState | "ESTABLISHED" / "INSTALLED" |
| Traffic counters | device-packets-in / device-packets-out | tunnelsStatus[].data.bytesIn / bytesOut / packetsIn / packetsOut | Different units on each side |
Practical join pattern: one SD-WAN device typically produces two rows in the SIG tunnel list (one active, one backup), both sharing the same tunnel-name. On the SSE side, that maps to one NTG with two hubs (primary + backup). Join on tunnel-name to identify the device, then join each SD-WAN row to a specific SSE hub using destination-data-center ↔ hubs[].tunnelsStatus[].localIp.
💡 Tip: This mapping is referenced throughout Section 2 (Troubleshooting) and Section 3 (Event Correlation). The correlation script in Section 3 implements these exact join keys; if you need to build your own automation, use this table as the source of truth for cross-platform field equivalence.
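As an illustration of the join pattern, here is a minimal Python sketch that pairs each SD-WAN row with its Secure Access hub using the two join keys from the table. The function name join_tunnels, the output keys, and the sample IP addresses and tunnel name are hypothetical; this is not the Section 3 correlation script.

```python
def join_tunnels(sdwan_rows: list, ntgs: list) -> list:
    """Join SD-WAN SIG tunnel rows to Secure Access hubs.

    Primary key: tunnel-name <-> NTG name.
    Secondary key: destination-data-center <-> hubs[].tunnelsStatus[].localIp.
    """
    hubs_by_key = {}
    for ntg in ntgs:
        for hub in ntg.get("hubs", []):
            for tunnel in hub.get("tunnelsStatus") or []:
                key = (ntg.get("name"), tunnel.get("localIp"))
                hubs_by_key[key] = {
                    "is_primary": hub.get("isPrimary"),
                    "sse_status": tunnel.get("status"),
                }
    joined = []
    for row in sdwan_rows:
        key = (row.get("tunnel-name"), row.get("destination-data-center"))
        match = hubs_by_key.get(key) or {}
        joined.append({
            "host": row.get("vdevice-host-name"),
            "ha_pair": row.get("ha-pair"),
            "sig_state": row.get("sig-state"),
            "tracker_state": row.get("tracker-state"),
            "is_primary": match.get("is_primary"),
            "sse_status": match.get("sse_status"),  # None means no SSE match
        })
    return joined


# Hypothetical sample data: one device, active and backup tunnels.
sdwan_rows = [
    {"tunnel-name": "C8K-EXAMPLE", "destination-data-center": "198.51.100.1",
     "vdevice-host-name": "LON-01", "ha-pair": "active",
     "sig-state": "Up", "tracker-state": "Up"},
    {"tunnel-name": "C8K-EXAMPLE", "destination-data-center": "198.51.100.2",
     "vdevice-host-name": "LON-01", "ha-pair": "backup",
     "sig-state": "Up", "tracker-state": "Up"},
]
ntgs = [{"name": "C8K-EXAMPLE", "hubs": [
    {"isPrimary": True, "tunnelsStatus": [{"status": "UP", "localIp": "198.51.100.1"}]},
    {"isPrimary": False, "tunnelsStatus": [{"status": "DOWN", "localIp": "198.51.100.2"}]},
]}]
for record in join_tunnels(sdwan_rows, ntgs):
    print(record["ha_pair"], record["sig_state"], record["sse_status"])
```

A row whose sse_status is None has no matching hub on the Secure Access side, which is itself worth investigating.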
Phase 2: Investigate — What Happened and When?
Once you have identified a state mismatch or degradation, the next step is to reconstruct what happened by querying event and alarm streams from both platforms.
Step 2.1: Query SD-WAN Alarms for the Site
Use the POST-based alarm query API to retrieve alarms filtered to the affected site. Filter by site_id and a time window that covers the reported issue.
POST https://{gateway_url}/dataservice/alarms
Body:
{
"query": {
"condition": "AND",
"rules": [
{"value": ["24"], "field": "entry_time",
"type": "date", "operator": "last_n_hours"},
{"value": ["3"], "field": "site_id",
"type": "string", "operator": "in"}
]
},
"size": 100
}
Example curl request:
curl -X POST \
"https://{gateway_url}/dataservice/alarms" \
-H "Authorization: Bearer {jwt-token}" \
-H "X-XSRF-TOKEN: {xsrf-token}" \
-H "Content-Type: application/json" \
-d '{"query":{"condition":"AND","rules":[
{"value":["24"],"field":"entry_time","type":"date","operator":"last_n_hours"},
{"value":["3"],"field":"site_id","type":"string","operator":"in"}
]},"size":100}'
Example response (excerpt):
{
"data": [
{
"rule_name_display": "Cisco Secure Access",
"severity": "Critical",
"active": true,
"entry_time": 1774981376892,
"message": "SSE tunnel Tunnel16000001 is down on device LON-01",
"host_name": "LON-01",
"system_ip": "169.254.10.26",
"site_id": "3",
"devices": [{ "system-ip": "169.254.10.26" }],
"values": [{ "tunnel-name": "C8K-...", "tracker-state": "Down" }]
}
],
"header": { }
}
SSE-related alarm types to look for: Cisco Secure Access, Tracker_State_Change, BFD_Site_Down, BFD_Site_Up, Interface_State_Change, Control_Node_Down.
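The alarm, event, and audit log queries in this cookbook share the same body shape, so it can help to generate it rather than hand-edit JSON. The helper below is a hypothetical Python sketch (build_query is not an SD-WAN Manager function) that reproduces the body shown above from a time window and a set of field filters.

```python
import json


def build_query(hours: int, filters: dict = None, size: int = 100) -> dict:
    """Build a query body: a last_n_hours time rule plus 'in' filters."""
    rules = [{
        "value": [str(hours)],
        "field": "entry_time",
        "type": "date",
        "operator": "last_n_hours",
    }]
    for field, values in (filters or {}).items():
        rules.append({
            "value": [str(v) for v in values],
            "field": field,
            "type": "string",
            "operator": "in",
        })
    return {"query": {"condition": "AND", "rules": rules}, "size": size}


# Alarms for site 3 over the last 24 hours (same body as Step 2.1).
alarm_body = build_query(24, {"site_id": ["3"]})
print(json.dumps(alarm_body, indent=2))
```

The same helper yields the Step 2.2 body with build_query(24, {"system_ip": ["169.254.10.26"]}, size=500).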
Step 2.2: Query SD-WAN Events for the Device
Events provide a lower-level view than alarms. Filter by device IP (system-ip) for the specific WAN edge router at the affected site.
POST https://{gateway_url}/dataservice/event
Body:
{
"query": {
"condition": "AND",
"rules": [
{"value": ["24"], "field": "entry_time",
"type": "date", "operator": "last_n_hours"},
{"value": ["169.254.10.26"], "field": "system_ip",
"type": "string", "operator": "in"}
]
},
"size": 500
}
Example curl request:
curl -X POST \
"https://{gateway_url}/dataservice/event" \
-H "Authorization: Bearer {jwt-token}" \
-H "X-XSRF-TOKEN: {xsrf-token}" \
-H "Content-Type: application/json" \
-d '{"query":{"condition":"AND","rules":[
{"value":["24"],"field":"entry_time","type":"date","operator":"last_n_hours"},
{"value":["169.254.10.26"],"field":"system_ip","type":"string","operator":"in"}
]},"size":500}'
Example response (excerpt):
{
"data": [
{
"system_ip": "169.254.10.26",
"vmanage_system_ip": "169.254.10.26",
"tenant": "default",
"device_type": "vedge",
"entry_time": 1666259117528,
"statcycletime": 1666259117528,
"eventname": "bfd-state-change",
"component": "BFD",
"severity_level": "major",
"host_name": "LON-01",
"event": {
"bfd-state-change": {
"src-ip": "10.0.5.11",
"dst-ip": "10.1.16.16",
"local-system-ip": "169.254.10.26",
"local-color": "lte",
"remote-system-ip": "172.16.255.16",
"remote-color": "lte",
"new-state": "down",
"proto": "ipsec",
"flap-reason": "na"
}
},
"details": "host-name=LON-01; src-ip=10.0.5.11; dst-ip=10.1.16.16; ...",
"id": "xh_I9IMBLPMz2to0aA3r"
}
],
"pageInfo": {
"startTime": "1589073783045",
"endTime": "1589072535795",
"count": 15
}
}
Note on events vs. alarms: /dataservice/event returns lower-level raw events with names like bfd-state-change, memory-usage, interface-state-change. /dataservice/alarms (Step 2.1) returns correlated, severity-classified alarms with rule names like BFD_Node_Down, Cisco Secure Access, and Tracker_State_Change. When troubleshooting, start with alarms for actionable signal and drill into events only when you need the lower-level device-generated trace.
💡 Tip: The event.{eventname} sub-object contains the structured event payload with fields specific to that event type. The details field is a human-readable serialization of the same data.
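The tip above suggests a simple extraction pattern: look up the sub-object under event using the value of eventname. A minimal Python sketch follows; the function name extract_event_payload and the output keys are hypothetical, and the sample is trimmed from the example response.

```python
def extract_event_payload(event: dict) -> dict:
    """Pull the structured payload stored under event[eventname]."""
    name = event.get("eventname", "")
    body = (event.get("event") or {}).get(name, {})
    return {
        "name": name,
        "host": event.get("host_name"),
        "entry_time": event.get("entry_time"),
        "fields": body,
    }


# Trimmed from the example response above.
raw = {
    "entry_time": 1666259117528,
    "eventname": "bfd-state-change",
    "host_name": "LON-01",
    "event": {"bfd-state-change": {"local-color": "lte", "new-state": "down"}},
}
parsed = extract_event_payload(raw)
if parsed["name"] == "bfd-state-change" and parsed["fields"].get("new-state") == "down":
    print(f'{parsed["host"]}: BFD down on color {parsed["fields"].get("local-color")}')
```

Reading the structured sub-object avoids parsing the semicolon-delimited details string.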
Step 2.3: Query Secure Access Network Tunnel Logs
Retrieve tunnel establishment and teardown events from the Secure Access Reporting API. These logs capture when tunnels transitioned between states. (For background on this endpoint, see the Onboarding Cookbook — Events for Network Tunnels.)
GET https://api.sse.cisco.com/reports/v2/networkTunnelLogs?from=-1days&to=now&limit=100
Headers:
Authorization: Bearer {oauth-token}
Accept: application/json
Example curl request:
curl -X GET \
"https://api.sse.cisco.com/reports/v2/networkTunnelLogs?from=-1days&to=now&limit=100" \
-H "Authorization: Bearer {oauth-token}" \
-H "Accept: application/json"
Example response (excerpt):
{
"data": [
{
"timestamp": 1774981376,
"serviceName": "IKE",
"level": "WARNING",
"message": "Alert: IKE message sent retransmission - peer not responding",
"networkTunnelGroupLabel": "C8K-8C05B196-508F-D706-C68E-8CF918AEC1FD",
"networkTunnelGroupId": 671740929
}
]
}
Note: The time range set by the to and from query parameters cannot exceed 30 days. This is a hard limit across the Secure Access Reporting API endpoints (/reports/v2/networkTunnelLogs, /reports/v2/activity, and related endpoints). For longer-term data retention, enable logging to Amazon S3 via the Secure Access logging configuration.
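When an investigation spans more than 30 days, one option is to split the range into consecutive windows and issue one request per window. The Python sketch below shows only the splitting; split_time_range and to_epoch_ms are hypothetical helpers. The examples in this cookbook use relative values such as -1days, so whether from and to accept epoch milliseconds is an assumption to verify against the Reporting API reference, and data availability still depends on your retention settings.

```python
from datetime import datetime, timedelta, timezone

MAX_WINDOW = timedelta(days=30)  # limit stated in the note above


def split_time_range(start: datetime, end: datetime, max_window: timedelta = MAX_WINDOW) -> list:
    """Split [start, end) into consecutive windows no longer than max_window."""
    if end <= start:
        raise ValueError("end must be after start")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_window, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


start = datetime(2026, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=75)
for window_start, window_end in split_time_range(start, end):
    # Assumption: epoch-ms values are accepted for from/to; verify before use.
    print(to_epoch_ms(window_start), to_epoch_ms(window_end))
```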
Phase 3: Validate — Is Traffic Actually Flowing?
A tunnel may show as Active on both sides, but traffic might not be flowing correctly. Use the following APIs to verify actual traffic throughput.
Step 3.1: Query Secure Access Activity Search
Check whether Secure Access is processing traffic from the affected site by querying the Activity Search endpoint. If the tunnel is up but no activity appears, the issue is likely in the SD-WAN traffic policy (Application Priority & SLA configuration). (The Activity Search endpoint is introduced in the Onboarding Cookbook for post-deployment verification; here we use it to validate traffic flow during an active investigation.)
GET https://api.sse.cisco.com/reports/v2/activity?from=-60minutes&to=now&limit=50
Headers:
Authorization: Bearer {oauth-token}
Accept: application/json
Example curl request:
curl -X GET \
"https://api.sse.cisco.com/reports/v2/activity?from=-60minutes&to=now&limit=50" \
-H "Authorization: Bearer {oauth-token}" \
-H "Accept: application/json"
Example response (excerpt):
{
"data": [
{
"timestamp": 1774981400000,
"identities": [
{
"id": 12345,
"label": "LON-NTG",
"type": {
"id": 68,
"label": "Network Tunnel",
"type": "networkTunnel"
}
}
],
"verdict": "allowed",
"type": "dns",
"internalip": "10.1.1.42",
"externalip": "203.0.113.88",
"domain": "example.com"
}
]
}
Look for activity entries whose identity matches the site. Traffic arriving through the tunnel will show the site identity associated with the NTG.
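A short Python sketch of that identity check is shown below. The function name activity_for_tunnel is hypothetical; the identity structure (label plus a nested type object whose type is networkTunnel) follows the example response, and the label value LON-NTG is the one used in that example.

```python
def activity_for_tunnel(response: dict, ntg_label: str) -> list:
    """Return activity entries attributed to the given network tunnel identity."""
    matches = []
    for entry in response.get("data", []):
        for identity in entry.get("identities", []):
            id_type = (identity.get("type") or {}).get("type")
            if id_type == "networkTunnel" and identity.get("label") == ntg_label:
                matches.append(entry)
                break  # count each entry once
    return matches


# Trimmed from the example response above.
response = {"data": [{
    "timestamp": 1774981400000,
    "identities": [{"id": 12345, "label": "LON-NTG",
                    "type": {"id": 68, "label": "Network Tunnel", "type": "networkTunnel"}}],
    "verdict": "allowed",
    "type": "dns",
    "domain": "example.com",
}]}
hits = activity_for_tunnel(response, "LON-NTG")
print(f"{len(hits)} entries from LON-NTG")
```

An empty result for a site whose tunnel reports Up on both platforms supports the rerouted-traffic hypothesis from Step 1.1.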
Step 3.2: Query SD-WAN App-Route Statistics
Retrieve loss, latency, jitter, and vQoE score metrics for the tunnel interfaces to determine whether the underlay path quality is contributing to the issue. This endpoint uses an aggregation query — specify the fields to aggregate by (typically name for per-tunnel results) and the metrics to compute (averaged loss, latency, jitter, vQoE score).
POST https://{gateway_url}/dataservice/statistics/approute/aggregation
Headers:
Authorization: Bearer {jwt-token}
X-XSRF-TOKEN: {xsrf-token}
Content-Type: application/json
Body:
{
"query": {
"condition": "AND",
"rules": [
{"value": ["1"], "field": "entry_time",
"type": "date", "operator": "last_n_hours"},
{"value": ["169.254.10.26"], "field": "local_system_ip",
"type": "string", "operator": "in"}
]
},
"aggregation": {
"field": [
{"property": "name", "sequence": 1, "size": 6000}
],
"metrics": [
{"property": "loss_percentage", "type": "avg"},
{"property": "latency", "type": "avg"},
{"property": "jitter", "type": "avg"},
{"property": "vqoe_score", "type": "avg"}
]
}
}
Example curl request:
curl -X POST \
"https://{gateway_url}/dataservice/statistics/approute/aggregation" \
-H "Authorization: Bearer {jwt-token}" \
-H "X-XSRF-TOKEN: {xsrf-token}" \
-H "Content-Type: application/json" \
-d '{
"query": {
"condition": "AND",
"rules": [
{"value":["1"],"field":"entry_time","type":"date","operator":"last_n_hours"},
{"value":["169.254.10.26"],"field":"local_system_ip","type":"string","operator":"in"}
]
},
"aggregation": {
"field": [{"property":"name","sequence":1,"size":6000}],
"metrics": [
{"property":"loss_percentage","type":"avg"},
{"property":"latency","type":"avg"},
{"property":"jitter","type":"avg"},
{"property":"vqoe_score","type":"avg"}
]
}
}'
Example response (excerpt):
{
"data": [
{
"name": "169.254.10.26:biz-internet-13.53.178.16:sig",
"vqoe_score": 10,
"latency": 19.3356,
"loss_percentage": 0,
"jitter": 9.02238
},
{
"name": "169.254.10.26:biz-internet-13.51.243.70:sig",
"vqoe_score": 9,
"latency": 24.12,
"loss_percentage": 0.2,
"jitter": 11.45
}
],
"header": { }
}
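To screen many tunnels at once, the aggregation response can be post-processed in a few lines. The following is a minimal Python sketch, assuming the response shape shown in the excerpt above. The function name `flag_degraded_tunnels` and the threshold values are illustrative assumptions, not Cisco-published limits; tune them to your own SLA.

```python
from typing import Any

# Illustrative thresholds (assumptions, not Cisco-published values).
THRESHOLDS = {"loss_percentage": 1.0, "latency": 150.0, "jitter": 30.0}


def flag_degraded_tunnels(response: dict[str, Any],
                          thresholds: dict[str, float] = THRESHOLDS) -> list[dict[str, Any]]:
    """Return one entry per tunnel that exceeds at least one threshold."""
    degraded = []
    for row in response.get("data", []):
        # Collect only the metrics that are present and above their limit.
        breaches = {
            metric: row[metric]
            for metric, limit in thresholds.items()
            if row.get(metric) is not None and row[metric] > limit
        }
        if breaches:
            degraded.append({"name": row.get("name"), "breaches": breaches})
    return degraded


# Sample data copied from the example response above.
sample = {
    "data": [
        {"name": "169.254.10.26:biz-internet-13.53.178.16:sig",
         "vqoe_score": 10, "latency": 19.3356, "loss_percentage": 0, "jitter": 9.02238},
        {"name": "169.254.10.26:biz-internet-13.51.243.70:sig",
         "vqoe_score": 9, "latency": 24.12, "loss_percentage": 0.2, "jitter": 11.45},
    ]
}

# Tighten the loss threshold to show a breach on the second tunnel.
result = flag_degraded_tunnels(sample, {"loss_percentage": 0.1})
print(result)
```

With the default thresholds neither sample tunnel is flagged, which suggests the underlay is not the cause and the investigation should move on to Phase 4.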
Phase 4: Root Cause — What Changed?
Many tunnel issues are caused by recent configuration changes or device resource exhaustion. These APIs help identify the root cause.
Step 4.1: Query SD-WAN Audit Log
Check whether a configuration change was pushed shortly before the issue started. Audit logs capture template pushes, policy deployments, and other administrative actions.
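POST https://{gateway_url}/dataservice/auditlog
Headers:
Authorization: Bearer {jwt-token}
X-XSRF-TOKEN: {xsrf-token}
Content-Type: application/json
Body: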
{
"query": {
"condition": "AND",
"rules": [
{"value": ["24"], "field": "entry_time",
"type": "date", "operator": "last_n_hours"}
]
},
"size": 100
}
Example curl request:
curl -X POST \
"https://{gateway_url}/dataservice/auditlog" \
-H "Authorization: Bearer {jwt-token}" \
-H "X-XSRF-TOKEN: {xsrf-token}" \
-H "Content-Type: application/json" \
-d '{"query":{"condition":"AND","rules":[
{"value":["24"],"field":"entry_time","type":"date","operator":"last_n_hours"}
]},"size":100}'
Example response (excerpt):
{
"data": [
{
"entry_time": 1659003590292,
"statcycletime": 1659003590292,
"tenant": "default",
"logid": "918c6d64-729c-4304-b2c7-955a7c0e6a61",
"logmodule": "user",
"logfeature": "user",
"logdeviceid": "172.16.255.22",
"loguser": "viptela-device-901d032c-0721-438e-82ba-1d429598d879",
"logusersrcip": "172.16.255.19",
"logmessage": "Invalidated session due to server session idle time out for User: viptela-device-901d032c-0721-438e-82ba-1d429598d879",
"auditdetails": [
"[22-Aug-2022 10:19:50 UTC] Session invalidated due to idle timeout"
],
"id": "iWJRRIIBMbLIyWbe1qH"
}
],
"header": {
"generatedOn": 1655622555494,
"columns": [ ],
"fields": [ ]
}
}
Key fields:
- entry_time: epoch ms timestamp for temporal correlation with the incident window
- logmodule / logfeature: categorize the action (e.g., user, template, policy, device)
- loguser: user identity that initiated the action
- logusersrcip: source IP the action was initiated from
- logmessage: primary human-readable description of the action
- auditdetails: array of timestamped detail lines for deeper forensic review
When investigating a tunnel issue, look for entries whose entry_time falls within or just before the incident window, and filter logmodule / logfeature for relevant categories (template deployments, policy changes, device modifications). The logmessage field typically contains enough context to understand the action; auditdetails provides the audit trail if deeper investigation is needed.
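The filtering described above could be sketched as follows. This is a minimal Python example, assuming the audit log response has already been fetched and its `data` array passed in. The function name `changes_near_incident`, the 30-minute lookback, and the sample entries are hypothetical and only illustrate the technique.

```python
from datetime import datetime, timedelta, timezone

# Categories named in the text above; extend to match your deployment.
RELEVANT_MODULES = {"template", "policy", "device"}


def changes_near_incident(entries, incident_start, incident_end,
                          lookback=timedelta(minutes=30),
                          modules=RELEVANT_MODULES):
    """Return audit entries logged within the incident window or shortly before it."""
    # entry_time is epoch milliseconds, so convert the window bounds to match.
    window_start_ms = int((incident_start - lookback).timestamp() * 1000)
    window_end_ms = int(incident_end.timestamp() * 1000)
    hits = [
        e for e in entries
        if window_start_ms <= e.get("entry_time", 0) <= window_end_ms
        and (e.get("logmodule") in modules or e.get("logfeature") in modules)
    ]
    return sorted(hits, key=lambda e: e["entry_time"])


def to_ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


incident_start = datetime(2026, 4, 1, 10, 0, tzinfo=timezone.utc)
incident_end = incident_start + timedelta(minutes=15)

# Hypothetical audit entries for illustration only.
entries = [
    {"entry_time": to_ms(incident_start - timedelta(minutes=5)),
     "logmodule": "template", "logfeature": "template",
     "loguser": "admin", "logmessage": "hypothetical template push"},
    {"entry_time": to_ms(incident_start - timedelta(minutes=2)),
     "logmodule": "user", "logfeature": "user",
     "loguser": "admin", "logmessage": "hypothetical session timeout"},
    {"entry_time": to_ms(incident_start - timedelta(hours=3)),
     "logmodule": "policy", "logfeature": "policy",
     "loguser": "admin", "logmessage": "hypothetical policy change"},
]

suspects = changes_near_incident(entries, incident_start, incident_end)
for e in suspects:
    print(e["entry_time"], e["loguser"], e["logmessage"])
```

Only the template push survives the filter: the session timeout is in the window but not a relevant category, and the policy change is too old. Widen `lookback` if configuration changes in your environment take longer to propagate.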
Step 4.2: Check Device Health
High CPU or memory usage on the WAN edge device can cause tunnel instability. Query the device system status to check resource utilization.
GET https://{gateway_url}/dataservice/device/system/status?deviceId=169.254.10.26
Headers:
Authorization: Bearer {jwt-token}
X-XSRF-TOKEN: {xsrf-token}
Example curl request:
curl -X GET \
"https://{gateway_url}/dataservice/device/system/status?deviceId=169.254.10.26" \
-H "Authorization: Bearer {jwt-token}" \
-H "X-XSRF-TOKEN: {xsrf-token}"
Example response (excerpt):
{
"data": [
{
"vdevice-name": "10.255.1.10",
"vdevice-host-name": "LON-01",
"cpu_user": 62.4,
"cpu_system": 8.1,
"mem_used": 3942000,
"mem_total": 8388608,
"disk_avail": 62.0,
"uptime": "12d 4h 18m"
}
]
}
Key fields: cpu_user, mem_used, mem_total, disk_avail.
SD-WAN Manager publishes WAN Edge health thresholds that classify devices as Good / Fair / Poor based on CPU and memory utilization:
| Health | Cisco WAN Edge (IOS-XE) | Cisco vEdge |
|---|---|---|
| Good | CPU < 80%, Memory < 88% | CPU < 75%, Memory < 75% |
| Fair | CPU ≥ 80%, Memory ≥ 88% | CPU ≥ 75%, Memory ≥ 75% |
| Poor | CPU ≥ 90%, Memory ≥ 93% | CPU ≥ 90%, Memory ≥ 90% |
Use these as a starting point for programmatic health checks. A device in the "Fair" range warrants investigation as a contributing factor to tunnel instability; "Poor" is a likely direct cause. For continuous monitoring, configure the corresponding resource-exhaustion alarm rules via Alarm Notifications (Section 1.1) to push threshold breaches to your webhook listener.
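A programmatic check against the table above might look like the following Python sketch. It rests on two assumptions that the source does not confirm: total CPU is taken as `cpu_user + cpu_system`, and the worse of the CPU and memory readings decides the label. The function name `classify_health` and the platform keys are hypothetical.

```python
# Thresholds copied from the table above, in percent:
# (fair_cpu, poor_cpu, fair_mem, poor_mem)
THRESHOLDS = {
    "ios-xe": (80.0, 90.0, 88.0, 93.0),
    "vedge": (75.0, 90.0, 75.0, 90.0),
}


def classify_health(status, platform="ios-xe"):
    """Classify one device status row as Good, Fair, or Poor."""
    fair_cpu, poor_cpu, fair_mem, poor_mem = THRESHOLDS[platform]
    # Assumption: total CPU utilization is user time plus system time.
    cpu_pct = status["cpu_user"] + status["cpu_system"]
    mem_pct = 100.0 * status["mem_used"] / status["mem_total"]
    # Assumption: the worse of the two metrics decides the label.
    if cpu_pct >= poor_cpu or mem_pct >= poor_mem:
        return "Poor"
    if cpu_pct >= fair_cpu or mem_pct >= fair_mem:
        return "Fair"
    return "Good"


# Values taken from the example response above.
sample = {"cpu_user": 62.4, "cpu_system": 8.1,
          "mem_used": 3942000, "mem_total": 8388608}
label = classify_health(sample)
print(label)
```

For the example device, CPU is about 70.5% and memory about 47%, so the sketch returns "Good" on either platform profile.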
Section 3: Cross-Platform Event Correlation
Why Event Correlation Matters
The SD-WAN Manager and Secure Access dashboards each provide their own view of tunnel events, but neither shows what the other platform observed.
Building a unified cross-platform timeline therefore requires combining events from both the SD-WAN Manager and Secure Access APIs, which this section demonstrates through a working Python script.
Correlation Architecture
Join strategy:
- Primary join key: SD-WAN tunnel-name ↔ Secure Access NTG name (exact match)
- Hub-level join: SD-WAN destination-data-center ↔ SSE hubs[].tunnelsStatus[].localIp
- HA role join: SD-WAN ha-pair (active/backup) ↔ SSE hubs[].isPrimary (true/false)
See Section 3 for the complete cross-platform field reference. The correlation script uses these mappings to build a per-tunnel unified view rather than merging events into a flat timeline.
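The three joins can be illustrated with a short Python sketch. It is not the companion script; it assumes the SD-WAN tunnel rows and Secure Access NTG objects have already been fetched and carry the field names listed above. The function name `correlate`, the output keys, and the sample values (documentation-range IPs, placeholder tunnel names) are hypothetical.

```python
def correlate(sdwan_tunnels, sse_groups):
    """Join SD-WAN SIG tunnel rows to Secure Access NTGs, grouped by tunnel name."""
    # Primary join: exact match of SD-WAN tunnel-name to NTG name.
    groups_by_name = {g["name"]: g for g in sse_groups}
    report = {}
    for t in sdwan_tunnels:
        name = t["tunnel-name"]
        group = groups_by_name.get(name)
        hub_match = None
        if group is not None:
            # Hub-level join: destination-data-center to hubs[].tunnelsStatus[].localIp.
            for hub in group.get("hubs", []):
                ips = {s.get("localIp") for s in hub.get("tunnelsStatus", [])}
                if t.get("destination-data-center") in ips:
                    hub_match = hub
                    break
        role_ok = None
        if hub_match is not None:
            # HA role join: "active" should pair with isPrimary == True.
            role_ok = (t.get("ha-pair") == "active") == bool(hub_match.get("isPrimary"))
        report.setdefault(name, []).append({
            "role": t.get("ha-pair"),
            "dc": t.get("destination-data-center"),
            "sse_matched": group is not None,
            "role_consistent": role_ok,
        })
    return report


# Hypothetical sample records for illustration only.
sdwan_tunnels = [
    {"tunnel-name": "C8K-EXAMPLE-1", "ha-pair": "active",
     "destination-data-center": "203.0.113.10"},
    {"tunnel-name": "C8K-EXAMPLE-1", "ha-pair": "backup",
     "destination-data-center": "203.0.113.20"},
    {"tunnel-name": "C8K-EXAMPLE-2", "ha-pair": "active",
     "destination-data-center": "203.0.113.10"},
]
sse_groups = [
    {"name": "C8K-EXAMPLE-1", "hubs": [
        {"isPrimary": True, "tunnelsStatus": [{"localIp": "203.0.113.10"}]},
        {"isPrimary": False, "tunnelsStatus": [{"localIp": "203.0.113.20"}]},
    ]},
]

report = correlate(sdwan_tunnels, sse_groups)
for name, rows in report.items():
    print(name, rows)
```

Keying the result by tunnel name keeps the per-tunnel structure, so a tunnel present on only one platform (C8K-EXAMPLE-2 here) shows up as an unmatched row instead of being silently dropped.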
Sample Code: Correlated Event Timeline
A working Python implementation is available as a companion file. It authenticates to both platforms, retrieves events within a configurable time window, joins them by tunnel using the field mappings described above, and prints a per-tunnel correlated report.
Note: This script is provided as a reference implementation. Adapt the authentication, error handling, and output format to your environment.
Example Output
When run against a lab with three onboarded sites, the per-tunnel correlated report produces output like this:
================================================================================
SASE CORRELATED TUNNEL REPORT
Site: 3 | Window: last 24h | Tunnels: 3
================================================================================
--------------------------------------------------------------------------------
TUNNEL: C8K-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX1
Device: R-SITE1 (10.0.0.1) Site: 1 (SITE1)
Status: [HEALTHY]
Diagnosis: Both sides report the tunnel Up and passing L7 probes.
ROLE SD-WAN Secure Access
ACTIVE sig=Up tracker=Up dc=<DC-IP-A> UP <Region> DC 1 (<DC-IP-A>)
BACKUP sig=Up tracker=Up dc=<DC-IP-B> UP <Region> DC 2 (<DC-IP-B>)
Events in window: (none)
--------------------------------------------------------------------------------
TUNNEL: C8K-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX2
Device: R-SITE2 (10.0.0.5) Site: 5 (SITE2)
Status: [HEALTHY]
Diagnosis: Both sides report the tunnel Up and passing L7 probes.
ROLE SD-WAN Secure Access
ACTIVE sig=Up tracker=Up dc=<DC-IP-C> UP <Region> DC 3 (<DC-IP-C>)
BACKUP sig=Up tracker=Up dc=<DC-IP-D> UP <Region> DC 4 (<DC-IP-D>)
Events in window (1):
09:18:13 UTC [SSE] WARNING IKE Alert: IKE message sent retransm
--------------------------------------------------------------------------------
TUNNEL: C8K-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX3
Device: R-SITE3 (10.0.0.2) Site: 2 (SITE3)
Status: [HEALTHY]
Diagnosis: Both sides report the tunnel Up and passing L7 probes.
ROLE SD-WAN Secure Access
ACTIVE sig=Up tracker=Up dc=<DC-IP-A> UP <Region> DC 1 (<DC-IP-A>)
BACKUP sig=Up tracker=Up dc=<DC-IP-B> UP <Region> DC 2 (<DC-IP-B>)
Events in window: (none)
================================================================================
Note how the report groups by tunnel rather than producing a flat timeline. For each tunnel, both platforms' views are presented side-by-side, followed by a status label and diagnosis derived from the cross-reference logic in Section 1.3. Events within the query window are attached to the tunnels they reference, so the ops engineer sees not just what happened, but which tunnel was affected and what the current state of both sides looks like.