Cisco SASE Site Onboarding Troubleshooting & Monitoring Cookbook

Programmatic Observability across SD-WAN and Secure Access

Companion to the Cisco SASE Site Onboarding API Cookbook

Version 1.0 | April 2026
Target Audience: Ops/Monitoring Teams, MSPs, Observability Engineers

Cisco Confidential

Introduction
Section 1: Proactive Monitoring & Alerting
Section 2: Troubleshooting Site Connectivity Use Case
Section 3: Cross-Platform Event Correlation

Introduction

This cookbook is a companion to the Cisco SASE Site Onboarding API Cookbook. While the onboarding cookbook covers the provisioning workflow (creating sites, tunnel profiles, configuration groups, and policy groups), this cookbook focuses on what happens after deployment: monitoring tunnel health, troubleshooting connectivity issues, and correlating events across both the SD-WAN and Secure Access platforms.

The cookbook is structured around three operational phases:

Proactive Monitoring: Set up push-based alerting so you know about tunnel issues before users report them
Troubleshooting: When something breaks, investigate systematically using APIs from both platforms
Event Correlation: Build a unified cross-platform timeline for forensic analysis and SIEM integration (worked example)

When to Use the API vs. the UI

The Secure Access dashboard and SD-WAN Manager UI provide excellent interactive tools for ad-hoc, single-site troubleshooting: visual tunnel health heatmaps, topology diagrams, drill-down alarm views. For one-off investigations, the UI is often the fastest path.

The API is the right tool when you need to:

Monitor tunnel health across dozens or hundreds of sites (fleet-scale polling)
Feed SASE telemetry into external observability platforms (Splunk, Datadog, Grafana, ServiceNow)
Build automated runbooks (e.g., open a ticket when a tunnel goes Inactive)
Correlate events across SD-WAN and Secure Access to build a unified incident timeline
Perform post-incident forensics by querying time-bounded events from both platforms

Prerequisites

This cookbook assumes you have completed the authentication steps described in the SASE Site Onboarding API Cookbook. Specifically, you need:

A valid SD-WAN Manager API session (JWT token + XSRF token, per Release 20.18+ JWT-based authentication)
A valid Secure Access OAuth 2.0 Bearer token (client credentials flow)
At least one onboarded site with deployed SSE tunnel profiles

For authentication details, refer to the Cisco SASE Site Onboarding API Workflow.
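
The two credential sets translate into two different header shapes, which recur in every request in this cookbook. As a minimal Python sketch (the function names are illustrative, not part of either API, and the tokens are assumed to have been obtained already through the onboarding cookbook's authentication steps):

```python
def sdwan_headers(jwt_token: str, xsrf_token: str) -> dict:
    """Headers for SD-WAN Manager calls, mirroring the examples in this cookbook."""
    return {
        "Authorization": f"Bearer {jwt_token}",
        "X-XSRF-TOKEN": xsrf_token,
        "Content-Type": "application/json",
    }


def secure_access_headers(oauth_token: str) -> dict:
    """Headers for Secure Access read (GET) calls, mirroring the examples in this cookbook."""
    return {
        "Authorization": f"Bearer {oauth_token}",
        "Accept": "application/json",
    }
```

Keeping the two builders separate makes it harder to send an SD-WAN JWT to Secure Access (or the reverse) by accident.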

API Endpoints Used in This Cookbook

SD-WAN Manager API

Purpose	Endpoint	Notes
SSE Tunnel Status	`GET /dataservice/device/sig/getSigTunnelList?lastNHours={n}`	Tunnel state, HA pair, site mapping
Alarms	`POST /dataservice/alarms`	Filterable by severity, time, site
Events	`POST /dataservice/event`	Low-level state change notifications
Audit Log	`POST /dataservice/auditlog`	Config change history
Device Status	`GET /dataservice/device/system/status?deviceId={ip}`	CPU, memory, uptime
App-Route Stats	`POST /dataservice/statistics/approute/aggregation`	Loss, latency, jitter, vQoE score per tunnel (aggregation query)

Secure Access API

Purpose	Endpoint	Notes
List NTGs + Status	`GET /deployments/v2/networktunnelgroups?includeStatuses=true`	NTG config and tunnel states
Activity Search	`GET /reports/v2/activity?from=...&to=...`	DNS, proxy, firewall events
Network Tunnel Logs	`GET /reports/v2/networkTunnelLogs?from=...&to=...`	Tunnel up/down events
Activity Search (by type)	`GET /reports/v2/activity?from=...&to=...&type={type}`	Filter by `type=dns`, `proxy`, `firewall`, etc.

Section 1: Proactive Monitoring & Alerting

The goal of proactive monitoring is to know about tunnel issues before your users do. Both SD-WAN Manager and Secure Access support push-based notification mechanisms that alert you the moment a tunnel goes down — and auto-clear when it recovers.

This section covers how to configure proactive alerting on both platforms so that tunnel state changes trigger immediate notifications to your ops team, Slack channel, Webex space, ServiceNow instance, or any HTTP-capable endpoint.

1.1 SD-WAN Manager: Webhook Notifications for Alarms

SD-WAN Manager supports webhook notifications that send an HTTP POST request to an external system in real time whenever an alarm is raised or cleared. This is the recommended approach for proactive monitoring because it eliminates the need to poll the alarm API.

Configuration Path: Monitor > Logs > Alarms > Alarm Notifications > Add Alarm Notification

Setup Steps

Step 1. Define the notification rule. Give it a descriptive name (e.g., "SSE Tunnel Alerts").

Step 2. Select alarm types. For SASE tunnel monitoring, select at minimum:

Cisco Secure Access — SSE tunnel credential / provider failure
Tracker_State_Change — L7 probe failure → traffic reroute to DIA
BFD_Site_Down / BFD_Site_Up — site-level overlay loss (context, not SSE-specific)
Interface_State_Change — WAN link down on the edge device
Control_Node_Down — control-plane loss
CPU_Usage / Memory_Usage — edge device resource exhaustion

Step 3. Select object scope. Choose "All Devices" for fleet-wide coverage, or select specific devices for targeted monitoring.

Step 4. Select severity. Critical and Major are the recommended minimum.

Step 5. Configure the webhook delivery:

Channel: Choose Slack, Webex, or Custom (for arbitrary HTTP endpoints)
Webhook URL: The endpoint that will receive the HTTP POST
Webhook Threshold: Maximum notifications per minute (e.g., 100 for production, lower for testing)
For Custom webhooks (Release 20.16.1+): optionally configure authentication headers and credentials

Configuration via API

The Alarm Notification / Webhook configuration exposed at Monitor > Logs > Alarms > Alarm Notifications > Add Alarm Notification can also be provisioned via the SD-WAN Manager API. This is the preferred approach when provisioning notifications across many tenants or when integrating with infrastructure-as-code pipelines.

POST https://{gateway_url}/dataservice/notifications/rule
Headers:
  Authorization: Bearer {jwt-token}
  X-XSRF-TOKEN: {xsrf-token}
  Content-Type: application/json

Example curl request:

curl -X POST \
  "https://{gateway_url}/dataservice/notifications/rule" \
  -H "Authorization: Bearer {jwt-token}" \
  -H "X-XSRF-TOKEN: {xsrf-token}" \
  -H "Content-Type: application/json" \
  -d '{
    "notificationRuleName": "SSE Tunnel Alerts - Webhook to SIEM",
    "alarmName": "Cisco Secure Access",
    "severity": "Medium",
    "webhookUrl": "https://your-listener.example.com/sdwan-webhook",
    "webhookUsername": "apiuser",
    "webhookPassword": "{webhook-password}",
    "devicesAttached": "1712133e-0246-4281-8152-3317b796d2bc",
    "emailThreshold": 5,
    "accountDetails": "noreply@example.com",
    "updatedBy": "admin"
  }'

Expected response: 202 Accepted with the created rule echoed back (password is returned in encrypted form).

Webhook Payload Format

When an alarm fires, SD-WAN Manager sends a JSON payload via HTTP POST. The key fields are:

{
  "rule_name_display": "Tracker_State_Change",
  "severity": "Critical",
  "message": "SSE tunnel Tunnel16000001 is down on device LON-01",
  "host_name": "LON-01",
  "system_ip": "169.254.10.26",
  "site_id": "3",
  "entry_time": 1774981376892,
  "deviceId": "10.255.1.10",
  ...
}

Note: The fields shown above are representative of SD-WAN webhook payloads. Specific fields and their format may vary across releases and alarm types — verify against your environment before writing automation against specific field names.
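
Because field names can vary, a receiver is safer reading the payload defensively than indexing keys directly. The sketch below is one way to do that in Python. It assumes `entry_time` is epoch milliseconds (the 13-digit sample value suggests this, but confirm it in your environment); the `summarize_alarm` name and the output keys are illustrative.

```python
import json
from datetime import datetime, timezone


def summarize_alarm(raw_body: bytes) -> dict:
    """Reduce a webhook body to a few fields, tolerating missing keys."""
    payload = json.loads(raw_body)
    entry_ms = payload.get("entry_time")
    when = None
    if isinstance(entry_ms, (int, float)):
        # Assumption: entry_time is epoch milliseconds.
        when = datetime.fromtimestamp(entry_ms / 1000, tz=timezone.utc).isoformat()
    return {
        "rule": payload.get("rule_name_display", "unknown"),
        "severity": payload.get("severity", "unknown"),
        "host": payload.get("host_name"),
        "site_id": payload.get("site_id"),
        "time_utc": when,
        "message": payload.get("message", ""),
    }
```

A missing field yields `None` or `"unknown"` instead of an exception, so an unexpected payload shape degrades the summary rather than dropping the alert.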

Integration Targets

Target	Channel Type	Notes
Slack	Built-in	Use a Slack Incoming Webhook URL
Webex	Built-in	Use a Webex Incoming Webhook from `apphub.webex.com`
ServiceNow	Custom	Point to a Scripted REST API in ServiceNow
Splunk / SIEM	Custom	Point to an HEC (HTTP Event Collector) endpoint
Custom Runbook	Custom	Point to your own Flask/Express HTTP listener

Note: The webhook receives notifications, not raw events. SD-WAN Manager correlates the related events into alarms before publishing the notifications to the HTTP target listener. This means you get deduplicated, severity-classified notifications rather than a flood of low-level events.
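
For the Custom Runbook target, a listener can be very small. The following sketch uses only the Python standard library rather than Flask or Express. The escalation rule, the `WATCHED_RULES` set, and the port are illustrative assumptions, and the sketch omits TLS and authentication, which a production listener would need.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative choice: escalate only the SSE-specific alarm types.
WATCHED_RULES = {"Cisco Secure Access", "Tracker_State_Change"}


def should_escalate(payload: dict) -> bool:
    """True for watched alarm types at Critical or Major severity."""
    return (
        payload.get("rule_name_display") in WATCHED_RULES
        and payload.get("severity") in {"Critical", "Major"}
    )


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # reject bodies that are not a JSON object
            self.end_headers()
            return
        if should_escalate(payload):
            # Placeholder action: replace with ticketing or paging logic.
            print("ESCALATE", payload.get("host_name"), payload.get("message"))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block forever, handling webhook POSTs on the given port."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call `serve()` to start the listener. The decision logic lives in `should_escalate` so it can be tested without opening a socket.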

⚠ Warning: Do not use the real-time device monitoring APIs (/dataservice/device/...) for continuous polling. They are designed for interactive, per-device use. For ongoing monitoring, use webhooks (push) or the bulk statistics APIs (pull).

1.2 Secure Access: Alert Rules for Network Tunnel Groups

Secure Access provides a built-in Alert Rules feature that monitors the status of Network Tunnel Groups at regular intervals and sends notifications when alert conditions are met. Alert rules support both email and webhook delivery.

Configuration Path: Monitor > Management > Alert Rules > + Add Alert Rule

Available Alert Types for Network Tunnels

Alert Type	Trigger Condition	Recommended Severity
Network tunnel group disconnected	A Network Tunnel Group is disconnected. The alert payload reports this as `alertType` = `Network tunnel group disconnected`.	Critical
Hub down in network tunnel group	A Hub is down in a Network Tunnel Group	Warning

Setup Steps

Step 1. General Settings. Name the alert rule (e.g., "NTG Disconnected - All Sites") and set the severity.

Step 2. Alert Conditions. Configure which NTGs trigger the alert. Filter by NTG name (e.g., contains "LON") or region (e.g., "Europe"). Use "Require all conditions" or "Require any condition" logic.

Step 3. Notifications. Add email recipients (comma-separated list). When the alert fires, recipients receive an email with the alert name, severity, event time, type, conditions, and a direct link to view the alert in the Secure Access dashboard.

Step 4. Review and Save.

Webhook Delivery via Third-Party Integrations API

To send alert notifications to a webhook endpoint (for SIEM, Slack, or ServiceNow integration), configure a webhook as a Third-Party Integration using the Secure Access API:

Step A: Create the webhook integration

POST https://api.sse.cisco.com/admin/v2/integrations
Headers:
  Authorization: Bearer {oauth-token}
  Content-Type: application/json

Body:

{
  "name": "SASE Monitoring Webhook",
  "type": "webhook.v1",
  "webhookConfig": {
    "url": "https://your-listener.example.com/webhook",
    "headers": ["Content-Type: application/json"]
  }
}

Example curl request:

curl -X POST \
  "https://api.sse.cisco.com/admin/v2/integrations" \
  -H "Authorization: Bearer {oauth-token}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "SASE Monitoring Webhook",
    "type": "webhook.v1",
    "webhookConfig": {
      "url": "https://your-listener.example.com/webhook",
      "headers": ["Content-Type: application/json"]
    }
  }'

Example response (excerpt):

{
  "id": "3a9a27ea-d15e-4406-9b41-7b0a9ea1eac9",
  "name": "SASE Monitoring Webhook",
  "type": "webhook.v1",
  "webhookConfig": {
    "url": "https://your-listener.example.com/webhook",
    "headers": ["Content-Type: application/json"]
  },
  "createdAt": "2026-04-21T10:00:00Z"
}

💡 Tip: Capture the id value from the response — you will need it as {intId} in Step B to attach credentials to this integration.

Step B: Add Basic Auth credentials to the webhook

POST https://api.sse.cisco.com/admin/v2/integrations/{intId}/credentials
Headers:
  Authorization: Bearer {oauth-token}
  Content-Type: application/json

Body:

{
  "name": "webhook-credential",
  "type": "basic-auth",
  "value": {
    "username": "apiuser",
    "password": "{webhook-password}"
  }
}

Example curl request:

curl -X POST \
  "https://api.sse.cisco.com/admin/v2/integrations/{intId}/credentials" \
  -H "Authorization: Bearer {oauth-token}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "webhook-credential",
    "type": "basic-auth",
    "value": {
      "username": "apiuser",
      "password": "{webhook-password}"
    }
  }'

Example response (excerpt):

{
  "id": "cred-5c8d1a2b-...",
  "integrationId": "3a9a27ea-d15e-4406-9b41-7b0a9ea1eac9",
  "name": "webhook-credential",
  "type": "basic-auth"
}

Note: The response does not echo the credential value (password). After creation, credentials cannot be retrieved — they can only be replaced.
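
Steps A and B can be chained in a script, passing the `id` from Step A into Step B. The sketch below builds both requests with Python's standard `urllib.request`, using the endpoints and bodies shown above. The helper names are illustrative, and error handling and retries are left out.

```python
import json
import urllib.request

BASE = "https://api.sse.cisco.com/admin/v2"


def _post(url: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def build_integration_request(token: str, listener_url: str) -> urllib.request.Request:
    """Step A: create the webhook integration."""
    body = {
        "name": "SASE Monitoring Webhook",
        "type": "webhook.v1",
        "webhookConfig": {
            "url": listener_url,
            "headers": ["Content-Type: application/json"],
        },
    }
    return _post(f"{BASE}/integrations", token, body)


def build_credential_request(token: str, int_id: str, username: str, password: str) -> urllib.request.Request:
    """Step B: attach Basic Auth credentials to integration int_id."""
    body = {
        "name": "webhook-credential",
        "type": "basic-auth",
        "value": {"username": username, "password": password},
    }
    return _post(f"{BASE}/integrations/{int_id}/credentials", token, body)


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read())
```

A typical sequence is `created = send(build_integration_request(token, url))` followed by `send(build_credential_request(token, created["id"], user, password))`.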

1.3 End-to-End Proactive Architecture

For comprehensive proactive monitoring, configure webhooks on both platforms and route them to a shared listener or observability pipeline.

This architecture provides:

Real-time detection: No polling delay. SD-WAN alerts arrive within seconds; Secure Access alerts arrive after a dampening window (to suppress transient events).
Both sides of the story: If a tunnel flaps, you get a notification from SD-WAN (the device saw it go down) and from Secure Access (the cloud saw the peer disconnect), confirming the scope.
Reduced MTTR: Ops teams are notified before users report the issue, enabling proactive response.

💡 Tip: Combine this proactive alerting layer with the Section 3 correlation script. When a webhook fires, trigger the correlation script for the affected site to immediately generate a cross-platform timeline — giving your ops team both the alert and the forensic context in one shot.

Section 2: Troubleshooting Site Connectivity Use Case

When proactive alerts fire — or when an issue is reported through other channels — the following workflow provides a systematic, API-driven investigation across both platforms.

Scenario: Site Reports Intermittent Internet Access

A branch site (site name: LON, site-id: 3) reports that users experience intermittent loss of internet connectivity. The site was onboarded using the SASE Site Onboarding workflow with SIA (Secure Internet Access) tunnel profiles. The following workflow walks through a systematic API-driven investigation.

Phase 1: Triage — Identify the Tunnel State from Both Sides

The first step is to determine whether the tunnels are healthy from both the SD-WAN and Secure Access perspectives. A mismatch between the two views is itself a critical diagnostic signal.

Step 1.1: Query SD-WAN SSE Tunnel Status

Use the SD-WAN Manager SIG tunnel list API to get the current state of all SSE tunnels. The lastNHours parameter controls the time window.

GET https://{gateway_url}/dataservice/device/sig/getSigTunnelList?lastNHours=24
Headers:
  Authorization: Bearer {jwt-token}
  X-XSRF-TOKEN: {xsrf-token}
  Content-Type: application/json

Example curl request:

curl -X GET \
  "https://{gateway_url}/dataservice/device/sig/getSigTunnelList?lastNHours=24" \
  -H "Authorization: Bearer {jwt-token}" \
  -H "X-XSRF-TOKEN: {xsrf-token}" \
  -H "Content-Type: application/json"

Example response (excerpt):

{
  "data": [
    {
      "device-state": "Up",
      "tunnel-if-name": "Tunnel16000001",
      "vdevice-name": "10.255.1.10",
      "vdevice-host-name": "LON-01",
      "tracker-state": "Up",
      "site-id": "3",
      "site-name": "LON",
      "ha-pair": "active",
      "tunnel-type": "IPSEC",
      "tunnelType": "SSE-Public access",
      "destination-data-center": "13.53.178.16",
      "provider": "Cisco Secure Access",
      "vmanage-system-ip": "169.254.10.26",
      "tunnel-name": "C8K-12F26FD0-4AF0-61CD-...",
      "sig-state": "Up",
      "lastupdated": 1774477483191
    }
  ],
  "header": { }
}

Key fields to inspect in the response:

Field	Values	Meaning
`device-state`	Up / Down	Device-level tunnel interface status
`sig-state`	Up / Down	SSE tunnel operational state
`tracker-state`	Up / Down	L7 health-check (HTTP probe) result
`ha-pair`	active / backup	HA role for this tunnel
`destination-data-center`	IP address	SSE PoP the tunnel connects to
`site-id` / `site-name`	3 / LON	Site identification for filtering

💡 Tip: The response includes a large header block with UI column metadata. Your code should parse only the data[] array and discard the header object.

Diagnostic logic:

All three states are Up: Tunnel is healthy from the SD-WAN side. Proceed to check the SSE side.
sig-state is Up but tracker-state is Down: The IPsec tunnel is established but the L7 health probe is failing. When the tracker fails, the SD-WAN device automatically reroutes traffic to the default route in the service VPN. Depending on the network design, this means traffic takes an alternate path — typically DIA (Direct Internet Access) at the local router, or backhaul to a DC/Hub — and the security enforcement point shifts accordingly. This is a deliberate design choice rather than an outage: users retain internet access, but the traffic is no longer reaching Secure Access for enforcement. Confirm by checking Secure Access Activity Search (Step 3.1) — an affected site with a failing tracker will show zero activity from that site even though connectivity appears normal.
device-state is Down: The tunnel interface itself is down. Check the device health (Phase 4) and interface configuration.
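
The diagnostic logic above can be sketched as a small classifier over the `data[]` rows. This Python sketch uses the field names from the example response; the returned category labels and the function names are illustrative, and any state combination not covered by the three rules is reported as `unclassified` instead of being guessed.

```python
def classify_sdwan_tunnel(row: dict) -> str:
    """Map one SIG tunnel row to a triage category."""
    device = row.get("device-state")
    sig = row.get("sig-state")
    tracker = row.get("tracker-state")
    if device == "Down":
        return "interface-down"      # check device health and interface config
    if sig == "Up" and tracker == "Down":
        return "l7-probe-failing"    # tunnel up, traffic rerouted to default route
    if device == sig == tracker == "Up":
        return "healthy"             # proceed to the Secure Access side
    return "unclassified"


def triage_site(response: dict, site_id: str) -> list:
    """Classify every tunnel row for one site; the header block is ignored."""
    return [
        (row.get("tunnel-if-name"), row.get("ha-pair"), classify_sdwan_tunnel(row))
        for row in response.get("data", [])
        if row.get("site-id") == site_id
    ]
```

For the LON example, `triage_site(response, "3")` returns one tuple per tunnel row at site 3, for instance `("Tunnel16000001", "active", "healthy")`.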

Step 1.2: Query Secure Access NTG Tunnel Status

Now query the same tunnels from the Secure Access side. Use the includeStatuses=true parameter to get tunnel state information embedded in the NTG response.

GET https://api.sse.cisco.com/deployments/v2/networktunnelgroups?includeStatuses=true
Headers:
  Authorization: Bearer {oauth-token}
  Accept: application/json

Example curl request:

curl -X GET \
  "https://api.sse.cisco.com/deployments/v2/networktunnelgroups?includeStatuses=true" \
  -H "Authorization: Bearer {oauth-token}" \
  -H "Accept: application/json"

Example response (excerpt):

{
  "data": [
    {
      "id": 671740362,
      "organizationId": 1234567,
      "name": "C8K",
      "region": "eu-north-1",
      "deviceType": "Catalyst SDWAN",
      "status": "connected",
      "hubs": [
        {
          "id": 671740361,
          "isPrimary": true,
          "datacenter": { "name": "sse-eun-1-1-0" },
          "status": { "status": "UP", "time": "2026-04-23T22:19:38Z" },
          "tunnelsCount": 1,
          "tunnelsStatus": [
            {
              "status": "UP",
              "dcName": "Stockholm 1",
              "dcDesc": "STOCKHOLM-1",
              "ikeState": "ESTABLISHED",
              "ipsecState": "INSTALLED",
              "peerIp": "x.x.x.x",
              "localIp": "x.x.x.x",
              "data": { "bytesIn": "19860", "bytesOut": "21484",
                        "packetsIn": "186", "packetsOut": "216" }
            }
          ]
        },
        {
          "id": 671740363,
          "isPrimary": false,
          "datacenter": { "name": "sse-eun-1-1-1" },
          "status": { "status": "UP", "time": "2026-04-23T22:18:58Z" },
          "tunnelsCount": 1,
          "tunnelsStatus": [
            { "status": "UP", "dcName": "Stockholm 2",
              "localIp": "x.x.x.x", "peerIp": "x.x.x.x" }
          ]
        }
      ]
    }
  ],
  "offset": 0, "limit": 10, "total": 5
}

Key fields to inspect:

name — The NTG identifier — exact-match join key to SD-WAN tunnel-name
status — NTG-level state: connected or disconnected
hubs[].isPrimary — Identifies the primary (true) vs backup (false) hub — maps to SD-WAN ha-pair
hubs[].status.status — Hub-level state (UP / DOWN)
hubs[].tunnelsStatus[].status — Individual tunnel state (UP / DOWN)
hubs[].tunnelsStatus[].localIp — Hub IP — matches SD-WAN destination-data-center
hubs[].tunnelsStatus[].peerIp — SD-WAN device's WAN-side IP (seen from the SSE side)
hubs[].tunnelsStatus[].dcName / dcDesc — Human-readable datacenter name (e.g., Stockholm 1)
hubs[].tunnelsStatus[].ikeState / ipsecState — Crypto state indicators (ESTABLISHED / INSTALLED)

Note: The response may also include NTGs with tunnelsCount: 0 and tunnelsStatus: [] — these are typically pre-configured regional templates with no SD-WAN device currently connected. Filter these out when correlating unless you're specifically investigating provisioning gaps.
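
The nested NTG response is easier to correlate once flattened to one row per tunnel. The sketch below does that in Python and skips hubs with an empty `tunnelsStatus`, as suggested in the note above. The output key names are illustrative; the input field names are those shown in the example response.

```python
def flatten_ntg_tunnels(response: dict) -> list:
    """Return one dict per tunnel, dropping hubs that have no tunnels."""
    rows = []
    for ntg in response.get("data", []):
        for hub in ntg.get("hubs", []):
            for tunnel in hub.get("tunnelsStatus") or []:
                rows.append({
                    "ntg_name": ntg.get("name"),
                    "ntg_status": ntg.get("status"),
                    "is_primary": hub.get("isPrimary"),
                    "hub_status": (hub.get("status") or {}).get("status"),
                    "tunnel_status": tunnel.get("status"),
                    "local_ip": tunnel.get("localIp"),
                    "peer_ip": tunnel.get("peerIp"),
                    "dc_name": tunnel.get("dcName"),
                    # Crypto state may be absent on some tunnels (see the backup hub above).
                    "ike_state": tunnel.get("ikeState"),
                    "ipsec_state": tunnel.get("ipsecState"),
                })
    return rows
```

The response is paginated (`offset`, `limit`, `total`), so a fleet-wide script would need to page through all results before flattening.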

Step 1.3: Cross-Reference Both Views

The critical diagnostic value comes from comparing the two views. Use the Cross-Platform Field Reference (numbered 1.4, at the end of this phase) as the authoritative reference for field mappings: the primary join key is SD-WAN tunnel-name ↔ Secure Access NTG name (exact string match). At the hub level, SD-WAN destination-data-center matches Secure Access hubs[].tunnelsStatus[].localIp. The table below shows the most common state combinations and their interpretation:

SD-WAN Says (sig / tracker)	Secure Access Says (tunnelsStatus.status)	Diagnosis
Up / Up	UP	Healthy. Both sides report the tunnel up, and the SD-WAN L7 probe is passing.
Up / Down	UP	L7 probe failing. Traffic rerouted to default route per network design — security enforcement point shifts accordingly.
Up / Up	DOWN	IPsec / credential mismatch. Check NTG config and device crypto state.
Down / Down	DOWN	Full outage. Check device & WAN link.
Up (active) + Down (backup)	UP (one hub) + DOWN (other)	A failover occurred. Investigate why primary failed (Step 4.2 — device health).

Form a working hypothesis before proceeding. The table above gives you an initial diagnosis category — but it's a hypothesis, not a conclusion. Before moving to Phase 2, state explicitly what you believe the issue is and which evidence supports it. For example:

"SD-WAN reports tunnel Up with tracker Down; Secure Access reports the tunnel status UP on the primary hub. Working hypothesis: L7 probe is failing, device has rerouted traffic to the default route per network design, traffic is no longer reaching SSE. Phase 2 will confirm by (a) checking alarm history for Tracker_State_Change transitions (and, secondarily, Cisco Secure Access alarms on the device) and (b) checking Activity Search for zero site activity during the incident window."

Then proceed to Phase 2 with targeted queries against the specific evidence your hypothesis needs — don't run every Phase 2 step indiscriminately.
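
The state-combination table can be encoded as a lookup so that a script produces the same initial hypothesis an engineer would. This is a Python sketch of that table only: the function names are illustrative, the diagnosis strings paraphrase the table, and combinations the table does not list are returned as unmatched and left for manual investigation.

```python
DIAGNOSES = {
    # (sig-state, tracker-state, Secure Access tunnelsStatus.status)
    ("Up", "Up", "UP"): "healthy",
    ("Up", "Down", "UP"): "L7 probe failing; traffic rerouted to default route",
    ("Up", "Up", "DOWN"): "IPsec / credential mismatch; check NTG config and device crypto state",
    ("Down", "Down", "DOWN"): "full outage; check device and WAN link",
}


def diagnose(sig_state: str, tracker_state: str, sse_status: str) -> str:
    """Look up the hypothesis for one tunnel as seen from both platforms."""
    key = (sig_state, tracker_state, sse_status.upper())
    return DIAGNOSES.get(key, "no match in table; investigate manually")


def looks_like_failover(sdwan_sig_states: list, sse_statuses: list) -> bool:
    """Failover row: exactly one of the HA pair is up on each platform."""
    return (
        sorted(sdwan_sig_states) == ["Down", "Up"]
        and sorted(s.upper() for s in sse_statuses) == ["DOWN", "UP"]
    )
```

The result is a starting hypothesis to record and then test in Phase 2, not a conclusion.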

1.4 Cross-Platform Field Reference

The same tunnel is represented differently on each platform. Use this table as the authoritative reference when joining data across SD-WAN Manager and Secure Access. All mappings below have been validated against real API responses from both platforms.

Concept	SD-WAN Field	Secure Access Field	Notes
Tunnel identity (PRIMARY JOIN KEY)	`tunnel-name`	`name` (at NTG root)	Exact string match, e.g., `C8K-3B06D343-6B6B-...`
HA role	`ha-pair` ("active" / "backup")	`hubs[].isPrimary` (true / false)	Active maps to `isPrimary=true`
Hub / Datacenter IP (SECONDARY JOIN)	`destination-data-center`	`hubs[].tunnelsStatus[].localIp`	Exact IP match; narrows to specific primary/backup hub
SD-WAN device WAN IP	(implicit — device WAN IP)	`hubs[].tunnelsStatus[].peerIp`	Seen from the SSE side
Site	`site-id` / `site-name`	(no direct equivalent)	SSE has no site concept
Device	`vdevice-name` / `vdevice-host-name`	(inferable via NTG name)	Device identifier
Tunnel state (up)	`sig-state = "Up"`	`hubs[].tunnelsStatus[].status = "UP"`	Case differs
L7 probe state	`tracker-state`	(no equivalent)	SD-WAN-only concept
Datacenter display name	(only IP is returned)	`hubs[].tunnelsStatus[].dcName` / `dcDesc`	e.g., "Stockholm 1" / "STOCKHOLM-1"
IKE / IPsec state	(derived from `sig-state`)	`tunnelsStatus[].ikeState` / `ipsecState`	"ESTABLISHED" / "INSTALLED"
Traffic counters	`device-packets-in` / `device-packets-out`	`tunnelsStatus[].data.bytesIn` / `bytesOut` / `packetsIn` / `packetsOut`	Different units on each side

Practical join pattern: one SD-WAN device typically produces two rows in the SIG tunnel list (one active, one backup), both sharing the same tunnel-name. On the SSE side, that maps to one NTG with two hubs (primary + backup). Join on tunnel-name to identify the device, then join each SD-WAN row to a specific SSE hub using destination-data-center ↔ hubs[].tunnelsStatus[].localIp.

💡 Tip: This mapping is referenced throughout Section 2 (Troubleshooting) and Section 3 (Event Correlation). The correlation script in Section 3 implements these exact join keys; if you need to build your own automation, use this table as the source of truth for cross-platform field equivalence.
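
As an illustration of the join pattern (not the Section 3 script itself), the following Python sketch joins SD-WAN SIG tunnel rows to Secure Access hubs using the two keys from the table. The function names and the output key names are illustrative.

```python
def _find_hub_tunnel(ntg: dict, hub_ip: str):
    """Secondary join: locate the hub tunnel whose localIp equals hub_ip."""
    for hub in ntg.get("hubs", []):
        for tunnel in hub.get("tunnelsStatus") or []:
            if tunnel.get("localIp") == hub_ip:
                return hub, tunnel
    return None, None


def join_views(sdwan_rows: list, ntgs: list) -> list:
    """Join each SD-WAN tunnel row to its NTG and, where possible, its hub."""
    ntg_by_name = {ntg.get("name"): ntg for ntg in ntgs}
    joined = []
    for row in sdwan_rows:
        # Primary join: tunnel-name equals NTG name (exact string match).
        ntg = ntg_by_name.get(row.get("tunnel-name"))
        hub, tunnel = (None, None)
        if ntg is not None:
            hub, tunnel = _find_hub_tunnel(ntg, row.get("destination-data-center"))
        joined.append({
            "tunnel_name": row.get("tunnel-name"),
            "ha_pair": row.get("ha-pair"),
            "sig_state": row.get("sig-state"),
            "tracker_state": row.get("tracker-state"),
            "ntg_found": ntg is not None,
            "is_primary": hub.get("isPrimary") if hub else None,
            "sse_status": tunnel.get("status") if tunnel else None,
            "dc_name": tunnel.get("dcName") if tunnel else None,
        })
    return joined
```

Rows with `ntg_found` false, or with `sse_status` of `None`, are themselves diagnostic: the SD-WAN side knows about a tunnel that Secure Access does not report.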

Phase 2: Investigate — What Happened and When?

Once you have identified a state mismatch or degradation, the next step is to reconstruct what happened by querying event and alarm streams from both platforms.

Step 2.1: Query SD-WAN Alarms for the Site

Use the POST-based alarm query API to retrieve alarms filtered to the affected site. Filter by site_id and a time window that covers the reported issue.

POST https://{gateway_url}/dataservice/alarms

Body:

{
  "query": {
    "condition": "AND",
    "rules": [
      {"value": ["24"], "field": "entry_time",
       "type": "date", "operator": "last_n_hours"},
      {"value": ["3"], "field": "site_id",
       "type": "string", "operator": "in"}
    ]
  },
  "size": 100
}

Example curl request:

curl -X POST \
  "https://{gateway_url}/dataservice/alarms" \
  -H "Authorization: Bearer {jwt-token}" \
  -H "X-XSRF-TOKEN: {xsrf-token}" \
  -H "Content-Type: application/json" \
  -d '{"query":{"condition":"AND","rules":[
    {"value":["24"],"field":"entry_time","type":"date","operator":"last_n_hours"},
    {"value":["3"],"field":"site_id","type":"string","operator":"in"}
  ]},"size":100}'

Example response (excerpt):

{
  "data": [
    {
      "rule_name_display": "Cisco Secure Access",
      "severity": "Critical",
      "active": true,
      "entry_time": 1774981376892,
      "message": "SSE tunnel Tunnel16000001 is down on device LON-01",
      "host_name": "LON-01",
      "system_ip": "169.254.10.26",
      "site_id": "3",
      "devices": [{ "system-ip": "169.254.10.26" }],
      "values": [{ "tunnel-name": "C8K-...", "tracker-state": "Down" }]
    }
  ],
  "header": { }
}

SSE-related alarm types to look for: Cisco Secure Access, Tracker_State_Change, BFD_Site_Down, BFD_Site_Up, Interface_State_Change, Control_Node_Down.
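
The query body follows a fixed shape (a time rule plus one field filter), so it can be generated rather than hand-written. This Python sketch builds that body and then narrows a response to the SSE-related alarm types listed above. The helper names are illustrative, and sorting by `entry_time` assumes the field is numeric, as in the example response.

```python
SSE_ALARMS = {
    "Cisco Secure Access", "Tracker_State_Change", "BFD_Site_Down",
    "BFD_Site_Up", "Interface_State_Change", "Control_Node_Down",
}


def build_query(hours: int, field: str, values: list, size: int = 100) -> dict:
    """Build a last-N-hours query filtered on one string field."""
    return {
        "query": {
            "condition": "AND",
            "rules": [
                {"value": [str(hours)], "field": "entry_time",
                 "type": "date", "operator": "last_n_hours"},
                {"value": [str(v) for v in values], "field": field,
                 "type": "string", "operator": "in"},
            ],
        },
        "size": size,
    }


def sse_alarms(response: dict) -> list:
    """Keep only SSE-related alarms, oldest first."""
    hits = [a for a in response.get("data", [])
            if a.get("rule_name_display") in SSE_ALARMS]
    return sorted(hits, key=lambda a: a.get("entry_time", 0))
```

`build_query(24, "site_id", ["3"])` reproduces the request body shown in this step.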

Step 2.2: Query SD-WAN Events for the Device

Events provide a lower-level view than alarms. Filter by device IP (system-ip) for the specific WAN edge router at the affected site.

POST https://{gateway_url}/dataservice/event

Body:

{
  "query": {
    "condition": "AND",
    "rules": [
      {"value": ["24"], "field": "entry_time",
       "type": "date", "operator": "last_n_hours"},
      {"value": ["169.254.10.26"], "field": "system_ip",
       "type": "string", "operator": "in"}
    ]
  },
  "size": 500
}

Example curl request:

curl -X POST \
  "https://{gateway_url}/dataservice/event" \
  -H "Authorization: Bearer {jwt-token}" \
  -H "X-XSRF-TOKEN: {xsrf-token}" \
  -H "Content-Type: application/json" \
  -d '{"query":{"condition":"AND","rules":[
    {"value":["24"],"field":"entry_time","type":"date","operator":"last_n_hours"},
    {"value":["169.254.10.26"],"field":"system_ip","type":"string","operator":"in"}
  ]},"size":500}'

Example response (excerpt):

{
  "data": [
    {
      "system_ip": "169.254.10.26",
      "vmanage_system_ip": "169.254.10.26",
      "tenant": "default",
      "device_type": "vedge",
      "entry_time": 1666259117528,
      "statcycletime": 1666259117528,
      "eventname": "bfd-state-change",
      "component": "BFD",
      "severity_level": "major",
      "host_name": "LON-01",
      "event": {
        "bfd-state-change": {
          "src-ip": "10.0.5.11",
          "dst-ip": "10.1.16.16",
          "local-system-ip": "169.254.10.26",
          "local-color": "lte",
          "remote-system-ip": "172.16.255.16",
          "remote-color": "lte",
          "new-state": "down",
          "proto": "ipsec",
          "flap-reason": "na"
        }
      },
      "details": "host-name=LON-01; src-ip=10.0.5.11; dst-ip=10.1.16.16; ...",
      "id": "xh_I9IMBLPMz2to0aA3r"
    }
  ],
  "pageInfo": {
    "startTime": "1589073783045",
    "endTime": "1589072535795",
    "count": 15
  }
}

Note on events vs. alarms: /dataservice/event returns lower-level raw events with names like bfd-state-change, memory-usage, interface-state-change. /dataservice/alarms (Step 2.1) returns correlated, severity-classified alarms with rule names like BFD_Node_Down, Cisco Secure Access, and Tracker_State_Change. When troubleshooting, start with alarms for actionable signal and drill into events only when you need the lower-level device-generated trace.

💡 Tip: The event.{eventname} sub-object contains the structured event payload with fields specific to that event type. The details field is a human-readable serialization of the same data.
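
Since the structured payload sits under a key named after the event, a generic accessor avoids hard-coding one branch per event type. The Python sketch below extracts that sub-object and lists BFD state transitions in time order. The function names are illustrative, and the BFD field names are taken from the example response above.

```python
def event_payload(event_row: dict) -> dict:
    """Return the structured sub-object stored under event[eventname]."""
    name = event_row.get("eventname")
    return (event_row.get("event") or {}).get(name, {})


def bfd_transitions(response: dict) -> list:
    """List (entry_time, host, local color, remote system IP, new state), oldest first."""
    transitions = []
    for row in response.get("data", []):
        if row.get("eventname") != "bfd-state-change":
            continue
        payload = event_payload(row)
        transitions.append((
            row.get("entry_time"),
            row.get("host_name"),
            payload.get("local-color"),
            payload.get("remote-system-ip"),
            payload.get("new-state"),
        ))
    return sorted(transitions, key=lambda t: t[0] or 0)
```

Reading from the structured sub-object is generally more robust than parsing the `details` string, whose layout is meant for humans.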

Step 2.3: Query Secure Access Network Tunnel Logs

Retrieve tunnel establishment and teardown events from the Secure Access Reporting API. These logs capture when tunnels transitioned between states. (For background on this endpoint, see the Onboarding Cookbook — Events for Network Tunnels.)

GET https://api.sse.cisco.com/reports/v2/networkTunnelLogs?from=-1days&to=now&limit=100
Headers:
  Authorization: Bearer {oauth-token}
  Accept: application/json

Example curl request:

curl -X GET \
  "https://api.sse.cisco.com/reports/v2/networkTunnelLogs?from=-1days&to=now&limit=100" \
  -H "Authorization: Bearer {oauth-token}" \
  -H "Accept: application/json"

Example response (excerpt):

{
  "data": [
    {
      "timestamp": 1774981376,
      "serviceName": "IKE",
      "level": "WARNING",
      "message": "Alert: IKE message sent retransmission - peer not responding",
      "networkTunnelGroupLabel": "C8K-8C05B196-508F-D706-C68E-8CF918AEC1FD",
      "networkTunnelGroupId": 671740929
    }
  ]
}

Note: The time range set by the to and from query parameters cannot exceed 30 days. This is a hard limit across the Secure Access Reporting API endpoints (/reports/v2/networkTunnelLogs, /reports/v2/activity, and related endpoints). For longer-term data retention, enable logging to Amazon S3 via the Secure Access logging configuration.
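
When an investigation spans more than 30 days, one option is to split the range into consecutive windows and issue one request per window. The Python sketch below computes the windows only. It assumes `from` and `to` also accept absolute epoch-millisecond timestamps (the examples here use only relative values such as `-1days`), and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

MAX_WINDOW = timedelta(days=30)  # hard limit stated for the Reporting API


def split_range(start: datetime, end: datetime, max_window: timedelta = MAX_WINDOW) -> list:
    """Split [start, end) into consecutive windows no longer than max_window."""
    if end <= start:
        raise ValueError("end must be after start")
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + max_window, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int(moment.timestamp() * 1000)
```

Adjacent windows share a boundary instant, so a record stamped exactly on a boundary could appear twice; deduplicate if that matters. Splitting the range does not extend retention: data older than the platform retains is still unavailable through the API.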

Phase 3: Validate — Is Traffic Actually Flowing?

A tunnel may show as up on both sides while traffic is still not flowing through it correctly. Use the following APIs to verify that traffic is actually reaching Secure Access and that path quality is acceptable.

Step 3.1: Query Secure Access Activity Search

Check whether Secure Access is processing traffic from the affected site by querying the Activity Search endpoint. If the tunnel is up but no activity appears, the issue is likely in the SD-WAN traffic policy (Application Priority & SLA configuration). (The Activity Search endpoint is introduced in the Onboarding Cookbook for post-deployment verification; here we use it to validate traffic flow during an active investigation.)

GET https://api.sse.cisco.com/reports/v2/activity?from=-60minutes&to=now&limit=50
Headers:
  Authorization: Bearer {oauth-token}
  Accept: application/json

Example curl request:

curl -X GET \
  "https://api.sse.cisco.com/reports/v2/activity?from=-60minutes&to=now&limit=50" \
  -H "Authorization: Bearer {oauth-token}" \
  -H "Accept: application/json"

Example response (excerpt):

{
  "data": [
    {
      "timestamp": 1774981400000,
      "identities": [
        {
          "id": 12345,
          "label": "LON-NTG",
          "type": {
            "id": 68,
            "label": "Network Tunnel",
            "type": "networkTunnel"
          }
        }
      ],
      "verdict": "allowed",
      "type": "dns",
      "internalip": "10.1.1.42",
      "externalip": "203.0.113.88",
      "domain": "example.com"
    }
  ]
}

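Look for entries whose identities[] array contains a Network Tunnel identity (type.type = networkTunnel) for the NTG that serves the site. Secure Access has no site concept of its own, so the NTG identity is what ties an activity record back to the site.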
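
The identity check can be sketched as a filter over the `data[]` array. This Python sketch keeps entries that carry a Network Tunnel identity with a given label and then counts them by type and verdict. How the identity `label` relates to the NTG name in your organization is an assumption to verify (the example above shows `LON-NTG`); the function names are illustrative.

```python
from collections import Counter


def activity_for_tunnel(response: dict, ntg_label: str) -> list:
    """Keep entries with a networkTunnel identity whose label matches."""
    hits = []
    for entry in response.get("data", []):
        for identity in entry.get("identities", []):
            kind = (identity.get("type") or {}).get("type")
            if kind == "networkTunnel" and identity.get("label") == ntg_label:
                hits.append(entry)
                break  # one matching identity is enough for this entry
    return hits


def summarize(entries: list) -> Counter:
    """Count entries by (traffic type, verdict)."""
    return Counter((e.get("type"), e.get("verdict")) for e in entries)
```

An empty result for a site whose tunnel is up on both platforms supports the hypothesis that traffic is not being steered into the tunnel.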

Step 3.2: Query SD-WAN App-Route Statistics

Retrieve loss, latency, jitter, and vQoE score metrics for the tunnel interfaces to determine whether the underlay path quality is contributing to the issue. This endpoint uses an aggregation query — specify the fields to aggregate by (typically name for per-tunnel results) and the metrics to compute (averaged loss, latency, jitter, vQoE score).

POST https://{gateway_url}/dataservice/statistics/approute/aggregation
Headers:
  Authorization: Bearer {jwt-token}
  X-XSRF-TOKEN: {xsrf-token}
  Content-Type: application/json

Body:

{
  "query": {
    "condition": "AND",
    "rules": [
      {"value": ["1"], "field": "entry_time",
       "type": "date", "operator": "last_n_hours"},
      {"value": ["169.254.10.26"], "field": "local_system_ip",
       "type": "string", "operator": "in"}
    ]
  },
  "aggregation": {
    "field": [
      {"property": "name", "sequence": 1, "size": 6000}
    ],
    "metrics": [
      {"property": "loss_percentage", "type": "avg"},
      {"property": "latency", "type": "avg"},
      {"property": "jitter", "type": "avg"},
      {"property": "vqoe_score", "type": "avg"}
    ]
  }
}

Example curl request:

curl -X POST \
  "https://{gateway_url}/dataservice/statistics/approute/aggregation" \
  -H "Authorization: Bearer {jwt-token}" \
  -H "X-XSRF-TOKEN: {xsrf-token}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "condition": "AND",
      "rules": [
        {"value":["1"],"field":"entry_time","type":"date","operator":"last_n_hours"},
        {"value":["169.254.10.26"],"field":"local_system_ip","type":"string","operator":"in"}
      ]
    },
    "aggregation": {
      "field": [{"property":"name","sequence":1,"size":6000}],
      "metrics": [
        {"property":"loss_percentage","type":"avg"},
        {"property":"latency","type":"avg"},
        {"property":"jitter","type":"avg"},
        {"property":"vqoe_score","type":"avg"}
      ]
    }
  }'

Example response (excerpt):

{
  "data": [
    {
      "name": "169.254.10.26:biz-internet-13.53.178.16:sig",
      "vqoe_score": 10,
      "latency": 19.3356,
      "loss_percentage": 0,
      "jitter": 9.02238
    },
    {
      "name": "169.254.10.26:biz-internet-13.51.243.70:sig",
      "vqoe_score": 9,
      "latency": 24.12,
      "loss_percentage": 0.2,
      "jitter": 11.45
    }
  ],
  "header": { }
}
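To turn this response into a pass/fail signal, each tunnel's averages can be compared against limits. The sketch below assumes the response shape shown above; the threshold values are placeholders chosen for illustration, not Cisco-published limits, and should be replaced with the SLA values from your own Application Priority & SLA policy.

```python
def flag_degraded(response, max_loss=1.0, max_latency=100.0,
                  max_jitter=30.0, min_vqoe=8.0):
    """Return {tunnel name: [reasons]} for tunnels that breach any limit.

    All default limits are hypothetical placeholders.
    """
    flagged = {}
    for row in response.get("data", []):
        reasons = []
        loss = row.get("loss_percentage")
        latency = row.get("latency")
        jitter = row.get("jitter")
        vqoe = row.get("vqoe_score")
        if loss is not None and loss > max_loss:
            reasons.append(f"loss {loss}% > {max_loss}%")
        if latency is not None and latency > max_latency:
            reasons.append(f"latency {latency} > {max_latency}")
        if jitter is not None and jitter > max_jitter:
            reasons.append(f"jitter {jitter} > {max_jitter}")
        if vqoe is not None and vqoe < min_vqoe:
            reasons.append(f"vQoE {vqoe} < {min_vqoe}")
        if reasons:
            flagged[row.get("name")] = reasons
    return flagged

# Data copied from the example response above.
sample = {
    "data": [
        {"name": "169.254.10.26:biz-internet-13.53.178.16:sig",
         "vqoe_score": 10, "latency": 19.3356, "loss_percentage": 0, "jitter": 9.02238},
        {"name": "169.254.10.26:biz-internet-13.51.243.70:sig",
         "vqoe_score": 9, "latency": 24.12, "loss_percentage": 0.2, "jitter": 11.45},
    ]
}

for name, reasons in flag_degraded(sample, max_loss=0.1).items():
    print(name, "->", "; ".join(reasons))
```

With the placeholder defaults neither tunnel is flagged; tightening max_loss to 0.1 flags only the second tunnel, which shows how the limits drive the result.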

Phase 4: Root Cause — What Changed?

Many tunnel issues are caused by recent configuration changes or device resource exhaustion. These APIs help identify the root cause.

Step 4.1: Query SD-WAN Audit Log

Check whether a configuration change was pushed shortly before the issue started. Audit logs capture template pushes, policy deployments, and other administrative actions.

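POST https://{gateway_url}/dataservice/auditlog
Headers:
  Authorization: Bearer {jwt-token}
  X-XSRF-TOKEN: {xsrf-token}
  Content-Type: application/json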

Body:

{
  "query": {
    "condition": "AND",
    "rules": [
      {"value": ["24"], "field": "entry_time",
       "type": "date", "operator": "last_n_hours"}
    ]
  },
  "size": 100
}

Example curl request:

curl -X POST \
  "https://{gateway_url}/dataservice/auditlog" \
  -H "Authorization: Bearer {jwt-token}" \
  -H "X-XSRF-TOKEN: {xsrf-token}" \
  -H "Content-Type: application/json" \
  -d '{"query":{"condition":"AND","rules":[
    {"value":["24"],"field":"entry_time","type":"date","operator":"last_n_hours"}
  ]},"size":100}'

Example response (excerpt):

{
  "data": [
    {
      "entry_time": 1659003590292,
      "statcycletime": 1659003590292,
      "tenant": "default",
      "logid": "918c6d64-729c-4304-b2c7-955a7c0e6a61",
      "logmodule": "user",
      "logfeature": "user",
      "logdeviceid": "172.16.255.22",
      "loguser": "viptela-device-901d032c-0721-438e-82ba-1d429598d879",
      "logusersrcip": "172.16.255.19",
      "logmessage": "Invalidated session due to server session idle time out for User: viptela-device-901d032c-0721-438e-82ba-1d429598d879",
      "auditdetails": [
        "[22-Aug-2022 10:19:50 UTC] Session invalidated due to idle timeout"
      ],
      "id": "iWJRRIIBMbLIyWbe1qH"
    }
  ],
  "header": {
    "generatedOn": 1655622555494,
    "columns": [ ],
    "fields": [ ]
  }
}

Key fields:

entry_time — epoch ms timestamp for temporal correlation with the incident window
logmodule / logfeature — categorize the action (e.g., user, template, policy, device)
loguser — user identity that initiated the action
logusersrcip — source IP the action was initiated from
logmessage — primary human-readable description of the action
auditdetails — array of timestamped detail lines for deeper forensic review

When investigating a tunnel issue, look for entries whose entry_time falls within or just before the incident window, and filter logmodule / logfeature for relevant categories (template deployments, policy changes, device modifications). The logmessage field typically contains enough context to understand the action; auditdetails provides the audit trail if deeper investigation is needed.
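The filtering described above can be sketched as follows. This assumes the audit log response shape shown in the excerpt; the function name, the one-hour lookback, and the sample entries are illustrative choices rather than anything defined by the SD-WAN Manager API.

```python
from datetime import datetime, timedelta, timezone

def to_ms(dt):
    """Convert an aware datetime to epoch milliseconds, the unit used by entry_time."""
    return int(dt.timestamp() * 1000)

def entries_near_incident(response, start, end,
                          lookback=timedelta(hours=1), categories=None):
    """Return audit entries from (start - lookback) to end, oldest first.

    categories: optional set matched against logmodule or logfeature.
    """
    lo, hi = to_ms(start - lookback), to_ms(end)
    selected = []
    for entry in response.get("data", []):
        ts = entry.get("entry_time")
        if ts is None or not (lo <= ts <= hi):
            continue
        if categories and not ({entry.get("logmodule"), entry.get("logfeature")} & categories):
            continue
        selected.append(entry)
    return sorted(selected, key=lambda e: e["entry_time"])

# Hypothetical incident window and sample entries (illustrative data only).
incident_start = datetime(2026, 4, 1, 10, 30, tzinfo=timezone.utc)
incident_end = datetime(2026, 4, 1, 11, 0, tzinfo=timezone.utc)
sample = {"data": [
    {"entry_time": to_ms(incident_start - timedelta(minutes=10)),
     "logmodule": "template", "logfeature": "template",
     "loguser": "admin", "logmessage": "Template push (sample)"},
    {"entry_time": to_ms(incident_start - timedelta(hours=5)),
     "logmodule": "policy", "logfeature": "policy",
     "loguser": "admin", "logmessage": "Policy change (sample)"},
    {"entry_time": to_ms(incident_start + timedelta(minutes=5)),
     "logmodule": "user", "logfeature": "user",
     "loguser": "admin", "logmessage": "Session timeout (sample)"},
]}

suspects = entries_near_incident(sample, incident_start, incident_end,
                                 categories={"template", "policy", "device"})
for e in suspects:
    when = datetime.fromtimestamp(e["entry_time"] / 1000, tz=timezone.utc)
    print(when.isoformat(), e["logmodule"], e["loguser"], e["logmessage"])
```

Widening the lookback catches changes whose effect was delayed; narrowing the categories keeps session and login noise out of the result.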

Step 4.2: Check Device Health

High CPU or memory usage on the WAN edge device can cause tunnel instability. Query the device system status to check resource utilization.

GET https://{gateway_url}/dataservice/device/system/status?deviceId=169.254.10.26
Headers:
  Authorization: Bearer {jwt-token}
  X-XSRF-TOKEN: {xsrf-token}

Example curl request:

curl -X GET \
  "https://{gateway_url}/dataservice/device/system/status?deviceId=169.254.10.26" \
  -H "Authorization: Bearer {jwt-token}" \
  -H "X-XSRF-TOKEN: {xsrf-token}"

Example response (excerpt):

{
  "data": [
    {
      "vdevice-name": "169.254.10.26",
      "vdevice-host-name": "LON-01",
      "cpu_user": 62.4,
      "cpu_system": 8.1,
      "mem_used": 3942000,
      "mem_total": 8388608,
      "disk_avail": 62.0,
      "uptime": "12d 4h 18m"
    }
  ]
}

Key fields: cpu_user, mem_used, mem_total, disk_avail.

SD-WAN Manager publishes WAN Edge health thresholds that classify devices as Good / Fair / Poor based on CPU and memory utilization:

Health	Cisco WAN Edge (IOS-XE)	Cisco vEdge
Good	CPU < 80%, Memory < 88%	CPU < 75%, Memory < 75%
Fair	CPU ≥ 80%, Memory ≥ 88%	CPU ≥ 75%, Memory ≥ 75%
Poor	CPU ≥ 90%, Memory ≥ 93%	CPU ≥ 90%, Memory ≥ 90%

Use these as a starting point for programmatic health checks. A device in the "Fair" range warrants investigation as a contributing factor to tunnel instability; "Poor" is a likely direct cause. For continuous monitoring, configure the corresponding resource-exhaustion alarm rules via Alarm Notifications (Section 1.1) to push threshold breaches to your webhook listener.
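As a starting point, the thresholds in the table can be applied to the system status response like this. The sketch rests on several assumptions that are not stated in the table: CPU utilization is taken as cpu_user + cpu_system, memory utilization as mem_used / mem_total, and the worse of the two metrics determines the overall class. The function and dictionary names are illustrative.

```python
# (fair_threshold, poor_threshold) in percent, copied from the table above.
THRESHOLDS = {
    "ios-xe": {"cpu": (80.0, 90.0), "mem": (88.0, 93.0)},
    "vedge":  {"cpu": (75.0, 90.0), "mem": (75.0, 90.0)},
}
LEVELS = ["Good", "Fair", "Poor"]

def _level(value, fair, poor):
    """Map a utilization percentage to 0 (Good), 1 (Fair), or 2 (Poor)."""
    if value >= poor:
        return 2
    if value >= fair:
        return 1
    return 0

def classify(status, platform="ios-xe"):
    """Return (health, cpu_pct, mem_pct) for one device status record.

    Assumption: the worse of CPU and memory decides the overall health.
    """
    limits = THRESHOLDS[platform]
    # float() tolerates numeric values delivered as strings.
    cpu_pct = float(status["cpu_user"]) + float(status["cpu_system"])
    mem_pct = 100.0 * float(status["mem_used"]) / float(status["mem_total"])
    worst = max(_level(cpu_pct, *limits["cpu"]), _level(mem_pct, *limits["mem"]))
    return LEVELS[worst], round(cpu_pct, 1), round(mem_pct, 1)

# Values copied from the example response above.
sample = {"vdevice-host-name": "LON-01", "cpu_user": 62.4, "cpu_system": 8.1,
          "mem_used": 3942000, "mem_total": 8388608}

health, cpu, mem = classify(sample, platform="ios-xe")
print(f"{sample['vdevice-host-name']}: {health} (CPU {cpu}%, memory {mem}%)")
```

If your environment treats CPU and memory independently, return the two levels separately instead of taking the maximum.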

Section 3: Cross-Platform Event Correlation

Why Event Correlation Matters

The SD-WAN Manager and Secure Access dashboards each show tunnel events from their own platform's perspective.

Building a unified cross-platform timeline therefore means combining events from both the SD-WAN Manager and Secure Access APIs. This section demonstrates that with a working Python script.

Correlation Architecture

Join strategy:

Primary join key: SD-WAN tunnel-name ↔ Secure Access NTG name (exact match)
Hub-level join: SD-WAN destination-data-center ↔ SSE hubs[].tunnelsStatus[].localIp
HA role join: SD-WAN ha-pair (active/backup) ↔ SSE hubs[].isPrimary (true/false)

See the Cross-Platform Field Reference (1.4) in Section 2 for the complete field mapping. The correlation script uses these mappings to build a per-tunnel unified view rather than merging events into a flat timeline.
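The join strategy above can be sketched as follows. This is not the companion script. Only the field names (tunnel-name, destination-data-center, ha-pair, hubs[].isPrimary, hubs[].tunnelsStatus[].localIp) come from the join strategy; the record shapes, the NTG "name" key, the sample values, and the function name are assumptions made for illustration.

```python
def correlate(sdwan_tunnels, ntgs):
    """Build a per-tunnel view: {tunnel name: [one row per HA role]}."""
    ntg_by_name = {ntg["name"]: ntg for ntg in ntgs}
    report = {}
    for tunnel in sdwan_tunnels:
        name = tunnel["tunnel-name"]
        ntg = ntg_by_name.get(name)  # primary join: exact name match
        row = {
            "role": tunnel.get("ha-pair"),
            "sdwan_dc": tunnel.get("destination-data-center"),
            "sse_hub": None,     # stays None when no NTG or hub matches
            "dc_match": False,
        }
        if ntg is not None:
            # HA role join: active <-> isPrimary true, backup <-> false.
            want_primary = tunnel.get("ha-pair") == "active"
            for hub in ntg.get("hubs", []):
                if hub.get("isPrimary") == want_primary:
                    ips = {s.get("localIp") for s in hub.get("tunnelsStatus", [])}
                    row["sse_hub"] = hub
                    # Hub-level join: data center IP seen on both sides.
                    row["dc_match"] = row["sdwan_dc"] in ips
                    break
        report.setdefault(name, []).append(row)
    return report

# Hypothetical records using documentation-range IP addresses.
sdwan = [
    {"tunnel-name": "SITE1-NTG", "ha-pair": "active", "destination-data-center": "192.0.2.10"},
    {"tunnel-name": "SITE1-NTG", "ha-pair": "backup", "destination-data-center": "192.0.2.20"},
    {"tunnel-name": "SITE9-NTG", "ha-pair": "active", "destination-data-center": "192.0.2.30"},
]
ntgs = [{"name": "SITE1-NTG", "hubs": [
    {"isPrimary": True, "tunnelsStatus": [{"localIp": "192.0.2.10"}]},
    {"isPrimary": False, "tunnelsStatus": [{"localIp": "192.0.2.20"}]},
]}]

report = correlate(sdwan, ntgs)
for name, rows in report.items():
    for r in rows:
        side = "matched" if r["sse_hub"] else "no Secure Access match"
        print(name, r["role"], r["sdwan_dc"], side, "dc_match=" + str(r["dc_match"]))
```

Tunnels with no Secure Access match (SITE9-NTG in the sample) are kept in the report rather than dropped, since a missing counterpart is itself a finding worth surfacing.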

Sample Code: Correlated Event Timeline

A working Python implementation is available as a companion file. It authenticates to both platforms, retrieves events within a configurable time window, joins them by tunnel using the field mappings described above, and prints a per-tunnel correlated report.

Note: This script is provided as a reference implementation. Adapt the authentication, error handling, and output format to your environment.

Example Output

When run against a lab with three onboarded sites, the per-tunnel correlated report produces output like this:

================================================================================
  SASE CORRELATED TUNNEL REPORT
  Site: 3 | Window: last 24h | Tunnels: 3
================================================================================

--------------------------------------------------------------------------------
  TUNNEL: C8K-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX1
  Device: R-SITE1 (10.0.0.1)   Site: 1 (SITE1)
  Status: [HEALTHY]
  Diagnosis: Both sides report the tunnel Up and passing L7 probes.

  ROLE     SD-WAN                          Secure Access
  ACTIVE   sig=Up tracker=Up dc=<DC-IP-A>  UP <Region> DC 1 (<DC-IP-A>)
  BACKUP   sig=Up tracker=Up dc=<DC-IP-B>  UP <Region> DC 2 (<DC-IP-B>)

  Events in window: (none)

--------------------------------------------------------------------------------
  TUNNEL: C8K-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX2
  Device: R-SITE2 (10.0.0.5)   Site: 5 (SITE2)
  Status: [HEALTHY]
  Diagnosis: Both sides report the tunnel Up and passing L7 probes.

  ROLE     SD-WAN                          Secure Access
  ACTIVE   sig=Up tracker=Up dc=<DC-IP-C>  UP <Region> DC 3 (<DC-IP-C>)
  BACKUP   sig=Up tracker=Up dc=<DC-IP-D>  UP <Region> DC 4 (<DC-IP-D>)

  Events in window (1):
    09:18:13 UTC  [SSE]  WARNING IKE       Alert: IKE message sent retransm

--------------------------------------------------------------------------------
  TUNNEL: C8K-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX3
  Device: R-SITE3 (10.0.0.2)   Site: 2 (SITE3)
  Status: [HEALTHY]
  Diagnosis: Both sides report the tunnel Up and passing L7 probes.

  ROLE     SD-WAN                          Secure Access
  ACTIVE   sig=Up tracker=Up dc=<DC-IP-A>  UP <Region> DC 1 (<DC-IP-A>)
  BACKUP   sig=Up tracker=Up dc=<DC-IP-B>  UP <Region> DC 2 (<DC-IP-B>)

  Events in window: (none)
================================================================================

Note how the report groups by tunnel rather than producing a flat timeline. For each tunnel, both platforms' views are presented side by side, followed by a status label and diagnosis derived from the cross-reference logic in Step 1.3 (Section 2, Phase 1). Events within the query window are attached to the tunnels they reference, so the ops engineer sees not just what happened, but which tunnel was affected and what the current state of both sides looks like.
