SP Oncall is an experimental AI-driven network investigation system for Service Provider (SP) networks. It automates network diagnostics and troubleshooting by analyzing device state, identifying issues, and generating detailed root-cause reports. I'm mostly using it to learn about and demo AI solutions for networking.
Think of SP Oncall as a team of specialized AI agents that work together to investigate network problems:
The graph is a linear orchestration pipeline: input_validator_node → context_investigation → primary_investigation → rca_assessor_node → report_generator. Each investigation phase is a sub-graph that handles its own retries internally.
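To make the linear flow concrete, here is a minimal sketch that assumes each node is a plain function taking and returning a state dict. The five node names come from the pipeline above; the state keys (query, findings, rca, report) and every function body are hypothetical placeholders, not the project's actual LangGraph code.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]


def input_validator_node(state: State) -> State:
    # Hypothetical check: reject empty queries before any investigation starts.
    if not state.get("query", "").strip():
        raise ValueError("empty query")
    return {**state, "validated": True}


def context_investigation(state: State) -> State:
    # Placeholder for the context-gathering sub-graph.
    return {**state, "findings": state.get("findings", []) + ["context"]}


def primary_investigation(state: State) -> State:
    # Placeholder for the main device investigation sub-graph.
    return {**state, "findings": state["findings"] + ["primary"]}


def rca_assessor_node(state: State) -> State:
    # Placeholder: derive a root-cause summary from the collected findings.
    return {**state, "rca": f"based on {len(state['findings'])} findings"}


def report_generator(state: State) -> State:
    # Placeholder: turn the RCA into a human-readable report.
    return {**state, "report": f"Report for {state['query']!r}: {state['rca']}"}


PIPELINE: list[Node] = [
    input_validator_node,
    context_investigation,
    primary_investigation,
    rca_assessor_node,
    report_generator,
]


def run_pipeline(query: str) -> State:
    # Nodes run strictly in order; each receives the previous node's output.
    state: State = {"query": query}
    for node in PIPELINE:
        state = node(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("Check BGP neighbors on xrd-1")["report"])
```

Because the pipeline is linear, there is no branching at this level; any looping happens inside the investigation phases.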
Inside each investigation sub-graph, you'll see four internal nodes in LangGraph Studio: plan_device (creates the investigation strategy), execute_device (queries network devices), collect_device_result (aggregates findings), and assess_device (evaluates whether the objective is met). If the assessment fails and retries remain, the phase loops back to execute_device.
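The plan, execute, collect, assess loop can be sketched as follows. This is an illustration under assumptions, not the project's sub-graph: execute and assess are caller-supplied stubs, PhaseResult and run_investigation_phase are hypothetical names, and "max_retries" is read as one initial attempt plus up to max_retries re-executions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PhaseResult:
    plan: str
    results: list[str] = field(default_factory=list)
    attempts: int = 0
    objective_met: bool = False


def run_investigation_phase(
    objective: str,
    execute: Callable[[str, int], str],
    assess: Callable[[list[str]], bool],
    max_retries: int = 3,
) -> PhaseResult:
    # plan_device: a hypothetical one-line strategy derived from the objective.
    result = PhaseResult(plan=f"plan for: {objective}")
    # Assumption: one initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        result.attempts = attempt + 1
        # execute_device: query devices (stubbed by the caller here).
        output = execute(result.plan, attempt)
        # collect_device_result: aggregate findings across attempts.
        result.results.append(output)
        # assess_device: stop looping as soon as the objective is met.
        if assess(result.results):
            result.objective_met = True
            break
    return result


if __name__ == "__main__":
    phase = run_investigation_phase(
        "check BGP neighbors on xrd-1",
        execute=lambda plan, n: f"attempt {n}: neighbors collected",
        assess=lambda results: len(results) >= 2,
    )
    print(phase.attempts, phase.objective_met)
```

If assess never succeeds, the loop ends after the retry budget is spent and the phase reports objective_met=False instead of raising, which matches the "before moving on" behavior described for the pipeline.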
If the objective is still not met, execution is retried up to max_retries times (default: 3) before moving on. Investigation strategies are stored in skills/ as Markdown files. Manual queries use all skills; alert-triggered investigations filter them by event type.

Before you can use SP Oncall, you'll need these tools installed on your system:
Windows users: This project requires a Unix-like environment. Install WSL (Windows Subsystem for Linux) to run it on Windows.
```bash
git clone https://github.com/jillesca/sp_oncall
cd sp_oncall
make install
```

Copy .env.example to .env and fill in the required values:

```bash
cp .env.example .env
```
Required keys:
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key |
| LANGSMITH_API_KEY | LangSmith API key (for tracing) |
| LANGSMITH_PROJECT | LangSmith project name (e.g. sp_oncall) |
| LANGSMITH_TRACING | Set to true to enable tracing |
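Before starting the server, it can help to confirm that none of the required keys is missing or empty. The sketch below is a hypothetical helper, not part of the repository: it uses a deliberately simplified KEY=VALUE parser (not python-dotenv) that skips comments and strips surrounding quotes, but does not handle export prefixes or multi-line values.

```python
from pathlib import Path

# The four required keys from the table above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "LANGSMITH_API_KEY",
    "LANGSMITH_PROJECT",
    "LANGSMITH_TRACING",
)


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(values: dict[str, str]) -> list[str]:
    # A key counts as missing if it is absent or left empty.
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_required(parse_env_file(text))
    print(("Missing keys: " + ", ".join(missing)) if missing else "All required keys are set.")
```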
See the Configuration Reference below for all available options.
SP Oncall uses the gNMIBuddy MCP server to query network devices. Point mcp_config.json at your running gNMIBuddy instance:
```json
{
  "gNMIBuddy": {
    "transport": "http",
    "url": "http://localhost:8000/mcp"
  }
}
```

```bash
make run
```
This starts the LangGraph development server. Open LangGraph Studio at the URL shown in the terminal.
In LangGraph Studio, start a new thread and type a query:
```text
Check BGP neighbors on xrd-1
How are my PE routers performing?
Investigate all core P devices
```
For the optional alert-driven companion flow, see Optional Observability Integration below.
Don't have network devices? No problem! Use the DevNet XRd Sandbox — a free environment for testing.
To automatically configure gNMI on the XRd DevNet sandbox, run this helper script:
```bash
ANSIBLE_HOST_KEY_CHECKING=False \
bash -c 'TMPDIR=$(mktemp -d) \
  && trap "rm -rf $TMPDIR" EXIT \
  && curl -s https://raw.githubusercontent.com/jillesca/gNMIBuddy/refs/heads/main/ansible-helper/xrd_apply_config.yaml > "$TMPDIR/playbook.yaml" \
  && curl -s https://raw.githubusercontent.com/jillesca/gNMIBuddy/refs/heads/main/ansible-helper/hosts > "$TMPDIR/hosts" \
  && uvx --from "ansible-core==2.19.2" --with "paramiko,ansible" ansible-playbook "$TMPDIR/playbook.yaml" -i "$TMPDIR/hosts"'
```
Alternatively, you can enable gNMI manually. Apply this configuration to every XRd device:

```
grpc
 port 57777
 no-tls
```

Don't forget to commit the configuration on each XRd device.
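After committing, a quick way to check that each device now listens on port 57777 is a plain TCP connection test. This is a hypothetical helper, not part of SP Oncall or gNMIBuddy: a successful connect only proves the port is reachable, not that gNMI requests will succeed, and the hostnames you pass on the command line are your own.

```python
import socket
import sys


def grpc_port_open(host: str, port: int = 57777, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Example usage: python check_grpc.py xrd-1 xrd-2
    for host in sys.argv[1:]:
        state = "open" if grpc_port_open(host) else "unreachable"
        print(f"{host}:57777 {state}")
```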
All SP_ONCALL_* variables can be set in your .env file. See .env.example for the full list with comments.
| Variable | Default | Description |
|---|---|---|
| SP_ONCALL_MAX_RETRIES | 3 | Max execution retries per device investigation. Also overridable from LangGraph Studio. |
| SP_ONCALL_FAST_MODEL | openai/gpt-4o-mini | Model used for structured output parsing; faster and cheaper than the main reasoning model. |
| SP_ONCALL_LOG_LEVEL | info | Log level for sp_oncall modules (debug, info, warning, or error). |
| SP_ONCALL_LANGCHAIN_DEBUG | false | Enable verbose LangChain debug tracing. |
| SP_ONCALL_MODULE_LEVELS | (unset) | Per-module log overrides (e.g. sp_oncall.nodes=debug,langgraph=error). Run make logger-names to list modules. |
| SP_ONCALL_LOG_FILE | (unset) | Write logs to a file in addition to stdout. |
| SP_ONCALL_EXTERNAL_SUPPRESSION_MODE | langgraph | Suppress noisy external library logs (langgraph or none). |
| OPENROUTER_API_KEY | (unset) | Required only when using openrouter/* models (e.g. openrouter/anthropic/claude-sonnet-4). |
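The sketch below shows one way the variables in this table could be read with their documented defaults. OncallSettings, load_settings, and parse_module_levels are hypothetical names, and the validation rules (accepted log levels, suppression modes, the OpenRouter key check) are assumptions drawn from the descriptions above; the project's real loader may behave differently. Note that it only checks the fast model for the openrouter/ prefix, since the main model is chosen in LangGraph Studio.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Optional

LOG_LEVELS = {"debug", "info", "warning", "error"}
SUPPRESSION_MODES = {"langgraph", "none"}


@dataclass(frozen=True)
class OncallSettings:
    # Defaults mirror the configuration table.
    max_retries: int = 3
    fast_model: str = "openai/gpt-4o-mini"
    log_level: str = "info"
    langchain_debug: bool = False
    module_levels: dict[str, str] = field(default_factory=dict)
    log_file: Optional[str] = None
    external_suppression_mode: str = "langgraph"


def parse_module_levels(raw: str) -> dict[str, str]:
    """Parse 'sp_oncall.nodes=debug,langgraph=error' into {module: level}."""
    levels: dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        module, sep, level = item.partition("=")
        level = level.strip().lower()
        if not sep or not module.strip() or level not in LOG_LEVELS:
            raise ValueError(f"invalid module level entry: {item!r}")
        levels[module.strip()] = level
    return levels


def load_settings(env: Mapping[str, str] = os.environ) -> OncallSettings:
    log_level = env.get("SP_ONCALL_LOG_LEVEL", "info").lower()
    if log_level not in LOG_LEVELS:
        raise ValueError(f"invalid SP_ONCALL_LOG_LEVEL: {log_level!r}")
    mode = env.get("SP_ONCALL_EXTERNAL_SUPPRESSION_MODE", "langgraph").lower()
    if mode not in SUPPRESSION_MODES:
        raise ValueError(f"invalid SP_ONCALL_EXTERNAL_SUPPRESSION_MODE: {mode!r}")
    fast_model = env.get("SP_ONCALL_FAST_MODEL", "openai/gpt-4o-mini")
    # OPENROUTER_API_KEY is only needed for openrouter/* models.
    if fast_model.startswith("openrouter/") and not env.get("OPENROUTER_API_KEY"):
        raise ValueError("OPENROUTER_API_KEY is required for openrouter/* models")
    return OncallSettings(
        max_retries=int(env.get("SP_ONCALL_MAX_RETRIES", "3")),
        fast_model=fast_model,
        log_level=log_level,
        langchain_debug=env.get("SP_ONCALL_LANGCHAIN_DEBUG", "false").lower() == "true",
        module_levels=parse_module_levels(env.get("SP_ONCALL_MODULE_LEVELS", "")),
        log_file=env.get("SP_ONCALL_LOG_FILE") or None,
        external_suppression_mode=mode,
    )
```

Passing an explicit mapping instead of os.environ makes the loader easy to exercise without touching the real environment.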
In LangGraph Studio, click Manage Assistants to select the main reasoning model. Available models are defined in src/configuration.py under LLMModel and include OpenAI and OpenRouter options.
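Model identifiers in this README use a provider/model form (openai/gpt-4o-mini, openrouter/anthropic/claude-sonnet-4). As a rough illustration of how such identifiers might be split, here is a hypothetical string enum named ModelChoice; it is a stand-in, not the LLMModel class in src/configuration.py, and its members are only the two model names that appear in this README.

```python
from enum import Enum


class ModelChoice(str, Enum):
    # Hypothetical members; the real list is LLMModel in src/configuration.py.
    GPT_4O_MINI = "openai/gpt-4o-mini"
    CLAUDE_SONNET_4 = "openrouter/anthropic/claude-sonnet-4"

    @property
    def provider(self) -> str:
        # Everything before the first "/" names the provider.
        return self.value.split("/", 1)[0]

    @property
    def model_name(self) -> str:
        # The remainder goes to that provider and may itself contain "/".
        return self.value.split("/", 1)[1]

    @property
    def needs_openrouter_key(self) -> bool:
        return self.provider == "openrouter"


if __name__ == "__main__":
    for model in ModelChoice:
        print(model.value, model.provider, model.model_name, model.needs_openrouter_key)
```

Splitting on the first slash only keeps nested OpenRouter names such as anthropic/claude-sonnet-4 intact.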
Investigation strategies live in skills/ as Markdown files following the agentskills.io specification. Alert-triggered runs filter by event_type via src/util/skill_routing.py; manual queries use all available skills.
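A minimal sketch of event-type routing, assuming each skill file starts with a ----delimited front-matter block of simple key: value lines. The name field follows the agentskills.io convention, but the event_types key, the Skill class, and the functions parse_skill and select_skills are hypothetical; this is not the logic in src/util/skill_routing.py. The event type names are taken from the test_alert.sh examples in this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    name: str
    event_types: frozenset[str]
    body: str


def parse_skill(markdown: str) -> Skill:
    """Parse a ----delimited front-matter block of simple 'key: value' lines."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing front matter")
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated front matter")
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # Hypothetical field: a comma-separated list of alert event types.
    events = frozenset(e.strip() for e in meta.get("event_types", "").split(",") if e.strip())
    return Skill(meta["name"], events, "\n".join(lines[end + 1:]).strip())


def select_skills(skills: list[Skill], event_type: Optional[str]) -> list[Skill]:
    # Manual queries (no event_type) get every skill.
    if event_type is None:
        return list(skills)
    # Alert-triggered runs keep only skills tagged with that event type.
    return [s for s in skills if event_type in s.event_types]
```

In this sketch an alert whose event type matches no skill gets an empty list; the real router may fall back differently.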
For detailed logging configuration, see src/logging/README.md.
For domain terminology (Alert, Investigation, Device Profile, Thread, etc.), see CONTEXT.md.
SP Oncall works on its own with manual queries in LangGraph Studio. If you want to experiment with an observability-driven workflow, use it together with xrd-observability-stack, which provides Grafana, Alertmanager, Prometheus, and the external webhook-receiver service that forwards alerts into SP Oncall.
Alerts reach the webhook-receiver service, which turns each one into a NetworkAlert and calls POST /runs on the LangGraph API.

The scripts/test_alert.sh helper sends sample Grafana-style alerts to a webhook endpoint (useful for testing with xrd-observability-stack). It is experimental and not required for manual usage.
```bash
# Show the curl commands without sending (dry run)
bash scripts/test_alert.sh --dry-run

# Send a specific alert type
bash scripts/test_alert.sh interface_down
bash scripts/test_alert.sh bgp_down
bash scripts/test_alert.sh isis_down
bash scripts/test_alert.sh topology_degraded
bash scripts/test_alert.sh interface_flapping
bash scripts/test_alert.sh interface_errors
```
By default the script posts to http://localhost:8080/alert. Override it by setting WEBHOOK_URL if your receiver is running elsewhere. The receiver is not part of this repository; start it from xrd-observability-stack.
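For experimenting outside the shell script, the request can also be built from Python with the standard library. This is a sketch under assumptions: the payload shape (status, alerts, labels, annotations, startsAt) follows the general Alertmanager/Grafana webhook style, but the exact fields test_alert.sh sends and the receiver expects are not documented here, and build_alert, build_request, and the InterfaceDown alert name are hypothetical. Only the default URL and the WEBHOOK_URL override come from the text above.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone
from typing import Optional

DEFAULT_WEBHOOK_URL = "http://localhost:8080/alert"


def build_alert(alertname: str, device: str, summary: str) -> dict:
    # Hypothetical Alertmanager-style payload; label names are illustrative.
    return {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": alertname, "device": device},
                "annotations": {"summary": summary},
                "startsAt": datetime.now(timezone.utc).isoformat(),
            }
        ],
    }


def build_request(payload: dict, url: Optional[str] = None) -> urllib.request.Request:
    # Same override rule as the script: WEBHOOK_URL replaces the default target.
    target = url or os.environ.get("WEBHOOK_URL", DEFAULT_WEBHOOK_URL)
    return urllib.request.Request(
        target,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(build_alert("InterfaceDown", "xrd-1", "GigabitEthernet0/0/0/0 is down"))
    # Print instead of sending; pass req to urllib.request.urlopen() to post it.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live receiver.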
```bash
# If you cloned the repo
# Shut down an interface for a quick test
ANSIBLE_HOST_KEY_CHECKING=False \
  uvx --from "ansible-core==2.19.2" --with "paramiko,ansible" \
  ansible-playbook ansible-helper/xrd_apply_config.yaml -i ansible-helper/hosts
```