SP Oncall is an experiment: a network investigation system that automates complex network diagnostics and troubleshooting for Service Provider (SP) networks. It uses artificial intelligence to analyze network devices, identify issues, and provide detailed reports. I'm mostly using it to learn about and demo AI solutions for networking.
Think of SP Oncall as a team of specialized AI agents that work together to investigate network problems.
Before you can use SP Oncall, you'll need these tools installed on your system:
Windows users: This project requires a Unix-like environment. Install WSL (Windows Subsystem for Linux) to run it on Windows.
```bash
git clone https://github.com/jillesca/sp_oncall
cd sp_oncall
```
Create a `.env` file in the project root with your API keys:
```bash
# .env file - Required for operation
OPENAI_API_KEY=your-openai-api-key-here
LANGSMITH_API_KEY=your-langsmith-api-key-here
LANGSMITH_PROJECT=your-project-name
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
```
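If you want to sanity-check the file before launching, a minimal stdlib sketch can parse it and confirm the keys are present (this parser is illustrative, not part of the project; the sample values are placeholders):

```python
import tempfile

def load_env_file(path):
    """Minimal .env parser: KEY=value lines; blank lines and '#' comments ignored."""
    loaded = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    return loaded

# Self-check against a sample .env with two of the keys from above.
sample = "# .env file\nOPENAI_API_KEY=sk-test\nLANGSMITH_TRACING=true\n"
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write(sample)
env = load_env_file(fh.name)
missing = [k for k in ("OPENAI_API_KEY", "LANGSMITH_API_KEY") if k not in env]
```

Here `missing` would flag `LANGSMITH_API_KEY`, which the sample deliberately omits.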
SP Oncall uses the gNMIBuddy MCP server to extract data from network devices. The MCP configuration is defined in `mcp_config.json`:
```json
{
  "gNMIBuddy": {
    "command": "uvx",
    "args": [
      "--from",
      "git+https://github.com/jillesca/gNMIBuddy.git",
      "gnmibuddy-mcp"
    ],
    "transport": "stdio",
    "env": {
      "NETWORK_INVENTORY": "xrd_sandbox.json"
    }
  }
}
```
> **Note:** If you're not using the DevNet Sandbox, replace `xrd_sandbox.json` with your own device inventory file.
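If you edit `mcp_config.json` by hand, a quick parse-and-check catches typos before the system tries to launch the server. The required-field list below is an assumption inferred from this config entry, not an official MCP schema:

```python
import json

# The gNMIBuddy entry from mcp_config.json, as shown above.
config = json.loads("""
{
  "gNMIBuddy": {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/jillesca/gNMIBuddy.git", "gnmibuddy-mcp"],
    "transport": "stdio",
    "env": {"NETWORK_INVENTORY": "xrd_sandbox.json"}
  }
}
""")

def missing_fields(entry):
    """Return the fields a stdio MCP server entry appears to need but lacks."""
    required = ("command", "args", "transport")
    return [k for k in required if k not in entry]

problems = {name: missing_fields(entry) for name, entry in config.items()}
```

An empty list per server name means the entry has all the expected fields.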
Install dependencies and start the investigation system:
```bash
# First time only - installs all required Python packages
make install

# Start the investigation system (opens a web interface)
make run
```
The `make install` command installs all dependencies pinned in `uv.lock` (ensures consistency). The `make run` command starts the investigation system and opens the web interface.
Once running, you can ask the system to investigate your network in various ways:
**Single Device Investigation:**

- "Check BGP neighbors on xrd-1"
- "Review the health of xrd-8"

**Multi-Device by Role:**

- "How are my PE routers performing?" (PE = Provider Edge - routers that connect to customer networks)
- "Check all route reflectors" (route reflectors help distribute routing information)
- "Investigate all core P devices" (P = Provider - core network routers)

**Pattern-Based Investigation:**

- "Check interfaces on devices matching 'xrd-*'"
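Patterns like `xrd-*` are ordinary shell-style wildcards. As a sketch of how such matching works (the device names here are invented, in the style of the XRd sandbox):

```python
from fnmatch import fnmatch

# Hypothetical inventory names, similar to the DevNet XRd sandbox.
devices = ["xrd-1", "xrd-2", "xrd-8", "pe-router-1", "rr-1"]

def match_devices(pattern, names):
    """Return all device names matching a shell-style wildcard pattern."""
    return [n for n in names if fnmatch(n, pattern)]

matched = match_devices("xrd-*", devices)  # ["xrd-1", "xrd-2", "xrd-8"]
```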
Don't have network devices? No problem! Use the DevNet XRd Sandbox - a free environment for testing.
To automatically configure gNMI on the XRd DevNet sandbox, you can use this helper script:
```bash
ANSIBLE_HOST_KEY_CHECKING=False \
bash -c 'TMPDIR=$(mktemp -d) \
  && trap "rm -rf $TMPDIR" EXIT \
  && curl -s https://raw.githubusercontent.com/jillesca/gNMIBuddy/refs/heads/main/ansible-helper/xrd_apply_config.yaml > "$TMPDIR/playbook.yaml" \
  && curl -s https://raw.githubusercontent.com/jillesca/gNMIBuddy/refs/heads/main/ansible-helper/hosts > "$TMPDIR/hosts" \
  && uvx --from ansible-core --with "paramiko,ansible" ansible-playbook "$TMPDIR/playbook.yaml" -i "$TMPDIR/hosts"'
```
You can manually enable gNMI on each XRd device. Apply this configuration to all XRd devices:
```
grpc
 port 57777
 no-tls
```
Don't forget to `commit` your changes on XRd.
Via the **Manage Assistants** button in the web interface, you can select different AI models to try:

- `gpt-4`, `gpt-4o-mini`, `gpt-5-nano` (default - most capable)
- `qwen3:8b`, `llama3.1` (experimental - poor results, runs locally)

The system uses predefined investigation strategies stored as JSON files in the `/plans` directory.
The system includes comprehensive logging to help you understand what's happening:

- Set `SP_ONCALL_LOG_LEVEL=debug` for detailed logging
- Set `SP_ONCALL_LANGCHAIN_DEBUG=true` for LangChain framework logging
- Set per-module levels with `SP_ONCALL_MODULE_LEVELS` (e.g. `SP_ONCALL_MODULE_LEVELS="sp_oncall.nodes=debug,langgraph=error"`)
- Run `make logger-names` to see all available logging modules
- Set `SP_ONCALL_DEBUG_CAPTURE=1` to automatically save complex objects to log files for offline analysis

For detailed logging configuration and advanced features, see `src/logging/README.md`. For debug capture objects, see `docs/DEBUG_CAPTURE.md`.
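The `module=level,module=level` format used by `SP_ONCALL_MODULE_LEVELS` maps cleanly onto Python's standard logging hierarchy. The parser below is a sketch of how such a setting is typically applied, not SP Oncall's actual implementation:

```python
import logging

def parse_module_levels(spec):
    """Parse 'module=level,module=level' into {logger_name: numeric_level}."""
    levels = {}
    for pair in spec.split(","):
        if "=" not in pair:
            continue
        module, _, level = pair.partition("=")
        levels[module.strip()] = getattr(logging, level.strip().upper())
    return levels

levels = parse_module_levels("sp_oncall.nodes=debug,langgraph=error")
for module, level in levels.items():
    logging.getLogger(module).setLevel(level)
```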